Speech Recognition - Ready for Prime Time?
by Jarred Walton on April 21, 2006 9:00 AM EST
Health Considerations
There's at least one more area that can be a direct benefit to many people -- I know it has certainly helped me. Typing on a keyboard for many hours every day is not the healthiest of practices. Every keyboard on the market today carries a warning about repetitive stress injuries (RSI), and with good reason. Not everyone will have problems, and not everyone who has problems will experience the same degree of discomfort. However, the more you type and the older you get, the greater your chance of developing RSI from computer use. Needless to say, I am one of the many people in the world who have developed carpal tunnel syndrome (CTS).
There are many things you can do to try to combat carpal tunnel problems. Some people feel that ergonomic keyboards help, and I personally found one to be more comfortable than a regular keyboard. Getting a better chair and desk will also help -- you want a chair and desk that put your wrists and hands in the proper position to minimize strain; if you're not comfortable sitting at your computer, you should probably invest in a new chair at the very least.
Even with modifications to your work area, though, there's a reasonable chance that you'll still have difficulty. You might consider surgery, but while that will generally help 70% of people initially, many find that discomfort returns within a couple of years. The simple fact of the matter is that the best way to avoid RSI complications is to eliminate the repetitive activity that's causing the problem in the first place. If typing on a keyboard is giving you CTS, the best way to alleviate the problem is to stop typing on a keyboard. That makes it rather difficult to write for a living, as you can imagine.
Of course, it usually isn't necessary to completely stop an activity that's causing RSI. The phrase itself gives you an idea of how to avoid difficulties: avoid excessive repetition. Typing 20 pages of text per day on a computer would probably cause anyone to get CTS eventually. Typing 10 pages per day would probably cause problems for many people, but not for everyone. Typing five pages per day would likely only affect a smaller portion of the population. Finally, if you could cut it down to one or two pages per day, most people would be fine. That brings us to the present topic: speech recognition. Used properly, speech recognition has the potential to eliminate a large portion of your typing, among other things.
Languages are complex enough that learning a new language is always difficult. We spend years growing up in an environment, learning the language, learning the rules, developing our own accent, etc. No two people in the world are going to sound exactly alike, and it goes without saying that everyone makes periodic mistakes in grammar and pronunciation while speaking. Programming a computer so that it understands everything that we say, corrects the mistakes, and gets all the grammar correct as well is a daunting task at best. Nevertheless, speech recognition has so many potential benefits that it is considered one of the Holy Grail landmarks that we want to achieve, and research on the problem has been in progress for decades.
As time has passed, computers have gotten faster and the algorithms have improved, and we're at the point now where real-time speech recognition is actually feasible. Mistakes will still be made, and dealing with different accents and/or speech impediments only makes things more difficult, but for many people it is now possible to get accuracy higher than 90%. That isn't that great, as it means one or two mistakes per sentence, but it's a good place to start. I've got a couple of pieces of software that purport to achieve higher than 90% accuracy rates after training, so that will allow us to perform some real-world benchmarks.
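To put rough numbers on the "one or two mistakes per sentence" figure, here is a minimal Python sketch. The 15-word sentence length is an assumption chosen for illustration, and the word-level edit distance used to score a transcript is a common way of measuring recognition accuracy, not necessarily the method used for the benchmarks in this article. The function names are hypothetical.

```python
def expected_errors(words_per_sentence: int, accuracy: float) -> float:
    """Average number of misrecognized words in a sentence of the given length."""
    return words_per_sentence * (1.0 - accuracy)


def word_accuracy(reference: str, recognized: str) -> float:
    """Word accuracy = 1 - (word-level edit distance / number of reference words)."""
    ref, hyp = reference.split(), recognized.split()
    if not ref:
        raise ValueError("reference text must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped by the recognizer
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return 1.0 - prev[-1] / len(ref)


if __name__ == "__main__":
    # Assumed sentence length of 15 words at 90% accuracy
    print(round(expected_errors(15, 0.90), 2))
    # One substituted word out of nine
    print(round(word_accuracy("do you yearn to play the role of scotty",
                              "to you yearn to play the role of scotty"), 3))
```

At 90% accuracy, a 15-word sentence averages about 1.5 wrong words, which matches the estimate above; at 95% the same sentence would average under one.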
38 Comments
Googer - Saturday, April 22, 2006 - link
BMW 7 series Speech recognition is about 50-75% accurate (my guess) and some users have more luck with it than others.
Googer - Friday, April 21, 2006 - link
I think you should re-benchmark these on a system that is not overclocked. Overclocking may have contributed to erroneous test results. It is possible that some of the benchmarks could have been better on a normal system. Also I am surprised this was not tested on an Intel system. Perhaps one of the programs may benefit from the Netburst architecture with or without dual core.
Also I would love to download the Dictation and Normal Voice wav files, so I can understand the difference between them. Thanks for the article, it came at the perfect time; someone who is handicapped was asking me about this last night.
JarredWalton - Friday, April 21, 2006 - link
I'll see about putting up some MP3s of the wave files -- of course, that will open the door for all of you to make fun of how I speak. LOL
In case this wasn't entirely clear in the article, this was all done on my system that I use every day for work. It's overclocked, and it's been that way for six months. I run stress tests (Folding at Home -- on both cores) all the time. I would be very surprised if the overclock has done anything to affect accuracy, especially considering that I did run some tests on a couple of other systems that were not overclocked. I basically removed them from this article because they would have simply taken more time to put in, and they didn't give me any new information.
It's pretty obvious that neither of these algorithms benefit from multiple processing cores -- HyperThreading, dual core, SMP, whatever. I also wasn't sure how much interest there would be from people in this topic, but if a lot of people want to know how this runs on Intel systems I could go back and look at one. One thing worth noting is that SysMark 2004 does include Dragon NaturallySpeaking version 6.5 as one of the tests. Of course, the results are buried in the composite scores.
JarredWalton - Friday, April 21, 2006 - link
MP3 links available:
http://www.anandtech.com/multimedia/showdoc.aspx?i...
Note that DNS only uses WAV files (AFAICT), but uploading 45MB WAV files seems pointless. Convert them to WAVs if you want to try them with Dragon.
Googer - Saturday, April 22, 2006 - link
Excellent job on the dictation/wav files, you are a very good reader and have a nice clear and concise voice. ;ThumbsUP)
stelleg151 - Friday, April 21, 2006 - link
Cool article. I hope that voice recognition continues to improve, for I think it could be incredibly useful for areas like HTPC, or as you said messaging while doing other things (gaming).
Zerhyn - Friday, April 21, 2006 - link
Have you ever tried out speech recognition and been underwhelmed? To you yearn to play the role of Scotty and call out..?
PrinceGaz - Friday, April 21, 2006 - link
Yes, that was the first thing I noticed before I even started reading the article. Maybe they used speech-recognition software to enter that.
I think they should have an editor (or at least let another contributor read what others have written) who has to approve an article before it goes live as the current number of tyops is unforgiveable ;)
JarredWalton - Friday, April 21, 2006 - link
I'm doing my best to catch typos before anything goes live, but after being up all night trying to finish off this article, I went to post and realized I didn't have a title or intro. So, I put one in using Dragon, but my diction goes to put when I'm tired, as does my eyesight and proofing ability. One typo in a 44 word intro (I didn't proof/edit it at all) isn't too bad for the software. Bad for me? Maybe, but mistakes do happpen. :)
johnsonx - Friday, April 21, 2006 - link
One nice thing about Dragon, despite the high CPU utilization shown in the article, is that it will run quite happily on very lowly systems. I have a customer who uses it all day long on PentiumIII-850's with only 512MB RAM (the max for those particular systems). The heaviest user there recently upgraded to a low-end Sempron64 with a gig of RAM, and he says the overall system is far more responsive (of course), but Dragon's operation isn't radically better; it worked great on the PIII, and works great now.