
Assistive Technology in Education/Speech Recognition Software


Introduction


The following information is a compilation of material on speech recognition software gathered from across the internet. After an introduction to what speech-to-text is and what types of software are available, educational applications for its use are described.

Definition

A typical computer microphone

Speech recognition, often referred to as automatic speech recognition or computer speech recognition, converts spoken words to text. The term "voice recognition" is sometimes used for recognition systems that are trained to a particular speaker, as is the case for most desktop recognition software. Such software therefore includes an element of speaker recognition: it attempts to identify the person speaking, which helps it recognize what is being said. Speech recognition in the broad sense can recognize almost anybody's speech; an example is a call-center system designed to handle many voices. Voice recognition, by contrast, is a system trained to a particular user, which recognizes that person's speech based on their unique vocal characteristics.[1]

Speech recognition applications include voice dialing, such as the kind built into many cell phones; call routing, such as the menus you navigate when you call a call center[2]; domotic (home automation) appliance control; content-based spoken audio search, such as the systems used by governments to pick up key words spoken on a wiretap; simple data entry, like that used in phone surveys[2]; preparation of structured documents such as medical reports; speech-to-text processing of the kind you would use to write a letter or email[3]; and direct voice input in aircraft cockpits.[1][4]

History


The first speech recognizer appeared in 1952 and consisted of a device for the recognition of single spoken digits.[5][6] Another early device was the IBM Shoebox, exhibited at the 1964 New York World's Fair.[1]

One of the most notable domains for the commercial application of speech recognition in the United States has been health care, in particular the work of the medical transcriptionist (MT). According to industry experts, at its inception speech recognition (SR) was sold as a way to eliminate transcription entirely rather than to make the transcription process more efficient, and so it was not accepted. SR at that time was also often technically deficient. Additionally, to be used effectively it required changes to the ways physicians worked and documented clinical encounters, which many, if not all, were reluctant to make. The biggest limitation to speech recognition automating transcription, however, is seen as the software itself: the nature of narrative dictation is highly interpretive and often requires judgment that a human can provide but an automated system cannot yet. Another limitation has been the extensive amount of time required by the user and/or system provider to train the software.[1]

A distinction in ASR is often made between "artificial syntax systems" which are usually domain-specific and "natural language processing" which is usually language-specific. Each of these types of application presents its own particular goals and challenges.[1]

Software


Microsoft


Windows Speech Recognition is a speech recognition application included in Windows Vista and more recently, Windows 7.[7]

Features


Windows Speech Recognition allows the user to control the computer by giving specific voice commands. The program can also be used to dictate text, so the user can operate a Vista or Windows 7 computer largely by voice.[7][6]

Applications that don't present obvious "commands" can still be controlled by asking the system to overlay numbers on top of interface elements; the number can subsequently be spoken to activate that function. Programs needing mouse clicks in arbitrary locations can also be controlled through speech; when asked to do so, a "mousegrid" of nine zones is displayed, with numbers inside each. The user speaks the number, and another grid of nine zones is placed inside the chosen zone. This continues until the interface element to be clicked is within the chosen zone.[7]
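
The mousegrid described above is essentially a recursive refinement: each spoken number selects one of nine zones, and that zone becomes the new grid. The sketch below is a minimal illustration of that idea, not Microsoft's implementation; the keypad-style numbering and the screen size are assumptions made for the example.

    # Minimal sketch of mousegrid-style refinement: each spoken number (1-9)
    # selects one of nine zones in the current rectangle, which becomes the
    # new rectangle. Repeating this quickly narrows down to a click point.
    # Screen size and zone numbering are illustrative assumptions.

    def refine(rect, choice):
        """Return the sub-rectangle for a spoken zone number (1-9).

        rect is (left, top, width, height); zones are numbered 1-9,
        left to right, top to bottom, like a phone keypad.
        """
        left, top, width, height = rect
        row, col = divmod(choice - 1, 3)          # zone 1 -> row 0, col 0
        return (left + col * width / 3,
                top + row * height / 3,
                width / 3,
                height / 3)

    def click_point(screen, spoken_numbers):
        """Apply successive zone choices and return the final click coordinates."""
        rect = screen
        for number in spoken_numbers:
            rect = refine(rect, number)
        left, top, width, height = rect
        return (left + width / 2, top + height / 2)   # click the zone's center

    # Example: a 1920x1080 screen; the user says "5", then "9", then "1".
    print(click_point((0, 0, 1920, 1080), [5, 9, 1]))

Because every choice shrinks the active area to a ninth of its previous size, even a full screen is narrowed to a small target within three or four spoken numbers.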

Windows Speech Recognition has a fairly high recognition accuracy and provides a set of commands that assists in dictation.[citation needed] A brief speech-driven tutorial is included to help familiarize a user with the speech recognition commands. Training can also be completed to improve the accuracy of speech recognition.[7]

Currently, the application supports several languages, including English (U.S. and British), Spanish, German, French, Japanese and Chinese (traditional and simplified).[8] Support for additional languages is planned.[7]

History


In 1993, Microsoft hired Xuedong Huang from Carnegie Mellon University to lead its speech efforts. Microsoft has been involved in research on speech recognition and text to speech.[9] The company's research eventually led to the development of the Speech API (SAPI).[7]

Speech recognition technology has been used in some of Microsoft's products, including Microsoft Dictation (a research prototype that ran on Windows 9x). It was also included in Office XP, Office 2003,[10] Microsoft Plus! for Windows XP, Windows XP Tablet PC Edition, and Windows Mobile (as Microsoft Voice Command).[11] Prior to Windows Vista, however, speech recognition was not mainstream. In response, Windows Speech Recognition was bundled with Windows Vista, released in 2006, making it the first mainstream version of Microsoft Windows to offer fully integrated support for speech recognition.[7]

Technical details


Windows Speech Recognition relies on Microsoft SAPI version 5.3 (included in Windows Vista) to function.[9] The application also utilizes Microsoft Speech Recognizer 8.0 for Windows as its speech profile engine.[7]

Apple


MacSpeech was a company that developed speech recognition software for Apple Macintosh computers. In 2008, its previous flagship product, iListen, was replaced by Dictate, which is built around a licensed version of Nuance's Dragon NaturallySpeaking engine. MacSpeech was established in 1996 by its CEO, Andrew Taylor.[12] MacSpeech was the only company developing voice dictation systems for the Macintosh, and its full product line was devoted to speech recognition and dictation.[13]

The first commercial voice dictation product for Mac OS X was IBM's ViaVoice, but ScanSoft, the company that had exclusive global distribution rights to ViaVoice, merged with Nuance and stopped developing ViaVoice for Macintosh. (The first dictation software for Mac OS 9 was Articulate Systems' PowerSecretary.)[14][13]

At the 2008 Macworld Expo, MacSpeech's newly revealed Dictate won a Macworld 2008 Best of Show award.[15][13]

Dragon NaturallySpeaking


Dragon NaturallySpeaking is a speech recognition software package developed and sold by Nuance Communications for Windows personal computers. The latest release is version 11.0, released in August 2010. As with the previous version (10.1), this package supports both 32-bit and 64-bit editions of Windows XP, Vista and 7.[16][17][18] The four editions of this latest release are Home, Premium (formerly known as "Preferred"), Professional, and Legal. Nuance Communications claims these newest versions are faster and 15% more accurate "right out of the box" than Dragon 10.[2]

Features


NaturallySpeaking utilizes a minimal user interface. As an example, dictated words appear in a floating tooltip as they are spoken, and when the speaker pauses, the program transcribes the words into the active window at the location of the cursor. The software has three primary areas of functionality: dictation, text-to-speech and command input. The user can dictate and have their speech transcribed as written text, have a document synthesized as an audio stream, or issue commands that are recognized as such by the program. In addition, voice profiles can be accessed by different computers in a networked environment, although the audio hardware and configuration must be identical on both machines.[19][18]

History


Drs. James and Janet Baker founded Dragon Systems in 1982 to release products centered around their voice recognition prototype.[20] DragonDictate was first released for DOS and utilized hidden Markov models, a statistical method for recognizing speech. At the time, the hardware was not powerful enough to address the problem of word segmentation, and DragonDictate was unable to determine the boundaries of words during continuous speech input, so users were forced to pronounce one word at a time, each clearly separated by a small pause. DragonDictate was based on a trigram model and is known as a discrete speech recognition engine.[21][22][18]
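
As a rough illustration of the statistical approach mentioned above, the toy sketch below shows how a trigram model scores a candidate word by how often it has followed the two previously recognized words. This is only a schematic of the idea, not Dragon's actual engine; the training text and candidate words are invented, and a real recognizer combines such language-model scores with acoustic scores from the hidden Markov models.

    from collections import defaultdict

    # Toy trigram language model: count how often each word follows a pair
    # of words in some training text, then use those counts to rank
    # candidate words during recognition. The training text and candidates
    # are invented purely for illustration.

    training_text = ("please dictate the letter now "
                     "please dictate the report now "
                     "please read the letter aloud").split()

    counts = defaultdict(lambda: defaultdict(int))
    for w1, w2, w3 in zip(training_text, training_text[1:], training_text[2:]):
        counts[(w1, w2)][w3] += 1

    def trigram_probability(w1, w2, w3):
        """P(w3 | w1, w2) estimated from the toy counts (0 if the pair is unseen)."""
        following = counts[(w1, w2)]
        total = sum(following.values())
        return following[w3] / total if total else 0.0

    # The recognizer heard something like "letter" or "ladder" after
    # "dictate the"; the trigram model prefers the word it has seen in context.
    for candidate in ("letter", "ladder"):
        print(candidate, trigram_probability("dictate", "the", candidate))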

Dragon Systems released NaturallySpeaking 1.0 as their first continuous dictation product in 1997.[23] The company was then purchased in June 2000 by Lernout & Hauspie, a corporation that had been involved in financial scandals as reported by the New York Times.[24] Following the bankruptcy of Lernout & Hauspie, the rights to the Dragon product line were acquired by ScanSoft. In 2005, ScanSoft launched a de facto acquisition of Nuance Communications, and rebranded itself as Nuance.[25][26][18]

Other Software


A list of other speech recognition software can be found on Wikipedia.

Educational Applications


Dragon Speech Recognition Premium (formerly known as "Preferred") and Professional solutions are compliant with Section 508 accessibility requirements. These products are among others that Nuance Communications offers through education licensing at academic pricing to qualifying individuals and educational establishments (http://www.nuance.com/for-business/by-industry/education/education-validation/eligibility_definitions/index.htm). Nuance also offers a variety of software licensing programs, such as its Open License Program (OLP), for volume needs; the value lies in cost savings over individual desktop products through the efficiencies of a business-to-business relationship.

Speech recognition software is now widely available at a fairly reasonable price, so teachers should look at how they can use this type of software to enhance their curriculum. There are a number of ways it can improve the education of students; some of them are described below.

Helping Students with Physical Disabilities


Helping students with physical disabilities be successful in the classroom can be a challenge for any teacher. Finding ways for these students to do the same activities as other students can take a lot of time, and it requires that teachers fully understand the limitations of their students. What can be most challenging is keeping in mind that these students still have mental abilities equal to or better than those of the other students in a class.

The use of speech recognition software allows students who have little or no motor control in their arms and hands to produce typed reports, operate software, and perform research with a computer, just like their non-disabled classmates.[27]

Helping Students with Learning Disabilities


Students with learning disabilities struggle to learn in a variety of ways. Some have problems reading and writing. Although speech-to-text software does not help these students improve their ability to spell, it allows them to write without worrying about spelling. Getting their ideas down in writing lets teachers work with students to improve their grammar, and improving the grammar of a student's writing helps the student fix the grammar of his or her speech as well.[27][28]

For many students with learning disabilities who can spell, the process of typing itself can be frustrating, so speech-to-text can help them speed up their writing. For students whose learning disabilities include attention span issues, sitting down to type a paper can be very difficult, and speech-to-text software can help them push their writing to new levels.[27][28]

Reading Instruction


Advances in speech recognition software have created an environment in which students can read aloud to a computer and the computer can evaluate their reading ability. Studies of this individualized learning technique show potential, but the software is not yet at the level needed for complete instruction; current software requires supervision from teachers who can assist students that run into problems. The technology is not new, and although it has improved, many of the problems it faced in the past are still being overcome.[29][30] The advantage of this use of the technology is that it gives teachers the ability to differentiate instruction for each student, and it keeps students with advanced reading levels from being held back by students who have learning disabilities.[31]
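
One plausible way such software could score a student's oral reading, sketched below purely as an assumption about the general approach rather than a description of any particular product, is to align the words returned by the recognizer against the target passage and report word-level accuracy, flagging mismatches for the teacher.

    import difflib

    # Minimal sketch of scoring oral reading: align the words the speech
    # recognizer returned against the target passage and report word-level
    # accuracy. The passage and "recognized" words are invented examples;
    # real reading tutors also track timing, hesitations, and pronunciation.

    def score_reading(target_text, recognized_text):
        target = target_text.lower().split()
        recognized = recognized_text.lower().split()
        matcher = difflib.SequenceMatcher(None, target, recognized)
        correct = sum(block.size for block in matcher.get_matching_blocks())
        errors = []
        for op, t1, t2, r1, r2 in matcher.get_opcodes():
            if op != "equal":
                errors.append((" ".join(target[t1:t2]) or "(nothing)",
                               " ".join(recognized[r1:r2]) or "(skipped)"))
        return correct / len(target), errors

    accuracy, errors = score_reading(
        "the quick brown fox jumps over the lazy dog",
        "the quick brown fox jumped over lazy dog")
    print(f"Accuracy: {accuracy:.0%}")
    for expected, heard in errors:
        print(f"expected '{expected}', heard '{heard}'")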

Language Learning


One of the most creative uses of speech recognition software is in language learning. Language software is available that can check a student's ability to speak a language. For example, a student learning Spanish can be asked to state specific words in Spanish, and the computer can then evaluate whether the words were spoken properly. The software can also ask students to translate a passage from their first language into Spanish: students silently read the passage in their native language and then tell the computer what the phrase would be in Spanish. Finally, the software can speak to the student in Spanish and then evaluate the student's response to determine whether it was correct. In all of these cases, each question has to have been programmed into the computer in advance, although in the future computers may be able to evaluate a student's response and reply with their own customized responses.[32]
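
Because each question and its acceptable answers must be programmed in advance, the core check can be as simple as the hypothetical sketch below, in which the recognizer's transcription of the student's spoken Spanish is compared against the answers stored for that prompt (the prompts and accepted answers are invented for illustration).

    # Hypothetical sketch of a pre-programmed language drill: every prompt
    # is stored with the Spanish answers the program will accept, and the
    # transcription returned by the speech recognizer is checked against
    # them. Prompts and answers here are invented examples.

    drills = {
        "Say 'the red house' in Spanish": {"la casa roja"},
        "Say 'good morning' in Spanish": {"buenos dias", "buenos días"},
    }

    def check_answer(prompt, recognized_speech):
        """Return True if the recognized phrase matches an accepted answer."""
        accepted = drills[prompt]
        return recognized_speech.strip().lower() in accepted

    # Example: the recognizer transcribed the student's reply as "Buenos días".
    print(check_answer("Say 'good morning' in Spanish", "Buenos días"))   # True
    print(check_answer("Say 'the red house' in Spanish", "la casa azul")) # False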

Future Design


The future of speech recognition holds numerous possibilities. Although the technology has been around for over 50 years, there are hundreds of potential applications and improvements to the software still to come. Some items on the horizon include universal translators that can help overcome language barriers,[33] improved reading instruction software, one-on-one teaching tools with better interaction, and voice-activated research tools.[34]

Some of the most dramatic possibilities for speech recognition are that someday computers may be able to interpret our comments and respond with their own ideas; at that point speech recognition may become speech understanding.[3] Imagine saying to your alarm clock, "Wake me up at 6:00 am," and having it reply, "I just connected to the datebook in your cell phone and see that your first appointment tomorrow is at 7:00 am, 30 minutes from here. Do you want to get up at 5:30 instead?" Although this sounds like science fiction, the technologies needed for these communications and interpretations exist today; they just need to be merged together.
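
To make the reasoning in that imagined exchange concrete, the purely hypothetical sketch below shows the small calculation such an assistant would perform once it had read the appointment time from the datebook; the hour allowed for getting ready is an assumption made for the example.

    from datetime import datetime, timedelta

    # Hypothetical sketch of the reasoning behind the alarm-clock example:
    # given the first appointment from the user's datebook and an estimated
    # travel time, work backwards to a suggested wake-up time. The
    # preparation time allowed before leaving is an assumption.

    def suggest_wake_time(first_appointment, travel_time, preparation_time):
        return first_appointment - travel_time - preparation_time

    appointment = datetime(2010, 9, 1, 7, 0)           # 7:00 am meeting
    suggested = suggest_wake_time(appointment,
                                  travel_time=timedelta(minutes=30),
                                  preparation_time=timedelta(hours=1))
    print(suggested.strftime("%I:%M %p"))               # 05:30 AM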

An example of this merging can be seen in a video in which a pair of glasses is turned into an audio and video recorder. As the author points out, such glasses will eventually be able to connect to our smart phones so that an image of our computer screen appears on the inside of the lenses, and we will be able to manipulate the desktop environment displayed in the glasses through voice commands, creating a completely hands-free computer that can be used anywhere.[35][36]

References

  1. http://en.wikipedia.org/wiki/Speech_recognition
  2. http://www.lumenvox.com/company/edu/
  3. http://electronics.howstuffworks.com/gadgets/high-tech-gadgets/speech-recognition.htm
  4. http://cslu.cse.ogi.edu/HLTsurvey/ch1node4.html
  5. Davies, K.H., Biddulph, R. and Balashek, S. (1952). Automatic Speech Recognition of Spoken Digits. J. Acoust. Soc. Am. 24(6), pp. 637-642.
  6. http://www.microsoft.com/windowsxp/using/setup/expert/moskowitz_02september23.mspx
  7. http://en.wikipedia.org/wiki/Windows_Speech_Recognition
  8. Windows Speech Recognition in Windows Vista
  9. Talking Windows: Exploring New Speech Recognition And Synthesis APIs In Windows Vista
  10. Using speech recognition for the first time in Office - Help and How-to - Microsoft Office Online
  11. Speech Recognition for the Pocket PC :: May 2002
  12. MacSpeech - Speech Recognition Solutions for Mac OS - The MacSpeech Story
  13. http://en.wikipedia.org/wiki/MacSpeech
  14. [1]
  15. Macworld | Editors' Notes | Macworld Expo Best of Show award winners
  16. "Nuance product support for Microsoft Windows Vista". Retrieved 2009-12-15.
  17. "Nuance product support for Microsoft Windows 7". {{cite web}}: Cite has empty unknown parameter: |1= (help)
  18. a b c d http://en.wikipedia.org/wiki/Dragon_NaturallySpeaking
  19. http://en.wikipedia.org/wiki/Dragon_NaturallySpeaking
  20. "Dragon Systems history". Retrieved 2010-02-03.
  21. "DragonDictate product information". Retrieved 2010-02-03.
  22. http://en.wikipedia.org/wiki/Dragon_NaturallySpeaking
  23. "Dragon NaturallySpeaking 1.0 released". Retrieved 2010-02-03.
  24. "Dragon Systems purchased by Lernout & Hauspie". New York Times. 2001-05-07. Retrieved 2010-02-03.
  25. "ScanSoft and Nuance to Merge". 2005-05-09. Retrieved 2010-02-03.
  26. http://en.wikipedia.org/wiki/Dragon_NaturallySpeaking
  27. http://www.rehabtool.com/forum/discussions/97.html
  28. http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6VCJ-3XMGN91-3&_user=10&_coverDate=08%2F31%2F1999&_rdoc=1&_fmt=high&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=1215028011&_rerunOrigin=google&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=fc3ce5bd58895ec5faac3b22472080a2
  29. http://www.eric.ed.gov/ERICWebPortal/custom/portlets/recordDetails/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=ED292059&ERICExtSearch_SearchType_0=no&accno=ED292059
  30. http://www.eric.ed.gov/ERICWebPortal/custom/portlets/recordDetails/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=EJ738601&ERICExtSearch_SearchType_0=no&accno=EJ738601
  31. http://www.neirtec.org/reading_report/report.htm
  32. http://www.speechtechmag.com/Articles/Column/The-Human-Factor/Speech-Recognition-in-Education-Unexploited-Opportunities-29807.aspx
  33. http://ebiquity.umbc.edu/blogger/2006/11/01/darpa-speech-to-speech-research/
  34. http://www.worldthinktank.net/art128.shtml
  35. http://www.feld.com/wp/archives/2010/01/speech-recognition-is-only-part-of-the-future.html
  36. http://my.advisor.com/doc/05918