Moving at the Speed of Creativity by Wesley Fryer

Converting text to and from speech for accessibility and convenience

I received a question from a technology director this week regarding different options available for text-to-speech conversion software and web applications. I’m posting what I know here in hopes that others will be able to chime in with more options.

The ability to convert speech to text and text to speech is an amazing technology development with multiple applications. The software solutions I’ve heard the most about for speech-to-text functionality are Dragon NaturallySpeaking and IBM ViaVoice. I have not used either product personally, but have heard it takes some time to “train” the software to the nuances of your voice. The software engine of Dragon NaturallySpeaking was integrated into the Macintosh application MacSpeech Dictate, which won the 2008 Macworld “best in show” award in January and reportedly requires less “training time” to use.

Applications like oral history projects, which require the transcription of spoken interviews (The Vietnam Center and Archive at Texas Tech University does this on a large scale), have not yet been able to reliably use software like Dragon to transcribe different individuals (from what I understand), because each person’s voice is quite different and the software requires “training” for each voice.

Having full-text transcriptions of audio files, including podcasts, is VERY important from an accessibility / Section 508 perspective. If it were possible, I would love to provide people with full-text transcriptions of my spoken podcasts, automatically generated by computer software. Some podcast producers, like “The Mighty Mommy,” DO provide text transcriptions, but that is possible because the podcast is READ from an original script. That’s not the case with my podcasts, so at this point they are NOT accessible to people who are hearing impaired. 🙁

Our accessibility committee for the K-12 Online Conference continues to explore options for audio-to-text conversions as well as multi-language subtitling. dotSUB is the main project I’m aware of which uses a distributed model for generating multilingual translations of web-posted videos. It would be great if these functionalities were fully automated and 100% accurate using web-based software solutions, but at this point I don’t think “we have the technology” yet to do those things affordably and reliably.

The above links represent the extent of my current knowledge of speech-to-text technologies. Text-to-speech technologies offer multiple options as well, and I’ve personally worked with some of these a bit more.

When it comes to converting text to speech, Talkr is the main website I’m familiar with (and have integrated here on my blog) which creates (for free) an audio podcast of text entries on any blog. You can see these entries at the bottom of each blog post, and a subscription link to this text-to-voice converted podcast channel is available in my blog’s right sidebar, titled “listen to posts via talkr.com.” Talkr is free and works, but the voice quality and pacing don’t sound quite as natural as some other alternatives I’ve heard. Odiogo is another web-based service which converts text into audio files. It is marketed as a way to:

Turn readers into listeners, and transform your blog into a high quality, ad-supporting broadcast that can vastly expand your audience reach!

I’m NOT interested in ad-supported podcasting at this point, but it’s good to know other alternatives to Talkr are available. If you know of other web services which provide this functionality, please leave a comment on this post and share the link.

AT&T Natural Voices is a technology developed by AT&T Labs to convert text into natural-sounding audio files. It is used in Wizzard Software, which in turn is used to develop applications for IBM ViaVoice software. The demo site for AT&T Natural Voices lets visitors enter a text string of up to 300 characters and immediately create a WAV audio file of the text. Use of the website is restricted in some ways, but the documentation indicates “A class project is probably OK” as a use of the site. If you are going to ask a large group of students to use the site for a project, you might contact the development group and ask for explicit permission. Users can choose between multiple versions of English, Spanish, French and German and select different voices, both male and female, for text-to-speech conversion.
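
One practical consequence of a 300-character limit is that longer passages have to be split up before they are pasted in. Here is a minimal sketch of one way to do that, assuming a Unix-like shell (such as Terminal on Mac OS X) with the standard fold utility. The function name chunk_text and the file name post.txt are hypothetical placeholders, not anything provided by the AT&T site.

```shell
# Hypothetical helper: wrap text from standard input into lines of at most
# N characters (default 300), breaking at spaces where possible so that
# words stay whole. Each output line can then be pasted in separately.
chunk_text() {
  fold -s -w "${1:-300}"
}

# Example: split a saved blog post (post.txt is a placeholder file name).
# chunk_text 300 < post.txt
```

Existing line breaks in the input are kept, so a short paragraph becomes its own chunk. Text with accented or other non-ASCII characters may be counted differently by fold, so it is worth double-checking the length of those chunks.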

Macintosh OS X has built-in accessibility features to convert text to speech, but does not provide a way to convert that spoken text into an audio file in a seamless way. I have experimented with recording converted text-audio using Audio Hijack Pro, but this is more of a workaround solution to this functional need. WireTap Studio is another Macintosh application which provides this functionality, but is still a workaround rather than a straightforward, seamless way to convert text into speech and record the results as an audio file. I haven’t experimented with text-to-audio software options on the Windows platform.

Several free add-ons (extensions) for the Firefox web browser are available which convert text into speech on both Windows and Macintosh computers, but I have not played with any of these to date. Speak It and CLiCk, Speak are two examples. CLiCk, Speak is open source and supports multiple languages.

That basically exhausts my current knowledge about speech-to-text and text-to-speech software and web-based solutions. What have I missed or left out? The applications of these software tools for learners are wide ranging and only limited by our imaginations. I think the development of text and voice conversion technologies will continue to be exciting and worth following in the years ahead!

The Center for Applied Special Technology (CAST) has the slogan “universal design for learning,” and has an impressive array of products focused on enhancing accessibility for learners, including the CAST Universal Design for Learning (UDL) Book Builder. This web-based environment permits teachers (or learners of any age or category) to create digital books for web-based sharing FOR FREE. These digital books include voice synthesized characters which provide discussion prompts for understanding in the books. Explore the model books on the website to get a better idea what this looks like in action. These digital books look wonderful for early childhood and primary-age classroom teachers and students. Currently 77 different digital books are available on the website in the digital book library. A subscription or free account is NOT required to utilize these books, but if you want to create your own books you DO need to register for a free account.

These tools can be used in tandem as well. The image below shows a UDL digital book being used with the CLiCk, Speak software. These are GREAT web resources and applications for reading teachers to use with electronic whiteboards!

Technorati Tags:
education, reading, speech, software, text, convert, conversion, accessibility

Posted

March 7, 2008

edtech, literacy

Wesley Fryer

Comments

4 responses to “Converting text to and from speech for accessibility and convenience”

Wesley Fryer

March 7, 2008

One other tool I learned about from Liz Kolb’s K12Online07 presentation on “Cell Phones as Classroom Learning Tools” was Jott.com, which converts voice to text. I haven’t used this yet but plan to in the upcoming weeks.

Ryan Collins

March 8, 2008

1. On the Mac, to convert text to speech, you can do it from the command line:

say -f inputfile.txt -o output.aiff

Under Leopard, use the Alex voice and it sounds very good.

2. Also under OS X, in Safari (or any Cocoa application) you can select text, go to the application menu -> Services -> Speech -> Start Speaking Text and it will read it to you.

Josh

March 12, 2008

A few pieces. Depending on who you’re looking to serve with the audio portion, many text-to-speech engines are not appropriate for visually impaired or blind users because they are mouse driven (rather than driven by keyboard shortcuts). For those who are not visually impaired but need text to speech, other alternatives which the user controls are Read Please, Hal Reader and Speakonia. Each of these is free and can be found via http://mainevrc.googlepages.com/freetexttospeechsoftware. Commercially, Read and Write Gold will create an audio file of a text selection under its Speechmaker function.

Though speech-to-text software programs require training, my understanding is that they’re quite accurate for the user once trained. A feature which appears to have been overlooked is the speech-to-text feature built into Vista (which is quite good right out of the box).