Speech to Text… For a price

I’ve tried to use dictation software before. I believe it was Dragon NaturallySpeaking. I spent hours reading paragraphs aloud, repeating phrases and correcting it. During one attempt, I spent two hours a day training the software to recognize my speech patterns. The end result, though, was that I still had to talk like a robot to get it to recognize even two-thirds of the ramblings coming out of my mouth. My speech patterns just don’t lend themselves well to dictation, I guess.

That’s why CastingWords fascinates me. Submit your audio, and real human beings transcribe it for you! It’ll be delivered as basic text, perfect for inclusion in a blog or web page.

Of course, there’s a cost: $0.75 per minute. That’s not per minute that they work on it, but per minute that you blabber on. So a 10-minute podcast would cost $7.50. That ain’t necessarily cheap, but at the same time it’s a fantastic service they’re providing. A transcript makes your podcast immensely more searchable and greatly increases the chance of people finding your content. It would also be incredibly valuable for podcasts of interviews and conference sessions.

They actually do have a rate for podcasters as well. If you have a regularly produced podcast, the price drops to $0.45 a minute. A ten-minute podcast is then $4.50, which, while still steep, is a little easier to swallow. Of course, your one-hour conference session is going to get mighty pricey (60 minutes at $0.75 is $45).
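If you want to run the numbers for your own recordings, here is a quick sketch in Python. It assumes the two per-minute rates quoted above (as of this post) and that billing is a simple rate times audio length; the function and constant names are made up for illustration and have nothing to do with any CastingWords tooling.

```python
# Per-minute rates quoted in the post, stored in cents to keep the math tidy.
STANDARD_CENTS_PER_MINUTE = 75   # $0.75 per minute of audio
PODCASTER_CENTS_PER_MINUTE = 45  # $0.45 per minute for regularly produced podcasts


def transcription_cost(minutes, cents_per_minute):
    """Return the cost in dollars for a recording of the given length.

    Assumes billing is per minute of audio, not per minute of transcriber work.
    """
    return minutes * cents_per_minute / 100


if __name__ == "__main__":
    for minutes in (10, 60):
        standard = transcription_cost(minutes, STANDARD_CENTS_PER_MINUTE)
        podcaster = transcription_cost(minutes, PODCASTER_CENTS_PER_MINUTE)
        print(f"{minutes} min: ${standard:.2f} standard, ${podcaster:.2f} podcaster")
    # 10 min: $7.50 standard, $4.50 podcaster
    # 60 min: $45.00 standard, $27.00 podcaster
```

Swap in your own episode lengths to see what a month of shows would run you.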

Regardless, it’s a great service to add to the toolbox.

July 5th, 2007 | Musings | 8 Comments


  1. Karen Janowski, 7/6/2007 at 5:38 am

    When did you last try speech recognition? The reality is that speech recognition has improved significantly, even compared with two years ago. Dragon NaturallySpeaking v. 9 has excellent accuracy right out of the box. Otherwise you can train it to your voice in about five to ten minutes. I use it myself for writing evaluations and teach students with written output issues how to use it.
    Speech recognition is built into Vista and has been found to have excellent accuracy as well.
    Try it again, you will be impressed! There really is no comparison between what is available now and what was available a few years back.
    (It’s also built into Office 2003 and the accuracy is excellent as well.)

  2. Casey Hales, 7/6/2007 at 11:19 am


    I, too, tried Dragon NaturallySpeaking in hopes I could use it with my students who so dislike writing. It performed much like Karen is saying. My problem was not so much its inability to recognize my speech; I personally found it difficult to use because composing in my head and typing is not the same as composing out loud, at least for me.
    As for CastingWords, what some people don’t realize is just how long a minute is. Having spent 20 years in broadcasting and having written and produced hundreds of commercials, I can tell you a lot can be said in a single minute. So while it may sound like an expensive service, a lot can be covered in a short amount of time, as long as you don’t ramble like I am. 🙂


  3. Paul Wilkinson, 7/6/2007 at 3:13 pm

    What about this


    if you are looking to be able to search audio.

    I also tried this one


    but wasn’t impressed with the results. The first few hits contained offensive material. The idea is cool though.

  4. Brian B., 7/6/2007 at 8:16 pm

    Steve, I also gave up on speech recognition a few years back (ok, it’s been more like seven). I see what others have posted about the new stuff, but I wanted to comment that what you posted about seems to be a bargain. I have heard of people charging $10/page for transcribing interviews for qualitative researchers. One study could run you several hundred dollars at that rate. Sounds like CastingWords would beat that price, though!

  5. Tom Turner, 7/7/2007 at 1:42 pm


    All you have to do is ask…I’ll do it at a MUCH lower cost to you. Will work for iPhone or a video iPod. 🙂

  6. […] able to get all three classes blogging! I call that success, logging having a slot right up there! Steve Dembo recently posted about the Dragon software that I am using to transcribe. I’ve had the same […]

  7. Andy Allen, 7/30/2007 at 7:15 am

    I think that the cooler technology is photo image recognition. Taking audio to text is pretty cool, and useful for search engines, but I can’t wait until my pictures and videos automatically label themselves.

    Think of the day when you take a picture with your digital camera, which has a built-in GPS that geolocates your picture, so you now know it was taken at the Grand Canyon. You upload this to your computer, and it ‘recognizes’ that the faces in the picture are of your two best friends, and tags the pictures with labels you can easily search on, like date, time, location, items in the picture, etc.

    Note, images.Google.com is now starting to recognize faces, just not who they are.

    As of 2007, there is still a huge need for turning spoken words into text; in fact, a lot of stenographers do the closed captioning on TV (as well as court and medical transcriptions). Unfortunately, a lot of that work is going abroad instead of to the stay-at-home parents who used to do it (a generalization).

  8. linda, 6/21/2008 at 5:16 am

    I really agree with you, Barry. I’m working for a similar kind of company called WiseTypist http://www.wisetypist.com. We offer a similar service, but at 30% less than their price. Their service may be fine for smaller jobs, but for volume work you should really check with our company.

    Thanks and regards,
