a12n-forum Mailing List Archive: [A12n-forum] Searchable audio
Christopher Culy of FXPAL passed on some interesting info re searchable audio files that goes beyond what I suggested earlier using speech-to-text. An article last year by Jon Udell in InfoWorld (12 Dec. 02) entitled "The power of voice: Fast-Talk Communications brings full-text search to audio recordings" at http://www.infoworld.com/article/02/12/13/021216apfastalk_1.html gives a profile of a product that has since changed its name from Fast-Talk to Nexidia (website http://www.nexidia.com ).

This is somewhat more technical (and at the moment a great deal more expensive) than what one might consider for most African language & ICT applications now - though it could eventually be very useful in oral history repositories in Africa, such as CELHTO, that are digitizing resources. Beyond that, it shows the potential for increased use of audio in ICT.

Don Osborn
Bisharat.net

[excerpts from Udell's article follow]

"Fast-Talk Communications' revolutionary phonetic indexing and search technology brings the magic of full-text search to the formerly opaque realms of audio recordings and video soundtracks.

"What Fast-Talk sells is an engine and a software development kit, not an end-user product. The kit includes a 'technology demo,' however, which is a fully functional tool that has changed how I work in a dramatic way.

"The Fast-Talk engine can work with multiple audio formats, using pluggable 'media accessors' to encapsulate them. The technology demo supports only WAV files, which it indexes to create PAT (phonetic audio track) indexes. If you want to search video, Fast-Talk recommends using VirtualDub, an open-source program, to extract the audio track as a WAV file. You can use Fast-Talk's demo to index pre-existing WAV files or, as I did, to index a WAV file while recording. This near-real-time indexing meant I was able to begin searching the index as soon as the 45-minute conversation ended.
That was true because Fast-Talk's phonetic technology is orders of magnitude faster than the conventional alternative: speech-to-text translation followed by text indexing.

"Like many great innovations, Fast-Talk is simple to describe. Phonemes are the basic units of sound in a language, and North American English has 39 of them. You can look up a word's phonetic spelling in the Carnegie Mellon dictionary (see Kevin Lenzo's Web site at www.speech.cs.cmu.edu/cgi-bin/cmudict ). 'Dictionary,' for example, works out to 'D IH K SH AH N EH R IY.' Fast-Talk's indexer recognizes phonemes and notes the time of their occurrence. The searcher converts text input to phoneme strings, looks for them, and returns their time-codes. It's as simple -- and brilliant -- as that.

"To succeed in the real world, Fast-Talk will have to work well with whatever raw material it can get -- and it does. Although it is tuned for North American English, the international nature of our industry made it inevitable that I would push those limits."

Last Updated: Wed Mar 14 23:48:31 2007
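
The indexing and search steps Udell describes in the excerpt above can be illustrated with a minimal Python sketch. This is not Fast-Talk's or Nexidia's implementation. It assumes the audio has already been turned into a list of time-stamped phonemes, it uses a hypothetical two-word pronouncing dictionary in the Carnegie Mellon style (stress marks dropped), and it matches phoneme strings exactly, whereas a real engine would presumably have to score approximate matches. All names here (`PRONUNCIATIONS`, `build_track`, `text_to_phonemes`, `search`) are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical mini pronouncing dictionary, CMU-style with stress marks dropped.
# A real system would load a full dictionary instead of these two entries.
PRONUNCIATIONS: Dict[str, List[str]] = {
    "dictionary": ["D", "IH", "K", "SH", "AH", "N", "EH", "R", "IY"],
    "search": ["S", "ER", "CH"],
}

# A phonetic track is a list of (phoneme, time in milliseconds) pairs,
# standing in for what an indexer would produce from audio.
PhoneticTrack = List[Tuple[str, int]]


def build_track(words: List[str], step_ms: int = 100) -> PhoneticTrack:
    """Fake the indexer: lay out each word's phonemes at fixed time steps."""
    track: PhoneticTrack = []
    t = 0
    for word in words:
        for phoneme in PRONUNCIATIONS[word]:
            track.append((phoneme, t))
            t += step_ms
    return track


def text_to_phonemes(query: str) -> List[str]:
    """Convert text input to one phoneme string (raises KeyError if unknown)."""
    phonemes: List[str] = []
    for word in query.lower().split():
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes


def search(track: PhoneticTrack, query: str) -> List[int]:
    """Return the start time (ms) of every exact occurrence of the query."""
    target = text_to_phonemes(query)
    if not target:
        return []
    symbols = [phoneme for phoneme, _ in track]
    hits: List[int] = []
    # Slide a window over the track and compare phoneme by phoneme.
    for start in range(len(symbols) - len(target) + 1):
        if symbols[start:start + len(target)] == target:
            hits.append(track[start][1])
    return hits


demo_track = build_track(["search", "dictionary", "search"])
print(search(demo_track, "dictionary"))  # [300]
print(search(demo_track, "search"))      # [0, 1200]
```

Searching never touches the audio itself, only the phoneme list, which is one plausible reason the approach can be fast; the exact-match window is the simplest possible stand-in for whatever matching the real product performs.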
a12n-forum is hosted on Kabissa - Space for Change in Africa