Mailing List Hosted on Kabissa - Space for Change in Africa

a12n-forum Mailing List Archive: [A12n-forum] Searchable audio


  • Subject: [A12n-forum] Searchable audio
  • From: "Don Osborn" <dzo@xxxxxxxxxxxx>
  • Date: Thu, 18 Dec 2003 21:09:59 +0100

Christopher Culy of FXPAL passed on some interesting info on searchable
audio files that goes beyond what I suggested earlier using
speech-to-text.  An article last year by Jon Udell in InfoWorld (12 Dec. 02)
entitled "The power of voice: Fast-Talk Communications brings full-text
search to audio recordings" at
http://www.infoworld.com/article/02/12/13/021216apfastalk_1.html profiles a
product that has since changed its name from Fast-Talk to
Nexidia (website http://www.nexidia.com ).

This is somewhat more technical (and at the moment a great deal more
expensive) than what one might consider for most African language & ICT
applications now - though it could eventually be very useful in oral history
repositories in Africa such as CELHTO that are digitizing resources.  Beyond
that it shows the potential for increased use of audio in ICT.

Don Osborn
Bisharat.net

[excerpts from Udell's article follow]

"Fast-Talk Communications' revolutionary phonetic indexing and search
technology brings the magic of full-text search to the formerly opaque
realms of audio recordings and video soundtracks.

"What Fast-Talk sells is an engine and a software development kit, not an
end-user product. The kit includes a 'technology demo,' however, which is a
fully functional tool that has changed how I work in a dramatic way.

"The Fast-Talk engine can work with multiple audio formats, using pluggable
'media accessors' to encapsulate them. The technology demo supports only WAV
files, which it indexes to create PAT (phonetic audio track) indexes. If you
want to search video, Fast-Talk recommends using VirtualDub, an open-source
program, to extract the audio track as a WAV file. You can use Fast-Talk's
demo to index pre-existing WAV files or, as I did, to index a WAV file while
recording. This near-real-time indexing meant I was able to begin searching
the index as soon as the 45-minute conversation ended. That was true because
Fast-Talk's phonetic technology is orders of magnitude faster than the
conventional alternative: speech-to-text translation followed by text
indexing.

"Like many great innovations, Fast-Talk is simple to describe. Phonemes are
the basic units of sound in a language, and North American English has 39 of
them. You can look up a word's phonetic spelling in the Carnegie Mellon
dictionary (see Kevin Lenzo's Web site at
www.speech.cs.cmu.edu/cgi-bin/cmudict ). 'Dictionary,' for example, works out
to 'D IH K SH AH N EH R IY.' Fast-Talk's indexer recognizes phonemes and
notes the time of their occurrence. The searcher converts text input to
phoneme strings, looks for them, and returns their time-codes. It's as
simple -- and brilliant -- as that.

"To succeed in the real world, Fast-Talk will have to work well with
whatever raw material it can get -- and it does. Although it is tuned for
North American English, the international nature of our industry made it
inevitable that I would push those limits."
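
[Illustrative sketch, not from the article.] The index-and-search idea Udell
describes (phonemes stored with the time they occur, a text query converted
to a phoneme string, matches returned as time-codes) can be shown with a toy
example. Everything here is an assumption made for illustration: the function
names, the three-word dictionary, the fixed 100 ms phoneme spacing, and
especially the exact-match search. A real engine has to recognize phonemes
from audio and tolerate recognition errors, and nothing below reflects how
Fast-Talk actually does either. Only the spelling of "dictionary" is taken
from the article.

```python
# Toy pronunciation table in CMU-dictionary-style symbols (stress marks omitted).
# Hypothetical stand-in for a full pronouncing dictionary.
PRONUNCIATIONS = {
    "dictionary": ["D", "IH", "K", "SH", "AH", "N", "EH", "R", "IY"],
    "audio": ["AA", "D", "IY", "OW"],
    "search": ["S", "ER", "CH"],
}


def text_to_phonemes(query):
    """Convert a text query to a flat list of phonemes (KeyError on unknown words)."""
    phonemes = []
    for word in query.lower().split():
        phonemes.extend(PRONUNCIATIONS[word])
    return phonemes


def build_toy_index(words, step_ms=100):
    """Fake an indexer: lay the words' phonemes out at fixed time intervals.

    Returns a list of (time_ms, phoneme) pairs in time order. A real indexer
    would derive these from the audio signal instead.
    """
    index = []
    time_ms = 0
    for word in words:
        for phoneme in PRONUNCIATIONS[word]:
            index.append((time_ms, phoneme))
            time_ms += step_ms
    return index


def search(index, query):
    """Return the start time (ms) of every exact occurrence of the query's phonemes."""
    target = text_to_phonemes(query)
    if not target:
        return []
    symbols = [phoneme for _, phoneme in index]
    hits = []
    for i in range(len(symbols) - len(target) + 1):
        if symbols[i:i + len(target)] == target:
            hits.append(index[i][0])
    return hits


# A pretend recording in which four words are spoken back to back.
index = build_toy_index(["search", "audio", "dictionary", "audio"])
print(search(index, "audio"))       # [300, 1600]
print(search(index, "dictionary"))  # [700]
```

Because the search runs over phonemes rather than words, a multi-word query
such as "audio dictionary" is just a longer phoneme string, and no
speech-to-text transcript is ever produced.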




Last Updated: Wed Mar 14 23:48:31 2007
