<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Why didn&#8217;t phonetic audio indexing prevail?</title>
	<atom:link href="http://blog.jonudell.net/2008/09/19/why-didnt-phonetic-audio-indexing-prevail/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.jonudell.net/2008/09/19/why-didnt-phonetic-audio-indexing-prevail/</link>
	<description>Strategies for Internet citizens</description>
	<lastBuildDate>Sun, 12 Feb 2012 18:22:41 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Jack</title>
		<link>http://blog.jonudell.net/2008/09/19/why-didnt-phonetic-audio-indexing-prevail/#comment-129070</link>
		<dc:creator><![CDATA[Jack]]></dc:creator>
		<pubDate>Sun, 05 Jul 2009 01:16:33 +0000</pubDate>
		<guid isPermaLink="false">http://jonudell.wordpress.com/?p=638#comment-129070</guid>
		<description><![CDATA[A hybrid of word lattice and phonetic lattice indexing has been shown to perform pretty well. See Jonathan Mamou&#039;s paper in SIGIR &#039;06 as well as work by Peng Yu and Frank Seide at MSR Asia.

With respect to text, my recent master&#039;s thesis (not published yet...) focused on the utility of transcript snippets in speech retrieval compared to relevance visualizations. The not so surprising result was that users preferred text to be present in the search interface even though they performed just as well without text. They wanted to see the content, even if the snippets did not improve search performance.]]></description>
		<content:encoded><![CDATA[<p>A hybrid of word lattice and phonetic lattice indexing has been shown to perform pretty well. See Jonathan Mamou&#8217;s paper in SIGIR &#8217;06 as well as work by Peng Yu and Frank Seide at MSR Asia.</p>
<p>With respect to text, my recent master&#8217;s thesis (not published yet&#8230;) focused on the utility of transcript snippets in speech retrieval compared to relevance visualizations. The not so surprising result was that users preferred text to be present in the search interface even though they performed just as well without text. They wanted to see the content, even if the snippets did not improve search performance.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sandra</title>
		<link>http://blog.jonudell.net/2008/09/19/why-didnt-phonetic-audio-indexing-prevail/#comment-126640</link>
		<dc:creator><![CDATA[Sandra]]></dc:creator>
		<pubDate>Thu, 29 Jan 2009 05:52:13 +0000</pubDate>
		<guid isPermaLink="false">http://jonudell.wordpress.com/?p=638#comment-126640</guid>
		<description><![CDATA[FastTalk became Nexidia, which I believe is the largest audio indexing software company based on phonetic indexing.]]></description>
		<content:encoded><![CDATA[<p>FastTalk became Nexidia, which I believe is the largest audio indexing software company based on phonetic indexing.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Len Lynch</title>
		<link>http://blog.jonudell.net/2008/09/19/why-didnt-phonetic-audio-indexing-prevail/#comment-125572</link>
		<dc:creator><![CDATA[Len Lynch]]></dc:creator>
		<pubDate>Wed, 15 Oct 2008 04:08:07 +0000</pubDate>
		<guid isPermaLink="false">http://jonudell.wordpress.com/?p=638#comment-125572</guid>
		<description><![CDATA[&quot;It’s really hard for me to understand why these potential benefits have not been exploited.&quot;

Great question Jon.

Possible sources of insight: people involved with http://www.podscope.com/ and the former http://podzinger.com/ ?

Both focused on passionate early podcast audiences. Podscope uses (used to use?) phonetic tech. Podzinger did not.

Podzinger was born at BBN, using tech developed from phone systems (possibly an overgeneralization), and was then spun off in an attempt to monetize the service, which is one way to exploit it.

It may only be a coincidence that the service that started with a phonetic approach is still standing...

About 2 years ago, I used both of these services in attempts to find podcast subject matter that was of interest. Neither was particularly effective. I recall using Podzinger more often. Since not much came of it, I stopped using them...

The examples you provided with Google audio indexing appear to perform much better than I recall either of these services performing at that time.]]></description>
		<content:encoded><![CDATA[<p>&#8220;It’s really hard for me to understand why these potential benefits have not been exploited.&#8221;</p>
<p>Great question Jon.</p>
<p>Possible sources of insight: people involved with <a href="http://www.podscope.com/" rel="nofollow">http://www.podscope.com/</a> and the former <a href="http://podzinger.com/" rel="nofollow">http://podzinger.com/</a> ?</p>
<p>Both focused on passionate early podcast audiences. Podscope uses (used to use?) phonetic tech. Podzinger did not.</p>
<p>Podzinger was born at BBN, using tech developed from phone systems (possibly an overgeneralization), and was then spun off in an attempt to monetize the service, which is one way to exploit it.</p>
<p>It may only be a coincidence that the service that started with a phonetic approach is still standing&#8230;</p>
<p>About 2 years ago, I used both of these services in attempts to find podcast subject matter that was of interest. Neither was particularly effective. I recall using Podzinger more often. Since not much came of it, I stopped using them&#8230;</p>
<p>The examples you provided with Google audio indexing appear to perform much better than I recall either of these services performing at that time.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ariane Nabeth</title>
		<link>http://blog.jonudell.net/2008/09/19/why-didnt-phonetic-audio-indexing-prevail/#comment-125378</link>
		<dc:creator><![CDATA[Ariane Nabeth]]></dc:creator>
		<pubDate>Mon, 22 Sep 2008 11:18:12 +0000</pubDate>
		<guid isPermaLink="false">http://jonudell.wordpress.com/?p=638#comment-125378</guid>
		<description><![CDATA[Interesting thread. 
Using multiple phonetic mappings for the search text could probably improve search recall for rare or new names and expressions. Maybe it could work for audio search the way spelling corrections work for text search: Google suggests a correction when your search returns few results compared to a very similar word or expression.

But I just wanted to add that I see no technical reason why a purely phonetic transcription process would be so much faster or more readily available than a word-based transcription process.

Indeed, in both cases, the speech-to-text (or ASR) engine uses some word-based statistical language model to rank the transcription hypotheses. In non-technical words: no engine (and in fact no human) is capable of transcribing speech phonetically without some knowledge about the language, and about how words usually combine together.
So, in all cases, FastTalk or GoogleSearch, a text transcription must be part of the engine&#039;s primitive output.

If GoogleSearch and FastTalk do not have the same availability, speed, or distribution scheme, blame it on business choices, not on technical reasons.

Besides, I&#039;ve seen no evidence that GoogleSearch was not real-time, and indeed if you want to transcribe and index all of each day&#039;s audio every day, you&#039;d better be faster than real-time! (But I must admit I&#039;ve not checked how much audio was indexed every day by GoogleSearch.)]]></description>
		<content:encoded><![CDATA[<p>Interesting thread.<br />
Using multiple phonetic mappings for the search text could probably improve search recall for rare or new names and expressions. Maybe it could work for audio search the way spelling corrections work for text search: Google suggests a correction when your search returns few results compared to a very similar word or expression.</p>
<p>But I just wanted to add that I see no technical reason why a purely phonetic transcription process would be so much faster or more readily available than a word-based transcription process.</p>
<p>Indeed, in both cases, the speech-to-text (or ASR) engine uses some word-based statistical language model to rank the transcription hypotheses. In non-technical words: no engine (and in fact no human) is capable of transcribing speech phonetically without some knowledge about the language, and about how words usually combine together.<br />
So, in all cases, FastTalk or GoogleSearch, a text transcription must be part of the engine&#8217;s primitive output.</p>
<p>If GoogleSearch and FastTalk do not have the same availability, speed, or distribution scheme, blame it on business choices, not on technical reasons.</p>
<p>Besides, I&#8217;ve seen no evidence that GoogleSearch was not real-time, and indeed if you want to transcribe and index all of each day&#8217;s audio every day, you&#8217;d better be faster than real-time! (But I must admit I&#8217;ve not checked how much audio was indexed every day by GoogleSearch.)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: martin langhoff</title>
		<link>http://blog.jonudell.net/2008/09/19/why-didnt-phonetic-audio-indexing-prevail/#comment-125377</link>
		<dc:creator><![CDATA[martin langhoff]]></dc:creator>
		<pubDate>Mon, 22 Sep 2008 09:55:21 +0000</pubDate>
		<guid isPermaLink="false">http://jonudell.wordpress.com/?p=638#comment-125377</guid>
		<description><![CDATA[Here&#039;s an idea: Use the phonetic transcription as the seed for a wikipage-transcript, and let users improve it. A mostly-right transcript is trivial for a human to correct incrementally, where a full transcription (and time-coding) is a big job.

I&#039;ve created &#039;stub&#039; pages in Wikipedia on topics that I hoped were there, only to find them a couple of months later fairly well developed. Move the ball into the &quot;easy to do in little collaborative steps&quot; and it&#039;ll happen.]]></description>
		<content:encoded><![CDATA[<p>Here&#8217;s an idea: Use the phonetic transcription as the seed for a wikipage-transcript, and let users improve it. A mostly-right transcript is trivial for a human to correct incrementally, where a full transcription (and time-coding) is a big job.</p>
<p>I&#8217;ve created &#8216;stub&#8217; pages in Wikipedia on topics that I hoped were there, only to find them a couple of months later fairly well developed. Move the ball into the &#8220;easy to do in little collaborative steps&#8221; and it&#8217;ll happen.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jon Udell</title>
		<link>http://blog.jonudell.net/2008/09/19/why-didnt-phonetic-audio-indexing-prevail/#comment-125366</link>
		<dc:creator><![CDATA[Jon Udell]]></dc:creator>
		<pubDate>Fri, 19 Sep 2008 18:00:48 +0000</pubDate>
		<guid isPermaLink="false">http://jonudell.wordpress.com/?p=638#comment-125366</guid>
		<description><![CDATA[Ken:
&gt; We seem (or at least /I/ seem *grin*) to
&gt; find searches that appear with context 
&gt; more tractable

That&#039;s certainly true. Still, it&#039;s not as though people have had the opportunity to try phonetic systems and reject them on that basis. I can&#039;t point to examples where the approach has even been tried.

Even without a text snippet, the raw capability to jump in an audio stream to a word you&#039;ve searched for and found is powerful and I would have thought compelling.

What&#039;s more, the phonetic approach is radically more efficient computationally. Had it prevailed, we&#039;d have vast quantities of searchable audio now. Plus, the ability to search current material -- like a convention speech that just ended -- in near realtime. 

It&#039;s really hard for me to understand why these potential benefits have not been exploited.

Eric:
&gt; Maybe what’s keeping it back is the
&gt; workarounds we have - “start listening
&gt; at 13:42.”

Yes, although both the original FastTalk prototype and the current Google implementation do a very good job of jumping you to the quote in the audio stream.]]></description>
		<content:encoded><![CDATA[<p>Ken:<br />
&gt; We seem (or at least /I/ seem *grin*) to<br />
&gt; find searches that appear with context<br />
&gt; more tractable</p>
<p>That&#8217;s certainly true. Still, it&#8217;s not as though people have had the opportunity to try phonetic systems and reject them on that basis. I can&#8217;t point to examples where the approach has even been tried.</p>
<p>Even without a text snippet, the raw capability to jump in an audio stream to a word you&#8217;ve searched for and found is powerful and I would have thought compelling.</p>
<p>What&#8217;s more, the phonetic approach is radically more efficient computationally. Had it prevailed, we&#8217;d have vast quantities of searchable audio now. Plus, the ability to search current material &#8212; like a convention speech that just ended &#8212; in near realtime. </p>
<p>It&#8217;s really hard for me to understand why these potential benefits have not been exploited.</p>
<p>Eric:<br />
&gt; Maybe what’s keeping it back is the<br />
&gt; workarounds we have &#8211; “start listening<br />
&gt; at 13:42.”</p>
<p>Yes, although both the original FastTalk prototype and the current Google implementation do a very good job of jumping you to the quote in the audio stream.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Eric</title>
		<link>http://blog.jonudell.net/2008/09/19/why-didnt-phonetic-audio-indexing-prevail/#comment-125365</link>
		<dc:creator><![CDATA[Eric]]></dc:creator>
		<pubDate>Fri, 19 Sep 2008 17:22:45 +0000</pubDate>
		<guid isPermaLink="false">http://jonudell.wordpress.com/?p=638#comment-125365</guid>
		<description><![CDATA[I also thought that FastTalk or something like it would be a winner one day. I still have a FastTalk t-shirt someplace.

Maybe what&#039;s keeping it back is the workarounds we have - &quot;start listening at 13:42.&quot;]]></description>
		<content:encoded><![CDATA[<p>I also thought that FastTalk or something like it would be a winner one day. I still have a FastTalk t-shirt someplace.</p>
<p>Maybe what&#8217;s keeping it back is the workarounds we have &#8211; &#8220;start listening at 13:42.&#8221;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dilip Krishnan</title>
		<link>http://blog.jonudell.net/2008/09/19/why-didnt-phonetic-audio-indexing-prevail/#comment-125364</link>
		<dc:creator><![CDATA[Dilip Krishnan]]></dc:creator>
		<pubDate>Fri, 19 Sep 2008 15:36:17 +0000</pubDate>
		<guid isPermaLink="false">http://jonudell.wordpress.com/?p=638#comment-125364</guid>
		<description><![CDATA[It&#039;s so intuitive to search phonetically, especially for names, and even more so for names from a different culture/language/country. I would think a hybrid of the two would work really well. The question is how one signals which parts to index phonetically and which parts as text. As usual, an amazing observation!]]></description>
		<content:encoded><![CDATA[<p>It&#8217;s so intuitive to search phonetically, especially for names, and even more so for names from a different culture/language/country. I would think a hybrid of the two would work really well. The question is how one signals which parts to index phonetically and which parts as text. As usual, an amazing observation!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ken Kennedy</title>
		<link>http://blog.jonudell.net/2008/09/19/why-didnt-phonetic-audio-indexing-prevail/#comment-125363</link>
		<dc:creator><![CDATA[Ken Kennedy]]></dc:creator>
		<pubDate>Fri, 19 Sep 2008 15:32:58 +0000</pubDate>
		<guid isPermaLink="false">http://jonudell.wordpress.com/?p=638#comment-125363</guid>
		<description><![CDATA[My guess would be your own comment: &quot;...this approach doesn&#039;t yield a transcript&quot;. 

We seem (or at least /I/ seem *grin*) to find searches that appear with context more tractable, and audio/video context is simply much more difficult. It involves dragging back and forth through a media file, and even if that&#039;s automagic, the simple act of reviewing 3-7 seconds of audio/video multiple times to find your best/correct search result seems much more involved than skimming 10 lines of text. Text, if nothing else, is efficient.]]></description>
		<content:encoded><![CDATA[<p>My guess would be your own comment: &#8220;&#8230;this approach doesn&#8217;t yield a transcript&#8221;. </p>
<p>We seem (or at least /I/ seem *grin*) to find searches that appear with context more tractable, and audio/video context is simply much more difficult. It involves dragging back and forth through a media file, and even if that&#8217;s automagic, the simple act of reviewing 3-7 seconds of audio/video multiple times to find your best/correct search result seems much more involved than skimming 10 lines of text. Text, if nothing else, is efficient.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

