<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
		>
<channel>
	<title>Comments on: Overcoming synthetic voice shock</title>
	<atom:link href="http://blog.jonudell.net/2008/07/30/overcoming-synthetic-voice-shock/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.jonudell.net/2008/07/30/overcoming-synthetic-voice-shock/</link>
	<description>Strategies for Internet citizens</description>
	<lastBuildDate>Sun, 12 Feb 2012 18:22:41 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
	<item>
		<title>By: Tim</title>
		<link>http://blog.jonudell.net/2008/07/30/overcoming-synthetic-voice-shock/#comment-124963</link>
		<dc:creator><![CDATA[Tim]]></dc:creator>
		<pubDate>Wed, 13 Aug 2008 04:54:36 +0000</pubDate>
		<guid isPermaLink="false">http://jonudell.wordpress.com/?p=459#comment-124963</guid>
		<description><![CDATA[I haven&#039;t tried this yet, but IBM Research has an &quot;expressive&quot; text-to-speech system on their site:

http://www.research.ibm.com/tts/]]></description>
		<content:encoded><![CDATA[<p>I haven&#8217;t tried this yet, but IBM Research has an &#8220;expressive&#8221; text-to-speech system on their site:</p>
<p><a href="http://www.research.ibm.com/tts/" rel="nofollow">http://www.research.ibm.com/tts/</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike Caulfield</title>
		<link>http://blog.jonudell.net/2008/07/30/overcoming-synthetic-voice-shock/#comment-124845</link>
		<dc:creator><![CDATA[Mike Caulfield]]></dc:creator>
		<pubDate>Thu, 31 Jul 2008 18:00:22 +0000</pubDate>
		<guid isPermaLink="false">http://jonudell.wordpress.com/?p=459#comment-124845</guid>
		<description><![CDATA[For a while, back around 2000, I got into using TTS to make mp3s of Gutenberg stuff. I can tell you that after a bit it does fade away -- in the end, I was actually listening to Chesterton and James -- and you don&#039;t sense it as a voice at a certain point. 

It was strange -- because I was also using Audible.com at the time, and I came to the conclusion that TTS was inferior when compared to a good reader, but *superior* to an annoying reader (which happens a bunch on audiobooks).]]></description>
		<content:encoded><![CDATA[<p>For a while, back around 2000, I got into using TTS to make mp3s of Gutenberg stuff. I can tell you that after a bit it does fade away &#8212; in the end, I was actually listening to Chesterton and James &#8212; and you don&#8217;t sense it as a voice at a certain point. </p>
<p>It was strange &#8212; because I was also using Audible.com at the time, and I came to the conclusion that TTS was inferior when compared to a good reader, but *superior* to an annoying reader (which happens a bunch on audiobooks).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joe Clark</title>
		<link>http://blog.jonudell.net/2008/07/30/overcoming-synthetic-voice-shock/#comment-124842</link>
		<dc:creator><![CDATA[Joe Clark]]></dc:creator>
		<pubDate>Wed, 30 Jul 2008 21:48:05 +0000</pubDate>
		<guid isPermaLink="false">http://jonudell.wordpress.com/?p=459#comment-124842</guid>
		<description><![CDATA[The standard response these days is to try Mac OS X VoiceOver with the Alex voice, which is uncanny but keeps you out of the valley.]]></description>
		<content:encoded><![CDATA[<p>The standard response these days is to try Mac OS X VoiceOver with the Alex voice, which is uncanny but keeps you out of the valley.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: slger</title>
		<link>http://blog.jonudell.net/2008/07/30/overcoming-synthetic-voice-shock/#comment-124841</link>
		<dc:creator><![CDATA[slger]]></dc:creator>
		<pubDate>Wed, 30 Jul 2008 16:38:10 +0000</pubDate>
		<guid isPermaLink="false">http://jonudell.wordpress.com/?p=459#comment-124841</guid>
		<description><![CDATA[Whoops, I plead guilty to the same developer bias. I use Neospeech Kate for both screen and text reading, so that voice sounds best to me. 

I added ATT Mike and ATT Crystal to the list of audio recordings for more contrast in voice. Actually, I like ATT Mike best for this recording. 

A few more parameters are worth mentioning. Speed in these recordings is slower than a more experienced listener would usually use, with reported reading rates of up to 800 words per minute, versus a more normal 250.

Another variable is the dictionary for pronouncing and abbreviating words. These are highly tuned for limited vocabulary applications, as in telephony. 

Also, all text readers have different styles of pausing, pronouncing, etc. The text read in the same voice will sound different in the screen reader and the tool that converts the text to mp3.

And, even if we know the voice is nothing but a data file, we still ascribe personality, gender, and other human attributes, as reported in Nass&#039; fascinating &quot;Wired for Speech&quot; experiments.

Listening to synthetic speech is a skill. Or, from the other side, inability to master synthetic speech is, well, a disability when it comes to using audio interfaced tools.

Thanks for the feedback,

Susan]]></description>
		<content:encoded><![CDATA[<p>Whoops, I plead guilty to the same developer bias. I use Neospeech Kate for both screen and text reading, so that voice sounds best to me. </p>
<p>I added ATT Mike and ATT Crystal to the list of audio recordings for more contrast in voice. Actually, I like ATT Mike best for this recording. </p>
<p>A few more parameters are worth mentioning. Speed in these recordings is slower than a more experienced listener would usually use, with reported reading rates of up to 800 words per minute, versus a more normal 250.</p>
<p>Another variable is the dictionary for pronouncing and abbreviating words. These are highly tuned for limited vocabulary applications, as in telephony. </p>
<p>Also, all text readers have different styles of pausing, pronouncing, etc. The text read in the same voice will sound different in the screen reader and the tool that converts the text to mp3.</p>
<p>And, even if we know the voice is nothing but a data file, we still ascribe personality, gender, and other human attributes, as reported in Nass&#8217; fascinating &#8220;Wired for Speech&#8221; experiments.</p>
<p>Listening to synthetic speech is a skill. Or, from the other side, inability to master synthetic speech is, well, a disability when it comes to using audio interfaced tools.</p>
<p>Thanks for the feedback,</p>
<p>Susan</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris Winters</title>
		<link>http://blog.jonudell.net/2008/07/30/overcoming-synthetic-voice-shock/#comment-124840</link>
		<dc:creator><![CDATA[Chris Winters]]></dc:creator>
		<pubDate>Wed, 30 Jul 2008 15:06:09 +0000</pubDate>
		<guid isPermaLink="false">http://jonudell.wordpress.com/?p=459#comment-124840</guid>
		<description><![CDATA[My company (Vocollect Healthcare Systems) develops a wearable computer and software for voice-assisted healthcare, and we get much the same reaction from people at first blush. We can talk about how workers get used to the TTS because they hear it every day, all the time, but the decision-makers have a mental barrier that prevents them from believing this. Nearly every user who is not hostile to the system is fine with the TTS after getting used to it.

One of the interesting issues is an &#039;uncanny valley&#039; for voice. Making a voice sound *too* human can be disorienting the 5-10% of the time when it mispronounces a word or has an unnatural pause. It shocks you into remembering you&#039;re listening to a computer. 

Additionally, for applications like ours, where the user talks back to the computer, it&#039;s useful for the user to remember there is actually a computer on the other end of the conversation; otherwise they may relax and fall into more natural speech patterns instead of the constrained dialog that&#039;s necessary.]]></description>
		<content:encoded><![CDATA[<p>My company (Vocollect Healthcare Systems) develops a wearable computer and software for voice-assisted healthcare, and we get much the same reaction from people at first blush. We can talk about how workers get used to the TTS because they hear it every day, all the time, but the decision-makers have a mental barrier that prevents them from believing this. Nearly every user who is not hostile to the system is fine with the TTS after getting used to it.</p>
<p>One of the interesting issues is an &#8216;uncanny valley&#8217; for voice. Making a voice sound *too* human can be disorienting the 5-10% of the time when it mispronounces a word or has an unnatural pause. It shocks you into remembering you&#8217;re listening to a computer. </p>
<p>Additionally, for applications like ours, where the user talks back to the computer, it&#8217;s useful for the user to remember there is actually a computer on the other end of the conversation; otherwise they may relax and fall into more natural speech patterns instead of the constrained dialog that&#8217;s necessary.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Gregg Marshall</title>
		<link>http://blog.jonudell.net/2008/07/30/overcoming-synthetic-voice-shock/#comment-124839</link>
		<dc:creator><![CDATA[Gregg Marshall]]></dc:creator>
		<pubDate>Wed, 30 Jul 2008 13:52:20 +0000</pubDate>
		<guid isPermaLink="false">http://jonudell.wordpress.com/?p=459#comment-124839</guid>
		<description><![CDATA[The voices in the demos are certainly more understandable than the Kurzweil reading machine demo I heard 30 years ago.

There are better voice solutions. ATT has some technology called Natural Voices that is integrated into text-to-speech software and is really much better than these demos.]]></description>
		<content:encoded><![CDATA[<p>The voices in the demos are certainly more understandable than the Kurzweil reading machine demo I heard 30 years ago.</p>
<p>There are better voice solutions. ATT has some technology called Natural Voices that is integrated into text-to-speech software and is really much better than these demos.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

