Want to help improve LibriVox?

16 Oct 2007 ~ Jon Udell

As I was syncing the podcast feed for this LibriVox essay collection, to keep me company on a long walk tonight, I was reminded of a wart in the feed generator. The auto-generated filenames are just the book name prefixed with an auto-incremented number. That’s not so bad when you’re listening to a chapter book, but pretty lame when you’re sampling a collection. I don’t want to see:

01_librivox_nonfiction_collection

02_librivox_nonfiction_collection

03_librivox_nonfiction_collection

Instead I want to see:

What Is Enlightenment? by Immanuel Kant

Deity and Design by Chapman Cohen

Escape by Christopher Benson
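A minimal sketch of the fix, assuming the generator already has each section's title and author in hand. The `Track` record and the `display_title` and `safe_filename` helpers are hypothetical names for illustration; they are not taken from librivox.py. The idea is to derive both the displayed title and the filename from per-section metadata rather than from the collection name, keeping a zero-padded number prefix so files still sort in reading order.

```python
import re
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical per-section metadata record; field names are assumptions."""
    number: int
    title: str
    author: str


def display_title(track: Track) -> str:
    # "What Is Enlightenment? by Immanuel Kant" instead of the collection name
    return f"{track.title} by {track.author}"


def safe_filename(track: Track, ext: str = ".mp3") -> str:
    # Keep a zero-padded number prefix so files still sort in order, then
    # replace each run of non-alphanumeric characters with one underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", display_title(track)).strip("_")
    return f"{track.number:02d}_{slug}{ext}"


tracks = [
    Track(1, "What Is Enlightenment?", "Immanuel Kant"),
    Track(2, "Deity and Design", "Chapman Cohen"),
    Track(3, "Escape", "Christopher Benson"),
]
for t in tracks:
    print(display_title(t), "->", safe_filename(t))
```

Stripping punctuation from filenames is a conservative choice; the full, punctuated title can still go into the feed's item titles.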

What idiot wrote that feed generator?

Oh yeah. Me.

If someone wants to improve this before I can find the time, just go for it. The LibriVox crew would really appreciate it, and so would I.

Published by Jon Udell


11 thoughts on “Want to help improve LibriVox?”

hugh says:

16 Oct 2007 at 10:06 pm

yeah, on the list of things to do… generate proper metadata in the rss feeds… help would be appreciated. all the data is there in standard format, just not rss-ized.

Kara Shallenberg says:

16 Oct 2007 at 10:16 pm

Yup, that’d sure be nice. A lot of books have lovely descriptive chapter titles, too. It would be great for those to get into the feed somehow.
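As a rough illustration of what proper metadata in the RSS feed could look like, here is a sketch that builds one RSS 2.0 `<item>` per chapter and puts the chapter's own title in `<title>`. The function name, channel title, URL, and byte size are placeholders, not values from the LibriVox feed generator. RSS 2.0 requires `url`, `length`, and `type` on `<enclosure>`; a complete feed would also wrap the channel in an `<rss version="2.0">` element.

```python
import xml.etree.ElementTree as ET


def make_item(chapter_title: str, author: str, mp3_url: str, size_bytes: int) -> ET.Element:
    """Build one RSS 2.0 <item> whose <title> carries the chapter's own name."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = f"{chapter_title} by {author}"
    # RSS 2.0 <enclosure> needs the URL, the length in bytes, and the MIME type.
    ET.SubElement(item, "enclosure", {
        "url": mp3_url,
        "length": str(size_bytes),
        "type": "audio/mpeg",
    })
    return item


# Placeholder channel, URL, and size, for illustration only.
channel = ET.Element("channel")
ET.SubElement(channel, "title").text = "LibriVox Nonfiction Collection"
channel.append(make_item("Deity and Design", "Chapman Cohen",
                         "https://example.org/02_deity_and_design.mp3", 1234567))
print(ET.tostring(channel, encoding="unicode"))
```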

Minh says:

17 Oct 2007 at 8:01 am

Jon,
What language is this script in? I’m doing some stuff w/ MP3 tags in .Net & might be able to help.

Jon Udell says:

17 Oct 2007 at 12:26 pm

“What language is this script in?”

Python.

http://jonudell.net/librivox.py
http://jonudell.net/mp3info.py

Minh says:

17 Oct 2007 at 10:00 pm

Hey Jon,

Making steady progress, I think.

Python is pretty strange. This statement freaked me out:

return x, y, z

and of course:

(x, y, z) = func(p)
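For anyone else new to Python: `return x, y, z` returns a single tuple, and `(x, y, z) = func(p)` unpacks that tuple into three names. A small made-up example (not code from librivox.py):

```python
def min_max_mean(values):
    # "return a, b, c" builds one tuple (a, b, c); the parentheses are optional.
    return min(values), max(values), sum(values) / len(values)


result = min_max_mean([3, 1, 4, 1, 5])
print(type(result).__name__, result)  # prints: tuple (1, 5, 2.8)

# Unpacking assigns each element of the tuple to its own name, in order.
(lo, hi, mean) = min_max_mean([3, 1, 4, 1, 5])
lo, hi, mean = min_max_mean([3, 1, 4, 1, 5])  # same thing without parentheses
```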

But there are currently a couple of issues:

1) request for the mp3 from archive.org returns a 302 – redirect, which shouldn’t be hard to deal with,

2) how is it that reading a 6K chunk in the middle of the file gives you minutes & seconds? Pretty cool. But I believe that ID3 tags are at the end of the MP3 file, so currently I can retrieve title & artist only when I pull down the entire file… which is significantly larger than 6K.

Unless I can somehow just pull down the portion of the file that contains the ID3 tags.
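One plausible answer to the 6K question, offered as a sketch rather than a description of what mp3info.py actually does: an MP3 is a sequence of frames, each starting with a 4-byte header that encodes the bitrate. If the file is constant-bitrate, finding any one frame header gives the bitrate, and the duration is roughly total size × 8 ÷ bitrate, where the total size might come from the HTTP `Content-Length` header. The code below handles only MPEG-1 Layer III headers; the function names are hypothetical.

```python
# MPEG-1 Layer III bitrates in kbit/s, indexed by the header's 4-bit bitrate field.
# Index 0 ("free format") and 15 ("bad") cannot be used for this estimate.
MPEG1_L3_KBPS = [None, 32, 40, 48, 56, 64, 80, 96, 112, 128,
                 160, 192, 224, 256, 320, None]


def find_bitrate(chunk: bytes):
    """Scan a chunk for an MPEG-1 Layer III frame header; return bits per second."""
    for i in range(len(chunk) - 3):
        b0, b1, b2 = chunk[i], chunk[i + 1], chunk[i + 2]
        # 11 sync bits set, version bits 11 (MPEG-1), layer bits 01 (Layer III);
        # the lowest bit of b1 is the protection flag, so it is masked off.
        if b0 == 0xFF and (b1 & 0xFE) == 0xFA:
            kbps = MPEG1_L3_KBPS[b2 >> 4]
            sample_rate_index = (b2 >> 2) & 0x3
            if kbps is not None and sample_rate_index != 0x3:  # 0b11 is reserved
                return kbps * 1000
    return None


def estimate_duration(total_bytes: int, chunk: bytes):
    """Return (minutes, seconds) for a constant-bitrate file, or None."""
    bps = find_bitrate(chunk)
    if bps is None:
        return None
    total_seconds = int(total_bytes * 8 / bps)
    return divmod(total_seconds, 60)


# A fake chunk: filler bytes around one 128 kbit/s, 44.1 kHz frame header.
chunk = b"\x00" * 100 + bytes([0xFF, 0xFB, 0x90, 0x64]) + b"\x00" * 100
minutes, seconds = estimate_duration(29_000_000, chunk)
print(f"{minutes}:{seconds:02d}")  # prints 30:12
```

A sturdier version would confirm that the next frame header appears where the first one predicts, since a stray 0xFF byte in audio data can mimic a sync word, and variable-bitrate files would need a different approach.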

We’ll see…

Jon Udell says:

18 Oct 2007 at 7:37 am

“request for the mp3 from archive.org returns a 302 – redirect, which shouldn’t be hard to deal with”

Originally I had the script follow that redirect, but the LibriVox folks found it was better to let the RSS reader do that at feed fetch time.

“I believe that ID3 tags are at the end of the MP3 file”

Of course, all the metadata comes from the LibriVox database. It could be scraped from the page, or perhaps LibriVox can publish it in a more tractable form.

hugh says:

18 Oct 2007 at 11:43 am

our database holds the id3 tags, i believe, so we could publish those too i think.

Jeremy Dunck says:

18 Oct 2007 at 7:47 pm

Minh:
The HTTP spec defines byte ranges. Some web servers don’t support them, but archive.org does.

http://www.ietf.org/rfc/rfc2616.txt
Section 14.35.1 is what you want.

httplib2 is better than httplib for this kind of thing: http://code.google.com/p/httplib2/
docs: http://bitworking.org/projects/httplib2/ref/http-objects.html

Example of using httplib against archive.org to get part of the file:
http://dpaste.com/22843/
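Minh's belief holds for ID3v1 tags, which occupy the last 128 bytes of the file and begin with the bytes `TAG` (ID3v2 tags, by contrast, sit at the start of the file). Combining that with a suffix byte range from section 14.35.1 (`Range: bytes=-128`) means only the tag itself needs to be downloaded. This sketch swaps in Python 3's standard `urllib.request` rather than the httplib or httplib2 code linked above; the helper names are hypothetical.

```python
import urllib.request


def tail_range_header(n: int) -> dict:
    # A suffix byte range (RFC 2616, section 14.35.1): "the last n bytes".
    return {"Range": f"bytes=-{n}"}


def fetch_tail(url: str, n: int = 128) -> bytes:
    """Fetch only the last n bytes of a remote file, if the server honors Range."""
    req = urllib.request.Request(url, headers=tail_range_header(n))
    with urllib.request.urlopen(req) as resp:
        data = resp.read()
        # 206 Partial Content: the server sent just the range.
        # 200 OK: the server ignored Range and sent the whole file, so slice it.
        return data if resp.status == 206 else data[-n:]


def parse_id3v1(tail: bytes):
    """Parse an ID3v1 tag from the final 128 bytes, or return None if absent."""
    if len(tail) < 128:
        return None
    tag = tail[-128:]
    if tag[:3] != b"TAG":
        return None  # no ID3v1 tag; an ID3v2 tag, if any, is at the start instead

    def field(start: int, end: int) -> str:
        # Fixed-width fields padded with NULs or spaces; Latin-1 is the usual encoding.
        return tag[start:end].split(b"\x00", 1)[0].decode("latin-1").strip()

    # Layout: "TAG", title[30], artist[30], album[30], year[4], comment[30], genre[1]
    return {"title": field(3, 33), "artist": field(33, 63), "album": field(63, 93)}
```

Usage would be something like `parse_id3v1(fetch_tail(mp3_url))`, where `mp3_url` stands for an enclosure URL the generator already knows; `urlopen` follows the archive.org 302 redirect by default.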

Jeremy Dunck says:

18 Oct 2007 at 7:47 pm

Correction: *httplib2* is better than *httplib*.

:)

Minh says:

18 Oct 2007 at 11:03 pm

Come to think of it, the title & artist info for each track can be acquired by screen scraping alone. No need to go to the MP3s themselves.
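A screen-scraping sketch using only Python's standard `html.parser`. The page markup here (table rows with `chapter-title` and `author` cells) is invented for the example; the real LibriVox catalog pages will differ, so the class names in `FIELDS` would need adjusting.

```python
from html.parser import HTMLParser

# Hypothetical cell classes mapped to output field names; real markup may differ.
FIELDS = {"chapter-title": "title", "author": "author"}


class TrackScraper(HTMLParser):
    """Collect one {"title": ..., "author": ...} dict per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # dict for the <tr> currently open, if any
        self._field = None  # output field name for the <td> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag == "td" and self._row is not None:
            self._field = FIELDS.get(dict(attrs).get("class"))
            if self._field:
                self._row[self._field] = ""

    def handle_data(self, data):
        if self._field:
            self._row[self._field] += data

    def handle_endtag(self, tag):
        if tag == "td" and self._field:
            # Collapse whitespace left over from the HTML source.
            self._row[self._field] = " ".join(self._row[self._field].split())
            self._field = None
        elif tag == "tr":
            if self._row:
                self.rows.append(self._row)
            self._row = None


PAGE = """
<table>
  <tr><td class="chapter-title">What Is Enlightenment?</td><td class="author">Immanuel Kant</td></tr>
  <tr><td class="chapter-title">Escape</td><td class="author">Christopher Benson</td></tr>
</table>
"""

scraper = TrackScraper()
scraper.feed(PAGE)
scraper.close()
for row in scraper.rows:
    print(f'{row["title"]} by {row["author"]}')
```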

Jon Udell says:

19 Oct 2007 at 7:34 am

“title & artist info for each track can be acquired by screen scraping alone”

That is true. However, I would recommend that LibriVox publish this metadata as a distinct XML fragment for each work, not only for the purposes of the feed generator but for use by other aggregators that will want to get hold of what are, in effect, bibliographic records.
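To make the suggestion concrete, here is a hypothetical shape for such a per-work fragment, plus a few lines that read it with Python's standard `xml.etree.ElementTree`. The element and attribute names are invented for illustration and are not an actual LibriVox format.

```python
import xml.etree.ElementTree as ET

# A hypothetical per-work metadata fragment; element names are illustrative only.
FRAGMENT = """\
<work id="example-nonfiction-collection">
  <title>LibriVox Nonfiction Collection</title>
  <section number="1">
    <title>What Is Enlightenment?</title>
    <author>Immanuel Kant</author>
  </section>
  <section number="2">
    <title>Deity and Design</title>
    <author>Chapman Cohen</author>
  </section>
</work>
"""


def read_sections(xml_text: str):
    """Return (number, title, author) tuples from a per-work metadata fragment."""
    root = ET.fromstring(xml_text)
    return [
        (int(s.get("number")), s.findtext("title"), s.findtext("author"))
        for s in root.findall("section")
    ]


for number, title, author in read_sections(FRAGMENT):
    print(f"{number:02d} {title} by {author}")
```

Because the fragment is plain XML, the same few lines would serve the feed generator and any other aggregator that wants these bibliographic records.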
