Want to help improve LibriVox?

As I was syncing the podcast feed for this LibriVox essay collection, to keep me company on a long walk tonight, I was reminded of a wart in the feed generator. The auto-generated filenames are just the book name with an auto-incremented prefix. That’s not so bad when you’re listening to a chapter book, but pretty lame when you’re sampling a collection. I don’t want to see:

01_librivox_nonfiction_collection
02_librivox_nonfiction_collection
03_librivox_nonfiction_collection

Instead I want to see:

What Is Enlightenment? by Immanuel Kant
Deity and Design by Chapman Cohen
Escape by Christopher Benson

What idiot wrote that feed generator?

Oh yeah. Me.

If someone wants to improve this before I can find the time, just go for it. The LibriVox crew would really appreciate it, and so would I.
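
For anyone picking this up, here is a minimal sketch of the kind of change I have in mind, assuming each section's title and author are available as plain records. The `Section` class, the `item_title` and `build_items` helpers, and the example.org URLs are hypothetical; none of them come from the actual feed generator.

```python
from dataclasses import dataclass
from xml.etree import ElementTree as ET


@dataclass
class Section:
    """One recording in a collection (hypothetical record shape)."""
    title: str
    author: str
    url: str


def item_title(section: Section) -> str:
    # Prefer "Title by Author"; fall back to the bare title if no author is known.
    if section.author:
        return f"{section.title} by {section.author}"
    return section.title


def build_items(sections):
    """Return one RSS <item> element per section, titled from its metadata."""
    items = []
    for section in sections:
        item = ET.Element("item")
        ET.SubElement(item, "title").text = item_title(section)
        # A real feed's enclosure also needs a length attribute; omitted here.
        ET.SubElement(item, "enclosure", url=section.url, type="audio/mpeg")
        items.append(item)
    return items


sections = [
    Section("What Is Enlightenment?", "Immanuel Kant", "http://example.org/01.mp3"),
    Section("Deity and Design", "Chapman Cohen", "http://example.org/02.mp3"),
]
for item in build_items(sections):
    print(item.findtext("title"))
```

The point is only that the item title comes from per-section metadata rather than from the filename.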

11 Comments

  1. yeah, on the list of things to do… generate proper metadata in the rss feeds… help would be appreciated. all the data is there in standard format, just not rss-ized.

  2. Jon,
    What language is this script in? I’m doing some stuff w/ MP3 tags in .Net & might be able to help.

  3. Hey Jon,

    Making steady progress, I think.

    Python is pretty strange. This statement freaked me out:

    return x, y, z

    and of course:

    (x, y, z) = func(p)

    But there are currently a couple of issues:

    1) request for the mp3 from archive.org returns a 302 – redirect, which shouldn’t be hard to deal with,

    2) how is it that reading a 6K chunk in the middle of the file gives you minutes & seconds? Pretty cool. But I believe that ID3 tags are at the end of the MP3 file, so currently, I can retrieve title & artist when I pull down the entire file …. which is significantly larger than 6K.

    Unless I can somehow just pull down the portion of the file that contains the ID3 tags.

    We’ll see…
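
On the question of fetching only the tagged portion of the file: an ID3v1 tag is the final 128 bytes of the file and starts with the marker `TAG`, while ID3v2 tags sit at the start of the file. So when a file carries an ID3v1 tag, an HTTP range request for the last 128 bytes should be enough. A minimal sketch, assuming the server honors `Range` headers; `parse_id3v1` and `fetch_tail` are hypothetical helper names, and a file tagged only with ID3v2 would yield `None` here.

```python
import urllib.request


def parse_id3v1(tail: bytes):
    """Parse title and artist from the last 128 bytes of an MP3, or return None."""
    if len(tail) < 128:
        return None
    tag = tail[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(field: bytes) -> str:
        # Fields are fixed-width and padded with NULs or spaces.
        return field.split(b"\x00", 1)[0].decode("latin-1").strip()

    # Layout: "TAG" (3 bytes), title (30), artist (30), then album, year, etc.
    return {"title": text(tag[3:33]), "artist": text(tag[33:63])}


def fetch_tail(url: str, size: int = 128) -> bytes:
    """Request only the last `size` bytes; urlopen follows redirects by default."""
    request = urllib.request.Request(url, headers={"Range": f"bytes=-{size}"})
    with urllib.request.urlopen(request) as response:
        return response.read()
```

If the server ignores the range header and sends the whole file, `parse_id3v1` still works because it only looks at the last 128 bytes, though the download is no longer small.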

  4. “request for the mp3 from archive.org returns a 302 – redirect, which shouldn’t be hard to deal with”

    Originally I had the script follow that redirect, but the LibriVox folks found it was better to let the RSS reader do that at feed fetch time.

    “I believe that ID3 tags are at the end of the MP3 file”

    Of course all the metadata comes from the LibriVox database. It could be scraped from the page, or perhaps LibriVox can publish it in a more tractable form.

  5. Come to think of it, the title & artist info for each track can be acquired by screen scraping alone. No need to go to the MP3s themselves.

  6. “title & artist info for each track can be acquired by screen scraping alone”

    That is true. However, I would recommend that LibriVox publish this metadata as a distinct XML fragment for each work, not only for the feed generator but also for other aggregators that will want to get hold of what are, in effect, bibliographic records.
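
To make the XML-fragment idea concrete, here is one shape such a fragment could take and how a feed generator might read it. The element names (`work`, `section`, `title`, `author`) and the `number` attribute are invented for illustration; this is not a format LibriVox has published.

```python
from xml.etree import ElementTree as ET

# Hypothetical per-work fragment; the element names are assumptions.
FRAGMENT = """
<work>
  <title>LibriVox Nonfiction Collection</title>
  <section number="1">
    <title>What Is Enlightenment?</title>
    <author>Immanuel Kant</author>
  </section>
  <section number="2">
    <title>Deity and Design</title>
    <author>Chapman Cohen</author>
  </section>
</work>
"""


def read_sections(xml_text: str):
    """Return (number, title, author) tuples for each section in the fragment."""
    root = ET.fromstring(xml_text)
    return [
        (int(s.get("number")), s.findtext("title"), s.findtext("author"))
        for s in root.findall("section")
    ]


for number, title, author in read_sections(FRAGMENT):
    print(f"{number:02d} {title} by {author}")
```

With records like these, neither the feed generator nor any other aggregator would need to scrape pages or open the MP3 files.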
