Want to bootstrap the data web? Make batch data entry easier for civilians.

People are trying, once again, to kickstart the music scene here in my town. The other day I received two emails, each containing a schedule for a newly-activated local venue. In the past, I’ve advised folks to add this information to Eventful, which in turn feeds my local aggregator. That hasn’t happened much, and when I sat down and did some of the data entry myself, I could see why. It’s such a drag!

There are really two very different scenarios for managing event data online: one personal, the other public. On the personal front, using services like Evite or Windows Live Events, you’re doing a single event: a meeting, a birthday party. It’s OK to fill in a form field by field.

But for public events, venue operators will typically want to do batch entry. And when you’ve got a schedule of dozens of events, it’s painful to decompose everything into fields and pump them into forms.

Here’s a piece of one of the schedules that was emailed to me:

March 15, 2008 (Saturday) Chris Fitz Band
March 20, 2008 (Thursday) Blues Jam w/ Otis Doncaster
March 22, 2008 (Saturday) Groove Theory

It was quick and easy for the author of that email to write out the schedule in that way. But it was really slow and difficult for me to input the same information to Eventful. Even though venue operators are highly motivated to do it, I can see why they often don’t.

So here’s how I sped things up. I started with a template for a URL that invokes the Eventful API:


Then I made a bunch of copies, and tweaked them like so:


Because all the events start at 8:30, I only need to adjust the title, month, and day for each record. It’s not only way quicker and easier to enter data this way, it’s also quicker and easier to check and correct. When I was done I put the email into one window, the new file into another, and compared. Corrections here are way easier than corrections that require you to navigate to an online database record and edit it in a form.

Finally I inserted the curl command in front of each record, yielding a script that invokes the set of URLs:

curl http://api.evdb.com/ … title=Chris+Fitz&start_time=2008-03-15+20:30
curl http://api.evdb.com/ … title=Otis+Doncaster&start_time=2008-03-20+20:30
curl http://api.evdb.com/ … title=Groove+Theory&start_time=2008-03-22+20:30

I saved this script as eventful.cmd, ran it on the Windows command line, and produced this result.

Now clearly this method is too geeky for a typical venue operator. But an online service like Eventful could smooth out the rough edges. I can easily imagine an unstructured input form that includes a template like the one I’ve shown here, invites people to copy and tweak it, and runs a batch insertion. It would need to let people preview the results before committing them, but that’s doable.
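Here is a minimal sketch of what the preview-then-commit step might look like, assuming the form accepts one record per line in the same title=…&start_time=… shape as the tails of the curl lines above. The names preview, commit, and submit are hypothetical, and nothing here calls the real Eventful API; submit stands in for whatever function would actually send a record.

```python
from datetime import datetime
from urllib.parse import parse_qsl

# Sample batch, one record per line (only the fields that vary per event).
SAMPLE = """\
title=Chris+Fitz&start_time=2008-03-15+20:30
title=Otis+Doncaster&start_time=2008-03-20+20:30
title=Groove+Theory&start_time=2008-03-22+20:30
"""

def preview(batch_text):
    """Parse the batch without side effects; return (records, errors)."""
    records, errors = [], []
    for lineno, line in enumerate(batch_text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        fields = dict(parse_qsl(line))  # decodes '+' back to spaces
        try:
            start = datetime.strptime(fields["start_time"], "%Y-%m-%d %H:%M")
            records.append({"title": fields["title"], "start": start})
        except (KeyError, ValueError) as exc:
            # Missing field or impossible date: report it, do not guess.
            errors.append(f"line {lineno}: {exc!r}")
    return records, errors

def commit(records, submit):
    """Send each previewed record via submit (hypothetical, caller-supplied)."""
    return [submit(record) for record in records]
```

The idea is that the form shows the output of preview to the user and only calls commit when the error list is empty, so a typo in one line never results in a half-inserted schedule.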

It seems to me that a lot of information systems expect civilians to do per-item data entry, but not batch entry. For that, they provide APIs for geeks to use. But as we see here, these two styles of data entry aren’t necessarily very far apart. And by applying a bit of Wiki-like inferencing to a more English-like script, they could be brought even closer.
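To make that concrete, here is a rough sketch of how a service might infer structured records from lines written the way the emailed schedule was. It assumes every line follows the "Month D, YYYY (Weekday) Title" pattern, that month and weekday names are in English, and that all shows share one start time. The function names and the weekday_ok field are my own inventions for illustration, not part of any Eventful API.

```python
import re
from datetime import datetime
from urllib.parse import urlencode

# Matches e.g. "March 15, 2008 (Saturday) Chris Fitz Band"
LINE = re.compile(r"^(\w+ \d{1,2}, \d{4}) \((\w+)\) (.+)$")

def parse_schedule(text, default_time="20:30"):
    """Turn English-like schedule lines into event records."""
    events = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # a real form would show unparsed lines to the user
        date_text, weekday, title = match.groups()
        day = datetime.strptime(date_text, "%B %d, %Y")
        events.append({
            "title": title,
            "start_time": day.strftime("%Y-%m-%d") + " " + default_time,
            # Cross-check the stated weekday against the date as a sanity test.
            "weekday_ok": day.strftime("%A") == weekday,
        })
    return events

def to_query(event):
    """Encode the per-event fields as a query-string fragment."""
    fields = {"title": event["title"], "start_time": event["start_time"]}
    return urlencode(fields, safe=":")  # keep ':' readable in the time
```

Feeding it the three lines from the email yields records whose query fragments look like the tails of the curl commands above, and the weekday check gives the preview step something useful to flag when a date and its day name disagree.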

The friction of data entry remains the single largest obstacle to bootstrapping the data web. Efforts to overcome that friction, and reduce the distance between what civilians can do with forms and what geeks can do with scripts, could make a huge difference.


  1. The actions involved remind me a lot of what I’ve seen of Dabble DB and its importing. You really want to take a chunk of data, in whatever arbitrary format you have, and map it to another schema (the website’s entries). Dabble seemed compelling in the way you concretely create that mapping — working with the real data, but all your manipulations are abstract (without really seeming abstract).

    One could imagine a service that did that translation, separate from any particular service. You’d need some kind of service description, though I don’t think it needs to be nearly as complete as typical service descriptions, and then point the data translation service at that description plus your own data.

    Of course previews and probably transactions of some sort really need to be there to create a robust user experience. That is, the ability to try things out without side effects, and to avoid consistency errors like multiple submissions. All of this is much more SOA than REST :(

  2. Why not copy the data into one text box, like the “quick add” feature of Google Calendar, but for multiple items?
    No hassle with URLs, and with a preview it is safe…

  3. I had a somewhat similar experience about a year ago, on del.icio.us. My list of tags had grown into low thousands, and I noticed that there were many typos, singular/plural pairs and other anomalies. I began editing the tags using the tag renaming page at the site, and quickly tired of it. All 3000+ tags were in one alpha sorted dropdown listbox!

    So I dumped the source to file, examined the URLs submitted by the rename tags form and delete tags form and used Vim macros and regular expression substitutions to derive about 100 tag modification URLs.

    My intended edits were achieved within 10-15 minutes. I could have done that on a per-item basis, but it probably would have taken nearly two hours – more actually, since navigating a dropdown list of 3000+ entries is not simple.

  4. “My intended edits were achieved within 10-15 minutes.”

    Exactly. I’ve had this kind of experience with del.icio.us too. Several times I’ve had to tag hundreds of items. It’d be a non-starter through the web. But in a text file it’s a breeze. Hence my point: a Wiki-like front-end to APIs is an underappreciated power tool for data entry.

  5. iwantsandy.com does something interesting like this: parsing raw text communiques containing calendar and other information into a structured format. A big part of its success is the fact that it acts as a user interface for real people, providing feedback and metaphors that act as flexible abstractions around the structure of the data. The eponymous “sandy” is a fake personal assistant that you imagine yourself writing to. In the process you end up producing much more easily machine-parseable statements. And, of course, there’s some heavy lifting going on in the actual parsing as well, based on having seen a lot of real communiques.

  6. “iwantsandy.com does something interesting like this”

    I was going to say that sounds like what Rael Dornfest’s been working on. But…it /is/ what Rael’s been working on. Knew about stikkit, hadn’t caught up with the iwantsandy name. Thanks for the reminder!

  7. A great way to mitigate the pain of forms for visually impaired web users. A string of form elements filled in by simple text editing is an accessibility solution to a pervasive problem. Accessibility gurus should take note, and please count me as a tester if forms are made available this way. Thanks.

  8. The times at eventful.com are all 8:30 pm (at least in my browser’s view; I’m in MST, but I don’t think that is the problem since the math works in the other direction). You said it was 6:30pm and used 18:30. Has the new method at api.evdb.com misinterpreted your start_time parameter?

  9. “You said it was 6:30pm and used 18:30. Has the new method at api.evdb.com misinterpreted your start_time parameter?”

    Oops, no, I wound up cribbing the example from a different script. The shows really are 8:30 and I’ve adjusted the above accordingly. Good eye!
