May 2008
Monthly Archive
May 30, 2008
Posted by Jon Udell under
Uncategorized [11] Comments
When David Stutz left Microsoft, he wrote a parting essay that invoked a new kind of Internet-oriented operating system characterized by “software that runs above the level of a single device.” Tim O’Reilly echoed that phrase here, and often used it to help explain what he meant by Web 2.0.
The recently-announced LiveMesh is a nice example of software that runs above the level of a single device. It runs symmetrically on all your computing devices, in the part of the cloud that’s associated with your devices, and in other parts of the cloud where services you transact with are running. This entire constellation is the LiveMesh platform which, as Ray Ozzie recently explained to investors, is an answer to this question:
What would an OS look like in a world of multiple devices, in a world where instead of the computer being at the center, you are at the center?
The platform’s connective tissue, as discussed in my interview with Ray Ozzie, is FeedSync, a synchronization system based on the same simple technology that powers the blogosphere: XML feeds of items, in RSS or Atom formats. Whether they represent big chunks of information like documents and media files, or small scraps of information like calendar events and status messages, LiveMesh objects are made up of feeds. All these objects synchronize across your mesh of devices and services using the same openly-specified FeedSync mechanism.
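To make the idea of feed-based synchronization concrete, here is a minimal Python sketch of how two copies of the same item might be reconciled when two feeds are merged. It is loosely modeled on the FeedSync idea of a per-item update counter with a timestamp and an endpoint name as tie-breakers. The `SyncItem` class, its field names, and the `merge` function are illustrative assumptions, not the FeedSync specification or the LiveMesh implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncItem:
    """Hypothetical, simplified stand-in for a feed item with sync metadata."""
    id: str       # stable identifier shared by all copies of the item
    updates: int  # how many times the item has been updated
    when: str     # ISO 8601 UTC timestamp of the latest update
    by: str       # name of the endpoint that made the latest update
    payload: str  # the item content itself


def pick_winner(a: SyncItem, b: SyncItem) -> SyncItem:
    """Choose between two copies of one item.

    Assumed rule: the higher update count wins; ties fall back to the later
    timestamp and then to the endpoint name, so every device that sees the
    same two copies picks the same one.
    """
    return max(a, b, key=lambda item: (item.updates, item.when, item.by))


def merge(local: list[SyncItem], remote: list[SyncItem]) -> dict[str, SyncItem]:
    """Merge two feeds, keeping one winning copy per item id."""
    merged = {item.id: item for item in local}
    for item in remote:
        current = merged.get(item.id)
        merged[item.id] = item if current is None else pick_winner(current, item)
    return merged
```

Because the rule is deterministic and depends only on the metadata carried in the items, the merge gives the same result no matter which device or service performs it.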
That openness is another key characteristic of an Internet operating system. So it’s nice to see that FeedSync isn’t only being applied in the context of LiveMesh. This week’s Perspectives interview, with Barbara Willett of Mercy Corps and Nigel Snoad of Microsoft Humanitarian Systems, details Mercy Corps’ use of FeedSync to collect, synthesize, and share information about the management of agricultural development programs in Afghanistan.
In this case, the synchronized data sources are humble Access databases for which Nigel and his team have developed a FeedSync adapter. They’ve also built adapters for spreadsheets and — what should be very interesting to Ken Banks and Joel Selanikio — for SMS messaging systems.
I love stories about pragmatic solutions that find new ways to use existing, simple, and widely-deployed technologies. This is clearly one of those. But it also illuminates an aspect of the LiveMesh platform that hasn’t yet been widely noticed or appreciated. One of its keystones, FeedSync, is an open and general-purpose building block that can be used by anyone, for any purpose.
If the Mercy Corps solution interests you, Nigel says that the toolkit he and his team built for them will be openly released in a few weeks. Watch the FeedSync blog for details.
May 29, 2008
Posted by Jon Udell under
Uncategorized [12] Comments
The word drafting has many meanings but the one I’m interested in here comes from bicycling. When you ride closely behind another rider, you’re drafting. The leader pushes the headwinds out of the way, and the follower doesn’t have to push so hard.
Blogging can work that way too. I thought of this when I realized that one of the benefits of subscribing to James Fallows is that I’m drafting on his interest in the fledgling air taxi industry. My interest in that topic is more than casual. I’ve interviewed DayJet founder Ed Iacobucci, for example. But James Fallows is way more deeply invested than I am, having written the seminal book on the topic, Free Flight.
I can do a pretty good job of tracking developments on that front by scanning the news, or better yet by subscribing to searches for terms like DayJet and Eclipse 500. But the best way is to draft on a blogger who is authoritative on the topic.
I don’t need to see every news story about DayJet, and pushing them all out of the way in order to focus on the ones that really matter is like pushing a headwind. But James Fallows is already motivated to push that headwind, so I can just draft on him. That way I get just the right air taxi newsfeed, with a dollop of expert analysis on top.
Happily, the analogy breaks down in a couple of ways. A cyclist can only draft on one other cyclist, and it’s a one-way relationship. The follower can’t simultaneously lead. With blogging, I can draft on many people’s interests, and many people can draft on mine, and sometimes the leader/follower relationship is reciprocal: I draft on you for topic A, and you draft on me for topic B.
For our purposes here, we can define blogging broadly to include the conventional format, but also microblogging formats like del.icio.us crumbtrails and Twitter tweets. Drafting, in the sense I mean here, can happen in any publish/subscribe medium.
May 28, 2008
Posted by Jon Udell under
Uncategorized [10] Comments
As part of my project in community calendar syndication, I would like to find a way to push an Exchange calendar to a web-accessible ICS file. Although that isn’t a native function of Exchange, I’m sure it can be accomplished by way of the Exchange API, as an add-in or a scheduled server process. For maximum breadth, I guess the relevant API would be Collaboration Data Objects (CDO) rather than the newer Exchange Web Services.
I asked some folks on the Exchange team and none were aware of a component that does this. So I’ve set up an Exchange server in my lab, rolled up my sleeves, and am ready to dive in and see what can be done. But it never hurts to ask. Have I overlooked an existing off-the-shelf solution?
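Whatever API ends up reading the appointments out of Exchange, the output side of the job is plain text. Here is a minimal Python sketch of that half only: it serializes a list of events into an iCalendar document that could be written to a web-accessible file. The `Event` class, the `to_ics` function, and the PRODID string are hypothetical names of my own, and nothing here calls CDO or Exchange Web Services.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Event:
    """Hypothetical container for the fields pulled from a calendar store."""
    uid: str
    summary: str
    start: datetime  # must be timezone-aware
    end: datetime    # must be timezone-aware


def _stamp(dt: datetime) -> str:
    """Format a timezone-aware datetime as an iCalendar UTC timestamp."""
    return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def _escape(text: str) -> str:
    """Escape the characters that are special in iCalendar TEXT values."""
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))


def to_ics(events, prodid="-//example//calendar-export-sketch//EN") -> str:
    """Serialize events as a VCALENDAR document.

    Simplification: long lines are not folded at 75 octets, which a
    production exporter should do.
    """
    now = _stamp(datetime.now(timezone.utc))
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0", f"PRODID:{prodid}"]
    for ev in events:
        lines += [
            "BEGIN:VEVENT",
            f"UID:{ev.uid}",
            f"DTSTAMP:{now}",
            f"DTSTART:{_stamp(ev.start)}",
            f"DTEND:{_stamp(ev.end)}",
            f"SUMMARY:{_escape(ev.summary)}",
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    # iCalendar content lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"
```

A scheduled process could call `to_ics` with the current set of appointments and overwrite a file served by any ordinary web server.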
May 28, 2008
Posted by Jon Udell under
Uncategorized [11] Comments
This post is part three of a series in which I’ll summarize what I know about publishing calendars openly on the web, for free, using popular calendar applications including Outlook, Google Calendar, and Apple iCal.
Apple iCal
If you have a .Mac account you can publish your calendar there, but .Mac isn’t free, and the purpose of this series is to showcase free calendar publishing options.
The solution here is to find a free service that uses the same protocol, and the same kind of server, as the .Mac service uses. The protocol is called WebDAV, and the server is a special-purpose web server commonly used for calendar publishing.
It’s possible that your ISP already offers a WebDAV server you can use, for no additional charge. But for many people it would be ideal if there were a free service available for this purpose. One such service is iCal Exchange.
Here’s the signup page:
After completing the form you’ll land here:
For private sharing you can create passwords and use the private URL, but our goal here is public sharing so the public URL — in this case, http://icalx.com/public/judell — is the one you’ll use.
Now switch to iCal, select Calendar, select Publish, and switch the Publish option from the default — .Mac — to Private Server. Paste the public URL you just created into the Base URL, and enter your iCal Exchange credentials in the Login and Password fields.
Now click Publish. Here’s the outcome:
The web address for your calendar is the public web address that iCal Exchange gave you, plus the name you gave your calendar (in this case, Jon), plus the .ics extension. For this example, it adds up to webcal://icalx.com/public/judell/Jon.ics.
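That recipe is simple enough to express as a short Python function. This is only a sketch of the rule just described; the function name is my own, and percent-encoding the calendar name (for names containing spaces, say) is an assumption rather than something iCal Exchange documents.

```python
from urllib.parse import quote, urlsplit


def subscription_url(public_url: str, calendar_name: str) -> str:
    """Build the webcal address from the public base URL and the calendar name."""
    parts = urlsplit(public_url)
    # Assumption: calendar names are percent-encoded when they appear in the path.
    path = parts.path.rstrip("/") + "/" + quote(calendar_name) + ".ics"
    return f"webcal://{parts.netloc}{path}"


print(subscription_url("http://icalx.com/public/judell", "Jon"))
# webcal://icalx.com/public/judell/Jon.ics
```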
This solves half of the problem. Your calendar is now published in a way that enables individuals to subscribe to it. It’s also available for syndication by online services like http://elmcity.info/events.
But the other half of the problem remains unsolved. In parts one and two of this series, we saw that the free calendar hosting options offered by Microsoft and Google provide links to hosted calendar viewers.
There are a variety of other hosted viewers, but as yet I’ve not found one that’s free, and can render any public calendar given a public URL like the one shown here. If such a service does exist, I’m hoping this entry will help me find it.
May 27, 2008
Posted by Jon Udell under
Uncategorized [4] Comments
My guest on Innovators this week is Greg Wilson. We share common interests in collaboration and Python, but neither of those topics was the focus of this conversation. Instead, we discussed Greg’s unique and somewhat curmudgeonly take on high-performance computing. In his view, the HPC industry has focused on achieving bigger and faster computation at the expense of human productivity, verifiable correctness, and reproducibility.
I claim no expertise in that field, but Greg is an expert, so I wondered what he’d think about the approach discussed in one of my recent Perspectives shows, Cluster computing for the classroom. On that show, Kyril Faenov, Microsoft’s general manager for Windows HPC, describes a system that enables professors to define computational models that students can check out, tweak, and then run against large data on a compute cluster.
From a human productivity standpoint Greg likes that approach. But he’d prefer to see more attention paid to verifying the correctness of the models, and to ensuring that code and the data are managed in ways that make experiments reliably reproducible.
Disclosure: While working at Los Alamos National Laboratory back in 2000, Greg commissioned me to write a report on Internet Groupware for Scientific Collaboration.
May 27, 2008
Posted by Jon Udell under
Uncategorized [11] Comments
This post is part two of a series in which I’ll summarize what I know about publishing calendars openly on the web, for free, using popular calendar applications including Outlook, Google Calendar, and Apple iCal.
Google Calendar
You’ll need a Google account. If you use Gmail you already have one. Start the calendar program by clicking the Calendar link at the top of the Gmail page.
To publish your calendar in ICS (aka ICAL) format, open the drop-down menu for your calendar’s name under the My Calendars heading, and select Calendar Settings.
The first tab on the ensuing page is called Calendar Details. If you scroll to the bottom you’ll see two sections containing sets of hyperlinked icons. The sections are labeled Calendar Address and Private Address.
You don’t actually have to make your calendar public in order to share both its ICS (ICAL) and HTML formats. You could use the second set of private links to publish (and otherwise communicate) those formats without exposing the contents of your calendar to the Google search engine. But if the goal is to advertise your calendar as widely as possible, you’ll want to make it public. So, visit the second tab on this page, labeled Share this Calendar, and check Make This Calendar Public:
Now your ICS feed is active at an URL that looks like this:
http://www.google.com/calendar/ical/yourname%40gmail.com/public/basic.ics
To capture your version of this link, right-click the ICAL icon in the Calendar Address section, and use your browser’s link-capture method: Copy Shortcut (IE), Copy Link Location (Firefox), Copy Link (Safari). You can paste the link into a web page that you publish, or into a web form or an email that transmits it to another site to which you want to syndicate your calendar.
Similarly, the web view of your calendar is active at an URL that looks like this:
http://www.google.com/calendar/embed?src=yourname%40gmail.com&ctz=America/New_York
To capture your version of this link, right-click the HTML icon in the Calendar Address section and do as above. This link leads to a Google-hosted page for viewing the calendar.
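If you’d rather compute these addresses than copy them, the two patterns shown above can be sketched in Python. The function names are mine, and the patterns are inferred only from the example URLs in this post, so treat them as assumptions and check the result against the links Google actually gives you.

```python
from urllib.parse import quote, urlencode

BASE = "http://www.google.com/calendar"


def ics_feed_url(calendar_id: str) -> str:
    """Public ICS feed address; the @ in the calendar id becomes %40."""
    return f"{BASE}/ical/{quote(calendar_id, safe='')}/public/basic.ics"


def embed_url(calendar_id: str, timezone_name: str) -> str:
    """Address of the Google-hosted HTML view of the calendar."""
    # safe="/" leaves the slash in a zone name like America/New_York unencoded.
    query = urlencode({"src": calendar_id, "ctz": timezone_name}, safe="/")
    return f"{BASE}/embed?{query}"


print(ics_feed_url("yourname@gmail.com"))
print(embed_url("yourname@gmail.com", "America/New_York"))
```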
If your web hosting circumstances allow you to use an HTML feature called IFRAME, you can instead embed the calendar in one of your own pages. The HTML code to do that is provided in the Embed This Calendar section.
May 23, 2008
Posted by Jon Udell under
Uncategorized [32] Comments
This post is part one of a series in which I’ll summarize what I know about publishing calendars openly on the web, for free, using popular calendar applications including Outlook, Google Calendar, and Apple iCal.
Outlook 2007
With Outlook 2007, you can publish for free to calendars.office.microsoft.com. You’ll need a Live ID account. If you don’t already have one, a Live ID is useful for many other services too. To get one, start at login.live.com and click the “Sign up for an account” link.
To start publishing, right-click the name of your Outlook calendar as it appears under My Calendars in Outlook’s navigation pane, select Publish to Internet, and select Publish to Office Online as shown here:
You’ll land on this screen, where — for an open public calendar — you can just click OK and take the defaults.
Now you’ll be prompted for your Live ID credentials.
Enter the email address and password of your Live ID account. And check “Remember my password” so that Outlook can send calendar updates to the server automatically.
Here’s the confirmation:
Even though you likely won’t want to send individual invitations, click Yes anyway. That’s the easiest way to discover what the web address of your published calendar will be. Here’s the email message:
You don’t need to send it to anyone; you just need to capture the calendar’s web address, which in this case is:
webcals://calendars.office.microsoft.com/pubcalstorage/j447ytlz27542/test_Calendar.ics
If you publish that link on a web page (more realistically, with a label like Subscribe to calendar), visitors who click the link will be invited to launch one or another calendar program (such as Outlook, or Apple iCal) to view the calendar and subscribe to updates. That same address can be used by online services like http://elmcity.info/events which combine calendars from multiple sources.
The .ics in test_Calendar.ics stands for Internet Calendaring and Scheduling. The ICS file is useful for exchanging calendar information among calendar programs that run on personal computers, and among calendar services that live online. But it’s not something people can view directly on the web. For that, you’ll want to use a variant of the address that produces a web page people can see and interact with. Here’s the variant:
http://calendars.office.microsoft.com/en-us/pubcal/viewer.aspx?path=/pubcalstorage/j447ytlz27542/test_Calendar.ics
To form your version of this link, copy the initial part of the above link (everything up to and including path=) and then replace the remainder, which begins with /pubcalstorage, with the corresponding part of the address from the invitation email shown above.
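The same transformation can be sketched in Python: take the path portion of the webcals address and append it to the viewer prefix. The function name is mine, and the viewer prefix is copied from the example in this post; whether it holds for other locales or accounts is an assumption.

```python
from urllib.parse import urlsplit

# Prefix taken from the example above; assumed to be the same for every calendar.
VIEWER_PREFIX = "http://calendars.office.microsoft.com/en-us/pubcal/viewer.aspx?path="


def viewer_url(webcals_address: str) -> str:
    """Turn the webcals:// address from the invitation into a viewer link."""
    path = urlsplit(webcals_address).path  # e.g. /pubcalstorage/.../name.ics
    return VIEWER_PREFIX + path


print(viewer_url(
    "webcals://calendars.office.microsoft.com/pubcalstorage/j447ytlz27542/test_Calendar.ics"
))
```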
If you then publish that link on your website, it will lead visitors to a page like this:
Visitors to that page can view the calendar in several ways. And they can subscribe to the calendar by clicking the Subscribe link.
Earlier versions of Outlook
I’m still researching the options. Comments welcome.
May 22, 2008
Posted by Jon Udell under
Uncategorized [2] Comments
My guest for this week’s Perspectives show is Caroline Arms, a digital preservation pioneer at the Library of Congress. She’s a leading student and promoter of digital formats for long-term preservation.
It was fascinating to hear her take on the interplay between the reality of market forces and the interests of cultural preservation. From the Library’s perspective, an important format is one that is both disclosed (i.e., openly specified) and widely adopted. The Library has few illusions about its ability to influence adoption, but it does participate in standardization efforts such as PDF/A and Office Open XML.
Caroline joined the Library of Congress in 1995 to work on the American Memory project, and she well understands that our memories are not only represented by commercially-published content, but also by personally-created content such as photographs and diaries. When that content is paper-based, it tends to survive benign neglect. But digital content doesn’t survive benign neglect, and the Library is thinking hard about the challenge that presents for the photographs and diaries we’re creating from now on.
Yesterday’s proposal for an association of URL-shortening services was motivated by that same challenge. It’s overwhelming to think about tackling the URL persistence problem in a general way, although there’s good progress being made in particular domains, notably scholarly publishing. But it strikes me that URL-shortening is an area where we could bootstrap a scheme that would provide at least some assurance of continuity, in a way that would be evident to a lot of mainstream users. It wouldn’t solve a major problem, but that’s actually the point. We need to pluck some low-hanging fruit, and start to raise expectations about the persistence of the digital resources we’re all creating.
May 21, 2008
Posted by Jon Udell under
Uncategorized [20] Comments
The creator of a new URL-shortening service, urlborg, recently wrote to me to announce some new features. There are, at this point, quite a few of these URL-shortening services. I’m sure each has differentiating features, but before I explore the differences I’d like to see a new and important kind of commonality.
Each of these services invites you to invest in creating a set of short URLs that point to your own longer URLs. None of them provides any guarantees about the future availability of those short URLs. I’d love to see these services form an association that does make such guarantees.
There can never be a simple solution to the problem of linkrot. We don’t own domain names, we only rent them. As content management systems evolve, so often do the URLs they project onto the web. Even if an association of URL-shortening services guaranteed the continuity of short URLs, the long URLs behind them would remain as fragile as they are today.
Still, it would be an inspiring and forward-looking experiment to try. What if TinyURL, snurl, urlborg, and the others were members of an association that would inherit the URL mappings of any member that ceased to honor them? Given such a guarantee, I’d be much more willing to invest in the creation of URL mappings with any of the members, and to explore the features that differentiate them.
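As a thought experiment, the bookkeeping such an association would need is small. The Python sketch below is purely hypothetical: the class, its methods, and the member names in the usage note are invented for illustration, and no existing service works this way. It records which member currently serves each member’s short codes, and hands them to an heir when a member retires.

```python
class ShortenerAssociation:
    """Hypothetical registry of short-URL mappings shared by member services."""

    def __init__(self):
        self._mappings = {}   # (original member, short code) -> long URL
        self._custodian = {}  # original member -> member now serving its codes

    def join(self, member):
        """Admit a member; it starts out serving its own short codes."""
        self._custodian[member] = member

    def shorten(self, member, code, long_url):
        """Record a mapping created at a member service."""
        if member not in self._custodian:
            raise ValueError(f"{member} is not a member")
        self._mappings[(member, code)] = long_url

    def retire(self, member, heir):
        """Hand everything `member` was serving, inherited codes included, to `heir`."""
        for original, current in list(self._custodian.items()):
            if current == member:
                self._custodian[original] = heir

    def resolve(self, member, code):
        """Return (current custodian, long URL), or None for an unknown code."""
        long_url = self._mappings.get((member, code))
        if long_url is None:
            return None
        return self._custodian[member], long_url
```

For example, after `retire("alpha", "beta")`, a code originally minted at the made-up member `alpha` still resolves, with `beta` reported as the service now honoring it.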
May 19, 2008
Posted by Jon Udell under
Uncategorized [3] Comments
In my writeup on MIT’s Project Simile, and again in my talk at the CUSEC conference, I lauded an approach to collective information management that respects our actual linguistic nature. People don’t normally create vocabularies by committee. Rather, we absorb, imitate, innovate, and negotiate the vocabularies we use. Simile embraces that reality. It encourages people to name resources in ways that make sense to them, within the context of their tribes. Then it provides ways to map out equivalences among the terms used by different tribes.
This same idea of pluralistic naming and equivalence mapping came up in last week’s Perspectives interview with Quentin Clark: Where is WinFS now? The connection was implicit but it’s worth making explicit. Here’s what Quentin said:
QC: Going through the litany of technologies that have come from WinFS, one of them is the notion of what I refer to as semi-structured records. The schema is not necessarily all that well defined at the outset of the application. How does the database handle that? We had built WinFS around a feature called UDTs [user-defined types], which is a column type — a CLR type system type.
We finished that up, and we built a whole spatial datatype on it in SQL Server 2008, it’s all good stuff.
But when we stepped back and looked at the semi-structured data problem in a larger context, beyond the WinFS requirements, we saw the need to extend the top-level SQL type system in that way. Not just UDTs, but to have arbitrary extensibility.
So we did this feature in SQL Server 2008 that we internally refer to as sparse columns. It’s a combination of various things. First, a large number of columns. Right now there’s a 1024 limit on the number of columns in a single SQL table. We’re way widening that out.
That comes of course with the ability to store data that’s very sparsely populated across a large number of columns. In SQL Server 2005 we actually allocate space for every column in every row, whether it’s filled or not.
JU: This is what the semantic web folks are interested in, right? Having attributes scattered through a sparse matrix?
QC: That’s right. And that leads to another thing which we call column groups, which allow you to clump a few of them together and say, that’s a thing, I’m going to put a moniker on that and treat it as an equivalence class in some dimension.
Given my enduring fascination with del.icio.us as a prime example of social tagging services that enable real people to evolve metadata vocabularies in a natural way, that really got my spidey sense tingling.
May 16, 2008
Posted by Jon Udell under
Uncategorized [7] Comments
Last November the New York Times ran an interactive visualization of one of the Republican debates that absolutely wowed me. On this week’s Interviews with Innovators show I spoke with two of its creators, Gabriel Dance and Shan Carter, about that project, and about some of their other work including the stunning Faces of the Dead in Iraq. It’s a great overview of how and why the NYTimes has been raising the level of its game — and therefore of everyone’s game — in the realm of interactive data display.
There’s an odd little Web 2.0 backstory about how we arranged this interview. When I cited the credits for the debate visualizer in my blog post, I had a hunch that my use of those names would appear on the creators’ radar screens. And sure enough, I heard back from Gabriel Dance. When I didn’t find any contact info for him on his website, I went hunting around and eventually found him on Facebook.
We then began an on-again, off-again dialogue that lasted for a couple of months, until we eventually settled on a time for the interview. At one point I tried to steer the discussion away from Facebook and into regular email, but for some reason that didn’t happen, so we wound up doing all the communication in Facebook.
When we finally got together for the interview, Gabriel mentioned that he’d never been involved in such a long Facebook email thread. Me neither. Somehow we got stuck in a loop where each of us thought the other preferred to communicate only in Facebook. I was glad to know that this wasn’t some kind of Gen-Y thing, and that we both thought it was a weird glitch.
The other delightful thing about this interview is the audio quality. Gabriel and Shan called me from the Times’ tape synch facility, so their half of the call was professionally recorded, then I merged their track with my locally recorded track. Nice!
May 15, 2008
Posted by Jon Udell under
Uncategorized 1 Comment
In 2004 I interviewed Quentin Clark, who led the WinFS effort, for an InfoWorld cover story on Longhorn. We had dinner recently, and Quentin made a surprising remark. He said that although WinFS never shipped, many of the underlying technologies already have. I wanted to hear more.
So, on this week’s Perspectives show, Quentin expounds at length on the question: Where is WinFS now? Topics include schemas, the entity data model, filestream and hierarchical namespace support in SQL Server, and synchronization.
In general I’m trying to aim Perspectives at a wider audience. But although you have to be fairly technical to enjoy reading or listening to this interview, I couldn’t resist. It’s a fascinating story, and not one the technology press is ever likely to tell. From that perspective, when the WinFS project was shut down, the whole thing evaporated. But as we know, technologies often wind up being used in ways not originally intended. WinFS is a prime example.
May 14, 2008
Posted by Jon Udell under
Uncategorized [10] Comments
Sean McGrath’s report on coping with RSI reminded me of a couple of things. First, I need to find out whether the chair-mounted split keyboard shown here is still available. It’s been hugely helpful to me over the years, but I’m not sure it can be replaced at this point, and that would suck.
(Update: Uh oh. Discontinued 3 years ago.)
Second, I’ve been meaning to note a connection between computational thinking and health. Sean writes:
RSI is about the most complex problem I have ever tried to debug.
His reference to debugging might seem like a geeky affectation, but I don’t think that it is. When you’re searching for the causes of health problems, including mechanical ones like RSI, it can be fiendishly hard to, as Sean says, “establish repeatable causal connections between events.” Our bodies are complex, layered systems. Problems arise at different levels; the levels interact; any assumption may need to be questioned. But ultimately our bodies are systems, and computational thinkers can be pretty good at hacking and debugging them.
You see it when geeks deal with RSI. And you also see it when they deal with obesity. I know seven or eight technical types who have slimmed dramatically in recent years. We’re talking major weight losses of 75 pounds, or 100, or even more. In each case they describe the process in the language of computational thinking. “I hacked my body.” “I debugged my metabolism.”
Sean is right to offer this disclaimer:
I am a computer geek. Not a medical practitioner. If you have symptoms, go see a doctor, ok?
And yet, in my experience with RSI and with other kinds of mechanically-induced soft tissue injuries, doctors can’t help much if at all. What’s required is realtime analysis and debugging of a complex system, on a continuous and perpetual basis. The person best equipped to do that debugging is you, the owner, operator, and inhabitant of the system.
May 13, 2008
Posted by Jon Udell under
Uncategorized [6] Comments
It was a great pleasure to speak with Lucas Gonze for this week’s Innovators interview. Back in 2004, in Blogs + playlists = collaborative listening, I first wrote about webjay.org, the playlist-sharing service that Lucas founded and later sold to Yahoo. Later that year, I made an audio documentary about the people, the services, and ideas that I saw coming together to create a new kind of cultural curation. The factors in play included abundant talent, Creative Commons licensing, and linkable hypermedia.
That vision hasn’t materialized yet. In our conversation, Lucas and I discuss why it hasn’t — and how it might still.
In the realm of music, I think that Lucas’ project to reanimate 19th-century songs provides one of the missing pieces of the puzzle. Copyright restrictions are what sent him to the archives to learn, perform, record, and distribute these old tunes. But as he’s explored them, he’s realized that parlour music of that era was social and participatory in ways that are far less common today.
Lucas once wrote about how he was happy with a recording he’d made of a piece that “only had a few mistakes.” The other day he wrote:
Imagine that we lived in a world where all photography was the kind you see in magazines. In this world all photos are taken by professionals and all the people who got their pictures taken are models at the peak of their career. If you had your picture taken normally, you’d think you were hideously ugly. That is the musical world we grew up in, and it’s bogus. Things don’t have to be that way.
In an era of cognitive surplus, as the pendulum swings back from consumption to production of culture, that’s a good thing to remember.
May 8, 2008
Posted by Jon Udell under
Uncategorized [18] Comments
Something about the title of this week’s Perspectives interview, OpenSearch federation with Search Server 2008, has been nagging me ever since I wrote it. In the interview, Richard Riley and Keller Smith describe how the new Microsoft search server can extend its reach by sending queries to other search services that can return results as OpenSearch-compliant RSS or Atom feeds.
We call this activity federation, but the enabling technology is syndication. So is the group of participating servers a federation, or is it a syndicate?
Some definitions of federation, from 1 dictionary.com and 2 Merriam-Webster:
1 a federated body formed by a number of nations, states, societies, unions, etc., each retaining control of its own internal affairs.
2 an encompassing political or societal entity formed by uniting smaller or more localized entities: as a: a federal government b: a union of organizations
That seems too formal, too heavyweight, for an OpenSearch-mediated search scenario. When you modify a search service to return results in the OpenSearch format, you’re not necessarily joining any kind of union. You’re just making it easier for other entities to latch onto your search results.
OpenSearch was announced on March 16, 2005, at the Web 2.0 conference. That same day I adapted my version of the InfoWorld search service to use it. There was nothing special about what I did, which is why it only took a few minutes. I just added a variant of the query URL that returned results as RSS, with a few minor extensions to comply with OpenSearch.
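For readers who haven’t seen what those “few minor extensions” amount to, here is a minimal Python sketch that wraps a list of search hits in an RSS 2.0 feed with OpenSearch response elements. It is not the InfoWorld code. The function name and the example.com link are placeholders, and it uses the OpenSearch 1.1 namespace, a later revision than the one available in 2005.

```python
import xml.etree.ElementTree as ET

# OpenSearch 1.1 namespace (an assumption for this sketch; see the note above).
OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"


def search_results_rss(query, results, total, start_index=1,
                       channel_link="http://example.com/search"):
    """Return an RSS 2.0 document describing one page of search results.

    `results` is a list of (title, link, summary) tuples.
    """
    ET.register_namespace("opensearch", OPENSEARCH_NS)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Search results for {query}"
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = f"Results for the query: {query}"

    # The OpenSearch additions: paging metadata about the result set.
    for name, value in (("totalResults", total),
                        ("startIndex", start_index),
                        ("itemsPerPage", len(results))):
        ET.SubElement(channel, f"{{{OPENSEARCH_NS}}}{name}").text = str(value)

    # Everything else is an ordinary RSS item per hit.
    for title, link, summary in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "description").text = summary

    return ET.tostring(rss, encoding="unicode")
```

Because the result is still ordinary RSS, a feed reader that knows nothing about OpenSearch can subscribe to it and simply ignore the extra elements.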
Then I registered my service with Amazon’s A9, searched A9 for “Jean Paoli”, and saw the combined results shown here.
This arguably was a federation, because you had to join the club in order to have results from your service show up in A9. But nothing about OpenSearch required things to work that way. Other services could consume my search feeds without requiring me to register with them, or permit them.
What’s more, any RSS reader could consume those feeds. Although I’d done the OpenSearch hack to showcase integration with A9, it turned out that I’d solved another problem without even intending to. It was now also possible for individuals to subscribe to InfoWorld queries.
OpenSearch can involve federation, but more fundamentally it’s about syndication. So, do the participating entities form a syndicate?
1 a: a group of persons or concerns who combine to carry out a particular transaction or project b: cartel c: a loose association of racketeers in control of organized crime
2 a group of individuals or organizations combined or making a joint effort to undertake some specific duty or carry out specific transactions or negotiations
That doesn’t seem right either. We can get closer by focusing on the definitions that emphasize simultaneous publication:
1 a business concern that sells materials for publication in a number of newspapers or periodicals simultaneously
2 to publish simultaneously, or supply for simultaneous publication, in a number of newspapers or other periodicals in different places: Her column is syndicated in 120 papers
But these definitions still involve more business coordination than OpenSearch, or feed syndication in general, require. If I use OpenSearch to publish a search service within the enterprise, I don’t need to make a formal agreement with the Search Server administrator in order to enable that server to include my search results. I just need to publish my results as an RSS feed, and tell that person I’ve done so. That same RSS feed is available to users who may wish to subscribe to searches performed directly on my service.
It’s the same on the open web. When you adopt a syndication-oriented architecture, small pieces can be loosely joined, or they can be more tightly coupled. But the underlying publish/subscribe mechanism doesn’t determine that choice.
Chewing on these definitions is more than a pedantic exercise for me. In my local community, I’m trying to show how a particular use of publish/subscribe technology — namely, calendar syndication — can solve an important problem for people, organizations, and the community as a whole.
Federation would clearly be the wrong word for the network of calendars that I’m trying to bring into existence. I’ve been using the word syndication instead. But now I suspect that’s the wrong word too. I want to convey that we can create small pieces, that they can be loosely joined, and that important network effects will emerge. I don’t yet know what word or phrase will make that cluster of concepts light up in people’s heads.
May 5, 2008
Posted by Jon Udell under
Uncategorized [9] Comments
In response to a popular recent item — “We posted weekly.pdf to the website. Isn’t that good enough?” — Sarah Allen echoes my favorite Sergey Brin quote. Sergey said: “I’d rather make progress by having computers understand what humans write, than by forcing humans to write in ways computers can understand.”
Sarah, citing weblog software as an example of software that enables people to write naturally, goes on to say:
Likewise, it is natural to record calendar information overlaid on a timeline with day, week, and month views that mimic traditional paper visualizations of time. This enables the software to generate structured data without people needing to think about it.
I mostly agree with her about blog software. And I would have been inclined to agree with her about calendar software too, until I started looking seriously into how people do — and often don’t — use calendar software.
Let’s look at a fragment of a softball schedule which, significantly, has been written as an Excel file:
| Fri. Apr. 25 | 6:15 | Whitney Brothers | Greenwald Realty |
|              | 7:45 | Servpro | Athen’s Pizza |
| Sat. Apr. 26 | 9:00 | WR Painting | Peerless Insurance |
Notice what’s missing? There’s no AM/PM, because everybody is expected to know that 6:15AM would be too early for a Friday game while 9:00PM would be too late for a Saturday game.
Yes, it’s natural to view calendar information in ways that mimic traditional presentations. But it’s unnatural to write it using calendar software that constantly nags you to specify nitpicky details like AM and PM. People understand what’s a reasonable time for a Friday or Saturday game. Why can’t software figure that out?
I guess that’s why another recent item on parsing human-written date and time information struck a chord with readers. Until we create (and widely deploy) naturalistic interfaces, people are going to avoid the Procrustean bed that is conventional calendar data entry.
May 5, 2008
On this week’s Interviews with Innovators I spoke with Janis Dickinson, director of citizen science at the Cornell Ornithology Lab. We talked about several of the lab’s projects that involve collection and analysis of volunteer observations about birds and bird habitats.
Courtesy of the eBird project, for example, here is a view of first sightings of common bird species in New Hampshire. At first glance it might be tempting to see the preponderance of dates in the current decade as an effect of global warming. But to support that interpretation, you’d have to answer a bunch of questions about the evolution of record-keeping over the period, and the distribution, reliability, and bias of volunteer observers.
Extracting signal from noise is, of course, one of the classic bread-and-butter activities of information science. What’s fascinating here is the Web 2.0 angle. Birdwatchers are famously passionate data collectors who develop reputations among their peers. When they contribute their data to eBird — and thence to the Avian Knowledge Network — those reputations can begin to be measured, and used to tune the analysis of a large body of contributed data.
For example, the all-time latest reported sighting of the Nelson’s Sharp-tailed Sparrow in New Hampshire was on Nov 24 2007, by Michael Harvey. Is that unusually late? And if so, is it credible? To answer these questions, Cornell’s data crunchers can compare what was and wasn’t reported in the region around that time, by observers whose reputations are one kind of signal that emerges from noisy data.
May 2, 2008
Posted by Jon Udell under
Uncategorized [10] Comments
Lately I’m obsessed with figuring out how to harness the cognitive surplus and put it to work doing better social information management.
The other night I attended a kick-off meeting for a group interested in advancing the cause of local food production in our region. Inevitably the discussion turned to questions that require data to answer. Who are the local producers? Where are they? What do they produce?
In the ensuing discussion, various sources of data emerged. There’s a USDA website, a state government website, a special-interest website, this or that blog. Two things were immediately clear to everyone. First, there would be no effective way to collate these existing sources. Second, most of the needed data wouldn’t be there anyway.
I’d like to be able to recommend the sort of loosely-coupled collaborative list-making method that works so effectively for me. But here’s why I can’t. The method presumes that all the things you’d want to collaboratively curate are already represented by URLs.
In the real world, some are and some aren’t. Consider two examples from this list:
Name: Darby Brook Farm
Day/Time: 8:00 AM – 5:00 PM
Season: June 1 – October 1
Address: 347 Hill Road
What you’ll find: Vegetables, raspberries, apples.
More Info: 603.835.6624
Name: Stonewall Farm
Day/Time: Hours vary
Season: June – October
Address: 242 Chesterfield Road
What you’ll find: Garden fresh produce through the Community Supported Agriculture (CSA) program, call for options
More Info: 603.357.7278, bsaunders@stonewallfarm.org, www.stonewallfarm.org
Because Stonewall Farm has a web presence, we can do all kinds of useful things with its URL. We can tag various bits of metadata onto it (location, products), we can derive views that include that information, and we can syndicate those views.
Because Darby Brook Farm doesn’t have a URL, we can’t do those things.
Of course Darby Brook Farm does have an implicit URL-addressable identity at Lighten Up NH. That identity is the record in Lighten Up NH’s database that’s currently being published into a web page by its ColdFusion server.
If that record were directly URL-addressable, the implicit identity would be explicit. Using the record’s URL as a temporary placeholder, we could bootstrap Darby Brook Farm into a collaborative list-making regime based on URLs, tags, and syndication.
Later, when Darby Brook Farm does establish a real web presence, we can unhook its cloud of annotations from the placeholder URL and attach it to the official one.
This scenario highlights a subtle but powerful benefit of data-publishing technologies like Astoria. When you aggressively expose record-level URLs, you can enable the same methods that will work for Stonewall Farm to also work for Darby Brook Farm.
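Here is a rough Python sketch of the placeholder idea, under some loud assumptions. The AnnotationStore class, its method names, the tags, and both Darby Brook Farm URLs are hypothetical. In particular, the record-level URL on example.org stands in for a URL that Lighten Up NH does not currently expose, and the "official" URL stands in for a site that doesn't exist yet. Only the Stonewall Farm address comes from the listing above.

```python
from collections import defaultdict


class AnnotationStore:
    """A toy tag store keyed by URL (hypothetical, for illustration only)."""

    def __init__(self):
        self._tags = defaultdict(set)

    def tag(self, url, *tags):
        """Attach one or more tags to a URL."""
        self._tags[url].update(tags)

    def view(self, tag):
        """Return the sorted URLs carrying a tag: a view we could syndicate."""
        return sorted(url for url, tags in self._tags.items() if tag in tags)

    def rebind(self, old_url, new_url):
        """Move the whole cloud of annotations from one URL to another."""
        moved = self._tags.pop(old_url, set())
        self._tags[new_url].update(moved)


STONEWALL = "http://www.stonewallfarm.org/"
# Hypothetical record-level URL standing in for the database record.
DARBY_PLACEHOLDER = "http://example.org/records/darby-brook-farm"
# Hypothetical official URL for a future web presence.
DARBY_OFFICIAL = "http://example.org/darby-brook-farm-official"

store = AnnotationStore()
store.tag(STONEWALL, "local-food", "csa")
store.tag(DARBY_PLACEHOLDER, "local-food", "vegetables", "raspberries", "apples")
print(store.view("local-food"))

# Later, when the farm gets a real site, unhook and reattach.
store.rebind(DARBY_PLACEHOLDER, DARBY_OFFICIAL)
print(store.view("local-food"))
```

The same tag and view calls work for both farms. The only difference is which URL serves as the key, and rebind makes that difference temporary.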