Crowdsourcing local data the right way

In How Google Map Hackers Can Destroy a Business at Will, Wired’s Kevin Poulsen sympathizes with local businesses trying to represent themselves online.

Maps are dotted with thousands of spam business listings for nonexistent locksmiths and plumbers. Legitimate businesses sometimes see their listings hijacked by competitors or cloned into a duplicate with a different phone number or website.

These attacks happen because Google Maps is, at its heart, a massive crowdsourcing project, a shared conception of the world that skilled practitioners can bend and reshape in small ways using tools like Google’s Mapmaker or Google Places for Business.

No, these attacks happen because Google Maps isn’t based on the right kind of crowdsourcing. The Wired story continues:

Google seeds its business listings from generally reliable commercial mailing list databases, including infoUSA and Acxiom.

Let’s back up a step. Where does infoUSA get its data? From sources like new business filings and company websites, backed by follow-up calls to verify the data.

Those calls shouldn’t be necessary. The source of truth should be an individual business owner who signs a state registration form and publishes a website. Instead, intermediaries govern what the web knows about that business. If that data were crowdsourced in the right way, it would flow directly from the business owner.

Here’s how that could happen. A state’s process for business registration asks for a URL. If data available at that URL conforms to an agreed-upon format, it populates the registration form. If the registration is approved, the state endorses that URL as the source of truth for basic facts about the business.

Of course the business might provide more information than the state can verify. That’s OK. The state’s website might only record and assure the name and address of the business, plus the URL at which additional facts — not verifiable by the state — are provided by the business owner. Those facts would include the hours of operation. The business owner is the source of truth for those facts. Changes made at the source ripple through the system.
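To make the registration flow concrete, here is a minimal Python sketch. The JSON format, the field names (`name`, `address`, `hours`), and `populate_registration` are hypothetical assumptions, not any state’s actual system. Fetching the document over HTTP is left out so the sketch stays self-contained.

```python
import json
from dataclasses import dataclass, field

# Hypothetical "agreed-upon format": a JSON document the owner publishes at a
# URL they control. These are the only fields the state verifies itself.
STATE_VERIFIED_FIELDS = ("name", "address")


@dataclass
class RegistrationRecord:
    name: str
    address: str
    source_url: str  # the URL the state endorses as the source of truth
    owner_asserted: dict = field(default_factory=dict)  # not verified by the state


def populate_registration(source_url: str, document: str) -> RegistrationRecord:
    """Pre-fill a registration form from the owner's published JSON document."""
    data = json.loads(document)
    missing = [name for name in STATE_VERIFIED_FIELDS if not data.get(name)]
    if missing:
        raise ValueError(f"{source_url} is missing required fields: {missing}")
    # Facts such as opening hours stay with the owner, who remains their source.
    extras = {k: v for k, v in data.items() if k not in STATE_VERIFIED_FIELDS}
    return RegistrationRecord(data["name"], data["address"], source_url, extras)


if __name__ == "__main__":
    doc = '{"name": "Example Plumbing", "address": "1 Main St", "hours": "Mon-Fri 8-5"}'
    record = populate_registration("https://example.com/business.json", doc)
    print(record.owner_asserted)  # {'hours': 'Mon-Fri 8-5'}
```

Because the state keeps only the URL for owner-asserted facts, anyone who re-reads that URL picks up a change to the hours, which is one way the ripple described above could work.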

The problem isn’t that information about local businesses is crowdsourced. We’re just doing it wrong.

Things in the era of dematerialization

As we clear out the house in order to move west, we’re processing a vast accumulation of things. This morning I hauled another dozen boxes of books from the attic, nearly all of which we’ll donate to the library. Why did I haul them up there in the first place? We brought them from our previous house fourteen years ago. I could have spared myself a bunch of trips up and down the stairs by taking them directly to the library back then. But in 2000 we were only at the dawn of the era of dematerialization. You couldn’t count on being able to find a book online, search inside it, or have a used copy shipped to you in a couple of days for a couple of dollars.

Now I am both shocked and liberated to realize how few things matter to me. I joke that all I really need is my laptop, my bicycle, and my guitar, but in truth there isn’t much more. For Luann, though, it’s very different. Her cabinets of wonders are essential to who she is and what she does. So they will have to be a logistical priority.

In the age of dematerialization, some things will matter more than ever. Things that aren’t data. Things that are unique. Things made by hand. Things that were touched by other people, in other places, at other times. RadioLab’s podcast about things is a beautiful collection of stories that will help you think about what matters and why, or what doesn’t and why not.

Trails near me

I stayed this week at the Embassy Suites in Bellevue, Washington [1, 2]. Normally when visiting Microsoft I’m closer to campus, but the usual places were booked so I landed here. I don’t recommend the place, by the way, and not because of the door fiasco; that could have happened in any modern hotel. It’s the Hyatt-esque atrium filled with fake boulders and plastic plants that creeps me out. So does the location, near the junction of 156th and route 90. Places like this are made for cars, and I want to be able to hike and run away from traffic.

A web search turned up no evidence of running trails nearby. So I went down to the gym, only to find people waiting in line for the treadmills. Really? It’s depressing enough to run on a treadmill; I’m not going to queue for the privilege. So I headed out, figuring that a run along busy streets is better than no run at all.

Not far from the hotel, on 160th, I found myself in a Boeing industrial park alongside a line of arriving cars. As I jogged past the guard booth a guy leaped out at me and asked for my badge. “I’m just out for a run,” I said. “This is private property,” he said, and pointed to a nearby field. “But I think there’s a trail over there.” I crossed the field and entered part of the Bellevue trail network. The section I ran was paved with gravel, with signs identifying landmarks, destinations, and distances. I ran for 45 minutes, exited into the parking lot of a Subaru dealership near my hotel, and congratulated myself on a nice discovery.

Later I went back to the web to learn more about the trails I’d run, and found nothing that would have enabled a person waiting in line for a treadmill at the Embassy Suites to know that, within a stone’s throw, there were several points of access to a magnificent trail system. The City of Bellevue lists trails alphabetically, but the name of the nearby Robinswood Park Trail had meant nothing to me until I found it myself. Nor did I find anything at the various trail and exercise sites that I checked laboriously, one by one, because each is its own silo.

I knew exactly what I wanted: running trails near me. That the web didn’t help me find them is, admittedly, a first world problem. What’s more, I like exploring new places on foot and discovering things for myself. But still, the web ought to have enabled that discovery. Why didn’t it, and how could it?

The trails I found have, of course, been walked and hiked and cycled countless times by people who carry devices in their pockets that can record and publish GPS breadcrumbs. Some will have actually done that, but usually by way of an app, like Runtastic, that pumps the data into a siloed social network. You can get the data back and publish it yourself, but that’s not the path of least resistance. And where would you publish to?

Here’s a Thali thought experiment. I tell my phone that I want to capture GPS breadcrumbs whenever it detects that I’m moving at a walking or running pace along a path that doesn’t correspond to a mapped road and isn’t a path it’s seen before. The data lands in my phone’s local Thali database. When I’m done, the data just sits there. If there was nothing notable about this new excursion, my retention policy deletes the data after a couple of days.
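The capture and retention rules in this thought experiment can be sketched in a few lines of Python. Everything here is an assumption for illustration, not Thali’s API: the `Fix` and `Excursion` types, the 0.5 to 6 m/s foot-pace band, the two-day default, and the boolean inputs that stand in for the map lookup and path history a real phone would need.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical pace band for walking or running, in metres per second.
MIN_FOOT_SPEED = 0.5
MAX_FOOT_SPEED = 6.0
EARTH_RADIUS_M = 6_371_000


@dataclass
class Fix:
    lat: float
    lon: float
    time: datetime


@dataclass
class Excursion:
    fixes: list
    ended: datetime
    notable: bool = False


def haversine_m(a: Fix, b: Fix) -> float:
    """Great-circle distance between two GPS fixes, in metres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def should_capture(prev: Fix, cur: Fix, on_mapped_road: bool, seen_before: bool) -> bool:
    """Record only foot-paced movement along a path that is unmapped and new."""
    seconds = (cur.time - prev.time).total_seconds()
    if seconds <= 0:
        return False
    speed = haversine_m(prev, cur) / seconds
    on_foot = MIN_FOOT_SPEED <= speed <= MAX_FOOT_SPEED
    return on_foot and not on_mapped_road and not seen_before


def apply_retention(excursions, now: datetime, max_age=timedelta(days=2)):
    """Keep notable excursions; drop the rest once they are older than max_age."""
    return [e for e in excursions if e.notable or now - e.ended <= max_age]


if __name__ == "__main__":
    t0 = datetime(2014, 6, 20, 7, 0, 0)
    a = Fix(47.5800, -122.1300, t0)
    b = Fix(47.5802, -122.1300, t0 + timedelta(seconds=10))  # about 22 m in 10 s
    print(should_capture(a, b, on_mapped_road=False, seen_before=False))  # True
```

Marking an excursion `notable` is the hook for deciding to keep and share it; everything else ages out on its own.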

But maybe I want to contribute it to the commons, so that somebody else stuck waiting in line for a treadmill can know about it. In that case I tell my phone to share the data. That doesn’t mean publishing it to this or that social network silo. As Gary McGraw once memorably said: “I’m already a member of a social network. It’s called the Internet.”

Instead I publish the data to my personal cloud, using coordinates, tags, and a description so that search engines will index it, and aggregators will include it in their heat maps of active trails. Or maybe, because I don’t want my identity bound to those trails, I publish to an anonymizing service. Either way, I might also share with friends. I can do that via my personal cloud, of course, but with Thali I can also sync with them directly.
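What might the published record look like? A minimal sketch, assuming the breadcrumbs are published as a GeoJSON Feature (RFC 7946) whose `properties` carry the description and tags. The property names, the function `trail_feature`, and the sample coordinates are hypothetical, and nothing here is a Thali or search-engine API. Leaving out `author` models the anonymizing case.

```python
import json


def trail_feature(points, description, tags, author=None):
    """Build a GeoJSON Feature for a recorded trail.

    points: list of (lat, lon) tuples; GeoJSON positions are [lon, lat].
    author: omitted entirely when None, e.g. for an anonymizing publisher.
    """
    if len(points) < 2:
        raise ValueError("a LineString needs at least two positions")
    properties = {"description": description, "tags": sorted(set(tags))}
    if author is not None:
        properties["author"] = author
    return {
        "type": "Feature",
        "geometry": {
            "type": "LineString",
            "coordinates": [[lon, lat] for lat, lon in points],
        },
        "properties": properties,
    }


if __name__ == "__main__":
    # Illustrative coordinates only.
    feature = trail_feature(
        [(47.5800, -122.1300), (47.5810, -122.1310)],
        "Gravel trail with signed landmarks and distances",
        ["running", "trail", "bellevue"],
    )
    print(json.dumps(feature, indent=2))
```

GeoJSON is a plain, widely supported format, so the same file could sit in a personal cloud, go to an anonymizing service, or be synced directly to a friend without conversion.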

For now I have no interest in joining sites like Runtastic. Running for me is quiet meditation; I don’t want to be cheered on by virtual onlookers, track my times and distances, or earn badges. But maybe I’ll change my mind someday. In that case I might join Runtastic and sync my data into it. Later I might switch to another service and sync there. The point is that it’s never not my data. I never have to download it from one place in order to upload it to another. The trails data lives primarily on my phone. Anyone else who interacts with it gets it from me, where “me” means the mesh of devices and personal cloud services that my phone syncs with. I can share it with my real friends without forcing them to meet me in a social network silo. And I can share it with the real social network that we call the web.

Turning it off and on again


In The Internet of Things That Used To Work Better I whined about rebooting my stove. This morning I was stuck outside a hotel room waiting for “engineering” to come and reboot the door. It eventually required a pair of technicians, Luis and Kumar, who jiggled and then replaced batteries (yes, it’s a battery-operated door), then attached two different diagnostic consoles. When they got it working I asked what the problem had been. They had no idea. “Hello, IT, have you tried turning it off and on again?” is the tagline for a civilization whose front-line technicians have no theory of operation. Will the door open when I return tonight? I have no idea. But at least now I know how to turn it off and on again.

How Thali could make the Smallest Federated Wiki even smaller

Thanks to my friend Mike Caulfield, an educational technologist who’s been digging into Ward Cunningham’s Smallest Federated Wiki, I’ve now got a much clearer idea of how SFW and Thali could play together and why they should.

Mike’s recent series on SFW is the best review and analysis of Ward’s newest creation that I’ve seen:

http://hapgood.us/2014/06/12/student-curation-in-smallest-federated-wiki/

http://hapgood.us/2014/06/11/letting-lots-of-people-host-your-stuff-in-their-collections-is-a-good-survival-strategy/

http://hapgood.us/2014/06/10/the-answer-to-project-based-work-in-moocs-is-federation/

I had dipped a toe into the SFW water but there’s a learning curve and Mike climbed it before I could. Today he jumpstarted me by setting me up with a node of an SFW federation he’s hosting on AWS. Here I am participating in a wiki federation with some friends in the ed-tech tribe. We are able to do this because Mike provisioned SFW instances for each of us.

What’s the Thali connection? Well, in the first few seconds of http://screencast.com/t/fRlahVd0EK5 you see Mike provisioning a node in a federation he’s hosting on AWS. That’s the minimum bar for SFW: you need an instance of the server. Most people can’t or won’t leap over that bar.

But the server’s a pretty small piece of the pie. Most of SFW runs in the browser. There’s a lot there, and it’s well-architected for growth.

A server implementation for Thali would enable lots more people to create and participate in Wiki federations, by running SFW on their own devices and syncing opportunistically with peers on friends’ devices. Since the existing Sinatra-based SFW is CouchDB-aware, Thali — based on Couchbase Lite — should provide a comfortable home.

Why would people want to use SFW? Mike’s posts and screencasts point to a world in which GitHub-like collaboration breaks out of the geek ghetto and becomes a natural way for all kinds of teachers and learners to collaborate.

Ward points to that possibility and others in a series of SFW screencasts at http://vimeo.com/channels/wiki. I’d seen a few; tonight I went back and watched the rest. Some highlights:

On forking and comparing

An inline calculator plugin (in 25 lines of CoffeeScript!)

Visualization of in-page data

These demos really capture the idea of the universal canvas (http://www.infoworld.com/d/developer-world/we-need-universal-canvas-doesnt-suck-130) that I’ve dreamed of for a long time.

My 2006 InfoWorld article said, by the way,

Here’s the best definition of the universal canvas: ‘Most people would prefer a single, unified environment that adapts to whichever environment they are working in, moves transparently between local and remote services and applications, and is largely device-independent — a kind of universal canvas for the Internet Age.’

You might expect to find that definition in a Google white paper from 2006. Ironically, it comes from a Microsoft white paper from 2000, announcing a “Next Generation Internet” initiative called .NET.

You never know how things will turn out.

Mapping the decentralization movement

“Right now we’re experiencing a moment of maximum centralization,” says Scott Rosenberg in his introduction to a new effort that combines “a tech-industry beat I will cover; a cultural investigation and conversation I will undertake; and a personal-publishing venture I am kicking off now.”

We’ve been here before. The Internet was a peer-to-peer network until it wasn’t. Likewise the Web. Some have forgotten, and most never knew, that Tim Berners-Lee’s original browser could write and publish as well as read pages. By the early 2000s the pendulum had swung so far toward centralization that, as it began to swing back, we called the “two-way web” one of the pillars of “Web 2.0.” Personal publishing flourished for a while, then the pendulum swung again toward centralized social media. If Scott’s right, and I hope he is, the pendulum is about to swing back toward a more distributed Web.

Thali is one project moving in that direction; there are many others. When we compared notes with Jeremie Miller the other day, he pointed us to a long list of fellow travelers. Another observer, Doc Searls, periodically issues updates with pointers to related (and some of the same) efforts.

It behooves all of us to sort out how these efforts are similar or different along various axes. Some are peer-to-peer, others not. Some bind identities to public keys, others don’t. Some skew toward messaging and social networking, others toward bulk data exchange or publishing. Some consider themselves personal data stores, others don’t. Many are “friend-to-friend” networks with peer-to-peer trust models, some aren’t. There are platforms, protocols, overlay networks, and apps in the mix.

In order to reason about these axes of comparison, I loaded a bunch of links into Pinboard, made a common tag (redecentralize) to unite all the links related to this exploration, and began tagging. Here’s what I’ve got so far: https://pinboard.in/u:judell/t:redecentralize/

What else belongs on this list? What are its core attributes? What are the best axes along which to compare? The tag cloud is suggestive, but it’s only my lens on the list. I’d love to see other lenses applied to the same (evolving) list.

Note that with Pinboard (as with del.icio.us long ago) such lenses can be applied, and in a decentralized way! You could import my redecentralize feed into your own Pinboard account and tag the links according to your world view. We could compare one another’s views, and see a combined view at https://pinboard.in/t:redecentralize/. While that’s a very cool way to do collaborative mind-mapping, it’s not likely to happen in this case. But comments here (or elsewhere) will be welcome.
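Here is a hypothetical sketch of what comparing and combining lenses amounts to, assuming each person’s redecentralize bookmarks have been exported as a mapping from URL to a set of tags. It does not call Pinboard’s API; `combined_view` and `compare_lenses` are illustrative names, and the sample URLs are placeholders.

```python
from collections import Counter


def combined_view(views):
    """Merge several users' tag assignments for the same links.

    views maps a user name to {url: set of tags}; the result maps each url to a
    Counter of how many users applied each tag to it.
    """
    merged = {}
    for user_tags in views.values():
        for url, tags in user_tags.items():
            merged.setdefault(url, Counter()).update(set(tags))
    return merged


def compare_lenses(mine, yours, url):
    """Split two users' tags for one url into shared and per-user sets."""
    a, b = set(mine.get(url, ())), set(yours.get(url, ()))
    return {"shared": a & b, "only_mine": a - b, "only_yours": b - a}


if __name__ == "__main__":
    views = {
        "me": {"https://example.org/project": {"p2p", "publickeys"}},
        "you": {"https://example.org/project": {"p2p", "messaging"}},
    }
    print(combined_view(views))
    print(compare_lenses(views["me"], views["you"], "https://example.org/project"))
```

Tags that most people agree on surface with high counts in the combined view, while the per-pair comparison shows where two world views diverge.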