Update: For a simpler formulation of the ideas in this essay, see Doug Belshaw’s Working openly on the web: a manifesto.
Back in 2000, the patterns, principles, and best practices for building web information systems were mostly anecdotal and folkloric. Roy Fielding’s dissertation on the web’s deep architecture provided a formal definition that we’ve been digesting ever since. In his introduction he wrote that the web is “an Internet-scale distributed hypermedia system” that aims to “interconnect information networks across organizational boundaries.” His thesis helped us recognize and apply such principles as universal naming, linking, loose coupling, and disciplined resource design. These are not only engineering concerns. Nowadays they matter to everyone. Why? Because the web is a hybrid information system co-created by people and machines. Sometimes computers publish our data for us, and sometimes we publish it directly. Sometimes machines subscribe to what machines and people publish; sometimes people do.
Given the web’s hybrid nature, how can we teach people to make best use of this distributed hypermedia system? That’s what I’ve been trying to do, in one way or another, for many years. It’s been a challenge to label and describe the principles I want people to learn and apply. I’ve used the terms computational thinking, Fourth R principles, and most recently Mark Surman’s evocative thinking like the web.
Back in October, at the Traction Software users’ conference, I led a discussion on the theme of observable work in which we brainstormed a list of some principles that people apply when they work well together online. It’s the same list that emerges when I talk about computational thinking, or Fourth R principles, or thinking like the web. Here’s an edited version of the list we put up on the easel that day:
Be the authoritative source for your own data
Pass by reference not by value
Know the difference between structured and unstructured data
Create and adopt disciplined naming conventions
Push your data to the widest appropriate scope
Participate in pub/sub networks as both a publisher and a subscriber
Reuse components and services
1. Be the authoritative source for your own data
In the elmcity context, that means regarding your own website, blog, or online calendar as the authoritative source. More broadly, it means publishing facts about yourself, or your organization, to a place on the web that you control, and that is bound in some way to your identity.
To a large and growing extent, your public identity is what the web knows about your ideas, activities, and relationships. When that knowledge isn’t private, your interests are best served by publishing it to online spaces that you control and use for the purpose.
Mastering your own search index, Hosted lifebits
2. Pass by reference rather than by value
In the case of calendar events, you’re passing by value when you send copies of your data to event sites in email, or when you log into an events site and recopy data that you’ve already written down for yourself and published on your own site.
You’re passing by reference when you publish the URL of your calendar feed and invite people and services to subscribe to your feed at that URL.
Other examples include sending somebody a link to an article instead of a copy of the article, or uploading a file to Dropbox and sharing the URL.
Nobody else cares about your data as much as you do. If other people and other systems source your data from a canonical URL that you advertise and control, then they will always get data that’s as timely and accurate as you care to make it.
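To make the difference concrete, here is a minimal Python sketch. It is illustrative only: the URL is hypothetical, and an in-memory dictionary stands in for the web so the example runs without network access. A subscriber that copied the data keeps a stale snapshot, while a subscriber that kept only the URL sees the publisher’s update the next time it resolves the reference.

```python
# Hypothetical stand-in for the web: canonical URL -> currently published data.
# No real network access happens here.
web = {
    "https://example.org/calendar.ics": ["Board meeting, Jan 10"],
}

def fetch(url):
    """Resolve a reference by reading whatever is published at the URL right now."""
    return list(web[url])  # return a copy, as a real HTTP response would be

# Pass by value: an events site receives a copy at one moment in time.
copied_events = fetch("https://example.org/calendar.ics")

# Pass by reference: a subscriber stores only the URL and resolves it on demand.
subscribed_url = "https://example.org/calendar.ics"

# Later, the publisher updates the canonical source.
web["https://example.org/calendar.ics"].append("Library fundraiser, Feb 2")

print(copied_events)          # the stale copy, missing the new event
print(fetch(subscribed_url))  # the current data, including the new event
```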
Also, when you pass by reference you’re enabling reuse (see 7 below). The resources you publish can be recombined, by you and by others, with other resources published by you and by others.
Finally, a canonical URL helps you measure how the web reacts to your data. If the URL is cited elsewhere you can discover those citations, and you can evaluate the context that surrounds them.
The principle of indirection, Hyperlinks matter
3. Know the difference between unstructured and structured data
When you create an events page on your website, and the calendar on that page is an HTML file or a PDF file, you’re posting unstructured data. This is information that people can read and print, and it’s fine for that purpose. But it’s not data that networked computers can process.
When you publish an iCalendar feed in addition to your HTML- or PDF-based calendar, you’re publishing data that machines can work with.
Perhaps the most familiar example is your blog, if you have one. Your blog publishing software creates an HTML page for people to read. But at the same time it creates an RSS or Atom feed that enables feedreaders, or blog aggregation services, to automatically collect your entries and merge them with entries from other blogs.
The web is a human/machine hybrid. If you contribute data in formats useful only to people, you sacrifice the network effects that the machines can promote. If you also contribute in formats the machines understand, they can share your stuff amongst themselves, convey it to more people than you can reach through word-of-mouth human networks, and enable hybrid human/machine intelligence to work with it.
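Here is a minimal Python sketch of what “data that machines can work with” means in practice. The two events are hypothetical, and the parser is deliberately simplified: it handles only unfolded lines, strips property parameters such as `;TZID=...` from names, and ignores escaping and recurrence rules. Real code would use a full RFC 5545 parser such as the `icalendar` package. The point is that each event arrives as named fields (`DTSTART`, `SUMMARY`, `UID`) rather than as text laid out for a human reader.

```python
# Hypothetical sample feed in iCalendar (RFC 5545) format.
SAMPLE_ICS = """BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Example//Hypothetical Calendar//EN
BEGIN:VEVENT
UID:board-2011-01-10@example.org
DTSTART:20110110T190000
SUMMARY:Board meeting
END:VEVENT
BEGIN:VEVENT
UID:fundraiser-2011-02-02@example.org
DTSTART:20110202T180000
SUMMARY:Library fundraiser
END:VEVENT
END:VCALENDAR"""

def parse_events(ics_text):
    """Return one dict per VEVENT, mapping property names to raw values.

    Simplified sketch: no line unfolding, no unescaping, no recurrence.
    """
    events, current = [], None
    for line in ics_text.splitlines():
        line = line.strip()
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT":
            events.append(current)
            current = None
        elif current is not None and ":" in line:
            name, value = line.split(":", 1)
            # Drop parameters such as ";TZID=..." from the property name.
            current[name.split(";", 1)[0]] = value
    return events

for event in parse_events(SAMPLE_ICS):
    print(event["DTSTART"], event["SUMMARY"])
```

Because the fields are named, a program can sort, filter, or merge these events with events from other feeds, which is exactly what it cannot reliably do with an HTML page or a PDF.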
The laws of information chemistry, Developing intuitions about data
4. Create and adopt disciplined naming conventions
When people publish calendars into elmcity hubs, they can assign unique and meaningful URLs and/or tags to each event they publish. And they can collaborate with curators of hubs to use tag vocabularies that define virtual collections of events.
The same strategies work in all web contexts. Most familiar is the first order of business at every conference attended by web thinkers: “The tag for this conference is ______.” When people agree to use common names in shared data spaces, effects like aggregation, routing, and targeted search require no special software.
The web’s supply of unique names (e.g., URLs, tags) is infinite. The namespace that you can control, by choosing URLs and tags for the things you post, is smaller but still infinite. Web thinkers use thoughtful, rigorous naming conventions to manage their own personal information and, at the same time, to enable network effects in shared data spaces.
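A shared tag acts as an informal contract. In this hypothetical Python sketch, two independent publishers happen to use the same tag, `civic`, and aggregation reduces to a filter on that name. No coordination beyond agreeing on the tag is assumed; the feeds, events, and tag are invented for illustration.

```python
# Hypothetical events from independently published calendars. Each publisher
# chose its own tags, but two of them agreed on the shared tag "civic".
feeds = {
    "library": [
        {"summary": "Author talk", "tags": {"books", "civic"}},
        {"summary": "Story hour", "tags": {"kids"}},
    ],
    "city_council": [
        {"summary": "Budget hearing", "tags": {"civic", "budget"}},
    ],
}

def aggregate_by_tag(feeds, tag):
    """Collect (source, summary) pairs for every event carrying the given tag."""
    return [
        (source, event["summary"])
        for source, events in feeds.items()
        for event in events
        if tag in event["tags"]
    ]

print(aggregate_by_tag(feeds, "civic"))
```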
Heds, deks, and ledes, The power of informal contracts, Permalinks and hashtags for city council agenda items, Scribbling in the margins of iCalendar
5. Push your data to the widest appropriate scope
When you speak in electronic spaces you can address audiences at varying scopes. An email message addresses one or several people; a blog post on a company intranet can address the whole company; a blog post on the public web can address the whole world. Web thinkers know that keystrokes invested to capture and transmit knowledge will pay the highest dividends when routed to the widest appropriate scope.
The elmcity example: a public calendar of events can be managed in what is notionally a personal calendar application, say, Google Calendar or Outlook, but one that can post data to a public URL.
For bloggers, this principle governs the choice to explain what you think, learn, and do on your public blog (when appropriate) rather than in private communication.
Unless confidentiality precludes the choice, web thinkers prefer shared data spaces to private ones because they enable directed or serendipitous discovery and ad-hoc collaboration.
Too busy to blog? Count your keystrokes
6. Participate in pub/sub networks as both a publisher and a subscriber
Our everyday calendar programs are, in blog parlance, both feed publishers and feed readers. Individuals and organizations can publish their own feeds to the web of calendar data while at the same time subscribing to others’ feeds. On a larger scale, an elmcity hub subscribes to a set of feeds, and in turn publishes a feed to which other individuals (or hubs) can subscribe.
The blog ecosystem is the best example of pub/sub syndication among heterogeneous endpoints through intermediary services. Similar effects can happen in social media, and they happen in ways that people find easier to understand, but they happen within silos: Facebook, Twitter. Web thinkers know that standard protocols and formats enable syndication that crosses silos and supports the most open kinds of collaboration.
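The hub pattern can be sketched in a few lines of Python. This is not the elmcity service’s actual code; it is a hypothetical model in which in-memory objects stand in for feed URLs. A `Hub` subscribes to anything that exposes a `feed()` method, including other hubs, and publishes a merged feed in turn. Deduplicating by event UID is an assumption of the sketch, chosen because iCalendar UIDs are meant to be globally unique.

```python
class Calendar:
    """Hypothetical individual publisher: a static list of events."""

    def __init__(self, events):
        self.events = events

    def feed(self):
        return list(self.events)


class Hub:
    """Hypothetical aggregator that is both a subscriber and a publisher."""

    def __init__(self):
        self.sources = []  # anything with a feed() method: calendars or other hubs

    def subscribe(self, source):
        self.sources.append(source)

    def feed(self):
        """Publish the merged, de-duplicated, date-sorted events of all sources."""
        merged = {}
        for source in self.sources:
            for event in source.feed():
                merged[event["uid"]] = event  # an event reached by two paths counts once
        return sorted(merged.values(), key=lambda e: e["start"])


library = Calendar([{"uid": "a1", "start": "2011-02-02", "summary": "Fundraiser"}])
school = Calendar([{"uid": "b7", "start": "2011-01-15", "summary": "Concert"}])

town = Hub()
town.subscribe(library)
town.subscribe(school)

region = Hub()            # a hub subscribing to another hub
region.subscribe(town)
region.subscribe(library)  # overlapping subscription, removed by UID

for event in region.feed():
    print(event["start"], event["summary"])
```

Because a hub exposes the same `feed()` interface as a calendar, hubs compose: `region` neither knows nor cares that `town` is itself an aggregator. A real service would also need to guard against subscription cycles, which this sketch does not handle.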
Personal data stores and pub/sub networks
7. Reuse components and services
In the elmcity context, calendar programs are used in several complementary ways. They combine personal information management (e.g., keeping track of your own organization’s public calendar) with public information management (e.g., publishing the calendar).
In another sense they serve the needs of humans who read those calendars on the web while also supporting mechanical services (like elmcity) that subscribe to and syndicate the calendars.
In general, a reusable web resource is:
- Effectively named
- Properly structured
- Densely interconnected (linked) both within and beyond itself
- Appropriately scoped
The web’s “small pieces loosely joined” architecture echoes what in another era we called the Unix philosophy. Web thinkers design reusable parts, and also reuse such parts where possible, because they know that the web both embodies and rewards this strategy.
How will the elmcity service scale? Like the web!, How to manage private and public calendars together
87 thoughts on “Seven ways to think like the web”
Make sure your data on yourself really is authoritative. The first time someone decides it’s not may be the last. Especially if you’re important enough to matter to Wikipedia.
If you pass by reference, make sure you keep the reference live, and accurate. Don’t change it without a good reason, because the first time someone decides your data’s losing value they’ll start keeping a mirror. And that might well become authoritative.
Make sure your stuff is archived in archive.org to keep you honest. It might seem painful, but it’s better to have people accessing out-of-date information about yourself than information actively managed by someone who doesn’t trust you to do it right any more.
As I violate these principles by commenting here… Jon, what do you think of Posterous and similar services as enablers of these principles? For example, Posterous groups for private group publishing (widest possible in some cases) and regular posts for the widest public group…?
I think we need tools that make it easy for people to be the authoritative sources for their own content. We can only pass by reference if we’ve published somewhere that DNS can find.
Curious on your take on the best tools for citizens (not limited to calendars).
@pete: As I violate these principles by commenting here…
Exactly. You’re quite right to point that out. In many cases, things aren’t (yet) arranged in a way that makes it easy and natural to adopt the first principle.
In the case of blog comments, I’d like all of mine to live in one place that I control. And I’d like the operation of commenting to be (without overtly seeming to be) the passing of a reference to an entry in my cloud, rather than the passing of a value into the site’s cloud.
Given how a lot of things work today this is clearly aspirational, but I think it’s a really important aspiration.
In the case of private Posterous groups, and similar features based on the sharing of a secret URL, you’re still putting your stuff into their space not yours. Now we’re getting /really/ aspirational but I’d rather syndicate from my cloud to Posterous or elsewhere and use a common mechanism for identity and access control on my end. That implies that identities coming to my cloud from Posterous or elsewhere match identities known to the identity and access control service that protects my cloud.
All very blue sky to be sure, but I can clearly envision it, I’d like to get there, and I think these kinds of arrangements would open the door to paid services that providers would love to offer and users would be happy to buy.
My most memorable mantra for this year so far: If you’re not paying for the product, you are the product. I want to pay reasonable sums to keep my stuff in the cloud coherent, and to be able to manage it sanely.
Jon, what are your thoughts on the scalability issues? For example, what about responsiveness when there are 1000 comment references to a blog post? I guess that’s just a matter of AJAX (or some future idea) and a mindset change that will have to be accepted: clicking “Show next 20 comments” could take ~5-10 seconds? This idea has some serious TCP overhead, no? Though I suppose one could bundle related (comment, etc) queries to common Lifebits Providers over a single TCP transaction.
I’m fascinated by all of this.
Thoughts on the data format aspects of all of this? Lots of microformats? For example, how would we choose to represent a “comment”?
@jeff: We shouldn’t expect (or accept) that “Show next 20 comments” will take 10 seconds. The web’s design includes provisions for caching at many levels, and in many places, and the emerging cloud infrastructures embrace that design.
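One of those provisions is the HTTP conditional request. A client that already holds a cached copy sends the validator it received (an `ETag`) in an `If-None-Match` header, and the server answers `304 Not Modified` with no body if nothing has changed. The Python sketch below shows the server-side decision only. It is simplified (a single exact ETag comparison, no weak validators or lists of tags), it is not tied to any particular blog platform, and the `max-age=60` value is an arbitrary assumption rather than a recommendation.

```python
import hashlib

def make_etag(body: bytes) -> str:
    """Derive a quoted validator from the response body (one common approach)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=60"}
    if if_none_match == etag:
        return 304, headers, b""  # the client's cached copy is still valid
    return 200, headers, body

comments = b"<ol><li>First comment</li></ol>"

# First request: full response, including the ETag to cache alongside the body.
status, headers, payload = respond(comments)

# Revalidation: the client echoes the ETag and receives no body.
status2, _, payload2 = respond(comments, if_none_match=headers["ETag"])

print(status, status2)
```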
Though I suppose one could bundle related (comment, etc) queries to common Lifebits Providers over a single TCP transaction.
There’s that too.
Whenever I’m tempted to make an assumption about what is or isn’t possible on the web, I reread Roy Fielding’s wonderful essay, Paper Tigers and Hidden Dragons: http://roy.gbiv.com/untangled/2008/paper-tigers-and-hidden-dragons
Roy’s point is that given a hypermedia system made of linked resources, there can be enormous flexibility and creativity in the design of the resources. Which goes exactly to your point: how /would/ we choose to represent a comment, or a stream of comments, in order to satisfy such requirements as authoritative provenance and efficient delivery? It’s a very useful thing to be thinking about.
Jon, what’s your take on The Locker Project, TeleHash, and Singly?
Interesting post. On the topic of authority: is anyone really an authority? We know what we know, and if we are intelligent we are always striving to learn and grow. Does that in and of itself bar us from being authoritative in any subject matter? Even the brightest scientists can and do postulate incorrectly. I think there might be too much emphasis on ‘being right’ and not enough on the constant education of oneself. Thanks for the post; it makes one think.
Among our principal targets would be to preserve material independent from style or speech.
Jon — The videos of the TUG 2010 Observable Workshop are now posted at
(in two parts). Cheers, Greg