Today’s hangout with Gardner Campbell and Howard Rheingold, part of the Connected Courses project, dovetailed nicely with a post I’ve been meaning to write. Our discussion topic was web literacy. One of the literacies that Howard has been promoting is critical consumption of information or, as he more effectively says, “crap detection.” His mini-course on the subject links to a page entitled The CRAP Test, which offers this checklist:
* Currency –
  * How recent is the information?
  * How recently has the website been updated?
  * Is it current enough for your topic?
* Reliability –
  * What kind of information is included in the resource?
  * Is content of the resource primarily opinion? Is it balanced?
  * Does the creator provide references or sources for data or quotations?
* Authority –
  * Who is the creator or author?
  * What are the credentials?
  * Who is the publisher or sponsor?
  * Are they reputable?
  * What is the publisher’s interest (if any) in this information?
  * Are there advertisements on the website?
* Purpose/Point of View –
  * Is this fact or opinion?
  * Is it biased?
  * Is the creator/author trying to sell you something?
The first criterion, Currency, seems more straightforward than the others. But it isn’t. Web servers often don’t know when the pages they serve were created or last edited. The pages themselves may carry that information, but not in any standard way that search engines can reliably use.
In an earlier web era there was a strong correspondence between files on your computer and pages served up on the web. In some cases that remains true. My home page, for example, is just a hand-edited HTML file. When you fetch the page into your browser, the server transmits the following information in HTTP headers that you don’t see:
```
HTTP/1.1 200 OK
Date: Thu, 23 Oct 2014 20:54:46 GMT
Server: Apache
Last-Modified: Wed, 06 Aug 2014 19:28:27 GMT
```
That page was served today but last edited on August 6th.
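You don’t need special tools to check this for yourself. Here’s a minimal sketch in Python (the URL is just a placeholder) that asks a server for the same header with a HEAD request:

```python
# A minimal sketch: issue a HEAD request and report the Last-Modified
# header, if the server sends one. The URL below is only an example.
import urllib.request

def last_modified(url):
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        # Servers that don't know a modification date simply omit the header.
        return resp.headers.get("Last-Modified")

print(last_modified("https://example.com/"))
```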
Nowadays, though, for many good reasons, most pages aren’t hand-edited HTML. Most are served up by systems that assemble pages dynamically from many parts. Such systems may or may not transmit a Last-Modified header. If they do, they usually report when the page was assembled, which is about the same time you read it.
Search engines can, of course, know when new pages appear on the web. And there are ways to tap into that knowledge. But such methods are arcane and unreliable. We take it for granted that we can list files in folders on our computers by date. Reviewing web search results doesn’t work that way, so it’s arduous to apply the first criterion of C.R.A.P. detection. If you’re lucky, the URL will encode a publication date, as is often true for blogs. In such cases you can gauge freshness without loading the page. Otherwise you’ll need to click the link and look around for cues. Some web publishing systems report when items were published and/or edited; many don’t.
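As a rough illustration of that lucky case, here’s a sketch that sifts URLs for the common /YYYY/MM/DD/ permalink style. The pattern and the sample URLs are assumptions for the sake of example, not a general solution:

```python
import re
from datetime import date

# Matches the /YYYY/MM/DD/ (or /YYYY/MM/) permalink style used by many
# blog engines. Plenty of sites use other schemes, so treat a miss as
# "unknown," not "old."
DATE_IN_PATH = re.compile(r"/(\d{4})/(\d{1,2})(?:/(\d{1,2}))?/")

def date_from_url(url):
    m = DATE_IN_PATH.search(url)
    if not m:
        return None
    year, month, day = m.group(1), m.group(2), m.group(3) or "1"
    try:
        return date(int(year), int(month), int(day))
    except ValueError:
        return None

# Hypothetical search results, purely for illustration.
for u in ["http://example.com/2014/10/23/connected-courses/",
          "http://example.com/about/"]:
    print(u, "->", date_from_url(u))
```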
Social media tend to mask this problem because they encourage us to operate in what Mike Caulfield calls StreamMode:
StreamMode is the approach to organizing your thoughts as a history, integrated primarily as a sequence of events. You know that you are in StreamMode if you never return to edit the things you are posting on the web.
He contrasts StreamMode with StateMode:
In StateMode we want a body of work at any given moment to be seen as an integrated whole, the best pass at our current thinking. It’s not a journal trail of how we got here, it’s a description of where we are now.
…
The ultimate expression of StateMode is the wiki.
But not only the wiki. Any website whose organizing principle is not reverse chronology is operating in StateMode. If you’re publishing that kind of site, how can you make its currency easier to evaluate? If you can choose your publishing system, prefer one that can form URLs with publication dates and embed last-edited timestamps in pages.
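Here’s a minimal sketch of the second idea, assuming a Python-based publisher and a hypothetical source file: derive timestamps from the content itself, and format one value for a Last-Modified header and another for embedding in the generated page.

```python
import os
import email.utils
from datetime import datetime, timezone

def timestamps_for(source_path):
    """Derive publication-friendly timestamps from a source file's mtime."""
    mtime = os.path.getmtime(source_path)
    return {
        # RFC 7231 format, suitable for a Last-Modified HTTP header.
        "http": email.utils.formatdate(mtime, usegmt=True),
        # ISO 8601, suitable for embedding in the generated page itself.
        "iso": datetime.fromtimestamp(mtime, timezone.utc).isoformat(),
    }

# "posts/crap-detection.md" is a hypothetical source file.
print(timestamps_for("posts/crap-detection.md"))
```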
In theory, our publishing tools could capture timestamps for the creation and modification of pages. Our web servers could encode those timestamps in HTTP headers and/or in generated pages, using a standard format. Search engines could use those timestamps to reliably sort results. And we could all much more easily evaluate the currency of those results.
In practice that’s not going to happen anytime soon. Makers of publishing tools, servers, and search engines would have to agree on a standard approach and form a critical mass in support of it. Don’t hold your breath waiting.
Can we do better? We spoke today about the web’s openness to user innovation and cited the emergence of Twitter hashtags as an example. Hashtags weren’t baked into Twitter. Chris Messina proposed using them as a way to form ad-hoc groups, drawing (I think) on earlier experience with Internet Relay Chat. Now the scope of hashtags extends far beyond Twitter. The tag for Connected Courses, #ccourses, finds essays, images, and videos from all around the web. Nine keystrokes join you to a group exploration of a set of ideas. Eleven more, #2014-10-23, could locate you on that exploration’s timeline. Would it be worth the effort? Perhaps not. But if we really wanted the result, we could achieve it.
Hashtags work, but I’ve noticed something in how Twitter treats them. It seems to hate anything other than alphanumeric characters in a tag. I’ve tried underscores, periods, and dashes, and it always parses the tag up to that character and shows a “dead” hanging tail of unrecognized or un-honored characters after it. Have you seen that on Twitter, at least? I don’t know whether that’s the case on, say, WordPress.com, Instagram, or Facebook. It would be a small fix on Twitter’s side to honor non-alphanumeric characters, I think.
For implementers it’s much easier, and safer, to exclude non-alphanumeric characters and whitespace. That makes it easier to build tag-aware systems and more likely that they will interoperate.
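A sketch of why, using a deliberately naive parser (this regex is an assumption about how such systems behave, not Twitter’s actual rule): anything past the first non-alphanumeric character becomes the “dead tail” described above.

```python
import re

# A deliberately naive parser: a tag is '#' followed by alphanumerics.
# Real services each have their own rules; this is only an illustration.
NAIVE_TAG = re.compile(r"#([0-9A-Za-z]+)")

text = "Notes from #ccourses on #2014-10-23"
print(NAIVE_TAG.findall(text))  # ['ccourses', '2014'] -- the '-10-23' tail is dropped
```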
For users, though, such constraints are arbitrary and annoying.
Absent an agreed-upon standard character set for tags, though, I’d err on the side of simplicity. Otherwise you run into the same frustration as with passwords, never knowing which special characters do or don’t work in a given context.