Our web search strategies are largely unconscious. Back in December I dredged one up to take a look at it, and resolved to do that again from time to time. Today’s challenge was to find this article on infomania that I read about a week ago and neglected to bookmark. More specifically, I needed to recall the name Mary Czerwinski, a Microsoft researcher mentioned in the story, because I want to interview her for a podcast.
The multi-step strategy that got me there is subtle, and independent of any particular search engine. Here were the givens:
- I thought I’d seen the story on SeattlePI.com.
- I thought the researcher was female, and was an organizer of the event that was the subject of the story.
- I thought I’d recognize her name if I saw it.
- I thought that the word “attention” would appear frequently in the story.
I started with these queries:
“microsoft research” conference on attention
“microsoft research” seminar on interruption
This would have nailed it:
“microsoft research” workshop on infomania
But of course I didn’t recall that it was a workshop rather than a seminar or conference, and the word infomania hadn’t sunk in when I read the article.
Next I tried this:
“microsoft research” “continuous partial attention”
This leads, in any search engine, to Linda Stone, which I knew was a blind alley. I’ve read and heard Linda Stone on the subject of continuous partial attention, and I know she’s no longer at Microsoft and wasn’t the female researcher in the story. But I figured this query would get me in the neighborhood, that the nimbus of documents surrounding her name would shake something loose. It didn’t.
Next I broadened to:
“microsoft research” attention
This leads, in any search engine, to Eric Horvitz. Note that although Eric Horvitz’s name does appear in the story I was looking for, the word “attention” does not appear in the story.
I wish I could be more precise about what happened next, but the general idea was to explore documents surrounding Eric Horvitz that would contain the name of a female researcher which, when I saw it, would ring a bell. In a couple of clicks I saw the name “Mary Czerwinski” and it did ring a bell. So my final search at SeattlePI.com was for Mary Czerwinski, and the target story was the first hit.
In retrospect I could’ve searched SeattlePI for Eric Horvitz and found the target story as the second hit. I can’t say exactly why I didn’t, but I suspect it’s because I thought exploring the document cluster around Eric Horvitz would be useful for other reasons than to locate Mary.
We perform these kinds of searches every day without thinking much about them, but there’s an amazing amount of stuff going on under the hood. Consider, for example, the aspect of this strategy that involves switching from general search engines to SeattlePI’s search engine. If I was right about the source of the article, that would be a winning strategy because the target would tend to pop up readily in SeattlePI’s engine. If I was wrong, though, it would be a complete waste of time. Some part of my brain calculated that tradeoff. A successful search strategy involves a bunch of those kinds of calculations. How could we surface them from unconsciousness, study them, and optimize them?
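One way to make that tradeoff explicit is a rough expected-cost comparison. The sketch below is only an illustration: the function names, the probability, and the minute counts are all made-up assumptions, not measurements of the search described above.

```python
def expected_minutes(p_right: float, minutes_if_right: float, minutes_if_wrong: float) -> float:
    """Expected time spent on a strategy that pays off with probability p_right."""
    if not 0.0 <= p_right <= 1.0:
        raise ValueError("p_right must be between 0 and 1")
    return p_right * minutes_if_right + (1.0 - p_right) * minutes_if_wrong


def prefer_site_search(p_right: float, site_hit: float, site_miss: float, general: float) -> bool:
    """True when the site-specific search has the lower expected cost.

    site_miss should include the time wasted on the site plus the
    fallback to a general search engine.
    """
    return expected_minutes(p_right, site_hit, site_miss) < general


if __name__ == "__main__":
    # Hypothetical numbers: 80% sure of the source, 1 minute if right,
    # 6 minutes if wrong, 4 minutes for a general search.
    print(prefer_site_search(0.8, site_hit=1, site_miss=6, general=4))
```

With these invented numbers the site search wins when confidence in the source is high and loses when it is low, which is roughly the calculation the brain seems to make without being asked.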
um, why didn’t you skim through your browser history?
“why didn’t you skim through your browser history”
Great question. Because I was on a different computer, and the one I’d been on at the time wasn’t near to hand.
You suffer from an analytical mind. “Our web search strategies” as described only apply to folk of like mind, and who have acquired some skill at formulating queries.
My bet is that most ordinary folk have a much harder time finding stuff on the web, as their search strategies are much simpler.
I’ve got a different, harder search problem: you’ve seen web pages with examples of art by an artist that you liked, then sent the link to a friend. 3 months later you decide maybe you will buy that print after all, but you’ve lost the name. It’s impossible (or very very difficult) to find an artist using verbal descriptions of the art; and as far as I know there’s nothing that can help me get an image built that’s “close enough to” the artist’s image to find what I’m looking for. I wonder if people have strategies for that? If it were in a magazine the brute force strategy is to look through all the magazines in the house, but for the browser if you don’t keep a lot of browser history, there is no brute force method that I can tell.
“My bet is that most ordinary folk have a much harder time finding stuff on the web”
Yes. But maybe if we knew more about how good searchers actually search, we could make search software smarter for everyone.
For example, I was scanning for names. A view that extracted names from result sets would have been helpful. Of course it’s not obvious just from the query transcript that I was scanning for names. But you might be able to infer that from the clickstream.
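A names view could start from something as crude as the sketch below. It assumes the result snippets are already available as plain strings (the sample snippets here are invented), and it uses a simple capitalized-word heuristic rather than real named-entity recognition, so it will also pick up organizations and capitalized phrases at the start of a sentence.

```python
import re
from collections import Counter

# Two or more consecutive capitalized words, e.g. "Mary Czerwinski".
# This is a heuristic, not a real name recognizer.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)+\b")


def candidate_names(snippets):
    """Return (name, count) pairs found across snippets, most frequent first."""
    counts = Counter()
    for snippet in snippets:
        counts.update(NAME_PATTERN.findall(snippet))
    return counts.most_common()


if __name__ == "__main__":
    # Invented snippets standing in for a real result set.
    results = [
        "Eric Horvitz and Mary Czerwinski study interruption.",
        "A workshop led by Mary Czerwinski.",
    ]
    for name, count in candidate_names(results):
        print(count, name)
```

Ranking by frequency is a design guess: a name that recurs across several results is more likely to be the one that rings a bell.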
“It’s impossible (or very very difficult) to find an artist using verbal descriptions of the art”
Unless you had tagged the pages with the art you liked. That helps a couple of ways. You might just remember the tags you used. But if not, you can scan or search your tagspace to find them.
I hadn’t thought of this kind of annotation as a search strategy, but arguably it is.
good work describing so clearly and in such a readable fashion something I go through very often.
“A successful search strategy involves a bunch of those kinds of calculations. How could we surface them from unconsciousness, study them, and optimize them?”
I think this would be a great research project for the hive mind to tackle. I imagine a del.icio.us for searches. A user could tag the search results, annotating the search as you have done. The idea is to generate a large number of transparent searches by a large number of searchers.
Furthermore, the open research project should categorize the annotations, generating a searchable database. Those categories would come after a bunch of published search trails. We could infer them from the data made available from those initial transparent searches.
Huh – 2 hours after I posted I found another “search strategy” – look in your Google search history. Mine contains more than my browser history, which is cleared quite often – but I don’t know how to clear my Google search history.
The “strategy” – scroll down through search history, choosing things that “sound likely”, then check the search results.
Tagging does sound good – but I haven’t yet figured out where to go to get this tagging technology of which you speak.
As for the ‘del.icio.us’ for searches idea: maybe a genetic algorithm which tries to find the ‘genes’ of successful searches?
This is brilliant and instructive, to reflect on your actual search process. Historian William Turkel recently made some observations that seem complementary to yours, thinking about the process of inference that a human intelligence would go through if presented with no more evidence than the search strings people offer to the “database of intentions.” http://digitalhistoryhacks.blogspot.com/2007/01/keywords-and-clues.html
Not sure if this is the right channel, but I’ve found no other way, so Jon, here is a plea that I hope you can really help out with, as you’re inside MS.
You must’ve heard about the plea from Gorbachev on behalf of a Russian school principal who got arrested because he unknowingly bought a pirated copy of Windows. I know that MS is very sensitive on this issue, but I don’t agree that that man should be sent to jail because of this. First, there’s no way he or the school can ever afford to buy a copy of Windows (and now Vista is insanely expensive), and second, I truly believe that software for education should be economically accessible by local standards. However, the horrible thing is that MS PR has already issued a cold-blooded reply, saying that MS would not intervene because it respects the Russian judicial system. Never mind how “respectable” the Russian judicial system is; this can really undo the positive image that MS has built up and, in my humble opinion, can be averted.
So please, Jon, convince the PR boneheads to stop and help the school principal and the students who cannot really afford the MS software.
Here is another link: http://www.eweek.com/article2/0,1759,2091412,00.asp
I just tried ChaCha (a site where remote workers search for you). This is a hard search, like yours, though I may have less information, and know less context. I saw an article about three high school students whose teacher got them access to professional level astronomy data (which is now possible for anyone). The students were able to analyse the data and come up with findings that are publishable. Obviously interesting from the point of view of “future of science” as more data comes online. This is harder because it may have been reported only locally, or in just one small online magazine; and none of the people will have any links to others, or much of a publicly recorded profile.
This problem is way too hard for ChaCha: this is interesting — ChaCha is a useful service for many people who have simpler queries. The people who are working on ChaCha have much more search skill than a large number of people.
This just shows how deep and wide “search” really is.
So – in reference to your reply in comment 6: how exactly do I tag pages I’ve visited? Some intermediary site or plug-in? But if I were going to tag it, then I might have bookmarked it as well. Tagging and bookmarking are “sort of” a search strategy – they’re more of an indexing strategy, to me. Searching gets easier if items are indexed, but only if the indexing scheme aligns with my search strategy. It will do me no good if I’m searching for “Pointillism” and the indexer tagged the item with “Lots O’ Dots”.
“But if I were going to tag it, then I might have bookmarked it as well.”
True. In my case I bookmark, and tag, using del.icio.us.
BTW, the Google search history only records the sites you actually clicked on while logged into Google.
So unless he went to the site by first doing a search and then clicking on a result, it wouldn’t be in Google.
However, I have been saying for YEARS that this is a killer product. One that just keeps track of all the URLs you’ve visited and searches thru them all. So you can say “search all pages I’ve VISITED for “.
Like all killer products, the interface is simple, the execution would be….challenging.
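To show how simple the interface could be, here is a toy sketch of the idea in Python. Everything in it is hypothetical: the `VisitedPages` class, its method names, and the example URLs are invented for illustration, and it skips the challenging parts (capturing page text from the browser, storage, ranking, privacy).

```python
import re
from collections import defaultdict


class VisitedPages:
    """Toy in-memory index of pages you have visited (hypothetical design)."""

    def __init__(self):
        self._index = defaultdict(set)  # word -> set of URLs containing it

    @staticmethod
    def _words(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def record(self, url, text):
        """Index the text of a page at the moment it is visited."""
        for word in self._words(text):
            self._index[word].add(url)

    def search(self, query):
        """Return visited URLs whose text contains every word in the query."""
        words = self._words(query)
        if not words:
            return []
        hits = set.intersection(*(self._index.get(w, set()) for w in words))
        return sorted(hits)


if __name__ == "__main__":
    pages = VisitedPages()
    # Placeholder URLs and text, not real pages.
    pages.record("http://example.com/infomania", "Workshop on infomania and interruption")
    pages.record("http://example.com/art", "Prints by a pointillist painter")
    print(pages.search("infomania workshop"))
```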
Keep on going and the chances are you will stumble on something, perhaps when you are least expecting it. I have never heard of anyone stumbling on something sitting down.
To do anything truly worth doing, I must not stand back shivering and thinking of the cold and danger, but jump in with gusto and scramble through as well as I can.