Components, pipes, and effective search

Sometime in the latter 1990s I was looking for a passage in a book that I owned. It was a revelation to discover that I could find the passage online more easily than I could by first locating my copy of the book, then scanning it and using its index. I’ve since re-enacted that scenario many times, most recently the other day when I was looking for the Diane Deutsch quotation about perfect pitch that appears in Oliver Sacks’ Musicophilia. In this case I had a library copy of the book on my desk. The text I was looking for is on page 125 of the library’s edition. As I like to do now and then, I made notes on the search strategy that got me to that page.

What I remembered of the passage was the analogy between pitch discrimination and color discrimination, so I began by searching the book using, somewhat arbitrarily, Google Books. My search term was simply color. The outcome of this naive attempt was both lucky and unlucky. Luckily it produced the most memorable part of the passage:

Suppose you showed someone a red object and asked him to name the color … Then you juxtaposed a blue object and named its color, and he responded, “OK,

Unluckily there was no preview available for the page. And the page number was given as 134, which didn’t match the library edition I had on my desk. So I switched to Amazon. But the trip through Google Books was not useless. I came away with a much more discriminating phrase with which to search Amazon: red object.

Armed with that phrase, I found the page on Amazon right away, and the preview was available. But it wasn’t fully available: it ended in the middle of the passage I wanted. And again the page was given as 134, which differed from my edition.

Now, though, I had a partial page preview that showed me the layout of the page I was looking for. It was distinguished by a large indented block quote. I also had a rough idea of where to look in the book: somewhere near page 134. Armed with these inputs I was able to scan the library book and zero in on page 125.

We don’t often enough name or describe the knowledge, the skills, and the techniques that enable successful search. To the extent that we do, we tend to suggest that there’s a best search engine, or a best search strategy, but the real story is subtler. Often, as in this case, the theme of the story is a pipeline of components. Here’s an illustration of the pipeline: Google Books (search term: color) → Amazon (search term: red object) → the library book (scan near page 134 for a large block quote) → page 125.

The mental model that drives this pipeline includes these assumptions:

  • There are multiple components. In this case: Google Books, Amazon, and the library book.

  • The components are differently searchable. Google Books and Amazon provide full-text search; the book’s affordances are page-scanning and an index.

  • Search results are differently viewable. Google Books and Amazon may or may not provide previews; the book in hand is fully viewable.

  • The searchable components yield varying results depending on both input terms and available previews.

  • It’s possible, maybe likely, that no single component will lead to the desired result.

  • A partial result from one search component can be piped into another search component.
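The assumptions above can be sketched as a toy pipeline in Python. Each component takes what is known so far, adds to it, and hands it on. Every component here is a hypothetical stand-in that replays the outcomes of the story (no real Google Books or Amazon API is called); the `Clue` record, the component functions, the toy `LIBRARY_BOOK` model, and the outward page scan are illustrative assumptions, not a description of any real system.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Clue:
    """Partial knowledge piped from one search component to the next."""
    query: str
    phrase: Optional[str] = None       # a more discriminating search phrase
    page_hint: Optional[int] = None    # page number reported for some other edition
    layout_hint: Optional[str] = None  # what the target page looks like
    found_page: Optional[int] = None   # set once a component pins down the page
    trail: list[str] = field(default_factory=list)


Component = Callable[[Clue], Clue]


def google_books(clue: Clue) -> Clue:
    # Hypothetical stand-in: full-text search yields a snippet containing
    # "red object" and a page number, but no page preview.
    clue.phrase = "red object"
    clue.page_hint = 134
    clue.trail.append("google_books: snippet, page 134, no preview")
    return clue


def amazon(clue: Clue) -> Clue:
    # Hypothetical stand-in: the sharper phrase yields a partial preview
    # that reveals the layout of the target page.
    if clue.phrase:
        clue.layout_hint = "block quote"
        clue.trail.append("amazon: partial preview, block quote layout")
    return clue


# Toy model of the physical book: page number -> dominant layout feature.
LIBRARY_BOOK = {p: "plain" for p in range(100, 160)}
LIBRARY_BOOK[125] = "block quote"


def library_book(clue: Clue, max_distance: int = 20) -> Clue:
    # Page-scan outward from the hint, looking for the remembered layout.
    if clue.page_hint is None or clue.layout_hint is None:
        return clue
    for d in range(max_distance + 1):
        for page in (clue.page_hint - d, clue.page_hint + d):
            if LIBRARY_BOOK.get(page) == clue.layout_hint:
                clue.found_page = page
                clue.trail.append(f"library_book: found on page {page}")
                return clue
    return clue


def run_pipeline(clue: Clue, components: list[Component]) -> Clue:
    """Pipe each component's partial result into the next; stop once found."""
    for component in components:
        clue = component(clue)
        if clue.found_page is not None:
            break
    return clue


result = run_pipeline(Clue(query="color"), [google_books, amazon, library_book])
print(result.found_page)  # prints 125
```

Dropping `amazon` from the list leaves `layout_hint` unset, so `library_book` returns without a page. That mirrors the point that no single component may get there alone, while a partial result piped onward can.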

I use the same approach when I search the web using Google and Bing in parallel. We have a cornucopia of tools at our disposal. We don’t expect to use the same screwdriver for every task; tools vary in their affordances and uses; we keep an evolving collection in our kits and combine them in novel ways to meet new challenges. To speak of a best search engine is as meaningless as to speak of a best screwdriver. When we teach “computer literacy,” we need to develop the intuition that there’s no best information tool, but that there is a best model for using these tools.
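A rough sketch of the parallel habit, in Python: the same query goes to two engines at once, and the result lists are compared to see what each engine found that the other missed. `engine_a` and `engine_b` are hypothetical stand-ins with canned results (no real Google or Bing API is used), and comparing unique hits is one assumed way to see how engines complement each other, not the author's own procedure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Engine = Callable[[str], list[str]]


def engine_a(query: str) -> list[str]:
    # Hypothetical stand-in for one web search engine; returns ranked URLs.
    return ["example.org/a", "example.org/shared", "example.org/b"]


def engine_b(query: str) -> list[str]:
    # Hypothetical stand-in for a second engine with a different index.
    return ["example.org/shared", "example.org/c"]


def search_in_parallel(query: str, engines: dict[str, Engine]) -> dict[str, list[str]]:
    """Send the same query to every engine concurrently and collect each result list."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in engines.items()}
        return {name: f.result() for name, f in futures.items()}


def unique_to_each(results: dict[str, list[str]]) -> dict[str, list[str]]:
    """For each engine, keep the hits no other engine returned, in ranked order."""
    out = {}
    for name, hits in results.items():
        others = {h for other, hs in results.items() if other != name for h in hs}
        out[name] = [h for h in hits if h not in others]
    return out


results = search_in_parallel("perfect pitch color analogy", {"A": engine_a, "B": engine_b})
print(unique_to_each(results))
# {'A': ['example.org/a', 'example.org/b'], 'B': ['example.org/c']}
```

Neither engine's list is "best" here; each contributes hits the other lacks, which is the reason for running them side by side.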

6 Comments

  1. Kartik Subbarao Says: in a comment on your article “Brainworms and perfect pitch”, “…What I find is that some people immediately catch on to this discussion and can follow these design principles when integrating with the system. Whereas other people (again, even with ample subject-matter knowledge) don’t, and it’s a herculean task to communicate these things to them.”
    Perhaps we are dealing with different points of view, different framings of the issue, and different orienting information. I often use the metaphor that I am speaking a language you do not understand.
    Your explanation of the search was not repeatable for me because I do not have Oliver Sacks’ book in my Google library. When I searched for “musicophilia oliver sacks color”, I got a reference to page 182 as the first of 180 results.
    Further, I would suggest that the real issue behind your search is not addressed. You bought the book but not the contents. You bought the ink and paper, not the words. Why are you denied the content? This discussion is certainly more relevant and important to me than the mechanical issue of the search process (which is primarily about making a reasonable search request).

  2. “search not repeatable”

    Ah, interesting. In fact, I don’t even have a Google library, but I failed to fully elaborate what I actually did, which was:

    1. search Google Books for musicophilia

    2. search inside the book for color

    “You bought the book but not the contents.”

    Actually I borrowed the book from the library, which raises interesting questions about the usage rights that will attach to borrowed e-books when (if?) e-books become available for borrowing.

    But let’s say I owned it as an e-book. Now I can search the book directly. But that still may not be the most effective way to search it. External search engines will have different ways of searching it, and those ways will complement one another as well as the local search capability.

    1. “…failed to fully elaborate…”
      The devil is in the details.
      “…I borrowed the book from the library…”
      Even more devilish. Consider that the borrowed book is ‘owned’ by you for the period that the library has ‘loaned’ it to you. No one else has access to it at the same time. Yet you still do not have access to its contents as you should. Why do you believe that you have to own it as an e-book to be able to search its contents? Why do you believe that an external search engine has to be involved?
      You have opened a peephole into a deep and primal discussion. I have worked in this area for over 8 years. With your interest in publishing, search, and things technical, you should find it fascinating. If you are interested, I would like to expand your knowledge of this area and the state of the art offline. Just send me a separate email with contact info.

  3. Why do you believe that you have to own it as an e-book to be able to search its contents?

    I don’t. It might also be true that as a purchaser — or even a library borrower — I would possess a token that grants such access. But that isn’t the case now.

    Why do you believe that an external search engine has to be involved?

    Because we’re entering the era of cooperating services. A publisher willing to honor my token could put up its own search engine, but that isn’t its specialty and it won’t do the job well. The publisher will want to outsource that service to one (or more!) partners who will also honor my token.

  4. Hi Jon:

    Responding to your reply to “Why do you believe that you have to own it as an e-book to be able to search its contents?”, I have to agree with my good friend, Jim Yates.

    This brings us, of course, to the legal issue of fair use. Fair use is a principle that sits at the heart of copyright law. It is not an exception to copyright; it *is* part of copyright itself. So it is pretty important in the order of things, not some sort of “loophole”.

    The following is far from settled law *either* way, but it is my contention that a copyright holder would lose (and look foolish in the process) if he tried to bring a case to *make* it settled law in his favor. For in so doing, he’d in effect be bringing an action against a user who paid for a physical book and whose only motive for searching a digital version of that book was to avoid spending the extra time on an “eye-scan hunt” for the passage she wanted. That would be like telling Winston Churchill that it is not “fair” to use his 13 secretaries to hunt for passages in books in his own library, and that he can only do it the slow way, i.e. do his own eye-scan. Now that technology has made it possible for us all to be Churchills even without the funds to employ 13 secretaries, it would be absurd to say, “No, you can only search the slow way.” The mandate of copyright law itself stems from the Constitution’s charge to Congress “To promote the Progress of Science and useful Arts” and is meant to encourage scholarship, not hinder it.

    So, I conclude that you do not “have to own it as an e-book to be able to search its contents,” just as you may print out a copy of an e-book you own. They are both fair uses of what you bought, fair and square.

    BTW, in the online research process you described above, you were much more likely to have been violating Amazon’s user agreement, which tries to limit use of Amazon to people who are legitimately shopping for books (which you apparently were not), than you would be by viewing the digital contents (gained from an unrestricted source) of a book you owned.
