Controlling a browser extension with puppeteer

Web technology, at its best, has always occupied a sweet spot at the intersection of two complementary modes: interactive and programmatic. You interact with a web page by plugging its URL into a browser. Then you read what’s there or, if the page is an app, you use the app. Give that same URL, web software can read the page, analyze it, extract data from it, interact with the page, and weave its data and behavior into new apps. Search is the classic illustration of this happy synergy. A crawler reads and navigates the web using the same affordances available to people, then produces data that powers a search engine.

In the early days of the web, it was easy to automate the reading, navigation, analysis, and recombination of web pages because the pages were mostly just text and links. Such automation was (and still is) useful for analysis and recombination, as well as for end-to-end testing of web software. But things got harder as the web moved away from static server-generated pages and toward dynamic client-generated pages made with JavaScript.

All along the way, we’ve had another powerful tool in our kit. Web pages and apps, however they’re produced, can be enhanced with extra JavaScript that’s injected into them. My first experience of the power of that approach was a bookmarklet that, when activated on an Amazon page, checked your local library to see if the book was available there. Then came Greasemonkey, a framework for injecting “userscripts” into pages. It was the forerunner of browser extensions now converging on the WebExtensions API.

The Hypothesis web annotator is a good example of the kind of software that relies on the ability to inject JavaScript into web pages. It runs in two places. The annotator is injected into the web page you’re annotating; the sidebar (i.e. the app) runs in an iframe that’s also injected into the page. The annotator acquires page metadata, reacts to the selection event, forms W3C-style selectors that define the selection, and sends that information (by way of JavaScript messages) to the sidebar. The sidebar queries the server, retrieves annotations for the page into which the annotator was injected, displays the annotations, tells the annotator about navigation it should sync with, and listens for messages from the annotator announcing selections that serve as the basis of new annotations. When you create a new annotation in the sidebar, it messages the annotator so it can anchor it. That means: receive annotation data from the sidebar, and highlight the selection it refers to.

Early in my Hypothesis tenure we needed to deploy a major upgrade to the anchoring machinery. Would annotations created using the old system still anchor when using the new one? You could check interactively by loading HTML and PDF documents into two versions of the extension, then observing and counting the annotations that anchored or didn’t, but that was tedious and not easily repeatable. So I made a tool to automate that manual testing. At the time the automation technology of choice was Selenium WebDriver. I was able to use it well enough to validate the new anchoring system, which we then shipped.

Now there’s another upgrade in the works. We’re switching from a legacy version of PDF.js, the Mozilla viewer that Hypothesis uses to convert PDF files into web pages it can annotate, to the current PDF.js. Again we faced an automation challenge. This time around I used the newer automation technology that is the subject of this post. The ingredients were: headless Chrome, a browser mode that’s open to automation; puppeteer, an API for headless Chrome; and the devtools protocol, which is how the Chromium debugger, puppeteer, and other tools and APIs control the browser.

This approach was more powerful and effective than the Selenium-based one because puppeteer’s devtools-based API affords more intimate control of the browser. Anything you can do interactively in the browser’s debugger console — and that’s a lot! — you can do programmatically by way of the devtools protocol.

That second-generation test harness is now, happily, obsolete thanks to an improved (also puppeteer-based) version. Meanwhile I’ve used what I learned on a couple of other projects. What I’ll describe here, available in this GitHub repo, is a simplified version of one of them. It illustrates how to use puppeteer to load an URL into an extension, simulate user input and action (e.g., clicking a button), open a new extension-hosted tab in reaction to the click, and transmit data gathered from the original tab to the new tab. All this is wrapped in a loop that visits many URLs, gathers analysis results for each, combines the results, and saves them.

This is specialized pattern, to be sure, but I’ve found important uses for it and expect there will be more.

To motivate the example I’ve made an extension that, while admittedly concocted, is plausibly useful as a standalone tool. When active it adds a button labeled view link elements to each page you visit. When you click the button, the extension gathers all the link elements on the page, organizes them, and sends the data to a new tab where you view it. There’s a menagerie of these elements running around on the web, and it can be instructive to reveal them on any given page.

The most concocted aspect of the demo is the input box that accompanies the view link elements button. It’s not strictly required for this tool, but I’ve added it because the ability to automate user input is broadly useful and I want to show how that can work.

The extension itself, in the repo’s browser-extension folder, may serve as a useful example. While minimal, it illustrates some key idioms: injecting code into the current tab; using shadow DOM to isolate injected content from the page’s CSS; sending messages through the extension’s background script in order to create a new tab. To run it, clone the repo (or just download the extension’s handful of files into a folder) and follow these instructions from

  1. Open the Extension Management page by navigating to chrome://extensions. The Extension Management page can also be opened by clicking on the Chrome menu, hovering over More Tools then selecting Extensions.

  2. Enable Developer Mode by clicking the toggle switch next to Developer mode.

  3. Click the LOAD UNPACKED button and select the extension directory.

This works in Chromium-based browsers including Chrome, Brave, and the new version of Edge. The analogous procedure for Firefox:

Open “about:debugging” in Firefox, click “Load Temporary Add-on” and select any file in your extension’s directory.

The Chromium and Firefox extension machinery is not fully compatible. That’s why Hypothesis has yet to ship a Firefox version of our extension, and why the demo extension I’m describing here almost works in Firefox but fails for a reason I’ve yet to debug. Better convergence would of course be ideal. Meanwhile I’m confident that, with some elbow grease, extensions can work in both environments. Chromium’s expanding footprint does tend to erode the motivation to make that happen, but that’s a topic for another day. The scope of what’s possible with Chromium-based browsers and puppeteer is vast, and that’s my topic here.

Of particular interest to me is the notion of automated testing of workflow. The demo extension hints at this idea. It requires user input; the automation satisfies that requirement by providing it. In a real workflow, one that’s driven by a set of rules, this kind of automation can exercise web software in a way that complements unit testing with end-to-end testing in real scenarios. Because the software being automated and the software doing the automation share intimate access to the browser, the synergy between them is powerful in ways I’m not sure I can fully explain. But it makes my spidey sense tingle and I’ve learned to pay attention when that happens.

Here’s a screencast of the runnning automation:

The results gathered from a selection of sites are here.

Some things I noticed and learned along the way:

Languages. Although puppeteer is written in JavaScript, there’s no reason it has to be. Software written in any language could communicate with the browser using the devtools protocol. The fact that puppeteer is written in JavaScript introduces potential confusion in places like this:

const results = await viewPage.evaluate((message) => {
  // this blocks run in the browser's 2nd tab
  const data = JSON.parse(document.querySelector('pre').innerText) 

The first line of JS runs in node, the second line runs in the browser. If puppeteer were written in a different language that would be obvious; since it isn’t I remind myself with a comment.

Settings. If you need the automated instance of Chromium to remember login info or other things, you can add userdir: {PATH} to the object passed to puppeteer.launch.

Interactivity. It may seem odd that the demo is using headless Chrome but specifies headless: false in the object passed to puppeteer.launch. I do that in order to be able to watch the automation when running locally; I’d turn it off when running automation in the cloud. Although I’ve not yet explored the idea, an interactive/automated hybrid seems an intriguing foundation for workflow apps and other kinds of guided experiences.

Web components used in the ClinGen app

A few years back I watched Joe Gregorio’s 2015 OSCON talk and was deeply impressed by the core message: “Stop using JS frameworks, start writing reusable, orthogonally-composable units of HTML+CSS+JS.” Earlier this year I asked Joe to reflect on what he’s since learned about applying the newish suite of web standards known collectively as web components. He replied with a blog post that convinced me to try following in his footsteps.

I’ve been thinking about how software components can be built and wired together since the dawn of the web, when the state-of-the-art web components were Java applets and Netscape browser plugins. Since then we’ve seen newer plugin mechanisms (Flash, Silverlight) rise and fall, and the subsequent rise of JavaScript frameworks (jQuery, Angular, React) that are now the dominant way to package, reuse, and recombine pieces of web functionality.

The framework-centric approach has never seemed right to me. My touchstone example is the tag editor in the Hypothesis web annotator. When the Hypothesis project started, Angular was the then-fashionable framework. The project adopted a component, ngTagsInput, which implements the capable and attractive tag editor you see in Hypothesis today. But Angular has fallen from grace, and the Hypothesis project is migrating to React — more specifically, to the React-API-compatible preact. Unfortunately ngTagsInput can’t come along for the ride. We’ll either have to find a React-compatible replacement or build our own tag-editing component.

Why should a web component be hardwired to an ephemeral JavaScript framework? Why can’t the web platform support framework-agnostic components, as Joe Gregorio suggests?

I’ve been exploring these questions, and my provisional answers are: “It shouldn’t!” and “It can!”

The demo I’ll describe here comes from an app used by biocurators working for the ClinGen Consortium. They annotate scientific literature to provide data that will feed machine learning systems. While these biocurators can and do use off-the-shelf Hypothesis to highlight words and phrases in papers, and link them to search results in databases of phenotypes, genes, and alleles, the workflow is complex and the requirements for accurate tagging are strict. The app provides a guided workflow, and assigns tags automatically. In the latest iteration of the app, we’ve added support for a requirement to tag each piece of evidence with a category — individual, family, or group — and an instance number from one to twenty.

The demo extracts this capability from the app. It’s live here, or you can just download index.html and index.js and open index.html locally. This is plain vanilla web software: no package.json, no build script, no transpiler, no framework.

To create a widget that enables curators to declare they are annotating in the context of individual 1 or family 3 or group 11, I started with IntegerSelect, a custom element that supports markup like this:

  <select is="integer-select" type="individual" count="20"></select>  

Here I digress to clarify a point of terminology. If you look up web components on Wikipedia (or anywhere else) you will learn that the term refers to a suite of web technologies, primarily Custom Elements, Shadow DOM, and HTML Template. What I’m showing here is only Custom Elements because that’s all this project (and others I’m working on) have required.

(People sometimes ask: “Why didn’t web components take off?” Perhaps, in part, because nearly all the literature implies that you must adopt and integrate a wider variety of stuff than may be required.)

The thing being extended by IntegerSelect is a native HTML element, HTMLSelectElement. When it renders into the DOM, it will have empty select markup which the element’s code then fills. It inherits from HTMLSelectElement, so that code can use expressions like this.options[this.selectedIndex].value.

Two key patterns for custom elements are:

  1. Use attributes to pass data into elements.

  2. Use messages to send data out of elements.

You can see both patterns in IntegerSelect’s class definition:

class IntegerSelect extends HTMLSelectElement {
  constructor() {
    this.type = this.getAttribute('type')
  relaySelection() {
    const e = new CustomEvent('labeled-integer-select-event', { 
      detail: {
        type: this.type,
        value: this.options[this.selectedIndex].value
    // if in a collection, tell the collection so it can update its display
    const closestCollection = this.closest('labeled-integer-select-collection')
    if (closestCollection) {
    // tell the app so it can save/restore the collection's state
  connectedCallback() {
    const count = parseInt(this.getAttribute('count'))
    let options = ''
    const lookupInstance = parseInt(getLookupInstance(this.type))
    for (let i = 1; i < count; i++) {
      let selected = ( i == lookupInstance ) ? 'selected' : ''
      options += `<option ${selected}>${i}`
    this.innerHTML = options
    this.onclick = this.relaySelection
customElements.define('integer-select', IntegerSelect, { extends: "select" })

As you can see in the first part of the live demo, this element works as a standalone tag that presents a picklist of count choices associated with the given type. When clicked, it creates a message with a payload like this:


And it beams that message at a couple of targets. The first is an enclosing element that wraps the full hierarchy of elements that compose the widget shown in this demo. The second is the single-page app that hosts the widget. In part 1 of the demo there is no enclosing element, but the app is listening for the event and reports it to the browser’s console.

An earlier iteration relied on a set of radio buttons to capture the type (individual/family/group), and combined that with the instance number from the picklist to produce results like individual 1 and group 3. Later I realized that just clicking on an IntegerSelect picklist conveys all the necessary data. An initial click identifies the picklist’s type and its current instance number. A subsequent click during navigation of the picklist changes the instance number.

That led to a further realization. The radio buttons had been doing two things: identifying the selected type, and providing a label for the selected type. To bring back the labels I wrapped IntegerSelect in LabeledIntegerSelect whose tag looks like this:

  <labeled-integer-select type="individual" count="20"></labeled-integer-select>

Why not a div (or other) tag with is="labeled-integer-select"? Because this custom element doesn’t extend a particular kind of HTML element, like HTMLSelectElement. As such it’s defined a bit differently. IntegerSelect begins like so:

class IntegerSelect extends HTMLSelectElement

And concludes thusly:

customElements.define('integer-select', IntegerSelect, { extends: "select" })

LabeledIntegerSelect, by contrast, begins:

class LabeledIntegerSelect extends HTMLElement

And ends:

customElements.define('labeled-integer-select', LabeledIntegerSelect)

As you can see in part 2 of the demo, the LabeledIntegerSelect picklist is an IntegerSelect picklist with a label. Because it includes an IntegerSelect, it sends the same message when clicked. There is still no enclosing widget to receive that message, but because the app is listening for the message, it is again reported to the console.

Finally we arrive at LabeledIntegerSelectCollection, whose markup looks like this:

    <labeled-integer-select type="individual" count="20"></labeled-integer-select>
    <labeled-integer-select type="family" count="20"></labeled-integer-select>
    <labeled-integer-select type="group" count="20"></labeled-integer-select>

When this element is first connected to the DOM it visits the first of its LabeledIntegerSelects, sets its selected attribute true, and bolds its label. When one of its LabeledIntegerSelects is clicked, it clears that state and reestablishes it for the clicked element.

The LabeledIntegerSelectCollection could also save and restore that state when the page reloads. In the current iteration it doesn’t, though. Instead it offloads that responsibility to the app, which has logic for saving to and restoring from the browser’s localStorage.

You can see the bolding, saving, and restoring of choices in this screencast:

I’m still getting the hang of custom elements, but the more I work with them the more I like them. Joe Gregorio’s vision of “reusable, orthogonally-composable units of HTML+CSS+JS” doesn’t seem far-fetched at all. I’ll leave it to others to debate the merits of the frameworks, toolchains, and package managers that so dramatically complicate modern web development. Meanwhile I’ll do all that I can with the basic web platform, and with custom elements in particular. Larry Wall has always said of Perl, which enabled my earliest web development work, that it makes easy things easy and hard things possible. I want the web platform to earn that same distinction, and I’m more hopeful than ever that it can.

Products and Capabilities

I’ve come to depend on the outlining that’s baked into Visual Studio Code. So far as I can tell, it is unrelated to the tool’s impressive language intelligence. There’s no parsing or understanding of the text in which sections can fold and unfold, but indentation is significant and you can exploit that to make nested expand/collapse zones in any language. I use it for HTML, CSS, JavaScript, Python, Markdown, SQL, and maybe a few others I’m forgetting.

For a long time I thought of myself as someone who respected outlining tools as an important category of software but, having tried various of them going all the way back to ThinkTank, never adopted one as part of my regular work. My experience with VSCode, though, has shown me that while I never adopted an outliner product, I have most surely adopted outlining capability. I use it wherever it manifests. I make tables of contents in Google Docs, I use code folding in text editors that support it, I exploit any outlining affordance that exists in any tool I’m using. Recently I was thrilled to learn that the web platform offers such an affordance in the form of the <details> tag.

In thinking about this post, the first title that occurred to me was: Products vs. Capabilities. Were I still earning a living writing about tech I might have gone with that, or an editor might have imposed it, because conflict (in theory) sells content. But since I’m not selling content it’s my call. I choose Products and Capabilities because the two are, or anyway can (and arguably should) be complementary. For some people, an outliner is a product you use regularly because you value that way of visualizing and reorganizing stuff that you write. You might not be one of those people. I’m not. I don’t, for example, use outlining to organize my thoughts when I write prose, as I’m doing now. That just isn’t how my mind works. I’ve done enough decent writing over the years to accept that it’s OK to not explicitly outline my prose.

For the various kinds of code I spend a lot of time wrangling these days, though, outlining has become indispensable. An outlining expert might say that the capability I’m talking about isn’t real full-fledged outlining, and I’d agree. In a dedicated outliner, for example, you expect to be able to drag things up and down, and in and out, without messing them up. VSCode doesn’t have that, it’s “just” basic indentation-driven folding and unfolding. But that’s useful. And because it shows up everywhere, I’m more effective in a variety of contexts.

In web annotation, I see the same kind of product/capability synergy. Hypothesis is a product that you can adopt and use as a primary tool. You might do that if you’re the sort of person who fell in love with delicious, then moved on to Diigo or Pinboard. Or if you’re a teacher who wants to make students’ reading visible. But web annotation, like outlining, is also a capability that can manifest in — and enhance — other tools. I showed some examples in this 2017 talk; I’ve helped develop others since; I’m more excited than ever about what’s possible.

I’m sure there are other examples. What are some of them? Is there a name for this pattern?

Digital sweatshops then and now

In the early 1960s my family lived in New Delhi. In 1993, working for BYTE, I returned to India to investigate its software industry. The story I wrote, which appears below, includes this vignette:

Rolta does facilities mapping for a U.S. telephone company through its subsidiary in Huntsville, Alabama. Every night, scanned maps flow through the satellite link to Bombay. Operators running 386-based RoltaStations retrieve the maps from a Unix server, digitize them using Intergraph’s MicroStation CAD software, and relay the converted files back to Huntsville.

I didn’t describe the roomful of people sitting at those 386-based RoltaStations doing the work. It was a digital sweatshop. From the window of that air-conditioned room, though, you could watch physical laborers enacting scenes not unlike this one photographed by my dad in 1961.

I don’t have a picture of those rows of map digitizers in the city soon to be renamed Mumbai. But here’s a similar picture from a recent Times story, A.I. Is Learning From Humans. Many Humans.:

Labor-intensive data labeling powers AI. Web annotation technology is a key enabler for that labeling. I’m sure there will digital sweatshops running the software I help build. For what it’s worth, I’m doing my best to ensure there will be opportunities to elevate that work, empowering the humans in the loop to be decision-makers and exception-handlers, not just data-entry clerks.

Meanwhile, here’s that 1993 story, nearly forgotten by the web but salvaged from the NetNews group soc.culture.indian.

Small-systems thinking makes India a strategic software partner

by Jon Udell

BYTE, 09/01/1993

Recently, I saw a demonstration of a new Motif-based programmer’s tool called Sextant. It’s a source code analyzer that converts C programs into labeled graphs that you navigate interactively. The demonstration was impressive. What made it unique for me was that it took place in the offices of Softek, a software company in New Delhi, India.

It’s well known that Indian engineering talent pervades every level of the microcomputer industry. But what’s happening in India? On a recent tour of the country, I visited several high-tech companies and discovered that India is evolving rapidly from an exporter of computer engineering talent into an exporter of computer products and services. Software exports, in particular, dominate the agenda. A 1992 World Bank study of eight nations rated India as the most attractive nation for U.S., European, and Japanese companies seeking offshore software-development partners.

The World Bank study was conducted by Infotech Consulting (Parsippany, NJ). When the opportunity arose to visit India, I contacted Infotech’s president, Jack Epstein, for advice on planning my trip. He referred me to Pradeep Gupta, an entrepreneur in New Delhi who publishes two of India’s leading computer magazines, PC Quest and DataQuest. Gupta also runs market-research and conference businesses. He orchestrated a whirlwind week of visits to companies in New Delhi and Bombay and generously took time to acquaint me with the Indian high-tech scene.

A Nation of Small Systems

Even among Indians, there’s a tendency to attribute India’s emerging software prowess to the innate mathematical abilities of its people. “After all, we invented zero,” says Dewang Mehta, an internationally known computer graphics expert. He is also executive director of the National Association of Software and Service Companies (NASSCOM) in New Delhi. While this cultural stereotype may hold more than a grain of truth, it’s not the whole story. As NASSCOM’s 1992 report on the state of the Indian software industry notes, India has the world’s second-largest English-speaking technical work force. Consequently, Indian programmers are in tune with the international language of computing, as well as the language spoken in the U.S., the world’s largest market.

Furthermore, India’s data-communications infrastructure is rapidly modernizing. And the Indian government has begun an aggressive program of cutting taxes and lifting import restrictions for export-oriented Indian software businesses while simultaneously clearing the way for foreign companies to set up operations in India.

Other countries share many of these advantages, but India holds an ace. It is a nation of small systems. For U.S. and European companies that are right-sizing mainframe- and minicomputer-based information systems, the switch to PC-based client/server alternatives can be wrenching. Dumping the conceptual baggage of legacy systems isn’t a problem for India, however, because, in general, those systems simply don’t exist. “India’s mainframe era never happened,” says Gupta.

When Europe, Japan, and the U.S. were buying mainframes left and right, few Indian companies could afford their high prices, which were made even more costly by 150 percent import duties. Also, a government policy limiting foreign investors to a 40 percent equity stake in Indian manufacturing operations drove companies like IBM away, and the Indian mainframe industry never got off the ground.

What did develop was an indigenous microcomputer industry. In the early 1980s, Indian companies began to import components and to assemble and sell PC clones that ran DOS. This trend quickened in 1984, when the late Rajiv Gandhi, prime minister and an avid computer enthusiast, lifted licensing restrictions that had prevented clone makers from selling at full capacity.

In the latter half of the 1980s, a computerization initiative in the banking industry shifted the focus to Unix. Front offices would run DOS applications, but behind the scenes, a new breed of Indian-made PCs — Motorola- and Intel-based machines running Unix — would handle the processing chores. Unfortunately, that effort stalled when the banks ran afoul of the unions; even today, many of the Bank of India’s 50,000 branches aren’t linked electronically.

Nevertheless, the die was cast, and India entered the 1990s in possession of a special advantage. Indian programmers are not only well educated and English-speaking, but out of necessity they’re keenly focused on client/server or multiuser solutions for PCs running DOS (with NetWare) or Unix — just the kinds of solutions that U.S. and European companies are rushing to embrace. India finds itself uniquely positioned to help foreign partners right-size legacy applications.

The small-systems mind-set also guides India’s fledgling supercomputer industry. Denied permission by the U.S. government to import a Cray supercomputer, the Indian government’s Center for the Development of Advanced Computers built its own — very different — sort of supercomputer. Called PARAM, it gangs Inmos T800 transputers in parallel and can also harness Intel 860 processors for vector work. Related developments include a transputer-based neural-network engine intended to run process-control applications. The designers of this system impressed me with their clear grasp of the way in which inexpensive transputers can yield superior performance, scalability, modularity, and fault tolerance.

Software Products and Services

Many of the companies I visited produce comparable offerings for LAN or Unix environments. In the realm of packaged software, Oberoi Software in New Delhi sells a high-end hotel management application using Sybase 4.2 that runs on Hewlett-Packard, DEC, and Sun workstations. A low-end version uses Btrieve for DOS LANs. Softek offers 1-2-3, dBase, and WordStar work-alikes for DOS and Unix.

Shrink-wrapped products, however, aren’t India’s strong suit at the moment. PCs remain scarce and expensive commodities. According to DataQuest, fewer than 500,000 PCs can be found in this nation of 875 million people. To a U.S. software engineer, a $3000 PC might represent a month’s wages. An equivalently prosperous Indian professional would have to work a full year to pay for the same system. To put this in perspective, the average per capita wage in India is about $320, and the government caps the monthly salary of Indian corporate executives at around $1600 per month.

Software piracy is another vexing problem. “The competition for a 5000-rupee [approximately $165] Indian spreadsheet isn’t a 15,000-rupee imported copy of Lotus 1-2-3,” says Gupta, “but rather a zero-rupee pirated copy of Lotus 1-2-3.”

Painfully aware of the effect piracy has on the country’s international reputation as a software power, government and industry leaders have joined forces to combat it. The Department of Electronics (DoE), for example, has funded an anti-piracy campaign, and Lotus has a $69 amnesty program that enables users of illegal copies of 1-2-3 to come clean.

Reengineering Is a National Strength

The real action in Indian software isn’t in products. It’s in reengineering services. A typical project, for example, might involve re-creating an IBM AS/400-type application for a LAN or Unix environment. A few years ago, Indian programmers almost invariably would perform such work on location in the U.S. or Europe, a practice called “body shopping.” This was convenient for clients, but it wasn’t very beneficial to India because the tools and the knowledge spun off from reengineering projects tended to stay overseas.

More recently, the trend is to carry out such projects on Indian soil. Softek, for example, used a contract to build a law-office automation system for a Canadian firm as an opportunity to weld a number of its own products into a powerful, general-purpose client/server development toolkit. Softek engineers showed me how that toolkit supports single-source development of GUI software for DOS or Unix (in character mode) as well as Windows. They explained that client programs can connect to Softek’s own RDBMS (relational DBMS) or to servers from Gupta, Ingres, Oracle, or Sybase. That’s an impressive achievement matched by few companies anywhere in the world, and it’s one that should greatly enhance Softek’s appeal to foreign clients.

While reengineering often means right-sizing, that’s not always the case. For example, the National Indian Institution for Training, a New Delhi-based computer-training institute rapidly expanding into the realm of software products and services, has rewritten a well-known U.S. commercial word processor. Rigorous development techniques are the watchword at NIIT. “We have a passion for methodology,” says managing director Rajendra S. Pawar, whose company also distributes Excelerator, Intersolv’s CASE tool.

Other projects under way at NIIT include an X Window System interface builder, Mac and DOS tutorials to accompany the Streeter series of math textbooks (for McGraw-Hill), a simple but effective multimedia authoring tool called Imaginet, a word processor for special-needs users that exploits an NIIT-designed motion- and sound-sensitive input device, and an instructional video system.

Although services outweigh products for now, and the Indian trade press has complained that no indigenous software product has yet made a splash on the world scene, the situation could well change. Indian programmers are talented, and they’re up-to-date with database, GUI, network, and object-oriented technologies. These skills, along with wages 10 or more times less than U.S. programmers, make Indian programming a force to be reckoned with. Software development is a failure-prone endeavor; many products never see the light of day. But, as Tata Unisys (Bombay) assistant vice president Vijay Srirangan points out, “The cost of experimentation in India is low.” Of the many software experiments under way in India today, some will surely bear fruit.

A major obstacle blocking the path to commercial success is the lack of international marketing, but some help has been forthcoming. Under contract to the U.K.’s Developing Countries Trade Agency, the marketing firm Schofield Maguire (Cambridge, U.K.) is working to bring selected Indian software companies to the attention of European partners. “India does have a technological lead over other developing countries,” says managing partner Alison Maguire. “But to really capitalize on its software expertise, it must project a better image.”

Some companies have heard the message. For example, Ajay Madhok, a principal with AmSoft Systems (New Delhi), parlayed his firm’s expertise with computer graphics and digital video into a high-profile assignment at the 1992 Olympics. On a recent U.S. tour, he visited the National Association of Broadcasters show in Las Vegas. Then he flew to Atlanta for Comdex. While there, he bid for a video production job at the 1996 Olympics.

Incentives for Exporters

According to NASSCOM, in 1987, more than 90 percent of the Indian software industry’s $52 million in earnings came from “on-site services” (or body shopping). By 1991, on-site services accounted for a thinner 61 percent slice of a fatter $179 million pie. Reengineering services (and, to a lesser extent, packaged products) fueled this growth, with help from Indian and U.S. government policies.

On the U.S. side, visa restrictions have made it harder to import Indian software labor. India, meanwhile, has developed a range of incentives to stimulate the software and electronics industries. Government-sponsored technology parks in Noida (near New Delhi), Pune (near Bombay), Bangalore, Hyderabad, and several other locations support export-oriented software development. Companies that locate in these parks share common computing and telecommunications facilities (including leased-line access to satellite links), and they can import duty-free the equipment they need for software development.

The Indian government has established export processing zones in which foreign companies can set up subsidiaries that enjoy similar advantages and receive a five-year tax exemption. Outside these protected areas, companies can get comparable tax and licensing benefits by declaring themselves fully export-oriented.

Finally, the government is working to establish a number of hardware technology parks to complement the initiative in software. “We want to create many Hong Kongs and Singapores,” says N. Vittal, Secretary of the DoE and a tireless reformer of bureaucracy, alluding to the economic powerhouses of the Pacific Rim.

The Indian high-tech entrepreneurs I met all agreed that Vittal’s tenacious slashing of government red tape has blazed the trail they now follow. How serious is the problem of government red tape? When the government recently approved a joint-venture license application in four days, the action made headlines in both the general and trade press. Such matters more typically take months to grind their way through the Indian bureaucracy.

The evolution of India’s telecommunications infrastructure shows that progress has been dramatic, though uneven. In a country where only 5 percent of the homes have telephone service, high-tech companies increasingly rely on leased lines, packet-switched data networks, and satellite links. The DoE works with the Department of Telecommunication (DoT) to ensure that software export businesses get priority access to high-bandwidth services.

But the slow pace of progress at the DoT remains a major frustration. For example, faxing can be problematic in India, because the DoT expects you to apply for permission to transmit data. And despite widespread Unix literacy, only a few of the dozens of business cards I received during my tour carried Internet addresses. Why? DoT regulations have retarded what would have been the natural evolution of Unix networking in India. I did send mail home using ERNET, the educational resource network headquartered in the DoE building in New Delhi that links universities throughout the country. Unfortunately, ERNET isn’t available to India’s high-tech businesses.

Vittal recognizes the critical need to modernize India’s telecommunications. Given the scarcity of an existing telecommunications infrastructure, he boldly suggests that for many scattered population centers, the solution may be to completely pass over long-haul copper and vault directly into the satellite era. In the meantime, India remains in this area, as in so many others, a land of extreme contrasts. While most people lack basic telephone service, workers in strategic high-tech industries now take global voice and data services for granted.

Powerful Partners

When Kamal K. Singh, chairman and managing director of Rolta India, picks up his phone, Rolta’s U.S. partner, Intergraph, is just three digits away. A 64-Kbps leased line carries voice and data traffic from Rolta’s offices, located in the Santacruz Electronics Export Processing Zone (SEEPZ) near Bombay, to an earth station in the city’s center. Thence, such traffic travels via satellite and T1 lines in the U.S. to Intergraph’s offices in Huntsville, Alabama.

Rolta builds Intel- and RISC-based Intergraph workstations for sale in India; I saw employees doing everything from surface-mount to over-the-network software installation. At the same time, Rolta does facilities mapping for a U.S. telephone company through its subsidiary in Huntsville. Every night, scanned maps flow through the satellite link to Bombay. Operators running 386-based RoltaStations retrieve the maps from a Unix server, digitize them using Intergraph’s MicroStation CAD software, and relay the converted files back to Huntsville.

Many Indian companies have partnerships with U.S. firms. India’s top computer company, HCL, joined forces with Hewlett-Packard to form HCL-HP. HCL’s roots were in multiprocessor Unix. “Our fine-grained multiprocessing implementation of Unix System V has been used since 1988 by companies such as Pyramid and NCR,” says director Arjun Malhotra.

HCL’s joint venture enables it to build and sell HP workstations and PCs in India. “People appreciate HP quality,” says marketing chief Ajai Chowdhry. But since Vectra PCs are premium products in the price-sensitive Indian market, HCL-HP also plans to leverage its newly acquired HP design and manufacturing technology to build indigenous PCs that deliver “good value for money,” according to Malhotra.

Pertech Computers, a system maker in New Delhi, recently struck a $50 million deal to supply Dell Computer with 240,000 motherboards. Currently, trade regulations generally prohibit the import of certain items, such as finished PCs. However, exporters can use up to 25 percent of the foreign exchange they earn to import and sell such items. Pertech director Bikram Dasgupta plans to use his “forex” money to buy Dell systems for resale in India and to buy surface-mount equipment so that the company can build work-alikes.

IBM returned to India last year, after leaving in 1978, to join forces with the Tatas, a family of Indian industrialists. The joint venture, Tata Information Systems, will manufacture PS/2 and PS/VP systems and develop software exports.

Citicorp Overseas Software, a Citicorp subsidiary, typifies a growing trend to locate software-development units in India. “Our charter is first and foremost to meet Citicorp’s internal requirements,” says CEO S. Viswanathan, “but we are a profit center and can market our services and products.” On a tour of its SEEPZ facility in Bombay, I saw MVS, Unix, VMS, and Windows programmers at work on a variety of projects. In addition to reengineering work for Citicorp and other clients, the company markets banking products called Finware and MicroBanker.

ITC (Bangalore) supplements its Oracle, Ingres, and AS/400 consulting work by selling the full range of Lotus products. “Because we have the rights to manufacture Lotus software locally,” says vice president Shyamal Desai, “1-2-3 release 2.4 was available here within a week of its U.S. release.” Other distributors of foreign software include Onward Computer Technologies in Bombay (NetWare) and Bombay-based Mastek (Ingres).

India’s ambitious goal is to quadruple software exports from $225 million in 1992 to $1 billion in 1996. To achieve that, everything will have to fall into place. It would be a just reward. India gave much to the international microcomputer industry in the 1980s. In the 1990s, the industry just might return the favor.

“It’s not your fault, mom.”

I just found this never-published 2007 indictment of web commerce, and realized that if mom were still here 12 years later I could probably write the same thing today. There hasn’t been much much progress on smoother interaction for the impaired, and I don’t see modern web software development on track to get us there. Maybe it will require a (hopefully non-surgical) high-bandwidth brain/computer interface. Maybe Doc Searls’ universal shopping cart. Maybe both.

November 3, 2007

Tonight, while visiting my parents, I spent an hour helping my mom buy tickets to two Metropolitan Opera simulcasts. It went horribly wrong in six different ways. Here is the tale of woe.

A friend had directed her to Fandango with instructions to “Type ‘Metropolitan Opera’ in the upper-right search box.” Already my mom’s in trouble. Which upper-right search box? There’s one in the upper-right corner of the browser, and another in the upper-right corner of the Fandango page. She doesn’t distinguish between the two.

I steer her to the Fandango search box, she tries to type in ‘Metropolitan Opera’, and fails several times. Why? She’s unclear about clicking to set focus on the search box. And when she does finally aim for it, she misses. At age 86, arthritis and macular degeneration conspire against her.

I help her to focus on the search box, she types in ‘Metropolitan Opera’, and lands on a search results page. The title of the first show she wants to buy is luckily on the first page of a three-page result set, but it’s below the fold. She needs to scroll down to find it, but isn’t sure how.

I steer her to the show she wants, she clicks the title, and lands on another page where the interaction is again below the fold. Now she realizes she’s got to navigate within every page to the active region. But weak vision and poor manual dexterity make that a challenge.

We reach the page for the April 26th show. Except, not quite. Before she can choose a time for the show — unnecessarily, since there’s only one time — she has to reselect the date by clicking a link labeled ‘Saturday April 26th’. Unfortunately the link text’s color varies so subtly from the regular text’s color that she can’t see the difference, and doesn’t realize it is a link.

I steer her to the link, and she clicks to reveal the show time: 1:30. Again it’s unclear to her that this is a link she must follow.

I steer her to the 1:30 link, and finally we reach the purchase page. Turns out she already has an account with Fandango, which my sister must have helped her create some time ago. So mom just needs to sign in and…

You can see this coming from a mile away. She’s forgotten which of her usual passwords she used at this site. After a couple of failures, I steer her to the ‘Forgot password’ link, and we do the email-checking dance.

The email comes through, we recover the password, and proceed to checkout. The site remembers her billing address and credit card info. I’m seeing daylight at the end of the tunnel.

Except it’s not daylight, it’s an oncoming train.

Recall that we’re trying to buy tickets for two different shows. So now it’s time to go back and add the second one to the shopping cart. Unfortunately Fandango doesn’t seem to have a shopping cart. Each time through you can only buy one set of tickets.

I steer her back to the partially-completed first transaction. We buy those tickets and print the confirmation page.

Then we enter the purchase loop for the second time and…this one is not like the other one. This time through, it asks for the card’s 3-digit security code. She enters it correctly, but the transaction fails because something has triggered the credit card company’s fraud detector. Probably the two separate-but-identical charges in rapid succession.

We call the credit card company, its automated system wants her to speak or enter her social security number. She tries speaking the number, the speech recognizer gets it wrong, but mom can’t hear what it says back to her. In addition to macular degeneration and arthritis, she’s lost a lot of her hearing in the last few years. So she tries typing the social security number on the phone’s keypad, and fails.

I grab the phone and repeat ‘agent’ and ‘operator’ until somebody shows up on the line. He does the authentication dance with her (maiden name, husband’s date of birth, etc.), then I get back on the line and explain the problem. The following dialogue ensues:

Agent: “Whoa, this is going to be hard, we’re having a problem and I can’t access her account right now. Do you want to try later?”

Me: “Well, I’m here with my 86-year-old mom, and we’ve invested a whole lot of effort in getting to this point, is there any way to hang onto the context?”

He sympathizes, and connects me to another, more powerful agent. She overrides the refusal, authorizes the $63, and invites me to try again.

Now the site reports a different error: a mismatch in either the billing zip code, or the credit card’s 3-digit security code. To Fandango it looks like the transaction failed. To the credit card company it looks like it succeeded. What now?

The agent is pretty sure that if the transaction failed from Fandango’s perspective, it’ll come out in the wash. But we’re both puzzled. I’m certain the security code is correct. And all other account info must be correct, right? How else how could we have successfully bought the first set of tickets?

But just to double-check, I visit mom’s account page on Fandango. The billing address zip code is indeed correct. So is everything else, except…wait…the credit card’s expiration date doesn’t match. The account page says 11/2007, the physical card says 11/2010.

Turns out the credit card company recently refreshed mom’s card. This is weird and disturbing because, if the successful transaction and the failed transaction were both using the same wrong expiration date, they both should have failed.

Sigh. I change the expiration date, and finally we buy the second set of tickets. It’s taken an hour. My mom observes that if Fandango accepted orders over the phone, we’d have been done about 55 minutes ago. And she asks:

“How could I possibly have done that on my own?”

I know all the right answers. Better web interaction design. Better assistive technologies for the vision-, hearing-, and dexterity-impaired. Better service integration between merchants and credit card companies. Better management of digital identity. Someday it’ll all come together in a way that would enable my mom to do this for herself. But today isn’t that day.

Her only rational strategy was to do just what she did, namely recruit me. For which she apologizes.

All I can say, in the end, is: “Mom, it’s not your fault.”

Walking the Kortum Trail

On my first trip to Sonoma County I flew to SFO and drove north through San Francisco, across the Golden Gate Bridge, and through Marin County on the 101 freeway.

The next time I drove highway 1 up the coast and was stunned by the contrast. The Marin headlands are twisty and challenging, but then the road opens out to a series of ocean and grassland vistas inhabited mainly, it seems, by a few lucky cattle. How, I wondered, could such pristine coastline exist so near the megalopolis?

One especially magical stretch runs north from Bodega Bay, on the Marin/Sonoma county line, to Jenner at the mouth of the Russian River. If you park at Wrights Beach you can walk the last four miles to the river, along dramatic coastal bluffs, on a trail named for Bill and Lucy Kortum. When Bill died in 2015 the Press Democrat described him as a man “who spent most of his life fighting to rein in urban sprawl and protect public access to the coast” and “saved the landscape of Sonoma County.”

Recently, for the first time, I hiked the Kortum Trail end-to-end both ways. Here’s the view from Peaked Hill:

For that view, and for so much else that we love about Sonoma County, we owe thanks to Bill Kortum. It began in the early 1960s when his older brother Karl led the fight to stop the construction of a nuclear power plant on Bodega Head. Although I’ve changed my mind about the need for nuclear power, siting one on the San Andreas fault was a crazy plan. Thankfully that never happened.

We’re frequent visitors to Bodega Head, and we’d heard about the “Hole in the Head” — the excavated pit left behind when PG&E abandoned the project — but until recently we weren’t sure quite where it was. Turns out we’d driven right past it every time we made the hairpin turn on Westshore Road at Campbell Cove. What would have been a nuclear plant is now a tranquil pond enjoyed by waterfowl.

Many of the things we love about Sonoma County can be traced to the activism that began there, and has continued for fifty years, led or influenced by Bill Kortum.

The planned community at Sea Ranch would have blocked public access to 13 miles of coastline. That didn’t happen. Now there are six access trails.

The entire California coastline would look very different were it not for the oversight of the California Coastal Commission, established by voters in 1972 partly in response to the Sea Ranch controversy.

The Sonoma County landscape would look very different were it not for urban growth boundaries that restrict sprawl and encourage densification.

The shameful six-decade-long lack of a passenger rail system in the North Bay would have continued.

The 1200-mile California Coastal Trail would not be more than halfway complete.

I was a bit surprised to find no Wikipedia page for Bill Kortum, so I’ve created a stub, about which Wikipedia says: “Review waiting, please be patient. This may take 8 weeks or more.”

As a relative newcomer to Sonoma County I feel ill-equipped to write more of this history. Perhaps others who know more will add to that stub. Here are some of the best sources I’ve found:

– John Crevelli’s Fifty Year History of Environmental Activism in Sonoma County.

– Gaye LeBaron’s 2006 interview with Bill Kortum.

– The Press Democrat’s obituary.

You were nearing the end of your life when we arrived here, Bill. I never had the chance to meet you, or to thank you for having the vision to preserve so much that is special about this place, along with the will and political savvy to make it happen. We are deeply grateful.

Why Maine embraces immigration

An excellent choice for 4th-of-July podcast listening would be Maine, Africa on Radio Open Source. The intro:

We went to Portland, Maine, this week to meet newcomers from Central Africa, Angolans and Congolese asking for U.S. asylum. Fox News hit the panic button two weeks ago: their line was that Maine is being overrun, inundated by African migrants. On a long day in Portland, however, we found nobody sounding scared.

I’ve noticed the strong African influence in Portland when visiting, and wondered: “Why is that?” The answer is that Maine’s population is stable but aging. Immigrants are needed, and the state welcomes them.

A couple of moments in this podcast stopped me in my tracks.

With mayor Ethan Strimling:

“You know what, we don’t consider you somebody else’s problem, we consider you our opportunity.”

And with a woman interviewed on the street:

Chris Lydon: “How did Portland get to be so welcoming?”

Interviewee: “We’re Americans.”

Not everyone’s on board. There’s an anti-immigrant minority, but it is a minority. Most of the people in Portland, it seems, look backward to their own immigrant roots and forward to a future that embraces an influx of world cultures.

On this Independence Day I choose to believe that a national majority feels the same way, and will prevail.