Controlling a browser extension with puppeteer

Web technology, at its best, has always occupied a sweet spot at the intersection of two complementary modes: interactive and programmatic. You interact with a web page by plugging its URL into a browser. Then you read what’s there or, if the page is an app, you use the app. Give that same URL, web software can read the page, analyze it, extract data from it, interact with the page, and weave its data and behavior into new apps. Search is the classic illustration of this happy synergy. A crawler reads and navigates the web using the same affordances available to people, then produces data that powers a search engine.

In the early days of the web, it was easy to automate the reading, navigation, analysis, and recombination of web pages because the pages were mostly just text and links. Such automation was (and still is) useful for analysis and recombination, as well as for end-to-end testing of web software. But things got harder as the web moved away from static server-generated pages and toward dynamic client-generated pages made with JavaScript.

All along the way, we’ve had another powerful tool in our kit. Web pages and apps, however they’re produced, can be enhanced with extra JavaScript that’s injected into them. My first experience of the power of that approach was a bookmarklet that, when activated on an Amazon page, checked your local library to see if the book was available there. Then came Greasemonkey, a framework for injecting “userscripts” into pages. It was the forerunner of browser extensions now converging on the WebExtensions API.

The Hypothesis web annotator is a good example of the kind of software that relies on the ability to inject JavaScript into web pages. It runs in two places. The annotator is injected into the web page you’re annotating; the sidebar (i.e. the app) runs in an iframe that’s also injected into the page. The annotator acquires page metadata, reacts to the selection event, forms W3C-style selectors that define the selection, and sends that information (by way of JavaScript messages) to the sidebar. The sidebar queries the server, retrieves annotations for the page into which the annotator was injected, displays the annotations, tells the annotator about navigation it should sync with, and listens for messages from the annotator announcing selections that serve as the basis of new annotations. When you create a new annotation in the sidebar, it messages the annotator so it can anchor it. That means: receive annotation data from the sidebar, and highlight the selection it refers to.

Early in my Hypothesis tenure we needed to deploy a major upgrade to the anchoring machinery. Would annotations created using the old system still anchor when using the new one? You could check interactively by loading HTML and PDF documents into two versions of the extension, then observing and counting the annotations that anchored or didn’t, but that was tedious and not easily repeatable. So I made a tool to automate that manual testing. At the time the automation technology of choice was Selenium WebDriver. I was able to use it well enough to validate the new anchoring system, which we then shipped.

Now there’s another upgrade in the works. We’re switching from a legacy version of PDF.js, the Mozilla viewer that Hypothesis uses to convert PDF files into web pages it can annotate, to the current PDF.js. Again we faced an automation challenge. This time around I used the newer automation technology that is the subject of this post. The ingredients were: headless Chrome, a browser mode that’s open to automation; puppeteer, an API for headless Chrome; and the devtools protocol, which is how the Chromium debugger, puppeteer, and other tools and APIs control the browser.

This approach was more powerful and effective than the Selenium-based one because puppeteer’s devtools-based API affords more intimate control of the browser. Anything you can do interactively in the browser’s debugger console — and that’s a lot! — you can do programmatically by way of the devtools protocol.

That second-generation test harness is now, happily, obsolete thanks to an improved (also puppeteer-based) version. Meanwhile I’ve used what I learned on a couple of other projects. What I’ll describe here, available in this GitHub repo, is a simplified version of one of them. It illustrates how to use puppeteer to load an URL into an extension, simulate user input and action (e.g., clicking a button), open a new extension-hosted tab in reaction to the click, and transmit data gathered from the original tab to the new tab. All this is wrapped in a loop that visits many URLs, gathers analysis results for each, combines the results, and saves them.

This is specialized pattern, to be sure, but I’ve found important uses for it and expect there will be more.

To motivate the example I’ve made an extension that, while admittedly concocted, is plausibly useful as a standalone tool. When active it adds a button labeled view link elements to each page you visit. When you click the button, the extension gathers all the link elements on the page, organizes them, and sends the data to a new tab where you view it. There’s a menagerie of these elements running around on the web, and it can be instructive to reveal them on any given page.

The most concocted aspect of the demo is the input box that accompanies the view link elements button. It’s not strictly required for this tool, but I’ve added it because the ability to automate user input is broadly useful and I want to show how that can work.

The extension itself, in the repo’s browser-extension folder, may serve as a useful example. While minimal, it illustrates some key idioms: injecting code into the current tab; using shadow DOM to isolate injected content from the page’s CSS; sending messages through the extension’s background script in order to create a new tab. To run it, clone the repo (or just download the extension’s handful of files into a folder) and follow these instructions from https://developer.chrome.com/extensions/getstarted:

  1. Open the Extension Management page by navigating to chrome://extensions. The Extension Management page can also be opened by clicking on the Chrome menu, hovering over More Tools then selecting Extensions.

  2. Enable Developer Mode by clicking the toggle switch next to Developer mode.

  3. Click the LOAD UNPACKED button and select the extension directory.

This works in Chromium-based browsers including Chrome, Brave, and the new version of Edge. The analogous procedure for Firefox:

Open “about:debugging” in Firefox, click “Load Temporary Add-on” and select any file in your extension’s directory.

The Chromium and Firefox extension machinery is not fully compatible. That’s why Hypothesis has yet to ship a Firefox version of our extension, and why the demo extension I’m describing here almost works in Firefox but fails for a reason I’ve yet to debug. Better convergence would of course be ideal. Meanwhile I’m confident that, with some elbow grease, extensions can work in both environments. Chromium’s expanding footprint does tend to erode the motivation to make that happen, but that’s a topic for another day. The scope of what’s possible with Chromium-based browsers and puppeteer is vast, and that’s my topic here.

Of particular interest to me is the notion of automated testing of workflow. The demo extension hints at this idea. It requires user input; the automation satisfies that requirement by providing it. In a real workflow, one that’s driven by a set of rules, this kind of automation can exercise web software in a way that complements unit testing with end-to-end testing in real scenarios. Because the software being automated and the software doing the automation share intimate access to the browser, the synergy between them is powerful in ways I’m not sure I can fully explain. But it makes my spidey sense tingle and I’ve learned to pay attention when that happens.

Here’s a screencast of the runnning automation:

The results gathered from a selection of sites are here.

Some things I noticed and learned along the way:

Languages. Although puppeteer is written in JavaScript, there’s no reason it has to be. Software written in any language could communicate with the browser using the devtools protocol. The fact that puppeteer is written in JavaScript introduces potential confusion in places like this:

const results = await viewPage.evaluate((message) => {
  // this blocks run in the browser's 2nd tab
  const data = JSON.parse(document.querySelector('pre').innerText) 
  ...

The first line of JS runs in node, the second line runs in the browser. If puppeteer were written in a different language that would be obvious; since it isn’t I remind myself with a comment.

Settings. If you need the automated instance of Chromium to remember login info or other things, you can add userdir: {PATH} to the object passed to puppeteer.launch.

Interactivity. It may seem odd that the demo is using headless Chrome but specifies headless: false in the object passed to puppeteer.launch. I do that in order to be able to watch the automation when running locally; I’d turn it off when running automation in the cloud. Although I’ve not yet explored the idea, an interactive/automated hybrid seems an intriguing foundation for workflow apps and other kinds of guided experiences.

Posted in .

One thought on “Controlling a browser extension with puppeteer

Leave a Reply