My sister is writing a report for which she needs facts about the growth of New Jersey’s foreign-born population. She found some numbers at, and we explored them on a Facebook thread. For my friend Mike Caulfield, who’s writing a textbook called Making Fair Comparisons, the discussion reinforced a lot of what he’s been teaching lately. For me it was a reminder that the dream of straightforward access to canonical facts remains elusive.

I wanted to check my sister’s sources. She gave me this link: That page says New Jersey’s 2010 population was 8,791,894, of which 20.3% were foreign-born, which works out to 1,784,754 people.
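The back-calculation is simple arithmetic. Here is a minimal Python sketch that uses only the two figures from that page (the function name is just illustrative) and makes the rounding step explicit:

```python
# Figures quoted from the 2010 page cited above.
TOTAL_2010 = 8_791_894
FOREIGN_BORN_SHARE_2010 = 0.203  # 20.3%, as published (already rounded)

def count_from_share(total: int, share: float) -> int:
    """Back out a head count from a total and a percentage share (as a fraction)."""
    return round(total * share)

foreign_born_2010 = count_from_share(TOTAL_2010, FOREIGN_BORN_SHARE_2010)
print(f"{foreign_born_2010:,}")  # 1,784,754
```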

I never did find the 2000 counterpart to that report. While searching the FactFinder site, though, I found this page, and by searching within it (for Geography: New Jersey and “foreign born”) I landed on a report called “SELECTED CHARACTERISTICS OF THE NATIVE AND FOREIGN-BORN POPULATIONS 2010 ACS 1-year estimates” with an ID of S0501. According to it, there were 1,844,581 foreign-born New Jerseyans, or 21% (not 20.3%) of the same 8,791,894 total.
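Because 20.3% is itself rounded to a tenth of a point, it pins the count down only to a range. A small sketch (the helper name is hypothetical) checks whether the S0501 figure could be the same estimate, merely rounded differently:

```python
TOTAL_2010 = 8_791_894
COUNT_S0501 = 1_844_581  # foreign-born count reported in table S0501

def count_range_for_rounded_share(total, share_pct, decimals=1):
    """Range of counts consistent with a percentage rounded to `decimals` places."""
    half_step = 0.5 * 10 ** -decimals
    low = total * (share_pct - half_step) / 100
    high = total * (share_pct + half_step) / 100
    return low, high

share_s0501 = COUNT_S0501 / TOTAL_2010 * 100
low, high = count_range_for_rounded_share(TOTAL_2010, 20.3)

print(f"S0501 share: {share_s0501:.2f}%")
print(f"counts consistent with 20.3%: {low:,.0f} to {high:,.0f}")
print("S0501 count inside that range?", low <= COUNT_S0501 < high)
```

The S0501 count lands well above the range implied by 20.3%, so the two reports are giving genuinely different estimates, not the same number rounded two ways.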

I thought I had cited that report in our Facebook discussion, but later was horrified to find that I hadn’t. The base URL never changes. If I navigate to a report on foreign-born New Jerseyans, and you navigate to the same report for Texans, or the whole US, it’s the same URL. That’s catastrophic if you’re trying to have a discussion grounded in canonical citations of source data.
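The fix is for the URL to carry the selection itself. Here is a sketch using Python’s standard urllib.parse; the host and the parameter names (table, geo, year) are hypothetical, not FactFinder’s actual scheme:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint; not a real Census Bureau URL.
BASE = "https://data.example.gov/acs"

def report_url(table_id: str, geography: str, year: int) -> str:
    """Build a URL that encodes every selection, so the link alone identifies the view."""
    query = urlencode({"table": table_id, "geo": geography, "year": year})
    return f"{BASE}?{query}"

nj = report_url("S0501", "New Jersey", 2010)
tx = report_url("S0501", "Texas", 2010)
print(nj)  # https://data.example.gov/acs?table=S0501&geo=New+Jersey&year=2010

# Different selections yield different, citable URLs...
assert nj != tx
# ...and the selection can be recovered from the link itself.
print(parse_qs(urlsplit(nj).query))  # {'table': ['S0501'], 'geo': ['New Jersey'], 'year': ['2010']}
```

Because the state lives in the link rather than in the session, anyone who follows it lands on the same view.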

Meanwhile I still hadn’t found the 2000 counterpart to Back on the FactFinder site, I searched in vain for “SELECTED CHARACTERISTICS OF THE NATIVE AND FOREIGN-BORN POPULATIONS 2000” and for combinations of terms like “foreign-born 2000.” So I searched the web for “foreign-born 2000 census”; both Google and Bing pointed me to From this PDF file I was able to extract New Jersey’s total (8,414,350) and foreign-born (1,476,327) populations in 2000. Now I could complete this table (using, arbitrarily, one of the values I found for 2010 foreign-born):

Year	Total population	Foreign-born	Foreign-born share
2000	8,414,350	1,476,327	17.5%
2010	8,791,894	1,784,754	20.3%

Now, finally, we could have the real discussion. Should growth be evaluated in terms of percentages, so (20.3-17.55)/17.55 = 15.7% (using the more precise 2000 share of 17.55%), or absolute numbers, so (1.784-1.476)/1.476 = 20.9% (in millions)? It depends, my friend Doug Smith said, on the point you’re trying to make:
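Both figures can be reproduced from the table. Relative change is measured against the starting value, and the 15.7% depends on the unrounded 2000 share; plugging in the rounded 17.5% would give 16.0% instead. A minimal sketch:

```python
# Figures from the 2000/2010 table above.
census = {
    2000: {"total": 8_414_350, "foreign_born": 1_476_327},
    2010: {"total": 8_791_894, "foreign_born": 1_784_754},
}

def pct_change(old: float, new: float) -> float:
    """Relative change, measured against the starting (old) value."""
    return (new - old) / old * 100

# Compute shares from the raw counts rather than from rounded percentages.
share = {year: d["foreign_born"] / d["total"] * 100 for year, d in census.items()}
share_growth = pct_change(share[2000], share[2010])
count_growth = pct_change(census[2000]["foreign_born"], census[2010]["foreign_born"])

print(f"share: {share[2000]:.2f}% -> {share[2010]:.2f}%")  # share: 17.55% -> 20.30%
print(f"growth of the share: {share_growth:.1f}%")         # growth of the share: 15.7%
print(f"growth of the count: {count_growth:.1f}%")         # growth of the count: 20.9%
```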

When you do the calculation on the growth of the percentages it does not take into account that the total population also grew over the 10 years. So while the percentage of foreign-born people grew by 15.7%, the actual number of foreign-born people in the state grew by 20.9%. If you’re trying to make a case that depends on the total number, like services consumed or potential market size, then you should use the growth of total numbers. If you’re trying to make a case based on percentages, for example the likelihood of encountering a foreign-born individual, then growth based on percentages would be better.
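Doug’s point can be made exact. Since the foreign-born count is the share times the total, the growth factors multiply: (1 + count growth) = (1 + share growth) × (1 + total growth). A quick check with the same figures:

```python
import math

total = {2000: 8_414_350, 2010: 8_791_894}
foreign_born = {2000: 1_476_327, 2010: 1_784_754}

def growth_factor(old: float, new: float) -> float:
    """Ratio of new to old; subtract 1 to get the fractional growth."""
    return new / old

total_f = growth_factor(total[2000], total[2010])
share_f = growth_factor(foreign_born[2000] / total[2000], foreign_born[2010] / total[2010])
count_f = growth_factor(foreign_born[2000], foreign_born[2010])

# count = share * total, so the growth factors multiply.
assert math.isclose(count_f, share_f * total_f)
print(f"{(share_f - 1) * 100:.1f}% share growth x {(total_f - 1) * 100:.1f}% total growth "
      f"-> {(count_f - 1) * 100:.1f}% count growth")
# 15.7% share growth x 4.5% total growth -> 20.9% count growth
```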

Doug added this intriguing observation:

This small amount of data actually presents a very interesting picture. The total population of NJ grew 4.5% over ten years. During that time, the natural born population grew only 1%, while the foreign-born population grew 21%. This suggests that more than 80% of the population increase over these ten years came as a result of immigration. So, while going from 17.5% foreign-born to 20.3% foreign-born doesn’t seem like much of a change to me, the implications seem huge.
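Doug’s numbers check out against the table. This sketch derives the native-born counts by subtraction (total minus foreign-born) and expresses the net foreign-born increase as a fraction of the net total increase, which is the “more than 80%” figure:

```python
total = {2000: 8_414_350, 2010: 8_791_894}
foreign_born = {2000: 1_476_327, 2010: 1_784_754}
# Native-born is whatever is left after removing the foreign-born.
native_born = {year: total[year] - foreign_born[year] for year in total}

def pct_change(old: float, new: float) -> float:
    """Relative change, measured against the starting (old) value."""
    return (new - old) / old * 100

total_increase = total[2010] - total[2000]
foreign_increase = foreign_born[2010] - foreign_born[2000]
foreign_share_of_increase = foreign_increase / total_increase * 100

print(f"total growth:        {pct_change(total[2000], total[2010]):.1f}%")              # 4.5%
print(f"native-born growth:  {pct_change(native_born[2000], native_born[2010]):.1f}%")  # 1.0%
print(f"foreign-born growth: {pct_change(foreign_born[2000], foreign_born[2010]):.1f}%")  # 20.9%
print(f"foreign-born share of the increase: {foreign_share_of_increase:.1f}%")          # 81.7%
```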

That made me wonder about comparable figures for other states. But the prospect of digging out the numbers from a mishmash of HTML pages and PDF files killed that curiosity. What would help? Let’s give every fact its own home page on the web. The OData protocol is one good way to do that. Imagine as a web of data. A top-level path might be:

A next-level path might be:

A path to the ACS survey might be:

By year:

And finally, paths to individual facts might be:
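The idea that every level of such a hierarchy is itself a page can be illustrated with a hypothetical sketch. It uses a plain path hierarchy (not OData’s own URL conventions), and the host and segment names are made up:

```python
from urllib.parse import quote

BASE = "https://data.example.gov"  # hypothetical host

def fact_url(*segments: str) -> str:
    """Join path segments, percent-encoding each so names with spaces stay in one segment."""
    return "/".join([BASE, *(quote(s, safe="") for s in segments)])

def levels(*segments: str) -> list:
    """Every prefix of a fact's path; each one would be a browsable collection."""
    return [fact_url(*segments[:i]) for i in range(1, len(segments) + 1)]

for url in levels("Census", "ACS", "2010", "S0501", "New Jersey", "ForeignBorn"):
    print(url)
# The last line printed is:
# https://data.example.gov/Census/ACS/2010/S0501/New%20Jersey/ForeignBorn
```

Each printed URL is a prefix of the next, so a reader (or a crawler) can walk up or down the hierarchy just by editing the path.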

Nothing’s hidden behind a JavaScript veil or stored in a cookie. The entire web of data is navigable in a standard browser, which displays human-readable Atom feeds if set for human viewing, or raw XML or JSON if used to discover URLs for machine processing. Every URL is a canonical home page for a data set or an individual datum. User-friendly search and navigational tools are built on top of this foundation. Nobody has to deal with raw URLs and feeds. But they’re always available.
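Once facts have machine-readable homes, the cross-state comparison that seemed too tedious above becomes a few lines of code. This sketch parses a hypothetical JSON payload (the field names and the "value" wrapper are assumptions; the figures are the New Jersey numbers from this post) and computes foreign-born growth for each state it finds:

```python
import json

# Hypothetical response body; field names are assumptions, figures are from this post.
PAYLOAD = """
{"value": [
  {"state": "New Jersey", "year": 2000, "total": 8414350, "foreign_born": 1476327},
  {"state": "New Jersey", "year": 2010, "total": 8791894, "foreign_born": 1784754}
]}
"""

def foreign_born_growth(payload: str) -> dict:
    """Map each state to the percent growth of its foreign-born count, 2000 to 2010."""
    rows = json.loads(payload)["value"]
    by_state = {}
    for row in rows:
        by_state.setdefault(row["state"], {})[row["year"]] = row["foreign_born"]
    # Skip any state that lacks either endpoint year.
    return {
        state: (years[2010] - years[2000]) / years[2000] * 100
        for state, years in by_state.items()
        if 2000 in years and 2010 in years
    }

for state, growth in foreign_born_growth(PAYLOAD).items():
    print(f"{state}: {growth:.1f}%")  # New Jersey: 20.9%
```

Pointing the same function at a feed that included other states would extend the comparison without any PDF scraping.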

I’m not ungrateful for what (and so many other sites) offer. Any kind of web access to data is infinitely better than no access. But there are better and worse ways to provide access. It’s 2012. We ought to be doing better by now.