Over the weekend I was poking around in the recipient-reported data at recovery.gov. I filtered the New Hampshire spreadsheet down to items for my town, Keene, and was a bit surprised to find no descriptions in many cases. Here’s the breakdown:
# of awards | 25 | |
# of awards with descriptions | 5 | 20% |
# of awards without descriptions | 20 | 80% |
$ of awards | 10,940,770 | |
$ of awards with descriptions | 1,260,719 | 12% |
$ of awards without descriptions | 9,680,053 | 88% |
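A breakdown like this can be computed directly from the downloaded rows. Here is a minimal sketch in Python. The field names `amount` and `description` are assumptions for illustration, not the actual column names in the recovery.gov spreadsheet, and the sample uses just three of the Keene rows from this post.

```python
def describe_breakdown(awards):
    """Summarize how many awards (and dollars) carry a description.

    `awards` is a list of dicts with hypothetical keys `amount` (a number)
    and `description` (a string, possibly empty or None).
    """
    described = [a for a in awards if (a.get("description") or "").strip()]
    total_count = len(awards)
    total_dollars = sum(a["amount"] for a in awards)
    described_dollars = sum(a["amount"] for a in described)

    def share(part, whole):
        # Guard against an empty spreadsheet.
        return round(100 * part / whole) if whole else 0

    return {
        "awards": total_count,
        "with_description": len(described),
        "without_description": total_count - len(described),
        "pct_awards_with": share(len(described), total_count),
        "dollars": total_dollars,
        "dollars_with": described_dollars,
        "dollars_without": total_dollars - described_dollars,
        "pct_dollars_with": share(described_dollars, total_dollars),
    }


# Three Keene rows from this post, for illustration only.
sample = [
    {"award": "S394A090030", "amount": 1471540, "description": ""},
    {"award": "2009RKWX0608", "amount": 459850,
     "description": "The COPS Hiring Recovery Program (CHRP)"},
    {"award": "NH36S01050109", "amount": 413394,
     "description": "ARRA Capital Fund Grant."},
]
print(describe_breakdown(sample))
```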
In this case, the half-dozen largest awards aren’t described:
award | amount | funding agency | recipient | description |
EE00161 | 2,601,788 | | Sothwestern Community Services Inc | |
S394A090030 | 1,471,540 | | Keene School District | |
AIP #3-33-SBGP-06-2009 | 1,298,500 | | City of Keene | |
2W-33000209-0 | 1,129,608 | | City of Keene | |
2F-96102301-0 | 666,379 | | City of Keene | |
2F-96102301-0 | 655,395 | | City of Keene | |
0901NHCOS2 | 600,930 | | Sothwestern Community Services Inc | |
2009RKWX0608 | 459,850 | Department of Justice | KEENE, CITY OF | The COPS Hiring Recovery Program (CHRP) provides funding directly to law enforcement agencies to hire and/or rehire career law enforcement officers in an effort to create and preserve jobs, and to increase their community policing capacity and crime prevention efforts. |
NH36S01050109 | 413,394 | Department of Housing and Urban Development | KEENE HOUSING AUTHORITY | ARRA Capital Fund Grant. Replacement of roofing, siding, and repair of exterior storage sheds on 29 public housing units at a family complex |
That got me wondering: Where does the money go? So I built a little app that explores ARRA awards for any city or town: http://elmcity.cloudapp.net/arra. For most places, it seems, the ratio of awards with descriptions to awards without isn’t quite so bad. In the case of Philadelphia, for example, “only” 27% of the dollars awarded ($280 million!) are not described.
But even when the description field is filled in, how much does that tell us about what’s actually being done with the money? We can’t expect to find that information in a spreadsheet at recovery.gov. The knowledge is held collectively by the many people who are involved in the projects funded by these awards.
If we want to materialize a view of that collective knowledge, the ARRA data provides a useful starting point. Every award is identified by an award number. These are, effectively, webscale identifiers — that is, more-or-less unique tags we could use to collate newspaper articles, blog entries, tweets, or any other online chatter about awards.
To promote this idea, the app reports award numbers as search strings. In Keene, for example, the school district got an award for $1.47 million. The award number is S394A090030. If you search for that you’ll find nothing but a link back to a recovery.gov page entitled Where is the Money Going?
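Turning an award number into a search string takes only a few lines. The sketch below is an illustration, not the app's actual code: the function name `award_search_url` and the choice of Google's search URL as the default are assumptions. It wraps the number in quotes so that identifiers containing spaces or `#` are searched as a single phrase.

```python
from urllib.parse import quote_plus


def award_search_url(award_number, base="https://www.google.com/search?q="):
    """Return a web-search URL for an exact-phrase query on an award number."""
    # Quote the identifier so that numbers such as "AIP #3-33-SBGP-06-2009"
    # are treated as one phrase, then URL-encode the result.
    return base + quote_plus(f'"{award_number}"')


print(award_search_url("S394A090030"))
print(award_search_url("AIP #3-33-SBGP-06-2009"))
```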
Recovery.gov can’t bootstrap itself out of this circular trap. But if we use the tags that it has helpfully provided, we might be able to find out a lot more about where the money is going.
You’re absolutely right that project award identifiers ought to be given more prominence. Unfortunately, I think that emphasis probably needs to start higher up the chain than just the data consumer. In our work with the FAADS and FPDS systems (upon which Recovery reporting is based), we’ve found that records aren’t given award identifiers reliably enough to even total up the funding for a project across its various payments (or obligations, more accurately).
Other identifiers in the system have their problems, too. The system relies on DUNS numbers to identify recipients, but these aren’t particularly reliable, and are largely controlled by a private entity. Using a proprietary identifier was a terrible decision; the government really needs to fix this.
Finally, I’d note that the quality of the ARRA data is also definitely in question. I think there’s reason to expect that it’s better than the normal FAADS disclosure, if only because of the political attention it’s garnered. But more worrying to me than the missing descriptions you point to are the records that might be missing — something that we can’t spot except by doing a cross-walk with other funding records (which is easier said than done).
But let me end on a more cheerful note: I’m glad that the attention that Recovery funds are attracting is helping to expose some of these problems. They’ve been there for a long time — maybe now we can get them fixed!
we’ve found that records aren’t given award identifiers reliably enough to even total up the funding for a project across its various payments (or obligations, more accurately).
Doesn’t surprise me a bit.
And yes, the higher-ups ought to mandate metadata hygiene.
But even were that to happen, the knowledge of how that money is actually flowing through our society is held collectively. And it will take collective effort to materialize it.
As broken as the award-numbering system is, it exists. And we could do a lot with it right now, if we could easily tag those numbers onto the things we read and write online.
And…it’s getting a lot easier lately to envision a silo-crossing web application that could make that tagging possible. Fun, even.
Finally, I’d note that the quality of the ARRA data is also definitely in question.
Ya think? :-)
If I were applying for a $2.6M award I think I’d bother to write “Southwestern” vs “Sothwestern Community Services” — c’mon, please.
But more worrying to me than the missing descriptions you point to are the records that might be missing — something that we can’t spot except by doing a cross-walk with other funding records (which is easier said than done).
A high-level cross-check is possible, right?
I.e., is the sum of recipient-reported dollars received within shouting distance of the sum of government-reported dollars given?
Anyone done this?
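As a rough sketch of what such a check could look like, assuming both sets of dollar amounts were already in hand: the function name, the 5% default tolerance, and the figures in the usage lines are all made up for illustration.

```python
def within_shouting_distance(recipient_reported, government_reported, tolerance=0.05):
    """Return True if the two dollar totals differ by no more than
    `tolerance`, expressed as a fraction of the government-reported total."""
    reported = sum(recipient_reported)
    given = sum(government_reported)
    if given == 0:
        # Nothing was given, so the totals agree only if nothing was reported.
        return reported == 0
    return abs(reported - given) / given <= tolerance


# Made-up amounts, purely to show the call.
print(within_shouting_distance([100, 200], [310]))
print(within_shouting_distance([100], [200]))
```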
This is possible, but difficult. You can’t easily start with the budget, because of its complexity and because it’s spread across years in ways that are hard to account for. You can’t get the Treasury records of expenditures because they haven’t been scrubbed to protect recipients’ privacy (if this could be fixed, it could probably be the best way to do a crosswalk with obligations). Instead you have to go to each agency, ask for/demand their financial records, then compare them to what was reported in FAADS/FPDS. Often you’ll have to do significant amounts of manual reconciliation to account for deviations from the reporting guidelines — to separate what’s just confusing from what’s genuinely missing or in error.
This is never done in a comprehensive way. In practice the issue keeps popping up when GAO tries to do a report about a specific question, finds it has to use this data, and notices that the data isn’t really good enough for serious analysis (at this point they usually proceed with caveats). Here’s one from 2005 that’s nominally about rural economic development; here’s a more recent one about the nonprofit sector.
On the upside, the text of FFATA states that the Comptroller General will be reporting about the state of these systems before the new year. It’ll be very interesting to see what that report finds.
I’m glad that the attention that Recovery funds are attracting is helping to expose some of these problems. They’ve been there for a long time — maybe now we can get them fixed!
I hope so. Let’s please just not miss the opportunity to apply our cognitive surplus to the fix.
Hi Jon,
We’ve been working on some of these issues with some recommendations here:
http://recovery.berkeley.edu/tech/
My colleague Raymond Yee has also been trying to do some cross checking. One bit of background is knowing the universe of accounts involved in the Recovery Act. We’ve made a FOIA request for getting a list of all the Treasury Accounts used for the Recovery, but we don’t have those data yet.
Thanks for writing about this! It’s a really interesting topic!
-Eric
Hi Jon,
I see that Eric Kansa has already posted about our work and our FOIA request. I’m redesigning my Mixing and Remixing Information course at Berkeley (for next semester) to focus on making sense of ARRA spending. I’m glad to see that you’ve gotten into looking at ARRA data yourself and hope that you’ll want to do even more work in the area.
-Raymond
Eric, good to meet you. And Raymond, nice to hear from you!
We’ve made a FOIA request for getting a list of all the Treasury Accounts used for the Recovery, but we don’t have those data yet.
At what feed could an interested party track the progress of that request?
One feed to follow is that for the recovery.gov tracking category on my blog: http://blog.dataunbound.com/category/recoverygov-tracking/feed/
Last week, we received an Excel spreadsheet from OMB in response to our FOIA request for an up-to-date list of Recovery TAFS. See http://blog.dataunbound.com/2009/11/23/foia-outcome/ for details.
This is never done in a comprehensive way.
Suppose that it were. Suppose that every line item in those recipient reports added up — within shouting distance — to the totals reported by the government. We still wouldn’t know jack about outcomes.
But collectively that knowledge exists. It’s distributed across a broad swath of funders, recipients, and beneficiaries. It was never before possible to collate their narratives. Now it is possible. That doesn’t mean that it will happen. But it could.
Hi Jon,
Thanks for the App (h/t Micah Sifry).
I’m working on a project in Philadelphia, http://trackingchange.pbworks.com/ARRA-Advocacy-and-Outreach-Initiative, to connect minority-owned businesses with stimulus opportunities. The award descriptions will help them better identify projects to pursue.
Faye
Faye — I’d be curious to hear how you are making use of recovery.gov and whether you are directly analyzing any of the data (which you can download) to help “connect minority-owned businesses with stimulus opportunities”.
Make sure you filter by P for Prime and S for Sub-recipients. Only Primes provide a description of the work; Subs do not, yet both are available from the download center.
To find the Prime description use the same Award Key as the Sub.
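That lookup can be sketched in a few lines of Python. The keys `role`, `award_key`, and `description`, and the sample rows, are hypothetical stand-ins; the real column names in the download may differ.

```python
def attach_prime_descriptions(rows):
    """Give each sub-recipient row the description of the prime row
    that shares its award key.

    Each row is a dict with hypothetical keys `role` ("P" or "S"),
    `award_key`, and `description`.
    """
    prime_descriptions = {
        r["award_key"]: r["description"] for r in rows if r["role"] == "P"
    }
    result = []
    for r in rows:
        if r["role"] == "S":
            # Copy the row so the input list is left untouched.
            r = dict(r, description=prime_descriptions.get(r["award_key"], ""))
        result.append(r)
    return result


# Hypothetical rows, for illustration only.
rows = [
    {"role": "P", "award_key": "K1", "description": "Education funding"},
    {"role": "S", "award_key": "K1", "description": ""},
    {"role": "S", "award_key": "K2", "description": ""},
]
for row in attach_prime_descriptions(rows):
    print(row)
```

A Sub whose award key matches no Prime keeps an empty description, so missing Primes stay visible instead of being silently dropped.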
For the city of Keene there are 5 Prime Recipients and all reported a description.
You can type the Award ID into either Google or the Recipient Search scope, and the resulting Award Summary page gathers all of the recipient-reported info onto a single page.
For the city of Keene there are 5 Prime Recipients and all reported a description.
Thanks Robert, that helps a bit. In the case of the example I searched for…
http://elmcity.info/doublesearch/?q=S394A090030
…the Prime Recipient is:
EXECUTIVE OFFICE OF THE STATE OF NEW HAMPSHIRE
And the description is:
The purpose of this grant is to support and restore funding for elementary, secondary, and postsecondary education and, as applicable, early child hood education programs and services in States and local educational agencies
That description covers, for the whole state, 161 awards totaling $192,121,666.
There’s obviously more to the story. But when I search for the identifier S394A090030, all that comes back is the page I referenced before at recovery.gov. And also, now, some links to this blog.
My point, again: S394A090030 is a webscale identifier that thousands of people involved in hundreds of funded projects could be using to tell much more of the story. I think a lot of them would like to write and tell it, and a lot of us would like to read and hear it.
I don’t expect the ARRA spreadsheet alone to tell the story. It can’t possibly. But it could be a table of contents for a book of stories written by many people.
I like Jon’s analogy of the ARRA spreadsheets to “table of contents for a book of stories written by many people.” In our report “Web Services for recovery.gov”, we argue for the use of various identifiers to help make such a table of contents: http://escholarship.org/uc/item/0fv601z8?pageNum=7#page-7
For example, the treasury account symbol (TAS) has been of particular interest to me. One of the grants to Keene, NH (2009RKWX0608) is part of the Community Oriented Policing Services super-program of the Recovery Act:
COMMUNITY ORIENTED POLICING SERVICES
For an additional amount for ‘‘Community Oriented Policing
Services’’, for grants under section 1701 of title I of the 1968
Omnibus Crime Control and Safe Streets Act (42 U.S.C. 3796dd)
for hiring and rehiring of additional career law enforcement officers
under part Q of such title, notwithstanding subsection (i) of such
section, $1,000,000,000.
(See http://www.govtrack.us/congress/billtext.xpd?bill=h111-1&version=enr&nid=t0:enr:237 )
All the grants/contracts tied to Community Oriented Policing (COPS) are tied to a TAS of 15-0412 (15 is the Treasury symbol for the Department of Justice).
Wouldn’t it be useful to be able to find all the programs funded under COPS (TAS = 15-0412) across the country to see what the $1 billion appropriated for this purpose is doing? And, of course, COPS is just one of many different programs funded under ARRA….
Wouldn’t it be useful to be able to find all the programs funded under COPS (TAS = 15-0412) across the country
Sure. In this case, there are a couple of ways to isolate the 7 COPS awards for NH.
1. The titles match: “The COPS Hiring Recovery Program (CHRP) provides funding directly to law enforcement agencies…”
2. The award numbers share a common pattern:
2009RKWX0612
2009RKWX0609
2009RKWX0613
2009RKWX0617
2009RKWX0608
2009RKWX0614
2009RKWX0616
If these correspondences hold across all 50 spreadsheets, then there’s an easy algorithmic way to tag all COPS entries with TAS_15-0412 and link them to other things so tagged.
Ideally the tagging would be done at the source. But there’s no need to wait for that to happen. If it’s useful and important, it can be done in a view overlaid onto the source.
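Here is a minimal sketch of such an overlay in Python. The regular expression is inferred from the seven New Hampshire award numbers listed above, and whether it holds for the other states' spreadsheets is an untested assumption. The names `COPS_AWARD_PATTERN`, `COPS_TAG`, and `tag_awards` are made up for illustration.

```python
import re

# Pattern inferred from the seven NH award numbers; an assumption, not a rule
# published by the Department of Justice.
COPS_AWARD_PATTERN = re.compile(r"2009RKWX\d{4}")
COPS_TAG = "TAS_15-0412"


def tag_awards(award_numbers):
    """Return a mapping of award number -> set of tags.

    The source data is not modified; the tags live in a separate overlay.
    """
    overlay = {}
    for number in award_numbers:
        matches = COPS_AWARD_PATTERN.fullmatch(number) is not None
        overlay[number] = {COPS_TAG} if matches else set()
    return overlay


overlay = tag_awards(["2009RKWX0608", "S394A090030", "2009RKWX0612"])
cops = sorted(n for n, tags in overlay.items() if COPS_TAG in tags)
print(cops)
```

Because the tags sit in a separate mapping keyed by award number, the overlay can be rebuilt or corrected without touching the spreadsheets themselves.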
Agreed. The AwardID alone is obviously insufficient though as it refers only to Primes. You need a key pair of AwardID and OrderID to specifically identify the Subs.
Agencies also have their own systems for creating the AwardID number so there are some inconsistencies.
The AwardID alone is obviously insufficient though as it refers only to Primes.
The award number alone is not ideal, but it’s what we’ve got and what we are likely to have in the foreseeable future.
Given that reality, the award number can be augmented with other clues, either from within the ARRA data or from outside it, to create views of the data that give more traction.
You need a key pair of AwardID and OrderID to specifically identify the Subs.
Unless OrderID is missing, as is true for the COPS example discussed in #14 above.
But in any case, if that kind of key pair is a requirement, few will be able to meet it, and little or no collective annotation will emerge.
From a pure information management perspective, you want to uniquely identify records with key pairs.
But from a social information management perspective, you want to keep the activation threshold really low, so that it’s quick and easy for people to associate things with other things.
People might be interested in Stimulus Watch 2.0, which just soft-launched today. To get contracts for Keene, NH, look at
http://stimuluswatch.org/2.0/performance_places/city/NH/03431/keene
Cool!
This example does, BTW, amplify the point already discussed in this thread about Primes vs Subs. The Stimulus Watch page reports only Primes (and actually, just 4 of 5 of them), summing to $1.2 million. But there are 25 Keene-related awards summing to $10.9 million.
ARRA’s default data model doesn’t deliver that wider view, but we can — I would argue must — augment it so we can expose and annotate more of what’s going on.
Jon, any further work seeing where the money is going? Thanks!