Freedom of Information Laws and Access to Government Data in the Age of AI: Two Recent Cases

Jake Karr / Jul 27, 2023

Jake Karr is the Deputy Director of NYU's Technology Law & Policy Clinic and a fellow at the Engelberg Center on Innovation Law & Policy, where he is a member of Knowing Machines, a research project tracing the histories, practices, and politics of how machine learning systems are trained to interpret the world.

In recent years, there has been a surge of concern about the massive amounts of data that governments collect about us. Calls for transparency and accountability have never been louder, especially when it comes to the ways that this data is fed into government AI systems to help “train” them to make decisions that directly but often secretly impact our lives—“from facial recognition and autonomous weapons to criminal risk assessments and public benefits administration,” as scholars Kate Crawford and Jason Schultz have put it.

There is growing consensus that a necessary, if insufficient, component of any effective regulatory regime for AI is greater access to and understanding of the training datasets that serve as “ground truth” for machine learning. But AI regulation in the United States is still in its “early days.” While we wait for the federal and state governments to get their acts together on AI, we’re left with the familiar, flawed tools of government transparency: freedom of information laws (FOILs). Although FOILs have come under sustained and legitimate criticism in recent years, two recent court decisions demonstrate the role that FOILs can continue to play in this age of big data, machine learning, and AI to compel meaningful public oversight of government-held data—what data governments are collecting, what they are doing with the data, and what decisions, automated or otherwise, they are making based on this data.

Government Records, Government Data

Every state and the federal government has some variation of a FOIL, which establishes a right to inspect executive agency records. Under these statutes, such records are presumptively open to the public upon request, though they are subject to certain broad exemptions from disclosure—exceptions that sometimes swallow the rule—for personal privacy, commercial trade secrets, national security, sensitive law enforcement information, and the like.

Importantly, FOILs don’t require governments to generate answers to questions or create new records in response to a request. When an individual or entity submits a FOIL request, agencies are only obliged to hand over whatever responsive records they retain in the ordinary course of their operations, and the requester has to take those records as they are. When most FOILs were initially enacted decades ago, government “records” generally meant, simply enough, paper files and documents. But as governments transitioned from analog to digital, the question of what a “record” is—and what it looks like for an agency to create one—became that much more contested. As Amy Kapczynski has noted, “data and information flows today are mediated by processes that are fundamentally more opaque”—and more complicated—“than their historical counterparts,” and this complicates conventional understandings of exactly what a government “record” is and should be.

At every turn, government agencies have attempted to use advances in their recordkeeping technologies to exclude important information from the records that they produce in response to FOIL requests. Legislative amendment was needed, for example, just to clarify that “records” under the federal Freedom of Information Act (FOIA) encompassed electronic documents and data. And advocates had to fight subsequent court battles to ensure that FOIA extended to information like metadata, and to establish that merely searching a database to generate search results was not tantamount to the creation of new records.

Two Cases, Two Crucial Wins for Access to Government Data

Two recent court decisions rejected the latest attempts by government agencies to duck their FOIL duties by weaponizing their recordkeeping technologies against data transparency. The issue at the heart of both cases arose out of the structure of the relational databases that the relevant agencies—federal in one case, state in the other—use to track, sort, and govern. A relational database is a basic data storage technology that allows for the efficient organization of large datasets. The database may contain numerous tables with different data points, but they are all nevertheless connected to each other by what are known as relational “keys.” These keys are just that—they unlock and make sense of a relational database by linking together related data points that may be scattered across the various tables in the database.
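The role a relational key plays can be illustrated with a minimal sketch. The tables, columns, and values below are entirely hypothetical, using SQLite only as a convenient stand-in for the kind of relational database the agencies maintain:

```python
import sqlite3

# Two illustrative tables holding different data points about the same
# individuals; "person_key" is the relational key connecting them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE apprehensions (person_key TEXT, date TEXT);
    CREATE TABLE detentions (person_key TEXT, facility TEXT);
""")
conn.execute("INSERT INTO apprehensions VALUES ('K-001', '2018-03-01')")
conn.execute("INSERT INTO detentions VALUES ('K-001', 'Facility A')")

# The shared key links rows scattered across tables into one record
# of an individual's path through the system.
rows = conn.execute("""
    SELECT a.person_key, a.date, d.facility
    FROM apprehensions a
    JOIN detentions d ON a.person_key = d.person_key
""").fetchall()
print(rows)  # [('K-001', '2018-03-01', 'Facility A')]
```

Strip out the key column, and each table becomes an orphaned list of dates or facilities with no way to reconnect them—which is precisely the dispute at the center of both cases.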

ACLU Immigrants’ Rights Project v. U.S. Immigration & Customs Enforcement involved the relational databases that Immigration and Customs Enforcement (ICE) uses to store vast amounts of data relating to millions of noncitizens and enforcement operations. ICE uses noncitizens’ A-Numbers—unique, personally identifiable numbers that DHS assigns to each individual under ICE’s purview—as the relational keys for its databases. Data about a specific individual may be scattered across multiple tables, but when ICE queries its databases by the individual’s A-Number, it can pull together all the data associated with that individual—tracking the individual across their interactions with ICE, from initial apprehension through detention, bond hearing, and removal.

Hoping to shed light on how noncitizens move through and are treated by the immigration system, the ACLU submitted a FOIA request back in 2018 for individual-level data about these parts of the deportation machine. Although there was no real dispute that the ACLU was presumptively entitled to the data, the request implicated legitimate privacy concerns. On the one hand, disclosing the databases’ relational keys—individual A-Numbers—could allow for easy identification of noncitizens, who never asked for their interactions with ICE to be made public. On the other, without the keys that relate data stored in one table to data stored in another, the requested data points would be stripped of context and meaning.

The ACLU proposed a workaround, asking ICE to replace the A-Numbers with randomly assigned but still unique values. That way, the A-Numbers themselves wouldn’t be revealed, but the function they served as the keys to the agency’s databases would remain intact. Yet ICE rejected the ACLU’s proposal, arguing that substituting the A-Numbers with new numbers would require the agency to create new records. When the agency finally handed over the requested data—and only after the ACLU filed suit—it invoked FOIA’s privacy exemptions, redacted all of the A-Numbers, and produced dozens of separate, siloed tables of data, a useless data dump that rendered the agency’s overall system of immigration enforcement indecipherable.
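The substitution the ACLU proposed can be sketched in a few lines. Everything here is hypothetical—the fake A-Numbers, field names, and records are illustrative only—but it shows why replacing a key with a random value hides the content while preserving the function:

```python
import secrets

# Illustrative records keyed by (fake) A-Numbers.
apprehensions = [{"a_number": "A123456789", "date": "2018-03-01"}]
detentions = [{"a_number": "A123456789", "facility": "Facility A"}]

# Assign each A-Number a random, unique replacement value. The mapping
# itself would stay with the agency; only substituted records go out.
pseudonyms = {}

def pseudonymize(a_number):
    if a_number not in pseudonyms:
        pseudonyms[a_number] = secrets.token_hex(8)
    return pseudonyms[a_number]

redacted_apprehensions = [
    {**row, "a_number": pseudonymize(row["a_number"])} for row in apprehensions
]
redacted_detentions = [
    {**row, "a_number": pseudonymize(row["a_number"])} for row in detentions
]

# The content (the real A-Number) is hidden, but the function survives:
# rows about the same person still share a key and can still be joined.
assert redacted_apprehensions[0]["a_number"] == redacted_detentions[0]["a_number"]
assert redacted_apprehensions[0]["a_number"] != "A123456789"
```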

Fortunately, in an opinion handed down in January, the U.S. Court of Appeals for the Second Circuit recognized the “perverse” implications of ICE’s move. In siding with the ACLU, the court held that “ICE may not rely on A-Numbers’ exemption from FOIA disclosure to deny the public equal access to non-exempt records.” The court distinguished between the “content” of a record and the “function” that the record serves in the agency’s system—here, the specific numerical content of an A-Number and its function as a relational key to unlock ICE’s databases. And the court agreed with the ACLU that the substitution of A-Numbers was a reasonable, required step “to shield the exempt content of A-Numbers while preserving the function necessary to afford public access” to the requested data on the same terms and “in the same manner as . . . the agency.”

The court emphasized that the way in which ICE had chosen to structure its databases created an unnecessary impediment to meaningful access to its data. ICE could have built its relational databases around randomly assigned, unique numbers and sidestepped this obvious and foreseeable privacy issue. But it didn’t, and allowing the agency to withhold the relational keys now would “encourag[e] agencies to make exempt records the singular means for gaining access to non-exempt records . . . and, thereby, effectively conceal those records from the public.”

Silverman v. Arizona Health Care Cost Containment System involved a similar request for data from a state-level agency—the Arizona Health Care Cost Containment System (AHCCCS)—and raised substantially the same question under Arizona’s state FOIL. (Disclosure: I helped litigate Silverman on behalf of the requesters.) AHCCCS is Arizona’s Medicaid agency, and it oversees the Arizona Long-Term Care System, which provides services to the elderly and individuals with intellectual and developmental disabilities. Like ICE, AHCCCS tracks the millions of individuals who interact with this system on relational databases. And like ICE, AHCCCS claimed to use personally identifiable information—AHCCCS IDs—as relational keys.

In 2020, two Arizona journalists submitted a request for data related to the agency’s benefits eligibility determinations, as part of an award-winning series of articles for the Arizona Daily Star and ProPublica investigating how the state cares for disabled individuals. After more than a year of negotiations, AHCCCS eventually offered to furnish some of the requested data, but it maintained that sharing this data alongside any relational keys would raise insurmountable medical privacy issues. Keen to accommodate this objection, the journalists had proposed that the agency instead encrypt the relational keys to shield the AHCCCS IDs (content) while preserving their relational role (function). But like ICE, AHCCCS refused, arguing that substituting the AHCCCS IDs with randomly assigned values through simple encryption would require the agency to create new records.

Last month, the Arizona Court of Appeals joined the Second Circuit in rejecting the agency’s empty vision of data access. The court held that this process of “redaction-by-encryption” does not constitute the creation of a new record under Arizona’s FOIL, and that the encryption of AHCCCS IDs—“substituting a unique hashed value that masks protected information without destroying its function in the database”—“is necessary to ensure a requestor receives, to the extent possible, a copy of the real record.” (AHCCCS may ask the Arizona Supreme Court to review the decision, so this might not be the final word in this case.)
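The “redaction-by-encryption” the court describes can be sketched as a keyed hash: a secret key held only by the agency deterministically maps each ID to a masked value, so equal IDs still match across tables while the originals stay hidden. This is one plausible implementation; the opinion does not specify an algorithm, and the IDs below are hypothetical:

```python
import hashlib
import hmac
import secrets

# Secret key held by the agency; never shared with the requester.
secret_key = secrets.token_bytes(32)

def mask_id(ahcccs_id: str) -> str:
    """Deterministically replace an ID with a keyed hash of it."""
    return hmac.new(secret_key, ahcccs_id.encode(), hashlib.sha256).hexdigest()

# The same ID always hashes to the same value, so the relational role
# of the key is preserved across every table it appears in...
assert mask_id("ID-0042") == mask_id("ID-0042")
# ...while distinct IDs stay distinct, and the original ID cannot be
# recovered without the agency's secret key.
assert mask_id("ID-0042") != mask_id("ID-0043")
```

Using a keyed hash rather than a plain one matters: without the secret key, an outsider could rehash guessable IDs and undo the masking by brute force.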

In both ACLU and Silverman, the courts recognized that allowing an agency to store data in such a way that it is unintelligible without additional information that the agency possesses but need not share would render FOILs dead-letter law for government-held data. The decisions may have settled seemingly narrow technical disputes about the structure of relational databases, but they should be read to reflect a broader, fundamental understanding that FOILs cannot allow governments that adopt new recordkeeping technologies to invoke those very technologies to stymie meaningful data access.

To ensure this form of access, the courts evaluated the systems in question to determine what information the requester would need to see and understand the data in the same way that the government does—not just the data points themselves, but any other “functional” information necessary to preserve the data’s intelligibility when produced. The simplicity of the central technical issue in these cases rendered the analysis correspondingly straightforward. To make the data stored in their relational databases understandable, the agencies needed to provide the relational keys.

A Future for FOILs?

The logic of these rulings can be extended and tailored to the various data collection, processing, and storage systems that government agencies use to monitor and govern us. FOILs might not impose affirmative obligations on governments to create new data or new explanations for the data in these systems. But in underscoring the indispensability of functional information about the relationships among and between government data—that is, “relational” information—ACLU and Silverman recognize an obligation to preserve for the public the context and intelligibility of government data. FOILs should afford the public not just mere access to data, but also access to sufficient information to give the data meaning in the context of the system in which the data is an input or output. In other words, the “real record” encompasses the connections that governments make among and between the data, and access to data comprises not only access to the data itself, but “equal access” to the ways in which the government sees and understands the data.

Of course, the AI systems powering the turn to automated governance are more complex than relational databases. At root, though, information organization and retrieval are what all modern database, machine learning, and AI systems are designed to do. And FOILs should prohibit governments from claiming that the technical capacity or opacity of a system prevents them from providing a fuller picture at least of the ways in which the government itself sees, understands, and connects the information in that system.

In applying these old laws to a new database technology, ACLU and Silverman thus gesture toward a significant if subtle role for FOILs to play as forcing mechanisms for meaningful, mediated government data transparency in the age of AI.
