Unmasking EdTech's Surveillance Infrastructure in the Age of AI
Danai Nhando / Feb 6, 2026
"Datafication" by Kathryn Conrad (Better Images of AI / CC 4.0)
In December 2024, PowerSchool, a leading provider of cloud-based software that manages student grades, attendance, and records for K–12 schools, detected unauthorized access to its Student Information System, the administrative backbone for approximately 16,000 schools serving nearly 50 million students across North America. By January 2025, the scope of the breach became clear: more than 62 million student records and nearly 10 million teacher records had been exfiltrated, representing the largest breach of children's data in US history.
The compromised data extended beyond basic identifiers. Names, addresses, birthdates, and contact information sat alongside Social Security numbers (SSNs), medical conditions, disability accommodations, individualized education plans, disciplinary records, and family income data linked to free and reduced lunch programs. For millions of children, their most sensitive educational and personal information, data they never consented to provide and cannot revoke, now circulates in underground markets and, increasingly, as inputs to AI systems.
Eight months before the breach occurred, the EdTech Law Center issued a prescient warning. In May 2024 litigation, the organization cautioned that "[b]y collecting vast amounts of data from both students and their families, PowerSchool puts that data at risk." The risk materialized exactly as predicted.
The intrusion vector was unremarkable: PowerSchool's system, with administrative access across thousands of districts, lacked mandatory multi-factor authentication for all accounts. In cybersecurity terms, this is a category 1 control failure; industry standards have treated MFA as a baseline requirement for over a decade. The breach exposed an edTech governance model that has normalized the centralization of children's data at an unprecedented scale without commensurate security architecture, regulatory oversight, or enforceable data minimization requirements. One year later, that model remains largely intact.
The legal response: accountability without remediation
The litigation response followed predictable contours. Within weeks of disclosure, lawsuits proliferated across state and federal courts. By April 2025, 55 cases spanning nine districts had been consolidated under multidistrict litigation and assigned to Judge Roger T. Benitez in the Southern District of California. Plaintiffs alleged negligence, breach of contract, unjust enrichment, and violations of state consumer protection statutes.
Yet multidistrict litigation, while procedurally efficient, offers a limited structural remedy. It consolidates discovery, streamlines pretrial proceedings, and may ultimately produce settlements or judgments compensating victims for harm. What it does not do is compel systemic reform in how edTech platforms and school districts collect, retain, or govern student data. Courts adjudicate past harms; they do not redesign data architectures or impose ongoing compliance obligations beyond the parties before them.
The criminal prosecution followed a similar pattern. On October 14, 2025, Matthew Lane, a 20-year-old Massachusetts resident, was sentenced in US District Court to four years in federal prison after pleading guilty to unauthorized computer access, cyber extortion, and aggravated identity theft. He was ordered to pay $14 million in restitution, a figure that prosecutors acknowledged would likely never be collected in full. Lane's sentence illustrates the fundamental asymmetry between criminal accountability and data harm. A four-year prison term addresses culpability for the act of theft. PowerSchool reportedly paid a ransom to the hacker, but no technical or legal mechanism exists to enforce data deletion once information has propagated beyond the original breach point.
This creates a critical policy gap. Civil and criminal legal frameworks in the United States conceptualize data breaches as discrete events with finite consequences. Data, however, is non-rivalrous, infinitely replicable, and effectively permanent once released. For the 62 million students whose records were exposed, the breach is not an incident with a defined endpoint. It is an ongoing condition of vulnerability that will persist throughout their lives.
AI and the amplification of data harm
The breach occurred at an inflection point in edTech: the rapid integration of AI systems into K–12 environments. AI fundamentally alters the risk calculus for compromised student data in three critical ways.
First, AI enables large-scale data linkage and inference at speeds and scales previously unattainable. A student's PowerSchool record, even stripped of direct identifiers, can be probabilistically re-identified when combined with other datasets. Attendance patterns, course selections, accommodation records, and disciplinary histories create distinctive behavioral signatures. When cross-referenced with publicly available data (census records, property databases, social media), these signatures allow algorithmic re-identification with high confidence.
Rocher, Hendrickx, and de Montjoye’s research finds that 99.98% of Americans could be correctly re-identified in anonymized datasets using just 15 demographic attributes. For student records that include birthdates, ZIP codes, and school enrollment data, the re-identification threshold is even lower. AI systems can exploit these linkages to reconstruct identities that formal anonymization was intended to protect.
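To make the mechanics concrete, here is a minimal sketch, hypothetical and far simpler than the Rocher, Hendrickx, and de Montjoye model, of how quickly a few quasi-identifiers single out individual records. The toy records and field names are invented for illustration.

```python
# Hypothetical illustration of quasi-identifier uniqueness; not the
# statistical model from Rocher et al. All records below are invented.
from collections import Counter

# A toy "de-identified" SIS export: no names or SSNs, just the kinds of
# attributes a breached student record would still contain.
records = [
    {"birthdate": "2011-03-14", "zip": "90210", "school": "Lincoln ES", "iep": True},
    {"birthdate": "2011-03-14", "zip": "90210", "school": "Lincoln ES", "iep": False},
    {"birthdate": "2012-07-02", "zip": "90211", "school": "Lincoln ES", "iep": False},
    {"birthdate": "2010-11-30", "zip": "90210", "school": "Adams MS",   "iep": True},
]

def uniqueness(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination is unique,
    i.e., re-identifiable by anyone holding an auxiliary dataset
    (voter rolls, property records, social media) with the same fields."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)

# Birthdate plus ZIP already isolates half of this toy population;
# adding school and an accommodation flag isolates everyone.
print(uniqueness(records, ["birthdate", "zip"]))                   # 0.5
print(uniqueness(records, ["birthdate", "zip", "school", "iep"]))  # 1.0
```

Real SIS exports carry far more attributes than this toy example, which is why uniqueness rates approach 100 percent even after direct identifiers are stripped.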
Second, AI systems treat breached data as training inputs, creating persistent exposure vectors. LAION-5B, a dataset used to train Stable Diffusion and other prominent AI models, contained billions of image-text pairs scraped from the internet, including extensive personally identifiable information. The dataset had been freely available for years, downloaded thousands of times, and incorporated into derivative models before the exposure was documented.
Once student data enters AI training pipelines, it becomes functionally irretrievable. Models trained on compromised datasets do not simply "store" that data; they encode patterns derived from it. Even if the original dataset is deleted, the trained model retains statistical representations that can reproduce sensitive information under certain query conditions. Recent research on membership inference attacks has demonstrated that AI models can be probed to reveal whether specific individuals' data were included in their training sets, and in some cases, to reconstruct portions of that training data. For example, in November 2023, researchers successfully extracted several megabytes of ChatGPT's training data by using a simple attack that prompted the model to repeat words indefinitely, causing it to emit verbatim training examples, including personal email addresses, phone numbers, and contact information.
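As a rough illustration of the signal these attacks exploit, the sketch below implements a simple loss-threshold membership test in the spirit of the academic literature, not a reproduction of the ChatGPT extraction attack. The model, synthetic dataset, and threshold are illustrative assumptions, and the toy setup deliberately overfits so the leakage is visible.

```python
# Loss-threshold membership inference sketch: a deliberately overfit model
# scores lower loss on its training ("member") examples, and that gap alone
# lets an attacker guess who was in the training set. Synthetic data only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_member, y_member = X[:1000], y[:1000]   # used for training ("members")
X_out, y_out = X[1000:], y[1000:]         # never seen by the model

# Random forests memorize their training data, exaggerating the effect.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_member, y_member)

def per_example_loss(model, X, y):
    """Cross-entropy loss per example; members tend to score lower."""
    probs = model.predict_proba(X)
    return -np.log(np.clip(probs[np.arange(len(y)), y], 1e-12, None))

loss_in = per_example_loss(model, X_member, y_member)
loss_out = per_example_loss(model, X_out, y_out)

# Attack: guess "member" whenever the loss falls below a threshold
# calibrated on known non-members. Better-than-chance accuracy = leakage.
threshold = np.median(loss_out)
balanced_accuracy = ((loss_in < threshold).mean() + (loss_out >= threshold).mean()) / 2
print(f"membership inference accuracy: {balanced_accuracy:.2f}")  # well above 0.5
```

The defense-relevant point is that nothing in the model's normal interface needs to be broken: ordinary predictions are enough to betray who was in the training data.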
For PowerSchool victims, this means their compromised records may already be embedded in AI systems they will encounter throughout their lives, including credit scoring algorithms, employment screening tools, insurance underwriting models, and educational analytics platforms. The breach created not only a one-time exposure but also the potential for permanent incorporation into companies’ prediction infrastructure.
Third, AI enables synthetic identity fraud at scale. According to RCB Bank, synthetic identity fraud, in which real and fabricated information are combined to create new identities, has emerged as the fastest-growing form of financial crime, surpassing traditional credit card fraud and identity theft. TransUnion reports that fraudsters now use generative AI to stitch stolen data fragments into convincing synthetic identities, bolster them with deepfake documents, and deploy automated programs to submit thousands of fraudulent applications. The typical scheme starts with authentic personal data, often SSNs belonging to vulnerable groups, combined with fabricated names, addresses, and other details to form a new identity. Children are ideal targets: unlike adults with established credit histories and monitoring systems, they typically have clean records and no reason to check credit activity for years or decades.
A study from SentiLink, an identity verification firm, found that 97% of individuals whose names, addresses, and SSNs were exposed in data breaches and subsequently traded on dark web markets experienced attempted identity theft. For children, the detection lag can span a decade or more. The PowerSchool breach exposed SSNs for millions of students, many of elementary school age. By the time they apply for college loans, employment, or housing, fraudulent credit lines, tax filings, and criminal records may already be attached to their identities, built on data stolen in the breach.
What should have changed—and what must
One year after the PowerSchool breach, while some major statewide contracts (notably in North Carolina) have transitioned to competitors like Infinite Campus, many prominent US districts continue to use the platform. The breach should have catalyzed a fundamental reckoning with the premise underlying edTech's sprawling surveillance infrastructure: that exhaustive data collection about children inherently serves educational purposes and therefore warrants institutional approval by default. Instead, the incident has been treated as a company-specific security failure rather than a symptom of systemic overreach, a missed opportunity to question whether schools should be amassing such comprehensive digital dossiers on students in the first place.
Three governance interventions would meaningfully reduce both the likelihood and the impact of future breaches in edTech platforms, in the US and globally:
1. Impose enforceable data minimization and deletion requirements. Schools should collect only data that serves a specific, documented educational purpose; use it solely for that purpose; and retain it only as long as necessary before securely destroying it. Enrollment forms should clearly distinguish required fields from optional fields. SSNs should be collected only when legally mandated and segregated in systems with heightened access restrictions. Learning analytics and behavioral tracking data should be subject to retention limits and periodic audits. These requirements should be codified in federal law and enforced through regular compliance reviews, not treated as voluntary best practices. A minimal sketch of what such enforcement could look like in code follows this list.
2. Recognize student data governance as a children's rights issue, not merely a compliance framework. This requires rejecting the premise that schools can outsource responsibility for children's data to third-party vendors while claiming to act in students' interests. Vendor contracts should include enforceable security standards, data-use restrictions, breach-notification timelines, and deletion obligations. Students and families should have meaningful transparency into what data is collected, who accesses it, and when it is deleted. And critically, children should not bear the long-term consequences of institutional data governance failures they had no power to prevent.
3. Mandate security-by-default standards for any system handling student data. Multi-factor authentication, least-privilege access controls, encryption at rest and in transit, independent security audits, and breach detection systems should be non-negotiable baseline requirements, enforced through federal procurement rules and state contracting standards. These are not cutting-edge measures; they are decade-old best practices that have been optional in K–12 contexts for far too long.
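To make the first recommendation concrete, here is a minimal sketch of what collection-time minimization and automated retention enforcement could look like inside an SIS data layer. The field names, purposes, and retention periods are illustrative assumptions, not a statutory schedule.

```python
# Illustrative sketch: every field must declare a documented purpose and a
# retention limit, and anything undeclared is rejected at collection time.
# Field names, purposes, and retention periods are hypothetical.
from datetime import date, timedelta

RETENTION_POLICY = {
    "name":         {"purpose": "enrollment",      "retain_days": 365 * 7},
    "attendance":   {"purpose": "state reporting", "retain_days": 365 * 5},
    "lunch_status": {"purpose": "meal program",    "retain_days": 365},
    "ssn":          {"purpose": "legal mandate",   "retain_days": 365,
                     "restricted": True},  # segregated, heightened access
}

def collect(record: dict, collected_on: date) -> dict:
    """Admit only fields with a declared purpose (minimization)."""
    undeclared = set(record) - set(RETENTION_POLICY)
    if undeclared:
        raise ValueError(f"no documented purpose for: {undeclared}")
    return {"data": record, "collected_on": collected_on}

def purge_expired(stored: dict, today: date) -> dict:
    """Drop any field whose retention window has lapsed (deletion by default)."""
    age = today - stored["collected_on"]
    kept = {field: value for field, value in stored["data"].items()
            if age <= timedelta(days=RETENTION_POLICY[field]["retain_days"])}
    return {"data": kept, "collected_on": stored["collected_on"]}

record = collect({"name": "A. Student", "lunch_status": "free", "ssn": "..."},
                 collected_on=date(2024, 1, 10))
print(purge_expired(record, today=date(2025, 6, 1))["data"].keys())
# dict_keys(['name']): lunch_status and ssn exceeded their one-year windows
```

The point is architectural: when purpose and retention live in the schema itself, deletion becomes a routine, auditable operation rather than a policy document no one enforces.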
The PowerSchool breach was the predictable outcome of a system that prioritizes administrative convenience and vendor profitability over children's privacy and security. The breach exposed 62 million students to lifelong identity theft risk, embedded their personal information in AI training datasets, and demonstrated that even basic security controls remain optional in systems entrusted with children's most sensitive data.
Children cannot audit their school's data retention policies or negotiate the terms under which their information is collected and shared. That responsibility falls to the institutions and decision-makers who design, regulate, and profit from these systems: school boards, state legislators, federal agencies, and edTech companies themselves. The PowerSchool breach was a test of that responsibility. One year later, it has been collectively shirked.