Content Moderation, Encryption, and the Law
Justin Hendrix / Jul 16, 2023
One of the most urgent debates in tech policy at the moment concerns encrypted communications. At issue in proposed legislation, such as the UK’s Online Safety Bill or the EARN IT Act put forward in the US Senate, is whether such laws break the privacy promise of end to end encryption by requiring content moderation mechanisms like client-side scanning. But to what extent are such moderation techniques legal under existing laws that limit the monitoring and interception of communications?
Today’s guest is James Grimmelmann, a legal scholar with a computer science background who, along with Charles Duan, recently conducted a review of various moderation technologies to determine how they might hold up under US federal communication privacy regimes including the Wiretap Act, the Stored Communications Act, and the Communications Assistance for Law Enforcement Act (CALEA). The conversation touches on how technologies like server side and client side scanning work, the extent to which the law may fail to accommodate or even contemplate such technologies, and where the encryption debate is headed as these technologies advance.
What follows is a lightly edited transcript of the discussion.
James, I'm happy to have you back on the podcast, this time to talk about a paper that I believe is still in the works, Content Moderation on End to End Encrypted Systems: A Legal Analysis, with your co-author, Charles Duan.
I would love to just get you to, in your own words, say why it is you chose-- at this moment-- to set out to write this piece of work.
So this comes out of work that some of my colleagues at Cornell Tech have been doing. Tom Ristenpart, who's a computer scientist, and his group have been working on, let's call it online safety with the technologies people use now.
One branch of their work, which has been very influential, deals with securing people's devices in cases that involve intimate partner abuse. Those are cases where the threats are literally coming from inside the house and the abusers may have access to people's devices in ways that traditional security models didn't include.
Another major strand that Tom and his team have been working on has to do with abuse prevention in end to end encrypted systems. So encrypted messaging is where the message is scrambled in a way so that nobody besides the sender and recipient can read it.
Well, if you're sending that message through a server, through email or through a messaging system like Facebook Messenger or WhatsApp or Signal, then the question arises: is the message encrypted on its way from you to the Facebook servers and then from Facebook servers to its recipient, or is it encrypted in a way that not even Facebook can read it? If it's encrypted in a way that only you and the person you're sending it to can read it, and Facebook sees it as just an equally random string of gibberish, that's called end to end encryption.
And this has been promoted as an important privacy preserving technology, especially against government agencies and law enforcement that might try to surveil communications or have the big platforms do it for them. A challenge, however, with end to end encrypted messaging is that it can be a vector for abuse.
If the platform can't scan its contents, it can't look for spam or scams or harassment. If somebody sends you harassing messages through Facebook Messenger, you'll receive them, but Facebook's detectors won't know it. And if you try to report it to Facebook, then Facebook doesn't have direct evidence of its own that this was actually received through its platform.
It's open to potential false reports of abusive messaging. And so in that context, Tom and other computer scientists have been trying to find techniques to mitigate abuse. How can you report abusive messages to a platform? Or if you're a member of a group that uses encrypted communications for all members of the group, and some platforms do now have encrypted group chats, how can you and the other participants say, 'So and so is being a jerk in our community. We don't want further messages from them'? And so there’s this broad heading of computer science work on abuse mitigation in end to end encrypted communications.
That's a long background on a bunch of computer science stuff. I am here as the law talking guy. So my postdoc Charles and I (he, like me, has a background in computer science as well as law) have been working with the computer scientists on the legal angles to this.
And in particular, Charles and I have been asking, do these abuse prevention mechanisms comply with communications privacy law? There are laws that prohibit wiretapping or unauthorized disclosure of stored electronic communications. Do these techniques for preventing abusive communications comply with the various legal rules that aim to preserve privacy?
Because in many ways, it would be a really perverse result if people using a technology designed to preserve their privacy can't also use a technology that makes that messaging safe because they would be held to have violated each other's privacy. There's something very backwards about that result, but our communications privacy laws are so old that it takes a full legal analysis to be certain that this is safe to do.
So our draft, which is very long, goes through a lot of those legal details.
So I want to get into some of the questions that you pose, including some of the normative questions that you kind of address towards the end of the paper, which pertain to news of the moment questions around the Online Safety Bill in the UK, for instance, and the fight over encryption that's happening there, et cetera.
I do want to give the paper its due and go through what you've tried to do methodically on some level. But I do want to start perhaps with that last point you just made, which is this idea that these technologies, encrypted messaging apps, are a different generation of communications technology that the law didn't anticipate.
Is that broadly true in your view?
That's probably true. Our communications privacy laws were written, literally, with previous generations of technology in mind. It's called wiretapping because it applies to wire communications, which means a telegraph or a telephone that has a physical wire running ultimately from one person to the other.
And we still use that terminology. And there are still a lot of assumptions from older technologies baked into how the laws are written and the concepts that they use.
So let's talk about the spate of laws that you looked at here. You looked at the Wiretap Act; the Stored Communications Act; Pen Registers and Trap and Trace Devices; the Computer Fraud and Abuse Act; and the Communications Assistance for Law Enforcement Act, or CALEA, as some folks will know it. Are there other laws that perhaps you'll have to look at in the final analysis?
So we've been taking this paper around to conferences, and we got excellent feedback that we also need to address mandatory reporting laws around child sexual abuse material. Because those too impose certain obligations on telecommunications providers, or possibly participants, when they become aware of certain kinds of material, and so moderation techniques that could make them aware of those materials definitely trigger the obligations of those laws. I think it's ultimately the five you mentioned plus the CSAM laws.
So let's talk about the moderation approaches. And maybe it would be helpful for us to just go through them one by one. And in your words, if you can offer a description of what these technologies are.
Okay, let's start with message franking, which is really a technique designed to address the kind of scenario I mentioned to you before. You're using an end to end encrypted messaging system, and somebody sends you something abusive.
Pictures of their genitals, repeated messages saying, ‘I hope you die,’ something that you really don't want to receive. And the technical challenge that it's trying to solve is how do you make this reportable to the platform so the platform can help you without undermining the privacy guarantees of end to end encrypted messaging in the first place.
And the solution, which is incredibly ingenious technically, is to allow for a kind of verified reporting in which the recipient of a message can send a report to the platform that is provably based upon actual messages. The recipient can't forge the message and say, ‘Oh, this person sent me this abusive content,’ when they didn't actually send it.
So the sender is locked in. They are committed to anything that they send. And if the recipient decides it's abusive, they can report it. At the same time, the platform should learn nothing unless the recipient actually chooses to make a report, unless and until that person says, ‘I didn't want to receive this. This violates platform policies.’
Until then, the platform should be able to tell nothing about the message at all. And it turns out that by basically putting a couple of well designed electronic signatures on each message, you can design a system that does this. It's called message franking. The idea being like you frank a message with a stamp, and that rubber stamp, you know, carries all the information the platform and recipient will later need in case of an abuse report.
And I'm lumping forward tracing together with message franking because it's basically an extension of it. In forward tracing, if a message is reported as abusive, the platform can trace it back not just to the person who sent that specific message, but to everybody before them in a chain if it was forwarded, and that might be relevant.
If a message gets forwarded to somebody and they say, ‘This is actually, like, illegal material that I did not want to be involved with,’ the platform can then run it back to the original sender who introduced it to the network, which could be useful in rooting out somebody who is using it for abusive purposes.
So basically, it's a clever application of cryptographic techniques that have been invented in this millennium after all of the communications privacy laws we discussed were drafted.
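As a rough illustration of the franking flow described above, here is a minimal Python sketch. It is not Facebook's protocol or any deployed design: the function names, the `PLATFORM_KEY` secret, and the user identifiers are all hypothetical, and the end to end encryption of the message and franking key to the recipient is assumed to happen elsewhere. The sketch only shows the commit, stamp, and verify steps using HMAC as the commitment.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret known only to the platform (an assumption for this sketch).
PLATFORM_KEY = secrets.token_bytes(32)


def sender_commit(message: bytes):
    """Sender picks a fresh franking key and commits to the message.

    The message and franking key travel to the recipient inside the
    end to end encrypted payload; only the commitment is visible to
    the platform.
    """
    franking_key = secrets.token_bytes(32)
    commitment = hmac.new(franking_key, message, hashlib.sha256).digest()
    return franking_key, commitment


def platform_stamp(commitment: bytes, sender_id: str, recipient_id: str) -> bytes:
    """Platform binds the opaque commitment to routing data it already sees."""
    context = commitment + sender_id.encode() + b"|" + recipient_id.encode()
    return hmac.new(PLATFORM_KEY, context, hashlib.sha256).digest()


def platform_verify_report(message: bytes, franking_key: bytes, commitment: bytes,
                           stamp: bytes, sender_id: str, recipient_id: str) -> bool:
    """On a report, the recipient reveals the message and franking key.

    The platform checks that the commitment opens to this exact message
    and that the stamp is one it issued for this sender and recipient.
    """
    expected_commitment = hmac.new(franking_key, message, hashlib.sha256).digest()
    expected_stamp = platform_stamp(commitment, sender_id, recipient_id)
    return (hmac.compare_digest(expected_commitment, commitment)
            and hmac.compare_digest(expected_stamp, stamp))
```

Under these assumptions, the platform stores nothing about the plaintext and learns it only when the recipient chooses to open the commitment in a report; a recipient who alters the message text cannot produce a report that verifies.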
And which of the encrypted messaging apps that folks are familiar with at the moment are using this technique?
So it's basically research stage. Facebook is the one that is leading the way in terms of developing this technology. Their research arm was one of the original creators of one of the original message franking proposals. So they're the one that has invested the most in making this workable.
And of course, Facebook intends to make its Messenger encrypted by the end of the year, it's promised. So perhaps it's interested in doing so alongside the introduction of technologies like this. Let's talk about whether this comports with the various laws and frameworks that you've assessed. How does it stand up when you look back at the statute?
So this is an answer I'll probably give you repeatedly, which is, we think it's okay, but we're less certain about that than we would like to be.
So let's take the Wiretap Act. The Wiretap Act, as you might expect, prohibits intercepting electronic communications in a way that lets you learn their contents. And the classic case here is like the literal wiretap plugging into a phone cable. Or also connecting to a network box and just grabbing a copy of somebody's incoming email in flight as it arrives.
And it might seem like, well, there's no interception here because only when there's an actual abuse report made to the platform does the platform learn the contents of a message, but it's not quite that clean because the definition of contents in the Wiretap Act is quite broad. The statute defines it as any information concerning the substance, purport, or meaning of a communication.
And there's a non frivolous argument that this little franking tag, the little stamp that the platform applies to each message, actually does contain some information about the substance of the message. It does allow the platform to verify the message's authenticity, and there are courts that have expressed at least doubt about whether this kind of metadata that verifies a message's contents is in fact itself also contents. And if you go down that road, you wind up then asking a whole bunch of other statutory questions under the Wiretap Act. Is the participation of the platform in applying the franking tag to a message, as it gets sent through from sender to recipient, an interception under the statute? Again, textually, that's a hard question. And then perhaps most interestingly, and this one really opens up a thorny set of issues:
Should we think about the participants in this communication as having consented to this process? Should the sender of the message be able to say, ‘Wait a minute. I didn't consent to all of this cryptographic mumbo jumbo that you did when I sent a message. I did not consent to the steps necessary to verify me as the sender. I thought I was using a completely encrypted end to end messaging system. I did not agree to any of this.’
And from one perspective, this is a bad argument for a person sending abusive messages to make. But from another, they do have a point that this does not completely comport with the way that end to end encrypted messaging is used in the broad public discourse.
If you think of it as meaning no one besides you and the recipient can ever learn anything about your message, then this is a small inroad on the privacy guarantees of E2EE.
So we're going to come back to that last comment I think more than once as we go through this and perhaps we'll address it in the summary conversation as well because I think you might be able to say that about each of these things.
But next you go to server side automated content scanning. A lot of folks like to toss out this phrase, homomorphic encryption. I liked the somewhat artful description you have of this technique where the server learns nothing. I'll read it.
“Imagine a blindfolded chef wearing thick mittens who follows instructions to take things out of a box, chop them up, put them in the oven for an hour at 350 degrees, and then put it back in the box. This chef can roast vegetables for you, but doesn't learn whether you were roasting potatoes or parsnips.” It's a pretty good description, I suppose, of how this is supposed to work, technically.
Let's talk first, perhaps, about whether this technology works at all.
So, homomorphic encryption is another one of these really interesting modern developments in cryptography.
The idea is that you can perform a computation on some data without learning anything about the data. And this seems like a kind of pointless thing to do if it's just you working with your own data. But if you have some untrusted party who has a lot of processing capacity and you want them to do some work for you, it's actually quite valuable.
Like if the chef can run an efficient enough kitchen, we might all hand off our vegetables to them to do this for us. And in particular, homomorphic encryption could be used to scan content for matching against certain kinds of databases, like CSAM (child sexual abuse material) registries, or certain kinds of spam detection, without letting the person doing the scanning know that it has been scanned in that way.
And you might think, well, what's the point then? Well, you can modify the message being transmitted to flag it for the recipient, so that before you open that picture of somebody's genitals, you might get a warning saying, ‘The attached image appears to be of somebody's genitals. Do you wish to proceed?’ And that would actually be a meaningful anti abuse factor, that the server does this matching against a complicated model for you.
You don't have to have the whole huge database of these pictures on your device, and you might not be in a position to do it yourself easily. The platform can do this to help warn people about the messages that they're receiving.
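To make the "blindfolded chef" idea concrete, here is a toy Python sketch of computing on encrypted data. It swaps in the Paillier cryptosystem, which is only additively homomorphic, rather than the fully homomorphic schemes a real content scanner would need, and it uses tiny primes that offer no security at all. The names `P`, `Q`, `encrypt`, `decrypt`, and `server_add` are illustrative assumptions, not part of any system discussed in the conversation.

```python
import math
import secrets

# Toy key material: tiny primes chosen for readability only (insecure by design).
P, Q = 293, 433
N = P * Q                      # public modulus
N_SQ = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # private: lcm(P-1, Q-1)
MU = pow(LAM, -1, N)           # private: inverse of LAM modulo N (Python 3.8+)


def encrypt(m: int) -> int:
    """Encrypt an integer 0 <= m < N using the public modulus and generator N + 1."""
    while True:
        r = secrets.randbelow(N - 1) + 1   # random blinding value in [1, N-1]
        if math.gcd(r, N) == 1:
            break
    return (pow(N + 1, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ


def decrypt(c: int) -> int:
    """Decrypt with the private values LAM and MU."""
    return ((pow(c, LAM, N_SQ) - 1) // N * MU) % N


def server_add(c1: int, c2: int) -> int:
    """What the untrusted server does: combine two ciphertexts.

    Multiplying ciphertexts adds the hidden plaintexts, and the server
    never sees either input or the result in the clear.
    """
    return (c1 * c2) % N_SQ
```

In this sketch the key holder could call `decrypt(server_add(encrypt(20), encrypt(22)))` and recover the sum, even though the party running `server_add` only ever handled ciphertexts. Matching content against a database requires far richer computations than addition, which is part of why the approach is slow in practice.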
Is this a legal technology, at least according to the laws that you reviewed?
Again, we think it's legal, but we're not as certain as we would like to be. Take the Wiretap Act analysis. The platform can do things that manipulate the message. Once again, we're in that world of asking, is it receiving contents? Here, the argument against liability depends, I think, on some of the exceptions to Wiretap Act liability that the Act includes in it.
So, for example, the Wiretap Act has this exception for the ordinary course of business, in which platforms can inspect messages as part of their ordinary operations, and platforms routinely do spam detection and antivirus scanning on our message attachments already. So this seems to fit within the class of things that they already do.
The analysis under the other statutes is also pretty good. One of the nice things about this kind of encryption is that platforms don't retain any information once they do the processing. They send it out, it leaves their system. That means that they are not retaining the kinds of stored communications that could trigger the Stored Communications Act.
We like it. We would like this to be legal. We think it is. We don't have 100% certainty.
And is it the case, based on your review, that this technology is still fragile, still unlikely to work at scale?
It's not scalable currently. Ordinary computation is fast. Applying and removing encryption is reasonably fast.
Homomorphic encryption is kind of slow. The work you have to do in order to compile your computation down into the kind of thing you can do blindfolded with mittens on makes it a lot less efficient. It's not surprising. Anything you do wearing thick, heavy gloves is going to be a lot less effective because you can't feel what you're doing.
And so it's not a scale worthy technology yet, but it's plausible enough that it might be that it's worth thinking in advance about its legality.
So next we'll talk about what is, you know, perhaps the most discussed potential form of content moderation for encrypted messaging apps these days, client side automated content scanning.
Of course, Apple proposed one such system. Apparently the UK Home Office is funding the development of prototypes in this space, perhaps in anticipation of the potential passage of the Online Safety Bill there. How does client side scanning work? Do you have another cooking metaphor that could explain this one to us?
Not quite as elegantly. In client side scanning, the client that you are using to send messages, so the Facebook Messenger app or the Signal app or Apple's messaging app, would perform some kind of computation, some check of your content on the device before it's sent or when it's received, and the scanning then can flag, either for the user or for some external authority, whether it matches against some database of concerning communications.
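A heavily simplified Python sketch of that on-device check follows. It assumes exact SHA-256 matching against a local set of digests and reports only a yes or no to the local caller. Proposed systems typically rely on perceptual hashing and on private matching protocols that hide the database from the client, none of which is modeled here; `BLOCKLIST`, `digest`, and `scan_before_send` are hypothetical names.

```python
import hashlib


def digest(content: bytes) -> str:
    """Fingerprint content so the list can hold digests rather than the content itself."""
    return hashlib.sha256(content).hexdigest()


# Hypothetical on-device list of digests of known concerning content.
BLOCKLIST = {digest(b"known abusive attachment")}


def scan_before_send(content: bytes) -> bool:
    """Return True if the outgoing content matches an entry on the list.

    Who receives this result (only the user, the other party to the
    conversation, or an outside authority) is a policy decision that
    sits outside this function.
    """
    return digest(content) in BLOCKLIST
```

The privacy and legal questions turn less on this matching step than on where the returned flag is sent and on who controls what goes into the list.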
And is it legal?
This gets really complicated, in part because of the diversity of these systems. There are a lot of different architectures. Some of them involve trying to scan against databases without revealing to the client what's in the database. Because, you figure, if it's a database of prohibited content, you can't just give everybody a complete copy of the things you're not supposed to have.
And also because they involve communications. That is, if I'm trying to query what I've got on my device against some database of things, it may involve sending a digest of what I've got out to the network and back. And does that process constitute an interception? This brings us back to the same kinds of questions we asked when we were doing message franking.
Have I, as the user of this app, consented to have my data scanned in this way? And possibly to have some flag about its status being sent to the third party who's providing this app? Again, this is a hard question. I don't think you can answer it fully on the technical side. You can't just say, ‘Well, because this app works this way and you ran the app, you consented to it.’
That same argument would say you consented to spyware on your phone. But you also can't just say, ‘Well, I didn't want this, so there's no consent.’ At some point, people have to know how the software they've chosen to run, and that's been explained to them, works, or we have, you know, serious computer law violations every time anybody is surprised by an app feature.
So it's going to be very fact dependent in a slightly uncomfortable way.
You've mentioned there's some variability in terms of how these client side scanning schemes work. Are there versions of client side scanning that you are more comfortable with than others?
Are there those that you've seen that you would regard as, you know, potentially spyware or very concerning from a privacy standpoint and ones that perhaps, I guess, are a little more responsible?
I mean, the obvious dividing line here is a client side app that reports the results out to a third party versus one that merely reports it to parties to the communication.
That is, I might very well as a recipient want to have had the sender's device do a client side scan and have a cryptographic certification that it didn't include stuff in this abusive database. I could see that, and if that's not revealed to anybody outside the communication, it seems reasonably privacy friendly.
If it's scanning against the government provided database of terrorist supporting content, or the kinds of safety concerns that the UK Home Office would like to be monitoring for, that's a bigger intrusion on privacy. Now, it may be that the particular things on this list are particularly concerning, but you get into the fact that this is scanning your messaging for reporting out to the government, and you get into serious questions about the transparency of the process by which things are added to that database.
And so you really can't assess the privacy implications without having a larger conversation about the institutional setting.
Meredith Whittaker, who's president of Signal, was just on this podcast a few weeks ago. She recently said, “Client side scanning is a Faustian bargain that nullifies the entire premise of end to end encryption by mandating deeply insecure technology that would enable the government to literally check every utterance before it is expressed.”
I find myself kind of fundamentally as concerned as Meredith about the idea of client side scanning, the idea of the introduction of, you know, some mechanism onto the device that essentially obviates the purpose of having encrypted communication. How do you feel about that? Having done this review?
It's complicated because the development of these computer science client side scanning techniques complicates what has previously been an easier binary. If your choice is we have communications privacy and the government cannot read our communications technically, or we don't and they can read everything we're sending, that's a very easy normative choice to make.
Privacy against overwhelming surveillance and against the enormous power of the government is clearly worth preserving. What client side scanning does is it blurs that line a bit. It says the government isn't learning everything you say. It's only learning about certain specified kinds of communications.
And if you're not on that list, you not only have nothing to fear, you literally have revealed no information to the government, because all the government learns is that this message did not match what was on the blacklist. So it's an intermediate case. Now, you could be seriously opposed to that case on the grounds that this is still a terrible situation involving unaccountable power. You might say, if we don't know with verifiable audits what's on that list, if the government can arbitrarily add things to that list and flag it, then in fact we're in a situation that's almost as bad as the first one, because they can use it to root out subversion or criticism of the government, and they can use it in really abusive ways that we could not detect.
But then you have computer scientists saying, ‘Well, if that's the concern, we need to have technical verification. We need to have the government or third parties commit to what's in the database that they're scanning for. And we can use cryptography to do that.’ And at that point you have this ongoing back and forth about negotiating over the exact details. I think, not unreasonably, a lot of privacy advocates say we cannot go down that road.
If we argue about the details, then we've conceded some essential point that this kind of surveillance is ever okay, and therefore we have to cut it off at the pass. We must not allow that.
You write in the paper, “a larger lesson that arises from our legal analysis is that these technologies challenge the notion of what end to end encryption is in the first place.”
I suppose that in many ways does sound quite a lot like Meredith Whittaker.
Yeah. And I think there is a view that you see sometimes in the encryption debates that any compromise on end to end encryption is impossible, and that it's just math, that asking for government backdoors, or key escrow, or any of the other many schemes that have been proposed is fundamentally incompatible with end to end encryption, and the government is asking for something that technically, technologically cannot be done.
And I agree with that to the extent that, a lot of times, when the government says, “We're not asking for a backdoor. We're just asking for authorized access,” that is a meaningless distinction. They're asking for something that, in practice, gives them the ability to read messages when they want to.
But I don't think the argument for “it's just math, and this is impossible” holds up when there are computer scientists producing work that produces intermediate positions between unbroken complete encryption and no encryption at all. Computer scientists are exploring that space in interesting and complex ways.
And you can say this particular compromise is a bad one, or you can say we should not do this work at all because it leads us down a dangerous road, but I don't think you can say it doesn't work. The computer scientists are proving that in fact it does.
So where do you think this is headed? You know, you're suggesting, I suppose, that the technology is going to evolve. We may see some other breakthrough down the line, or we may begin to evaluate, say, client side scanning schemes against one another and look at the potential of one system versus another, both in terms of its ability to be executed, but also in terms of, you know, whether it preserves privacy in some maximal way.
What do you see happening with this kind of dance between the law and the technology over the next five, ten years?
I think these techniques are going to continue to develop because the constituency for them is not just government snoops looking to surveil communications. The constituency is also internet users and communities who want online safety from other abusive users.
And so people are looking for techniques that are more privacy preserving than having centralized administrators or communications that are all in the clear. So the original motivation is, how can we add abuse prevention features to systems that have most of the privacy guarantees we associate with end to end encrypted messaging?
And then once you have those technological developments, I think we can have a very reasonable debate about which of them are normatively reasonable trade offs to have. And we don't have to have the same answer for all of these technologies. I think in many respects, message franking is a pretty good set of technologies for a baseline for helping communities self-moderate and helping individuals protect themselves.
I think client side scanning in particular is a much harder genie to bottle. There are in some ways more moving parts, more trust required. And so I'm more skeptical of approaches there. I think they offer less bang for the buck. At the end of the day, you have to have these conversations about the specific technologies that could be deployed.
We know there's a lot at stake here. Just a couple weeks ago, there was a piece in the New York Times about how Russia is essentially developing a sort of surveillance supply chain, looking for any scrap of metadata it can collect from the use of encrypted apps like WhatsApp and Signal in order to monitor location, look at perhaps the groups that people are in and who they communicate with.
We know there are many other governments around the world that are probably doing very similar things. Do you see any connection between the legal landscape in the US and the safety of folks abroad who are trying to use these applications? Is there a connection between what is sort of possible in the kind of techno-legal swamp or stew that we're in here in the U.S. and implications for people abroad?
There are all kinds of connections. One of them in particular is that we should be extremely careful in the United States that we don't trigger CALEA around the design of the systems. Because once CALEA is in play, you have to build the system with pretty pervasive wiretapping capacity built in.
That's essentially what all of the more repressive governments are asking for. That's a fundamental compromise. So the United States really does not want to end up in that box, because it basically gives authoritarian governments everything they're asking for. Beyond that, a lot of the things that I want to see are what will make encrypted messaging thrive.
It has to be legal, of course, but it also has to be usable for people. And there's a really fine trade off here between having appropriate safety features and hiding information from governments and other surveillors who want to do harm. In fact, it's a very similar conversation about how you protect people from targeted surveillance and abuse.
And this is a thing that the encrypted messaging technologists at companies from Facebook to Signal think about. They're asking, what are the threat models? Who are the actors who are trying to surveil our users in various ways, or subject them to abusive communications that deny their ability to effectively use these technologies?
Because if the technologies are such cesspits that people cannot feel safe on them, they won't use them, and they will turn to other technologies that protect them more against the immediate threat but may protect them less against pervasive governmental surveillance.
This is one of these papers where we just set out at the beginning to go and do the legal analysis and see where it led. And other than the fun of engaging pretty deeply with a bunch of very technical statutes, which I think is fun for only Charles and me and a very small group of people, it was illuminating to us the degree to which we have all of these different communications laws that have very similar concepts using different terms. We have multiple different definitions of contents versus metadata across the statutes, and there is honestly no good reason to have that much complexity in our communications privacy laws.
Of the bills that are commonly considered to present a challenge to encryption in the US at the moment, things like the EARN IT Act, is there any bearing of your analysis on how those bills are written or how they would potentially be enacted?
Basically all of those bills that I've seen are bad ideas, but you don't need our paper to tell you that.
That's probably a good place to leave it. James Grimmelmann, thank you very much.
It's been a pleasure.