Generative AI, Free Speech, & Public Discourse: Why the Academy Must Step Forward

Tim Bernard / Feb 23, 2024

"Generative AI, Free Speech, & Public Discourse," Columbia University, February 20, 2024.

On Tuesday, Columbia Engineering and the Knight First Amendment Institute at Columbia University co-hosted a well-attended symposium, “Generative AI, Free Speech, & Public Discourse.” The event combined presentations about technical research relevant to the subject with addresses and panels discussing the implications of AI for democracy and civil society.

While a range of topics were covered across three keynotes, a series of seed funding presentations, and two panels—one on empirical and technological questions and a second on legal and philosophical questions—a number of notable recurring themes emerged, some by design and others more organically:

  • The importance of interdisciplinary study
  • A critical role for academia
  • Recommendations for government regulation
  • Parallels and overlap with social media governance
  • The transitory nature of the generative AI paradigm shift

Interdisciplinary study

This event was part of an effort that Columbia University president Minouche Shafik and engineering school dean Shih-Fu Chang referred to as “AI+x,” in which the engineering school seeks to engage various other parts of the university to better explore the potential impacts of current developments in artificial intelligence. (It was also part of Columbia’s Dialogue Across Difference initiative, established in response to campus tensions over the Israel-Gaza war.) From its founding, the Knight Institute has focused on how new technologies affect democracy, work that requires collaboration with experts in those technologies.

Speakers on the first panel highlighted sectors where they have already seen AI’s potential for positive societal impact, beyond the speech issues on which the symposium was focused. These included climate science, drug discovery, social work, and creative writing. Columbia engineering professor Carl Vondrick suggested that current large language models are optimized for social media and search, a legacy of their creation by corporations focused on those domains, and the panelists noted that only by working directly with diverse groups can those groups’ needs for more customized models be understood. Princeton researcher Arvind Narayanan proposed that domain experts play a role in evaluating models because, in his view, the current approach of benchmarking with standardized tests is seriously flawed.

During the conversation between Jameel Jaffer, director of the Knight Institute, and Harvard Kennedy School security technologist Bruce Schneier, the two discussed general principles for successful interdisciplinary work: humility, curiosity, and listening to one another; gathering early in the process; making sure everyone is taken seriously; and developing a shared vocabulary to communicate across technical, legal, and other domains. Jaffer observed that proposals carry far more credibility with policymakers when they are interdisciplinary. Cornell Tech law professor James Grimmelmann, who specializes in helping lawyers and technologists understand each other, remarked that these two groups are particularly well-equipped to work together once they figure out what the other needs to know.

Academia and AI

President Shafik declared that if a responsible approach to AI’s impact on society requires a “+x,” Columbia (surely along with other large research universities) has “lots of xs.” This positions universities as ideal voices for the public good, to balance out the influence of the tech industry that is developing and controlling the new generation of large language models.

Stanford’s Tatsunori Hashimoto, who presented his work on watermarking generative AI text outputs, emphasized that the vendors of these models are secretive, and so the only way to develop a public technical understanding of them is to build them within the academy, and take on the same tasks as the commercial engineers, like working on alignment fine-tuning and performing independent evaluations. One relevant and striking finding by his group was that the reinforcement learning from human feedback (RLHF) process tends to push models towards the more liberal opinions common amongst highly-educated Americans.

The engineering panel developed a wishlist of infrastructure resources that universities (and others outside the tech industry) need in order to study how AI can be used to benefit rather than harm society: compute resources, common datasets, separate syntax models to which vetted content datasets can be added for specific purposes, and student access to models. In the second panel, Camille François, a lecturer at the Columbia School of International and Public Affairs and presently a senior director of trust & safety at Niantic Labs, highlighted the importance of having spaces, presumably including university events such as this one, to discuss how AI developments are affecting civil discourse. On a critical note, Knight Institute executive director Katy Glenn Bass pointed out that universities often do not value cross-disciplinary work as highly as conventional research, an obstacle to progress given how essential collaboration across disciplines is in this area.

Government and regulation

Proposals for regulation were made throughout the symposium, a number of which are listed below, but the keynote by Bruce Schneier was itself an argument for government intervention. Schneier’s thesis was, in brief, that corporation-controlled development of generative AI has the potential to undermine the trust that society needs to thrive, as chatbot assistants and other AI systems may present as interpersonally trustworthy, but in reality are essentially designed to drive profits for corporations. To restore trust, it is incumbent on governments to impose safety regulations, much as they do for airlines. He proposed a regulatory agency for the AI and robotics industry, and the development of public AI models, created under political accountability and available for academic and new for-profit uses, enabling a freer market for AI innovation.

Specific regulatory suggestions included:

  • Transparency, specifically:
    • Quantitative data about how commercial models are being used (e.g. by foreign adversaries)
    • Special access for researchers and journalists, including interaction with the models without the weak outer guardrails against abuse
    • Data set transparency
  • Safe harbor laws to protect good-faith researchers who breach terms of service
  • Explainability requirements
  • Determination of mandatory ethical standards
  • Requiring non-discriminatory licensing to prevent model owners from privileging their own products and those of close partners

A couple of cautions were also voiced: Narayanan warned that the “Liar’s Dividend” could be weaponized by authoritarian governments to crack down on free expression, and François noted the focus on watermarking and deepfakes at the expense of unintended harms, such as chatbots giving citizens incorrect voting information.

Social media

There was surprisingly little discussion during the symposium of how generative AI specifically influences public discourse, which Jaffer defined in his introductory statement as acts of speaking and listening that are part of the process of democracy and self-governance. Rather, much of the conversation was about online speech generally, and how it can be influenced by this technology. As such, an earlier focus of online speech debates, social media, came up a number of times, with clear parallels in terms of concern over corporate control and a need for transparency.

Hashimoto referenced the notion that social media causes feedback loops that greatly amplify certain opinions. LLMs can develop data feedback loops which may cause a similar phenomenon that is very difficult to identify and unpick without substantial research. As chatbots become more personalized, suggested Vondrick, they may also create feedback on an individual user level, directing them to more and more of the type of content that they have already expressed an affinity for, akin to the social media filter bubble hypothesis.

Another link to social media was drawn in the last panel, during which both Grimmelmann and François drew on their expertise in content moderation. They agreed that the most immediate danger to discourse from generative AI is inauthentic content and behavior overwhelming the platforms we rely on, and worried that we may not yet have the tools and infrastructure to counter it. (François described a key tension between the “Musk effect,” pushing disinvestment in content moderation, and the “Brussels effect,” encouraging a ramp-up in on-platform enforcement via the DSA.) At the same time, trust and safety approaches like red-teaming and content policy development are proving key to developing LLMs responsibly. The correct lesson to draw from the failures to regulate social media, proposed Grimmelmann, is the danger of giving up on antitrust enforcement, which could be of great value now that AI foundation models are developed and controlled by a few (and in several cases the same) corporations.

The transitory nature of the generative AI paradigm shift

One final theme was a framing of the current moment as one of transition. Even though we are now grappling with how to adapt to realistic, readily available synthetic content at scale, there will be a point in the future, perhaps even for today’s young children, when this will be intuitively understood and accounted for, or at least when media literacy education or tools (like watermarking) will have caught up.

Several speakers referenced prior media revolutions. Narayanan was one of several who discussed the printing press, pointing out that even it was seen as a crisis of authority: no longer could the written word be assumed trustworthy. Wikipedia was cited by Columbia Engineering professor Kathy McKeown as an example of a medium that was initially seen as untrustworthy, but whose benefits, shortcomings, and suitable usage are now commonly understood. François noted that use of generative AI is far from binary and that we have not yet developed good frameworks to evaluate the range of applications. Grimmelmann mentioned both Wikipedia and the printing press as examples of technologies where no one could have accurately predicted how things would shake out in the end.

As the Knight Institute’s Glenn Bass stated explicitly, we should not assume that generative AI is harder to work through than previous media crises, or that we are worse equipped to deal with it. However, two speakers flagged that the tech industry should not be given free rein: USC Annenberg’s Mike Ananny warned that those with vested interests may attempt to prematurely push for stabilization and closure, and that we should treat such efforts with suspicion; and Princeton’s Narayanan noted that this technology is producing a temporary societal upheaval whose costs should be distributed fairly. Returning to perhaps the dominant takeaways from the event, these comments again implied a role for the academy and for government in guiding the development of, adoption of, and adaptation to the emerging generation of generative AI.


Tim Bernard
Tim Bernard is a tech policy analyst and writer, specializing in trust & safety and content moderation. He completed an MBA at Cornell Tech and previously led the content moderation team at Seeking Alpha, as well as working in various capacities in the education sector.