Generative Language Models and Social Progress: Concepts and Considerations

Scott Timcke / Jan 24, 2023

Scott Timcke is a Senior Research Associate at Research ICT Africa, an African think tank based in Cape Town, where he leads the Information Disorders and AI Risk & Cybersecurity Projects.

OpenAI's ChatGPT research interface was released last year. Composite image/Tech Policy Press

On November 30, 2022, the California start-up OpenAI publicly released ChatGPT. ChatGPT is a prototype chatbot interface that leverages a large language model to generate answers to prompts provided by users. Enthusiasm over the human-like responses of ChatGPT in niche online communities led to the software becoming an internet and media sensation. In less than a week the tool surpassed 1 million users. Use cases multiplied and commercial prospectors began to think of ways to harness this software to reduce overheads, induce sales or create new kinds of businesses.

While artificial intelligence (AI) and society researchers have cautioned about proverbial public road-testing of immature technologies, the cascade of proclaimed easy breakthroughs creates pressures for people to adopt ChatGPT lest they lose out on first-mover advantages. Proponents provide standard promises of refinement later, although experience tells us that “if an unreliable linguistic mash-up is freely accessible, while original research is costly and laborious,” future innovation will be driven by the “bad money.”

This narrative about the power of generative AI models shares features with other technologies that have followed the hype curve. Yet, where enthusiasm for blockchain-based technologies like crypto and non-fungible tokens (NFTs) appears primarily about creating a new class of assets, ChatGPT and other generators are promoted as poised to “diversify and intensify the creative powers of human labor,” perhaps even to the extent that we are “on the verge of qualitatively transforming the way in which we work, play, and live together,” to invoke Richard Barbrook and Andy Cameron’s 1996 essay, The Californian Ideology.

Although they wrote before the dot-com crash at the turn of the century, Barbrook and Cameron were prescient observers of how extraordinary capital investment into information and communication technology – guided by optimistic beliefs about private gains and public benefit – created conditions ripe for rapid, iterative product development. For example, Microsoft is notorious for shipping unfinished software packaged with its Windows operating systems, with the result that early users effectively act as unpaid software testers for the multibillion-dollar company. (Microsoft, notably, is betting big on OpenAI.)

Captured in adages like “fail fast” or Mark Zuckerberg’s directive to “move fast and break things,” rapid iterative development in and through the public was deemed an acceptable business risk – a necessity even in lean manufacturing thought. The first benchmark was “good enough.” Fixes would follow if the technology proved viable. Returning to ChatGPT, Sam Altman, a co-founder of OpenAI, is forthcoming about the limitations of the technology: “It’s a mistake to be relying on it for anything important right now. It’s a preview of progress; we have lots of work to do on robustness and truthfulness,” he tweeted last month. But a narrow focus on the practical reliability of generative models may distract from more foundational critiques that have immediate impacts on the prospects for collective political life.

Writing in the Age of Computational Reproduction

Perhaps in part because of the close association between the US software industry and US universities, there is an interest in how ChatGPT might shape knowledge production and other immaterial work. As the release of ChatGPT coincided with the end of the US fall semester, there was much discussion about how generative language models could alter essay assignments in the humanities and social sciences by lowering the threshold for fabricated writing and cheating more generally. More dramatically, Stephen Marche declared that “the college essay is dead,” a cry that is reminiscent of claims about the potential negative impact of Wikipedia decades prior. Academic publishers are also thinking about the potential ramifications for their routine practices.

Within the wider software development community, Stack Overflow – a website for programmers to ask and answer software-related questions – has temporarily suspended ChatGPT-generated submissions. This is mostly because the code ChatGPT produces is not particularly good; “the average rate of getting correct answers from ChatGPT is too low.” As in the academic community, developers worry both about the devaluation of the service if junk answers flood the site and about how generative language models threaten human communication by drowning it in nonsense. Given the centrality of Stack Overflow for software production around the world, junk answers have knock-on effects for safe, effective production. There is wisdom to the decision taken by the Stack Overflow moderators.

Admittedly, much of what appears to be an exaggerated reaction in higher education is closely bound up in a nested conflict over the appropriate role of faculties that are frequently and readily dismissed for not being sufficiently vocationally oriented. For some, ChatGPT illustrates how science, technology, engineering and mathematics (STEM) fields have made the humanities obsolete. Yet as the case of Stack Overflow underscores, acquired judgement remains a keystone in trusted professional communities, the very kind of disposition the humanities seek to cultivate.

Even people pointing out the best examples of use cases are creating a selection effect – encouraging a skewed perception of the baseline utility of generative AI. This effect is magnified when these selections are shared on social media platforms designed to cater to popular engagement, not quality. Due to this social influence, mediocre ideas will dominate in the short term. This is simple reproduction, in which aesthetic instruments are reified as pure epistemological inquiry.

That the text outputs of ChatGPT seem plausible is another matter, albeit of a different kind that speaks to qualitative aspects of social change. As researchers point out, generative models are prone to elementary errors in fact, logic, and reasoning. Indeed, the output of these systems “sounds right even when it is making things up.” Perhaps this is because our expectations around writing have greatly changed. A serious, impartial style is thought to be substantive, even if there is little substance to the text itself. Whether from the rise of the administrative state leaving people disenchanted, or from the proliferation of legalistic hedged disclaimers, there is little room left for poetics in formulaic writing. When last did you read a policy brief that left you impressed with the author’s style?

Eclipse of Reason

One core problem with large, generative language models is that they are flawed in design and compromised during data collection. In addition to inherent inaccuracies from sampling effects due to the costs of working at this scale, typically only certain economically powerful countries and multinational corporations can fund the development of these models. Well-documented inequalities associated with institutional prejudice and social filters shape who can work with these products, as well as the tacit assumptions that guide decision-making. The datasets are compromised in many other opaque ways too, especially when it comes to presumed ideological impartiality. Many of these kinds of issues have been identified in the various proprietary algorithmic risk-assessment tools deployed in US courts.

Presuming that most of these issues can be adequately addressed with inputs curated, ranked and vetted with justice foregrounded, generative large language models are still little more than sequence predictors. As Gary Marcus has cautioned, generative AI platforms are “unreliable and could create an avalanche of misinformation” because in a real sense these systems are a “pastiche” of prior human linguistic actions. “It merely predicts the statistically most probable next word in a sentence based on its training data, which may be incorrect,” writes Edward Ongweso Jr.
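The idea of a "sequence predictor" can be made concrete with a deliberately tiny sketch. The toy below is an assumption-laden illustration, not how ChatGPT is built: it counts word pairs in a made-up corpus rather than training a neural network, and the names train_bigram_counts and predict_next are hypothetical. It shows only the point made above, that the most probable next word reflects frequency in the training data, not truth.

```python
from collections import Counter, defaultdict


def train_bigram_counts(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    candidates = follows.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


# Made-up training text (an assumption for illustration): the false
# claim appears more often than the true one.
corpus = (
    "the moon is made of cheese . "
    "the moon is made of rock . "
    "the moon is made of cheese ."
)

model = train_bigram_counts(corpus)
print(predict_next(model, "of"))  # the more frequent follower wins
```

Because "cheese" follows "of" more often than "rock" in this invented corpus, the predictor prefers it. The output is statistically likely and factually wrong, which is the concern the critics quoted above are raising at a much larger scale.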

Recombinatory sequences of words may have trace elements of what Jacques Derrida called a human signature in his book, Of Grammatology – there is, after all, a labor process that created these words – but they lack the full intent, context, and meaning of the texts that subsequently came to be incorporated into the datasets used in the modeling process. For that matter, there are limits of sympathetic critique when language models are so disconnected from speech acts and situations. Ultimately, “AI reinforces a different definition of what reality is and how it can be understood.”

On the issue of making reality, there is an entire labor force that trains generative AI models to filter out violence, hate speech, and gross abuse. To make ChatGPT safer for users, OpenAI outsourced this task to a firm in Kenya, with senior data labelers earning $2 per hour at most. These workers had to repeatedly encounter “situations in graphic detail like child sexual abuse, bestiality, murder, suicide, torture, self harm, and incest,” often to the detriment of their own health. Much like call center workers across the world, there is an entire cheap labor regime used to create the circuits of global capitalism even while tech companies discourage inquiries about how AI might depend on constant human touch. There is no financial incentive to do otherwise; the valuations of these billion-dollar companies rest on the denial of any substantial human labor beyond the initial coding.

Generative models are confined to the interplay of text, parameters and associations in their datasets. The premise is that current data about the world is sufficient to understand the world in the future. This is a grave error that can lead to an unhelpful mixture of mimesis, regurgitation and the uncanny valley of meaning. It also leaves little room to account for the role of the not-yet-known, or the unknowable. In that respect, the technology could be said to favor conservative temperaments as generative models encode the past assumptions of (and preferred readings by) the powerful as a kernel for their outputs. Historical inequalities are already an evident problem in training datasets used for AI, including generative language models like ChatGPT.

This is another reason why it is an error to reduce patterned prejudice to matters of implicit bias and other kinds of cognitive effects. Doing so allows people to claim that changing their language will lead to equality. But the primary cause of contemporary social inequalities is the set of hierarchies established by a private property regime that legally justifies the domination of the many by a few. Injustices cannot be remedied through utilitarian inclusion. If ChatGPT is really a “preview of progress” as Altman claims, then it is important to evaluate the subtle ideological properties that give rise to generated text that works against the possibility of establishing a post-capitalist – or perhaps simply more just – society.


Scott Timcke's primary area of expertise is in democratic development policy, industrialization, and the role of the state...