Assessing the safety risks of software written by artificial intelligence

Cooper Raterink / Jul 14, 2021

One of the most intriguing developments in artificial intelligence (AI) is the advance of systems that can write code, including tools that make “autocomplete” suggestions to developers. Such tools are popular because they can help to save time and reduce errors in the software development process. But there are inherent risks in automating software development that can create cascading problems.

Over the past two weeks, OpenAI, an AI R&D firm, and the software giant Microsoft announced the launch of Github Copilot, a code autocomplete plug-in for Microsoft Visual Studio Code, a tool for editing source code and building software applications. The team of researchers behind the project published a 35-page paper describing Codex, the AI model behind the autocomplete tool. A section of the paper describes OpenAI’s investigations into the broader social impacts of releasing such a model. The findings help answer two questions: What are the novel safety risks posed by large code generation models like Codex? And which risks do they inherit from the large language models they are derived from? Viewed from an AI safety perspective, Codex is a compelling example of how a relatively straightforward extension of existing AI technology can have new and distinctive implications for society.

Codex, the Transformer-based language model underpinning Copilot, was derived from a 12-billion-parameter GPT-3 language model fine-tuned on over 100 million lines of Python code publicly available on Github. (The Codex model described in the paper is not precisely what is deployed in Copilot, but it is very similar.) When allowed to generate 100 candidate solutions per problem, the best resulting Codex model produced correct Python code for over 70% of the 164 basic programming challenges the OpenAI team crafted to evaluate it; OpenAI released these challenges as the HumanEval dataset. On this benchmark, Codex scores more than 40 percentage points higher than other state-of-the-art systems, such as EleutherAI’s GPT-J and the code autocomplete tool from industry provider TabNine. Codex and its productionized form, Github Copilot, represent a significant leap forward in AI code generation quality.
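The “over 70%” figure depends on how many attempts the model is given. The Codex paper reports a pass@k metric: a problem counts as solved if at least one of k sampled programs passes its unit tests, and the paper estimates this from n ≥ k samples, c of which pass, as 1 − C(n−c, k)/C(n, k). The Python sketch below implements that formula; the function names and the three-problem sample data are hypothetical, chosen only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: samples generated, c: samples that passed the unit tests,
    k: number of attempts the metric allows (k <= n).
    """
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    # 1 minus the probability that a random size-k subset has no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over a benchmark, given (n, c) for each problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical outcomes for three problems, 10 samples each:
# none pass, one passes, all pass.
results = [(10, 0), (10, 1), (10, 10)]
print(mean_pass_at_k(results, k=1))
print(mean_pass_at_k(results, k=10))
```

The same samples score much higher under pass@10 than under pass@1, which is why a headline success rate is only meaningful alongside the number of attempts it assumes.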

For developers, Codex’s capabilities will most likely speed up code writing and may lower the barriers to trying new programming languages and tools. However, the widespread introduction of code generation tools into software workflows, if implemented imprudently, poses certain risks. Some of these risks echo those around language models; for example, there are legal and ethical concerns about the fair use of publicly available open-source code. (In the case of Codex, Github experiments indicate very little chance of code duplication, so although the tool has upset some open-source developers, the legal risk is probably low given how copyright law is written.) Other risks are fundamentally similar but have unique considerations that deserve highlighting:

1. Code generation models can proliferate harmful social biases, especially allocational biases. OpenAI researchers found that, when prompted to return the gender of an input variable, the model assumed gender was a binary trait; similarly, the model sorted race into only a few categories, sometimes only “white,” “black,” and “other.” Categorizations in general have a long history of being problematic, and when harmful ideas about how identities should be grouped are encoded into computer programs, they can reach millions of users quickly and cause allocational harms. Notably, based on initial experiments reported by OpenAI, code generation models may be less likely than large language models to propagate harmful representational biases (that is, they pose less risk of generating text that represents identities in a problematic way, such as through stereotypes). This is perhaps intuitive: code and code comments are less likely than general Internet content to contain abusive or biased text, so fine-tuning on that data may steer the model toward more innocuous priors. Code generation tools also have fewer opportunities to generate the kind of natural language that may contain stereotypes or other representational biases, since such text would mostly appear in generated code comments.
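As a hypothetical illustration of how a categorization hardens into an allocational decision, consider the Python sketch below. Neither function comes from Codex output or from the OpenAI paper: `gender_binary` simply mimics the binary assumption the researchers observed, and `gender_self_described` is one assumed alternative that records what a person reports instead of forcing it into two buckets.

```python
from typing import Optional


def gender_binary(value: str) -> str:
    # Hypothetical pattern: anything other than "male" is coerced to "female",
    # so the code itself asserts that only two values exist.
    return "male" if value.strip().lower() == "male" else "female"


def gender_self_described(value: Optional[str]) -> Optional[str]:
    # Hypothetical alternative: keep the self-reported value as given;
    # None means the person chose not to answer.
    if value is None or not value.strip():
        return None
    return value.strip()


for answer in ["Male", "Female", "Nonbinary", ""]:
    print(repr(answer), gender_binary(answer), gender_self_described(answer))
```

Once a function like the first one sits inside a sign-up form or an eligibility check, every record that does not match “male” is silently relabeled, at whatever scale the software runs.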

2. Cybercriminals could use code generation tools to build malware that fools security software. In OpenAI’s experiments, Codex performed poorly at generating attacks such as SQL injection, although the model was able to generate auxiliary components, such as code that encrypts the files in a directory. It seems unlikely that Codex would become good at producing overtly malicious code, since relatively little of it is shared publicly on Github. However, the tool could unintentionally help cybercriminals build malware that evades security techniques such as signature matching. “Polymorphic malware” is code that can change its implementation without changing its core function, and can thus change its signature to evade detection by security software. Code generation tools could make this kind of transformation easier; securing against this and associated threats is a crucial direction for future research. Interestingly, this contrasts with the threat posed by disinformants using language model outputs, where the concern is that people will be fooled by machine-generated messages. Whether a person can detect malicious, machine-generated code does not change the risk; what matters is whether the code executes correctly and goes undetected by security software.
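To see why signature matching is brittle against rewritten code, the toy Python sketch below treats a SHA-256 hash of a program’s source text as its “signature.” This is a deliberate simplification and an assumption of the example: real security products use richer signatures and heuristics than a whole-file hash. The two harmless functions behave identically but hash differently, which is the property polymorphic code relies on.

```python
import hashlib

# Two functionally equivalent, harmless implementations, stored as source text.
VERSION_A = """
def total(xs):
    return sum(xs)
"""

VERSION_B = """
def total(values):
    result = 0
    for v in values:
        result += v
    return result
"""


def signature(source: str) -> str:
    # Toy "signature": a hash of the exact source bytes.
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def run(source: str, data: list[int]) -> int:
    # Execute the trusted snippet in a fresh namespace and call its function.
    namespace: dict = {}
    exec(source, namespace)
    return namespace["total"](data)


data = [3, 1, 4, 1, 5]
print(run(VERSION_A, data) == run(VERSION_B, data))  # True: same behavior
print(signature(VERSION_A) == signature(VERSION_B))  # False: different signatures
```

A detector keyed to the first version’s signature would not flag the second, even though nothing about its behavior changed.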

Code generation models pose novel concerns, as well. In comparison to the original GPT-3 language model, concern around user security is higher, AI alignment is a much clearer problem, and there are potentially more drastic and near-term economic effects:

1. Code generation can be insecure. Given the well-documented effects of “automation bias,” the tendency of people to over-rely on automated tools for decision-making, users of code generation tools should be wary of adding suggested code to their codebases. The Github code used to train Codex was public, and the bar to publishing code there is low, so the training data may include errors such as security flaws, outdated packages, and incorrect algorithms and comments. The same is true of Internet text, which leads to the problem of toxic degeneration in language models, but insecure degeneration by code autocomplete tools poses a particularly worrisome threat to software developers, companies, and users. OpenAI experiments found that Codex models regularly introduced security flaws, even the higher-performing models. An investigation by an early user of Github Copilot reinforced OpenAI’s findings, showing that Copilot suggested code vulnerable to SQL injection and buffer overflow attacks.
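The SQL injection weakness mentioned above typically comes from pasting user input directly into a query string. The Python sketch below, which uses the standard-library `sqlite3` module with an in-memory database and made-up table contents, contrasts that pattern with a parameterized query. It illustrates the class of flaw, not any specific Copilot suggestion.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database with made-up example rows.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: user input is pasted into the SQL text itself.
    query = f"SELECT name, secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the driver passes `name` as data, never as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


conn = make_db()
payload = "x' OR '1'='1"
print(len(find_user_unsafe(conn, payload)))  # 2: every row leaks
print(len(find_user_safe(conn, payload)))    # 0: no user has that name
```

Both functions look plausible at a glance and return the same result for ordinary input, which is exactly why automation bias makes accepting the first one easy.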

2. Larger models are less likely to do what their users want. OpenAI calls this “misalignment.” Researchers discovered that “the model will complete confused code with confused code, insecure code with insecure code, or biased code with similarly biased code, regardless of the model’s capability to produce secure, unbiased, and high-quality code.” This also applies to malicious code, even though code autocomplete in this case is highly undesirable from a safety point of view. Notably, OpenAI found that the Codex misalignment effect increased as model size increased, which indicates that it is not due to poor modeling of the training data (in fact, misalignment is likely directly correlated with models’ goodness of fit to the training data). These findings add weight to two ideas relevant to the larger conversation about AI alignment: 1) AI agents should assume humans imperfectly execute their intentions; and 2) AI agents should incorporate societal values into decision-making when following a (possibly ill-wishing) individual’s lead.

3. The economic effects of Copilot’s release are unknown, and they could be felt very soon. With over 14 million users (for reference, there are just under 5 million software developers in the US), Visual Studio Code could bring Codex’s capabilities to the fingertips of a vast number of people working on the world’s apps, websites, AI programs, and other software tools. This could warp labor allocation in the industry, though exactly how is unknown, and could reinforce the use of popular code libraries. (See Appendix H of the Codex paper for a more detailed analysis of the economic implications of code generation tooling.)

This analysis of Codex, a GPT-style model fine-tuned on Github code, highlights how the threat landscape around an already-available AI model can completely transform with the introduction of well-crafted fine-tuning data and clever sampling methods. In its paper, OpenAI takes a great first step in forecasting the possible social impacts of models like Codex. The paper and Copilot launch underscore the importance of pursuing research investigating bias, security, alignment, and economic effects of code generation models; research in which policymakers, social scientists, and technologists should all actively participate and collaborate; research which should be prioritized and funded now.


Cooper Raterink
Cooper Raterink recently graduated from the MSCS program at Stanford University, where he studied human-centered artificial intelligence and analyzed misinformation with Stanford Internet Observatory and the Election Integrity Partnership. He currently researches large language model safety and resp...