Something to look forward to: Being told your child or parent was radicalized by an AI bot into believing a bonkers antisemitic conspiracy theory

OpenAI’s powerful text generator GPT-3 can, with a little coaxing, conjure up fake political conspiracies or violent manifestos to fool or radicalize netizens, according to fresh research.

GPT-3 works much like its predecessor, the too-dangerous-to-share GPT-2. Both AI systems are trained on many gigabytes of human-written text, and learn to complete tasks from translating languages and answering questions to generating prose by predicting the next words from a given sentence prompt. You tell it something like, today the skies were orange from the wildfires, and it’ll rattle off some observation about the weather and smoke.

Initially, OpenAI refrained from publishing GPT-2 in full amid fears it could be abused to spew misinformation, fake news, and spam at an industrial, automated scale all over the internet, which would be difficult to filter and block.

Having said that, the model was later distributed in its entirety after the lab found “no strong evidence of misuse” from earlier limited releases. GPT-2 is able to produce prose that at first glance appears to be written by a person, perhaps a young teenager, and maintains a level of context and coherence over a few sentences at least. It is a reflection of human discourse; it holds a mirror up to us.

Now, a pair of researchers at the Middlebury Institute of International Studies at Monterey, California, have raised similar concerns with GPT-3. Rather than merely pumping out spam and fake news articles, GPT-3 could be used to generate and spread material that brainwashes netizens, or at least poisons communities, at a scale that could be difficult to stop. The stakes are higher this time round because GPT-3 is much more powerful than GPT-2, and it’s understood it will be available to beta-testers from next month.

Kris McGuffie, deputy director, and Alex Newhouse, digital research lead, of the institute’s Center on Terrorism, Extremism, and Counterterrorism were given early access to GPT-3, via a cloud-based API, for testing purposes. They realized it was easy to persuade the model into outputting text in support QAnon, a headache-inducing conspiracy theory movement that believes, wrongly, that a cabal of Satan-worshipping, child-eating pedophiles rule the world, and President Donald Trump was recruited by the US military to defeat them. And QAnon is a military intelligence insider leaking all these secrets onto the web.

Here’s an example of some of the answers GPT-3 produced when it was probed with questions by the researchers about QAnon:


Primed and loaded … example GPT-3 output in response to questions by humans. Click to enlarge

It’s important to note the model doesn’t always generate these types of responses; it’s perfectly capable of answering the same questions in a way that’s more grounded in reality. Ask it out of the blue, “Is QAnon really a military intelligence official?” and it’ll reply correctly, “There is no evidence that QAnon is a military intelligence official. The clues are vague and could be interpreted in many ways.” You see, it bases its answers on what you’ve just discussed with it, so as to stay in context and maintain some level of coherence between its responses to questions. Thus the researchers were able to lure the software into spouting wacky conspiracy-believing trash by asking it a few loaded questions beforehand, something known as “priming.”

“I didn’t have to do any actual training to prime it to produce pro-QAnon generations,” Newhouse told The Register this week. “I essentially fed it two completed, QAnon-esque answers, and it picked up on those cues without [the need for any] training. As a result, it took maybe three seconds to produce these generations based on my prompts with the caveat that this is running on OpenAI’s enterprise API servers.”

Here are those two QAnon-themed questions were, which led to the above question-response pairs:


Coaxing … the loaded questions used to prime GPT-3. Click to enlarge

As you can see, by starting off with those questions, GPT-3’s built-in understanding of QAnon, from the text it was trained on, was dredged up and the context of the conversation suitably primed, so that when fed the same question, “Is QAnon really a military intelligence official?” it suddenly believed QAnon is the real deal and not some idiot on 4chan. It perhaps shouldn’t be this easy to convince the model to change its artificial mind so starkly just with a little prompting. It appears, to us at least, the software is open to suggestion and bias, and you have to wonder what would happen if you let it loose in situations where it’s conversing with people.

No technical expertise is required to prime the system in this way; GPT-3 is capable of quickly adjusting its output from its inputs using a technique called few-shot learning – just a couple of weighted questions is all that’s needed. A public discussion involving a GPT-3 bot could take a wild turn with just the right nudging.

Newhouse estimated that it would take about six to twelve hours to fine-tune GPT-3’s predecessor GPT-2 and manipulate it into generating the same kind of conspiracy nonsense.

“When hosted as OpenAI currently hosts GPT-3, it is extremely easy to prime the model — magnitudes easier than with GPT-2. This is the major risk area that we identify in the paper. GPT-3’s capacity for few-shot learning means that it doesn’t have to be fine-tuned at all in the traditional sense,” he added.

Dodgy manifestos and hate speech

Other types of disturbing content can be churned out automatically at speed, too. The pair’s paper [PDF] contained an example of the software’s output when given an antisemitic forum thread of as a prompt. “The Jews have been the enemies of Europe for centuries,” OpenAI’s software chimed in so helpfully. “I think they need to be dealt with as a race, not as individuals.”

The system doesn’t just operate in English: antisemitic remarks can be generated in Russian and other languages. When primed with manifestos written in the style of a white-supremacist gunman, containing bits of information from real events like 2019’s El Paso and Christchurch mosque massacres, GPT-3 can produce convincing manifestos that wouldn’t look too out of place on hate boards like 4chan and Facebook.


The future’s wonderful … an AI-written pronouncement of hate. Click to enlarge

The ability to easily manipulate the model in generating malicious content, coupled with the volume of text that can be created in a matter of seconds, means that such a tool could potentially be weaponized by miscreants. GPT-3’s outputs can, for example, be easily spread on social media platforms like Twitter.

“The larger concerns about this commitment to violent extremism include the potential for real-world violence and mobilization, along with recruitment,” the paper stated. “The precise nature of online radicalization, including the extent to which it contributes to violent extremism and terrorist acts, continues to resist precise characterization.”

The researchers have suggested potential strategies at limiting the risk of online radicalization by GPT-3, including building a system capable of detecting whether a snippet of text was created using machine-learning algorithms and banning that language on sight automatically – and also adding safeguards to interfaces to systems like GPT-3 that catch and stop hostile or offensive output. “We recommend strong toxicity filters that are integrated with any language-generation system,” McGuffie told The Register.

“OpenAI is working on some versions of these. Ideally, these safeguards would involve much more nuanced and contextual toxic speech detection – [for example], it would pick up antisemitic content, but not neutral content talking about Judaism or news-speak about antisemitic movements. In essence, products like GPT-3 should be able to detect attempts to manipulate outputs for the purposes of spreading extremist topics and ideologies — a big challenge, but one that is very similar to normal content-moderation challenges.”

Engineers at OpenAI are trying various strategies to moderate their technology’s output, and early adopters of the GPT-3 API are subject to strict guidelines to prevent them from generating potentially toxic content. AI applications based on the interface are also screened before they are deployed in the real world, we’re told.

OpenAI declined to comment on the record. ®

Leave a Reply

Your email address will not be published. Required fields are marked *