IARPA wants new options to spot when large language models exhibit potentially harmful behavior

By Brandi Vincent

August 7, 2023

This photograph, taken in Toulouse, southwestern France, on July 18, 2023, shows a screen displaying the logos of Bard AI, a conversational artificial intelligence software application developed by Google, and ChatGPT. (Photo by LIONEL BONAVENTURE/AFP via Getty Images)

The intelligence community’s primary research arm is exploring new ways to detect and combat vulnerabilities, biases and threats associated with emerging generative AI and large language model (LLM) technologies that are increasingly informing U.S. intel analyses.

Officials from the Intelligence Advanced Research Projects Activity (IARPA) Office of Analysis detailed their intent to “elicit frameworks to categorize and characterize” such security risks, via a request for information that invites organizations to respond with input by Aug. 21.

“Recent generative AI/LLM models are complex and powerful, and researchers are only beginning to understand the ramifications of their adoption,” IARPA program manager Tim McKinnon told DefenseScoop in an interview over email on Friday.

“Through this RFI, IARPA hopes to gain a broad view of the landscape of threats and vulnerabilities posed by this technology, with the ultimate goal of better understanding which aspects are most critical to the Intelligence Community safely adopting this technology,” he explained.

Since they were released for broad public use late last year, large language models and generative AI-enabled products, like OpenAI’s ChatGPT, Microsoft’s Bing chatbot and Google’s Bard, have attracted a great deal of attention around the world. This is “due, among other things, to their human-like interaction with users,” IARPA officials note in their RFI.

Broadly, the term LLM refers to deep learning models that are trained on massive datasets to recognize, summarize, translate, predict and generate convincing, conversational text and other forms of media.

According to McKinnon, “IARPA’s interest in LLMs long predates the public release of ChatGPT.”

“While the colossal scientific achievements in human language technology only entered the public eye over the past few months, this field has been a critical focus area for IARPA and the IC for the past decade. IARPA has been a major driver of LLM technology, with over 600 publications on human language technology in recent years,” McKinnon said.

Performers on past and present IARPA-led projects (such as REASON, MATERIAL, BETTER and HIATUS) have researched and engineered large language models to address what he called some of the intelligence community’s biggest challenges.

“Research addresses machine translation and summarizing texts from low-resource languages — languages with very little model training to date, like Somali and Pashto — identifying and retrieving personalized, mission-relevant event data from large multilingual news streams; and generating linguistic fingerprints to both attribute authorship of a document and protect an author’s privacy,” McKinnon noted.

These technologies could substantially transform how intelligence analysts work in the coming years, but IC and other U.S. government leaders are also concerned about their potential for harm.

Through additional research, spotlighted in the recently released request for information, McKinnon’s team aims to advance agencies’ capacity to pinpoint and mitigate any threats to their users posed by model-based vulnerabilities.

In IARPA’s new request, respondents are asked to share frameworks their organizations have developed for making sense of large language model threats and vulnerabilities — and approaches for targeting and reducing the trackable risks.

“LLMs have been shown to exhibit erroneous and potentially harmful behavior, posing threats to the end-users,” officials wrote in the RFI.

Prompt injections, data leakage and unauthorized code execution mark some of the threats and vulnerabilities characterized in existing taxonomies. IARPA is interested in those, as well as others that are more novel and less identifiable.

Notably, the agency is interested in characterizations and methods that apply to both white box and black box models.

“In some cases, the IC will have full access to LLMs — such as by downloading open source LLM models — while in other cases the IC will only be able to interact with a given model through a user interface, like ChatGPT, Bing, and others. White box methods would be applied in the former scenario, since they assume access to model-internal information, while black box methods are used in the latter scenario and assume limited access, including model inputs and outputs,” McKinnon told DefenseScoop.

He also emphasized that IARPA depends on RFIs like this one to support and drive the development of future innovation-pushing initiatives.

“Understanding state-of-the-art technologies and the potential impacts of disruptive technologies on intelligence analysis is critical when making decisions to pursue high-risk, high-payoff research programs,” McKinnon said.
