The Defense Advanced Research Projects Agency recently launched an engagement effort to uncover and explore new directions for building and deploying national security-aligned artificial intelligence and machine learning applications that people can trust.
Through AI Forward, DARPA plans to collaborate with and help shape future research paths for the community of experts now working to make trustworthy intelligent technologies a near-term reality, while also informing the agency's own future AI investment strategies.
“DARPA has historically over decades, like dating back into the ’60s, made significant advancements in AI,” Dr. Matt Turek, deputy director of the agency’s Information Innovation Office, told DefenseScoop in an interview.
“Those investments, and certainly industry’s investments, have really led, I think, to the transformational change in AI we’re seeing now in the context of generative AI systems like ChatGPT. Those capabilities are very compelling and very impressive — but there’s still issues with them in the sense that they don’t have the level of reliability that we probably need, from a DOD perspective at least for mission-critical use cases, and for life or death decision-making. And so we think now is a really good time to engage with the broader community of industry and academia to talk about: What do we need to do to get to highly trustworthy AI? What are the right sort of investments to make there? What are the right directions to go in, particularly directions that might not be driven by commercial investment?” he explained.
ChatGPT was immediately, immensely popular when it was released for public use in November 2022. The interactive chatbot, which completes a wide range of tasks with varying accuracy and converses with people in a convincing, humanlike way, marks a tangible application of generative AI, the emerging subfield behind large language models that can produce audio, code, images, text, video and other content in response to human prompts.
“The transformational element here is just how broadly capable these large models ‘see.’ And so, then it’s not just unlocking a possibility in a very narrow focus domain, but unlocking a very broad set of possibilities, both positively and negatively, right? Positively, it might enable use of AI across a whole host of fields; negatively, it opens up a large attack surface that might evolve at a speed and scale that we’re just not used to dealing with,” said Turek, who briefed DefenseScoop on DARPA’s ultimate intent and the possible national security implications of this new pursuit.
It’s all ‘emergent’
DARPA is not shy about its “definitive views” on AI and where such capabilities are heading. In that context, agency experts generally think about this technology across three waves.
The first wave is foundational and encompasses rule-based AI systems, Turek noted; those machines follow rules defined by humans. Second-wave AI introduces statistical methods by which machines can cluster and classify information, then make predictions based on the data.
When third-wave AI is fully realized, intelligent machines will essentially have the capacity to perceive and respond to the physical world and provide accurate, contextual explanations of their decision-making.
Turek said he places today's large language models like ChatGPT in the second-wave category, and that he "hasn't seen any successful examples yet of third wave AI." Still, his DARPA team "feels like that third wave AI that combines the rules-based and statistical approach is particularly important."
“If you think about how an AI system can go wrong — even just in terms of misunderstanding a goal — it’s because it doesn’t understand that context for the appropriate ways to achieve that goal that aren’t going to have side effects. And so that third wave AI, which we also talked about as having contextual understanding, I think is becoming even more important in the context of these breakthrough technologies,” Turek explained.
In his view, the phrase "emergent capabilities" is buzzy right now "really, because we build these very large models, and we don't comprehensively understand what do they know, what don't they know, what are they capable of? And that's a super difficult problem."
“It’s going to take significant, dedicated effort to be able to really understand these models and their capabilities,” he added — which is part of what AI Forward is all about.
This week, more than 1,000 technology leaders and scientists signed an open letter calling for an immediate pause of at least six months on the training of AI systems more powerful than GPT-4, the latest version of the engine underlying the viral text generator.
But DARPA conceived of AI Forward even before ChatGPT went public last year, as Pentagon components face a different and deeper level of concern about potential disruption and risk than commercial industries do.
“I think that DOD has been very careful and thoughtful about how we think about AI systems. And certainly, something like GPT-4 might be useful in particular roles within the DOD — there’s lots of documentation that’s developed and generated or might need to be summarized — so, that might be an appropriate application for these sorts of technologies. But from a DARPA perspective, we also need to be thinking out long-term AI, right? What’s the potential for strategic surprise or transformational change? And, DARPA, as an agency our mission is to prevent or create strategic surprise. So, we always sort of look at things through this lens,” Turek said.
Bracing for misuse
To kickstart AI Forward, Turek noted, DARPA is set to host a virtual workshop in June and an in-person workshop in Boston starting in late July, where select participants will brainstorm “compelling new directions” to engineer highly trustworthy AI.
As those events come to a close, the agency will invite participants to produce white papers to inform rapid explorations of specific topics the agency could fund down the line.
“This is an opportunity to have direct dialogue with DARPA leadership about [existing pain points] and potential areas where DARPA can just empower the research community, writ large,” Turek said.
He detailed the high-level categories that his team views as likely essential to creating trustworthy AI systems. The first two involve the foundational, underlying theory by which humans understand AI systems, and how those technologies are engineered.
“Think about how we build bridges today, right?” Turek said, posing an engineering exercise to help visualize the aims.
Humans know the size of the load a bridge needs to carry and how far it should span, he explained, and, with support from standards, they can "decompose that into trust work," essentially knowing which particular beams could span that load and allow the appropriate amount of deflection. Such compositional engineering processes, by which people can "decompose this big problem into smaller subproblems, and then know that it's going to map back into that larger problem," already exist, he pointed out.
“We don’t really have the ability to do that for AI now, right? We’re building AI systems primarily by trial and error. The analogy there is we build a bridge. We drive a bunch of cars and trucks over it, see if it falls down. If it falls down, then we build the bridge in a different way and we go back, and we repeat the process until it stops falling down and we call it good. So, we really need to drive from the sort of trial and error process that we’re heavily reliant on in AI into a more engineering-based process,” Turek said.
This requires foundational theories that essentially inform test and evaluation options, he added, and the exploration of questions like “how do I decompose the capabilities of an intelligent system into parts and pieces that I can wire together and that I know how they will work once I put them together?”
Even with the emergence of nascent, large multimodal models like GPT-4, "we certainly don't have a theory" by which to build a system and reliably get the desired performance and outcomes, according to Turek. Instead, with these "emergent capabilities," experts build intelligent systems and then puzzle out all that they can do after their creation.
That contemporary approach “certainly opens the door to misuses and to huge surprises — both positive and negative,” Turek said.
A third major pillar DARPA is seeking to pursue with collaborators in this initiative involves human-AI teaming. The agency wants to pave the way for machines that can serve as fluent, intuitive, trustworthy teammates to people across diverse backgrounds.
DARPA officials are open about all that this effort could enable and don’t just have one solution in mind. It might lead to new models and algorithms, Turek said — or “foundational techniques for understanding how we assess emergent capabilities and models.”
It could also perhaps lead to “straightforward things like new benchmarks,” he noted, or “maybe more sophisticated approaches that understand” the dimensions of human intelligence.
“How might those be reflected in different types of measurement systems? What do we need to do to make those be valid measurements for AI systems? Because AI — at least current generation and foreseeable near-term generation AI — functions and operates differently than humans, that might mean we need different ways of measuring and assessing them,” Turek explained.
During the interview, the deputy director also pointed to several potential national security concerns that AI Forward might help the government confront.
“One is that our adversaries are also pursuing these sorts of capabilities and they might be willing to use them and deploy them much more quickly” than the U.S. does, he noted. Experts, therefore, need to fully grasp the shifting strengths and weaknesses of these models — and how they could be used as part of adversaries’ intelligence analysis processes, for instance.
These models can also potentially generate misinformation and disinformation faster, and disseminate them more widely, than anything seen previously. They also require far less human skill to deploy.
“There’s a nice example of how GPT-4 could be used to construct a functioning website quickly. You could start putting all these pieces together and think about generating not just misinformation as content, but the sites that host it … We’ve seen people use [targeted phishing attacks and other tactics] already — but think about automating that process and being able to do it much more quickly at-scale,” Turek told DefenseScoop.
He will discuss this new initiative and other AI topics on a panel at the Sea-Air-Space summit next week.