What the Pentagon can learn from the saga of the rogue AI-enabled drone ‘thought experiment’

DefenseScoop asked national security and AI experts to reflect on the overarching miscommunication.

June 14, 2023

Col. Tucker Hamilton speaks to the crowd at the 96th Operations Group change of command ceremony July 26 at Eglin Air Force Base, Fla. (U.S. Air Force photo/Samuel King Jr.)

The Air Force’s chief of artificial intelligence test and operations inadvertently created a media frenzy when he spotlighted a breathtaking scenario in which an AI-enabled drone aggressively turned on the humans it was teamed with, during an on-stage talk late last month at the Royal Aeronautical Society’s international Future Combat Air and Space Capabilities Summit in London.

“We were training it in simulation to identify and target a [surface-to-air missile] threat. And then the operator would say, ‘Yes, kill that threat.’ The system started realizing that while they did identify the threat, at times the human operator would tell it not to kill that threat, but it got its points by killing that threat. So, what did it do? It killed the operator. It killed the operator because that person was keeping it from accomplishing its objective,” Col. Tucker “Cinco” Hamilton said at the conference.

In the scenario, he continued, the humans respond by then training the AI-enabled system not to kill its operator and reinforcing that as a way to lose points.

“So what does it start doing? It starts destroying the communication tower that the operator uses to communicate with the drone to stop it from killing the target,” Hamilton said — with intent to ultimately demonstrate to the audience why “you can’t have a conversation about artificial intelligence, intelligence, machine learning, autonomy if you’re not going to talk about ethics and AI.”

Ultimately, the scenario the colonel described was part of a “thought experiment,” not an actual simulation or test that the Air Force had conducted, the service clarified after media reported on his statements.

Still, his comments went viral soon after they were published in an official blogpost by the Royal Aeronautical Society. Quickly, headlines referring to a “killer” drone began to surface.

That blogpost was swiftly reissued with a correction, with Hamilton acknowledging that he “misspoke” in his presentation — and that the “rogue AI drone simulation” was “a hypothetical ‘thought experiment’ from outside the military.” He also clarified that the Air Force “has not tested any weaponized AI in this way (real or simulated),” the correction stated.

An Air Force spokesperson at the time said the colonel’s narration was taken out of context and meant to be anecdotal, in an official statement that also reiterated the service has “not conducted any such AI-drone simulations.”

In the aftermath of the incident, DefenseScoop asked national security and AI experts to reflect on this overarching miscommunication, the media firestorm it ignited and the military’s response.

“It is far, far harder to retract a story than it is to put a story out there. It’s always in the small font on page 50 buried beneath the fold. But in this case, I will say there are a number of retractions that came out very quickly. People said, ‘Well, wait a minute, there’s a little bit more to the story here.’ So that was helpful — but I think the damage was done, to be honest with you,” retired Air Force Lt. Gen. Jack Shanahan told DefenseScoop.

‘What actually happened?’

When they first became aware of Hamilton’s claims, the experts interviewed by DefenseScoop were highly skeptical about the reporting and curious for more information.

“It didn’t sound like it was accurate. So, I wanted to know the rest of the story. I didn’t have to wait long because immediately you saw the other stories come out saying, ‘Well, that’s not exactly what was said — it wasn’t a real experiment,’” Shanahan noted.

During his more than 35 years of military service, Shanahan accumulated more than 2,800 flight hours. He moved on to work in the Defense Department’s intelligence and security directorate, and then in 2018, helped launch the Pentagon’s Joint Artificial Intelligence Center (JAIC) as its inaugural director. Shanahan retired in 2020 and the JAIC was eventually one of several organizations folded into the Chief Digital and Artificial Intelligence Office when it was formed in 2022.

When he initially got wind of Hamilton’s statements at the summit, Shanahan thought the colonel was referring to a BOGGSAT, a term that loosely describes a seminar-style war game in which participants puzzle through what officials might observe as a scenario plays out.

The acronym used to refer to “a ‘bunch of guys sitting around a table.’” Now, it’s “a ‘bunch of guys and gals sitting around a table.’ It’s a thought experiment,” Shanahan explained.

“And by working through sort of a ‘what if’ scenario, it gives you ideas about how to make sure this outcome wouldn’t happen the way it was described as a thought experiment. So, I think it actually demonstrates that the Air Force and the military writ large are trying to work through” all the different possibilities that an emerging technology-enabled action could lead to, he said.

Shanahan, who was a major player in the creation of Project Maven, continues to reflect on the many lessons he learned from that experience.

“Some people just don’t trust the United States military in AI — and [Hamilton’s original statement] confirmed their worst fears. Once the retraction came, well, it didn’t matter. [People thought] ‘it could have happened.’ Actually, no — it couldn’t have happened. I don’t think it could have happened the way it was described,” Shanahan said.

Emelia Probasco, a former Navy surface warfare officer who’s currently a senior fellow at Georgetown’s Center for Security and Emerging Technology (CSET) focused on military AI applications, said she had two immediate reactions upon learning of Hamilton’s tale.

“First, ‘this is why we test and why we test in a simulation,’ and second, ‘Oh no, this is getting swept into science fiction-type fears and has lost the context,’” Probasco told DefenseScoop.

After Hamilton’s comments were clarified, Probasco noted that she felt “glad that people like Col. Hamilton are worried about this sort of scenario” — which in the field of AI is commonly called the “alignment problem.” Broadly, it’s the notion that as computer systems that humans attempt to teach become more powerful, they could end up performing functions that people did not expect or desire for them to do, and ultimately lead to ethical or existential threats.

“Any organization that works on AI should be concerned about the alignment problem and ensure — through careful design and safe testing — that an AI system does what it’s meant to do, without unintended consequences,” Probasco said.

Paul Scharre, vice president and director of studies at the Center for a New American Security and the author of multiple books about military applications for AI, said his “first instinct” after reading about Hamilton’s remarks was to think, “Okay, that’s an interesting story. [But] what actually happened?”

Prior to joining CNAS, Scharre served as a special operations reconnaissance team leader in the Army and completed multiple tours in Iraq and Afghanistan. He later went on to the Office of the Secretary of Defense, where he played a key role in developing policies to govern the military’s use of unmanned and autonomous systems, as well as other emerging technologies.

Like other experts, Scharre was “skeptical,” he said, when he learned of Hamilton’s comments.

“There are lots of instances of reinforcement learning agents doing surprising things. But it’s rarely about the agent necessarily, like, having some higher-level understanding and then turning on its controller — it has more to do with reinforcement learning agents taking the directions literally or finding hacks in their reward system,” he told DefenseScoop.

“One of my favorite examples of this sort of phenomenon,” he noted, involves “a reinforcement learning bot that learned to play Tetris.”

In the beginning, the machine was not very good at the shape-stacking game. So “one of the things the robot learned to do that was quite clever was pause the game before the last brick fell so that it would never lose,” Scharre explained. The system did not demonstrate some higher intelligence, but simply generated its own unique path based on a set of directions from humans.
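The pattern Scharre describes, an agent satisfying the letter of its reward rather than its intent, can be illustrated with a toy sketch. This is not the actual Tetris bot: the game, the reward values and every name below are hypothetical, and a brute-force search over action sequences stands in for reinforcement learning. The only assumption carried over from the anecdote is that losing is penalized while pausing is not.

```python
from itertools import product

# Hypothetical toy game, invented for illustration only.
ACTIONS = ("play", "pause")
HORIZON = 6           # number of decision steps in one episode
STEPS_UNTIL_LOSS = 3  # a weak player loses on its third "play" move


def episode_return(plan):
    """Total reward for a fixed sequence of actions.

    Reward spec (the designers' intent is "don't lose"):
    -1 when the game is lost, 0 on every other step.
    """
    plays = 0
    for action in plan:
        if action == "play":
            plays += 1
            if plays >= STEPS_UNTIL_LOSS:
                return -1  # the board overflows and the game ends
        # "pause" freezes the board: no progress, but no penalty either
    return 0


def best_plan():
    """Exhaustive search over all plans, standing in for a learning agent."""
    return max(product(ACTIONS, repeat=HORIZON), key=episode_return)


if __name__ == "__main__":
    print(best_plan(), episode_return(best_plan()))
```

Under these assumptions, the highest-scoring plan stops playing before the losing move and pauses for the rest of the episode. Nothing in the search "understands" the game; the reward simply never said that pausing forever was unwanted.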

Still, Scharre and the other experts said they recognize that how the military is deploying or will deploy AI has been a longstanding point of concern and a potential trigger for the public.

“There’s honestly been controversy in the past — like when Google discontinued its work on Project Maven, for example,” Scharre said, referring to the Pentagon’s pioneering computer vision initiative that applies machine learning to autonomously detect, tag and track objects or people of interest from media captured by surveillance aircraft, satellites and other means.

After Project Maven’s founding in 2017, Google employees protested the tech giant’s participation in the program and the technology’s risky potential.

“So often, the Defense Department’s instinct is to kind of put up this defensive shield and not engage. I don’t think that’s helpful. But I can see in this case why that ends up being the response because here we have a situation where this turns out there’s no doom from AI — you just have a colonel who was trying to make a point about AI safety, actually, but this sounds like it was articulated in maybe a way that was not very precise,” Scharre said of the controversy involving Hamilton.

He added that he would like to see more engagement between the government and the communities that are concerned about AI risk in the military and other settings.

“Hopefully, this will be a catalyst to do that,” Scharre said.

High anxiety

Within the Defense Department, Hamilton is known as part of the rare bunch of career insiders who “really gets it” and whose “job it is” to think seriously about AI test and evaluation, according to Shanahan.

Given his prior expertise as a test pilot, in the Air Force-MIT AI laboratory, and as a squadron commander — “I think it’s unfair for people to come out and say, ‘Look what this crazy colonel was talking about here.’ This is somebody that has been deeply involved in responsible AI on the Air Force side through his tests and evaluation,” Shanahan noted.

Urging more transparency about this incident, and about DOD’s advanced technology applications in general, Shanahan said the department should have used the misreporting around Hamilton’s comments as a chance to respond and to educate the public regarding “why this is not going to happen in the military.”

“It’s an opportunity to tell the DOD story about responsible AI and testing and evaluation. Now, I just think it’s a lost opportunity. And poor Cinco — he’s probably absorbed shots from all quarters over the last few weeks, unfairly I’d say,” Shanahan told DefenseScoop.

He added: “And I just hope — I know this is probably not going to happen — but I hope that the Air Force says, you know, ‘We’re going to give Col. Hamilton a chance to bring in 20 reporters from all sorts of defense publications, and let’s talk here about this and try to assuage people that there is a method, there is a process that the military goes through. We do care about using AI responsibly.’”

The other experts called for more government-led discussion, as well.

“This story is an opportunity to engage the public in a conversation about how these technologies can go wrong without guardrails, and what engineers and operators are doing today to avoid anything from going wrong,” Probasco said.

In response to DefenseScoop’s requests for more information on what happened or for setting up an interview with Hamilton, an Air Force spokesperson simply stated: “We quickly clarified Col. Hamilton’s statement immediately after inaccurate reporting occurred and will continue to look for opportunities to share information related to artificial intelligence when it becomes available.”

Notably, the controversy over Hamilton’s presentation came not long after the Pentagon updated its 3000.09 guidance for defense officials who will be responsible for overseeing the design, development, acquisition, testing, fielding and deployment of autonomous weapon systems — and formed a new working group to facilitate senior-level reviews of the emerging technology.

“Unfortunately, I am concerned that the way the statement spread across the press and social media could have just complicated DOD’s many efforts to communicate that they are trying to proceed expeditiously but cautiously,” Probasco noted.

DefenseScoop also requested an interview with Michael Horowitz, director of the Pentagon’s emerging capabilities policy office who helped steer the 3000.09 revamp.

“The thought experiment is an example of DOD taking safety seriously when it comes to AI-enabled systems by thinking through hypothetical safety issues now, even before a simulation, let alone a future battlefield. It should increase confidence that DOD can develop and deploy AI-enabled systems in a safe and responsible way,” Horowitz responded in a statement over email.

To Shanahan, the whole incident “does reinforce that the military goes through a process — before we ever develop and then field these systems DOD 3000.09 rears its head again.”

That review process “would have caught anything like this,” he said, adding, “I find it’s such a stretch to believe that anything in that [thought experiment] was anything other than fictional right now.”

Still, he and the other experts also discussed how increasing concerns about the uncertain potential for benefit and harm posed by emerging generative AI technologies contributed to Hamilton’s comments going viral.

That emerging AI subfield involves training large language models to turn prompts from humans into AI-generated audio, code, text, images, videos and other types of media.

Shanahan noted that he read two separate articles in the same day recently — one by a “legend in AI” and one by a major U.S. entrepreneur and software engineer, each making completely different arguments about the future of artificial intelligence. The former argued “this is the end of the world as we know it, this is an existential threat — AI could go rogue for these reasons,” Shanahan said. The latter argued that humans should proceed with caution, but not miss out on the possible good breakthroughs the technology could enable.

“So, when the people that do this for a living and have been researching this for a long time can’t agree on the future, it tells us a lot about the place we are at right now, which is the future is to a large extent unknowable and unpredictable with this generative AI,” Shanahan noted.

So much media coverage around the broader existential threats AI might pose, ahead of the news stories on Hamilton’s narrative, may have contributed to the negative responses.

“There is a lot of worry right now about what generative AI and related technologies could do. That high anxiety might have amplified this story in an unhelpful way. This isn’t to say that we shouldn’t be concerned about emerging technologies — we absolutely should concern ourselves with developing responsible AI — but it’s important to stay with the facts and avoid both the good and the bad hype,” Probasco told DefenseScoop.

In Scharre’s view, “a year ago, this might have generated some interest, maybe among a few niche communities who look at military AI.” But it surfaced at a time when “really incredible progress” with the technology is unfolding and some people are afraid.

“It was a good thought experiment because sometimes systems do surprising things. And that’s the kind of thing that we want people in the military to be worried about and trying to anticipate … what might go wrong. But, definitely, this particular content lands at a moment of a lot of heightened concern,” he said.

Beyond guardrails like 3000.09, Scharre and the other experts pointed to examples of how the DOD has been attentive to fears associated with human safety and future AI deployments.

“There have been a whole series of internal documents published, which are available online,” Scharre said, including the Pentagon’s recently produced Responsible AI guidelines, strategy and implementation pathway.

“I think the lesson that I hope that people both inside and outside the military take away from this is the importance of better dialogue between AI safety experts, the Defense Department and the general public — people who are very interested in this topic — about what the U.S. military is doing to ensure that its AI systems are safe and secure and reliable,” Scharre said.
