Army Human Resources Command recently trained and deployed a machine learning algorithm to review the files of all active component officers opting to compete for the Command Assessment Program (CAP) — and ultimately generate an official list of invitees to vie for battalion command selection in fiscal 2025.
The algorithm completed the invitation-generating process — which typically takes humans up to 8 weeks — in less than 18 hours.
This marks the first time the Army command’s Innovation Cell automated the centralized selection list (CSL) board tasks to screen the top-tier files for invitations to the CAP. The team behind it is exploring a wide range of other artificial intelligence applications that could improve and accelerate organizational HR functions in the near future.
“I think it’s very important for us to let people know when we are introducing automation because there’s a lot of misconceptions about what it is, what it does, and how it works,” Col. Kristin Saling, director of that Innovation Cell, told DefenseScoop.
Saling is in her eighth year serving in the Army’s “people enterprise,” where she now leads the innovation cell and advises the commanding general on how to improve business processes and technologies and enable management to incorporate more data to drive decision-making. Over the last two years, she and her colleagues — including the command’s chief data scientist, Maj. Tom Malejko, in particular — generated and tested several selection and evaluation screening algorithms to augment typically tedious selection board processes.
In a recent interview, the two briefed DefenseScoop on the impetus and evolution of that work, and how their team is serious and deliberate about deploying still-emerging AI technologies for HR in a responsible manner.
“Fortunately, with the good work that [Malejko] did, we were able to pick a model that was not only explainable, but it was also neck-and-neck accuracy with some of the more complex models. But we want to be able to walk people through that process so that they know we’re being responsible with their data — and we’re trying to do these advancements in a way that isn’t going to unnecessarily impact somebody’s career,” Saling said.
Why the Random Forest
While this initial project — named ‘AutoCSL,’ by Malejko — has been a couple of years in the making at this point, Saling noted that she and other Army leaders viewed it as part of “something we’ve been wanting to get after forever, because our board process is very ponderous.”
It can take weeks for a number of senior officers to complete the process. After a lot of preparation, they travel out to Fort Knox “and sit in the boardroom to review these files manually, where it’s all scanned,” Saling said. Then, based on that review, they make their choices.
But “the way this all got brought up,” according to the colonel, was by Al Eggerton, a senior leader in the Army’s Directorate of Military Personnel Management, who at the time was reviewing how the service managed promotions and command selection boards, which determine how personnel advance in their careers.
“The command selection board comes out with our [centralized selection list, or CSL], or our order of merit list. And then based on that, we send out invitations to the Command Assessment Program,” Saling said. Those invitees who opt to participate in CAP do so, and then “they’re racked and stacked through another interview board process, and a lot of assessment,” she noted.
From there, the determination is made whether or not the service members are certified for command, and if they’re going to go into command ballots.
“[Eggerton] came to us and he said, ‘Hey, I’m running the whole board, and then we’re running a whole assessment with a board in it. Is there a way that we can do something with all these files to introduce some automation, or do something smart, so I don’t have to run two boards?’ basically — because it was taking up a lot of senior officer time,” Saling said.
She then brought in Malejko, her team’s “resident expert in natural language processing and text analytics,” who she said is now “leading the charge on how we use text analytics to evaluate our evaluations, for lack of a better term.”
Malejko is a data scientist with deep experience working with unstructured text, a skill set that is rare in the Army but necessary for extracting useful information from the evaluations, certificates and other documents the service holds in its people space.
Notably, the algorithmic technologies used in this project are applied to the starting population, at the earliest step in the reviews.
“In the case of the battalion Command Assessment Program, it’s all of the eligible officers in the eligible branches and we net that down so that we have our list of competitors — the ones who have the strongest files, who receive the invitation,” Saling explained.
In the joint interview, Malejko noted that he recognized quickly in this effort that the process the colonel described falls in line with “a multiple hurdle selection strategy,” where the first event in the series is designed to be a screen-out step to eliminate candidates who will not likely be competitive throughout the rest of the process.
“What then happens, when they get to that next phase — which is CAP, a really time- and resource-intensive event — we then have more time and resources to apply to them, and then be able to better identify and select talent for the Army. And so as you think through that process, we’re not using the algorithm necessarily to select individuals — but it’s really to enable and provide our senior leaders more time to spend with those individuals that are competitive, and they will ultimately be able to make better decisions going forward from there,” Malejko told DefenseScoop.
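The screen-out step Malejko describes can be pictured as a simple rank-and-cut over scored files. The sketch below is illustrative only: the `OfficerFile` type, its field names, the scores and the invitation count are all hypothetical, and nothing here reflects how AutoCSL is actually built.

```python
from dataclasses import dataclass


@dataclass
class OfficerFile:
    officer_id: str    # hypothetical identifier
    file_score: float  # hypothetical strength-of-file score, higher is stronger


def first_hurdle(files: list[OfficerFile], invitations: int) -> list[OfficerFile]:
    """Rank files by score and keep only the strongest `invitations` of them."""
    ranked = sorted(files, key=lambda f: f.file_score, reverse=True)
    return ranked[:invitations]


# Made-up files, purely for illustration.
files = [
    OfficerFile("A", 4.6),
    OfficerFile("B", 3.1),
    OfficerFile("C", 4.9),
    OfficerFile("D", 2.7),
]
invitees = first_hurdle(files, invitations=2)
print([f.officer_id for f in invitees])  # ['C', 'A']
```

Only the files that clear this first hurdle move on to the later, more resource-intensive stage, which is where the human assessment happens.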
When given the proposition to automate, Malejko and team first looked into the contents of the officer evaluation reports. “For all intents and purposes,” he noted, there are three principal areas that board members look at when assessing an officer evaluation report.
“They typically look at what’s called the Senior Rater Narrative, which is a free text field consisting of several hundred characters that usually follows a pretty rigid structure and format, just because of the way we are and how we typically write those. There’s a profile constraint forced-choice item (we call it a senior rater block check) where an individual can only rate up to 49% of the population in the topmost competitive block. And the rater also has one of those as well, where again, they can only rate a certain number of people in that topmost competitive block,” Malejko said.
His team removed the data from other fields that were not relevant and ultimately trained the system on those specific elements: Senior Rater Narrative, Senior Rater Block Check and Rater Block Check.
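One plausible way to turn those three elements into model inputs is to count the words in the narrative and carry the two block checks along as numbers. This is a minimal sketch under stated assumptions: the function name, the feature names, the numeric block encoding (1 meaning the topmost block) and the sample narrative are all invented here, and the team's actual text processing is not described in the interview.

```python
import re
from collections import Counter


def evaluation_features(narrative: str, senior_rater_block: int, rater_block: int) -> dict:
    """Combine word counts from the narrative with the two block checks."""
    # Lowercase the free text and split it into simple word tokens.
    tokens = re.findall(r"[a-z0-9#%]+", narrative.lower())
    features = {f"word={token}": count for token, count in Counter(tokens).items()}
    # Hypothetical encoding: 1 is the topmost competitive block.
    features["senior_rater_block"] = senior_rater_block
    features["rater_block"] = rater_block
    return features


# Invented narrative text, not taken from any real evaluation.
features = evaluation_features(
    "Top officer. Promote now; select for battalion command.",
    senior_rater_block=1,
    rater_block=1,
)
print(features["word=promote"], features["senior_rater_block"])  # 1 1
```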
“We also retained certain demographic information — branch, race, gender of the individual. Those were not necessarily fed into the training algorithm, we retained those so that when we are doing the analytics on the back end, we could see if there was any kind of bias that we weren’t accounting for or other issues with the algorithm that we may have not been projecting. So those were retained just for that purpose. But then, like I said, they were not used for the training, or actual testing and evaluation. It was more for this analysis thereof,” Malejko explained.
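A back-end check of the kind Malejko describes could be as simple as comparing invitation rates across groups, using demographic labels that were held out of training. The sketch below assumes that setup; the function name, group labels and records are placeholders, and the team's real analysis is not detailed in the interview.

```python
from collections import defaultdict


def invitation_rates(records):
    """records: iterable of (group_label, invited) pairs; returns the rate per group."""
    totals = defaultdict(int)
    invited = defaultdict(int)
    for group, was_invited in records:
        totals[group] += 1
        invited[group] += int(was_invited)
    return {group: invited[group] / totals[group] for group in totals}


# Made-up outcomes. The group label is used only here, never as a model input.
records = [
    ("group_a", True), ("group_a", True), ("group_a", False), ("group_a", True),
    ("group_b", True), ("group_b", False),
]
rates = invitation_rates(records)
print(rates)  # {'group_a': 0.75, 'group_b': 0.5}
```

A gap between groups does not by itself show that the algorithm is the cause, which is consistent with Saling's point that branch and assignment history would need separate study.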
A complex set of systemic problems in the military has historically left women and minorities less well-represented in combat arms positions like infantry, armor, artillery and fighter pilots. That has made it tougher for those personnel to reach higher ranks, because senior officers have traditionally come from those combat arms backgrounds.
“As we dig into files, I mean, we have seen that in terms of strength of file score, you don’t see the same strength of file with women and minorities that you do with white males. However, [many] of those white males are in armor and infantry, and some of the kind of high-prestige command positions, so we don’t have enough information right now. That’s a study we want to do. There’s a lot of studies we want to do in the future, to figure out how much of that is a simple selection of branch opportunities afforded to more diverse branches versus the less diverse branches,” Saling explained.
“That’s kind of well beyond what we’re doing with the boards,” she said. But “because it’s a complex problem, and because now we understand a little bit of the board behavior and trends and rating terms used — other things that we can continue to study — we’re going to be able to dig into that a little bit deeper.”
From an algorithmic standpoint, the Army HR command’s team considered a number of open-source machine learning models and ultimately chose one known as the Random Forest model.
“In the grand scheme of things, it is a relatively naive model compared to what is out there in terms of ChatGPT or Bard AI. So, by using a more simple model, what that allows us to do is to gain a greater level of explainability and interpretability from that model,” Malejko said.
Broadly, a Random Forest is a machine learning technique that builds a collection of decision trees and aggregates their results into one final output.
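To make the idea concrete, here is a deliberately tiny regression forest written with only the Python standard library. Each tree is a one-split "stump" trained on a bootstrap sample and a randomly chosen feature, and the forest's prediction is the average of the trees. The data, the numeric block encodings (1 as the topmost block) and the score scale are invented for illustration; a production model would use a full library implementation and far deeper trees.

```python
import random
from statistics import mean


def fit_stump(rows, targets, feature):
    """Fit a one-split regression tree on a single feature."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
        if not left or not right:
            continue
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # feature is constant in this sample: always predict the mean
        overall = mean(targets)
        return (feature, float("inf"), overall, overall)
    return (feature, best[1], best[2], best[3])


def fit_forest(rows, targets, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # draw with replacement
        feature = rng.randrange(len(rows[0]))
        forest.append(fit_stump([rows[i] for i in sample],
                                [targets[i] for i in sample], feature))
    return forest


def predict(forest, row):
    """Aggregate the trees by averaging their individual predictions."""
    return mean(left if row[feature] <= threshold else right
                for feature, threshold, left, right in forest)


# Invented data: (senior_rater_block, rater_block) -> hypothetical board score.
rows = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4)]
targets = [5.5, 5.0, 4.0, 3.5, 2.0, 1.5]
forest = fit_forest(rows, targets)
strong = predict(forest, (1, 1))
weak = predict(forest, (3, 4))
print(strong > weak)  # True
```

Because every tree is a readable threshold rule, it is possible to inspect which feature and cut-off each one used, which is the kind of inspection that explainability depends on.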
“On the back end, because we used that model, that allows us to be able to ‘dive under the hood’ and look into what it is actually doing and why it is doing it. We were then able to start to learn why the model was generating the particular scores that it was. I think, for us, that really helped us gain a lot of senior leader buy-in as well, because not only do they [not] see it as a black box, but they can then begin to see ‘Oh, that is why that algorithm picked it that way. That makes sense,’” Malejko said.
Essentially, this meant that if the system displayed abnormal behavior, humans could step in and clean up whichever data feed was causing the error before the algorithm propagated the problem.
Malejko’s team trained the model on about 300,000 real past evaluations, specifically their Senior Rater Narrative, Senior Rater Block Check and Rater Block Check fields.
“Using this algorithm that we developed — it reads a combination of those three things — we were able to score 96.4% of individual evaluations to within a half-point of a human-generated score,” Malejko said.
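The agreement figure Malejko cites is the share of evaluations where the model's score lands within half a point of the human score. A minimal sketch of that calculation follows, with made-up scores; whether the team treats a gap of exactly 0.5 as a match is an assumption here.

```python
def share_within(model_scores, human_scores, tolerance=0.5):
    """Fraction of evaluations where the model is within `tolerance` of the human score."""
    pairs = list(zip(model_scores, human_scores))
    hits = sum(1 for model, human in pairs if abs(model - human) <= tolerance)
    return hits / len(pairs)


# Invented scores for four evaluations.
model_scores = [4.0, 3.2, 5.1, 2.0]
human_scores = [4.5, 3.0, 4.0, 2.25]
agreement = share_within(model_scores, human_scores)
print(agreement)  # 0.75
```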
AutoCSL also produced roughly the same number of invitations as personnel did in previous years.
“I think the biggest takeaway from what [Malejko] did is he took this weeks-long process, ran the algorithm on a Thursday night, sent the order of merit list out to the branches and spent that Friday morning verifying with them, seeing if they wanted to move anybody around, and had the approved invitation list to the general by about 2 p.m. that Friday afternoon. So, we took weeks and condensed it into probably a period of about 18 hours,” Saling said.
Multiple times during the joint interview, Col. Saling and Maj. Malejko emphasized how their ultimate intent is not to eliminate people from this invitation-producing process — but to free up senior leaders’ time to do the things that humans are inherently good at, like making complicated determinations.
“If we can remove the rote and the tedious process for the human and allow them to then focus and make those hardline decisions, I think that’s where we can really use technology to more efficiently make better decisions,” Malejko said.
Only a little more than a dozen “substitutions” were made by Army leaders to the AI-generated invitee list.
“So, if they felt like the algorithm had overlooked one of their up-and-coming performers, they were able to nominate and say, ‘Hey, we recommend this person get included.’ And so in that way, we’re not trying to remove people from the process, we’re just trying to make the process more efficient so we can spend more time with those candidates that are deemed to be competitive,” Malejko said.
He and the team conducted a great deal of research and testing around outliers, where it seemed the model scored something very differently from how the human reviewers did.
“It was interesting to drill down into those. There were times where we found something that a reviewer had missed, and [cases where with language processing] somebody was disadvantaged due to somebody who’s writing a bad Officer Evaluation Report, or OER. Although we could kind of sift through and figure out what it is they were trying to say, there were several occasions, I think, we found that the model scored the evaluation much more fairly than the reviewer did. Sometimes we had some harsh reviewers that undercut some of the scores,” Saling said.
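Drilling into outliers like these starts with finding the evaluations where the model and the human reviewer disagree the most. The sketch below flags them for manual review; the function name, the one-point threshold, the identifiers and the scores are all hypothetical.

```python
def flag_outliers(scores, threshold=1.0):
    """scores: dict of eval_id -> (model_score, human_score).

    Returns (eval_id, gap) pairs whose absolute gap exceeds the threshold,
    largest disagreement first. A positive gap means the model scored higher.
    """
    gaps = {eval_id: model - human for eval_id, (model, human) in scores.items()}
    flagged = [(eval_id, gap) for eval_id, gap in gaps.items() if abs(gap) > threshold]
    return sorted(flagged, key=lambda item: abs(item[1]), reverse=True)


# Invented scores for three evaluations.
scores = {
    "eval-1": (4.5, 4.0),
    "eval-2": (5.0, 3.0),
    "eval-3": (2.0, 3.5),
}
outliers = flag_outliers(scores)
print(outliers)  # [('eval-2', 2.0), ('eval-3', -1.5)]
```

The sign of each gap shows the direction of the disagreement, so a reviewer can tell at a glance whether the model or the human was the harsher scorer on a given file.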
Notably, the model deployed is not one that is continuously learning. This project marked what’s known as a “batch situation,” in which a system is only trained up to a certain point.
“Starting this fall, we’ll go through and train it for the next iteration — updating it based on the latest year’s worth of terms and OERs — because, obviously, things in the Army are changing and shifting quite rapidly. So we want to make sure it’s picking up on some of the newest types of organizations that exist,” Malejko explained.
And as they move forward, officials in the Innovation Cell are also pulling data from across the enterprise and pinpointing other opportunities to apply AI and emerging technologies to accelerate and bolster Army outcomes.
“What we’re trying to do is not automate away all of our human processes. We’re just trying to augment them so that we can really focus on the differential variables that matter,” Saling told DefenseScoop.
Reflecting on the “bigger picture,” she also noted people are often surprised to hear that her team is effectively accelerating workflows with automation — particularly in the human resources realm.
“My boss at the time and I went to talk to Google in 2019, and they said that there was a lot of cultural resistance to bringing in any automation. And this is, you know, kind of ‘the’ tech company to bring automation into the people space. But it’s like one of the things I think we’ve successfully demonstrated here — and I love that we’ve been able to get so much buy-in from senior leaders. It’s showing AI-augmented processes. It’s not AI coming in and making the decision for you, it’s showing what it can do to better enable a decision process,” Saling said.