When it comes to games such as chess or Go, artificial intelligence (AI) programs have far surpassed the best players in the world. These “superhuman” AIs are unmatched competitors, but perhaps harder than competing against humans is collaborating with them. Can the same technology get along with people?
In a new study, MIT Lincoln Laboratory researchers sought to find out how well humans could play the cooperative card game Hanabi with an advanced AI model trained to excel at playing with teammates it has never met before. In single-blind experiments, participants played two series of the game: one with the AI agent as their teammate, and the other with a rule-based agent, a bot manually programmed to play in a predefined way.
The results surprised the researchers. Not only were the scores no better with the AI teammate than with the rule-based agent, but humans consistently hated playing with their AI teammate. They found it to be unpredictable, unreliable, and untrustworthy, and felt negatively even when the team scored well. A paper detailing this study has been accepted to the 2021 Conference on Neural Information Processing Systems (NeurIPS).
When playing the cooperative card game Hanabi, humans felt frustrated and confused by the moves of their AI teammate. Credit: Bryan Mastergeorge
“It really highlights the nuanced distinction between creating AI that performs objectively well and creating AI that is subjectively trusted or preferred,” says Ross Allen, co-author of the paper and a researcher in the Artificial Intelligence Technology Group. “It may seem those things are so close that there’s not really daylight between them, but this study showed that those are actually two separate problems. We need to work on disentangling those.”
Humans hating their AI teammates could be of concern for researchers designing this technology to one day work with humans on real challenges, such as defending against missiles or performing complex surgery. This dynamic, called teaming intelligence, is a next frontier in AI research, and it uses a particular kind of AI called reinforcement learning.
A reinforcement learning AI is not told which actions to take, but instead discovers which actions yield the most numerical “reward” by trying out scenarios again and again. It is this technology that has yielded the superhuman chess and Go players. Unlike rule-based algorithms, these AIs aren’t programmed to follow “if/then” statements, because the possible outcomes of the human tasks they’re slated to tackle, like driving a car, are far too numerous to code.
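To make the trial-and-error idea concrete, here is a minimal sketch of one of the simplest reinforcement learning setups, an epsilon-greedy "multi-armed bandit." It is not the model used in the study, which is far more sophisticated; the function name `train_bandit`, the reward probabilities, and every parameter are assumptions chosen purely for illustration. The agent is never told which action is best. It only sees a numerical reward after each try and keeps a running average per action.

```python
import random

def train_bandit(reward_probs, episodes=20000, epsilon=0.1, seed=0):
    """Estimate each action's average reward by repeated trial and error.

    reward_probs is hidden from the learner: it is only used to simulate
    the environment's response (reward 1.0 or 0.0) to the chosen action.
    """
    rng = random.Random(seed)
    n_actions = len(reward_probs)
    values = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions    # how many times each action was tried

    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(n_actions)
        else:
            # Exploit: otherwise pick the action that looks best so far.
            action = max(range(n_actions), key=lambda a: values[a])

        # The environment returns a numerical reward for the action taken.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0

        # Incrementally update the running average for that action.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values

if __name__ == "__main__":
    learned = train_bandit([0.2, 0.5, 0.8])
    best = max(range(len(learned)), key=lambda a: learned[a])
    print("estimated values:", learned)
    print("preferred action:", best)
```

Nothing in the loop encodes an "if this situation, then that move" rule. The preference for one action emerges only from the accumulated rewards, which is the contrast with a rule-based agent that the paragraph above draws.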
“Reinforcement learning is a much more general-purpose way of developing AI. If you can train it to learn how to play the game of chess, that agent won’t necessarily go drive a car. But you can use the same algorithms to train a different agent to drive a car, given the right data,” Allen says. “The sky’s the limit in what it could, in theory, do.”
Bad hints, bad plays
Today, researchers are using Hanabi to test the performance of reinforcement learning models developed for collaboration, in much the same way that chess has served as a benchmark for testing competitive AI for decades.
The game of Hanabi is akin to a multiplayer form of Solitaire. Players work together to stack cards of the same suit in order. However, players may not view their own cards, only the cards that their teammates hold. Each player is strictly limited in what they can communicate to their teammates to get them to pick the best card from their own hand to stack next.
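The information constraint described above can be sketched in a few lines of code. This is only an illustrative model, not the software used in the study: the names `Card`, `visible_hands`, and `is_playable` are hypothetical, the hint mechanics are left out entirely, and the sketch assumes each suit's stack starts at rank 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Card:
    suit: str
    rank: int

def visible_hands(hands, viewer):
    """Return the hands a player is allowed to see: everyone's but their own."""
    return {player: cards for player, cards in hands.items() if player != viewer}

def is_playable(card, stacks):
    """A card fits only if it is the next rank in order for its suit.

    stacks maps each suit to the highest rank stacked so far
    (a missing suit means nothing has been stacked yet).
    """
    return card.rank == stacks.get(card.suit, 0) + 1

if __name__ == "__main__":
    hands = {
        "alice": [Card("red", 1), Card("blue", 3)],
        "bob": [Card("blue", 1), Card("red", 2)],
    }
    stacks = {"red": 1}  # red 1 has already been stacked

    # Alice can see Bob's cards, but not her own.
    print(visible_hands(hands, "alice"))

    # Only Alice can tell that Bob's red 2 is playable right now.
    print([card for card in hands["bob"] if is_playable(card, stacks)])
```

The sketch shows why teamwork is unavoidable: the player who holds a playable card is the one player who cannot see it, so every good move depends on a teammate's limited hint.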
The Lincoln Laboratory researchers did not develop either the AI agent or the rule-based agent used in this experiment. Both agents represent the best in their fields for Hanabi performance. In fact, when the AI model was previously paired with an AI teammate it had never played with before, the team achieved the highest-ever score for Hanabi play between two unknown AI agents.
“That was an important result,” Allen says. “We thought, if these AI that have never met before can come together and play really well, then we should be able to bring humans that also know how to play very well together with the AI, and they’ll also do very well. That’s why we thought the AI team would objectively play better, and also why we thought that humans would prefer it, because generally we’ll like something better if we do well.”
Neither of those expectations came true. Objectively, there was no statistically significant difference in the scores between the AI and the rule-based agent. Subjectively, all 29 participants reported in surveys a clear preference for the rule-based teammate. The participants were not informed which agent they were playing with in which games.
“One participant said that they were so stressed out at the bad play from the AI agent that they actually got a headache,” says Jaime Pena, a researcher in the AI Technology and Systems Group and an author on the paper. “Another said that they thought the rule-based agent was dumb but workable, whereas the AI agent showed that it understood the rules, but that its moves were not cohesive with what a team looks like. To them, it was giving bad hints, making bad plays.”