On a scorching summer day two weeks ago, I was sitting in a cavernous room in the Mojave Desert along with some 2,000 other people, none of whom I’d ever met before. In a secure location somewhere on the premises sat more than $8 million in cash that we, and others like us in nearby rooms, had collectively paid for the privilege. For 14 hours that day, we sat. Every so often, one of us would quietly stand up and leave, never to return. The last surviving one of us would become an instant millionaire.
We were playing poker. And unbeknownst to me at the time, a pair of Intel processors on the other side of the country had recently undergone a similar ordeal. At the crescendo of the World Series of Poker in Las Vegas, a pair of computer scientists have announced that they’ve created an artificial intelligence poker player that is stronger than a full table of top human professionals at the most popular form of the game — no-limit Texas Hold ’em.
Noam Brown, a research scientist at Facebook AI Research, and Tuomas Sandholm, a computer scientist at Carnegie Mellon, describe their results in a new paper titled “Superhuman AI for multiplayer poker,” published today in the journal Science.
Over the past few decades, artificial intelligence has surpassed the best humans at many of our species’s beloved games: checkers and its long-term planning, chess and its iconic strategy, Go and its complexity, backgammon and its element of chance, and now poker and its imperfect information. Ask the researchers who’ve worked on these projects why they do this and they’ll tell you one thing: Games are a test bed. It is with games that techniques are tested, results are measured and machines are compared to humans. And with each game comes an additional layer that more closely models the real world. The real world requires planning, it requires strategy, it is complex, it is random and — perhaps most vexingly — it contains untold seas of hidden information.
“No other popular recreational game captures the challenges of hidden information as effectively and as elegantly as poker,” Brown and Sandholm write.
For the past nine months or so, I’ve been working on a book about the collision of games and AI — and I’m still working on it, sadly not having become an instant millionaire at the World Series of Poker. As humans have ceded dominance at game after game, I’ve come to see superhuman games AI as both augury and an object lesson: It gives a glimpse into a potential future of superintelligent systems, and it teaches us how we humans would and could respond.
Poker, thanks to its deep complexity and the fact that players hide crucial information from one another, has been one of the final frontiers of these popular games, and that frontier is being quickly settled. Computers’ conquest of poker has been incremental, and most of the work to date had focused on the relatively simple “heads-up” — or two-player — version of the game.
By 2007 and 2008, computers, led by a program called Polaris, showed promise in early man vs. machine matches, fighting on equal footing with, and even defeating, human pros in heads-up limit Hold ’em, in which two players are restricted to certain fixed bet sizes.
In 2015, heads-up limit Hold ’em was “essentially solved” thanks to an AI player named Cepheus. This meant that you couldn’t distinguish Cepheus’s play from perfection, even after observing it for a lifetime.
In 2017, in a casino in Pittsburgh, a quartet of human pros each faced off against a program called Libratus in the incredibly complex heads-up no-limit Hold ’em. The human pros were summarily destroyed. Around the same time, another program, DeepStack, also claimed superiority over human pros in heads-up no-limit.
And in 2019, Wired reported that the game-theoretic technology behind Libratus was being employed in service of the U.S. military, in the form of a two-year contract worth up to $10 million with a Pentagon agency called the Defense Innovation Unit.
Brown and Sandholm’s latest creation, named Pluribus, is superhuman at a flavor of no-limit poker with more than two players — specifically, six — which is identical to one of the most popular forms of the game played online and very closely resembles the game I was playing in that room in the desert.
In an important early game theory paper from 1951, one of the fathers of the field, John Nash, examined an ultra-simplified version of poker, calling the game “the most obvious target” for applications of his theory. “The analysis of a more realistic poker game than our very simple model should be quite an interesting affair,” he wrote. He predicted the analysis would be complicated and that computational methods would be required. He was right.
Pluribus, like other superhuman AI games players, learned to play poker solely by playing against itself over eight days and 12,400 CPU core hours. It begins by playing randomly. It observes what works and what doesn’t. And it alters its approach along the way, using an algorithm that aims it toward Nash’s eponymous equilibria. This process created its plan of attack for the entire game, called its “blueprint strategy,” which was computed offline before the competition for what the authors estimate would be just $144 in current cloud computing costs. During its competitive games, Pluribus searches in real time for improvements to its coarse blueprint.
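To make that loop concrete, here is a minimal sketch of self-play on a toy game. It is not Pluribus's code, and the article does not spell out the exact algorithm, so the sketch swaps in plain regret matching, a simple self-play method known to approach a Nash equilibrium in two-player zero-sum games. The 2x2 payoff table and the names `PAYOFF`, `regret_matching` and `self_play` are all made up for illustration. Both players start out mixing uniformly, track how much better each action would have done than what they actually played, and shift toward the actions they regret not playing.

```python
"""Toy self-play by regret matching on a 2x2 zero-sum game (illustrative only)."""

# Hypothetical payoffs to the row player; the column player receives the negative.
PAYOFF = [[2.0, -1.0],
          [-1.0, 1.0]]


def regret_matching(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret yet: fall back to a uniform mix.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def self_play(payoff, iterations=20000):
    """Return both players' average strategies after repeated self-play."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_total, col_total = [0.0] * n_rows, [0.0] * n_cols

    for _ in range(iterations):
        row = regret_matching(row_regret)
        col = regret_matching(col_regret)

        # Expected payoff of each pure action against the opponent's current mix.
        row_values = [sum(payoff[i][j] * col[j] for j in range(n_cols))
                      for i in range(n_rows)]
        col_values = [-sum(payoff[i][j] * row[i] for i in range(n_rows))
                      for j in range(n_cols)]
        # Expected payoff of the mix each player actually used this round.
        row_ev = sum(row[i] * row_values[i] for i in range(n_rows))
        col_ev = sum(col[j] * col_values[j] for j in range(n_cols))

        # Regret: how much better each action would have done than the mix played.
        for i in range(n_rows):
            row_regret[i] += row_values[i] - row_ev
            row_total[i] += row[i]
        for j in range(n_cols):
            col_regret[j] += col_values[j] - col_ev
            col_total[j] += col[j]

    # The average strategy over all rounds is what approaches equilibrium.
    return ([t / iterations for t in row_total],
            [t / iterations for t in col_total])


if __name__ == "__main__":
    row_avg, col_avg = self_play(PAYOFF)
    print("row player's average strategy:", row_avg)
    print("column player's average strategy:", col_avg)
```

For this made-up table, the equilibrium has each player choosing the first action two-fifths of the time, and the averages returned by `self_play(PAYOFF)` should drift toward that mix as the iteration count grows. The strategy on any single round can swing wildly; it is the long-run average that settles down. Real poker is vastly larger and has more than two players, so treat this only as an illustration of the learn-from-regret idea.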
The finished program, which ran on just a couple of Intel CPUs, was pitted against top human players — each of whom had won at least $1 million playing as a professional — in two experiments over thousands of hands: one with one copy of Pluribus and five humans and another with one human and five copies of Pluribus. The humans were paid per hand and further incentivized to play their best with cash put up by Facebook. Pluribus was determined to be profitable in both experiments and at levels of statistical significance worthy of being published in Science.
“I think that this was the final milestone in poker,” Brown told me. “I think poker has served its purpose as a benchmark and a challenge problem for AI.”
“I probably have more experience battling against best-in-class poker AI systems than any other poker professional in the world,” said Jason Les, one of Pluribus’s opponents. “I know all the spots to look for weaknesses, all the tricks to try to take advantage of a computer’s shortcomings. In this competition, the AI played a sound, game-theory optimal strategy that you really only see from top human professionals and, despite my best efforts, I was not successful in finding a way to exploit it. I would not want to play in a game of poker where this AI poker bot was at the table.”
Sandholm and Brown told me they expect Pluribus’s technology to have even broader applications than the bots that came before. They think of Pluribus as the first AI gaming milestone that is truly multiplayer, as in more than two players, and they believe it could affect a laundry list of multiplayer “games” in the real world: auction bidding, multi-party negotiation, online retailer pricing, presidential candidate advertising, cybersecurity and even self-driving cars.
In that cavernous room in the desert at the World Series of Poker in Vegas, the humans weren’t thinking about political ads or self-driving cars, but many of them were thinking about game theory. Increasingly, the game’s best pros are taking their cues from the academic AI literature, from commercially available programs such as PokerSnowie and PioSOLVER, and even from consultants with doctorates in computer science whom they hire to hone their games. As a result, the quality of human poker play has never been higher, and Pluribus may raise it further still.
But I’ve spoken with both pros and scientists who think poker AIs might kill the very game they are trying to conquer. Indeed, one might have already killed heads-up limit. For one thing, these skeptics argue, modern elite poker can feel sterile, with young pros making the best plays from behind sunglasses and beneath headphones, and the game lacking the engaging human characters it needs to put on a good show and attract a new generation. For another, poker is like a pyramid scheme: It needs a wide range of skill levels to support the pros playing for big bucks at the top. As humans learn quickly from the bots, everybody becomes good, the skill levels flatten, the pyramid collapses downward, and the game dies.
“Unfortunately, there may be some merit to that,” Sandholm said. “That would be very sad. I’ve come to love this game.”