
Scientists Came Up With 1,000 Questions That Stump Computers


In 2011, America's favorite quiz show, Jeopardy!, gave many viewers their first brush with artificial intelligence when IBM's Watson faced off against two of the show's greatest champions, Ken Jennings and Brad Rutter. Watson stumbled a bit on the first day but dominated from there, ending the two-game exhibition match, aired over three nights, with more than three times the winnings of either opponent.

"I, for one, welcome our new computer overlords," Jennings wrote on his video screen after Watson's win.

While artificial intelligence has developed tremendously since then and AI assistants like Siri and Alexa have proliferated in our pockets and homes, there are still plenty of scenarios where these assistants don't seem so smart. A team of researchers at the University of Maryland capitalized on this in a 2019 article in the journal Transactions of the Association for Computational Linguistics, creating a set of questions that stumped computers in order to learn more about how they "think" — and perhaps teach them to think even better.

How to Train Your Computer

The goal of machine learning and artificial intelligence is to teach a computer to think like a human. This is done by feeding the computer large amounts of data and letting it "learn" the statistical patterns that connect questions to correct answers, refining its guesses through probability and trial and error.
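To make that concrete, here's a minimal sketch of what "learning from data" can look like. It uses scikit-learn and a toy set of question-answer pairs we've invented for illustration; the systems in the study are far more sophisticated, but the principle is the same.

```python
# A minimal sketch of "learning from data": a toy question-answering model
# that learns statistical associations between words and answers.
# scikit-learn and the toy data are illustrative assumptions, not the
# study's actual system.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented training data: questions paired with their answers (the labels).
questions = [
    "This composer wrote 'Variations on a Theme by Haydn'.",
    "This physicist developed the theory of general relativity.",
    "This composer wrote the 'Hungarian Dances'.",
    "This physicist explained the photoelectric effect.",
]
answers = ["Johannes Brahms", "Albert Einstein", "Johannes Brahms", "Albert Einstein"]

# The model "learns" which words tend to co-occur with which answers.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(questions, answers)

# Given a new question, it guesses the answer whose training questions
# share the most (weighted) words with it.
print(model.predict(["Who composed 'Variations on a Theme by Haydn'?"]))
```

Because the guess rests on word statistics rather than understanding, slang, paraphrase, and unusual contexts can throw a model like this off.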

Language can be especially tricky for computers. Consider, for example, the challenge that slang poses to someone who's learning English. If someone tweets that they're "spilling tea" and "have receipts," it means they're gossiping and have evidence to back up their gossip, not that they're making a mess with a hot beverage and saving paper receipts. These phrases have completely different meanings than the dictionary definitions of "tea" and "receipts" would suggest.

But just like a human learning a language, artificial intelligence can better understand these linguistic complexities the more it encounters them. To do that, though, scientists need to know which words and phrases are the most challenging for an AI system — and that meant coming up with quiz-show-like questions specifically designed to trip it up. To craft such questions, the scientists called in some help from collegiate quizbowl competitors.

Most question sets used to improve artificial intelligence systems have been written either by humans or by computers, and the two write questions very differently. When people write the questions, it's not clear which part of a question confuses the computer. When computers generate the questions, they're either formulaic, fill-in-the-blank-style questions or complete nonsense.

What sets this team's questions apart is that humans worked with computers to write them. The scientists developed a computer system that let the human authors see what the computer was "thinking" as they typed: the interface highlighted which words or parts of the sentence the computer was using to make its guesses at an answer.

For example, when an author wrote "What composer's 'Variations on a Theme by Haydn' was inspired by Karl Ferdinand Pohl?" and the system correctly answered "Johannes Brahms," the system highlighted the words "Ferdinand Pohl" and displayed other questions it had encountered that involved both Karl Ferdinand Pohl and Johannes Brahms, showing that this phrase was what led the computer to the answer.
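One simple way to produce that kind of highlighting is leave-one-out importance: delete each word in turn and measure how much the model's confidence in its original answer drops. The sketch below uses the same kind of toy scikit-learn model as before; it is not the UMD team's interface, just the underlying idea.

```python
# Leave-one-out word importance: a simple stand-in for the highlighting
# described above. The toy model and data are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

questions = [
    "Karl Ferdinand Pohl inspired this composer's Haydn variations.",
    "This physicist developed the theory of general relativity.",
]
answers = ["Johannes Brahms", "Albert Einstein"]
model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(questions, answers)

query = ("What composer's 'Variations on a Theme by Haydn' "
         "was inspired by Karl Ferdinand Pohl?")
words = query.split()
classes = list(model.classes_)
guess = classes.index(model.predict([query])[0])
base = model.predict_proba([query])[0][guess]

# Words whose removal hurts confidence the most are the ones to highlight.
for i, word in enumerate(words):
    ablated = " ".join(words[:i] + words[i + 1:])
    drop = base - model.predict_proba([ablated])[0][guess]
    print(f"{word:12} importance drop: {drop:+.3f}")
```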

Understanding how the computer "thinks" allows the author to edit the question to make it harder for the computer without changing its meaning. In this case, the author replaced the name Karl Ferdinand Pohl with a description of Pohl's job, "the archivist of the Vienna Musikverein." The computer was unable to answer the new question correctly, but the human quizbowl competitors had no trouble.
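In effect, authoring becomes a loop: edit the question, ask the computer, and stop once the computer fails while humans can still answer. Here's a rough sketch of that loop; `model_answer` is a hypothetical stand-in for a real QA system's top guess, and the human check happens offstage in a live playtest.

```python
# A sketch of the human-in-the-loop adversarial writing process.
# `model_answer` is a hypothetical stand-in for a real QA system.

def model_answer(question: str) -> str:
    # Stand-in behavior: the memorized name "Karl Ferdinand Pohl" gives
    # the answer away; the paraphrased clue does not.
    if "Karl Ferdinand Pohl" in question:
        return "Johannes Brahms"
    return "no confident guess"

gold = "Johannes Brahms"  # the answer every draft must still point to
drafts = [
    "What composer's 'Variations on a Theme by Haydn' was inspired by "
    "Karl Ferdinand Pohl?",
    "What composer's 'Variations on a Theme by Haydn' was inspired by "
    "the archivist of the Vienna Musikverein?",
]

for draft in drafts:
    if model_answer(draft) != gold:
        print("Computer stumped; send to human players to verify:", draft)
        break
    print("Computer still answers correctly; keep editing:", draft)
```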

Better Together

In all, the quizbowl competitors and their computerized co-authors created 1,213 stumpers. Those questions were then put to the test in two live human vs. computer tournaments. In both tournaments, the humans beat the AI, with even the weakest human teams defeating the strongest computer system.

The researchers identified several phenomena that tripped up the computer's natural language processing. Some had to do with the language of the question itself, like paraphrasing or words used in unexpected contexts (such as a politician appearing in a question that wasn't about politics). Others related to the computer's reasoning skills, like clues that required logic and calculation, or multiple steps to reach a conclusion.

"Humans are able to generalize more and to see deeper connections," Jordan Boyd-Graber, an associate professor of computer science at UMD and senior author of the paper, said in a press release. "They don't have the limitless memory of computers, but they still have an advantage in being able to see the forest for the trees. Cataloguing the problems computers have helps us understand the issues we need to address, so that we can actually get computers to begin to see the forest through the trees and answer questions in the way humans do."

The team has made their questions and data freely available so other researchers can use them to help improve machine learning. It's a good thing, too. This project just goes to show how computers and people are more powerful together than when acting alone.



Written by Steffie Drucker, August 30, 2019
