An Algorithm Made New Scientific Discoveries by Reading Old Studies

When Alexander Fleming first discovered penicillin, he published his findings in a scientific paper, one that sat mostly unread for a decade until another scientist found it and put Fleming's discoveries to the test, changing the world in the process. Something similar happened with the 1849 discovery of testosterone. The fact is that there's an unfathomably large quantity of published scientific research out there, and scientists can only hope to fully comprehend a small fraction of it. That means they could be missing some truly earth-shattering discoveries. That problem led a team of researchers to wonder: Can artificial intelligence comb through the research to find the breakthroughs humans can't? The answer is a resounding yes.

Siri, Make a Discovery

Lead author Vahe Tshitoyan of the U.S. Department of Energy's Lawrence Berkeley National Laboratory had a problem that was specific to his work as a researcher, but was also all too familiar to anyone who's tried to keep up with the news — scientific or otherwise.

"In every research field there's 100 years of past research literature, and every week dozens more studies come out," he said in a press release. "A researcher can access only fraction of that."

So Tshitoyan and his team turned to machine learning, specifically a technology known as natural language processing (NLP). Every time you use Google Translate or ask Siri for directions, you're taking advantage of NLP, which helps computers read, decipher, and make sense of human language. One of the biggest breakthroughs in NLP has been in word embeddings, where a machine learns the usage or meaning of a word based on a variety of individual dimensions, including the words it usually appears next to. In essence, it deciphers meaning from the words' relationships with each other.

People do this all the time. If you hear the word "redundant" every time someone uses two synonyms to describe one thing, you'll eventually learn that it means something like "repetitive" or "unnecessary." Likewise, you'll probably figure out that it has some negative connotation, and you might even start to understand that it means slightly different things depending on whether the topic is grammar ("That word is redundant"), engineering ("We included a redundant component, just in case"), or employment ("Your position has been made redundant").

For a study published in the journal Nature, the Berkeley Lab scientists did just this with published research, using a machine-learning algorithm called Word2Vec. They fed the algorithm a whopping 3.3 million scientific abstracts published between 1922 and 2018, comprising a vocabulary of half a million words. Since the team was made up of materials scientists, all of the research came from journals that centered on or included studies on materials science. Then, they let the algorithm run, with no extra human intervention or even science training.
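The core idea, that a word's meaning can be inferred from the words around it, can be sketched without Word2Vec at all. The toy example below is an illustration under loud assumptions, not the Berkeley Lab team's method: it simply counts neighboring words in a made-up four-sentence corpus and compares the resulting count vectors with cosine similarity. Word2Vec instead trains a small neural network to learn dense vectors, but the intuition is similar. The corpus, the window size, and the function names are all invented for this sketch.

```python
from collections import Counter, defaultdict
import math

# Tiny made-up corpus (an assumption for illustration, not real abstracts).
CORPUS = [
    "bismuth telluride is a good thermoelectric material",
    "lead telluride is a good thermoelectric material",
    "lithium cobalt oxide is a common cathode material",
    "lithium manganese oxide is a common cathode material",
]


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            start = max(0, i - window)
            stop = min(len(words), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][words[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vectors = cooccurrence_vectors(CORPUS)
sim_same = cosine(vectors["bismuth"], vectors["lead"])
sim_diff = cosine(vectors["bismuth"], vectors["cobalt"])
print(f"bismuth vs lead:   {sim_same:.2f}")
print(f"bismuth vs cobalt: {sim_diff:.2f}")
```

In this toy corpus, "bismuth" and "lead" come out as more similar to each other than "bismuth" and "cobalt," purely because they keep the same company. Nobody told the program anything about chemistry.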

Unrealized Potential

The algorithm immediately demonstrated a deep understanding of the research. For example, from the lithium-ion cathode compound lithium cobalt oxide (LiCoO2), it identified five other compounds that were chemically similar — and which the scientists already knew were also lithium-ion cathode materials.

"Without telling it anything about materials science, it learned concepts like the periodic table and the crystal structure of metals," said Anubhav Jain, the lead researcher on the study. "That hinted at the potential of the technique. But probably the most interesting thing we figured out is, you can use this algorithm to address gaps in materials research, things that people should study but haven't studied so far."

That was the truly remarkable thing about this experiment: By analyzing only the similarity between various words and the word "thermoelectric," the algorithm was able to identify new thermoelectric materials. A thermoelectric material is one that can efficiently convert heat to electricity, ideally while also being safe, cheap, and easy to produce. The team took the top 10 materials that the algorithm predicted to be good thermoelectric candidates and ran calculations to determine their power factor, which is basically a measure of how much energy they could generate. All of them had higher-than-average power factors, and the top three were at or above the 95th percentile of known thermoelectric materials.
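Ranking candidates by their similarity to a single word is simple enough to sketch in a few lines. The example below is hypothetical throughout: the three-dimensional vectors and the names "material_A" through "material_D" are invented placeholders, not values or compounds from the study, and real Word2Vec embeddings typically have hundreds of dimensions. It only shows the general shape of the technique, which is to score every candidate by cosine similarity to the target word and keep the top few.

```python
import math

# Hypothetical 3-dimensional embeddings, invented for illustration only.
EMBEDDINGS = {
    "thermoelectric": [0.9, 0.1, 0.2],
    "material_A": [0.8, 0.2, 0.1],
    "material_B": [0.1, 0.9, 0.3],
    "material_C": [0.7, 0.0, 0.4],
    "material_D": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def rank_candidates(target, embeddings, top_k=3):
    """Return the top_k (name, score) pairs most similar to the target word."""
    target_vec = embeddings[target]
    scores = [
        (name, cosine(target_vec, vec))
        for name, vec in embeddings.items()
        if name != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


ranking = rank_candidates("thermoelectric", EMBEDDINGS)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A list like this is only a shortlist of things worth checking. In the study, the actual test was the follow-up calculation of each candidate's power factor.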

To see whether the algorithm could have predicted material discoveries that scientists have since made, the team fed it only studies that were at least a few decades old. Again, a substantial number of its predictions turned up in later studies, and a handful had been discovered in the intervening years.
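This kind of look-back test can be sketched as a simple scoring exercise: make predictions using only older literature, then count how many of them were reported after the cutoff. Everything in the example below is a made-up placeholder, including the compound names, the years, and the cutoff, and the function name is hypothetical. It is not the team's evaluation code, just one plausible way to tally the hits.

```python
# All names and years are invented placeholders, not results from the study.

# Ranked predictions (best first) from a model that saw only pre-cutoff studies.
PREDICTIONS_FROM_OLD_DATA = [
    "compound_1",
    "compound_2",
    "compound_3",
    "compound_4",
    "compound_5",
]

# Year each material was first reported as thermoelectric (hypothetical).
FIRST_REPORTED = {
    "compound_2": 2012,
    "compound_5": 2016,
    "compound_9": 2014,
}


def hits_after_cutoff(predictions, first_reported, cutoff_year, top_k):
    """Return the top_k predictions that were first reported after the cutoff."""
    return [
        name
        for name in predictions[:top_k]
        if name in first_reported and first_reported[name] > cutoff_year
    ]


hits = hits_after_cutoff(
    PREDICTIONS_FROM_OLD_DATA, FIRST_REPORTED, cutoff_year=2008, top_k=5
)
hit_rate = len(hits) / 5
print(f"Later confirmed: {hits} (hit rate {hit_rate:.0%})")
```

A hit rate like this means the most when compared against a baseline, such as how often randomly chosen materials would have turned up in later studies.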

"This study shows that if this algorithm were in place earlier, some materials could have conceivably been discovered years in advance," Jain said.

The researchers have released the top 50 thermoelectric materials that the algorithm predicted, along with the word embeddings so that other researchers can make use of their work. Next, the team wants to create a search engine that can make it easier to search scientific abstracts for these novel relationships. It doesn't happen every day, but sometimes when machines and humans work together, truly great things can result.

Written by Ashley Hamer, August 28, 2019
