AI Protein-Folding Algorithms Solve Structures Faster Than Ever

The race to crack one of biology’s grandest challenges – predicting the 3D structures of proteins from their amino-acid sequences – is intensifying, thanks to new artificial-intelligence approaches.

At the end of last year, Google’s AI firm DeepMind debuted an algorithm called AlphaFold, which combined two techniques that were emerging in the field and beat established contenders in a competition on protein-structure prediction by a surprising margin. And in April this year, a US researcher revealed an algorithm that uses a totally different approach. He claims his AI is up to one million times faster at predicting structures than DeepMind’s, although probably not as accurate in all situations.

More broadly, biologists are wondering how else deep learning – the AI technique used by both approaches – might be applied to the prediction of protein arrangements, which ultimately dictate a protein’s function.

“There’s a lot of excitement about where things might go now,” says John Moult, a biologist at the University of Maryland in College Park and the founder of the biennial competition, called Critical Assessment of protein Structure Prediction (CASP), where teams are challenged to design computer programs that predict protein structures from sequences.

The latest algorithm’s creator, Mohammed AlQuraishi, a biologist at Harvard Medical School in Boston, Massachusetts, hasn’t yet directly compared the accuracy of his method with that of AlphaFold. He suspects that AlphaFold would be the more accurate of the two when proteins with sequences similar to the one being analysed are available for reference.

He says that because his algorithm uses a mathematical function to calculate protein structures in a single step – rather than in two steps like AlphaFold, which uses the similar structures as groundwork in the first step – it can predict structures in milliseconds rather than hours or days.

His algorithm is a neural network: it is fed with known data on how amino-acid sequences map to protein structures and then learns to produce new structures from unfamiliar sequences.

The novel part of his network lies in its ability to create such mappings end-to-end; other systems use a neural network to predict certain features of a structure, then another type of algorithm to laboriously search for a plausible structure that incorporates those features.

AlQuraishi’s network takes months to train, but once trained, it can transform a sequence to a structure almost immediately.

His approach, which he dubs a recurrent geometric network, predicts the structure of one segment of a protein partly on the basis of what comes before and after it.

AlphaFold, by contrast, works in two steps. In the first, it compares a protein’s sequence with similar ones in a database to reveal pairs of amino acids that don’t lie next to each other in a chain, but that tend to appear in tandem.

This suggests that these two amino acids are located near each other in the folded protein.
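The idea of spotting amino-acid pairs that vary in tandem can be illustrated with a short sketch. It scores every pair of columns in a set of aligned sequences by their mutual information, a simple textbook measure of co-variation. This is an assumption chosen for illustration, not the scoring used by AlphaFold or any other system described here, and the four-sequence alignment and function names are made up.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi


def covarying_pairs(alignment, min_separation=2):
    """Rank column pairs that are not chain neighbours by co-variation."""
    length = len(alignment[0])
    scores = [
        (column_mutual_information(alignment, i, j), i, j)
        for i, j in combinations(range(length), 2)
        if j - i >= min_separation
    ]
    return sorted(scores, reverse=True)


# Hypothetical toy alignment: positions 0 and 4 always change together.
toy_alignment = [
    "KAGDE",
    "KAGDE",
    "EAGDK",
    "EAGDK",
]

top_score, top_i, top_j = covarying_pairs(toy_alignment)[0]
print(f"Strongest co-varying pair: positions {top_i} and {top_j}")
```

In this toy case the first and last positions swap residues together while the middle positions never change, so that pair ranks highest. Real alignments contain thousands of sequences, and practical methods correct for effects that this sketch ignores.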

DeepMind trained a neural network to take such pairings and predict the distance between two paired amino acids in the folded protein.

By comparing its predictions with precisely measured distances in proteins, it learnt to make better guesses about how proteins would fold up.

A parallel neural network predicted the angles of the joints between consecutive amino acids in the folded protein chain.

These steps can’t predict a structure by themselves, because the exact set of distances and angles predicted might not be physically possible.

In its second step, instead of using another neural network, AlphaFold used an optimization method called gradient descent to iteratively refine a structure so that it came close to the predictions from the first step.
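Gradient descent on a structure can be sketched in a few lines. The example below nudges a set of 3D points until the distances between them approach a set of target distances. It is a minimal illustration under stated assumptions, not DeepMind’s implementation: the four-point chain, the target distances (computed here from a made-up reference shape rather than predicted by a network), the starting coordinates and the step size are all hypothetical.

```python
from itertools import combinations
from math import dist


def mismatch(coords, target):
    """Sum of squared differences between current and target pair distances."""
    return sum((dist(coords[i], coords[j]) - t) ** 2 for (i, j), t in target.items())


def gradient_step(coords, target, learning_rate):
    """Move every point one small step down the gradient of the mismatch."""
    grads = [[0.0, 0.0, 0.0] for _ in coords]
    for (i, j), t in target.items():
        d = dist(coords[i], coords[j])
        if d == 0.0:
            continue  # direction is undefined for coincident points
        scale = 2.0 * (d - t) / d
        for axis in range(3):
            g = scale * (coords[i][axis] - coords[j][axis])
            grads[i][axis] += g
            grads[j][axis] -= g
    return [
        [x - learning_rate * g for x, g in zip(point, grad)]
        for point, grad in zip(coords, grads)
    ]


def refine(coords, target, steps=500, learning_rate=0.05):
    """Repeat gradient steps a fixed number of times."""
    for _ in range(steps):
        coords = gradient_step(coords, target, learning_rate)
    return coords


# Hypothetical reference shape, used only to generate target distances.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
target = {(i, j): dist(reference[i], reference[j]) for i, j in combinations(range(4), 2)}

# Arbitrary starting arrangement, roughly a stretched-out chain.
start = [[0.0, 0.0, 0.0], [0.5, 0.1, 0.0], [1.0, 0.0, 0.2], [1.5, 0.2, 0.1]]

initial_error = mismatch(start, target)
refined = refine(start, target)
final_error = mismatch(refined, target)
print(f"Mismatch before: {initial_error:.4f}, after: {final_error:.4f}")
```

Because distances alone cannot tell a shape from its mirror image, a procedure like this may settle on either one, and it can also stall in an arrangement that fits the targets only partially.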

Moult said there was a lot of discussion at CASP13 about how else deep learning might be applied to protein folding.

Maybe it could help to refine approximate structure predictions; report on how confident the algorithm is in a folding prediction; or model interactions between proteins.

Although computational predictions aren’t yet accurate enough to be widely used in drug design, the increasing accuracy allows for other applications, such as understanding how a mutated protein contributes to disease.