BBC News 22 July 2021 - by Paul Rincon
Only a fraction of proteins made by the human genome have confirmed structures
Artificial intelligence has been used to predict the structures of almost every protein made by the human body.
The development could help supercharge the discovery of new drugs to treat disease, alongside other applications.
Proteins are essential building blocks of living organisms; every cell we have in us is packed with them.
Researchers used a program called AlphaFold to predict the structures of 350,000 proteins belonging to humans and other organisms.
The instructions for making human proteins are encoded in our genomes - the DNA held in the nuclei of human cells.
There are around 20,000 of these proteins expressed by the human genome. Collectively, biologists refer to this full complement as the "proteome".
Commenting on the results from AlphaFold, Dr Demis Hassabis, chief executive and co-founder of artificial intelligence company DeepMind, said: "We believe it's the most complete and accurate picture of the human proteome to date.
"We believe this work represents the most significant contribution AI has made to advancing the state of scientific knowledge to date.
"And I think it's a great illustration and example of the kind of benefits AI can bring to society." He added: "We're just so excited to see what the community is going to do with this."
Proteins are made up of chains of smaller building blocks called amino acids. These chains fold in myriad different ways, forming a unique 3D shape. A protein's shape determines its function in the human body.
The 350,000 protein structures predicted by AlphaFold include not only the 20,000 contained in the human proteome, but also those of so-called model organisms used in scientific research, such as E. coli, yeast, the fruit fly and the mouse.
This giant leap in capability is described by DeepMind researchers and a team from the European Molecular Biology Laboratory (EMBL) in the prestigious journal Nature.
AlphaFold was able to make a confident prediction of the structural positions for 58% of the amino acids in the human proteome.
The positions of 35.7% were predicted with a very high degree of confidence - double the number confirmed by experiments.
Traditional techniques to work out protein structures include X-ray crystallography, cryogenic electron microscopy (Cryo-EM) and others. But none of these is easy to do: "It takes a huge amount of money and resources to do structures," Prof John McGeehan, a structural biologist at the University of Portsmouth, told BBC News.
Therefore, the 3D shapes are often determined as part of targeted scientific investigations, but no project until now had systematically determined structures for all the proteins made by the body.
In fact, just 17% of the proteome is covered by a structure confirmed experimentally.
Commenting on the predictions from AlphaFold, Prof McGeehan said: "It's just the speed - the fact that it was taking us six months per structure and now it takes a couple of minutes. We couldn't really have predicted that would happen so fast."
"When we first sent our seven sequences to the DeepMind team, two of those we already had the experimental structures for. So we were able to test those when they came back. It was one of those moments - to be honest - where the hairs stood up on the back of my neck because the structures [AlphaFold] produced were identical."
Prof Edith Heard, from EMBL, said: "This will be transformative for our understanding of how life works. That's because proteins represent the fundamental building blocks from which living organisms are made."
"The applications are limited only by our understanding."
The applications we can envisage now include developing new drugs and treatments for disease, designing future crops that can resist climate change, and creating enzymes that can break down the plastic that pervades the environment.
Prof McGeehan's group is already using AlphaFold's data to help develop faster enzymes for degrading plastic. He said the program had provided predictions for proteins of interest whose structures could not be determined experimentally - helping accelerate their project by "multiple years".
Dr Ewan Birney, director of EMBL's European Bioinformatics Institute, said the AlphaFold predicted structures were "one of the most important datasets since the mapping of the human genome".
DeepMind has teamed up with EMBL to make the AlphaFold code and protein structure predictions openly available to the global scientific community.
Dr Hassabis said DeepMind planned to vastly expand the coverage in the database to almost every sequenced protein known to science - over 100 million structures.