Machine learning reveals genetic roots of cancers and autism

In the decade since the human genome was sequenced in 2003, scientists, engineers and doctors have struggled to answer an all-consuming question: Which DNA mutations cause disease? A new computational technique developed at the University of Toronto may now be able to tell us.
 
A Canadian research team led by Professor Brendan Frey has developed the first method for ‘ranking’ genetic mutations based on how living cells ‘read’ DNA, revealing how likely any given alteration is to cause disease. They used their method to discover unexpected genetic determinants of autism, hereditary cancers and spinal muscular atrophy, a leading genetic cause of infant mortality.
 
Their findings appear in the December 18 issue of the leading journal Science and are already grabbing headlines, with coverage in Quanta Magazine, the Globe and Mail and at autismspeaks.org.

Think of the human genome as a mysterious text, made up of three billion letters.
 
“Over the past decade, a huge amount of effort has been invested into searching for mutations in the genome that cause disease, without a rational approach to understanding why they cause disease,” said Frey. “This is because scientists didn’t have the means to understand the text of the genome and how mutations in it can change the meaning of that text.”
 
Frey points out that this puzzle was captured by biologist Eric Lander of the Massachusetts Institute of Technology in a famous quote: “Genome. Bought the book. Hard to read.”