In the decade since the human genome was sequenced in 2003, scientists, engineers and doctors have struggled to answer an all-consuming question: Which DNA mutations cause disease?
A new computational technique developed at the University of Toronto may now be able to tell us.
Most existing methods examine mutations in segments of DNA that encode protein, which Frey refers to as low-hanging fruit. To find mutations outside of those segments, typical approaches such as genome-wide association studies take disease data and compare the mutations of sick patients with those of healthy patients, seeking out patterns. Frey compares that approach to lining up all the books your child likes to read and checking whether a particular letter occurs more frequently in them than in other books.
“It doesn’t work, because it doesn’t tell you why your kid likes the book,” he says. “Similarly, genome-wide association studies can’t tell you why a mutation is problematic.”
But looking at splicing can. Splicing is important for the vast majority of genes in the human body. When mutations alter splicing, genes may produce no protein, produce the wrong protein or malfunction in some other way, which could lead to disease.
Frey’s team, which includes researchers from engineering, biology and medicine, developed a computer model that mimics how the cell directs splicing by detecting patterns within DNA sequences, called the ‘splicing code’. They then used their system to examine mutated DNA sequences and determine what effects the mutations would have, effectively scoring each mutation. Unlike existing methods, their technique provides an explanation for the effect of a mutation and it can be used to find mutations outside of segments that code for protein.
To develop the computer model, Frey’s team fed experimental data into machine learning algorithms, so as to teach the computer how to examine a DNA sequence and output the splicing pattern.
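The idea of learning from examples and then scoring a mutation can be sketched in a few lines of Python. This is only a toy illustration, not the team's model: it swaps in a small logistic regression over two-letter DNA patterns, and the training sequences, their labels and every function name here are made up for the example. The only biological fact it leans on is that spliced-out segments typically begin with the letters GT. A mutation's score is the change in the predicted splicing probability between the original and the mutated sequence.

```python
import math
from itertools import product

K = 2  # length of the DNA patterns (k-mers) used as features
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def featurize(seq):
    """Count every overlapping k-mer in a DNA sequence."""
    counts = {kmer: 0 for kmer in KMERS}
    for i in range(len(seq) - K + 1):
        counts[seq[i:i + K]] += 1
    return [counts[kmer] for kmer in KMERS]

def predict(weights, bias, seq):
    """Predicted probability that the sequence is spliced (toy model)."""
    z = bias + sum(w * x for w, x in zip(weights, featurize(seq)))
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, epochs=500, lr=0.1):
    """Fit logistic regression by stochastic gradient descent."""
    weights = [0.0] * len(KMERS)
    bias = 0.0
    for _ in range(epochs):
        for seq, label in examples:
            err = predict(weights, bias, seq) - label
            feats = featurize(seq)
            weights = [w - lr * err * x for w, x in zip(weights, feats)]
            bias -= lr * err
    return weights, bias

def score_mutation(weights, bias, ref, pos, alt):
    """Change in predicted splicing when ref[pos] (0-based) becomes alt."""
    mutant = ref[:pos] + alt + ref[pos + 1:]
    return predict(weights, bias, mutant) - predict(weights, bias, ref)

# Synthetic training data (hypothetical): label 1 if the sequence
# contains a GT pattern, 0 otherwise.
EXAMPLES = [
    ("ACGTAC", 1), ("CCGTCA", 1), ("AAGTCC", 1),
    ("ACCTAC", 0), ("CCATCA", 0), ("AAGCCC", 0),
]

weights, bias = train(EXAMPLES)

# Score a mutation that destroys the GT pattern: ACGTAC -> ACCTAC.
score = score_mutation(weights, bias, "ACGTAC", 2, "C")
print(f"mutation score: {score:+.3f}")
```

A negative score means the toy model predicts less splicing after the mutation. Because the score comes from the learned pattern weights, one can also look up which patterns drove the change, which loosely mirrors the article's point that the method offers an explanation and not just a statistical association.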
Their method works surprisingly well and has led to new discoveries. For example, using DNA sequences from five patients with autism, provided by Scherer, the model identified 39 new genes that could be implicated in autism spectrum disorder, an increase of roughly 40 per cent over the approximately 100 autism genes previously known.
“Brendan’s work is groundbreaking because it represents a first serious attempt to decode the portions of that 98 per cent of the human genome outside the genes that are typically studied in genetic disease studies,” Scherer says. “This is particularly exciting since it is thought these segments of DNA may contain much of the missing information that we have been looking for in studies like autism.”
Scherer and Frey began collaborating at CIFAR meetings five years ago. They intend to use this model to analyze the genomes of 10,000 families with autism as part of the MSSNG study. The paper also sheds light on the genetic mechanisms that lead to spinal muscular atrophy, a leading cause of infant death, and nonpolyposis colorectal cancer.
Frey says his involvement in two CIFAR programs was crucial in making connections and in developing interdisciplinary expertise among his graduate students and postdoctoral fellows, including co-authors Hui Xiong, Babak Alipanahi, Leo Lee and Hannes Bretschneider. Also involved were Ben Blencowe of the University of Toronto and Nebojsa Jojic of Microsoft Research.
“My participation in the Neural Computation & Adaptive Perception program enabled my group to have access to the best techniques in deep learning,” Frey says. He adds that his interactions with members of the Genetic Networks program challenged him to take on some of the toughest questions in genetics.
CIFAR Senior Fellow Frederick Roth, co-director of the program in Genetic Networks, says Drs. Frey, Scherer and Hughes have been key members of the program and its efforts to interpret the genome. “Many of us will soon know our complete human genome sequence, which will be like having an encyclopedic guide to ourselves that is written in an alien language. This work promises to interpret the impact of mutations in a broader region of our genome than has been previously possible,” he says.