Language of Life
by David Pescovitz
Ian Holmes conducted his postdoctoral research in 2000-2001 as part of the UC Berkeley Drosophila Genome Project. He joined the Department of Bioengineering in 2004.
What does the work of famed theoretical linguist Noam Chomsky have to do with bioengineering? DNA is just another language that can be translated, says Ian Holmes, a UC Berkeley computational biologist. Holmes is applying Chomsky's theories about grammar and syntax to the piles of genetic data emerging from DNA sequencing efforts around the world. The bioengineering professor's research could someday help shed light on the beginnings of evolution and even inform the development of new antiviral drugs.
"There is so much data available in computational biology today that we don't have to be satisfied with abstract hand-waving hypothesis about evolution, mutation, and selection," Holmes says. "We're trying to actually model evolution in a quantitative way to create a realistic picture of its underlying mechanisms, patterns, rates, and modes."
To do this, the researchers study how the genomes of related species of animals differ. By comparing genes, it's possible to identify the elements of a genome that evolution has conserved, sometimes for billions of years.
"That enables us to reconstruct ancient molecular history and make smarter predictions about where the genes are in the genome and what they might do," Holmes says.
The trick is aligning the sequenced genomes from various species so that they can be combed for similarities and differences. Bioinformatics tools already exist to do this, Holmes explains, but the datasets now available only contain hundreds of genes. What kinds of tools are necessary when thousands, millions, or even billions of genes are available for comparison?
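To make "aligning" concrete, here is a minimal sketch of a classic textbook technique, Needleman-Wunsch global alignment, applied to two short sequences. It is not the method Holmes's group uses; the function name `align` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen only for illustration.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two sequences (Needleman-Wunsch); scoring values are illustrative."""
    rows, cols = len(a) + 1, len(b) + 1

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for placing a[i-1] opposite b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align the two characters
                score[i - 1][j] + gap,             # gap in b
                score[i][j - 1] + gap,             # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[-1][-1]


top, bottom, total = align("ACGT", "AGT")
print(top)     # ACGT
print(bottom)  # A-GT
print(total)   # 2
```

The dash marks a position where one sequence has a letter and the other does not. This table-filling approach takes time proportional to the product of the two sequence lengths, which hints at why comparing very large collections of sequences calls for different tools.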
Holmes and his colleagues are developing new tools built to scale up to these massive data sets. They draw freely from such seemingly disparate fields as statistical physics, machine learning, probability theory, and, yes, linguistics.
In the 1950s, Chomsky developed a method to mathematically analyze and describe the grammar of languages. The authors of computer programming languages found inspiration in Chomsky's approach, and it's also commonly used in natural language computer interfaces and translation tools. For example, a simple translation system for dialects of English would automatically substitute the American word "diaper" for the British term "nappy" or replace "or" in "color" with "our." A more complex system that can parse syntax would use a "tree transducer" to make even more advanced substitutions, such as changing "I have already eaten" to "I already ate."
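The simpler of the two systems, word-for-word substitution, can be sketched in a few lines. This is only an illustration of the idea: the lexicon contents and the names `BRITISH_TO_AMERICAN` and `translate` are hypothetical, and the sketch does not attempt the syntax-aware rewriting a tree transducer would perform.

```python
import re

# Hypothetical British-to-American lexicon; the entries are illustrative only.
BRITISH_TO_AMERICAN = {
    "nappy": "diaper",
    "colour": "color",
    "flavour": "flavor",
}


def translate(text, lexicon):
    """Replace each whole word found in the lexicon; leave everything else untouched."""

    def swap(found):
        word = found.group(0)
        replacement = lexicon.get(word.lower(), word)
        # Keep a leading capital letter if the original word had one.
        if word[0].isupper():
            replacement = replacement[0].upper() + replacement[1:]
        return replacement

    # Match runs of letters so punctuation and spacing pass through unchanged.
    return re.sub(r"[A-Za-z]+", swap, text)


print(translate("The nappy is a nice colour.", BRITISH_TO_AMERICAN))
# The diaper is a nice color.
```

A lookup table like this treats each word in isolation, which is why it cannot turn "I have already eaten" into "I already ate": that change depends on the structure of the whole sentence.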
According to Holmes, these kinds of formal rules can be combined in complex ways for bioinformatics as well. While languages like English are based on sequences of letters and punctuation or utterances, biology is built on sequences of nucleotides in DNA and RNA and of amino acids in proteins, the building blocks of all life.
"We're using versions of these grammatical models that can translate one thing into another to make good first approximations of how you can parse a genome into its various features" and relate genes to one another, Holmes says.
Through these translations, the researchers hope to "wind present-day sequences backwards in time to make inferences about our evolutionary past." In one research project, they're exploring a controversial theory in molecular biology suggesting that all life was originally based on RNA, and that DNA and proteins evolved later.
Holmes's tools may have biomedical applications as well, such as identifying parts of a virus that are evolving quickly or others that have been conserved all the way back to a common ancestor. A protein-coding gene that "has been sticking around for a long time might make a good target for a vaccine," Holmes explains.
"DNA is the language of life," he says. "It's a mega cliché, but the cliché hides deep mathematical truth."
Daryl Ian Holmes's BioWiki and home page
Ian Holmes, Graduate Group in Computational and Genomic Biology
Berkeley Drosophila Genome Project
Lab Notes is published online by the Marketing and Communications Office of the UC Berkeley College of Engineering. The Lab Notes mission is to illuminate groundbreaking research underway today at the College of Engineering that will dramatically change our lives tomorrow.
Media contact: Teresa Moore, Lab Notes editor, Director of Marketing and Communications
Writer, Researcher: David Pescovitz
Web Manager: Michele Foley
Subscribe or send comments to the Engineering Marketing and Communications Office: lab-notes@coe.berkeley.edu.
© 2006 UC Regents.
Updated 8/21/06.