Compared to mustard plants, humans have yet to ‘ketchup’


The early results from the Human Genome Project suggested that humans are at first glance genetically less complex than a mustard plant. In fact, our genome contains about 20,000 to 25,000 sequences suggestive of “genes” encoding proteins, while the mustard plant contains about 27,000. That shouldn't be, and it initially didn't make sense to many of the scientists working on the human genome. Obviously, humans are far more complicated than a mustard plant.

This paradox harkens back to one of the teachings of yesterday's high school and college biology and genetics courses. “About 98%-99% of the human genome appears to be junk, left over from evolutionary dead ends.” But any true student of biology could spot a problem here. Evolution tends to trim baggage and inefficiencies. Why would we use only 1%-2% of our genetic material after a few billion years of trimming the excess?

In 2007, genome scientists completed the first phase of a massive collaborative project called ENCODE (Encyclopedia of DNA Elements), supported by the National Human Genome Research Institute (NHGRI) of the National Institutes of Health. ENCODE, initiated in 2003, was designed to enhance our understanding of the functional anatomy of the raw DNA sequence that the Human Genome Project revealed. (Go online to learn more about ENCODE, and for an interesting summary of its results, see the June 2007 issue of Genome Research.)

ENCODE's first phase looked in detail at about 1% of the human genome's entire structure. All that junk DNA is—no surprise—not junk after all. The human genome is a complex ecosystem with a wide variety of different types of DNA elements, not simply genes encoding proteins. Much of it appears to play a role in determining what genes are expressed, in what order and at what levels. Substantial portions of our DNA encode information for the production of small RNA molecules that never become translated to proteins, but rather fold up on themselves and act in concert with peptides as regulators of gene expression. Some of these molecules likely act as RNA-based enzymes.


Very long stretches of DNA that don't seem to code for any proteins, and that previous models would therefore not have predicted to be highly conserved by evolution, turn out to be exceptionally highly conserved. In contrast, regions that previously would have been predicted to be conserved turn out not to be under as much evolutionary constraint as had been thought. Intriguingly, we don't really know the function of these long, highly conserved sequences.

How about the lower number of genes in humans compared to mustard plants? The emerging model is that human genes don't subscribe to the old teaching of “one gene, one protein,” but are much more akin to Russian nesting dolls. Genes are nested inside of genes, using alternate promoters, start sites, splicing sites and stop sites. DNA is double-stranded, and there is evidence that overlapping genes occur that run on opposite strands, encoding proteins with different functions. RNA molecules are translated to form proteins, and seemingly unrelated RNA molecules can be assembled (trans-spliced) to form templates for entirely new proteins.
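To get a feel for how one gene can yield many products, here is a toy sketch in Python. It is not drawn from ENCODE data: the gene, its exon names, and the rule that each optional exon is simply kept or skipped are all assumptions made for illustration, and real splicing is far more constrained.

```python
from itertools import product


def enumerate_isoforms(exons):
    """Yield every exon combination for a toy gene.

    `exons` is a list of (name, optional) pairs in genomic order.
    Exons with optional=False appear in every transcript; each
    optional exon is independently kept or skipped (an assumption
    of this sketch, not a rule of biology).
    """
    choices = [(True, False) if optional else (True,) for _, optional in exons]
    for keep_flags in product(*choices):
        yield tuple(name for (name, _), keep in zip(exons, keep_flags) if keep)


# Hypothetical gene: five exons, three of which may be skipped.
toy_gene = [("E1", False), ("E2", True), ("E3", True), ("E4", True), ("E5", False)]
isoforms = list(enumerate_isoforms(toy_gene))

for isoform in isoforms:
    print("-".join(isoform))
print(len(isoforms), "distinct transcripts from one gene")
```

With three optional exons, this single hypothetical gene gives 2 x 2 x 2 = 8 distinct transcripts. Add alternate promoters, start sites and stop sites, and the number of possible products per gene grows faster still, which is one way a smaller gene count can support a more complicated organism.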

To further complicate this picture, the DNA contains elements known as pseudogenes that, save for minor variations, look exactly like functional genes. ENCODE and related results suggest that they are numerous, that they may affect transcription of neighboring genes, and that they may not be as transcriptionally silent as once believed.

The first phase of ENCODE has demonstrated how little we really know about the human genome and its functions, and has provided tantalizing glimpses into novel strategies for thwarting human disease. To this end, NHGRI announced more than $80 million in grants to further our understanding of the functional elements in the 99% of the genome not covered in the pilot phase of ENCODE.

Unraveling the Gordian knot of the regulation of DNA function has clear implications for biomedicine. Many drugs in current use either directly or indirectly affect DNA transcription or translation, a classic example being steroids. A better understanding of the mechanism of action of drugs like steroids could allow the development of far more selective drugs that can target up-regulation or down-regulation of various genes important to human disease.

It also seems likely that at least some of the hereditary burden of disease will stem from these newly described regulatory regions that affect how and when genes are expressed. In addition, these newly discovered regulatory mechanisms and pathways might be co-opted to yield therapeutics. Proteins and peptides have a long history of being used therapeutically, so why not small RNA molecules?

In the late ’90s, Craig Mello, PhD, and Andrew Fire, PhD, described a mechanism by which naturally occurring double-stranded RNA molecules can modulate gene function. Now, RNA-based therapeutics involving small interfering RNA are being used in a handful of human clinical trials in the U.S. for chronic myelogenous leukemia and wet age-related macular degeneration (AMD). For wet AMD, preliminary data suggest a great deal of promise.

Given what researchers are learning from ENCODE suggesting the pervasiveness of small RNA molecules in the day-to-day regulation of our genome, these trials may be the tip of an iceberg.