A whirlwind tour of the human genome - Genome engineering: It never ends well

Putting the science in fiction - Dan Koboldt, Chuck Wendig 2018

By Dan Koboldt

The human genome is present in virtually every cell of our bodies and contains the complete set of instructions to build a human being. The first effort to read that instruction book, the Human Genome Project, published its draft sequence in 2001. Even then, it was clear that our genome was a large, complex, and puzzling thing. Seventeen years later, we’re still working to unravel all of its mysteries. Here’s a whirlwind tour of what we know so far.

The big picture

The human genome comprises 3.2 billion base pairs, spread across twenty-two autosomes and two sex chromosomes. The autosomes are generally ordered by size; chromosome 1 is the largest (about 250 million base pairs), while chromosomes 21 and 22 are the smallest (48 and 51 million base pairs, respectively). Amusingly, the two sex chromosomes are dramatically different in size: Chromosome X is 155 million base pairs (about the size of chromosome 7), but chromosome Y is just 59 million.

There’s also a tiny chromosome in mitochondria, the energy-producing organelles found in human cells. The mitochondrial genome is minuscule (16,500 base pairs), but a single cell might have as many as 2,000 copies of it. Unlike autosomes and sex chromosomes, the mitochondrial genome is inherited only from the mother. Between that and the multiple copies, it can give rise to some odd patterns of genetic inheritance. If one mitochondrion acquires a disease-causing mutation, it usually doesn’t cause symptoms because there are hundreds or thousands of other mitochondria in the cell. Over time, however, more mitochondria pick up mutations. The cell continues to function until the proportion of mutated mitochondria reaches a certain threshold, which can take many years. As a result, many of the diseases caused by mitochondrial mutations (such as Leber optic atrophy) are inherited but don’t cause symptoms until well into adulthood.
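The threshold idea can be sketched as a toy calculation. This is only an illustration, not a biological model: it assumes mutated copies accumulate at a constant yearly rate, and the function name, the rate, and the threshold percentage are all hypothetical. Only the figure of 2,000 copies per cell comes from the text above.

```python
import math


def years_until_threshold(copies, mutated_now, newly_mutated_per_year, threshold_percent):
    """Years until mutated copies make up `threshold_percent` of all copies.

    Toy assumption: a fixed number of additional copies become mutated
    each year. Real cells are far messier than this.
    """
    if not 0 < threshold_percent <= 100:
        raise ValueError("threshold_percent must be in (0, 100]")
    if newly_mutated_per_year <= 0:
        raise ValueError("newly_mutated_per_year must be positive")

    # Smallest whole number of mutated copies that meets the threshold.
    needed = math.ceil(copies * threshold_percent / 100)
    if mutated_now >= needed:
        return 0  # already at or past the threshold
    return math.ceil((needed - mutated_now) / newly_mutated_per_year)


if __name__ == "__main__":
    # 2,000 copies (from the text); the other three numbers are made up.
    print(years_until_threshold(2000, 1, 20, 60))
```

With these invented inputs, a single mutated copy is harmless for decades, which matches the late onset described above.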

Chromosome structure

Most of us picture chromosomes as the X-shaped things we learned about when studying mitosis in high school biology. That’s how they look under a light microscope during metaphase, when two sister chromatids (the original and its shiny new copy) are joined together at the centromere, a region of highly repetitive DNA sequence where proteins bind to pull sister chromatids apart.

Because the DNA replication machinery can’t copy all the way to the end of the molecule, chromosomes also have special structures at each end called telomeres. These are stretches of a six-letter sequence (TTAGGG in humans) repeated over and over again. They’re essentially disposable bases, and they have to be, because the chromosome’s DNA gets a little shorter every time a cell divides. The telomere-shortening process is so uniform that, by measuring telomere length, it’s possible to estimate the number of times a cell has divided and, from that, the approximate age of the person.
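As a rough illustration of the counting idea, the sketch below tallies how many intact TTAGGG repeats sit at the end of a sequence, then turns a length difference into a division estimate. The function names are hypothetical, and the starting length and the number of bases lost per division are not given in this chapter, so they are left as values the caller must supply.

```python
TELOMERE_REPEAT = "TTAGGG"  # the human telomere repeat named in the text


def count_terminal_repeats(sequence, repeat=TELOMERE_REPEAT):
    """Count back-to-back copies of `repeat` at the very end of `sequence`."""
    seq = sequence.upper()
    unit = len(repeat)
    count = 0
    end = len(seq)
    # Walk backward one repeat unit at a time until the pattern breaks.
    while end >= unit and seq[end - unit:end] == repeat:
        count += 1
        end -= unit
    return count


def estimate_divisions(initial_bp, current_bp, bp_lost_per_division):
    """Whole divisions implied by the shortening, given caller-supplied figures."""
    if bp_lost_per_division <= 0:
        raise ValueError("bp_lost_per_division must be positive")
    return max(0, initial_bp - current_bp) // bp_lost_per_division


if __name__ == "__main__":
    chromosome_end = "ACGTAC" + TELOMERE_REPEAT * 5  # made-up sequence
    repeats = count_terminal_repeats(chromosome_end)
    print(repeats, "repeats,", repeats * len(TELOMERE_REPEAT), "bases")
```

Real telomere measurements come from laboratory assays rather than a clean string, so this only shows the arithmetic behind the estimate.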

Genes and functional elements

There are about twenty thousand known genes in our genome that encode proteins (i.e., make messenger RNA that’s translated into protein). The fraction of bases that eventually encode protein sequences is exceedingly small (about 1.5 percent). The rest of the genome, the noncoding genome, nevertheless contains many other types of elements that can regulate things happening in a cell. Many of these elements (promoters, untranslated regions, splice sites, exons, and introns) are structures that help govern transcription (making messenger RNA) and translation (making proteins). We’ve discovered, however, that there are many other kinds of noncoding elements that help regulate when and how proteins are made:

· TRANSCRIPTION FACTOR BINDING SITES are short, specific base sequences that are recognized and bound by the proteins that drive transcription. For example, the sequence TATAAA is usually found in the gene promoter (upstream of the gene) and likely helps position RNA polymerase II—the enzyme that makes messenger RNA from DNA—to start in the right place.

· ENHANCERS are big stretches of noncoding DNA that help drive the activity of certain genes. These regions are believed to have binding sites for transcription factors and other proteins. Often, they are near the genes whose activity they enhance, but they can also be located thousands of base pairs away.

· REPRESSORS are elements that do the opposite of enhancers: They prevent genes from being transcribed. Usually this is accomplished by recruiting proteins that either bind to DNA or chemically modify it so that it’s inaccessible to the transcription machinery. For example, since females are born with two copies of the X chromosome, one of them is repressed (inactivated) in each cell. This ensures that the cell doesn’t get a “double dose” of the genes on the X chromosome.

· NONCODING RNA GENES are transcribed into various kinds of functional RNA, such as transfer RNA (tRNA, which matches amino acids to specific codons) and ribosomal RNA (rRNA, which aids in translation). There are also about eight hundred genes that encode micro-RNAs, which are very short sequences (eighteen to twenty-four nucleotides long) that can block messenger RNA from being translated into proteins. They do this by binding complementary sequences in the untranslated region of the target messenger RNA.
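Two of the elements in this list boil down to sequence matching, which a few lines of code can sketch. The example below looks for the TATAAA motif in a DNA string and for the site a micro-RNA would pair with in an untranslated region. It is a simplification under stated assumptions: the function names and all sequences are made up for illustration, and it demands a perfect match, whereas real micro-RNA pairing is usually only partly complementary.

```python
def find_motif(sequence, motif="TATAAA"):
    """Return every 0-based start position of `motif` in `sequence`."""
    seq, pattern = sequence.upper(), motif.upper()
    positions = []
    start = seq.find(pattern)
    while start != -1:
        positions.append(start)
        # Advance by one base so overlapping matches are also reported.
        start = seq.find(pattern, start + 1)
    return positions


# Base-pairing rules for RNA: A pairs with U, C pairs with G.
_RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement_rna(rna):
    """Sequence that would pair with `rna`, read in the opposite direction."""
    return rna.upper().translate(_RNA_COMPLEMENT)[::-1]


def find_mirna_sites(mirna, utr):
    """Positions in `utr` that are perfectly complementary to `mirna`."""
    return find_motif(utr, reverse_complement_rna(mirna))


if __name__ == "__main__":
    promoter = "GGCCTATAAAGGC"  # made-up promoter fragment
    print(find_motif(promoter))  # → [4]

    mirna = "UGAGGUAGUAGGUUGUAU"  # made-up 18-nucleotide micro-RNA
    utr = "AAAA" + reverse_complement_rna(mirna) + "CCCC"
    print(find_mirna_sites(mirna, utr))  # → [4]
```

Exact string matching is the simplest possible choice here; tools built for real genomes score imperfect matches instead.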

If you counted the bases in all of the genes and other functional elements I’ve described so far, you’d come well short of 3.2 billion. Even if we understood all of these elements perfectly well (which we don’t), that raises the question: What the heck does the rest of the genome do?

Honestly, we don’t know. I think that a lot of it will probably turn out to have no function whatsoever. Other parts might have a function that we simply don’t know about.
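To put rough numbers on that shortfall, here is a back-of-the-envelope tally that uses only figures quoted in this chapter. It is a sketch, not a census: the function name is hypothetical, the micro-RNA line counts only the short mature sequences at their maximum quoted length, and the other regulatory elements are left out because the chapter gives no sizes for them.

```python
GENOME_BP = 3_200_000_000    # total base pairs, from the text
CODING_PER_THOUSAND = 15     # 1.5 percent, kept as an integer to avoid rounding
MIRNA_GENES = 800            # approximate count, from the text
MIRNA_MAX_LENGTH = 24        # upper end of the quoted 18 to 24 nucleotides


def tally_known_bases():
    """Return base counts for the elements the chapter puts numbers on."""
    coding = GENOME_BP * CODING_PER_THOUSAND // 1000
    mirna = MIRNA_GENES * MIRNA_MAX_LENGTH
    return {
        "coding": coding,
        "micro_rna": mirna,
        "unaccounted": GENOME_BP - coding - mirna,
    }


if __name__ == "__main__":
    for label, bases in tally_known_bases().items():
        print(f"{label:>12}: {bases:,} bp")
```

Even with generous rounding, the counted elements cover only a sliver of the 3.2 billion bases.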

The genome and genetic diseases

Get ready, because I’m about to make this relevant to speculative fiction.

When people hear the phrase “genetic disease,” the examples that often come to mind are severe inherited disorders, like sickle cell disease, cystic fibrosis, and Huntington’s disease. Most of these are caused by very rare mutations in the coding region of a gene. This makes sense, because a mutation that disrupts or alters protein sequence is understandably capable of having a severe, immediate effect. Yet the vast majority of human traits that are “heritable” (i.e., have a genetic factor) are not so simply explained.

Many researchers, myself included, think that the genetic variation behind these traits lies outside of the known coding regions. Think about it: A subtle change to a regulatory element could easily have an effect on a human being. For this mental exercise, let’s use the low-density lipoprotein receptor (LDLR) gene. It makes a protein that transports LDL (the carrier of most cholesterol) out of the blood. Severe mutations in the coding region of LDLR cause autosomal dominant hypercholesterolemia, a severe lipid disease. Instead, picture a subtle change in a regulatory element that influences the LDLR gene activity. It might not cause a severe, obvious effect. Over the seventy-plus years of the average human lifespan, however, even a very minor change can have long-term ramifications.

Now, picture the same scenario, but change “transports LDL” to “prevents magic use” or “protects against becoming a zombie.” There’s your SF/F story.