Thousands of never-before-seen genetic variants in the human genome have been uncovered using a new genome sequencing technology. The findings, published this week in the journal Nature, close many important gaps in human genome mapping that have long resisted sequencing.
The technique, called single-molecule, real-time DNA sequencing (SMRT), may now make it possible for researchers to identify potential genetic mutations behind many conditions whose genetic causes have long eluded scientists, said Dr. Evan Eichler, professor of genome sciences at the University of Washington, who led the team that conducted the study.
“We now have access to a whole new realm of genetic variation that was opaque to us before,” Dr. Eichler said.
To date, scientists have been able to identify the genetic causes of only about half of inherited conditions. This puzzle has been called the “missing heritability problem.” One reason for this problem may be that standard genome sequencing technologies cannot map many parts of the genome precisely. These approaches map genomes by aligning hundreds of millions of small, overlapping snippets of DNA, typically about 100 bases long, and then analyzing their DNA sequences to construct a map of the genome.
This approach has successfully pinpointed millions of small variations in the human genome. These variations arise from the substitution of a single nucleotide base and are called single-nucleotide polymorphisms, or SNPs. The standard approach also made it possible to identify very large variations, typically involving segments of DNA that are 5,000 bases long or longer. But for technical reasons, scientists had previously not been able to reliably detect variations whose lengths fall in between, those ranging from about 50 to 5,000 bases.
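The size classes described above can be sketched in a few lines of Python. This is only an illustration of the ranges quoted in this article, not the classification used in the study: the function name, the labels, and the exact cutoffs (the article says "about 50" and "5,000 bases long or longer") are assumptions, and the article does not discuss variants of 2 to 49 bases at all.

```python
def classify_variant(length_bases: int) -> str:
    """Bucket a variant by length, using the approximate ranges in the article.

    The labels and exact cutoffs are illustrative assumptions.
    """
    if length_bases < 1:
        raise ValueError("length must be a positive number of bases")
    if length_bases == 1:
        # A single-base substitution, i.e. a SNP.
        return "single-base"
    if length_bases < 50:
        # The article does not discuss this range.
        return "small (not covered in the article)"
    if length_bases < 5000:
        # The range that short-read methods could not reliably detect.
        return "intermediate"
    # Very large variants, detectable with the standard approach.
    return "large"


for length in (1, 300, 12000):
    print(length, "->", classify_variant(length))
```

The intermediate bucket is the one the long-read approach is meant to fill.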
The SMRT technology used in the new study makes it possible to sequence and read DNA segments longer than 5,000 bases, far longer than those that standard sequencing technologies can read.
This “long-read” technique, developed by Pacific Biosciences of California, Inc. of Menlo Park, Calif., allowed the researchers to create a much higher-resolution map of structural variation in the genome than had previously been achieved. Dr. Mark Chaisson, a postdoctoral fellow in Dr. Eichler’s lab and lead author of the study, developed the method that made it possible to detect structural variants at base-pair resolution using these data.
To simplify their analysis, the researchers used the genome from a hydatidiform mole, an abnormal growth caused when a sperm fertilizes an egg that lacks the mother’s DNA. Because the mole genome contains only one copy of each gene, instead of the two copies that exist in a normal cell, the search for genetic variation is simpler.
More than 50% of euchromatic gaps closed or narrowed
Using the new approach on the hydatidiform mole genome, the researchers were able to identify and sequence 26,079 segments that differed from a standard human reference genome used in genome research. Most of these variants, about 22,000, have never been reported before, Dr. Eichler said. “These findings suggest that there is a lot of variation we are missing,” he said.
The technique also allowed Dr. Eichler and his colleagues to map some of the more than 160 segments of the genome, called euchromatic gaps, that have defied previous sequencing attempts. Their efforts closed 50 of the gaps and narrowed 40 others.
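As a quick check of the "more than 50%" figure, the arithmetic can be written out as below. This is a minimal sketch: the variable and function names are illustrative, and because the article says there are "more than 160" gaps, the total of 160 is a lower bound, which makes the computed share an upper bound.

```python
TOTAL_GAPS = 160  # the article says "more than 160", so this is a lower bound
CLOSED = 50
NARROWED = 40


def share_resolved(closed: int, narrowed: int, total: int) -> float:
    """Fraction of gaps that were either closed or narrowed."""
    return (closed + narrowed) / total


share = share_resolved(CLOSED, NARROWED, TOTAL_GAPS)
print(f"Closed or narrowed: {CLOSED + NARROWED} of {TOTAL_GAPS} gaps ({share:.0%})")
```

With exactly 160 gaps, 90 of them, or about 56%, were closed or narrowed. The figure stays above 50% as long as the true total is below 180.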
The gaps include some important sequences, Dr. Eichler said, including parts of genes and regulatory elements that help control gene expression. Some of the DNA segments within the gaps show signatures that are known to be toxic to E. coli, the bacterium that is commonly used in some genome sequencing processes.
“It is likely that if a sequence of this DNA were put into an E. coli, the bacteria would delete the DNA,” Dr. Eichler said. This may explain why it could not be sequenced using standard approaches. He added that the gaps also carry complex sequences that are not well reproduced by standard sequencing technologies. “The sequences vary extensively between people and are likely hotspots of genetic instability,” he explained.
For now, SMRT technology will remain a research tool because of its high cost, about $100,000 per genome. Dr. Eichler predicted, “In five years there might be a long-read sequence technology that will allow clinical laboratories to sequence a patient’s chromosomes from tip to tip and say, ‘Yes, you have about three to four million SNPs and insertions deletions but you also have approximately 30,000-40,000 structural variants. Of these, a few structural variants and a few SNPs are the reason why you’re susceptible to this disease.’ Knowing all the variation is going to be a game changer.”