Genetic Big Data: What It Means

Researchers finished the first draft of the human genome in 2000. Although the cost of sequencing has since fallen far faster than Moore’s Law would predict, we have yet to turn all that new information into something truly useful.

In a wide-ranging talk on his work, from transcribing the first complete human genome to building synthetic life forms, genomic pioneer Craig Venter confessed he was disappointed that genomics has taken as long as it has to scale up.

“We just got to the starting line,” Venter said, speaking at Singularity University’s Exponential Medicine conference. “Hopefully it won’t take as long to get through it as it took to get started.”

What’s changed? Earlier this year, the genomic sequencing company Illumina announced a new sequencing system that can produce 18,000 high-quality human genomes per year at $1,000 per genome, a mark dreamed of for over a decade.

Venter’s new venture, Human Longevity, Inc., purchased two of Illumina’s new sequencers with the aim of ramping up to some 40,000 genomes a year. When asked what’s coming in the next five years, Venter quipped that predicting the future is hard unless you’re an investor, but he thinks a big acceleration is in the cards.

“[We’re going to go] from zero to a thousand miles an hour very quickly.”

Whereas we’ve sequenced around 225,000 genomes worldwide to date, Venter estimates we’ll have sequenced something like 20 times that total in only a third of the time — or roughly five million complete human genomes by 2020.
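As a quick check on that projection, here is a minimal sketch in Python using only the two figures quoted above. The variable names are illustrative. The product comes out slightly under the "roughly five million" in Venter's estimate, which is consistent with the rounded "something like 20 times".

```python
# Figures quoted in the article; variable names are illustrative only.
genomes_to_date = 225_000   # genomes sequenced worldwide so far
multiplier = 20             # "something like 20 times that total"

projected = genomes_to_date * multiplier
print(f"{projected:,}")     # prints 4,500,000
```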

In addition to gathering genomic data from a wide population of participants, Human Longevity, Inc. will also set up “health hubs” around the world to gather a vast hoard of physiological information about each genome they sequence.

This data is also known as phenotype — the biological expression of genes in the body — and plays out in physical traits like blue eyes or genetic disorders like Huntington’s disease. By matching phenotype to genotype and comparing them over large populations, researchers hope to decipher which genes or groups of genes are responsible for which biological traits.
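To make the idea of matching phenotype to genotype concrete, here is a toy sketch in Python. It is not Human Longevity's method. The records, the single made-up variant, and the function name `trait_rate_by_genotype` are all hypothetical, and real association studies involve millions of variants and careful statistical controls. The sketch only shows the basic comparison: group people by genotype and see how often a trait appears in each group.

```python
from collections import defaultdict

# Hypothetical toy records: (genotype at one made-up variant, has_trait).
records = [
    ("AA", False), ("AA", False), ("AG", False), ("AG", True),
    ("GG", True), ("GG", True), ("AA", False), ("GG", True),
]

def trait_rate_by_genotype(records):
    """Return the fraction of people with the trait for each genotype."""
    counts = defaultdict(lambda: [0, 0])  # genotype -> [with trait, total]
    for genotype, has_trait in records:
        counts[genotype][0] += int(has_trait)
        counts[genotype][1] += 1
    return {g: with_trait / total for g, (with_trait, total) in counts.items()}

rates = trait_rate_by_genotype(records)
for genotype in sorted(rates):
    print(genotype, rates[genotype])
```

In this invented sample the trait appears more often as the number of G copies rises. That is the kind of pattern researchers would then test for statistical significance across a much larger population.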

Venter said his firm aims to have a million integrated health records by 2020.

What they’ll do with all this information is another question entirely. Venter described Human Longevity, Inc. as chiefly a data analysis group. What does that mean? This is big data in the truest sense of the term, and like all big data ventures, it’s meaningless without effective methodologies and tools of analysis.

Venter said his genome has been online for 15 years, and we know about as much about it now as we did when it was first sequenced. More data is better, he said, but data isn’t the goal — the aim is to take data and generate knowledge with it.

Some are skeptical this can be done, according to Venter, due to the complexity of the dataset. But he said that some of that complexity can be compressed — noting how the complexity of a raw image is significantly reduced upon compression.

In an earlier talk, Jeremy Howard, founder and CEO of Enlitic and previously chief scientist at Kaggle, said machine learning algorithms are showing themselves capable of handling big data to make connections no human could see. Howard said machine learning algorithms trained on lung tumor MRIs were able to discover new diagnostic features. And software proved more accurate than humans at predicting five-year survival probabilities based on breast cancer biopsies.

These and other artificial intelligence and computational approaches may likewise, in the future, take the enormous sets of data generated by large genomic studies like Venter’s and make knowledge from them.

Asked if he foresees a time when science can create a superhuman, Venter shot back, “If you can define one, we can create one.” Pushed (and supplied with a slightly more specific definition), he said eventually we could perhaps, but we should be careful making wholesale genetic changes to the genome.

Let’s say, for example, that we find the genes for manic depression. Should we knock them out? Maybe, but as Venter noted, manic depressives account for many major societal advances. He compared genetics to a jet airliner: you can knock out one engine and still fly the thing, but if you knock out both it’s all over.

There are other ways to use the information without rewriting the genetic code, however: selection via preimplantation genetics, for example. And of course, as he noted earlier in his talk, the data can be used “to predict anything that can be predicted.”

Venter said Human Longevity, Inc. already has 100 employees and is sequencing on the order of a few thousand genomes a month, and it’ll continue to ramp up the effort in the coming years.

“This next era is going to be really exciting.”