Big Data in Genomics, Diagnostics, and Precision Medicine James Han
3 billion bases in Human genome decoded. 1990 to 2001
= + Cost of sequencing 1st human genome -> today $2.7B -> $1,500 15 years -> 1 week International team of scientists from 20+ institutions -> 1 person Sequencing technologies Bioinformatics - New applications - Fast and memory efficient algorithms + =
The Genome Project and sequencing are driving cancer research Increasing number of associations between the genome and cancer. New sequencing technology Human genome start Human genome complete
Typical sequencing runs from different technologies Bases of sequence (max) 1800 Gb 16 Gb 1G Read length (max) 150 bp 400 bp >20,000 bp Error rate 0.1% 10%
Tissue based sequencing diagnostics Tumor Sequence Therapy Identify tumor Tumor tissue Tumor DNA + Clinical Trials Medical literature
Personalized medicine in lung cancer Common EGFR mutations and implications for Erlotinib response 10-35% of non-small cell lung cancer patients have a mutation in EGFR Erlotinib Greater resistance to Erlotinib T790M Greater sensitivity to Erlotinib Exon 19 deletion G719 L858R T790M mutation infers with proper binding of the Erlotinib
Cancer mutations detected from millions of sequencings reads @SEQ_ID GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT Sequence manipulation, signal processing, statistical modeling Using computationally efficient algorithms Somatic SNV Small In/Del Structural Variation Normal
Tumor-specific aberrations in liquid biopsy
Source: nytimes
Clinical quality big data technologies Healthcare @SEQ_ID GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT This person has stage 3 Non-small cell lung cancer. Based on her genomic data, she should be treated with Erlotinib. Confidence: 90%
Thank You.