Big Data in Biology: A focus on genomics
Bioinformatics and Genomics
• Applications:
  ◦ Personalized cancer medicine
  ◦ Disease diagnosis
  ◦ Pathway analysis
  ◦ Biomarker discovery
An Interesting Point
• “One article estimated that the output from genomics may soon dwarf data heavyweights such as YouTube”
• “I don't know if a million genomes is the right number, but clearly we need more than we've got,” says Marc Williams, director of the Geisinger Genomic Medicine Institute.
Stephens, Z. D. et al. PLoS Biol. 13, e (2015)
Genomics in the Past
• DNA is built from four bases: A, C, G, and T.
• Exons (~1% of the genome): the parts of DNA that code for proteins.
• Early work focused on individual nucleotides in exons:
  ◦ ~13,000 single-nucleotide variants (SNVs).
  ◦ Roughly 2% of these are expected to alter the encoded protein.
• Unfortunately, research relied on cell cultures or animal models.
• As a result, many of these variant-disease associations were made with low levels of evidence.
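The two figures on this slide can be turned into a quick estimate, and a codon lookup shows concretely what it means for a single-base change to alter a protein. This is a minimal sketch: `CODON_TABLE`, `snv_effect`, and `PROTEIN_ALTERING` are hypothetical names, and the table covers only the few codons used in the example rather than the full genetic code.

```python
# Estimate from the slide's figures: ~13,000 SNVs, roughly 2% protein-altering.
SNV_COUNT = 13_000
PROTEIN_ALTERING = SNV_COUNT * 2 // 100  # integer arithmetic avoids float rounding

# Hypothetical, partial codon table (the real genetic code has 64 codons).
CODON_TABLE = {
    "GAA": "Glu", "GAG": "Glu",  # glutamate
    "GTA": "Val", "GTG": "Val",  # valine
    "AAA": "Lys", "AAG": "Lys",  # lysine
}


def snv_effect(codon: str, position: int, alt_base: str) -> str:
    """Classify a single-base substitution at position 0-2 of a codon."""
    ref_aa = CODON_TABLE[codon]
    mutated = codon[:position] + alt_base + codon[position + 1:]
    alt_aa = CODON_TABLE[mutated]  # KeyError if the codon is missing from this toy table
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid: protein sequence unchanged
    return f"missense ({ref_aa}->{alt_aa})"  # amino acid changes


print(PROTEIN_ALTERING)           # 260
print(snv_effect("GAA", 2, "G"))  # synonymous (GAG still codes for Glu)
print(snv_effect("GAG", 1, "T"))  # missense (Glu->Val)
```

The GAG to GTG change in the last call is the substitution behind sickle-cell hemoglobin, a classic case of one SNV changing a protein.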
Genomics Continued
• Structural variants: deletions, duplications, and translocations.
  ◦ Much harder to detect than single-nucleotide variants.
• Many regions of DNA do not code for proteins but can still regulate protein production; the function of many of these regions is still poorly understood.
• Capturing all such variation is desirable, but not the most practical goal in the short term.
• TL;DR: genomics is hard.
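One reason structural variants are hard to find is that no single base looks wrong; deletions and duplications often show up only as a change in how many sequencing reads land in a region. The sketch below illustrates that general read-depth idea. It is a toy under stated assumptions, not a real variant caller: the window depths are made up, `call_copy_number` is a hypothetical function, and it assumes most windows carry the normal two copies.

```python
from statistics import median


def call_copy_number(depths: list[float], ploidy: int = 2) -> list[str]:
    """Toy read-depth caller: label each window by its estimated copy number."""
    baseline = median(depths)  # assumption: the typical window has `ploidy` copies
    calls = []
    for d in depths:
        copies = round(ploidy * d / baseline)  # scale the depth ratio to a copy number
        if copies < ploidy:
            calls.append(f"deletion (CN={copies})")
        elif copies > ploidy:
            calls.append(f"duplication (CN={copies})")
        else:
            calls.append("normal")
    return calls


# Hypothetical mean read depth in consecutive genomic windows (made-up numbers).
depths = [30, 31, 29, 15, 14, 30, 46, 45, 30]
for window, call in enumerate(call_copy_number(depths)):
    print(window, call)
```

A balanced translocation moves DNA without changing its copy number, so read depth alone cannot see it; real pipelines have to combine several kinds of evidence, which is part of why these variants remain difficult to detect.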
Applications
• Iceland's deCODE project: medical records and genome data from 150,000 people.
  ◦ Led to the discovery of genetic risk factors for:
    ◦ Breast cancer
    ◦ Alzheimer’s disease
  ◦ Also found 10,000 people missing both copies of at least one gene, across 1,500 different genes.
• Drug responsiveness: ADHD medication works for only one in ten preschoolers, cancer drugs are effective for 25% of patients, and antidepressants work for 6 in 10 patients.
• Personalized medicine
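Finding people who are missing both copies of a gene amounts to searching for individuals who carry two loss-of-function alleles in the same gene. A minimal sketch, assuming genotypes have already been reduced to a count of loss-of-function alleles (0, 1, or 2) per person per gene; the data, gene names, and `find_knockouts` function are all hypothetical.

```python
# Hypothetical data: number of loss-of-function (LoF) alleles per person per gene.
GENOTYPES = {
    "person_A": {"GENE1": 0, "GENE2": 2, "GENE3": 1},
    "person_B": {"GENE1": 1, "GENE2": 0, "GENE3": 0},
    "person_C": {"GENE1": 2, "GENE2": 2, "GENE3": 0},
}


def find_knockouts(genotypes: dict[str, dict[str, int]]) -> tuple[set[str], set[str]]:
    """Return (people with any complete knockout, genes knocked out in someone)."""
    people: set[str] = set()
    genes: set[str] = set()
    for person, per_gene in genotypes.items():
        for gene, lof_alleles in per_gene.items():
            if lof_alleles == 2:  # both copies lost: a complete knockout of this gene
                people.add(person)
                genes.add(gene)
    return people, genes


people, genes = find_knockouts(GENOTYPES)
print(sorted(people))  # ['person_A', 'person_C']
print(sorted(genes))   # ['GENE1', 'GENE2']
```

Here person_B carries a single loss-of-function allele in GENE1 and so is a carrier, not a knockout. Run over a large cohort, the same two counts (people affected, distinct genes affected) correspond to the kind of figures reported for deCODE.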
Issues with Bioinformatics
• The Icelandic work was helped by a homogeneous population.
• The 1000 Genomes Project captured some diversity, but mainly sampled Caucasian populations.
• “Because they come from the genetic mother ship, so to speak, people of African ancestry carry a lot more genetic variants than non-Africans… Variants that seem unusual in Caucasians might be common in Africans, and may not actually cause disease,” says Isaac Kohane, a bioinformatician at Harvard Medical School in Boston, Massachusetts.
• Reference genome: the baseline sequence that many researchers compare against is flawed.
  ◦ First iteration: built from random donors of unidentified ethnicity.
  ◦ Current versions incorporate more human genomic diversity.
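Kohane's point can be made concrete with allele frequencies: whether a variant looks “rare” depends on which populations the reference data cover. A minimal sketch, assuming a 1% cutoff for “rare” and made-up frequencies for a single variant; `is_rare`, `VARIANT_FREQS`, and the numbers are hypothetical.

```python
# Made-up allele frequencies for one variant in three populations.
VARIANT_FREQS = {"European": 0.001, "African": 0.12, "East Asian": 0.0}

RARE_THRESHOLD = 0.01  # assumption: "rare" means allele frequency below 1%


def is_rare(freqs: dict[str, float], populations: list[str]) -> bool:
    """A variant counts as rare only if it is rare in every population checked."""
    return max(freqs[p] for p in populations) < RARE_THRESHOLD


print(is_rare(VARIANT_FREQS, ["European"]))         # True: looks unusual
print(is_rare(VARIANT_FREQS, list(VARIANT_FREQS)))  # False: common in African samples
```

With reference data drawn mainly from European-ancestry samples, this variant would be flagged as unusual and possibly disease-causing; adding African samples shows it is common and so less likely to be pathogenic, which is the misclassification the quote warns about.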
Solutions
• Relationships between doctors and researchers, to build models linking diseases and genetics.
• Data volume: sequencing genomes produces up to 40 petabytes (PB) of data per year.
• Computational power: the more variables and the more people you add, the harder the analysis becomes.
• Silicon Valley lure: bioinformatics needs people who can harness massively parallel computation, and the tech industry competes for the same skills.
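The claim that adding variables and people makes analysis harder can be quantified with simple counting. The sketch below borrows the ~13,000 variants and 150,000 people mentioned earlier purely as illustrative sizes; it counts the genotype entries to store and the number of variant pairs to test if every pair were checked for an interaction. The function names are hypothetical.

```python
def genotype_cells(n_people: int, n_variants: int) -> int:
    """Entries in a people-by-variants genotype matrix (linear in each dimension)."""
    return n_people * n_variants


def pairwise_tests(n_variants: int) -> int:
    """Distinct variant pairs, n choose 2 (grows quadratically with n)."""
    return n_variants * (n_variants - 1) // 2


print(genotype_cells(150_000, 13_000))  # 1950000000
print(pairwise_tests(13_000))           # 84493500
# Doubling the number of variants roughly quadruples the number of pairs.
print(pairwise_tests(26_000) / pairwise_tests(13_000))
```

Storage grows linearly with people and variants, but an exhaustive pairwise search grows with the square of the number of variants, which is one reason this kind of analysis calls for massively parallel computation.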
Conclusion
• Two main issues:
  ◦ The inherent difficulty of genomics makes bioinformatics hard.
  ◦ Computational power and the need for collaboration.
• Yet solving these problems could lead to major improvements in medicine.