Presented by: Andrew McMurry
Boston University Bioinformatics
Children’s Hospital Informatics Program
Harvard Medical School Center for BioMedical Informatics



Outline
- Incidental Findings and Disconnected Patient Cohorts
- Disease Association Studies Using SNPs
- How SNPs cause disease
- Computationally predicting the effect of SNPs within introns, exons, and regulatory regions
- The Future Is Now: SNPs, Personalized Medicine, and Translational Research

Incidental Findings and Disconnected Patient Cohorts
- IF the central dogma of biology is "DNA -> RNA -> Protein"
- THEN where is the patient data for association studies?
  - Very little patient data spans DNA, RNA, protein, and phenotype across a single cohort
  - "Robust" sample sizes are needed to avoid incidental findings due to multiple testing [1]

[1] Isaac Kohane, Daniel Masys, and Russ Altman. "The Incidentalome: A Threat to Genomic Medicine." JAMA 296(2), July 12, 2006.
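To make the multiple-testing concern concrete, here is a minimal sketch in Python. It assumes 500,000 independent tests (the SNP chip size quoted elsewhere in this talk) and a conventional significance level of 0.05. The Bonferroni correction is one common remedy and is brought in here for illustration; the slides do not name a specific correction method.

```python
def expected_false_positives(n_tests: int, alpha: float) -> float:
    """Expected number of tests that pass by chance alone when every null is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    n_snps = 500_000  # assumption: one test per SNP on a 500k chip
    print("Expected chance hits at alpha=0.05:", expected_false_positives(n_snps, 0.05))
    print("Bonferroni per-SNP cutoff:", bonferroni_threshold(n_snps))
```

With so many tests, an uncorrected threshold produces thousands of chance "hits", which is why small cohorts tend to yield incidental findings.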

Disease Association Studies Using SNPs
- DNA sequencing technologies are still very expensive
- Stunningly few patients
  - Minimal sequence coverage
- Could change in time with Solexa/454
- Even with Solexa/454, piecing together the results is a massive task (the maximum sequence read is often shorter than a single repeated gene)
- Rate-limiting step: adoption rate of DNA sequencing
- Use what is available in abundance: SNP chips
- Abundance of SNP chips in public repositories covering many diseases
  - Whole-genome coverage
  - 500k SNPs for $250

Disease Association Studies Using SNPs: DNA to RNA to Protein
- Associating DNA & RNA
  - GEO alone holds well over 100k gene expression arrays
  - What if we could correlate SNPs with their effect on gene expression?
- Associating DNA & gene product (protein)
  - Countless public protein databases
  - What if we could correlate SNPs with their effect on protein coding?
- Association studies involving multiple genomic measurements
  - Which existing studies and models (HMMs/Bayes nets) could be strengthened with evidence from SNP chips?
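As an illustration of correlating a SNP with gene expression, the sketch below computes a Pearson correlation between genotype and expression level. The allele-count coding (0, 1, or 2 copies of the minor allele) and the data values are assumptions made up for this example, and the function name `pearson` is hypothetical. A real analysis would also need the multiple-testing control discussed earlier.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up data: minor-allele count per patient and expression of a nearby gene.
genotypes = [0, 0, 1, 1, 2, 2]
expression = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]

if __name__ == "__main__":
    print(f"genotype/expression correlation: {pearson(genotypes, expression):.3f}")
```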

How SNPs cause disease
- Intron -> likely no effect
- Protein Coding
  - Synonymous -> same amino acid
  - Nonsynonymous (missense) -> different amino acid
  - Nonsense -> premature STOP
- Splicing Regulation -> incorrect final mRNA transcript
- Transcriptional Regulation -> differential gene expression
- Post-Translational -> protein phosphorylation
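The protein-coding categories above can be sketched with the standard genetic code. This is a minimal illustration, not the method used by any tool named in this talk. The function name `classify_coding_snp` is hypothetical, and it assumes the reference and alternate codons are already known and in frame.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon change as synonymous, nonsynonymous, or nonsense."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"      # same amino acid
    if alt_aa == "*":
        return "nonsense"        # premature STOP
    # Different amino acid. A lost stop codon also lands here in this simple sketch.
    return "nonsynonymous"


if __name__ == "__main__":
    for ref, alt in [("GAA", "GAG"), ("GAG", "GTG"), ("TGG", "TGA")]:
        print(ref, "->", alt, ":", classify_coding_snp(ref, alt))
```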

So how do we measure all these effects of SNPs?

F-SNP: integrated approach
1. Classify the SNP site using dbSNP
   - Intron
   - Coding region
   - Splice site
   - TF binding site
   - Post-translational site
2. Evaluate using the specialized algorithms/databases
   - Coding region (missense/nonsense mutations)
   - Splice site (intronic/exonic sites)
   - TF binding site (promoter/repressor/etc.)
   - Post-translational site (phospho/tyrosine/O-glycosylation)
3. "Majority vote" across algorithms
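The "majority vote" step can be sketched as follows. This is a plain majority vote written for illustration. The actual F-SNP decision procedure is described in the paper and may weight or combine tools differently. The function name, the label strings, and the example votes are assumptions, and `tool_c` is a hypothetical third predictor.

```python
from collections import Counter


def majority_vote(votes: dict) -> str:
    """Return the most common label among tool predictions, or 'tie' / 'no prediction'."""
    if not votes:
        return "no prediction"
    ranked = Counter(votes.values()).most_common()
    # If the two most common labels have equal counts, no label holds a majority.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "tie"
    return ranked[0][0]


if __name__ == "__main__":
    # Hypothetical votes for one coding-region SNP.
    example = {"PolyPhen": "benign", "SNPs3D": "deleterious", "tool_c": "benign"}
    print(majority_vote(example))
```

Returning an explicit "tie" rather than picking a label arbitrarily keeps ambiguous SNPs visible for manual review.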

F-SNP decision procedure for functional SNPs

F-SNP: User Interfaces & Data Download
- Public web site
- Federated query = the entire database cannot be downloaded
- Currently:
  - No SOAP (web service) support
  - No RSS support
  - No source code available
- However: the paper gives explicit instructions on how to reproduce the algorithm and construct the database using dbSNP, OMIM, etc.

“Large N Study” using F-SNP

Functional Category           # of Assessed SNPs   # of Functional SNPs
Protein Coding                           154,140                 66,899
Splicing Regulation                       73,051                  8,075
Transcriptional Regulation               453,710                 78,296
Post Translation                          64,736                  4,477
Total                                    559,322                115,356
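To read the counts as proportions, the short sketch below computes the fraction of assessed SNPs predicted functional in each category, using only the numbers on this slide. Note that the Total row is smaller than the sum of the category rows. One possible reason is that a SNP can be assessed in more than one category, but the slide does not say, so treat that as an assumption.

```python
# Counts copied from the "Large N Study" slide: (assessed SNPs, functional SNPs).
COUNTS = {
    "Protein Coding": (154_140, 66_899),
    "Splicing Regulation": (73_051, 8_075),
    "Transcriptional Regulation": (453_710, 78_296),
    "Post Translation": (64_736, 4_477),
    "Total": (559_322, 115_356),
}


def functional_fraction(assessed: int, functional: int) -> float:
    """Fraction of assessed SNPs that were predicted functional."""
    return functional / assessed


if __name__ == "__main__":
    for category, (assessed, functional) in COUNTS.items():
        print(f"{category:<28}{functional_fraction(assessed, functional):6.1%}")
```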

Evaluate Individual SNP (rs )

SNP summary and Functional Predictions

SNP Primary Information (rs )
- Locus
- Alleles
- Ancestral allele
- Validation (if any)
- Region
- Link to references

F-SNP: Functional Predictions

F-SNP Prediction Detail: PolyPhen = benign effect on protein coding

F-SNP Prediction Detail: SNPs3D = deleterious to protein coding

NCBI Gene Information
Product: breast cancer 1, early onset
Other names: BRCA1, BRCAI, BRCC1, IRIS, PSCP, RNF53
NCBI Entrez Gene Summary: This gene encodes a nuclear phosphoprotein that plays a role in maintaining genomic stability and acts as a tumor suppressor. (…) Mutations in this gene are responsible for approximately 40% of inherited breast cancers and more than 80% of inherited breast and ovarian cancers. Alternative splicing plays a role in modulating the subcellular localization and physiological function of this gene. Many alternatively spliced transcript variants have been described for this gene but only some have had their full-length natures identified. (…)

F-SNP functional prediction
- On Protein Coding -> 2 votes benign, 1 deleterious, 1 nonsynonymous
- On Splicing Regulation -> predicted functional impact (by majority vote)

Gene level view of BRCA1
- Query by gene name = “BRCA1”
- Returns list of SNPs in BRCA1
- Returns list of cancers associated with BRCA1

Gene level view of BRCA1
- Our SNP has functional impact
- Our SNP has neighboring functional SNPs

Disease Level View : Breast Cancer

Show all disease genes associated with breast cancer. Denote whether SNPs are present in those genes (within 5 kb upstream or downstream).
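The flanking-window rule can be sketched as a simple coordinate check. This assumes the 5k figure means 5,000 base pairs on either side of the gene and that positions are on the same chromosome and coordinate system. The function name and the coordinates in the example are hypothetical, not real BRCA1 coordinates.

```python
FLANK_BP = 5_000  # assumption: "5k up/downstream" means 5,000 base pairs


def snp_in_gene_window(snp_chrom: str, snp_pos: int,
                       gene_chrom: str, gene_start: int, gene_end: int,
                       flank: int = FLANK_BP) -> bool:
    """True if the SNP lies within the gene body or within `flank` bp of either end."""
    if snp_chrom != gene_chrom:
        return False
    return gene_start - flank <= snp_pos <= gene_end + flank


if __name__ == "__main__":
    # Hypothetical gene spanning positions 100,000 to 180,000 on chromosome 17.
    print(snp_in_gene_window("17", 97_500, "17", 100_000, 180_000))   # inside the upstream flank
    print(snp_in_gene_window("17", 190_000, "17", 100_000, 180_000))  # beyond the downstream flank
```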

Recap of Disease Level View

The Future Is Now: SNPs, Personalized Medicine, and Translational Research
- SNP profiling is becoming part of routine care [2]
- Increase in # of clinically annotated SNP chips -> increase in # of disease association studies using SNPs
- Increase in NIH focus on “translational research” that bridges routine care delivery with research efforts
- Genome-Wide Association Studies (GWAS) that actually get funded

[2] Kohane IS, Mandl KD, Taylor PL, Holm IA, Nigrin DJ, Kunkel LM. "Medicine. Reestablishing the researcher-patient compact." Science, Nov 16; 318(5853):1068.

F-SNP Summary
- Incidental Findings and Disconnected Patient Cohorts
  - The central dogma of biology is DNA -> RNA -> Protein, yet we lack a cohort that spans all measurements
  - Using a limited sample size will inevitably lead to incidental outcomes
- Disease Association Studies Using SNPs
  - Don’t wait for DNA sequencing to become widespread
  - SNPs are becoming an abundant resource and are not going to disappear
- How SNPs cause disease
  - Protein Coding
  - Splicing Regulation
  - Transcriptional Regulation
  - Post Translation
- Computationally predicting the effect of SNPs within introns, exons, and regulatory regions
  - Multitude of existing SNP analysis tools and resources
  - F-SNP provides a single web-based resource to mine SNP disease associations
  - Query and analysis by SNP, gene, and disease
- The role of SNPs in Personalized Medicine and Translational Research