Integrative Genomics Data Concepts Analysis and Functional Explanation

Slides:



Advertisements
Similar presentations
IB Biology Definitions 2009 – 2013/ Define diffusion. Define diffusion.
Advertisements

Martin John Bishop UK HGMP Resource Centre Hinxton Cambridge CB10 1 SB
applications of genome sequencing projects
An Introduction to the application of Molecular Markers
Genetic research designs in the real world Vishwajit L Nimgaonkar MD, PhD University of Pittsburgh
Module 12 Human DNA Fingerprinting and Population Genetics p 2 + 2pq + q 2 = 1.
Preview: Some illustrations of graphs in Integrative Genomics Genomics  Transcriptomics: Alternative Splicing Genomics  Phenotype: Genetic Mapping Comparative.
Discovery of a rare arboreal forest-dwelling flying reptile (Pterosauria, Pterodactyloidea) from China Wang et al. PNAS Feb. 11, 2008.
The Central Dogma & Data DNA mRNA Transcription Protei n Translation Metabolite Cellular processes Phenotype Embryology Organismal Biology Genetic Data.
Mini project-examination It is expected to be 3 days worth of work. You will be given this in week 8 I would expect 7-10 pages You will be given 2-4 key.
Gene expression analysis summary Where are we now?
Genetics All Your Hopes and All Your Fears. Genetics Classical Genetics –Mendelian genetics Fundamental principles underlying transmission of genetic.
Biology and Bioinformatics Gabor T. Marth Department of Biology, Boston College BI820 – Seminar in Quantitative and Computational Problems.
Quantitative Genetics
Human Genetics Overview.
CISC667, F05, Lec24, Liao1 CISC 667 Intro to Bioinformatics (Fall 2005) DNA Microarray, 2d gel, MSMS, yeast 2-hybrid.
Something related to genetics? Dr. Lars Eijssen. Bioinformatics to understand studies in genomics – São Paulo – June Image:
PLANT BIOTECHNOLOGY & GENETIC ENGINEERING (3 CREDIT HOURS)
Observing Patterns in Inherited Traits
Quantitative Genetics
Cloning, genomes, and proteomes
Special Topics in Genomics Lecture 1: Introduction Instructor: Hongkai Ji Department of Biostatistics
Department of Biomedical Informatics Bioinformatics and Genetics Kun Huang Department of Biomedical Informatics OSUCCC Biomedical Informatics Shared Resource.
Chapter 3 -- Genetics Diversity Importance of Genetic Diversity Importance of Genetic Diversity -- Maintenance of genetic diversity is a major focus of.
Linkage and LOD score Egmond, 2006 Manuel AR Ferreira Massachusetts General Hospital Harvard Medical School Boston.
Methods of Genome Mapping linkage maps, physical maps, QTL analysis The focus of the course should be on analytical (bioinformatic) tools for genome mapping,
Introduction to BST775: Statistical Methods for Genetic Analysis I Course master: Degui Zhi, Ph.D. Assistant professor Section on Statistical Genetics.
Broad-Sense Heritability Index
Chapter 5 Characterizing Genetic Diversity: Quantitative Variation Quantitative (metric or polygenic) characters of Most concern to conservation biology.
Fig Chapter 12: Genomics. Genomics: the study of whole-genome structure, organization, and function Structural genomics: the physical genome; whole.
20.1 Structural Genomics Determines the DNA Sequences of Entire Genomes The ultimate goal of genomic research: determining the ordered nucleotide sequences.
Next Generation Sequencing and its data analysis challenges Background Alignment and Assembly Applications Genome Epigenome Transcriptome.
The Complexities of Data Analysis in Human Genetics Marylyn DeRiggi Ritchie, Ph.D. Center for Human Genetics Research Vanderbilt University Nashville,
Molecular Biology Fourth Edition
Manipulation of DNA. Restriction enzymes are used to cut DNA into smaller fragments. Different restriction enzymes recognize and cut different DNA sequences.
 The process by which desired traits of certain plants and animals are selected and passed on to their future generations is called selective breeding.
MOLECULAR BIOLOGY – Basic Genetics MOLECULAR BIOLOGY AND GENETICS 1.Genetic basis 2.DNA structure and genetic code 3.Cell division,
Experimental Design and Data Structure Supplement to Lecture 8 Fall
Discovery of a rare arboreal forest-dwelling flying reptile (Pterosauria, Pterodactyloidea) from China Wang et al. PNAS Feb. 11, 2008.
ABC for the AEA Basic biological concepts for genetic epidemiology Martin Kennedy Department of Pathology Christchurch School of Medicine.
Lecture 6. Functional Genomics: DNA microarrays and re-sequencing individual genomes by hybridization.
MEME homework: probability of finding GAGTCA at a given position in the yeast genome, based on a background model of A = 0.3, T = 0.3, G = 0.2, C = 0.2.
Epidemiology 217 Molecular and Genetic Epidemiology Bioinformatics & Proteomics John Witte.
Central dogma: the story of life RNA DNA Protein.
Who was Mendel? Mendel – first to gather evidence of patterns by which parents transmit genes to offspring.
Lecture 24: Quantitative Traits IV Date: 11/14/02  Sources of genetic variation additive dominance epistatic.
Review 4: Heredity & Molecular Genetics AP Biology.
Genetics: Introductory Notes. Principal Points: Genetics can be divided into 4 subdisciplines – Transmission genetics – passage of genes from generation.
Computational Biology of Networks JH – Jotun Hein RL – Rune Lyngsø BT – Bhalchandran Thatte Biology Algorithmics.
Chapter 12 Assessment How could manipulating DNA be beneficial?
Computational Biology and Genomics at Boston College Biology Gabor T. Marth Department of Biology, Boston College
Genomics A Systematic Study of the Locations, Functions and Interactions of Many Genes at Once.
Analyzing DNA using Microarray and Next Generation Sequencing (1) Background SNP Array Basic design Applications: CNV, LOH, GWAS Deep sequencing Alignment.
Gene Technologies and Human ApplicationsSection 3 Section 3: Gene Technologies in Detail Preview Bellringer Key Ideas Basic Tools for Genetic Manipulation.
Genetics of Gene Expression BIOS Statistics for Systems Biology Spring 2008.
An atlas of genetic influences on human blood metabolites Nature Genetics 2014 Jun;46(6)
From: Scheinfeld A (1965) Your heredity and environment. JB Lippincott Company, Philadelphia Phenotypic variation among humans is enormous.
Different microarray applications Rita Holdhus Introduction to microarrays September 2010 microarray.no Aim of lecture: To get some basic knowledge about.
1 Finding disease genes: A challenge for Medicine, Mathematics and Computer Science Andrew Collins, Professor of Genetic Epidemiology and Bioinformatics.
GENOME ORGANIZATION AS REVEALED BY GENOME MAPPING WHY MAP GENOMES? HOW TO MAP GENOMES?
All Your Hopes and All Your Fears
SNPs and complex traits: where is the hidden heritability?
Genomics A Systematic Study of the Locations, Functions and Interactions of Many Genes at Once.
PLANT BIOTECHNOLOGY & GENETIC ENGINEERING (3 CREDIT HOURS)
Linking Genetic Variation to Important Phenotypes
Inferring Genetic Architecture of Complex Biological Processes Brian S
In these studies, expression levels are viewed as quantitative traits, and gene expression phenotypes are mapped to particular genomic loci by combining.
Chapter 7 Multifactorial Traits
Molecular Biology Fourth Edition
On the Dependency of Cellular Protein Levels on mRNA Abundance
Presentation transcript:

Integrative Genomics Data Concepts Analysis and Functional Explanation G - genetic variation T - transcript levels P - protein concentrations M - metabolite concentrations F – phenotype/phenome Data Models: Networks Hidden Structures/ Processes Evolution Knowledge GF Mapping Concepts Analysis and Functional Explanation Single Data Analysis Single type + phenotype Analysis Multiple Data Types/ Integrated Analysis Molecular Dissection Detailed Dynamic Model

Cost of Disease Most research in the bioscience is motivated by hope of disease intervention. Alan D Lopez, Colin D Mathers, Majid Ezzati, Dean T Jamison, Christopher J L Murray Global and regional burden of disease and risk factors, 2001: systematic analysis of population health data Lancet 2006; 367: 1747–57 Major WHO projects have tried to tabulate the costs of different diseases Genetic Diseases are diseases where there is genetic variation in the susceptibility. Even small improvements would save many billions

What is a bacteria? A human being? From wikipedia Eukaryote 1013 atoms Central Dogma Prokaryote 1010 atoms DNA RNA Protein Metabolism & Cell Structure Organism Human 1014 cells

The Central Dogma & Data Protein-DNA binding Data Chip-chip protein arrays DNA mRNA Transcription Protein Translation Metabolite Cellular processes Phenotype Embryology Organismal Biology Genetic Data SNPs – Single Nucleotide Polymorphisms Re-sequencing CNV - Copy Number Variation Microsatellites Transcript Data Micro-array data Gene Expression Exon Splice Junction Proteomic Data NMR Mass Spectrometry 2D-gel electrophoresis Metabonomic Data NMR Mass Spectrometry 2D-Gel electrophoresis Phenotypic Data Clinical Phenotypes Disease Status Quantitative Traits Blood Pressure Body Mass Index Genetic Mapping Genetical Genomics Transcriptomics Proteomics Metabonomics

Structure of Integrative Genomics DNA mRNA Protein Metabolite Phenotype Classes Parts Concepts GF Mapping Physical models: Models: Networks Phenomenological models: Unobservered/able Hidden Structures/ Processes Knowledge: Externally Derived Constraints on which Models are acceptable Evolution: Cells in Ontogeny Individuals/Sequences in a Population Species Analysis: Data + Models + Inference Model Selection Functional Explanation

G: Genomes A diploid genome: Key challenge: Making a single molecule observable!! Classical Solution (70s): Many De Novo Sequencing: Halted extensions or degradation extension degradation 80s: From one to many: PCR – Polymerase Chain Reaction 00s: Re-sequencing: Hybridisation to complete genomes Future Solution: One is enough!! Observing the behavior of the polymerase Passing DNA through millipores registering changes in current

G: Assembly and Hybridisation Target genome 3*109 bp (unobservable) Reads 3-400 bp (observable) Contigs Sufficient overlap allows concatenation Contigs and Contig Sizes as function of Genome Size (G), Read Size (L) and overlap (Ø): Lander & Waterman, 1988 Statistical Analysis of Random Clone Fingerprinting {A,C} Complementary or almost complementary strings allow interrogation. {T,G} probe

E - Epigenomics

T - Transcriptomics Classical Expression Experiment: The Gene is transcribed into pre-mRNA Pre-mRNA is processed into mRNA Probes are designed hybridizing to specific positions Measures transcript levels averaging of a set of cells.

T - Transcriptomics RNA-Seq Expression Experiment: Advantages - Discoveries More quantitative in evaluating expression levels More precise in positioning Much more is transcribed than expected. Wang, Gerstein and Snyder (2009) RNA-Seq: a revolutionary tool for Transcriptomics NATURE REVIEwS genetics VOLUME 10.57-64 Transcription of genes very imprecise

M - Metabonomics

Concepts Knowledge: Externally Derived Constraints on which Models are acceptable GF Mapping Evolution: Cells in Ontogeny Species Individuals/Sequences in a Population Hidden Structures/ Processes Unobservered/able Models: Networks Phenomenological models: Physical models:

GF Mechanistically predicting relationships between different data types is very difficult Empirical mappings are important Functions from Genome to Phenotype stands out in importance G is the most abundant data form - heritable and precise. F is of greatest interest. DNA mRNA Protein Metabolite Phenotype Height Weight Disease status Intelligence ………. “Zero”-knowledge mapping: dominance, recessive, interactions, penetrance, QTL,. Mapping with knowledge: weighting interactions according to co-occurence in pathways. Model based mapping: genomesystemphenotype Environment

The General Problem is Enormous Set of Genotypes: Diploid Genome 1 3* 107 In 1 individual, 3* 107 positions could segregate. In the complete human population 5*108 might segregate. Thus there could be 2500.000.00 possible genotypes Partial Solution: Only consider functions dependent on few positions Causative for the trait Classical Definitions: Dominance Recessive Additive Heterotic Single Locus Multiple Loci Epistasis: The effect of one locus depends on the state of another Quantitative Trait Loci (QTL). For instance sum of functions for positions plus error term.

Genotype and Phenotype Covariation: Gene Mapping Time Sampling Genotypes and Phenotypes Reich et al. (2001) Decay of local dependency A set of characters. Binary decision (0,1). Quantitative Character. Dominant/Recessive Penetrance Spurious Occurrence Heterogeneity genotype Genotype  Phenotype phenotype Genetype -->Phenotype Function Result:The Mapping Function

Pedigree Analysis & Association Mapping Pedigree unknown Many meiosis (>104) Resolution: 10-5 Morgans (Kbases) 2N generations r M D Pedigree Analysis: Pedigree known Few meiosis (max 100s) Resolution: cMorgans (Mbases) r M D Finnd Likange eksempel og regn det igennem Adapted from McVean and others

Heritability: Inheritance in bags, not strings. The Phenotype is the sum of a series of factors, simplest independently genetic and environmental factors: F= G + E Parents: Siblings: Relatives share a calculatable fraction of factors, the rest is drawn from the background population. This allows calculation of relative effect of genetics and environment Heritability is defined as the relative contribution to the variance of the genetic factors: Visscher, Hill and Wray (2008) Heritability in the genomics era — concepts and misconceptions nATurE rEvIEWS | genetics volumE 9.255-66

Heritability Examples of heritability Heritability of multiple characters: Rzhetsky et al. (2006) Probing genetic overlap among complex human phenotypes PNAS vol. 104 no. 28 11694–11699 Visscher, Hill and Wray (2008) Heritability in the genomics era — concepts and misconceptions nATurE rEvIEWS | genetics volumE 9.255-66

Protein Interaction Network based model of Interactions 1 2 n PHENOTYPE NETWORK GENOME The path from genotype to genotype could go through a network and this knowledge can be exploited Rhzetsky et al. (2008) Network Properties of genes harboring inherited disease mutations PNAS. 105.11.4323-28 Groups of connected genes can be grouped in a supergene and disease dominance assumed: a mutation in any allele will cause the disease.

PIN based model of Interactions Emily et al, 2009 Single marker association Protein Interaction Network Gene 1 Gene 2 PIN gene pairs are allowed to interact SNP 2 SNP 1 Phenotype i 3*3 table Interactions creates non-independence in combinations

Summary of this lecture G - genetic variation T - transcript levels P - protein concentrations M - metabolite concentrations F – phenotype/phenome Data Models: Networks Hidden Structures/ Processes Evolution Knowledge GF Mapping Concepts GF Mapping General Function Enormous Used for Disease Gene Finding Can Include Biological Knowledge

P – Proteomics Cox and Mann (2007) Is Proteomics the New Genomics? Cell 130,395-99

P – Proteomics Hoog and Mann (2004) “Proteomics” Annu. Rev. Genomics Hum. Genet. 5:267–9 P uses Mass Spectrometry and 2D gel electrophoresis of degraded peptides and Protein Arrays using immuno-recognition of complete proteins http://www.hupo.org/