Epidemiology 217 Molecular and Genetic Epidemiology Bioinformatics & Proteomics John Witte, Xin Liu & Mark Pletcher.



Questions on Assignment #4?

Coding         CC   CT   TT
Co-dominant
Dominant       0    1    1
Recessive      0    0    1
Log Additive   0    1    2
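The genotype codings on this slide can be sketched in code. The snippet below is a minimal illustration, not the assignment's solution: the function names are hypothetical, T is assumed to be the variant allele, and the two-indicator form used for the co-dominant model is an assumption (a common convention), since the slide shows no values for that row.

```python
# Genotype codes for a C/T SNP, following the slide's dominant, recessive,
# and log-additive rows. T is assumed to be the variant allele.
CODINGS = {
    "dominant":     {"CC": 0, "CT": 1, "TT": 1},
    "recessive":    {"CC": 0, "CT": 0, "TT": 1},
    "log_additive": {"CC": 0, "CT": 1, "TT": 2},
}


def encode(genotypes, model):
    """Map genotype strings to numeric codes under the named model."""
    table = CODINGS[model]
    return [table[g] for g in genotypes]


def encode_codominant(genotypes):
    """Assumption: co-dominant coding as two indicators (is_CT, is_TT),
    with CC as the reference genotype."""
    return [(int(g == "CT"), int(g == "TT")) for g in genotypes]


sample = ["CC", "CT", "TT", "CT"]
dominant_codes = encode(sample, "dominant")
additive_codes = encode(sample, "log_additive")
```

The resulting codes would typically enter a regression model as the genotype covariate, with one coefficient for the dominant, recessive, and log-additive models and two for the co-dominant model.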

Post-Genomic Era: Lots of Data!

“The study of genetic and other biological information using computer and statistical techniques.” A Genome Glossary, Science, Feb 16, 2001

Bioinformatics in Genetic Epi
Some key aspects:
– Data management
– Candidate regions / genes (selection and SNP mining)
– Genetic analyses (e.g., genotyping)
– Statistical analyses

Data Management
[Diagram labels: Demogr. Database; Laboratory Database; Clinical Database; Health and Habits Database; Nutritional Database; Genomic Database; CaP Genes Databases Hub]

Selecting Candidate Genes
From candidate regions, or simply based on a priori information.
Example:
– Chromosome 7 linkage for aggressive prostate cancer.
– Region on chromosome 7q
– A number of plausible candidates in this region.

From gene to polymorphisms
Given a gene, how do I…
– Find its polymorphisms?
– Find information about those polymorphisms?

Hands-on guide for browsing and analyzing genomic data. Contains worked examples, providing:
– an overview of the types of data available,
– details on how these data can be browsed, and
– step-by-step instructions for using many of the most commonly used tools for sequence-based discovery.

Nature Genetics: A User's Guide to the Human Genome
3 of the 13 worked example questions:
– How does one find a gene of interest and determine that gene's structure?
– How would one retrieve the sequence of a gene, along with all annotated exons and introns, as well as a certain number of flanking bases for use in primer design?
– A user wishes to find all the single nucleotide polymorphisms that lie between two sequence-tagged sites. Do any of these single nucleotide polymorphisms fall within the coding region of a gene? Where can any additional information about the function of these genes be found?

Look for SNPs in Databases
General databases:
--- dbSNP
--- UCSC Genome Bioinformatics
--- HapMap
--- The SNP Consortium (TSC)
--- Human gene variation base (HGVbase)
Special databases:
--- The UW-FHCRC Variation Discovery Resource (SeattleSNPs)
--- Cancer Genome Anatomy Project - SNP500Cancer Database
--- InnateImmunity
--- Drug response
More….

dbSNP Summary (Build 124, 2005)
– Reference SNP (rs) #: 10,054,521
– Validated SNPs: 5,054,675 (~50%)

dbSNP search by SNP ID

dbSNP search by gene

dbSNP search by ‘limits’

dbSNP search by ‘limits’ (Cont.)

UCSC Browser
Provides information beyond what the dbSNP database offers.
Select SNPs that lie in conserved regions, identified by comparing sequence across species.
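Selecting SNPs that fall within conserved regions amounts to an interval lookup. The sketch below is only an illustration of that idea: the function name, the SNP labels, and the coordinates are all hypothetical, and it assumes the conserved intervals have already been exported from a browser track for a single chromosome.

```python
def snps_in_conserved(snps, conserved):
    """Return IDs of SNPs whose position falls in any conserved interval.

    snps: list of (snp_id, position) on one chromosome.
    conserved: list of (start, end) intervals, both ends inclusive.
    """
    hits = []
    for snp_id, pos in snps:
        if any(start <= pos <= end for start, end in conserved):
            hits.append(snp_id)
    return hits


# Hypothetical coordinates, for illustration only.
snps = [("snp_a", 120), ("snp_b", 480), ("snp_c", 905)]
conserved = [(100, 200), (900, 1000)]
selected = snps_in_conserved(snps, conserved)
```

For large tracks, a sorted interval list with binary search would scale better than this linear scan.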

UCSC Browser (Cont.)
[Screenshot labels: Comparative Genomics; SNPs; Gene structure]

HapMap
Four populations:
1. CEU: CEPH (Utah residents with ancestry from northern and western Europe)
2. CHB: Han Chinese in Beijing, China
3. JPT: Japanese in Tokyo, Japan
4. YRI: Yoruba in Ibadan, Nigeria
Find htSNPs (haplotype-tagging SNPs) based on the available genotype data.
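One way to pick tagging SNPs from genotype data is a greedy search over pairwise r². The sketch below is an assumption-laden illustration, not the HapMap project's own procedure: it approximates linkage disequilibrium by the squared correlation of genotype counts (0/1/2) instead of phased haplotypes, the 0.8 threshold is an assumed default, and all names and data are hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype-count vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)


def greedy_tag_snps(genotypes, threshold=0.8):
    """Pick tag SNPs so every SNP has r^2 >= threshold with some tag.

    genotypes: dict of snp_id -> list of 0/1/2 counts, one per individual.
    """
    ids = list(genotypes)
    # For each SNP, the set of SNPs (itself included) that it captures.
    captures = {
        a: {b for b in ids
            if a == b or r_squared(genotypes[a], genotypes[b]) >= threshold}
        for a in ids
    }
    untagged, tags = set(ids), []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs.
        best = max(ids, key=lambda s: len(captures[s] & untagged))
        tags.append(best)
        untagged -= captures[best]
    return tags


# Hypothetical genotype counts for six individuals at four SNPs.
genotypes = {
    "s1": [0, 1, 2, 1, 0, 2],
    "s2": [0, 1, 2, 1, 0, 2],  # identical to s1, so s1 tags it
    "s3": [2, 0, 1, 0, 2, 1],
    "s4": [0, 0, 1, 2, 2, 1],
}
tags = greedy_tag_snps(genotypes)
```

Here s1 and s2 are perfectly correlated, so a single tag covers both, while s3 and s4 each need their own tag.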

SeattleSNPs
Resequencing the complete genomic region of each gene among 24 African-American (AA) subjects and 23 European (CEPH) subjects:
– 2000 bp upstream of first exon
– 1500 bp downstream of poly-A signal
– All exons and introns for genes below 35 kbp
Summary data (2/18/05):
– Number of genes sequenced: 208
– Total kilobases sequenced:
– Number of SNPs found: 23,590
– SNPs in AA sample: 20,765
– SNPs in CEPH sample: 12,937

SeattleSNPs (cont.)
Convenient for tag SNP selection.
Limitations:
--- gene of interest may not be included in this database
--- information is only available for AA and CAU

Summary: selecting SNPs
If the candidate gene is in the SeattleSNPs database:
– tagSNPs for AA and CAU are available from the website
If not, or if interested in a candidate region:
– dbSNP: limit on functional class, validation, heterozygosity, etc.
– UCSC: SNPs in conserved regions.
– HapMap: htSNPs
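The dbSNP "limits" step can be mimicked locally once SNP records have been downloaded. The following is a rough sketch under stated assumptions: the `Snp` record layout, the field names, the functional class labels, and the example records are all hypothetical and do not reflect dbSNP's actual export format.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    # Hypothetical record layout, not dbSNP's real schema.
    snp_id: str
    functional_class: str
    validated: bool
    heterozygosity: float


def filter_snps(snps, functional_classes, min_heterozygosity,
                require_validated=True):
    """Keep SNPs matching the functional class, validation, and
    heterozygosity limits."""
    return [
        s for s in snps
        if s.functional_class in functional_classes
        and s.heterozygosity >= min_heterozygosity
        and (s.validated or not require_validated)
    ]


# Made-up records for illustration only.
records = [
    Snp("snp_1", "coding-nonsynonymous", True, 0.42),
    Snp("snp_2", "intron", True, 0.30),
    Snp("snp_3", "coding-nonsynonymous", False, 0.45),
    Snp("snp_4", "coding-nonsynonymous", True, 0.02),
]
kept = filter_snps(records, {"coding-nonsynonymous"}, min_heterozygosity=0.1)
kept_ids = [s.snp_id for s in kept]
```

Only the first record passes all three limits; relaxing `require_validated` would also admit the third.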

Anatomy of a gene
[Diagram labels: DNA (5′ to 3′); Pre-splicing RNA; Post-splicing RNA; Protein; Promoter; Enhancer; Exon, non-coding (5′ UTR, 3′ UTR); Exon, coding; Intron; Polyadenylation]

From Genomics to Proteomics Our ~ 25,000 genes carry the blueprint for making proteins, of which all living matter is made. Each protein has a particular shape and function that determine its role in the body. Proteomics is the study of protein shape, function, and patterns of expression.

Proteomics
– Characterize proteins derived from genetic code
– Compare variations in their expression levels under different conditions
– Study their interactions
– Identify their functional role

Proteome Complexity
Recall that the genome is relatively static. In contrast, many cellular proteins are continually moving and undergoing changes such as:
1. binding to a cell membrane,
2. partnering with another protein,
3. gaining or losing a chemical group such as a sugar, fat, or phosphate, or
4. breaking into two or more pieces.

Size of Proteome?
– More than 1 million proteins, far more than the ~25,000 genes in humans.
– The number is large because of complexity: a given gene can make many different proteins.
– Features such as folds and motifs allow proteins to be categorized into groups and families, which should make proteomic research easier.
– But no proteome has yet been sequenced.

How to Analyze Proteomes
Broad range of technologies.
Central paradigm:
– 2-D gel electrophoresis (2D-GE) and mass spectrometry (MS).
– 2D-GE is used to separate the proteins by isoelectric point and then by size.
– MS determines their identity and characteristics.

2-D gel electrophoresis
Large mixtures of proteins are separated by electrical charge and size. The proteins first migrate through a gel-like substance until they are separated by their charge. They are then transferred to a second semi-solid gel and are separated by size.

Mass spectrometry
MS measures two properties:
1. the mass-to-charge ratio (m/z) of a mixture of ions (particles with an electric charge) in the gas phase under vacuum; and
2. the number of ions present at each m/z value.
The end product is a mass spectrum (chart) with a series of spiked peaks, each representing an ion or charged protein fragment present in the sample. The height of each peak is related to the abundance of the protein fragment. The size of the peaks and the distance between them are a fingerprint of the sample and provide a clue to its identity.
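The two measured quantities, m/z and ion count per m/z value, can be illustrated with a small calculation. This is a minimal sketch that assumes positive ions formed by adding protons (as in common soft-ionization methods); the function names, the bin width, and the example masses are hypothetical.

```python
from collections import Counter

PROTON_MASS = 1.007276  # mass of a proton in daltons


def mz(neutral_mass, charge):
    """m/z of a positive ion formed by adding `charge` protons.

    Assumption: ionization by protonation only.
    """
    return (neutral_mass + charge * PROTON_MASS) / charge


def spectrum(ions, bin_width=1.0):
    """Count ions per m/z bin; returns {bin_start: ion_count}.

    ions: list of (neutral_mass, charge) pairs, one per detected ion.
    """
    counts = Counter()
    for neutral_mass, charge in ions:
        bin_start = (mz(neutral_mass, charge) // bin_width) * bin_width
        counts[bin_start] += 1
    return dict(counts)


# Hypothetical fragment masses (Da) and charge states.
ions = [(1000.0, 1), (1000.0, 1), (1000.0, 2), (2000.0, 2)]
peaks = spectrum(ions)
```

Note that the singly charged 1000 Da fragment and the doubly charged 2000 Da fragment land in the same bin, which is one reason peak spacing and charge-state patterns matter when interpreting a spectrum.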

Bioinformatics in Proteomics
– Creation and maintenance of databases of protein information.
– Development of methods to predict the structure and/or function of newly discovered proteins and structural RNA sequences.
– Clustering protein sequences into families of related sequences and the development of protein models.
– Aligning similar proteins and generating phylogenetic trees to examine evolutionary relationships.
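Aligning similar proteins is usually done with dynamic programming. The sketch below computes a Needleman-Wunsch global alignment score; the match, mismatch, and gap values are simplifying assumptions chosen for illustration, whereas practical protein alignment typically uses a substitution matrix such as BLOSUM62. The function name and example sequences are hypothetical.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score of the best global alignment of a and b.

    Uses a flat match/mismatch score and a linear gap penalty
    (assumed values, not a real substitution matrix).
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            ))
        prev = curr
    return prev[-1]


identical = global_alignment_score("MKTA", "MKTA")
one_deletion = global_alignment_score("MKTA", "MKA")
```

Pairwise scores like these can serve as the similarity measure for clustering sequences into families or for building a distance matrix for a phylogenetic tree.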

Use of Proteomics in Clinical Research Example from Mark Pletcher

Final Project
Describe a genetic, molecular, pharmacogenomic, or proteomic project that you feel would be worth undertaking, i.e., based on the current state of the literature (from your reviews).
One-page description, due March 7th by [...] to me.
[...]-minute presentation (informal) in class on March 8th.