A Lite Introduction to (Bioinformatics and) Comparative Genomics Chris Mueller November 18, 2004 Based on the Genomics in Biomedical Research course at.

Slides:



Advertisements
Similar presentations
Parsimony Genome 559: Introduction to Statistical and Computational Genomics Elhanan Borenstein.
Advertisements

1 Introduction to Sequence Analysis Utah State University – Spring 2012 STAT 5570: Statistical Bioinformatics Notes 6.1.
Creating NCBI The late Senator Claude Pepper recognized the importance of computerized information processing methods for the conduct of biomedical research.
A Lite Introduction to (Bioinformatics and) Comparative Genomics Chris Mueller August 10, 2004.
Phylogenetic Trees Understand the history and diversity of life. Systematics. –Study of biological diversity in evolutionary context. –Phylogeny is evolutionary.
The design, construction and use of software tools to generate, store, annotate, access and analyse data and information relating to Molecular Biology.
Computational biology and computational biologists Tandy Warnow, UT-Austin Department of Computer Sciences Institute for Cellular and Molecular Biology.
Outline to SNP bioinformatics lecture
Bioinformatics at WSU Matt Settles Bioinformatics Core Washington State University Wednesday, April 23, 2008 WSU Linux User Group (LUG)‏
Structural bioinformatics
Predicting the Function of Single Nucleotide Polymorphisms Corey Harada Advisor: Eleazar Eskin.
. Class 1: Introduction. The Tree of Life Source: Alberts et al.
JYC: CSM17 BioinformaticsCSM17 Week 10: Summary, Conclusions, The Future.....? Bioinformatics is –the study of living systems –with respect to representation,
Sequence Analysis MUPGRET June workshops. Today What can you do with the sequence? What can you do with the ESTs? The case of SNP and Indel.
Introduction to Bioinformatics Spring 2008 Yana Kortsarts, Computer Science Department Bob Morris, Biology Department.
Computational Molecular Biology (Spring’03) Chitta Baral Professor of Computer Science & Engg.
Bioinformatics: a Multidisciplinary Challenge Ron Y. Pinter Dept. of Computer Science Technion March 12, 2003.
Bioinformatics and Phylogenetic Analysis
Genome Browsers Ensembl (EBI, UK) and UCSC (Santa Cruz, California)
How to access genomic information using Ensembl August 2005.
BI420 – Course information Web site: Instructor: Gabor Marth Teaching.
The Central Dogma of Molecular Biology (Things are not really this simple) Genetic information is stored in our DNA (~ 3 billion bp) The DNA of a.
Phylogenetic Shadowing Daniel L. Ong. March 9, 2005RUGS, UC Berkeley2 Abstract The human genome contains about 3 billion base pairs! Algorithms to analyze.
Sequence Analysis. Today How to retrieve a DNA sequence? How to search for other related DNA sequences? How to search for its protein sequence? How to.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Incorporating Bioinformatics in an Algorithms Course Lawrence D’Antonio Ramapo College of New Jersey.
341: Introduction to Bioinformatics Dr. Natasa Przulj Deaprtment of Computing Imperial College London
Login: BITseminar Pass: BITseminar2011 Login: BITseminar Pass: BITseminar2011.
Bioinformatics Timothy Ketcham Union College Gradutate Seminar 2003 Bioinformatics.
Positional cloning: the rest of the story a a a a a a a a X.
Introduction to Bioinformatics CPSC 265. Interface of biology and computer science Analysis of proteins, genes and genomes using computer algorithms and.
NCBI Review Concepts Chuong Huynh. NCBI Pairwise Sequence Alignments Purpose: identification of sequences with significant similarity to (a)
Computational Biology, Part D Phylogenetic Trees Ramamoorthi Ravi/Robert F. Murphy Copyright  2000, All rights reserved.
20.1 Structural Genomics Determines the DNA Sequences of Entire Genomes The ultimate goal of genomic research: determining the ordered nucleotide sequences.
Bioinformatics: Theory and Practice – Striking a Balance (a plea for teaching, as well as doing, Bioinformatics) Practice (Molecular Biology) Theory: Central.
Ch. 21 Genomes and their Evolution. New approaches have accelerated the pace of genome sequencing The human genome project began in 1990, using a three-stage.
Organizing information in the post-genomic era The rise of bioinformatics.
Korea BioInformation Center Byoung-Chul Kim
Calculating branch lengths from distances. ABC A B C----- a b c.
ARE THESE ALL BEARS? WHICH ONES ARE MORE CLOSELY RELATED?
Introduction to Bioinformatics Dr. Rybarczyk, PhD University of North Carolina-Chapel Hill
Condor: BLAST Rob Quick Open Science Grid Indiana University.
Epidemiology 217 Molecular and Genetic Epidemiology Bioinformatics & Proteomics John Witte.
EB3233 Bioinformatics Introduction to Bioinformatics.
Bioinformatics and Computational Biology
Biocomputation: Comparative Genomics Tanya Talkar Lolly Kruse Colleen O’Rourke.
GeWorkbench Overview Support Team Molecular Analysis Tools Knowledge Center Columbia University and The Broad Institute of MIT and Harvard.
The Future of Genetics Research Lesson 7. Human Genome Project 13 year project to sequence human genome and other species (fruit fly, mice yeast, nematodes,
Biotechnology and Bioinformatics: Bioinformatics Essential Idea: Bioinformatics is the use of computers to analyze sequence data in biological research.
각종 생물정보 분석도구 의 실무적 활용 및 실습 김형용 개발팀 Insilicogen, Inc.
Bioinformatics Overview
Introduction to Bioinformatics Resources for DNA Barcoding
Introduction to Bioinformatics and Functional Genomics
Gil McVean Department of Statistics
Bioinformatics Madina Bazarova. What is Bioinformatics? Bioinformatics is marriage between biology and computer. It is the use of computers for the acquisition,
University of Pittsburgh
Access to Sequence Data and Related Information
Genomes and Their Evolution
Genome organization and Bioinformatics
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
LESSON 1 INTNRODUCTION HYE-JOO KWON, Ph.D /
Introduction to Bioinformatic
BSC1010: Intro to Biology I K. Maltz Chapter 21.
Gene Safari (Biological Databases)
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Problems from last section
Introduction to Bioinformatics
KEY CONCEPT Entire genomes are sequenced, studied, and compared.
Presentation transcript:

A Lite Introduction to (Bioinformatics and) Comparative Genomics Chris Mueller November 18, 2004 Based on the Genomics in Biomedical Research course at the Berkekly PGA

Biology Evolution –Species change over time by the process of natrual selection Molecular Biology Central Dogma –DNA is transcribed to RNA which is translated to proteins –Proteins are the machinery of life –DNA is the agent of evolution Key idea: Protein and RNA structure determines function

Genome Stats

Comparative Genomics Analyze and compare genomes from different species Goals –Understand how species evolved –Determine function of genes, regulatory networks, and other non-coding areas of genomes

Tools Public Databases –NCBI: clearing house for all data related to genomes Genomes, Genes, Proteins, SNPs, ESTs, Taxonomy, etc –TIGR: hand curated database Analysis Software –Database “query” (find similar sequences), alignment algorithms, family id (clustering), gene prediction, repeat finding, experimental design, etc –Expect for query routines, these are generally not accessible to biologists. Instead, results are made available via databases and browsers Browsers –Genome: Ensembl, MapViewer –Comparative Genomics: VISTA, UCSC –Can query on location, gene name, everyone plays together!

Browser Links UCSC Genome Browser – VISTA – Map Viewer – Ensembl – (try using each one to find your favorite gene)

Queries and Alignments Find matches between genomes “Queries” find local alignments for a gene or other short sequence Global alignments attempt to optimally align complete sequences –“Indels” are insertions/deletions that help construct alignments: AGGATGAGCCAGATAGGA---ACCGATTACCGGATAGC ||||||| ||||||||| ||||||||||||||||| AGGATGA-CCAGATAGGAGTGACCGATTACCGGATAGC

Large Genome Alignments LAGAN MLAGAN Shuffle LAGAN

Application: Phylogenetic Analysis Determine the evolutionary tree for sequences, species, genomes, etc Theory: natural selection, genetic drift Traditionally done with morphology Techniques –Model substitution rates Statistical models based on empherically derived scores Works well for proteins, but is difficult for DNA –Phylogenetic reconstruction Distance metrics* Parsimony (fewest # of subs wins)* Maximim likelihood Based on Jim Noonan’s (LBNL) talk *No evolutionary justification!

Example Porpoise AGGATGACCAGATAGGAGTGACCGATTACCGGATAGC Beluga AGGATGACCAGATAGGAGTGACCGATTACCGGATAGC Sperm AGGATGACCAGATAGGAGTGACCGATTACGGGATAGC Fin AGGATGACCAGATAGGAGTGACCGATTA---GATAGC Sei AGGATGACCAGATAGGAGTGACCGATTA---GATAGC Cow AGGATGACCAGATAGGAGTGACCGATTACCGGATAGC Giraffe AGGATGACCAGATAGGAGTGACCGATTACCGGATAGC What is the evolutionary tree for whales?

Application: Phenotyping Using SNPs SNP: Single Nucleotide Polymorphism - change in one base between two instances of the same gene Used as genetic flags to identify traits, esp. for genetic diseases CG goal: Identify as many SNPs as possible Challenges –Data: need sequenced genomes from many humans along with information about the donors –Need tools for mining the data to identify phenotypes dbSNP is an uncurrated repository of SNPs (many are misreported) (this was the one talk from industry) Based on Kelly Frazer’s talk

Application: Fishing the Genome Look for highly conserved regions across multiple genomes and study these first Only 1-2% of the genome is coding, need a way to narrow the search Driving Principle: regions are conserved for a reason! Based on Marcelo Nobrega’s talk

(VISTA Plot of SALL1 Human- Mouse-Chicken-Fugu)

Chomosome 16 Enhancer Browser Find conserved regions between genes in human fugu (pufferfish) alignments and systematically study them SALL1 0 bp500 Mbp

DOE Joint Genome Institute “Industrialized” genomics –High throughput genomic sequencing –Technology development –Computational Genomics –Functional Genomics Model: Partner with researchers to on sequencing and technology projects All data freely available – (or, this stuff is cool, sign me up!)

CS Challenges “Engineering” –Scalability! (nothing really scales well right now) –Stability! (Interactive apps crash way too often) –Timeliness of data –Biologists don’t use Unix! (and the Web is not the answer) –Better/faster algorithms –Interoperability among tools and better analysis tools It’s hard for biologists to use their own data with existing tools “Basic” –Automated curation, error checking –Computational models that biologists can trust –Structure/Function algorithms (this really is the grail) Education! (both ways)