A Lite Introduction to (Bioinformatics and) Comparative Genomics
Chris Mueller
August 10, 2004

Biology
Evolution
– Species change over time by the process of natural selection
Molecular Biology Central Dogma
– DNA is transcribed to RNA, which is translated to proteins
– Proteins are the machinery of life
– DNA is the agent of evolution
Key idea: Protein and RNA structure determines function
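As a concrete illustration of the central dogma (not part of the original slides), the Python sketch below transcribes a DNA coding strand to mRNA and translates it with the standard genetic code. It assumes the input is the coding (sense) strand and starts at a start codon; in the cell, RNA polymerase actually reads the template strand. The helper names `transcribe` and `translate` are hypothetical.

```python
# Standard genetic code: codons enumerated with U, C, A, G at each of the
# three positions; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (same sequence, T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon from the first base, stopping at a stop codon.

    A trailing partial codon is ignored; unknown bases (e.g. 'N') raise KeyError.
    """
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGTTTGGGTGA")))  # prints "MFG"
```

The compact 64-character string encodes the whole codon table, which keeps the sketch short while remaining the standard nuclear code used by most organisms.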

Genome Stats

Comparative Genomics
Analyze and compare genomes from different species
Goals
– Understand how species evolved
– Determine the function of genes, regulatory networks, and non-coding regions of genomes

Tools
Public Databases
– NCBI: clearinghouse for all genome-related data (genomes, genes, proteins, SNPs, ESTs, taxonomy, etc.)
– TIGR: hand-curated database
Analysis Software
– Database "query" (find similar sequences), alignment algorithms, family identification (clustering), gene prediction, repeat finding, experimental design, etc.
– Except for query routines, these are generally not accessible to biologists. Instead, results are made available via databases and browsers
Browsers
– Genome: Ensembl, MapViewer
– Comparative genomics: VISTA, UCSC
– Can query on location or gene name; everyone plays together!
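Query tools such as BLAST start by finding short exact word matches ("seeds") between the query and database sequences, then extend them into alignments. The Python sketch below shows only a simplified seeding step using a k-mer index; it is not BLAST's actual implementation, it does no extension or statistical scoring, and the names `build_kmer_index` and `seed_hits` are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(sequences: dict[str, str], k: int) -> dict[str, list[tuple[str, int]]]:
    """Map every k-mer to the (sequence name, offset) pairs where it occurs."""
    index: dict[str, list[tuple[str, int]]] = defaultdict(list)
    for name, seq in sequences.items():
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].append((name, offset))
    return index


def seed_hits(query: str, index: dict[str, list[tuple[str, int]]], k: int) -> dict[str, int]:
    """Count exact k-mer matches (seeds) between the query and each database entry."""
    counts: dict[str, int] = defaultdict(int)
    for offset in range(len(query) - k + 1):
        # .get() avoids inserting empty entries into the defaultdict index
        for name, _ in index.get(query[offset:offset + k], []):
            counts[name] += 1
    return dict(counts)


database = {"geneA": "ACGTACGTGG", "geneB": "TTTTCCCCGG"}  # toy data
idx = build_kmer_index(database, k=4)
print(seed_hits("ACGTAC", idx, k=4))  # prints {'geneA': 4}
```

Indexing the database once and scanning the query is what makes this kind of lookup fast enough to offer interactively, which fits the slide's point that query routines are the part of the toolset biologists can actually use.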

Queries and Alignments
Find matches between genomes
– "Queries" find local alignments for a gene or other short sequence
– Global alignments attempt to optimally align complete sequences
– "Indels" (insertions/deletions) are shown as gaps (-) that let the rest of the sequences line up:

    AGGATGAGCCAGATAGGA---ACCGATTACCGGATAGC
    ||||||| ||||||||||   |||||||||||||||||
    AGGATGA-CCAGATAGGAGTGACCGATTACCGGATAGC
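Global alignment can be made concrete with the classic Needleman-Wunsch dynamic program. This is a minimal Python sketch, assuming simple illustrative scores (match +1, mismatch -1, gap -1) that are not taken from the talk; local query tools use different scoring and heuristics.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])  # gap in b
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")  # gap in a
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


s, top, bottom = needleman_wunsch(
    "AGGATGAGCCAGATAGGAACCGATTACCGGATAGC",
    "AGGATGACCAGATAGGAGTGACCGATTACCGGATAGC",
)
print(s)
print(top)
print(bottom)
```

Under these parameters the hand alignment on the slide scores 30 (34 matches, 4 gap columns), so the program's result scores at least that; when several alignments tie, the traceback may place the gaps differently from the slide.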

Application: Phylogenetic Analysis
Determine the evolutionary tree for sequences, species, genomes, etc.
Theory: natural selection, genetic drift
Traditionally done with morphology
Techniques
– Model substitution rates: statistical models based on empirically derived scores; works well for proteins, but is difficult for DNA
– Phylogenetic reconstruction: distance metrics*, parsimony (fewest substitutions wins)*, maximum likelihood
*No evolutionary justification!
Based on Jim Noonan's (LBNL) talk
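The talk lists distance metrics without naming one. As one concrete example (our choice, not the talk's), the Python sketch below computes the observed p-distance between two aligned DNA sequences and applies the Jukes-Cantor correction, d = -(3/4) ln(1 - 4p/3), which adjusts for multiple substitutions at the same site. Skipping gap columns is an assumption of this sketch, and the helper names are hypothetical.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Fraction of aligned, gap-free columns at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(seq1, seq2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """Correct an observed p-distance for multiple hits under the Jukes-Cantor model."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is undefined under Jukes-Cantor")
    return -0.75 * math.log(1 - 4 * p / 3)


porpoise = "AGGATGACCAGATAGGAGTGACCGATTACCGGATAGC"
sperm = "AGGATGACCAGATAGGAGTGACCGATTACGGGATAGC"
print(jukes_cantor(p_distance(porpoise, sperm)))
```

The correction is always at least as large as p and grows quickly as p approaches 0.75, the point at which two random DNA sequences would be expected to differ.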

Example
    Porpoise  AGGATGACCAGATAGGAGTGACCGATTACCGGATAGC
    Beluga    AGGATGACCAGATAGGAGTGACCGATTACCGGATAGC
    Sperm     AGGATGACCAGATAGGAGTGACCGATTACGGGATAGC
    Fin       AGGATGACCAGATAGGAGTGACCGATTA---GATAGC
    Sei       AGGATGACCAGATAGGAGTGACCGATTA---GATAGC
    Cow       AGGATGACCAGATAGGAGTGACCGATTACCGGATAGC
    Giraffe   AGGATGACCAGATAGGAGTGACCGATTACCGGATAGC
What is the evolutionary tree for whales?
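To make the parsimony criterion concrete on the whale alignment, here is a minimal Python sketch of Fitch's small-parsimony algorithm, which counts the fewest substitutions a fixed tree requires. Assumptions: `TREE_A` and `TREE_B` are hypothetical topologies chosen only for comparison, and the gap character is treated as a fifth state (real analyses usually code indels separately). As the talk cautions, a lower score is not by itself evolutionary evidence.

```python
# Aligned toy sequences from the slide.
WHALES = {
    "Porpoise": "AGGATGACCAGATAGGAGTGACCGATTACCGGATAGC",
    "Beluga": "AGGATGACCAGATAGGAGTGACCGATTACCGGATAGC",
    "Sperm": "AGGATGACCAGATAGGAGTGACCGATTACGGGATAGC",
    "Fin": "AGGATGACCAGATAGGAGTGACCGATTA---GATAGC",
    "Sei": "AGGATGACCAGATAGGAGTGACCGATTA---GATAGC",
    "Cow": "AGGATGACCAGATAGGAGTGACCGATTACCGGATAGC",
    "Giraffe": "AGGATGACCAGATAGGAGTGACCGATTACCGGATAGC",
}

# Hypothetical rooted binary trees: a leaf is a name, an internal node a pair.
TREE_A = (((("Porpoise", "Beluga"), "Sperm"), ("Fin", "Sei")), ("Cow", "Giraffe"))
TREE_B = (((("Porpoise", "Fin"), "Sperm"), ("Beluga", "Sei")), ("Cow", "Giraffe"))


def fitch_site(tree, column: dict[str, str]) -> tuple[set[str], int]:
    """Fitch pass for one alignment column: (candidate states, changes below node)."""
    if isinstance(tree, str):
        return {column[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], column)
    right_set, right_cost = fitch_site(tree[1], column)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # No shared state: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, seqs: dict[str, str]) -> int:
    """Minimum number of state changes on `tree`, summed over all columns."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(length)
    )


print(parsimony_score(TREE_A, WHALES))  # prints 4
print(parsimony_score(TREE_B, WHALES))  # prints 7
```

The tree that groups Fin and Sei, which share the 3-base deletion, needs fewer changes. Note that in this toy data cow and giraffe are identical to porpoise, so the example illustrates the mechanics rather than real whale phylogeny.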

Application: Phenotyping Using SNPs
SNP: Single Nucleotide Polymorphism, a change in one base between two instances of the same gene
– Used as genetic markers to identify traits, especially for genetic diseases
Comparative genomics goal: identify as many SNPs as possible
Challenges
– Data: need sequenced genomes from many humans, along with information about the donors
– Need tools for mining the data to identify phenotypes
– dbSNP is an uncurated repository of SNPs (many are misreported)
(This was the one talk from industry.)
Based on Kelly Frazer's talk
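A minimal Python sketch of the first mining step, finding candidate SNP positions, assuming the same gene from several individuals has already been aligned to equal length. The function name `find_snps` is hypothetical, and real SNP discovery must also separate true polymorphisms from sequencing errors, which this sketch does not attempt.

```python
from collections import Counter


def find_snps(aligned: list[str]) -> dict[int, Counter]:
    """Return {column: allele counts} for columns where more than one base occurs.

    Gaps ('-') and ambiguous calls such as 'N' are ignored rather than counted
    as alleles.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all aligned to the same length")
    snps = {}
    for col in range(len(aligned[0])):
        alleles = Counter(s[col] for s in aligned if s[col] in "ACGT")
        if len(alleles) > 1:
            snps[col] = alleles
    return snps


# Toy data: the same short region from four individuals.
print(find_snps(["ACGTA", "ACGTA", "ATGTA", "ACGTC"]))
```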

Application: Fishing the Genome
Look for highly conserved regions across multiple genomes and study these first
– Only 1-2% of the genome is coding, so we need a way to narrow the search
– Driving principle: regions are conserved for a reason!
Based on Marcelo Nobrega's talk
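VISTA-style plots summarize conservation as percent identity in a sliding window along an alignment. The Python sketch below computes such a profile for two aligned sequences and merges windows above a cutoff into candidate conserved regions. It is a simplification, not VISTA's own algorithm; the window size, step, and identity cutoff are illustrative assumptions, and the helper names are hypothetical.

```python
def percent_identity_windows(seq1: str, seq2: str, window: int, step: int = 1) -> list[tuple[int, float]]:
    """(start, percent identity) for each window of two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    results = []
    for start in range(0, len(seq1) - window + 1, step):
        pairs = zip(seq1[start:start + window], seq2[start:start + window])
        # A column counts as identical only if both bases match and neither is a gap.
        matches = sum(x == y and x != "-" for x, y in pairs)
        results.append((start, 100.0 * matches / window))
    return results


def conserved_regions(windows: list[tuple[int, float]], window: int, min_identity: float) -> list[tuple[int, int]]:
    """Merge windows at or above min_identity into (start, end) intervals, end exclusive."""
    regions: list[tuple[int, int]] = []
    for start, identity in windows:
        if identity < min_identity:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))  # overlapping or adjacent
        else:
            regions.append((start, end))
    return regions


human = "AAAAAAAAAA" + "CCCCCCCCCC"  # toy aligned sequences
fugu = "AAAAAAAAAA" + "GGGGGGGGGG"
profile = percent_identity_windows(human, fugu, window=5, step=5)
print(conserved_regions(profile, window=5, min_identity=70.0))  # prints [(0, 10)]
```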

(VISTA plot of SALL1: human, mouse, chicken, and fugu)

Chromosome 16 Enhancer Browser
Find conserved regions between genes in human/fugu (pufferfish) alignments and systematically study them
(Figure: SALL1 region)

CS Challenges
"Engineering"
– Scalability! (nothing really scales well right now)
– Stability! (interactive apps crash way too often)
– Timeliness of data
– Biologists don't use Unix! (and the Web is not the answer)
– Better/faster algorithms
– Interoperability among tools and better analysis tools: it's hard for biologists to use their own data with existing tools
"Basic"
– Automated curation, error checking
– Computational models that biologists can trust
– Structure/function algorithms (this really is the grail)
Education! (both ways)