Comparative Genomics of the Eukaryotes

Presentation transcript:

Comparative Genomics of the Eukaryotes
Ishay Ben-Zion
A paper by: Rubin, Yandell, Wortman, …

Motivation
Evolution: Charles Darwin (1838)
Similarity between different species
Model organisms
A human shares 50% of his genes with a banana. How?
Humans and bananas are both multicellular
Other similarities
Humans share 23% of their genes with yeast
Could the banana be a good model organism?

Model Organisms
Heavily studied; used as examples for other species
Once an organism has been studied enough, it is a good candidate
Important requirements: size; generation time (for genetic research); ease of manipulation (genetic and otherwise); little “junk DNA” (easier to sequence); money

This paper describes:
A comparison between the genomes of 3 eukaryotes (eukaryote: the cell has internal membrane-bound structures, such as a nucleus)
1) A fruit fly: Drosophila melanogaster
2) A worm: C. elegans
3) Yeast: S. cerevisiae
Other model organisms: E. coli, mouse, zebrafish, Arabidopsis

Taxonomic classification (cellular life)
Domain: Bacteria, Archaea, Eukaryota
Kingdom (within Eukaryota): Animalia, Plantae, Protista, Fungi
Species: H. influenzae (Bacteria); fly and worm (Animalia); yeast (Fungi)

Drosophila melanogaster
Popular model organism (for developmental biology)
A trial run for the human genome (sequenced in 2000)
Mutations are easy to induce

Caenorhabditis elegans
Transparent, 1 mm long
Simple: 959 cells (about 300 neurons)
Eat, sleep and have sex (or self-fertilize)
Hermaphrodites: 99.95%, males: 0.05%

Caenorhabditis elegans
Good as a model organism for:
Genetics: the first multicellular organism with a sequenced genome
Developmental biology: cell fate mapping
Neurobiology: neuronal connectivity map

Saccharomyces cerevisiae
Also called baker's yeast
Single-celled
Diameter: 5-10 μm
Popular model organism
Simplest eukaryote
The first eukaryote with a sequenced genome

The 1st comparison
Instead of counting genes, count gene families
What are gene families? Sets of paralogs
Paralogs = highly similar proteins in the same genome
Paralogs often have similar functions, but not always
Remark: here each gene is identified with the protein it encodes (proteins = genes)
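One way to picture "sets of paralogs" is as groups of genes linked by pairwise similarity. The sketch below is only an illustration under assumptions that are not in the slides: the gene names and paralog pairs are hypothetical, and families are formed by single-linkage grouping (any chain of similar genes ends up in one family), which may differ from the clustering used in the paper.

```python
def gene_families(genes, paralog_pairs):
    """Group genes into families: genes connected by paralog links share a family."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the family representative (with path halving).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in paralog_pairs:
        parent[find(a)] = find(b)  # merge the two families

    families = {}
    for g in genes:
        families.setdefault(find(g), []).append(g)
    return list(families.values())


# Hypothetical toy genome: g1, g2, g3 are paralogs; g4 and g5 are singletons.
genes = ["g1", "g2", "g3", "g4", "g5"]
pairs = [("g1", "g2"), ("g2", "g3")]
families = gene_families(genes, pairs)
n_duplicates = len(genes) - len(families)
print(len(families), "families,", n_duplicates, "duplicates")
```

A family of size one is a gene with no paralogs, which is why the number of families can never exceed the number of genes.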

Findings
                     H. influenzae   Yeast    Fly      Worm
Total # of genes     1,700           6,200    13,600   18,400
# of gene families   1,400           4,400    8,100    9,400
# of duplicates      300             1,800    5,500    9,000
Size of a family: one or more genes
The number of families is not a good measure of complexity
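The duplicate counts in the findings are simply the total number of genes minus the number of gene families. A short sketch that reproduces this arithmetic from the slide's rounded figures (the variable names are illustrative, and the genes-per-family ratio is an extra derived quantity, not something reported on the slide):

```python
# (total genes, gene families) per organism, as given on the slide
counts = {
    "H. influenzae": (1700, 1400),
    "Yeast": (6200, 4400),
    "Fly": (13600, 8100),
    "Worm": (18400, 9400),
}

summary = {}
for organism, (n_genes, n_families) in counts.items():
    summary[organism] = {
        "duplicates": n_genes - n_families,           # genes beyond one per family
        "genes_per_family": n_genes / n_families,     # average family size
    }

for organism, row in summary.items():
    print(f"{organism:14s} duplicates={row['duplicates']:5d} "
          f"genes/family={row['genes_per_family']:.2f}")
```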

The 2nd comparison
Pool the genes of the large families of the 3 species
For each protein, search for orthologs
Orthologs = similar proteins in other species
Families found in flies and worms (but not yeast) include those responsible for multicellular development
Families found only in flies include those responsible for the immune response and fly-specific functions

Methods: the BLAST algorithm
Basic Local Alignment Search Tool
Compares biological sequences (e.g., proteins, DNA sequences) to find homology
A query is compared against a library of sequences (the library holds sequences of different lengths)
In the paper: paralogy and orthology are kinds of homology
[Figure: a query DNA sequence compared against library sequences]

BLAST: Step 1
Split the query into k-letter words
Example: for proteins, the letters are amino acids (L = leucine)
Query sequence: RPPQGLF (k = 3)
3-letter words: RPP, PPQ, PQG, QGL, GLF
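Step 1 amounts to sliding a window of width k along the query. A minimal sketch (the function name `k_letter_words` is illustrative, not part of any BLAST implementation):

```python
def k_letter_words(query, k=3):
    """Return every overlapping k-letter word of the query, in order."""
    return [query[i:i + k] for i in range(len(query) - k + 1)]


words = k_letter_words("RPPQGLF", k=3)
print(words)  # ['RPP', 'PPQ', 'PQG', 'QGL', 'GLF']
```

A query of length n yields n - k + 1 words, so the 7-letter example gives 5 words.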

BLAST: Step 2
Use a scoring matrix to compare two k-letter words
Take one k-letter word: PQG
Search the library for similar words, e.g. in the sequences LGMCPQA and DPPEGVV
Define similarity: a high score between two words suggests a common ancestor
PQG vs. PQA: 12
PQG vs. PEG: 15
Save similar words scoring above a threshold T (save their positions)
Repeat for all k-letter words in the query
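The two word scores on this slide can be reproduced by summing per-residue scores from a substitution matrix. The sketch below assumes the matrix is BLOSUM62, since those entries give exactly 12 and 15; only the handful of entries needed for this example are included, and the threshold value T = 11 is a hypothetical choice, not one stated in the slides.

```python
# Subset of BLOSUM62 entries (assumed matrix), keyed by alphabetically sorted pair.
BLOSUM62_SUBSET = {
    ("P", "P"): 7,
    ("Q", "Q"): 5,
    ("A", "G"): 0,
    ("E", "Q"): 2,
    ("G", "G"): 6,
}


def word_score(word_a, word_b):
    """Sum the substitution scores of two equal-length words, position by position."""
    assert len(word_a) == len(word_b)
    return sum(BLOSUM62_SUBSET[tuple(sorted(pair))] for pair in zip(word_a, word_b))


T = 11  # hypothetical threshold for keeping a library word
candidates = ["PQA", "PEG"]
hits = {w: word_score("PQG", w) for w in candidates if word_score("PQG", w) > T}
print(hits)  # {'PQA': 12, 'PEG': 15}
```

Raising T keeps fewer seed words, which speeds up the search at the cost of possibly missing weaker similarities.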

BLAST: Step 3
Align at the saved positions:
Query:    R  P  P  Q  G  L  F
Library:  D  P  P  E  G  V  V
Scores:  -2  7  7  2  6  1 -1
Extend the match right and left as long as the score is positive
Total: 15 + 7 + 1 = 23
The extended pairs are called High-scoring Segment Pairs (HSPs)
Save significant HSPs (above a threshold S)
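The extension rule on this slide can be sketched as follows: start from the seed word and keep adding neighbouring columns while each added column scores positively. This is the slide's simplified rule; actual BLAST implementations use a drop-off criterion rather than stopping at the first non-positive column. The matrix entries are again a BLOSUM62 subset (an assumption consistent with the scores shown), and `extend_seed` is an illustrative name.

```python
# Subset of BLOSUM62 entries (assumed matrix), keyed by alphabetically sorted pair.
BLOSUM62_SUBSET = {
    ("D", "R"): -2, ("P", "P"): 7, ("E", "Q"): 2,
    ("G", "G"): 6, ("L", "V"): 1, ("F", "V"): -1,
}


def pair_score(a, b):
    return BLOSUM62_SUBSET[tuple(sorted((a, b)))]


def extend_seed(query, subject, start, k):
    """Extend a seed at query[start:start+k] (same offset in subject) in both
    directions while each new column has a positive score.
    Returns (left, right, score) for the half-open segment [left, right)."""
    left, right = start, start + k
    score = sum(pair_score(query[i], subject[i]) for i in range(left, right))
    while left > 0 and pair_score(query[left - 1], subject[left - 1]) > 0:
        left -= 1
        score += pair_score(query[left], subject[left])
    limit = min(len(query), len(subject))
    while right < limit and pair_score(query[right], subject[right]) > 0:
        score += pair_score(query[right], subject[right])
        right += 1
    return left, right, score


query, subject = "RPPQGLF", "DPPEGVV"
left, right, score = extend_seed(query, subject, start=2, k=3)  # seed PQG / PEG
print(query[left:right], subject[left:right], score)  # PPQGL PPEGV 23
```

The seed contributes 15, the P/P column to the left adds 7, and the L/V column to the right adds 1; the R/D and F/V columns are negative, so the extension stops there.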

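BLAST: Step 4
Align the saved HSPs, allowing gaps
Example: 2 sequences with 2 HSPs; insert a gap to join them
Query: R P Q G L F T S A M K H Y
Library sequence: D P E G V M K S F Y N C
Library sequence with a gap inserted: D P E G V - M K S F Y N C
Compute the total score (this involves gap penalties)
Report all matches that pass a significance threshold E (in BLAST, an E-value below the cutoff)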
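To show how gap penalties enter the total score, here is a small Smith-Waterman-style local alignment scorer. It is a sketch under stated assumptions, not BLAST's gapped stage: it fills the full dynamic-programming table instead of working outward from HSPs, it uses a simple match/mismatch scheme instead of a substitution matrix, and the parameter values and example sequences are hypothetical.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Either align the two letters, open a gap in one sequence,
            # or restart the local alignment at zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two matching blocks (AAA and CCC) separated by one extra letter in the second
# sequence: joining them with a gap scores 6 matches * 2 - 2 = 10.
print(local_alignment_score("AAACCC", "AAAGCCC"))  # 10
```

Without the gap, the best ungapped alternative here scores 9, so paying the gap penalty to join the two blocks gives the higher total.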

BLAST: the whole process
Split the query into k-letter words
Search the library for similar k-letter words and save them
Extend them to HSPs and save the significant ones
Align the sequences (with gaps) and compute the total score
Return the sequences that pass the threshold E
These are considered homologous to the query

The 3rd comparison
Compare all genes of the three species with a length limitation (80% of length)
20% of the fly proteins appear in both worm and yeast
They perform functions common to all eukaryotic cells

The 4th comparison
Compare all genes of the three species to mammalian sequences (without the length limitation)
50% of the fly proteins appear in mammals
36% of the worm proteins appear in mammals
The fly is closer to mammals
Most of the mammalian sequences used here were short
The similarities reflect conserved domains

What are conserved domains?
Domains are independent parts from which proteins are built
They appear in different combinations in different proteins
Example: proteins ABC and ADEG share domain A
Similarity to short sequences indicates conserved domains
Conserved domains indicate closeness in evolution

To conclude
Significant similarity between the genomes of “distant” species (man and yeast: 23%)
Similarity increases for taxonomically close species
The number of genes or gene families is a bad measure of complexity. Why?
There is more information that is not encoded in the gene count (protein interactions, e.g. physical proximity of genes)
How should complexity be defined?