Basics of Comparative Genomics Dr G. P. S. Raghava.


- Aim: to understand the biology of organisms
- Importance: more than 100 genomes sequenced, more than 250 in progress
- Definition: comparison of the set of proteins of one genome with that of another genome, plus comparison of gene location, gene order, and gene regulation
- Applications
  - Visualization of information on the genome
  - Genome annotation (prediction of genes, repeats, regulatory regions)
  - Evolutionary information (gene loss, duplication, horizontal gene transfer, ancestor)
  - Essential genes for cell survival
  - Classification of genes based on function
- Tools and databases

What is comparative genomics?
- Analyzing and comparing genetic material from different species to study evolution, gene function, and inherited disease
- Understanding what is unique to each species

Why Comparative Genomics?
- It tells us what is common and what is unique between different species at the genome level.
- Genome comparison may be the surest and most reliable way to identify genes and predict their functions and interactions.
  - e.g., to distinguish orthologs from paralogs
- The functions of human genes and other DNA regions can be revealed by studying their counterparts in lower organisms.

What is compared?
- Gene location
- Gene structure
  - Exon number
  - Exon lengths
  - Intron lengths
  - Sequence similarity
- Gene characteristics
  - Splice sites
  - Codon usage
  - Conserved synteny

A few facts from genome comparison
- High degree of conservation of microbial proteins (~70% ancestral conserved regions)
- Proteins related to ENERGY processes are generally found in all genomes
- Proteins related to COMMUNICATION represent the most distinctive function in each genome
- INFORMATION-related proteins have complex behaviour
- High frequency (~10%) of non-orthologous gene displacement

Some Terminology
- Homology: the relationship of any two characters (such as two proteins that have similar sequences) that have descended, usually through divergence, from a common ancestral character. Homologues are thus components or characters (such as genes/proteins with similar sequences) that can be attributed to a common ancestor of the two organisms during evolution.

Homologues can be orthologues, paralogues, or xenologues.
- Orthologues are homologues that have evolved from a common ancestral gene by speciation. They usually have similar functions.
- Paralogues are homologues that are produced by duplication within a genome followed by subsequent divergence. They often have different functions.
- Xenologues are homologues that are related by an interspecies (horizontal) transfer of the genetic material for one of the homologues. The functions of xenologues are quite often similar.

Analogues
- Analogues are non-homologous genes/proteins that have evolved convergently from unrelated ancestors. They have similar functions although they are unrelated in either sequence or structure.

Frequently used terms
- Homology
  - Orthologous: derived from a common ancestral gene by speciation; usually have similar functions
  - Paralogous: derived by duplication of a gene within a genome; usually have different functions
  - Xenologous: related by an interspecies (horizontal) transfer of genetic material; have similar functions
- Analogous: not evolved from the same ancestor
- Similarity: sequence similarity
- Percent identity
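Percent identity is the one term above that reduces to a simple count. The slide does not define it, so the following is a minimal sketch under one common convention: identical residues divided by aligned columns in which neither sequence has a gap. The function name `percent_identity` and the `-` gap character are assumptions made for illustration; other tools use different denominators (for example, full alignment length).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    Both inputs must be rows of the same pairwise alignment, so they
    must have equal length. The denominator choice is an assumption.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns where those residues are identical
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1

    return 100.0 * matches / compared if compared else 0.0


# Five gap-free columns, four of them identical.
identity = percent_identity("ACGT-A", "ACCTTA")
```

Similarity, by contrast, would also credit conservative substitutions and therefore needs a substitution matrix, which this sketch deliberately leaves out.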

Visualising Genome Information

Genome Annotation
The process of adding biological information and predictions to a sequenced genome framework

All-against-all Self-comparison
- How?
  - Make a database of the proteome
  - Use each protein as a query in a similarity search against the database (BLAST, WU-BLAST or FASTA)
  - Generate a matrix of alignment scores (P or E values); a conservative E-value cutoff is 10e-6
- Why?
  - Number of gene families: this comparison distinguishes unique proteins from proteins that have arisen by gene duplication, and also reveals the number of gene families.
  - Paralogs: significantly matched pairs of protein sequences may be paralogs.
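How the score matrix turns into a count of gene families is not spelled out on the slide. One simple possibility, sketched below, is single-linkage grouping: any two proteins joined by a hit at or below the E-value cutoff end up in the same family, and proteins with no such hit remain singletons (the "unique" proteins). The function name `cluster_families`, the `(query, subject, evalue)` tuple layout, and the default cutoff of `1e-6` (reading the slide's "10e-6" as 10^-6) are all assumptions for illustration, not the method used by any particular tool.

```python
def cluster_families(proteins, hits, max_evalue=1e-6):
    """Group proteins into families by single linkage over significant hits.

    proteins   : list of protein identifiers (the whole proteome)
    hits       : iterable of (query, subject, evalue) tuples from an
                 all-against-all search; every identifier must be in `proteins`
    max_evalue : hits with a larger E value are ignored (assumed cutoff)
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Follow parent links to the family representative,
        # shortening the path along the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for query, subject, evalue in hits:
        if query == subject or evalue > max_evalue:
            continue  # skip self-hits and non-significant matches
        parent[find(query)] = find(subject)

    families = {}
    for p in proteins:
        families.setdefault(find(p), []).append(p)
    return sorted(sorted(members) for members in families.values())


example_hits = [
    ("P1", "P2", 1e-30),
    ("P2", "P3", 1e-10),
    ("P3", "P4", 0.5),   # not significant, so P4 stays unique
    ("P1", "P1", 0.0),   # self-hit, ignored
]
families = cluster_families(["P1", "P2", "P3", "P4"], example_hits)
```

Single linkage is permissive: one shared domain can chain unrelated proteins into a single family, which is one reason a conservative cutoff matters.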

Between-Proteome Comparisons: Why?
- To identify orthologs, gene families, and domains
- Orthologs (proteins that share a common ancestry and function)
  - A pair of proteins in two organisms that align along most of their lengths with a highly significant alignment score.
  - These proteins perform the core biological functions shared by the two organisms.
  - Two matched sequences (X in genome A, Y in genome B) may not be orthologs: if Y and Z are paralogs in B, the true ortholog of X may be Z rather than Y.
  - Criteria for identifying true orthologs:
    (a) highest-scoring match (best hit)
    (b) E value < 0.01
    (c) > 60% alignment over both proteins
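The three criteria (a) to (c) can be applied mechanically to a table of search hits. The sketch below assumes each hit has already been parsed into a small record; the `Hit` class, its field names, and the function `candidate_orthologs` are hypothetical, and coverage is assumed to be expressed as a fraction of each protein's length. It also assumes that the best hit is chosen first and must then pass the thresholds, which is one reading of the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One query-versus-subject match (hypothetical record layout)."""
    query: str
    subject: str
    score: float             # alignment score; higher is better
    evalue: float
    query_coverage: float    # fraction of the query inside the alignment
    subject_coverage: float  # fraction of the subject inside the alignment


def candidate_orthologs(hits, max_evalue=0.01, min_coverage=0.60):
    """Return {query: subject} for best hits that pass criteria (b) and (c)."""
    # (a) keep only the highest-scoring hit for each query
    best = {}
    for hit in hits:
        current = best.get(hit.query)
        if current is None or hit.score > current.score:
            best[hit.query] = hit

    # (b) E value below the cutoff, (c) coverage above the cutoff on both proteins
    return {
        query: hit.subject
        for query, hit in best.items()
        if hit.evalue < max_evalue
        and hit.query_coverage > min_coverage
        and hit.subject_coverage > min_coverage
    }


example = [
    Hit("X", "Y", 250.0, 1e-60, 0.90, 0.85),
    Hit("X", "Z", 300.0, 1e-80, 0.95, 0.90),  # best hit for X
    Hit("W", "V", 40.0, 0.5, 0.90, 0.90),     # fails the E-value cutoff
    Hit("U", "T", 200.0, 1e-40, 0.40, 0.90),  # fails query coverage
]
orthologs = candidate_orthologs(example)
```

In the example, X is paired with Z rather than Y because Z scores higher, mirroring the X/Y/Z case described above. A best hit in one direction is still only a candidate; checking that the hit is reciprocal is a common additional safeguard.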

Between-Proteome Comparisons: How?
1. Choose a yeast protein and perform a database similarity search of the worm proteome (WU-BLAST): a yeast-versus-worm search
2. Group the worm seqs that match the yeast query seq with a high P value ( to ); also include the yeast query seq in the group
3. From the group made in 2, choose a worm seq and search the yeast proteome, using the same P limit
4. Add any matching yeast seq to the group made in 2
5. Repeat 3 and 4 for all initially matched seqs in the group
6. Repeat 1-5 for every yeast protein
7. As in 1-6, perform a comparable worm-versus-yeast search
8. Coalesce the groups of related seqs and remove any redundancies so that every sequence is represented only once
9. Eliminate any matched pairs in which less than 80% of each seq is in the alignment
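Steps 1 to 8 amount to following matches back and forth between the two proteomes until no new sequences join a group, then merging overlapping groups. A rough sketch of that idea treats every significant match as an undirected link and collects connected groups with a breadth-first search. The function name `group_related` and the input layout (each query mapped to the set of subjects that already passed the P limit) are assumptions for illustration; the 80% filter of step 9 is assumed to have been applied when those inputs were built. Treating matches as symmetric is a simplification of the directional searches described in the steps.

```python
from collections import deque


def group_related(yeast_vs_worm, worm_vs_yeast):
    """Coalesce yeast and worm sequences into groups of related sequences.

    yeast_vs_worm : dict mapping a yeast ID to the set of matching worm IDs
    worm_vs_yeast : dict mapping a worm ID to the set of matching yeast IDs
    Sequences are tagged ("yeast", id) or ("worm", id) so that identical
    identifiers in the two proteomes cannot collide.
    """
    neighbours = {}

    def link(a, b):
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    for yeast_id, worm_ids in yeast_vs_worm.items():
        neighbours.setdefault(("yeast", yeast_id), set())
        for worm_id in worm_ids:
            link(("yeast", yeast_id), ("worm", worm_id))
    for worm_id, yeast_ids in worm_vs_yeast.items():
        neighbours.setdefault(("worm", worm_id), set())
        for yeast_id in yeast_ids:
            link(("worm", worm_id), ("yeast", yeast_id))

    seen = set()
    groups = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:  # expand the group until no new matches appear
            node = queue.popleft()
            group.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        if len(group) > 1:  # sequences with no match form no group
            groups.append(sorted(group))
    return groups


groups = group_related(
    {"Y1": {"W1", "W2"}, "Y2": set()},
    {"W2": {"Y3"}, "W9": {"Y9"}},
)
```

Because each sequence is visited once, every sequence ends up in at most one group, which corresponds to the redundancy removal in step 8.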

Figure 1. Regions of the human and mouse homologous genes: coding exons (white), noncoding exons (gray), introns (dark gray), and intergenic regions (black). Corresponding strong (white) and weak (gray) alignment regions of GLASS are shown connected with arrows. Dark lines connecting the alignment regions denote very weak or no alignment. The predicted coding regions of ROSETTA in human, and the corresponding regions in mouse, are shown (white) between the genes and the alignment regions.

Target Validation
- Target validation involves taking steps to prove that a DNA, RNA, or protein molecule is directly involved in a disease process and is therefore a suitable target for development of a new therapeutic compound.
- Genes that do not belong to an established family are critical to many disease processes and also need to be validated as potential drug targets.