

BINF6201/8201: Molecular Sequence Analysis Dr. Zhengchang Su Office: 351 Bioinformatics Building Office hours: Tuesday and Thursday: 2:00~3:00pm

Textbook and reading materials
• Textbook: Bioinformatics and Molecular Evolution by Paul G. Higgs and Teresa K. Attwood, Blackwell Publishing.
• Additional readings from the current literature may be assigned as appropriate.
• All lecture slides will be available online at

Student Evaluation
• Weekly or biweekly homework assignments (30%); Ph.D. students may have additional assignments.
• Two midterm exams (60%): 10/5 (Tuesday) and 12/14 (Tuesday).
• Classroom participation (10%).

Sequence data explosion
• Three nearly equivalent biological sequence databases form the International Nucleotide Sequence Database Collaboration (INSDC):
1. GenBank at NCBI
2. European Molecular Biology Laboratory (EMBL) Nucleotide Sequence Database at the European Bioinformatics Institute (EBI)
3. DNA Data Bank of Japan (DDBJ)
• Features
1. Authors are asked to deposit all published biological sequences in one of these three databases.
2. Data are exchanged among the three databases on a daily basis.

Data explosion
• Both the number/length of sequences and the number of transistors in a CPU increase exponentially with time.
• However, the number/length of sequences increases even faster than the number of transistors in a CPU.
[Figure: ln N(t) plotted against time t]
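Exponential growth, N(t) = N0·exp(r·t), appears as a straight line when ln N(t) is plotted against t, and the slope r gives the doubling time ln 2 / r. The sketch below illustrates that relationship with a plain least-squares fit. The function names (fit_exponential, doubling_time) and the data are hypothetical and made up for illustration; they are not real database statistics.

```python
import math

def fit_exponential(times, counts):
    """Least-squares fit of ln N(t) = ln N0 + r*t. Returns (N0, r)."""
    logs = [math.log(c) for c in counts]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    r = sxy / sxx                         # slope of the log-linear line
    n0 = math.exp(mean_y - r * mean_t)    # intercept, transformed back
    return n0, r

def doubling_time(r):
    """Time needed for N(t) to double at growth rate r."""
    return math.log(2) / r

# Hypothetical data (assumption): a quantity that doubles every 1.5 years.
years = [0, 1, 2, 3, 4]
sizes = [1000.0 * 2 ** (y / 1.5) for y in years]

n0, r = fit_exponential(years, sizes)
print(f"N0 = {n0:.1f}, r = {r:.4f} per year, doubling time = {doubling_time(r):.2f} years")
```

Comparing two growth curves (for example sequence counts versus transistor counts) then reduces to comparing the two fitted slopes: the curve with the larger r has the shorter doubling time.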

Sequence data explosions are the result of the continuous development of new sequencing technologies:
• Chain termination (Sanger) method (1977)
• Automation of sequence determination (late 1980s)
• Shotgun sequencing strategy (1995)
• Next-generation (NexGen) sequencing technologies (2004)
1. Pyrosequencing: 454 Life Sciences/Roche Diagnostics
2. Solexa sequencing: Illumina
3. SOLiD sequencing: Applied Biosystems
4. Helicos BioSciences
5. Pacific Biosciences
6. Polonator: open source

Data explosion
• Since 1995, the number of sequenced genomes has also increased exponentially. As of

Data explosion
• Since 2006, the number of metagenome sequences has increased exponentially thanks to the advent of next-generation sequencing technologies.
• As of September 2009, about 200 metagenomes had been sequenced or were being sequenced.

Data explosion
• The speed of computers also increases exponentially with time.
• However, how to use these ever more powerful computers to solve biological problems remains a very challenging task for the computer science and biology research communities.

Data explosion
• More and more biological research relies on computational analyses.

What is genomics?
• The availability of whole-genome sequences of organisms has led to the birth of genomics, which studies organisms based on the genetic information encoded in their genomes.
• According to the subject of study, genomics can be divided into:
1. Functional genomics, which is coupled with the development of relevant high-throughput technologies, such as:
   a. Microarray/RNA-Seq: transcriptomics
   b. Mass spectrometry: proteomics
   c. Nuclear magnetic resonance (NMR) and mass spectrometry: metabolomics
2. Comparative/evolutionary genomics

What is Bioinformatics?
• For a short answer: “Bioinformatics is the use of computational methods to study biological data and problems”.
• For a more detailed answer, bioinformatics is:
1. “The development and use of computational methods for studying the structure, function, and evolution of genes, proteins and whole genomes;”
2. “The development and use of methods for the management and analysis of biological information arising from genomics and high-throughput experiments.”

Population genetics, molecular evolution and sequence analysis
• According to evolutionary theory, biological sequences are related to one another through heredity and variation.
• Sequence analysis methods are thus based on the principles of sequence evolution.
• Therefore, to analyze sequences, we must understand:
1. how genes (loci) change dynamically within a population of the same species (population genetics); and
2. how gene sequences change among different species during the course of evolution (molecular evolution).

Sequence Similarity
• The similarity of two sequences can be identified by aligning them with an alignment method/algorithm, such as BLAST or Smith-Waterman.
• Two parameters describe the similarity of two aligned sequences:
1. Identity: the fraction of aligned positions with identical residues
2. Similarity: the fraction of aligned positions with identical or similar (positively scoring) residues
Identities = 38/139 (27%), Similarity = 66/139 (47%), Gaps = 9/139 (6.5%)
Query  LELTYIVNFGSELAVVSMLPTFFETTFDLPKATAGILASCFAFVNLVARPAGGLISDSVG
Match  +   Y + FG  +A  + LPT+  T +      AG   + FA   ++ARP GG +SD +
Sbjct  MSFLYAIVFGGFVAFSNYLPTYITTIYGFSTVDAGARTAGFALAAVLARPVGGWLSDRIA
Query  SRKNTMGFLTAGLGVGYLVMSMIKPGTFTGTTGIAVAVVITMLASFFVQSGEGATFALVP
Match   R   +  L     + +       P  ++  T I +AV + +        G G  FA V
Sbjct  PRHVVLASLAGTALLAFAAALQPPPEVWSAATFITLAVCLGV--------GTGGVFAWVA
Query  -LVKRRVTGQVAGLVGAYGNVG
Match          G V G+V A G +G
Sbjct  RRAPAASVGSVTGIVAAAGGLG
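As a minimal sketch of how the identity and gap counts in an alignment summary can be obtained, the snippet below counts them column by column for two already-aligned strings. The function name alignment_stats is hypothetical. The similarity ("positives") count is deliberately left out, because it depends on a substitution matrix such as BLOSUM62 that is not reproduced here. The example input is the last block of the alignment shown on the slide.

```python
def alignment_stats(query, subject, gap="-"):
    """Count identities and gap columns in a pairwise alignment.

    Both strings must already be aligned, i.e. have equal length,
    with `gap` marking inserted gap positions.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    length = len(query)
    # A column is an identity when both residues match and neither is a gap.
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != gap)
    # A column is a gap column when either sequence has a gap there.
    gaps = sum(1 for q, s in zip(query, subject) if q == gap or s == gap)
    return identities, gaps, length

# Last block of the alignment shown on the slide.
query = "-LVKRRVTGQVAGLVGAYGNVG"
subject = "RRAPAASVGSVTGIVAAAGGLG"

identities, gaps, length = alignment_stats(query, subject)
print(f"Identities = {identities}/{length} ({100 * identities / length:.0f}%), "
      f"Gaps = {gaps}/{length} ({100 * gaps / length:.1f}%)")
```

Summing these counts over all blocks of an alignment gives the totals reported in the header line of a BLAST-style report.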

Homologous Sequence
• Homology: If the similarity of two sequences is high enough, it is highly likely that they evolved from a common ancestor, and we say that they are homologous to each other. For example, if two sequences of 100 amino acids share 80% identical residues, the probability that they share this level of similarity by chance is about (1/20)^80, assuming all 20 amino acids are equally likely at each position.
• Homology of two sequences can only be inferred computationally and is difficult to test experimentally.
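To give a feel for how small the chance estimate above is, the following sketch evaluates it numerically. It assumes, as a simplification, that all 20 amino acids are equally likely and that positions are independent. The second quantity, which also counts the ways of choosing which 80 of the 100 positions match, is an added refinement for comparison and not part of the original slide.

```python
import math

ALPHABET_SIZE = 20   # number of amino acids, assumed equally likely
LENGTH = 100         # aligned length
MATCHES = 80         # identical positions (80% identity)

p_match = 1 / ALPHABET_SIZE

# Estimate from the slide: 80 specific positions all match by chance.
p_fixed_positions = p_match ** MATCHES
log10_p_fixed = MATCHES * math.log10(p_match)

# Refinement (assumption, not from the slide): exactly 80 matches at
# any 80 of the 100 positions, i.e. a binomial probability.
p_any_positions = (
    math.comb(LENGTH, MATCHES)
    * p_match ** MATCHES
    * (1 - p_match) ** (LENGTH - MATCHES)
)

print(f"(1/20)^80            = {p_fixed_positions:.2e} (log10 = {log10_p_fixed:.2f})")
print(f"binomial, 80 of 100  = {p_any_positions:.2e}")
```

Either way the probability is vanishingly small, which is why such a level of identity is taken as strong evidence of common ancestry.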

Orthologs and Paralogs
There are two distinct types of homologous relationships, which differ in their evolutionary history and functional implications.
• Orthologs: evolutionary counterparts derived from a single ancestral gene in the last common ancestor of the two given species. Orthologous genes are therefore related through vertical evolution (speciation) and typically have the same function.
• Paralogs: homologous genes that evolved through duplication within the same or an ancestral genome. Paralogous genes are therefore related through duplication events and do not necessarily have the same function.
[Figure labels: duplication, speciation]
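The definitions above can be made operational on a gene tree whose internal nodes are labeled by event type: two genes are orthologs if their last common ancestor node is a speciation, and paralogs if it is a duplication. The sketch below is one minimal way to express this. The Node class, the function names, and the gene names A1, A2, B1, B2 are all hypothetical, and the event labels are assumed to be given (inferring them from data is a separate problem).

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    event: Optional[str] = None   # "speciation" or "duplication" (internal nodes)
    gene: Optional[str] = None    # gene name (leaves only)
    children: List["Node"] = field(default_factory=list)

def genes_below(node):
    """Return the set of gene names found at or below this node."""
    if node.gene is not None:
        return {node.gene}
    found = set()
    for child in node.children:
        found |= genes_below(child)
    return found

def last_common_ancestor(node, a, b):
    """Descend while a single child still contains both genes."""
    for child in node.children:
        below = genes_below(child)
        if a in below and b in below:
            return last_common_ancestor(child, a, b)
    return node

def relationship(root, a, b):
    """Classify two distinct genes by the event at their last common ancestor."""
    if not {a, b} <= genes_below(root):
        raise ValueError("both genes must be in the tree")
    lca = last_common_ancestor(root, a, b)
    return "orthologs" if lca.event == "speciation" else "paralogs"

# Hypothetical tree: an ancestral gene duplicates into copies A and B,
# then a speciation splits species 1 and species 2.
tree = Node(event="duplication", children=[
    Node(event="speciation", children=[Node(gene="A1"), Node(gene="A2")]),
    Node(event="speciation", children=[Node(gene="B1"), Node(gene="B2")]),
])

for pair in [("A1", "A2"), ("A1", "B1"), ("A1", "B2")]:
    print(pair, relationship(tree, *pair))
```

Note that A1 and B2 come from different species yet are still paralogs, because the event separating them is the duplication that predates the speciation.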

Divergent evolution
• When the similarity between two sequences is very low, say 8% identity, they could still be homologous due to divergent evolution.
• Divergently evolved genes usually have similar biochemical functions.
[Figure labels: speciation or duplication; homologues]

Convergent evolution
• When the similarity between two sequences is very low, say 8%, they could be of different origins, with the observed sequence similarity resulting from convergent evolution under functional selection. Such sequences are called analogues.
• Analogues may have similar biochemical functions, and they usually share only a few amino acids, called motifs, in the active sites of enzymes.
[Figure label: analogues]

Horizontal gene transfer (HGT)
• During evolution, an organism inherits its genes from its ancestor (vertical gene transfer). However, it can also obtain genes from other species, genera, or even higher taxa. This phenomenon is called horizontal gene transfer or lateral gene transfer.
• HGT is pervasive, particularly in prokaryotes, and is believed to be a major driving force of evolution.
[Figure labels: Archaea, Bacteria, Eukaryota; vertical gene transfer; horizontal gene transfer; LCA (last common ancestor)]