1 Orthology and paralogy A practical approach Searching the primaries Searching the secondaries Significance of database matches DB Web addresses Software.

2 1 Orthology and paralogy A practical approach Searching the primaries Searching the secondaries Significance of database matches DB Web addresses Software Web addresses

3 2 Why Search Databases? To find out whether a new DNA sequence is already deposited in the databanks. To find proteins homologous to a putative coding ORF.

4 3 Why Search Databases? To find similar non-coding DNA stretches in the database, (for example: repeat elements, regulatory sequences). To locate false priming sites for a set of PCR oligonucleotides.
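
The last use case, locating false priming sites, can be illustrated with a small sketch. This is a naive scan written for illustration, not the method of any particular search tool: the function names, the `max_mismatches` parameter, and the example sequences are all assumptions, and a real search against a databank would use a dedicated program rather than a Python loop.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_priming_sites(template: str, primer: str, max_mismatches: int = 1):
    """List (position, strand, mismatches) for every site the primer could anneal to.

    Positions are 0-based on the forward strand of `template`.
    """
    template = template.upper()
    primer = primer.upper()
    hits = []
    # A site on the "+" strand reads like the primer itself; a site on the
    # "-" strand reads like the primer's reverse complement.
    for strand, probe in (("+", primer), ("-", reverse_complement(primer))):
        for start in range(len(template) - len(probe) + 1):
            window = template[start:start + len(probe)]
            mismatches = sum(1 for a, b in zip(window, probe) if a != b)
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    return sorted(hits)


# Hypothetical example: the primer matches once on each strand.
sites = find_priming_sites("CCGATTACAGGTGTAATCTT", "GATTACA", max_mismatches=0)
```

Any site other than the intended one is a potential false priming site; raising `max_mismatches` makes the scan more sensitive to imperfect matches.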

5 4 What Databases Are Available? DNA (nucleotide sequences): The big databases: GenBank, EMBL, DDBJ and their weekly updates. These databases exchange information routinely. Genomic databases such as: Human (GDB), Mouse (MGD), Yeast (SGD), etc. Special databases: ESTs (expressed sequence tags), STSs (sequence-tagged sites), EPD (eukaryotic promoter database), REPBASE (repetitive sequence database), and many others.

6 5 What Databases Are Available? Protein (amino acid sequences): The big databases are: Swiss-Prot (high level of annotation), PIR (Protein Information Resource). Translated databases like: SPTREMBL (translated EMBL), GenPept (translation of coding regions in GenBank). Special databases like: PDB (sequences derived from the 3D structures in the Brookhaven PDB).

7 6 Web Addresses http://www.ncbi.nlm.nih.gov/Entrez/ – http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=search&DB=nucleotide – http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html – http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein

8 7 Let us go http://www.ncbi.nlm.nih.gov/Entrez/

9 8 What is GenBank? http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences …

10 9 Access to GenBank http://www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html GenBank is available for searching at NCBI via several methods. The GenBank database is designed to provide and encourage access within the scientific community to the most up-to-date and comprehensive DNA sequence information. Therefore, NCBI places no restrictions on the use or distribution of the GenBank data.

11 10 NCBI databases http://www.ncbi.nlm.nih.gov/Database/index.html http://www.ncbi.nlm.nih.gov/Database/tut1.html Let us try a tutorial

12 11 Web Addresses http://www.ebi.ac.uk/Databases/ – http://www.ebi.ac.uk/embl/index.html – http://www.ebi.ac.uk/swissprot/index.html – http://www.ebi.ac.uk/microarray/ArrayExpress/arrayexpress.html

13 12 Homology and Analogy It is important to understand a concept that underpins sequence analysis - homology. The term homology is confounded and abused in the literature. Simply, sequences are said to be homologous if they are related by divergence from a common ancestor.

14 13 What Is Homology? (from the Technion course) Similarity or likeness between properties in species. Before Darwin, homology was defined morphologically. Example:

15 14 Homology Bats and butterflies fly, but are different. Bats fly and whales swim, yet the bones in a bat's wing and a whale's flipper are strikingly alike. Bat wings and butterfly wings are not homologous. Bat wings and whale flippers are homologous.

16 15 Homology Interpretation from Darwin to 21st Century Darwin (1859) explained homology as the result of descent with modification from a common ancestor. Modern genetics: Homology information is in the genes. Two sequences are homologous if they are both similar and have a common ancestor.

17 16 When Does Similarity Imply Homology? Similarity by itself is not enough: for example, similarity between short sequences could arise by chance, even when the sequences descend from different ancestors. Sufficiently extensive similarity typically implies homology (and usually we have no direct evidence of descent). Sequence similarity comes with a significance measure.
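
A rough calculation shows why short matches carry little weight. The sketch below estimates how many exact matches to a query one would expect purely by chance, assuming the database is a single random sequence with independent, equally frequent bases. That model and the function name are assumptions made for illustration; the significance measures reported by real search programs are more sophisticated.

```python
def expected_chance_matches(query_length: int, database_length: int,
                            alphabet_size: int = 4) -> float:
    """Expected number of exact chance matches of a query in a random database.

    Assumes independent, uniformly distributed residues and counts matches
    on one strand only.
    """
    if query_length > database_length:
        return 0.0
    # Number of positions at which the query could start.
    positions = database_length - query_length + 1
    # Probability that one given position matches the whole query.
    match_probability = (1.0 / alphabet_size) ** query_length
    return positions * match_probability


# A 10-base query is expected to match many times by chance in a database
# of a billion bases, whereas a 30-base query is not.
short_query = expected_chance_matches(10, 10**9)
long_query = expected_chance_matches(30, 10**9)
```

Under this model the expected count falls by a factor of four for every extra base in the query, which is why longer matches are far stronger evidence of common ancestry than short ones.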

18 17 Homology and Analogy Understanding homology allows us to appreciate the concept of analogy; this is encountered in protein structures that share similar folds but have no demonstrable sequence similarity; or that share groups of catalytic residues with almost exactly equivalent spatial geometries, but otherwise have neither sequence nor structural similarity. Such relationships are thought to result from convergence to similar biological solutions from different evolutionary starting points.

19 18 Homology and Analogy The essence of sequence analysis is the inference of homology. Homology is not a measure of similarity, but an absolute statement that sequences have a divergent rather than a convergent relationship. Thus, phrases that quantify homology are meaningless.

20 19 Orthology and Paralogy Homologous proteins may perform the same function in different species (orthologues) or different but related functions within one organism (paralogues). Comparison of orthologues allows study of molecular palaeontology, while paralogues have provided deeper insights into the underlying mechanisms of evolution.

21 20 Orthology and Paralogy Paralogues arose from single genes via successive duplication events. The duplicated genes followed separate evolutionary pathways, and new specificities evolved through variation and adaptation.
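
As a first pass over a list of homologues, the working definitions above can be turned into a crude labelling rule. The sketch below assumes the two proteins are already known to be homologous and looks only at the source species; the `Protein` type and the example names are hypothetical. The labels are candidates only, because a rigorous assignment depends on whether the genes diverged through a duplication event, which species information alone cannot establish.

```python
from typing import NamedTuple


class Protein(NamedTuple):
    name: str
    species: str


def label_homologous_pair(a: Protein, b: Protein) -> str:
    """Label a pair of homologous proteins using only their source species."""
    if a.species == b.species:
        # Homologues within one organism: products of gene duplication.
        return "candidate paralogues"
    # Homologues in different species: possibly the same gene after speciation.
    return "candidate orthologues"


# Hypothetical example records.
human_alpha = Protein("alpha-globin", "Homo sapiens")
human_beta = Protein("beta-globin", "Homo sapiens")
mouse_alpha = Protein("alpha-globin", "Mus musculus")

within_species = label_homologous_pair(human_alpha, human_beta)
between_species = label_homologous_pair(human_alpha, mouse_alpha)
```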

22 21 Complete genomes http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome Let us walk around among genomes

23 22 COGs Phylogenetic classification of proteins encoded in complete genomes Clusters of Orthologous Groups of proteins (COGs) were delineated by comparing protein sequences encoded in 43 complete genomes, representing 30 major phylogenetic lineages. Each COG consists of individual proteins or groups of paralogs from at least 3 lineages and thus corresponds to an ancient conserved domain. Proteins from two eukaryotic genomes (Drosophila melanogaster and Caenorhabditis elegans) were assigned to COGs and can be reached from each individual COG page.
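
The "at least 3 lineages" criterion can be expressed as a simple check on a candidate cluster. This sketch covers only that one membership rule, not the procedure used to build COGs; the function name, the data layout (pairs of protein name and lineage), and the example values are assumptions.

```python
def spans_enough_lineages(cluster, min_lineages: int = 3) -> bool:
    """Return True if the cluster's proteins come from enough distinct lineages.

    `cluster` is an iterable of (protein_name, lineage) pairs. Paralogues from
    the same lineage count once, because only distinct lineages matter.
    """
    lineages = {lineage for _protein, lineage in cluster}
    return len(lineages) >= min_lineages


# Hypothetical clusters: the first spans three lineages, the second only two.
broad_cluster = [("protA", "lineage 1"), ("protB", "lineage 2"),
                 ("protC", "lineage 3"), ("protD", "lineage 3")]
narrow_cluster = [("protE", "lineage 1"), ("protF", "lineage 1"),
                  ("protG", "lineage 2")]

broad_ok = spans_enough_lineages(broad_cluster)
narrow_ok = spans_enough_lineages(narrow_cluster)
```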

24 23 COGs http://www.ncbi.nlm.nih.gov/COG/ Cognitor http://www.ncbi.nlm.nih.gov/COG/xognitor.html COG Help http://www.ncbi.nlm.nih.gov/COG/COGhelp.html#top FTP ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Mycobacterium_leprae/
