Chani & Malki present: The OdzFinder. Project adviser: Dr. Ron Wides.



WANTED
Name: Odz
A.k.a.: Ten-m
Family: pair-rule gene
Length: 10,000 bp

Getting to Know Odz …
Discovered in D. melanogaster in 1994.
Belongs to the pair-rule gene family.
Plays a crucial role in the CNS during fetal development.
Odz protein is expressed in neurons, developing brain and hindgut.
Odz protein is expressed in segmentation.

The Odz Family
Odz gene orthologs have been found in 3 phyla:
Vertebrates: Ten-m1, Ten-m2, Ten-m3, Ten-m4
Arthropods: Ten-a, Ten-m
Nematodes

The Odz Protein
2731 amino acids.
The only pair-rule gene that encodes a protein!
Contains 3 domains:
I. extracellular EGF-like repeats
II. tyrosine kinase phosphorylation sites
III. hydrophobic sequences, probably a transmembrane sequence
[Figure: ODZ protein diagram labeled with the EGF-like domain and the intracellular kinase substrate domain.]

EGF-like Repeats
x(4)-C-x(0,48)-C-x(3,12)-C-x(1,70)-C-x(1,6)-C-x(2)-G-a-x(0,21)-G-x(2)-C-x
EGF-like domain: amino acid residues.
Significant homology to epidermal growth factor (EGF).
Has been found in single or multiple copies in a number of other proteins.
Generally found in the extracellular domain of membrane proteins or in secreted proteins.
Involved in receptor-ligand interactions.
Includes 6 conserved cysteine residues involved in disulfide bonds.
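The pattern on this slide can be turned into a searchable expression. The sketch below is written in Python rather than the project's Perl, and it is only an illustration: it assumes that `x` means any residue, that `x(m,n)` means between m and n residues, and that the lowercase `a` stands for an aromatic residue (F, W or Y). The function names are hypothetical and not part of the OdzFinder.

```python
import re

# Assumption: 'a' in the slide's pattern denotes an aromatic residue (F, W or Y).
AROMATIC = "[FWY]"

EGF_PATTERN = (
    "x(4)-C-x(0,48)-C-x(3,12)-C-x(1,70)-C-x(1,6)-C-x(2)-G-a-x(0,21)-G-x(2)-C-x"
)


def pattern_to_regex(pattern):
    """Translate the slide's dash-separated pattern into a Python regex string."""
    parts = []
    for element in pattern.split("-"):
        if element == "x":
            parts.append(".")                          # exactly one residue
        elif element.startswith("x(") and element.endswith(")"):
            parts.append(".{" + element[2:-1] + "}")   # x(4) -> .{4}, x(0,48) -> .{0,48}
        elif element == "a":
            parts.append(AROMATIC)
        else:
            parts.append(element)                      # a literal residue such as C or G
    return "".join(parts)


EGF_REGEX = re.compile(pattern_to_regex(EGF_PATTERN))


def find_egf_like(protein):
    """Return (start, end) of the first EGF-like match in a protein string, or None."""
    match = EGF_REGEX.search(protein)
    return (match.start(), match.end()) if match else None
```

Note that the translated expression contains six literal `C` positions, which corresponds to the six conserved cysteines mentioned on the slide.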

The lab’s goals
Genomics: to find a broad family of Odz genes; to build phylogenetic trees to discover the segmentation mechanism; to run massive alignments to find conserved regions; to perform biological in-vivo experiments that change those regions.
Proteomics: the protein’s role; how the protein functions; the protein’s interactions with other proteins (e.g., Notch).

Finding Odz Genes
BLASTing new EST libraries.
Extracting DNA from various innocent creatures.
BLASTing existing databases.
[Figure labels: Databases, Sequences discovered in the lab, EST Libraries, Odz Database.]

Odz Database
The collected data was organized by Michal Markovitz in a relational database.
The database consists of 10 different tables. For example: [Figure: example table from the database.]

2 problems remained:
1. BLAST results include many non-Odz hits: prokaryotic hits, non-metazoan hits, EGF region hits, and low-similarity hits.
2. Every day, new sequences are added to the existing databases and new EST libraries are released.
We need a program to automatically extract Odz hits from NCBI BLAST results!

The OdzFinder: a Perl program that will automatically extract Odz hits from NCBI BLAST results.

S.O.F.T: Screen Odz Flow Template
[Flowchart. Input: BLAST report and taxonomy report. Decision nodes: Prokaryote? Metazoan? EGF? (All EGF, No EGF, Mixed EGF), look-up table, score threshold x, E-value threshold y, combination. Outcomes: Odz; update database.]

Input: BLAST Report
BLAST searches are performed on the Odz orthologs.
The results are sent to the OdzFinder program to be filtered.
The program extracts relevant information from each hit:

>gi| |gb|AC Apis mellifera BAC clone RP11-18D7, complete sequence
Length =
Score = 153 bits (328), Expect = 3e-36
Identities = 59/59 (100%), Positives = 59/59 (100%)
Frame = +3 / +3
Query: 3 IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH 179
         IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH
Sbjct: 3 IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH 179
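Extracting the relevant fields from a hit like the one above could look roughly as follows. This is a Python sketch, not the project's Perl code; the field names in the returned dictionary are hypothetical, and it assumes the classic text layout shown on the slide (`Score = ... bits (...), Expect = ...`, then `Identities = n/m`, then one or more `Query:` lines).

```python
import re

# Matches the statistics block of one hit in classic BLAST text output.
HIT_STATS = re.compile(
    r"Score\s*=\s*(?P<bits>[\d.]+)\s*bits\s*\((?P<raw>\d+)\),\s*"
    r"Expect\s*=\s*(?P<evalue>[\d.eE+-]+),?\s*"
    r"Identities\s*=\s*(?P<ident>\d+)/(?P<length>\d+)"
)

# Matches an alignment line: "Query: <start> <residues> <end>".
QUERY_LINE = re.compile(r"Query:\s*(\d+)\s+\S+\s+(\d+)")


def parse_hit(text):
    """Return a dict of score, E-value, identity and query span, or None if no hit is found."""
    stats = HIT_STATS.search(text)
    if stats is None:
        return None
    evalue_text = stats.group("evalue")
    if evalue_text.lower().startswith("e"):
        evalue_text = "1" + evalue_text        # some reports abbreviate 1e-36 as e-36
    spans = QUERY_LINE.findall(text)
    return {
        "bits": float(stats.group("bits")),
        "raw_score": int(stats.group("raw")),
        "evalue": float(evalue_text),
        "identity_pct": 100.0 * int(stats.group("ident")) / int(stats.group("length")),
        # First start and last end cover alignments that wrap over several lines.
        "query_start": int(spans[0][0]) if spans else None,
        "query_end": int(spans[-1][1]) if spans else None,
    }


SAMPLE_HIT = (
    "Score = 153 bits (328), Expect = 3e-36 "
    "Identities = 59/59 (100%), Positives = 59/59 (100%) Frame = +3 / +3 "
    "Query: 3 IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH 179"
)
```

Calling `parse_hit(SAMPLE_HIT)` yields the score, E-value, percent identity and the query coordinates that the later EGF-position checks would need.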

Input: Taxonomy Report
The program will receive the BLAST hit’s Taxonomy Report and manipulate it into a manageable hash table.
A default Taxonomy Report will be available when BLASTing against ESTs.
Search for eukaryotic and metazoan results.
Build a prokaryotic database for possible future use.
Evolutionary distance becomes relevant when dealing with EGF-like repeats.

Taxonomy Report
Eukaryota hits 41 orgs [root; cellular organisms]
. Bilateria hits 33 orgs [Fungi/Metazoa group; Metazoa; Eumetazoa]
.. Coelomata hits 31 orgs
... Deuterostomia hits 23 orgs
.... Chordata hits 22 orgs
..... Euteleostomi hits 21 orgs [Craniata; Vertebrata; Gnathostomata; Teleostomi]
Tetrapoda hits 14 orgs [Sarcopterygii]
Amniota hits 12 orgs
Eutheria hits 10 orgs [Mammalia; Theria]

Lineage of the example hit (Apis):
root; cellular organisms; Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria; Coelomata; Protostomia; Panarthropoda; Arthropoda; Mandibulata; Pancrustacea; Hexapoda; Insecta; Dicondylia; Pterygota; Neoptera; Endopterygota; Hymenoptera; Apocrita; Aculeata; Apoidea; Apidae; Apinae; Apini; Apis
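One way to picture the hash table built from the taxonomy information is sketched below, in Python instead of Perl. It assumes each hit comes with a semicolon-separated lineage string like the Apis example, and that a hit counts as prokaryotic when its lineage contains Bacteria or Archaea; the function names and the group labels are hypothetical.

```python
APIS_LINEAGE = (
    "root; cellular organisms; Eukaryota; Fungi/Metazoa group; Metazoa; "
    "Eumetazoa; Bilateria; Coelomata; Protostomia; Panarthropoda; Arthropoda; "
    "Mandibulata; Pancrustacea; Hexapoda; Insecta; Dicondylia; Pterygota; "
    "Neoptera; Endopterygota; Hymenoptera; Apocrita; Aculeata; Apoidea; "
    "Apidae; Apinae; Apini; Apis"
)


def parse_lineage(lineage_text):
    """Split a semicolon-separated lineage into a list of taxon names."""
    return [taxon.strip() for taxon in lineage_text.split(";") if taxon.strip()]


def classify_lineage(lineage_text):
    """Label a lineage as prokaryote, metazoan, non-metazoan eukaryote, or unknown."""
    taxa = set(parse_lineage(lineage_text))
    if "Bacteria" in taxa or "Archaea" in taxa:   # assumption: these mark prokaryotes
        return "prokaryote"
    if "Metazoa" in taxa:
        return "metazoan"
    if "Eukaryota" in taxa:
        return "non-metazoan eukaryote"
    return "unknown"


def build_taxonomy_table(lineages_by_organism):
    """Build the look-up dictionary: organism name -> parsed lineage and group."""
    return {
        organism: {
            "lineage": parse_lineage(text),
            "group": classify_lineage(text),
        }
        for organism, text in lineages_by_organism.items()
    }
```

With such a table, the screening step can keep metazoan hits and set prokaryotic ones aside for the separate database the slide mentions.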

Tenascin-m (Odz) includes 8 EGF-like repeats.
The conserved EGF region gave problematic results: many hits appear only because of their similarity to the EGF region.
[Figure: query and subject aligned over the EGF region, producing a high score.]

There are three possible positions regarding the hit’s relation to the query’s EGF-like region:
I. The hit is completely outside the query’s EGF-like region.
II. The hit is completely inside the query’s EGF-like region.
III. The hit is partially inside the query’s EGF-like region.
[Figure: for each position, the hit is drawn against the query.]
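Telling the three positions apart amounts to an interval comparison on query coordinates. The Python sketch below assumes inclusive coordinates on the query and a single known EGF-like region; the function name is hypothetical, and the returned labels mirror the flowchart's "No EGF", "All EGF" and "Mixed EGF" branches.

```python
def egf_position(hit_start, hit_end, egf_start, egf_end):
    """Classify a hit by its overlap with the query's EGF-like region.

    All coordinates are inclusive positions on the query sequence.
    """
    if hit_end < egf_start or hit_start > egf_end:
        return "no_egf"       # completely outside the EGF-like region
    if hit_start >= egf_start and hit_end <= egf_end:
        return "all_egf"      # completely inside the EGF-like region
    return "mixed_egf"        # partially inside the EGF-like region
```

For a query with several separate EGF-like stretches, the same test would have to be repeated per stretch or applied to their combined span.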

Get a better picture..

Position I: The hit is completely outside the query’s EGF-like region (No EGF)
The score and E-value are examined.
Low thresholds are set to ensure that very small hits are not missed; sometimes they are translocations.
[Flowchart: No EGF, Score > x?, E-value < y?, yes, Odz.]

Position II: The hit is completely inside the query’s EGF-like region (All EGF)
To prevent acceptance of non-Odz hits that score highly because of their EGF region, a look-up table was established:
an evolutionarily close query and subject demand a high identity percentage;
an evolutionarily distant query and subject demand a low identity percentage.

Look-up table example:
Query          Hit                       Odz ortholog   Odz paralog
Mus musculus   Homo sapiens              95%            70%
Mus musculus   Drosophila melanogaster   75%            55%

[Flowchart: All EGF, look-up table, score threshold x, E-value threshold y, yes, Odz.]
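The look-up table lends itself to a small dictionary keyed by the query and hit species. The Python sketch below uses only the two rows shown on the slide; it assumes the percentages are minimum identity values and that a hit meeting the ortholog threshold is called an ortholog before the paralog threshold is considered. The function name and the returned labels are hypothetical.

```python
# (query species, hit species) -> minimum percent identity, from the slide's example rows.
LOOKUP_TABLE = {
    ("Mus musculus", "Homo sapiens"): {"ortholog": 95.0, "paralog": 70.0},
    ("Mus musculus", "Drosophila melanogaster"): {"ortholog": 75.0, "paralog": 55.0},
}


def call_all_egf_hit(query_species, hit_species, identity_pct):
    """Decide how to call a hit that lies entirely inside the EGF-like region."""
    thresholds = LOOKUP_TABLE.get((query_species, hit_species))
    if thresholds is None:
        return "no threshold"                 # species pair not in the table
    if identity_pct >= thresholds["ortholog"]:
        return "Odz ortholog"
    if identity_pct >= thresholds["paralog"]:
        return "Odz paralog"
    return "rejected"
```

The closer pair (mouse and human) demands more identity than the distant pair (mouse and fly), so an 80% identical hit is only a paralog candidate in the first case but passes as an ortholog in the second.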

Position III: The hit is partially inside the query’s EGF-like region (Mixed EGF)
2 possibilities:
A. False call! An EGF hit with insignificant similarity outside of the EGF domains. Treat like II.
B. The Real Thing! EGF with adjacent regions of significant similarity. Treat like I.
Is it more like A or like B?

Update Database: data flow through DBI
DBI is a database interface module for Perl.
It enables Perl applications to access multiple database types.
It provides a consistent database interface independent of the actual database being used.
[Figure: Perl Script, DBI, DBD::MSQL, MySQL RDBMS.]
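The idea of a uniform database interface can be illustrated outside Perl as well. The sketch below uses Python's built-in `sqlite3` module, which follows Python's DB-API in much the same spirit as DBI; it is not the project's code. The table name `odz_hits`, its columns (taken from the results slide's headers: species, score, gi) and the placeholder identifier are all assumptions for illustration.

```python
import sqlite3


def store_hits(conn, hits):
    """Insert (gi, species, score) rows into a hypothetical odz_hits table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS odz_hits "
        "(gi TEXT PRIMARY KEY, species TEXT, score REAL)"
    )
    # Parameter placeholders keep the SQL independent of the values being stored.
    conn.executemany(
        "INSERT OR REPLACE INTO odz_hits (gi, species, score) VALUES (?, ?, ?)",
        hits,
    )
    conn.commit()


def hits_for_species(conn, species):
    """Return (gi, score) rows for one species, best score first."""
    cursor = conn.execute(
        "SELECT gi, score FROM odz_hits WHERE species = ? ORDER BY score DESC",
        (species,),
    )
    return cursor.fetchall()


# An in-memory database stands in for the MySQL server shown on the slide.
conn = sqlite3.connect(":memory:")
store_hits(conn, [("placeholder-gi-1", "Apis mellifera", 153.0)])
```

Because the script only talks to the connection object, swapping the backing database would mostly mean changing the `connect` call, which is the same benefit the slide attributes to DBI.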

Results!
[Table with columns species, score, gi. Species listed: Xenopus, Apis mellifera, Gallus gallus, Homo sapiens, Rattus norvegicus, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Gasterosteus aculeatus.]

Special thanks to our project adviser, Dr. Ron Wides, for his guidance, patience & Krispy Kreme donuts.