Tree Reconciliation: Notung Reconciliations
Notes on how to map Notung format files to a reconciliation map that can be imported into the TR database.

Notung
Notung software is a framework for tree reconciliation using duplication-loss parsimony.
Notung can resolve polytomies.
Input parameters
– Cost of duplication
– Cost of loss
– Conditional duplication cost
– Edge weight threshold
Input files
– Gene tree (NH or NHX)
– Species tree (NH or NHX)
Output
– Reconciled gene tree (Newick, NHX, or Notung)
Availability –

Notung: File Format
The Notung format is a modified NHX file format:
– All nodes in the species tree and the gene tree are named.
– Species-tree node names are used to map gene-tree nodes to the species tree.
– Losses are encoded as leaf nodes named after the species-tree node with a *LOST suffix (pattern Taxon*LOST), e.g. Murinae*LOST[&&NHX:S=Murinae].
– NHX tags map gene-tree nodes to the species tree.
– The reconciled tree is written on a single line terminating with a semicolon.
– The file includes the species tree: [&&NOTUNG-SPECIES-TREE(((human,pan)Homo/Pan/Gorillagroup,mac)Catarrhini,(mouse,rat)Murinae)Euarchontoglires]
– The file includes metadata about program parameters: [&&NOTUNG-PARAMETERS:T=90.0:VERSION=2.6:CL=1.0:CD=1.5:CCD=0.0]
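
This is a minimal sketch of reading the two Notung-specific blocks in plain Perl, with no BioPerl needed. The sub name parse_notung_metadata is hypothetical. It assumes each block appears once in the file text and that the parameters are colon-separated KEY=VALUE pairs, as in the strings on this slide.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the Notung-specific metadata blocks from the
# text of a Notung reconciliation file.
# Returns the species tree string (or undef) and a hash ref of parameters.
sub parse_notung_metadata {
    my ($text) = @_;

    # The species tree sits inside [&&NOTUNG-SPECIES-TREE ... ] and contains
    # no ']' characters, so a non-greedy match up to the first ']' works.
    my ($species_tree) = $text =~ /\[&&NOTUNG-SPECIES-TREE(.*?)\]/s;

    # Parameters are colon-separated KEY=VALUE pairs.
    my %params;
    if ($text =~ /\[&&NOTUNG-PARAMETERS:([^\]]*)\]/) {
        for my $pair (split /:/, $1) {
            my ($key, $value) = split /=/, $pair, 2;
            $params{$key} = $value if defined $value;
        }
    }
    return ($species_tree, \%params);
}

# Usage with the metadata strings shown on this slide.
my $meta = '[&&NOTUNG-SPECIES-TREE(((human,pan)Homo/Pan/Gorillagroup,mac)Catarrhini,(mouse,rat)Murinae)Euarchontoglires] '
         . '[&&NOTUNG-PARAMETERS:T=90.0:VERSION=2.6:CL=1.0:CD=1.5:CCD=0.0]';
my ($species_tree, $params) = parse_notung_metadata($meta);
print "Species tree: $species_tree\n";
printf "CD=%s CL=%s\n", $params->{CD}, $params->{CL};
```

The species tree comes back without a trailing semicolon. Append one before handing it to a Newick parser.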

Notung: Example Reconciliation
Reconciliation as drawn by the Notung GUI.

Notung: Example Reconciliation
[Figure: Notung reconciliation drawn in “fat-tree” format]

Notung: Example Reconciliation
[Figure: the same reconciliation; bootstrap values are available for the n* labeled nodes]

Notung: File Format
(((mouse-gene-A[&&NHX:S=mouse],rat-gene-A[&&NHX:S=rat])n2:56.0[&&NHX:S=Murinae:D=N:B=56.0],((human-gene-A[&&NHX:S=human],pan-gene-A[&&NHX:S=pan])r507[&&NHX:S=Homo/Pan/Gorillagroup:D=N],mac-gene-A[&&NHX:S=mac])n7:70.0[&&NHX:S=Catarrhini:D=N:B=70.0])n8:100.0[&&NHX:S=Euarchontoglires:D=N:B=100.0],(((((((((human-gene-BA4[&&NHX:S=human],human-gene-BA5[&&NHX:S=human])n14:86.0[&&NHX:S=human:D=Y:B=86.0],human-gene-BA6[&&NHX:S=human])n16:78.0[&&NHX:S=human:D=Y:B=78.0],((human-gene-BA1[&&NHX:S=human],human-gene-BA2[&&NHX:S=human])r523[&&NHX:S=human:D=Y],human-gene-BA3[&&NHX:S=human])n21:76.0[&&NHX:S=human:D=Y:B=76.0])r521[&&NHX:S=human:D=Y],(pan-gene-BA1[&&NHX:S=pan],pan-gene-BA2[&&NHX:S=pan])n11:97.0[&&NHX:S=pan:D=Y:B=97.0])r520[&&NHX:S=Homo/Pan/Gorillagroup:D=N],mac-gene-BA[&&NHX:S=mac])n25:73.0[&&NHX:S=Catarrhini:D=N:B=73.0],((human-gene-BB1[&&NHX:S=human],pan*LOST[&&NHX:S=pan])r706[&&NHX:S=Homo/Pan/Gorillagroup],mac-gene-BB2[&&NHX:S=mac])r511[&&NHX:S=Catarrhini:D=N])r510[&&NHX:S=Catarrhini:D=Y],((human-gene-BB2[&&NHX:S=human],pan*LOST[&&NHX:S=pan])r707[&&NHX:S=Homo/Pan/Gorillagroup],mac-gene-BB1[&&NHX:S=mac])r514[&&NHX:S=Catarrhini:D=N])r509[&&NHX:S=Catarrhini:D=Y],(mouse-gene-B[&&NHX:S=mouse],rat*LOST[&&NHX:S=rat])r708[&&NHX:S=Murinae])n35:98.0[&&NHX:S=Euarchontoglires:D=N:B=98.0],((mac-gene-AA[&&NHX:S=mac],Homo/Pan/Gorillagroup*LOST[&&NHX:S=Homo/Pan/Gorillagroup])r709[&&NHX:S=Catarrhini],Murinae*LOST[&&NHX:S=Murinae])r710[&&NHX:S=Euarchontoglires])n37:100.0[&&NHX:S=Euarchontoglires:D=Y:B=100.0])n38:94.0[&&NHX:S=Euarchontoglires:D=Y:B=94.0];
[&&NOTUNG-SPECIES-TREE(((human,pan)Homo/Pan/Gorillagroup,mac)Catarrhini,(mouse,rat)Murinae)Euarchontoglires]
[&&NOTUNG-PARAMETERS:T=90.0:VERSION=2.6:CL=1.0:CD=1.5:CCD=0.0]

Notung: NHX Tags
Gene-tree node tags
– S = node in the species tree that the gene-tree node is mapped to; this matches a name used in &&NOTUNG-SPECIES-TREE
– D = Boolean
  – Y = duplication node: the gene-tree node maps to the edge leading up to the species-tree node identified by S
  – N = speciation node: the gene-tree node maps onto the species-tree node identified by S
– B = bootstrap value, a floating-point number ranging from 0.0 to 100.0
[&&NOTUNG-PARAMETERS …]
– T = edge weight threshold
– VERSION = Notung version used
– CL = cost of loss
– CD = cost of duplication
– CCD = cost of conditional duplications
– Includes the species tree: [&&NOTUNG-SPECIES-TREE(((human,pan)Homo/Pan/Gorillagroup,mac)Catarrhini,(mouse,rat)Murinae)Euarchontoglires]
– Includes metadata about program parameters: [&&NOTUNG-PARAMETERS:T=90.0:VERSION=2.6:CL=1.0:CD=1.5:CCD=0.0]
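
The tag meanings can be applied directly to a tree line. The sketch below is a plain-Perl illustration, not Notung's own scoring code, and it walks the tree with a regex instead of BioPerl. It parses each NHX tag block into a hash and counts duplications (D=Y) and losses (*LOST leaves). It then combines the counts with CD and CL as a simple weighted sum. That sum is an assumption made for illustration. It ignores conditional duplications (CCD, 0.0 in the example) and is not guaranteed to match the score Notung reports. The names parse_nhx_tags, event_tally and weighted_dl_cost are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "[&&NHX:S=Murinae:D=N:B=56.0]" into
# { S => 'Murinae', D => 'N', B => '56.0' }.
sub parse_nhx_tags {
    my ($block) = @_;
    my ($body) = $block =~ /\[&&NHX:([^\]]*)\]/ or return {};
    my %tags;
    for my $pair (split /:/, $body) {
        my ($key, $value) = split /=/, $pair, 2;
        $tags{$key} = $value if defined $value;
    }
    return \%tags;
}

# Hypothetical helper: visit every "name[:length][&&NHX:...]" node in a
# Notung tree line and count events from the node name and its D tag.
sub event_tally {
    my ($tree) = @_;
    my %count = (duplication => 0, speciation => 0, loss => 0);
    while ($tree =~ /([^(),:\[\];]*)(?::[^\[(),;]+)?(\[&&NHX:[^\]]*\])/g) {
        my $name = $1;
        my $tags = parse_nhx_tags($2);
        if ($name =~ /\*LOST$/) {
            $count{loss}++;                 # loss leaf, e.g. Murinae*LOST
        }
        elsif (defined $tags->{D}) {
            # D=Y marks a duplication node, D=N a speciation node.
            $count{ $tags->{D} eq 'Y' ? 'duplication' : 'speciation' }++;
        }
        # Nodes without a D tag (gene leaves, parents of loss leaves) are skipped.
    }
    return \%count;
}

# Assumed score, for illustration only: CD * duplications + CL * losses.
sub weighted_dl_cost {
    my ($count, $cd, $cl) = @_;
    return $cd * $count->{duplication} + $cl * $count->{loss};
}

# Usage on a small hand-made tree line, with CD and CL from the example file.
my $tree_line = '((a[&&NHX:S=human],b[&&NHX:S=human])n1[&&NHX:S=human:D=Y],'
              . 'pan*LOST[&&NHX:S=pan])n2[&&NHX:S=Homo/Pan/Gorillagroup:D=N];';
my $count = event_tally($tree_line);
printf "dup=%d spec=%d loss=%d cost=%.1f\n",
    @{$count}{qw(duplication speciation loss)},
    weighted_dl_cost($count, 1.5, 1.0);
```

Counting by hand, the example reconciliation has 10 nodes tagged D=Y and 5 *LOST leaves. Under the assumed weighted sum with CD=1.5 and CL=1.0, that gives 15.0 + 5.0 = 20.0.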

Notung: Questions …
– Can multiple gene trees be included in a single Notung format file?
– How does Notung treat multiple trees with the same parsimony score in its output?
– How are unique names preserved for multiple loss events on a single edge?
  – These are leaf nodes, so it does not really matter.
– How does Notung handle internal nodes not named in the input species tree?

Notung & NHX Parsing
Attempting to parse the NHX tags with the existing BioPerl NHX parser throws an error after the reconciled gene tree has been parsed:
j-macbook01:scripts jestill$ ./tr_test_species_tree.pl -i sandbox/notung/exercise5_genetree_reconciled_resolvedPolytomies --format nhx
EXCEPTION: Bio::Root::Exception
MSG: Unrecognized, non &&NHX string: >>Euarchontoglires<<; lastevent is )
STACK: Error::throw
STACK: Bio::Root::Root::throw /opt/local/lib/perl5/site_perl/5.8.9//Bio/Root/Root.pm:368
STACK: Bio::TreeIO::nhx::next_tree /opt/local/lib/perl5/site_perl/5.8.9//Bio/TreeIO/nhx.pm:246
STACK: ./tr_test_species_tree.pl:

Notung & NHX Parsing
Trimming the non-standard lines from the bottom of the NHX file allows the program to parse the output without error:
j-macbook01:scripts jestill$ ./tr_test_species_tree.pl -i sandbox/notung/exercise5_genetree_reconciled_resolvedPolytomiesTrimmed --format nhx
NUM TAXA:25
Taxon Ids: mouse-gene-A rat-gene-A … …. r710 r709 mac-gene-AA Homo/Pan/Gorillagroup*LOST Murinae*LOST
NODES WITH IDS:49
NODES WITH BOOTSTRAP VALUES:11
NODES WITH BRANCH LENGTH VALUES:11
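
The trimming can be scripted rather than done by hand. The sketch below assumes that the only non-standard content is the [&&NOTUNG-...] blocks described earlier; strip_notung_metadata is a hypothetical name. It removes those blocks whether they sit on their own lines or share a line with the tree, and it leaves the standard NHX text in place.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the non-standard [&&NOTUNG-...] blocks so that
# only standard NHX tree text remains.
sub strip_notung_metadata {
    my ($text) = @_;
    $text =~ s/\[&&NOTUNG-[^\]]*\]//g;            # the blocks contain no nested ']'
    $text =~ s/[ \t]+$//mg;                       # trim whitespace left behind
    my @kept = grep { /\S/ } split /\n/, $text;   # drop lines that are now empty
    return join '', map { $_ . "\n" } @kept;
}

my $notung_text = join "\n",
    '(a[&&NHX:S=human],b[&&NHX:S=pan])n1[&&NHX:S=Homo/Pan/Gorillagroup:D=N];',
    '[&&NOTUNG-SPECIES-TREE((human,pan)Homo/Pan/Gorillagroup,mac)Catarrhini]',
    '[&&NOTUNG-PARAMETERS:T=90.0:VERSION=2.6:CL=1.0:CD=1.5:CCD=0.0]';
print strip_notung_metadata($notung_text);
```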

Notung & NHX Parsing
A trick for parsing these files is to read the input file and echo each line terminated by a semicolon into a new filehandle that is passed to the Bio::TreeIO object:
use FileHandle;
use Bio::TreeIO;

# $infile and $format are set earlier in the script
open(my $in_fh, '<', $infile) or die "Can not open $infile: $!";
my $tree_num = 0;
while (<$in_fh>) {
    chomp;
    if (m/;/) {    # tree lines are terminated by a semicolon
        $tree_num++;
        # Using a pipe to create a new filehandle by echoing the
        # single line of interest... seems sloppy but this
        # works. There is probably a more elegant way to do this
        my $tree_handle = FileHandle->new("echo '$_' |")
            or die "Can not echo filehandle";
        my $die_msg = "Can not open $format format tree file:\n$infile";
        my $tree_in = Bio::TreeIO->new(-fh => $tree_handle, -format => $format)
            or die $die_msg;
        while (my $tree = $tree_in->next_tree) {
            # Do stuff with the tree
        }
    }
}
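
The comment in the code notes that there is probably a more elegant way than piping through echo. One core-Perl option is an in-memory filehandle (open my $fh, '<', \$string). It avoids spawning a shell and any quoting problems if a tree line ever contains a single quote. This is an assumption, not the method used in these notes. The sketch does not call BioPerl, so it runs on its own. Passing its handles as -fh to Bio::TreeIO->new is expected to work like any other filehandle but is not exercised here. The sub name tree_handles is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read a Notung file from $in_fh and return one
# in-memory read handle per tree line, as an alternative to the echo pipe.
sub tree_handles {
    my ($in_fh) = @_;
    my @handles;
    while (my $line = <$in_fh>) {
        chomp $line;
        # Each reconciled tree is written on one line ending in ';'.
        next unless $line =~ /;\s*$/;
        my $tree_string = "$line\n";
        # PerlIO in-memory file: no shell is spawned, so quotes are harmless.
        open(my $tree_fh, '<', \$tree_string)
            or die "Can not open in-memory handle: $!";
        push @handles, $tree_fh;
    }
    return @handles;
}

my $file_text = "(a[&&NHX:S=human],b[&&NHX:S=pan])n1[&&NHX:S=Homo/Pan/Gorillagroup:D=N];\n"
              . "[&&NOTUNG-PARAMETERS:T=90.0:VERSION=2.6:CL=1.0:CD=1.5:CCD=0.0]\n";
open(my $in, '<', \$file_text) or die "Can not open input: $!";
for my $fh (tree_handles($in)) {
    # With BioPerl installed, $fh could instead be handed to
    # Bio::TreeIO->new(-fh => $fh, -format => 'nhx').
    print scalar(<$fh>);
}
```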

Notung and TR Database
Notung produces a format that differs from the PRIME format used by PrimeGSR and Treebest. Work is under way to determine whether the Notung file format is compatible with the existing NHX module in BioPerl or whether a new module is needed.
– The BioPerl NHX module CAN be used with some creative parsing of the Notung format.
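
The goal stated in the title is to map Notung output to a reconciliation map for the TR database. Toward that goal, here is a minimal sketch that walks a Notung tree line and emits one tab-delimited row per gene-tree node: the node name, the species-tree node from its S tag, and an event type. The column layout and event labels are hypothetical placeholders, because these notes do not specify the TR database import format. The sketch uses a regex walk rather than Bio::TreeIO, so it records names and tags but not the tree topology. The sub name reconciliation_rows is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: one tab-delimited row per gene-tree node with an S tag.
# Assumed columns (not the TR database schema): gene_node, species_node, event.
sub reconciliation_rows {
    my ($tree) = @_;
    my @rows;
    while ($tree =~ /([^(),:\[\];]*)(?::[^\[(),;]+)?\[&&NHX:([^\]]*)\]/g) {
        my ($name, $body) = ($1, $2);
        my %tag = map { my ($k, $v) = split /=/, $_, 2; ($k => $v) }
                  split /:/, $body;
        next unless defined $tag{S};
        # 'none' covers nodes without a D tag (gene leaves, loss parents).
        my $event = $name =~ /\*LOST$/ ? 'loss'
                  : !defined $tag{D}   ? 'none'
                  : $tag{D} eq 'Y'     ? 'duplication'
                  :                      'speciation';
        push @rows, join("\t", $name, $tag{S}, $event);
    }
    return @rows;
}

# Usage on the n2 subtree from the example file.
my $subtree = '(mouse-gene-A[&&NHX:S=mouse],rat-gene-A[&&NHX:S=rat])'
            . 'n2:56.0[&&NHX:S=Murinae:D=N:B=56.0];';
print "$_\n" for reconciliation_rows($subtree);
```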