Transport Inference Parser: Inferring Transport Reactions from Protein Data for PGDBs
Thomas J Lee, Peter Karp, AIC BRG; Ian Paulsen (consulting)

Running the Transport Inference Parser
1. Run Pathway Tools.
2. Make the organism of interest the current organism.
3. [Run operon predictor].
4. Select Tools/Pathologic.
5. From Pathologic, select Refine/Transport Inference Parser.
6. If running TIP for the first time on the organism, optionally provide its aerobicity.
7. Wait and observe progress.
8. When complete, the Probable Transporter Table window appears.
9. You may now review and modify the inferred transporters.

Task Description
Infer transport reactions from protein data and construct them in BioCyc KBs for a variety of organisms, automatically where possible, with human assistance where necessary.

Scope
Run for all Tier 3 KBs (~700 KBs)
To support both automated and user-controlled operation:
  –Distinguish high- and low-confidence inferences
  –Automated mode accepts all high-confidence inferences
  –Track evidence where possible
  –Provide accept/reject/edit options to user

Output
Construct the following for each inferred transported substrate:
  –Transport-Reaction frame of correct subclass
    Assign compartments using simple assumptions
  –Enzymatic-Reaction frame linking protein to reaction
    Construct Protein-Complexes as required

Sequence of operations
1. Find candidate transporter proteins.
2. Filter out candidates.
3. Identify substrate(s).
4. Assign an energy coupling to transporter.
5. Identify compartment of each substrate.
6. Group subunits of transporter complexes.
7. Construct full compartmental reaction from substrate and coupling.
8. Construct enzymatic reaction linking each reaction with protein.

1. Find candidate transporter proteins
Input: all protein frames of organism
Output: internal data structure (PARTRANS)
Exclude proteins with long annotations (default: 12 words)
Tokenize the annotation
Annotation must contain an indicator. Exs: "transport", "export", "permease", "channel"
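The candidate test on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, "long" is taken to mean more than 12 whitespace-separated words, and indicators are matched as substrings so that "transporter" also counts. The slide does not say how TIP itself tokenizes or matches.

```python
# Indicator words quoted on the slide; the real list is presumably longer.
INDICATORS = ("transport", "export", "permease", "channel")
MAX_WORDS = 12  # default length limit from the slide


def tokenize(annotation):
    """Lowercase the annotation and split it on whitespace (assumed tokenizer)."""
    return annotation.lower().split()


def is_candidate(annotation, max_words=MAX_WORDS):
    """True if the annotation is short enough and mentions an indicator."""
    tokens = tokenize(annotation)
    if len(tokens) > max_words:
        return False
    # Substring match, so "transporter" and "exporter" also count (assumption).
    return any(ind in tok for tok in tokens for ind in INDICATORS)
```

For instance, is_candidate("sodium channel") passes, while an annotation with no indicator word does not.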

2. Filter candidates
Exclude if annotation matches a list of regular expressions of counterindicator phrases and patterns
  –Ex: "transport associated domain"
Exclude if annotation contains counterindicator word
  –Exs: "regulator", "nuclear-export"
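A minimal sketch of the two exclusion tests, assuming the phrase list is a list of compiled regular expressions and the word list is a plain set. Only the three examples from the slide are included; the function name and the punctuation stripping are illustrative assumptions.

```python
import re

# Counterindicator phrase patterns (only the slide's example is shown).
COUNTER_PATTERNS = [re.compile(r"\btransport[- ]associated domain\b")]
# Counterindicator words from the slide.
COUNTER_WORDS = {"regulator", "nuclear-export"}


def is_excluded(annotation):
    """True if the annotation matches a counterindicator phrase or word."""
    text = annotation.lower()
    if any(p.search(text) for p in COUNTER_PATTERNS):
        return True
    # Strip trailing punctuation so "regulator," still matches (assumption).
    tokens = (tok.strip(",;.()") for tok in text.split())
    return any(tok in COUNTER_WORDS for tok in tokens)
```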

3. Identify substrate(s)
Search annotation for names of MetaCyc compounds. Details:
Multiple substrates indicate multiple reactions, symport/antiport pair, or both. Exs:
  "cytosine/purines/uracil/thiamine/allantoin permease family protein"
  "magnesium and cobalt transport protein cora, putative"
  "sodium:sulfate symporter transmembrane domain protein"
  "probable agcs sodium/alanine/glycine symporter"
Exclude non-substrates that look like compounds via an exception list. Exs: "as", "be", "c", "i"
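The lookup with an exception list might look like the sketch below. KNOWN_COMPOUNDS is a tiny hypothetical stand-in for the MetaCyc compound names, and the separator set is an assumption. The short words are included in the stand-in on the guess that they collide with compound names or symbols, which is presumably why the slide lists them as exceptions.

```python
import re

# Hypothetical stand-in for MetaCyc compound names; the real lookup is against the KB.
KNOWN_COMPOUNDS = {
    "cytosine", "uracil", "thiamine", "allantoin", "magnesium", "cobalt",
    "sodium", "sulfate", "alanine", "glycine", "as", "be", "c", "i",
}
# Exception list from the slide: tokens that look like compounds but are not substrates.
EXCEPTIONS = {"as", "be", "c", "i"}


def find_substrates(annotation):
    """Return compound names found in the annotation, in order, without repeats."""
    tokens = re.split(r"[\s/:,]+", annotation.lower())
    found = []
    for tok in tokens:
        if tok in KNOWN_COMPOUNDS and tok not in EXCEPTIONS and tok not in found:
            found.append(tok)
    return found
```

Note that "purines" in the first example is not matched here because it is plural; name canonicalization handles that case.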

3. Identify substrate(s) (cont.)
Name canonicalization. Ex: strip plurals.
Affixed substrates. Exs: "-transporting", "-specific"
Lookup special ionic forms. Exs: "cuprous", "ferric", "hydrogen"
Resolve multivalent options using aerobicity. Exs: "FE", "CR", "MN"
Two-word substrates, substrate classes. Ex: "amino acid"
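A rough sketch of the canonicalization steps. Everything beyond the slide's examples is an assumption: the function name, the naive plural rule, the order of the steps, and in particular the mapping of multivalent metals, where an aerobic organism is assumed to see the oxidized form and an anaerobic one the reduced form. The slide only says that aerobicity is used to choose.

```python
AFFIXES = ("-transporting", "-specific")
# Special ionic forms; the target names are illustrative, not MetaCyc frame IDs.
IONIC_FORMS = {"cuprous": "Cu+", "ferric": "Fe3+", "hydrogen": "H+"}
# (aerobic choice, anaerobic choice): an assumed mapping for illustration only.
MULTIVALENT = {"fe": ("Fe3+", "Fe2+")}


def canonicalize(token, aerobic=True):
    """Normalize one substrate token to a canonical name."""
    name = token.lower()
    for affix in AFFIXES:
        if name.endswith(affix):
            name = name[: -len(affix)]
    if name in IONIC_FORMS:
        return IONIC_FORMS[name]
    if name in MULTIVALENT:
        oxidized, reduced = MULTIVALENT[name]
        return oxidized if aerobic else reduced
    # Naive plural stripping; a real implementation needs a better rule.
    if name.endswith("s") and not name.endswith(("ss", "us")):
        name = name[:-1]
    return name
```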

4. Assign an energy coupling.
1. Search annotation for prioritized list of indicators. Exs:
  ("atp-binding" . ATP)
  ("mfs" . SECONDARY)
  ("pts" . PTS)
  ("phosphotransferase" . PTS)
  ("carrier" . SECONDARY)
  ("channel" . CHANNEL)
2. Some substrates imply a coupling. Ex: protoheme => ATP
Absence of indicator => UNKNOWN
Deferred some more sophisticated techniques:
  –BLAST vs. E. coli
  –HMM family identification
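The prioritized lookup can be sketched as an ordered list scanned from the top. The indicator pairs are the ones on the slide. The function name, the whole-token matching, and the choice to consult substrate-implied couplings only after the indicator list are assumptions.

```python
# Ordered by priority: the first indicator found wins.
COUPLING_INDICATORS = [
    ("atp-binding", "ATP"),
    ("mfs", "SECONDARY"),
    ("pts", "PTS"),
    ("phosphotransferase", "PTS"),
    ("carrier", "SECONDARY"),
    ("channel", "CHANNEL"),
]
# Substrates that imply a coupling (slide example).
SUBSTRATE_IMPLIED = {"protoheme": "ATP"}


def assign_coupling(annotation, substrates=()):
    """Return the energy coupling for an annotation, or UNKNOWN."""
    tokens = annotation.lower().replace(",", " ").split()
    for indicator, coupling in COUPLING_INDICATORS:
        # Whole-token match so "pts" does not fire inside longer words (assumption).
        if indicator in tokens:
            return coupling
    for substrate in substrates:
        if substrate in SUBSTRATE_IMPLIED:
            return SUBSTRATE_IMPLIED[substrate]
    return "UNKNOWN"
```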

5. Identify compartment of each substrate.
Use keywords to determine compartment of primary substrate (Exs: "export", "antiporter")
Otherwise assume primary substrate is transported into cell (periplasm => cytoplasm)
Deferred complex compartment analysis: assume E. coli-like cellular structure
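A sketch of the direction rule, assuming that the two keywords on the slide both mark the primary substrate as moving outward, which is consistent with the calcium/proton antiporter row in the notional GUI example. The compartment tags "c" (cytoplasm) and "p" (periplasm) follow the notation used in that example; the function name is hypothetical.

```python
# Keywords assumed to mean the primary substrate leaves the cell.
OUTWARD_KEYWORDS = ("export", "antiporter")


def primary_substrate_direction(annotation):
    """Return (source, destination) compartments for the primary substrate."""
    text = annotation.lower()
    if any(keyword in text for keyword in OUTWARD_KEYWORDS):
        return ("c", "p")  # cytoplasm to periplasm
    return ("p", "c")  # default: periplasm to cytoplasm (import)
```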

6. Group subunits of transporter complexes.
Many transporters are systems of several proteins. These are grouped into complexes.
Grouping criteria; all must be met:
  –Predicted coupling is ATP or PEP
  –Predicted substrates are identical
  –Genes of proteins have a common operon (NOTE requirement on operon availability)
Resulting complex is added to KB under Protein-Complexes.
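The three criteria amount to grouping on a composite key. The sketch below assumes each gene belongs to at most one predicted operon, identified by a string, and that proteins without operon data are never grouped. The Transporter record and the function name are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Transporter:
    gene: str
    coupling: str
    substrates: FrozenSet[str]
    operon: Optional[str]  # None when no operon prediction is available


COMPLEX_COUPLINGS = ("ATP", "PEP")


def group_subunits(transporters):
    """Group genes whose coupling, substrates, and operon all agree."""
    groups = defaultdict(list)
    for t in transporters:
        if t.coupling in COMPLEX_COUPLINGS and t.operon is not None:
            groups[(t.coupling, t.substrates, t.operon)].append(t.gene)
    # A complex needs at least two subunits.
    return [genes for genes in groups.values() if len(genes) > 1]
```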

7. Construct full compartmental reaction from substrate and coupling.
Determine set of transported substrates for this transporter:
For SECONDARY coupling:
  –Identify auxiliary substrate providing ion gradient (H+, Na+)
  –Remove from transported substrate list
  –Place on side of reaction indicated by symport/antiport clues
For other couplings:
  –Determined previously in substrate analysis
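For SECONDARY coupling, the handling of the gradient ion can be sketched as follows. The function name and string output format are illustrative. Two assumptions are made: the first gradient ion found is treated as the auxiliary substrate, and if removing it would leave nothing to transport, the ion itself is kept as the transported substrate.

```python
GRADIENT_IONS = ("H+", "Na+")


def secondary_reaction(substrates, antiport, src="p", dst="c"):
    """Build a compartment-tagged reaction string for a secondary transporter."""
    aux = next((s for s in substrates if s in GRADIENT_IONS), None)
    transported = [s for s in substrates if s != aux]
    if not transported:
        # Only a gradient ion was found: treat it as the transported substrate.
        transported, aux = [aux], None
    left = [f"{s}[{src}]" for s in transported]
    right = [f"{s}[{dst}]" for s in transported]
    if aux is not None:
        # Antiport: the ion moves against the substrate; symport: with it.
        a_src, a_dst = (dst, src) if antiport else (src, dst)
        left.append(f"{aux}[{a_src}]")
        right.append(f"{aux}[{a_dst}]")
    return " + ".join(left) + " = " + " + ".join(right)
```

Called with ["Ca2+", "H+"], antiport=True, src="c", dst="p", this yields the same shape as the calcium/proton antiporter row in the notional GUI example.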

7. Construct full compartmental reaction from substrate and coupling (cont.)
For each transported substrate of this transporter, either import a reaction (from E. coli) or create a new one.
1. Search import KB for reaction with matching substrates (find-rxn-by-substrates)
  –Transported substrate added with indicated compartment
  –Auxiliary substrates determined by coupling. Exs:
    –CHANNEL typically have none
    –ATP have ATP/H2O → ADP/phosphate
2. If one reaction is found, import: (import-reactions trxns src-kb dst-kb …)
3. If multiple reactions found, retain all.
4. Else if reaction is not present in KB, create new rxn
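The import-or-create decision can be sketched with a dictionary standing in for the import KB. This does not call the Pathway Tools functions named on the slide, whose signatures are not given here. The Python names, the frozenset representation of reaction sides, and the omission of stoichiometry are all assumptions made for illustration.

```python
# Auxiliary substrates by coupling: (left side, right side), from the slide's examples.
AUXILIARY = {
    "CHANNEL": ((), ()),
    "ATP": (("ATP", "H2O"), ("ADP", "phosphate")),
}


def full_sides(substrate, coupling, src="p", dst="c"):
    """Return (left, right) substrate sets for one transported substrate."""
    aux_left, aux_right = AUXILIARY.get(coupling, ((), ()))
    left = frozenset((*aux_left, f"{substrate}[{src}]"))
    right = frozenset((*aux_right, f"{substrate}[{dst}]"))
    return left, right


def resolve_reaction(substrate, coupling, import_kb):
    """Decide whether to import matching reactions or create a new one.

    import_kb maps a reaction ID to its (left, right) substrate sets.
    """
    wanted = full_sides(substrate, coupling)
    matches = sorted(rid for rid, sides in import_kb.items() if sides == wanted)
    if matches:
        return ("import", matches)  # one or several matches: retain all
    return ("create", [])
```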

7. Construct full compartmental reaction from substrate and coupling (cont.)
Create new reaction:
Create reaction frame, subclass determined by coupling:
  –(create-instance-w-generated-id rxn-class)
Add transported and auxiliary substrates to appropriate sides of reaction

8. Construct enzymatic reaction linking each reaction with protein.
For each created reaction: (add-reactions-to-protein …)
Added evidence code, history string arguments
Subordinates new
[(import-reactions) handles import of enzymatic-reactions]

GUI Overview
1. Window is titled: Probable Transporter Table for Organism
2. Table of inferred transporters is organized into columns:
  –Status
  –Gene
  –Substrate
  –Coupling
  –Reaction / Function
3. Each row contains a transport reaction description:
  –Multiple reactions per transport protein are possible
  –Sort by Gene (the default) to keep them together visually
4. Aggregate pane shows counts by status.
5. Mousing over a reaction shows details in bottom pane.

Notional GUI Example
Status | Gene | Substrate | Coupling | Reaction | Annotation
Unreviewed | T0059 | Ca2+ | SECONDARY | Ca+2[c] + H+[p] = Ca+2[p] + H+[c] | calcium/proton antiporter
Rejected | T3669 | phosphate | ATP | H2O + ATP + phosphate[p] = ADP + 2 phosphate[c] | phosphate transport atp-binding protein
Accepted | T0080 | Na+ | CHANNEL | Na+[p] = Na+[c] | sodium channel

Reviewing and Editing
Left-click on a row
  –Dialog box appears
May edit:
  –Function (name)
  –Energy coupling
May invoke Reaction Editor on reaction
May retract reaction
May update status

Transporter Status
Accepted:
  –Incorporate transporter into PGDB upon save
Rejected:
  –Discard transporter upon save
Unreviewed:
  –Initial value of status
  –Change to Accepted to preserve edits
Accept and Reject are undoable

Filtering and Sorting
Filtering excludes transporters from display:
  –Filter low- or high-confidence transporters
  –Filter by status
  –Filter by number of reactions per substrate
Sort transporters by:
  –Gene
  –Energy Coupling
  –Substrate number/name
  –Status (e.g., Accepted, Rejected)
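The table behaviour described on this slide can be pictured as a filter followed by a sort. The sketch below uses the three rows of the notional GUI example as data. The row dictionaries and the function name are hypothetical, and only the status filter is shown.

```python
# Rows taken from the notional GUI example.
ROWS = [
    {"status": "Unreviewed", "gene": "T0059", "substrate": "Ca2+", "coupling": "SECONDARY"},
    {"status": "Rejected", "gene": "T3669", "substrate": "phosphate", "coupling": "ATP"},
    {"status": "Accepted", "gene": "T0080", "substrate": "Na+", "coupling": "CHANNEL"},
]


def visible_rows(rows, hidden_statuses=(), sort_key="gene"):
    """Hide rows with the given statuses, then sort by one column."""
    shown = [row for row in rows if row["status"] not in hidden_statuses]
    return sorted(shown, key=lambda row: row[sort_key])
```

Sorting by gene is the default here because the GUI overview names it as the default sort.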

Group Operations
TIP permits en masse acceptance or rejection of remaining predictions being shown:
  –Edit / Accept all Unreviewed predictions being shown

Saving Your Work
The TIP has made in-memory modifications to the KB; nothing has been saved.
Exit / Save saves all predictions & edits.
Exit / Cancel reverts to most recent save.

Multisession Workflow
1. TIP remembers accepted predictions in the KB.
2. TIP remembers rejected transporters in a file under the organism directory.
3. To continue, re-run TIP and resume session.
4. If you don't resume (i.e., start from scratch):
  –Will not re-predict Accepted transporters
  –Will re-predict Rejected transporters
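One reading of the resume behaviour, sketched as a selection function. It assumes that resuming a session honours the stored rejections, while starting from scratch ignores them. The slide states the from-scratch case explicitly; the resume case is an inference. All names are hypothetical.

```python
def proteins_to_predict(proteins, accepted, rejected, resume):
    """Return the proteins TIP would run prediction on in a new session."""
    # Accepted predictions live in the KB and are never re-predicted.
    skip = set(accepted)
    if resume:
        # Assumed: a resumed session also skips previously rejected transporters.
        skip |= set(rejected)
    return [p for p in proteins if p not in skip]
```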