Human Genetics Integrative Bioinformatics using Cytoscape (and R2)


Human Genetics (Bio)Chemistry versus Molecular Biology: some basic concepts
(Bio)Chemistry: concentrations; molecular structures; reaction equations; quantitative; defined experimental setup
Molecular Biology: regulation; large biomolecules; large-scale processes; qualitative; complex experimental setup (by necessity!)

Human Genetics Molecular Biology: New techniques
(Deep) sequencing, arrays, proteomics: Integrative Bioinformatics needed
Quantitative analysis
  – handling large datasets
  – statistics
Capturing complexity
  – integration
  – graphs
Integrative Bioinformatics: integrated bioinformaticians!

Human Genetics Integrative Bioinformatics: An example

Human Genetics Integrative Bioinformatics: What they did
1. Sequence genome; assign gene function using protein sequence and structural similarities (Bonneau et al., 2004; Ng et al., 2000)
2. Perturb cells: environmental factors; knockouts (Baliga et al., 2004; Kaur et al., 2006; Kottemann et al., 2005)
3. Measure changes: microarrays (Baliga et al., 2004; Kaur et al., 2006; Whitehead et al., 2006)
4. Integrate diverse data (mRNA levels, evolutionarily conserved associations among proteins, metabolic pathways, cis-regulatory motifs, etc.) with the cMonkey algorithm to reduce data complexity and identify subsets of genes that are coregulated in certain environments (biclusters) (Reiss et al., 2006)
5. Using the machine learning algorithm Inferelator, construct a dynamic network model for the influence of changes in environmental factors (EFs) and transcription factors (TFs) on the expression of coregulated genes (Bonneau et al., 2006)
6. Explore the network with Gaggle, a framework for data integration and software interoperability, to formulate and then experimentally test hypotheses that drive additional iterations of steps 2–6 (Shannon et al., 2006)

Human Genetics Integrative Bioinformatics: Their framework

Human Genetics Integrative Bioinformatics: results

Human Genetics Goes to show that:
1. Aggregate: combine data from different sources
2. Search/Visualize: filter
3. Analyze/Feedback: algorithms
Need for adaptable software
Goal: facilitate ideas

Human Genetics Cytoscape - Network Visualization and Analysis
Freely available (open-source, Java) software, easily extensible (plugin API)
Visualizing networks (e.g. molecular interaction networks)
Analyzing networks with gene expression profiles and other cell state data (GO, proteomics, …)
Used in several hundred analyses in recent literature
Continuity guaranteed

Human Genetics An example Cytoscape work-flow

Human Genetics Cytoscape Workflow
1. Load Networks (import network data into Cytoscape)
2. Load Attributes (get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication
A specific example of this workflow:
  – Cline, et al. “Integration of biological networks and gene expression data using Cytoscape”, Nature Protocols, 2, (2007).

Human Genetics Networks as graphs
A network is a collection of
  – Nodes (or vertices)
  – Edges connecting nodes (directed or undirected, weighted, multiple edges, self-edges)
Nodes can represent proteins, genes, metabolites, or groups of these (e.g. complexes): any sort of object
Edges can be physical or functional interactions, activators, regulators, reactions: any sort of relation
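The node and edge vocabulary above can be sketched as a small data structure. This is only an illustration in plain Python, not how Cytoscape stores networks internally; the class name `Network`, its methods, and the names `TF1` and `GeneA` are assumptions made for the example.

```python
from collections import defaultdict


class Network:
    """Tiny multigraph: edges carry an interaction type and a weight."""

    def __init__(self):
        self.nodes = set()
        self.edges = []                    # (source, target, interaction, weight)
        self.adjacency = defaultdict(set)  # node -> reachable neighbours

    def add_edge(self, source, target, interaction, weight=1.0, directed=False):
        self.nodes.update((source, target))
        self.edges.append((source, target, interaction, weight))
        self.adjacency[source].add(target)
        if not directed:
            # Undirected edges are reachable from both ends.
            self.adjacency[target].add(source)

    def degree(self, node):
        """Number of distinct neighbours reachable from node."""
        return len(self.adjacency[node])


net = Network()
net.add_edge("YDR382W", "YDL130W", "pp")                  # undirected physical interaction
net.add_edge("YDR382W", "YFL039C", "pp")
net.add_edge("TF1", "GeneA", "activates", directed=True)  # hypothetical regulatory edge
net.add_edge("GeneA", "GeneA", "self", directed=True)     # hypothetical self-edge
print(net.degree("YDR382W"), len(net.nodes), len(net.edges))
```

Keeping the edge list separate from the adjacency index allows multiple edges between the same pair of nodes, which the slide lists as a valid case.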

Human Genetics Cytoscape Workflow
1. Load Networks (get network data into Cytoscape)
2. Load Attributes (get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication

Human Genetics Creating a network

Human Genetics Free-format Text and Excel Files (import dialog: Specify Input File; Define Columns; Text Parsing Options; Preview)

Human Genetics Pathways: plenty of resources (over 240 pathway databases)

Human Genetics All kinds of network data…
Physical interactions
  – Protein–protein interactions
  – Protein–DNA interactions
  – Metabolic interactions
Functional interactions
  – Co-expression relations
  – Genetic interactions
  – Knockout/siRNA targets

Human Genetics Pre-formatted Network Files
Cytoscape supports many popular file formats:
  – SIF (Simple Interaction Format)
  – GML (Graph Markup Language)
  – XGMML (eXtensible Graph Markup and Modeling Language)
  – BioPAX (Biological Pathway Exchange)
  – PSI-MI 1 & 2.5 (Proteomics Standards Initiative, Molecular Interaction format)
  – SBML Level 2 (Systems Biology Markup Language)
Available for download from data sources (URLs, web services, formatted table files)
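As an illustration of the simplest of these formats, the sketch below parses SIF text, where each line reads `source interaction target [target ...]`. It is a minimal reader written for this example, not Cytoscape's own parser; the function name `parse_sif` is an assumption, and the sample lines reuse the yeast identifiers shown later in the deck.

```python
def parse_sif(text):
    """Parse SIF text into (nodes, edges); edges are (source, interaction, target)."""
    nodes, edges = set(), []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # SIF treats tabs as the delimiter if any are present, else whitespace.
        fields = line.split("\t") if "\t" in line else line.split()
        source = fields[0]
        nodes.add(source)                 # a line with one field is an isolated node
        if len(fields) >= 3:
            interaction = fields[1]
            for target in fields[2:]:     # one line may list several targets
                nodes.add(target)
                edges.append((source, interaction, target))
    return nodes, edges


SIF_TEXT = """\
YDR382W pp YDL130W
YDR382W pp YFL039C
YFL039C pp YCL040W YHR179W
"""

nodes, edges = parse_sif(SIF_TEXT)
print(len(nodes), "nodes,", len(edges), "edges")
```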

Human Genetics Internet Databases
Cytoscape version 2.6
  – Web service clients: import networks directly from several trusted internet resources
  – IntAct (EMBL-EBI)
  – Pathway Commons (collection of data resources)
  – NCBI Entrez Gene
  – Many more will be included...

Human Genetics Interaction Database Search Import Visualize and Analyze

Human Genetics Cytoscape Workflow
1. Load Networks (get network data into Cytoscape)
2. Load Attributes (get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication

Human Genetics What are Attributes?
Any data that describes or provides details about the nodes and edges in the network
  – Gene expression data
  – Mass spectrometry data
  – Protein structure information
  – Gene Ontology (GO) terms
  – Interaction confidence values, etc.
Cytoscape supports multiple data types
  – Numbers (integers, floats)
  – Text (strings)
  – Logical (booleans)
  – Lists…

Human Genetics Attribute Management (attribute browser: Node or Edge ID; Specific Attribute Tabs; Select Attributes for Display; string and floating-point attributes)

Human Genetics Load Attributes: Import Attribute Files
Map data about networks onto networks. Attributes can be loaded in many of the same ways as networks:
  – Import pre-formatted attribute files
  – Import formatted text or Excel files
  – Create attributes manually in the attribute editor
  – Load attributes from web services
  – ID mapping through node attributes

Human Genetics ID Mapping
Mapping identifiers from one source to another is a major challenge
Multiple levels of IDs, e.g. probe -> gene -> peptide -> protein
Cytoscape converts IDs through the BioMart web service of the EBI
Not perfect but sufficient
Additional mapping mechanism underway
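The multi-level mapping described above (probe to gene to protein) can be sketched as a chain of one-to-many lookup tables. This is a hedged illustration only: it does not call BioMart, and the function `map_ids`, the table names, and every identifier in them are hypothetical.

```python
def map_ids(ids, *tables):
    """Follow each ID through successive one-to-many tables.

    IDs that cannot be mapped at some level end up with an empty list.
    """
    result = {}
    for start in ids:
        current = {start}
        for table in tables:
            current = {target for key in current for target in table.get(key, ())}
        result[start] = sorted(current)
    return result


# Hypothetical lookup tables; real ones would come from an annotation service.
PROBE_TO_GENE = {
    "probe_1": ["GENE_A"],
    "probe_2": ["GENE_A", "GENE_B"],   # ambiguous probe
    "probe_3": [],                     # probe without gene annotation
}
GENE_TO_PROTEIN = {
    "GENE_A": ["PROT_A1", "PROT_A2"],  # one gene, two protein isoforms
    "GENE_B": ["PROT_B"],
}

mapped = map_ids(["probe_1", "probe_2", "probe_3", "probe_4"],
                 PROBE_TO_GENE, GENE_TO_PROTEIN)
for probe, proteins in mapped.items():
    print(probe, "->", proteins)
```

The empty lists for `probe_3` and `probe_4` show why such mappings are "not perfect": identifiers can be lost at any level of the chain.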

Human Genetics Cytoscape Workflow
1. Load Networks (get network data into Cytoscape)
2. Load Attributes (get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication

Human Genetics Visual Data Integration
1. Network Data
YDR382W pp YDL130W
YDR382W pp YFL039C
YFL039C pp YCL040W
YFL039C pp YHR179W
2. Attribute Data
ExpressionValue
YCL040W =
YDL130W =
YDR382W =
YFL039C =
YHR179W =
VizMapper
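A rough sketch of how the two inputs above could be joined: the attribute file has the attribute name on its first line and `node = value` on each following line. The expression values below are invented for illustration, because the numbers are not present in this transcript; `parse_node_attributes` is a hypothetical helper, not a Cytoscape function.

```python
def parse_node_attributes(text):
    """Parse an attribute file: name on the first line, then 'node = value' lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    name = lines[0].split()[0]
    values = {}
    for ln in lines[1:]:
        node_id, _, raw = ln.partition("=")
        values[node_id.strip()] = float(raw)
    return name, values


EDGES = [
    ("YDR382W", "pp", "YDL130W"),
    ("YDR382W", "pp", "YFL039C"),
    ("YFL039C", "pp", "YCL040W"),
    ("YFL039C", "pp", "YHR179W"),
]

# Hypothetical values: the slide's actual numbers are missing from the transcript.
ATTRIBUTE_TEXT = """\
ExpressionValue
YCL040W = -0.5
YDL130W = 1.2
YDR382W = 0.3
YFL039C = -1.8
YHR179W = 2.1
"""

attr_name, expression = parse_node_attributes(ATTRIBUTE_TEXT)
network_nodes = {node for source, _, target in EDGES for node in (source, target)}
missing = sorted(network_nodes - set(expression))   # nodes without a value
for node in sorted(network_nodes):
    print(node, attr_name, expression.get(node))
```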

Human Genetics VizMapper (screenshot labels: List of Data Attributes; Default Visual Style Editor; List of Visual Attributes; Mapping definition; List of Visual Styles)

Human Genetics Types of mappings
Continuous
  – Continuous data mapped to continuous visual attributes (e.g. gene expression levels mapped to node color)
  – Continuous data mapped to discrete visual attributes (e.g. p-value categories mapped to node shape)
Discrete
  – Discrete (categorical) data mapped to discrete visual attributes (e.g. GO annotation mapped to node shape)
  – Discrete data mapped to continuous visual attributes (e.g. multiple GO terms mapped to pie coloring)
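The mapping types above can be illustrated with three small functions. This is a sketch of the idea rather than VizMapper's implementation; the function names, the blue-to-red colour range, the 0.05 cutoff, and the shape names are all assumptions chosen for the example.

```python
def continuous_color(value, low, high, low_rgb=(0, 0, 255), high_rgb=(255, 0, 0)):
    """Continuous -> continuous: interpolate an RGB colour, clamping to [low, high]."""
    t = (min(max(value, low), high) - low) / (high - low)
    return tuple(round(a + t * (b - a)) for a, b in zip(low_rgb, high_rgb))


def binned_shape(p_value, cutoff=0.05):
    """Continuous -> discrete: p-value category mapped to a node shape."""
    return "diamond" if p_value < cutoff else "ellipse"


def discrete_shape(category, table, default="ellipse"):
    """Discrete -> discrete: look a category up in a user-defined table."""
    return table.get(category, default)


# Hypothetical GO-term-to-shape table.
GO_SHAPES = {"transcription": "triangle", "transport": "rectangle"}

print(continuous_color(-2.0, -2.0, 2.0))   # lowest expression: blue end of the range
print(continuous_color(5.0, -2.0, 2.0))    # out-of-range values are clamped
print(binned_shape(0.01), discrete_shape("transport", GO_SHAPES))
```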

Human Genetics Network Filtering

Human Genetics Several Layout Algorithms: spring-embedded, circular, hierarchical
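Of the layouts named above, the circular one is the easiest to sketch: nodes are spaced evenly on a circle. The function below is an illustrative assumption, not Cytoscape's layout code; `circular_layout` and its `radius` parameter are made-up names.

```python
import math


def circular_layout(nodes, radius=100.0):
    """Place nodes evenly on a circle; returns {node: (x, y)}."""
    ordered = sorted(nodes)
    count = len(ordered)
    return {
        node: (radius * math.cos(2 * math.pi * i / count),
               radius * math.sin(2 * math.pi * i / count))
        for i, node in enumerate(ordered)
    }


positions = circular_layout(["YDR382W", "YDL130W", "YFL039C", "YCL040W", "YHR179W"])
for node, (x, y) in positions.items():
    print(f"{node}: ({x:.1f}, {y:.1f})")
```

Spring-embedded and hierarchical layouts need iterative optimisation or layer assignment and are not covered by this sketch.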

Human Genetics Linkout
Nodes and edges act as hyperlinks to external databases
User-configurable URLs
Collection of the biological results for the publication

Human Genetics Cytoscape Workflow
1. Load Networks (get network data into Cytoscape)
2. Load Attributes (get data about networks into Cytoscape)
3. Analyze and Visualize Networks
4. Prepare for Publication

Human Genetics Prepare for Publication: fine-tune the figures
Manual layout manipulation options (align, scale, rotate)
Manually override visual styles
  – place labels, change colors, etc.

Human Genetics Finalizing the Figures
Publication-quality graphics in several formats
  – PDF, EPS, SVG, PNG, JPEG, and BMP
Export session to HTML for the web

Human Genetics Cytoscape: So what?
The big pro-Cytoscape argument: EXTENSIBLE
Plugins, plugins, plugins
  – In our case, they enabled extended array data analysis

Human Genetics Cytoscape is Extensible
Cytoscape is open-source and free software
A plugin interface allows any programmer to write their own extensions to Cytoscape
Plugins represent the primary biological analysis mechanism in Cytoscape
Plugins are distributed from a central Cytoscape database and can be installed while running
Several dozen plugins currently available

Human Genetics Hello World Plugin

Human Genetics Extending the workflow through plugins Graph based integration and analysis of molecular biological data

Human Genetics Integrative Bioinformatics in our group
Aggregate data: Affymetrix arrays
  – Tumor series
  – Public data
  – Experiments: manipulated cell lines; lentiviral library
Search/Visualize/Selection: R2
  – Statistical cutoffs
  – Correlations: R2
  – Clinical data coupling
Analysis/Feedback: R2 and Cytoscape
  – Known interactions
  – Transcription factor binding

Human Genetics Integrative Bioinformatics in our group (architecture diagram; components: External data sources; Statistical analysis; Perl module; Cytoscape webstart; AMC Plugin; Canonical paths DB; Patient data; GEO arrays; Algorithms; Array data: Tumor and Experiments; R2-array analysis interface; Cytoscape interface; HGServer)

Human Genetics Array data analysis: R2 Mainly work by Jan Koster

Human Genetics R2 interface: Demo

Human Genetics R2 interface

Human Genetics Timeseries in R2 / Cytoscape (Demo)

Human Genetics Timeseries in R2

Human Genetics Timeseries in R2 Integration with Cytoscape through webstart

Human Genetics Timeseries in Cytoscape: Visualization

Human Genetics Timeseries in Cytoscape: Aggregate data

Human Genetics Timeseries in Cytoscape: Search/Filter

Human Genetics Timeseries in Cytoscape: Filter

Human Genetics Timeseries in Cytoscape

Human Genetics TF (green) and partners (red)

Human Genetics Filtering

Human Genetics Coloring, layout

Human Genetics Summarizing:
1. Aggregate: combine NOTCH3 knockout data with TF and PPI data
2. Search/Visualize: lay out timeseries; find downstream targets
3. Analyze/Feedback: identify MSX1; knockout in new experiment

Human Genetics More Plugin Examples
  – BiNGO (enriched GO categories found in the sub-network)
  – WikiPathways (visualize curated pathways)
  – MCODE (putative protein complexes)
  – GenePro (protein-protein interaction cluster visualization)
  – jActiveModules (search for significant sub-networks)
  – NetworkAnalyzer (statistical analysis of networks)
  – Agilent Literature Search (network creation)
  – CyGoose (Gaggle communication)
See for many more
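To make "enriched GO categories" concrete, the sketch below computes a hypergeometric over-representation p-value, a test commonly used for this kind of analysis. It is a simplified assumption about the statistics involved, not BiNGO's code, and it omits multiple-testing correction; the function name and the counts in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a sub-network.

    N = genes in the reference set, K = reference genes carrying the GO term,
    n = genes in the sub-network, k = sub-network genes carrying the GO term.
    """
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical counts: 8 of 20 sub-network genes carry a term that annotates
# 100 of 6000 genes in the reference set.
p_value = hypergeom_enrichment_p(k=8, n=20, K=100, N=6000)
print(f"enrichment p-value: {p_value:.3g}")
```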

Human Genetics Timeseries and BiNGO: Aggregate

Human Genetics Timeseries and BiNGO: Analyze

Human Genetics Timeseries and BiNGO

Human Genetics GOlorize plug-in (Pasteur)
Node placement on the basis of both the connection structure (the edges) and the class structure (GO)
A modification of the classic force-directed layout algorithm
Beyond GO classes, other class information can be used through attributes (e.g. active modules, complexes)

Human Genetics GOlorize plug-in interface Default settings for the class attractive force and separation factor Class-directed network layout

Human Genetics Example: genetic interaction network Standard Spring-embedded layout algorithm in Cytoscape

Human Genetics Example: genetic interaction network Spring-embedded layout algorithm with GO colour-coding

Human Genetics Example: genetic interaction network Final results of the GOlorize layout algorithm in Cytoscape Garcia et al. Bioinformatics 2007

Human Genetics Find Network Clusters - MCODE Plugin
Network clusters are highly interconnected sub-networks that may also partly overlap
Clusters in a protein-protein interaction network have been shown to represent protein complexes and parts of biological pathways
Clusters in a protein similarity network represent protein families
Network clustering is available through the MCODE Cytoscape plugin
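One simple way to find "highly interconnected sub-networks" is the k-core: the largest subgraph in which every node has at least k neighbours. The sketch below shows only that building block; it is not the MCODE algorithm itself, and the helper names and protein labels `P1` to `P5` are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (a, b) pairs."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def k_core(adjacency, k):
    """Nodes of the k-core: repeatedly drop nodes with fewer than k live neighbours."""
    alive = set(adjacency)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            if len(adjacency[node] & alive) < k:
                alive.discard(node)
                changed = True
    return alive


# Hypothetical network: a fully connected complex P1..P4 plus a loosely attached P5.
EDGES = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
         ("P2", "P3"), ("P2", "P4"), ("P3", "P4"),
         ("P4", "P5")]

adjacency = build_adjacency(EDGES)
core3 = k_core(adjacency, 3)
print(sorted(core3))
```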

Human Genetics Network Clustering: 7000 yeast interactions among 3000 proteins

Human Genetics Bader & Hogue, BMC Bioinformatics (1):2

Human Genetics Figure labels: Proteasome 26S; Proteasome 20S; Ribosome; RNA Pol core; RNA Splicing. Bader & Hogue, BMC Bioinformatics (1):2

Human Genetics Find Network Motifs - NetMatch plugin
A network motif is a sub-network that occurs significantly more often than by chance alone
Input: query and target networks, optional node/edge labels
Output: topological query matches as subgraphs of the target network
Supports: subgraph matching, node/edge labels, label wildcards, approximate paths

Human Genetics Finding query sub-networks: Query; Results. Ferro et al. Bioinformatics 2007
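The query/target idea can be sketched with a brute-force matcher that tries every assignment of query nodes to target nodes. This is an illustration only: NetMatch uses a far more efficient algorithm and also supports labels, wildcards, and approximate paths, none of which appear here. The function name and the example networks are hypothetical.

```python
from itertools import permutations


def find_matches(query_edges, target_edges):
    """Brute-force directed subgraph matching.

    Returns a list of {query_node: target_node} maps under which every
    query edge is also an edge of the target network.
    """
    query_nodes = sorted({node for edge in query_edges for node in edge})
    target_nodes = sorted({node for edge in target_edges for node in edge})
    target_set = set(target_edges)
    matches = []
    for candidate in permutations(target_nodes, len(query_nodes)):
        mapping = dict(zip(query_nodes, candidate))
        if all((mapping[a], mapping[b]) in target_set for a, b in query_edges):
            matches.append(mapping)
    return matches


# Query: a feed-forward loop X -> Y -> Z with a direct X -> Z edge.
QUERY = [("X", "Y"), ("Y", "Z"), ("X", "Z")]
# Hypothetical target network.
TARGET = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]

matches = find_matches(QUERY, TARGET)
print(matches)
```

Deciding whether a match is a motif in the sense of the slide would additionally require comparing its frequency against randomised networks, which this sketch leaves out.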

Human Genetics Finding Signaling Pathways
Potential signaling pathways from plasma membrane to nucleus via cytoplasm
Signaling pathway example (MAP kinase cascade): Ras → Raf-1 → Mek → MAPK → TFs; nucleus: growth control, mitogenesis
NetMatch query: shortest path between subgraph matches
NetMatch results
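The "shortest path" part of such a query can be sketched with a breadth-first search over a directed network. The cascade members come from the slide; the extra `Adaptor` node and the function name are hypothetical, and this is not NetMatch's implementation.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Breadth-first search; returns the node list of a shortest path, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:      # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


# MAP kinase cascade from the slide, plus a hypothetical longer detour.
CASCADE = {
    "Ras": ["Raf-1", "Adaptor"],
    "Adaptor": ["Raf-1"],
    "Raf-1": ["Mek"],
    "Mek": ["MAPK"],
    "MAPK": ["TFs"],
}

path = shortest_path(CASCADE, "Ras", "TFs")
print(" -> ".join(path))
```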

Human Genetics Find Active Subnetworks
Active modules are sub-networks that show differential expression over user-specified conditions or time-points
  – Microarray gene-expression attributes
  – Mass-spectrometry protein abundance
Method
  – Calculate a z-score per node and an aggregate score z_A per subgraph; correct for random expression data sampling
  – Score over multiple experimental conditions
  – A simulated annealing-based search is used to find the high-scoring networks
Ideker T, Ozier O, Schwikowski B, Siegel AF. Bioinformatics. 2002;18 Suppl 1:S233-40

Human Genetics Finding active modules
Ideker T et al. Science 2001; Bioinformatics 2002
jActiveModules plug-in
Input: interaction network and p-values for gene expression values over several conditions
Output: significant sub-networks that show differential expression over one or several conditions
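The scoring step can be sketched as follows, assuming the usual formulation: each p-value is converted to a z-score, the z-scores of a subgraph's k nodes are summed and divided by sqrt(k), and the result is calibrated against randomly sampled node sets of the same size. This covers a single condition and omits the simulated annealing search; the function names, the sampling settings, and the p-values are assumptions for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def node_z(p_value):
    """Convert a p-value in (0, 1) to a z-score; small p gives large z."""
    return NormalDist().inv_cdf(1.0 - p_value)


def aggregate_z(nodes, z):
    """Aggregate score z_A of a node set: sum of z-scores over sqrt(k)."""
    return sum(z[node] for node in nodes) / sqrt(len(nodes))


def corrected_score(nodes, z, samples=1000, seed=0):
    """Calibrate z_A against random node sets of the same size."""
    rng = random.Random(seed)
    all_nodes = sorted(z)
    background = [aggregate_z(rng.sample(all_nodes, len(nodes)), z)
                  for _ in range(samples)]
    return (aggregate_z(nodes, z) - mean(background)) / stdev(background)


# Hypothetical expression p-values for ten genes.
P_VALUES = {"g1": 0.001, "g2": 0.002, "g3": 0.004, "g4": 0.3, "g5": 0.5,
            "g6": 0.6, "g7": 0.7, "g8": 0.8, "g9": 0.45, "g10": 0.9}
Z = {gene: node_z(p) for gene, p in P_VALUES.items()}

score = corrected_score(["g1", "g2", "g3"], Z)
print(f"corrected module score: {score:.2f}")
```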

Human Genetics Cerebral: Cellular location and expression data

Human Genetics Concluding
Cytoscape is a proven, valuable tool for integrative bioinformatics
Easily extensible: well suited to answering new biological research questions
Analyses can be tedious for biologists; it is up to bioinformaticians to translate these into simple workflows
Therefore: bioinformaticians, integrate into wet-lab research groups!

Human Genetics Some notes…
Plugin lifetime
  – Maintenance
  – Interoperability
Visualization issues…
  – Standard biologist layouts
  – Fancy visuals
Cytoscape 3.0 aims to solve these issues (amongst others)

Human Genetics Availability Cytoscape: – R2 –Available shortly through –Keep yourself posted on