Pathway analysis Daniel Hurley

Pathway analysis: summary
A popular buzzword… but what does it mean?
How do you do it?
How do you interpret the results?

First

What we mean by ‘pathway analysis’
A ‘pathway’ implies causation, but don’t be fooled!
Most ‘pathway analysis’ actually identifies groups of functionally similar transcripts.
Louis’ example: (

What we mean by ‘pathway analysis’
A useful paper…
But the conclusion is: lots of tools, some quite different approaches!

What we mean by ‘pathway analysis’
Pathway analysis tools like GATHER, DAVID, and GeneSetDB typically rely on enrichment analysis.
This set of techniques asks the question: ‘of this set of genes, how many share a particular function, and is that more than we would expect by chance?’
Example: the top 200 most differentially expressed genes by some ranking (e.g. adjusted p-value).
‘By chance’ is usually assessed with a statistical test such as the hypergeometric (Fisher’s exact) test, or with a permutation (Monte Carlo) approach.
Other ‘pathway analyses’ involve signatures of groups of transcripts (e.g. using Principal Component Analysis).
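To make the ‘more than we would expect by chance’ question concrete, here is a minimal sketch in Python of one enrichment calculation. It is not the method of any particular tool named above. It assumes a simple over-representation setup in which the background ‘universe’ of genes, the set annotated with a function, and the list of interest are all known. The function names and the toy gene identifiers are made up for illustration. The first function computes the exact hypergeometric tail probability; the second estimates the same quantity by drawing random gene lists (the Monte Carlo approach).

```python
import random
from math import comb


def hypergeom_enrichment_p(n_universe, n_annotated, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random
    from n_universe genes, n_annotated of which carry the function
    (upper tail of the hypergeometric distribution)."""
    total = comb(n_universe, n_selected)
    upper = min(n_annotated, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def permutation_enrichment_p(universe, annotated, selected, n_perm=10000, seed=0):
    """Monte Carlo estimate of the same probability: draw random gene lists
    of the same size and count how often they overlap the annotated set at
    least as much as the observed list does."""
    rng = random.Random(seed)
    annotated = set(annotated)
    observed = len(annotated & set(selected))
    universe = list(universe)
    hits = 0
    for _ in range(n_perm):
        draw = rng.sample(universe, len(selected))
        if sum(gene in annotated for gene in draw) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: 1000 hypothetical genes, 50 annotated with some function,
    # and a list of 200 genes of interest containing 20 of the annotated ones.
    universe = [f"gene{i}" for i in range(1000)]
    annotated = universe[:50]
    selected = universe[:20] + universe[500:680]
    print(hypergeom_enrichment_p(1000, 50, 200, 20))
    print(permutation_enrichment_p(universe, annotated, selected))
```

The two numbers should be close. The permutation version is slower, but the same idea carries over to statistics that have no closed-form distribution.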

But what do we mean by a ‘function’?
Lots of things:
Protein function
Hypothetical protein function
Chromosomal location
Metabolic pathway association
Disease association

Daunting

The key point

What we mean by ‘pathway analysis’
Pathway analysis can identify common features present in a group of transcripts.
What the output means depends on the specific biology under study.
There is really no such thing as a ‘general’ pathway analysis.
A good place to start is by finding papers relevant to the specific biology.

What can you do with it?
Some answers:
Get a general picture of the active functions in a condition (vs. control)
Investigate whether a particular function is active in a condition
Differentiate conditions by their active functions
Investigate the functions associated with a particular gene
Identify conditions with similar functions

Next

Pathway analysis: how you do it
Begin with a list of transcripts of interest.

Pathway analysis: how you do it
Choose a web-based tool: GATHER, DAVID and GeneSetDB are good ones to start with.
But Pathguide.org has 325 pathway links at last count.

Pathway analysis: how you do it
Enter the list of transcripts: with most tools, you will either paste in gene names or identifiers, or upload a file.
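Pasted lists often carry blank lines, stray commas, and repeated identifiers, so it can help to tidy the list before submitting it. The following is a small sketch, not a requirement of any of the tools above. The helper name `clean_gene_list` is hypothetical and the gene symbols are illustrative only.

```python
def clean_gene_list(raw_text):
    """Split pasted text into identifiers, dropping blanks and duplicates
    while keeping the original order."""
    seen = set()
    cleaned = []
    # Treat commas like whitespace so both separators are accepted.
    for token in raw_text.replace(",", " ").split():
        if token not in seen:
            seen.add(token)
            cleaned.append(token)
    return cleaned


if __name__ == "__main__":
    pasted = "TP53\nBRCA1, TP53\n\n EGFR "
    # One identifier per line is a format most paste boxes accept.
    print("\n".join(clean_gene_list(pasted)))
```

The comparison is case-sensitive on purpose, since identifier conventions differ between species and databases.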

Finally

Pathway analysis: interpreting results
Basic tools will produce ranked lists of the most ‘enriched’ categories:
(Screenshot: GATHER)
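A ranked list like this usually tests many categories at once, so some small p-values will appear by chance alone. As a hedged illustration, the sketch below ranks categories by p-value and adds a Benjamini-Hochberg adjusted value. This is one common correction and is an assumption here, not necessarily what any given tool reports. The category names and p-values are invented for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def rank_categories(category_p_values):
    """Turn {category: p-value} into rows of (category, p, adjusted p),
    sorted with the most enriched category first."""
    names = list(category_p_values)
    adjusted = benjamini_hochberg([category_p_values[n] for n in names])
    rows = [(n, category_p_values[n], q) for n, q in zip(names, adjusted)]
    return sorted(rows, key=lambda row: row[1])


if __name__ == "__main__":
    # Hypothetical categories and raw p-values.
    results = {
        "apoptosis": 0.01,
        "cell cycle": 0.04,
        "lipid metabolism": 0.03,
        "immune response": 0.005,
    }
    for name, p, q in rank_categories(results):
        print(f"{name}\t{p:.3g}\t{q:.3g}")
```

Reading the adjusted column rather than the raw one gives a more conservative picture of which categories stand out.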

Pathway analysis: interpreting results
More sophisticated ones will produce ‘network’ diagrams.
(Screenshots: DAVID, Ingenuity Pathways Analysis)
But the interpretation of these is rather subjective.

Summary
Pathway analysis should probably be called information enrichment analysis, which is a more accurate term.
Used prudently, it is a useful tool for exploring the functional landscape of an experiment.
To make it meaningful, you need to interpret the results in the context of the specific biology under study.
There are a lot of web-based tools; start with one which is current and produces a result you value.
To start, you need a set of transcripts of interest.
To present the results, you can use a simple table, or a more complex ‘network’ diagram.
Risk: false positives are very difficult to identify, and with enough data you can link any molecular species to any other.

FIN
Any questions?