Published by Todd O’Connor. Modified over 9 years ago.
1
Improving PPI Networks with Correlated Gene Expression Data Jesse Walsh
2
Outline
Background
Goal
Introduce previous author
– Their hypothesis in 30 words or less
– Why does my analysis still have merit?
– Benefits of having something to compare results to
The data
– DIP
– M3D
Methods
– Gene Expression Distance
– Choice of expression data: Reference Experiment
– DIP data
  Multiple sets [Small scale, High Throughput, Random]
  Necessary to convert these protein IDs into gene IDs
  Blast sequences against the genome
  Problem: many proteins demerged. *Example*
  Mapped what was left
– Analysis
  Plotted expression distance of remaining interactions
  – Plots do not match expectations
  – What happened?
  Strongly biased removal of demerged proteins left me with mostly self-interactions
  This issue is not addressed in the reference paper
Results
– Well... none?
– Original questions: inappropriate
– What this analysis would tell me is:
Future Work
– Better data?
– Cluster?
3
Background
PPI networks are currently derived either computationally or experimentally
It is well known that computationally derived networks contain a great number of false positives and false negatives
High-throughput
– Two major published yeast PPI experiments showed only ≈150 interactions in common out of thousands [1]
4
Goal: Improve the Data Quality
The goal of this study is to improve the quality of computationally predicted protein-protein interactions
Hypothesis:
– Proteins that interact may also have similar expression patterns
– Gene coexpression is correlated with PPIs
5
Previous Work
Deane et al. (2002) [2]
– Proposed the EPR metric: use gene expression profiles to assess the quality of computationally predicted protein-protein interactions
Figure adapted from Deane et al. [2]
6
Glimpse at the Data
Interaction Data: DIP (Database of Interacting Proteins) [3]
http://dip.doe-mbi.ucla.edu/dip/Main.cgi
DIP dataset statistics:
                 Genome Size        Interactions   Proteins
E. coli (2008)   4.6 million bp     7447           1863
Yeast (2001)     --                 8063           4150
Yeast (2008)     12.5 million bp    18440          4943
Affymetrix Expression Data: M3D (Many Microbe Microarrays Database) [4]
http://m3d.bu.edu/cgi-bin/web/array/index.pl?section=home
– Number of Experiments: 466
– Number of Chips: 907
– Genes: 4298
7
Expression Data Selection
Concerned about complications from adding too many expression conditions
– Knockouts, over-expression, foreign genes
Selected a group of 20 conditions that were published as part of the same experiment
– Hope for more homogeneous data
– Allen et al. [5]
8
Method: Gene Expression Distance
Expression distance given by:
[Equation: expression distance, from Deane et al. [2]]
– Summed over 20 conditions
– Treated the first condition (wild-type anaerobic) as the reference condition for lack of a true control
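The slide's equation is not preserved here, so the following is only a minimal sketch of a distance of this kind. It assumes a squared Euclidean distance between log2 expression ratios, each taken relative to the reference condition, and it assumes strictly positive raw intensities. The function name, the log base, and the argument layout are illustrative choices, not taken from Deane et al. [2].

```python
import math


def expression_distance_sq(profile_a, profile_b, ref_index=0):
    """Squared distance between two genes' log2 expression ratios.

    Assumption: each profile is a list of strictly positive expression
    levels, one per condition, and ref_index marks the reference condition.
    """
    ref_a = profile_a[ref_index]
    ref_b = profile_b[ref_index]
    total = 0.0
    for i, (a, b) in enumerate(zip(profile_a, profile_b)):
        if i == ref_index:
            continue  # the reference ratio is 1 for both genes, so it adds nothing
        diff = math.log2(a / ref_a) - math.log2(b / ref_b)
        total += diff * diff
    return total
```

Under these assumptions, two genes whose expression changes by the same fold in every condition have a distance of zero, regardless of their absolute expression levels.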
9
Method: DIP Data
DIP labels interactions as ‘core’ or ‘non-core’
– Corresponds roughly to small-scale experiments and high-throughput experiments
3 Interaction Datasets
– Core: core interactions
– Non: non-core interactions
– Rand: 100,000 randomly generated interactions
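How the random set was generated is not stated. One plausible sketch, assuming both partners are drawn uniformly and independently from the list of identifiers (so a protein can occasionally be paired with itself), is shown below; the function name and the seed parameter are hypothetical.

```python
import random


def random_interactions(protein_ids, n_pairs, seed=0):
    """Draw n_pairs random (a, b) pairs from protein_ids.

    Assumption: partners are sampled independently with replacement,
    so self-pairs (a == b) are possible but rare for large ID lists.
    """
    rng = random.Random(seed)  # fixed seed keeps the random set reproducible
    ids = list(protein_ids)
    return [(rng.choice(ids), rng.choice(ids)) for _ in range(n_pairs)]
```

With independent draws, the expected share of self-pairs is one over the number of identifiers, which stays small for a proteome-sized list.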
10
Method: Mapping
The DIP interaction set uses UniProt protein identifiers, while M3D uses gene IDs
Ran a BLAST of protein sequences against the translated E. coli genome to map the datasets together
Lost most of my data in this step
        Number of Interactions   Mapping Available
CORE    991                      220
NON     5999                     903
RAND    100,000
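The filtering that follows the mapping can be sketched as below. It assumes the BLAST results have already been reduced to a dictionary from protein identifier to gene ID; that dictionary, the function name, and the identifiers in the usage note are hypothetical. An interaction is kept only when both partners have a mapping.

```python
def map_interactions(interactions, protein_to_gene):
    """Translate protein-level pairs to gene-level pairs.

    Assumption: protein_to_gene holds one gene ID per mappable protein.
    Pairs with an unmapped partner are dropped.
    """
    mapped = []
    for prot_a, prot_b in interactions:
        gene_a = protein_to_gene.get(prot_a)
        gene_b = protein_to_gene.get(prot_b)
        if gene_a is not None and gene_b is not None:
            mapped.append((gene_a, gene_b))
    return mapped
```

For example, with `{"protA": "gene1", "protB": "gene2"}` as the mapping, the pair `("protA", "protB")` survives while `("protA", "protC")` is discarded, which is how a single unmappable identifier can remove many interactions at once.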
11
Results
Density distribution of squared distances
Bin size = 0.05
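A density distribution with a fixed bin size can be computed along these lines. This is a sketch only: it assumes non-negative squared distances and assumes "density" means counts normalized so that the bar areas sum to 1, which may differ from the normalization used for the plot. The function name is illustrative.

```python
from collections import Counter


def density_histogram(values, bin_size=0.05):
    """Return {bin_start: density} for fixed-width bins.

    Assumption: density = count / (n * bin_size), so the areas sum to 1.
    """
    n = len(values)
    if n == 0:
        return {}
    # Each value falls in the bin whose index is floor(value / bin_size).
    counts = Counter(int(v // bin_size) for v in values)
    return {
        round(k * bin_size, 10): c / (n * bin_size)
        for k, c in sorted(counts.items())
    }
```

Any interaction with a distance of exactly zero lands in the first bin, so a dataset containing many such pairs shows a spike at the left edge of the plot.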
12
Results from Deane et al. [2]
Figure adapted from Deane et al. [2]
Bin size = 1.25
13
Results from Deane et al. [2]
Figure adapted from Deane et al. [2]
Least Squares Factorization
14
Discussion
Mapping/Demerging problem
– Kept 1044 of my 6991 interactions (15%)
Case study: P22885
– Obsolete since 2005
– Demerged to P0A8P6 and P0A8P7
– Tyrosine recombinase xerC
– All three have a perfect ClustalW match
15
Discussion
Shape of curve
– Multimers and proteins that link to themselves in the PPI network (the zeros problem)
– Self-interactions make up 66.6% of Core, 43.6% of Non, <0.1% of Rand
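The size of the zeros problem in a dataset can be checked with a short sketch like the one below, assuming interactions are stored as pairs of gene IDs after mapping. A self-pair compares a gene's expression profile with itself, so its distance is always zero. The function name is illustrative.

```python
def self_interaction_fraction(interactions):
    """Fraction of pairs in which both partners are the same gene."""
    if not interactions:
        return 0.0
    n_self = sum(1 for a, b in interactions if a == b)
    return n_self / len(interactions)
```

Running this on each of the Core, Non, and Rand sets before plotting would flag a skewed dataset early; dropping the self-pairs first is one possible follow-up, though the slides do not say whether that was tried.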
16
Conclusion
Cannot predict novel interactions
Cannot assign confidence values to individual interactions
Can provide some measure of the overall quality of a PPI dataset
17
Thank You
References:
[1] Deeds EJ, Ashenberg O, Shakhnovich EI. “A simple physical model for scaling in protein-protein interaction networks.” Proc. Natl Acad. Sci. USA (2006) 103:311–316
[2] Charlotte M. Deane et al. “Protein Interactions: Two Methods for Assessment of the Reliability of High Throughput Observations.” Molecular & Cellular Proteomics 1.5 349-356
[3] Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D (2004) “The Database of Interacting Proteins: 2004 update.” NAR 32 Database issue:D449-51
[4] Faith JJ, Driscoll ME, Fusaro VA, Cosgrove EJ, Hayete B, Juhn FS, Schneider SJ, and Gardner TS. “Many Microbe Microarrays Database: uniformly normalized Affymetrix compendia with structured experimental metadata.” Nucleic Acids Research
[5] Timothy E. Allen et al. “Genome-Scale Analysis of the Uses of the Escherichia coli Genome: Model-Driven Analysis of Heterogeneous Data Sets.” J Bacteriol. 2003 November, 185(21): 6392-6399