KDD Cup Task 2 Mark Craven Department of Biostatistics & Medical Informatics Department of Computer Sciences University of Wisconsin

Slides:



Advertisements
Similar presentations
Global POP An investigation of environmental pollutants in fish worldwide Version April 2007 International school campaign.
Advertisements

Molecular Systems Biology 3; Article number 140; doi: /msb
© 2002 The MITRE Corporation. ALL RIGHTS RESERVED. Co-Chair: Alexander Yeh, MITRE Corp. Data: FlyBase ( July 2002 KDD Cup 2002 Task1:
Comparison of Data Mining Algorithms on Bioinformatics Dataset Melissa K. Carroll Advisor: Sung-Hyuk Cha March 4, 2003.
Prof. Carolina Ruiz Computer Science Department Bioinformatics and Computational Biology Program WPI WELCOME TO BCB4003/CS4803 BCB503/CS583 BIOLOGICAL.
D ISCOVERING REGULATORY AND SIGNALLING CIRCUITS IN MOLECULAR INTERACTION NETWORK Ideker Bioinformatics 2002 Presented by: Omrit Zemach April Seminar.
Global Mapping of the Yeast Genetic Interaction Network Tong et. al, Science, Feb 2004 Presented by Bowen Cui.
KDD-2001 Cup The Genomics Challenge Christos Hatzis, Silico Insights David Page, University of Wisconsin Co-chairs August 26, 2001 Special thanks: DuPont.
. Inferring Subnetworks from Perturbed Expression Profiles D. Pe’er A. Regev G. Elidan N. Friedman.
Course Code Course Title Credits BMS 500 Responsible Conduct of Research 1 BMS 510F Biochemistry and Cell Biology 10 BMS 512 Critical Thinking 1 BMS 523A.
© 2003 The MITRE Corporation. ALL RIGHTS RESERVED. MITRE Critical Assessment of Information Extraction Systems in Biology (BioCreAtIvE) Marc Colosimo Lynette.
CSE 591 (99689) Application of AI to molecular Biology (5:15 – 6: 30 PM, PSA 309) Instructor: Chitta Baral Office hours: Tuesday 2 to 5 PM.
27803::Systems Biology1CBS, Department of Systems Biology Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break.
Introduction to Genomics, Bioinformatics & Proteomics Brian Rybarczyk, PhD PMABS Department of Biology University of North Carolina Chapel Hill.
Novel Modulators of Ah Receptor Signaling
27803::Systems Biology1CBS, Department of Systems Biology Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break.
Computational Approaches in Epigenomics Guo-Cheng Yuan Department of Biostatistics and Computational Biology Dana-Farber Cancer Institute Harvard School.
Internet tools for genomic analysis: part 2
1 Intro & materials. 2 Overview Monday –MA experimental basic –MA data analysis –Introduction to lab 1 –lab 1 Tuesday –Introduction to lab 2 –lab 2 Bio-Informatic.
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
Genetics: From Genes to Genomes
GCB/CIS 535 Microarray Topics John Tobias November 15 th, 2004.
Epistasis Analysis Using Microarrays Chris Workman.
Pathway Informatics 6 th July, 2015 Ansuman Chattopadhyay, PhD Head, Molecular Biology Information Services Health Sciences Library System University of.
---- Mark Borodovsky a short intro Position open: Scientist - Pathway Informatics (June 2009) THE POSITION The successful candidate will join the Computational.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
Inferring Cellular Networks Using Probabilistic Graphical Models Jianlin Cheng, PhD University of Missouri 2009.
Three Challenges in Data Mining Anne Denton Department of Computer Science NDSU.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 Extreme Re-balancing for SVMs: a case study Advisor :
Knowledgebase Creation & Systems Biology: A new prospect in discovery informatics S.Shriram, Siri Technologies (Cytogenomics), Bangalore S.Shriram, Siri.
3. Vertical Data LECTURE 3 Section 3 T he DataMIME™ System (DataMIME tm = Data Mining, NO NOISE) Internet DII (Data Integration Interface) Data Integration.
EGAN: Exploratory Gene Association Networks by Jesse Paquette Biostatistics and Computational Biology Core Helen Diller Family Comprehensive Cancer Center.
JM - 1 Introduction to Bioinformatics: Lecture I An Overview of the Course Jarek Meller Jarek Meller Division of Biomedical Informatics,
Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.
PattArAn – From Annotation Triplets to Sentence Fingerprints Motivation Motivation  Scientific concepts are annotated with controlled vocabulary (CV)
Finish up array applications Move on to proteomics Protein microarrays.
Kansas State University Department of Computing and Information Sciences Kansas State University KDD Lab ( cDNA.
1 Bio-Trac 40 (Protein Bioinformatics) October 8, 2009 Zhang-Zhi Hu, M.D. Associate Professor Department of Oncology Department of Biochemistry and Molecular.
Identifying Candidate Pathways to Explain Phenotypes in Genome-Wide Mutant Screens Mark Craven Department of Biostatistics & Medical Informatics University.
Anti-Learning Adam Kowalczyk Statistical Machine Learning NICTA, Canberra 1 National ICT Australia Limited is funded and.
Ptree * -based Approach to Mining Gene Expression Data Fei Pan 1, Xin Hu 2, William Perrizo 1 1. Dept. Computer Science, 2. Dept. Pharmaceutical Science,
Do Now What Is Genetic Engineering? When have you heard this term before? Why do you think it is important? Do Now Objectives Unit Literacy Videos Focus.
Modeling of complex systems: what is relevant? Arno Knobbe, Marvin Meeng, Joost Kok Leiden Institute of Advanced Computer Science (LIACS)
Genomics for Librarians Stuart M. Brown, Ph.D. Director, Research Computing, NYU School of Medicine.
A Biology Primer Part IV: Gene networks and systems biology Vasileios Hatzivassiloglou University of Texas at Dallas.
Bioinformatics Core Facility Guglielmo Roma January 2011.
Systems Biology ___ Toward System-level Understanding of Biological Systems Hou-Haifeng.
Biological Signal Detection for Protein Function Prediction Investigators: Yang Dai Prime Grant Support: NSF Problem Statement and Motivation Technical.
Bioinformatics The application of computer technology to the management of biological information
Other biological databases and ontologies. Biological systems Taxonomic data Literature Protein folding and 3D structure Small molecules Pathways and.
AdvancedBioinformatics Biostatistics & Medical Informatics 776 Computer Sciences 776 Spring 2002 Mark Craven Dept. of Biostatistics & Medical Informatics.
Parsing A Bacterial Genome Mark Craven Department of Biostatistics & Medical Informatics University of Wisconsin U.S.A.
Nuria Lopez-Bigas Methods and tools in functional genomics (microarrays) BCO17.
By: Amira Djebbari and John Quackenbush BMC Systems Biology 2008, 2: 57 Presented by: Garron Wright April 20, 2009 CSCE 582.
While gene expression data is widely available describing mRNA levels in different cancer cells lines, the molecular regulatory mechanisms responsible.
Bioinformatics and Computational Biology
Extracting binary signals from microarray time-course data Debashis Sahoo 1, David L. Dill 2, Rob Tibshirani 3 and Sylvia K. Plevritis 4 1 Department of.
Introduction to biological molecular networks
Annotating Gene List From Literature Xin He Department of Computer Science UIUC.
Development of a Signaling Pathway Map for the FXM Gil Sambrano, Lily Jiang, Madhu Natarajan, Alex Gilman, Adam Arkin University of California San Francisco,
Biological Networks. Can a biologist fix a radio? Lazebnik, Cancer Cell, 2002.
1 Genomics Advances in 1990 ’ s Gene –Expressed sequence tag (EST) –Sequence database Information –Public accessible –Browser-based, user-friendly bioinformatics.
© Telstra Corporation Limited 2002 Research Laboratories KDD Cup 2002: Single Class SVM for Yeast Gene Regulation Prediction Adam Kowalczyk Bhavani Raskutti.
Computer Science and Engineering PhD in Computer Science Monday, November 07, :00 a.m. – 11:00 a.m. Swearingen Conference Room 3A75 Network Based.
More on HMMs and Multiple Sequence Alignment BMI/CS 776 Mark Craven March 2002.
BIO409/509 Cell and Molecular Biology. SECOND Methods paper assignment due Wed., 4/20 (you don’t do this assignment if you are in the 4H STEM Ambassador.
 Facilities Open House Functional Genomics Facility Molishree Joshi, Ph.D. 6/1/2015 Contact Information:
Visualization of Content Information in Networks using GlyphNet
Regulation of the Immune Response by the Aryl Hydrocarbon Receptor
Anti-Learning Adam Kowalczyk Statistical Machine Learning
Presentation transcript:

KDD Cup Task 2 Mark Craven Department of Biostatistics & Medical Informatics Department of Computer Sciences University of Wisconsin

Task Motivation molecular biology has entered a new era in which experimentation can be done in a high-throughput manner –microarrays can simultaneously measure the “activity” of thousands of genes under some set of conditions –yeast deletion arrays can measure the activity of some “reporter” system when each of ~5k genes is knocked out key problem: it is difficult for biologists to assimilate and interpret thousands of measurements per experiment

The Problem Domain: Characterizing the Regulatome of the AHR Signaling Pathway experimental data kindly provided by Guang Yao and Prof. Chris Bradfield McArdle Laboratory for Cancer Research University of Wisconsin the Aryl Hydrocarbon Receptor (AHR) is a member of the protein family that mediates the biological response to dioxin, hypoxia, circadian rhythm, etc. focus of project: determine which proteins affect the activity of AHR

The AHR Signaling Pathway NUCLEUS CYTOSOL Hsp90 AHR AHR ligand (e.g.dioxin) ARNT DRE TNGCGTG ARNT AHR Gene Expression (P450 etc.) AHR ARNT AHR ARA9 when a cell is exposed to say, dioxin, AHR acts to turn on/off various genes experiment motivation: which proteins (gene products) in the cell regulate how AHR does this?

Characterizing the Regulatome of the AHR Signaling Pathway a high-throughput experiment using the Yeast Deletion Array (~5k strains of yeast, each with a specified gene knocked out) for each strain –insert a specially engineered AHR gene –insert a “reporter” system that is activated by AHR signaling –prod the AHR signaling pathway with a dose of agonist –see if the reporter lights up result: we can see which genes encode proteins that affect AHR signaling

The KDD Cup Task key computational task : help annotate/explain the results of the experiment, using available data sources a proxy task for KDD Cup: develop models that can predict the experimental result for a given gene from available data sources rationale: –annotation/explanation task not amenable to objective evaluation –prediction task, like annotation/explanation task, involves eliciting patterns from available data that explain why individual genes behave as they do in the experiment

The KDD Cup Task given: data describing a gene –hierarchical (functional/localization annotation) –relational (protein-protein interactions) –text (scientific abstracts from MEDLINE) do: predict if knocking out the gene will have a significant effect on AHR signaling

Characteristics of the Problem rich data sources much missing data –function/localization annotations –protein-protein interactions –abstracts few positive instances (127 pos, 4380 neg) very “disjunctive”

Task Evaluation evaluated as a two-class problem –positive: knockout has significant effect on AHR signaling but two different definitions of positive class –narrow: knockout has an AHR-specific effect –broad: knockout also affects a control pathway the scoring metric was the sum of the area under the ROC curve (AROC) for the two class partitions

AROC Scores for All Teams

Task 2 Winning Teams winner  Adam Kowalczyk and Bhavani Raskutti Telstra Research Laboratories honorable mention  David Vogel and Randy Axelrod A.I. Insight Inc. and Sentara Healthcare  Marcus Denecke, Mark-A. Krogel, Marco Landwehr and Tobias Scheffer Magdeburg University  George Forman Hewlett Packard Labs  Amal Perera, Bill Jockheck, Willy Valdivia Granda, Anne Denton, Pratap Kotala and William Perrizo North Dakota State University

Current and Future Activity figure out what lessons have been learned –value of text? –which algorithms learned most accurate models? –etc. determine if learned models can provide insight into the domain write articles (task overview, descriptions of winning teams’ methods) for SIGKDD Explorations maintain public access to data set (do Google search on KDD Cup)

Acknowledgements the experimental data was generated by Guang Yao and Prof. Chris Bradfield McArdle Laboratory for Cancer Research University of Wisconsin