Pat Langley Institute for the Study of Learning and Expertise Palo Alto, California and Center for the Study of Language and Information Stanford University,

Presentation transcript:

Lessons for the Computational Discovery of Scientific Knowledge
Pat Langley, Institute for the Study of Learning and Expertise, Palo Alto, California, and Center for the Study of Language and Information, Stanford University, Stanford, California
Thanks to S. Bay, V. Brooks, S. Klooster, A. Pohorille, C. Potter, K. Saito, J. Shrager, M. Schwabacher, and A. Torregrosa.

Outline of the Talk
1. History of machine learning applications
2. Traditional lessons from applied machine learning
3. History of computational scientific discovery
4. Two application efforts in scientific discovery
5. Lessons from these application efforts
6. Directions for future research

History of Machine Learning Applications
- Early 1980s: D. Michie et al. champion use of decision-tree induction on industrial problems.
- During 1980s: Parallel application developments in neural networks and case-based learning.
- Early 1990s: Initial reviews of machine learning applications.
- Mid 1993: First workshops on applications of machine learning.
- Mid 1995: CACM paper analyzes factors underlying success.
- Mid 1995: KDD conference becomes the default meeting for papers on machine learning applications.
- Early 1998: Special issue of Machine Learning, with editorial, on applications.

Steps in the Application of Machine Learning
[Diagram labels: Formulating the Problem; Engineering the Representation; Collecting and Preparing Data; Evaluating the Learned Knowledge; Gaining User Acceptance; Induction Process.]

Areas of Machine Learning Applications
There exist a number of application movements within the field of machine learning:
- data mining for classification/regression tasks
- empirical natural language processing
- applied reinforcement learning
- adaptive interfaces for personalized services
- computational scientific discovery
These types of applications differ in the demands they make and in the issues they raise.

Data Mining vs. Scientific Discovery
There exist two computational paradigms for discovering explicit knowledge from data:
- Data mining generates knowledge cast as decision trees, logical rules, or other notations invented by AI researchers;
- Computational scientific discovery instead uses equations, structural models, reaction pathways, or other formalisms invented by scientists and engineers.
Both approaches draw on heuristic search to find regularities in data, but they differ considerably in their emphases.

History of Research on Computational Scientific Discovery
[Timeline figure. Systems shown: Bacon.1–Bacon.5, Abacus, Coper, Fahrenheit, E*, Tetrad, IDS N, Hume, ARC, DST, GP N, LaGrange, SDS, SSF, RF5, LaGramge, Dalton, Stahl, RL, Progol, Gell-Mann, BR-3, Mendel, Pauli, Stahlp, Revolver, Dendral, AM, Glauber, NGlauber, IDS Q, Live, IE, Coast, Phineas, AbE, Kekada, Mechem, CDP, Astra, GP M, HR, BR-4. Legend: numeric laws, qualitative laws, structural models, process models.]

Successes of Computational Scientific Discovery
Over the past decade, systems of this type have helped discover new knowledge in many scientific fields:
- stellar taxonomies from infrared spectra (Cheeseman et al., 1989)
- qualitative chemical factors in mutagenesis (King et al., 1996)
- quantitative laws of metallic behavior (Sleeman et al., 1997)
- qualitative conjectures in number theory (Colton et al., 2000)
- temporal laws of ecological behavior (Todorovski et al., 2000)
- reaction pathways in catalytic chemistry (Valdes-Perez, 1994, 1997)
Each of these has led to publications in the refereed literature of the relevant scientific field (see Langley, 2000).

Steps in Applying Computational Scientific Discovery
[Diagram labels: problem formulation; representation engineering; data collection/manipulation; algorithm manipulation; filtering and interpretation; algorithm invocation.]

Two Applications for Scientific Discovery
- Given: Data on climate variables and carbon production over space and time. Find: A model of the Earth's ecosystem that fits and explains these data.
- Given: Gene expression levels, over time, for wild and mutant organisms. Find: A model of gene regulation that fits and explains these data.

Lesson 1
Traditional notations from machine learning are not communicated easily to domain scientists.
Ecosystem model:
NPPc = Σ_month max(E · IPAR, 0)
E = 0.56 · T1 · T2 · W
T1 = […] · Topt – […] · Topt^2
T2 = 1.18 / [(1 + e^(0.2 · (Topt – Tempc – 10))) · (1 + e^(0.3 · (Tempc – Topt – 10)))]
W = […] · EET / PET
PET = 1.6 · (10 · Tempc / AHI)^A · PET-TW-M if Tempc > 0
PET = 0 if Tempc < 0
A = […] · AHI^3 – […] · AHI […] · AHI
IPAR = 0.5 · FPAR-FAS · Monthly-Solar · Sol-Conver
FPAR-FAS = min [(SR-FAS – 1.08) / SR(UMD-VEG), 0.95]
SR-FAS = (Mon-FAS-NDVI […]) / (Mon-FAS-NDVI – 1000)
Gene regulation model:
[Network figure; nodes: DFR, NBLA, NBLR, RR, Photo, PBS, Health, psbA1, psbA2, cpcB, Light.]

Lesson 2
Scientists often have initial models that should influence the discovery process.
[Figure labels: Initial model, Observations, Discovery, Revised model. Both models are drawn as gene regulation networks over the nodes DFR, NBLA, NBLR, RR, Photo, PBS, Health, psbA1, psbA2, cpcB, and Light; the revised model carries × marks.]

Lesson 3
Scientific data are often rare and difficult to obtain rather than being plentiful.
[Table; values not preserved in this transcript. Ecosystem model: number of variables, number of equations, number of parameters, number of samples. Gene regulation model: number of variables, number of initial links, number of possible links, number of samples.]

Lesson 4
Scientists want models that move beyond description to provide explanations of their data.
[Figure: ecosystem model, nodes NPPc, IPAR, PET, T1, T2, W, e_max, E, EET, Tempc, Topt, NDVI, SOLAR, AHI, A, PETTWM, SR, FPAR, VEG. Gene regulation model, nodes DFR, NBLA, NBLR, RR, Photo, PBS, Health, psbA1, psbA2, cpcB, Light.]

Lesson 5
Scientists want computational assistance rather than automated discovery systems.
[Figure labels: Initial model, Observations, Discovery, Revised model. Both models are drawn as gene regulation networks over the nodes DFR, NBLA, NBLR, RR, Photo, PBS, Health, psbA1, psbA2, cpcB, and Light; the revised model carries × marks.]

An Environment for Interactive Modeling
In response, we are developing an environment that lets users:
- specify process models of static and dynamic systems;
- display and edit a model's structure and details graphically;
- utilize a model to simulate a system's behavior over time;
- incorporate background knowledge cast as generic processes;
- indicate which processes to consider during model revision;
- invoke a revision module that improves a model's fit to data.
The current environment focuses on quantitative processes, but future versions will also support qualitative models.
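The slides do not say how the revision module works. As one possible reading of "improves a model's fit to data", the sketch below re-estimates a single multiplicative coefficient by ordinary least squares. The function names and the numbers are hypothetical and serve only as illustration; they do not come from the talk.

```python
from typing import Sequence


def sum_squared_error(c: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared error of the one-parameter model y = c * x."""
    return sum((y - c * x) ** 2 for x, y in zip(xs, ys))


def revise_coefficient(xs: Sequence[float], ys: Sequence[float], initial: float) -> float:
    """Return the least-squares value of c in y = c * x.

    Falls back to the initial coefficient when the inputs carry no
    signal (all x are zero), so the starting model is left unchanged.
    """
    denom = sum(x * x for x in xs)
    if denom == 0.0:
        return initial
    return sum(x * y for x, y in zip(xs, ys)) / denom


# Made-up observations, for illustration only.
inputs = [1.0, 2.0, 3.0]
observed = [0.4, 0.8, 1.2]
initial_c = 0.56  # coefficient taken from a hypothetical initial model
revised_c = revise_coefficient(inputs, observed, initial_c)
```

Starting from the initial value and falling back to it when the data are uninformative mirrors the point of Lesson 2: the scientist's existing model shapes the result rather than being discarded.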

A Process Model for Carbon Production
model npp;
variables NPPc, E, IPAR, T1, T2, W, Topt, tempc, eet, PET, PETTWM, ahi, A, FPARFAS, monthlySolar, SolConver, MONFASNDVI, umd_veg;
observable ahi, eet, tempc, Topt, MONFASNDVI, monthlySolar, PETTWM, umd_veg;
process CarbonProd;
  equations NPPc = E * IPAR;
process PhotoEfficiency;
  equations E = (0.389 * (T1 * (T2 * W)));
process TempStress1;
  equations T1 = (0.8 + ((0.02 * Topt) - ( * (Topt ^ 2))));
process TempStress2;
  equations T2 = (( / (1 + ( ^ (0.2 * (Topt tempc))))) / (1 + ( ^ (0.3 * (tempc Topt)))));
process WaterStress;
  conditions PET!=0;
  equations W = (0.5 + (0.5 * (eet / PET)));
process WSNoEvapoTrans;
  conditions PET==0;
  equations W = 0.5;
process EvapoTrans;
  conditions tempc>0;
  equations PET = 1.6 * (10 * tempc / ahi) ^ A * PETTWM;
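To make the role of the "conditions" clauses concrete, here is a minimal Python sketch of how conditional processes like these might be evaluated. It covers only the processes whose equations are complete in the listing above (WaterStress, WSNoEvapoTrans, PhotoEfficiency, CarbonProd). The Process class and the evaluate function are hypothetical names, not part of the environment described in the talk, and the sketch assumes processes are listed so that each one's inputs are already available.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Values = Dict[str, float]


@dataclass
class Process:
    name: str
    condition: Callable[[Values], bool]  # when the process is active
    target: str                          # variable the equation defines
    equation: Callable[[Values], float]


# Equations and conditions copied from the listing; the ordering is an assumption.
PROCESSES: List[Process] = [
    Process("WaterStress", lambda v: v["PET"] != 0, "W",
            lambda v: 0.5 + 0.5 * (v["eet"] / v["PET"])),
    Process("WSNoEvapoTrans", lambda v: v["PET"] == 0, "W",
            lambda v: 0.5),
    Process("PhotoEfficiency", lambda v: True, "E",
            lambda v: 0.389 * (v["T1"] * (v["T2"] * v["W"]))),
    Process("CarbonProd", lambda v: True, "NPPc",
            lambda v: v["E"] * v["IPAR"]),
]


def evaluate(processes: List[Process], values: Values) -> Values:
    """Apply each active process in order and return the extended values."""
    result = dict(values)
    for process in processes:
        if process.condition(result):
            result[process.target] = process.equation(result)
    return result


# Illustrative inputs only; T1, T2, IPAR, and PET are supplied directly here.
dry = evaluate(PROCESSES, {"PET": 0.0, "eet": 0.0, "T1": 1.0, "T2": 1.0, "IPAR": 2.0})
wet = evaluate(PROCESSES, {"PET": 4.0, "eet": 2.0, "T1": 1.0, "T2": 1.0, "IPAR": 2.0})
```

Because the two water-stress processes have mutually exclusive conditions, exactly one of them defines W for any input, which is what lets the model avoid dividing by zero when PET is 0.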

Viewing and Editing a Process Model

Directions for Future Research
These lessons suggest the field needs increased research on:
- methods for discovering knowledge in scientific formalisms
- techniques for revising existing scientific models
- approaches to dealing with small data sets
- algorithms for discovering explanatory models
- interactive environments for scientific knowledge discovery
Taken together, these emphases should address the needs of domain scientists and produce interesting new methods.

In Memoriam
Early last year, computational scientific discovery lost two of its founding fathers:
- Herbert A. Simon (1916 – 2001)
- Jan M. Zytkow (1945 – 2001)
Both contributed to the field in many ways: posing new problems, inventing methods, training students, and organizing meetings. Moreover, both were interdisciplinary researchers who contributed to computer science, psychology, philosophy, and statistics. Herb Simon and Jan Zytkow were excellent role models that we should all aim to emulate.

The NPPc Portion of CASA
NPPc = Σ_month max(E · IPAR, 0)
E = 0.56 · T1 · T2 · W
T1 = […] · Topt – […] · Topt^2
T2 = 1.18 / [(1 + e^(0.2 · (Topt – Tempc – 10))) · (1 + e^(0.3 · (Tempc – Topt – 10)))]
W = […] · EET / PET
PET = 1.6 · (10 · Tempc / AHI)^A · PET-TW-M if Tempc > 0
PET = 0 if Tempc < 0
A = […] · AHI^3 – […] · AHI […] · AHI
IPAR = 0.5 · FPAR-FAS · Monthly-Solar · Sol-Conver
FPAR-FAS = min [(SR-FAS – 1.08) / SR(UMD-VEG), 0.95]
SR-FAS = (Mon-FAS-NDVI […]) / (Mon-FAS-NDVI – 1000)
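As a rough illustration of how these equations compose, the Python sketch below evaluates only the ones whose constants survive intact in this transcript (T2, E, PET, IPAR, and NPPc). Two assumptions are made: NPPc is read as a sum over months of max(E · IPAR, 0), and PET is set to 0 at exactly Tempc = 0, a case the slide does not state. T1, W, and A are taken as inputs because their coefficients are incomplete here. Function names are illustrative.

```python
import math
from typing import Sequence


def temp_stress_2(topt: float, tempc: float) -> float:
    """T2 = 1.18 / [(1 + e^(0.2(Topt - Tempc - 10))) * (1 + e^(0.3(Tempc - Topt - 10)))]."""
    return 1.18 / (
        (1.0 + math.exp(0.2 * (topt - tempc - 10.0)))
        * (1.0 + math.exp(0.3 * (tempc - topt - 10.0)))
    )


def pet(tempc: float, ahi: float, a: float, pet_tw_m: float) -> float:
    """PET = 1.6 * (10 * Tempc / AHI)^A * PET-TW-M if Tempc > 0, else 0.

    Treating Tempc == 0 as 0 is an assumption; the slide only covers < 0 and > 0.
    """
    if tempc <= 0.0:
        return 0.0
    return 1.6 * (10.0 * tempc / ahi) ** a * pet_tw_m


def photo_efficiency(t1: float, t2: float, w: float) -> float:
    """E = 0.56 * T1 * T2 * W, with T1 and W supplied by the caller."""
    return 0.56 * t1 * t2 * w


def ipar(fpar_fas: float, monthly_solar: float, sol_conver: float) -> float:
    """IPAR = 0.5 * FPAR-FAS * Monthly-Solar * Sol-Conver."""
    return 0.5 * fpar_fas * monthly_solar * sol_conver


def nppc(monthly_e: Sequence[float], monthly_ipar: Sequence[float]) -> float:
    """Sum over months of max(E * IPAR, 0); the summation is an assumed reading."""
    return sum(max(e * i, 0.0) for e, i in zip(monthly_e, monthly_ipar))
```

For example, `nppc([photo_efficiency(1.0, temp_stress_2(20.0, 20.0), 1.0)], [ipar(0.5, 2.0, 1.0)])` gives one month's contribution when the temperature equals the optimum; the input values are made up.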

The NPPc Portion of CASA
[Network figure; nodes: NPPc, IPAR, PET, T1, T2, W, e_max, E, EET, Tempc, Topt, NDVI, SOLAR, AHI, A, PETTWM, SR, FPAR, VEG.]

A Model of Photosynthesis Regulation
How do plants modify their photosynthetic apparatus in high light?
[Network figure; nodes: DFR, NBLA, NBLR, RR, Photo, PBS, Health, psbA1, psbA2, cpcB, Light.]