Published by Madeline Williams. Modified over 10 years ago.
1
Lessons for the Computational Discovery of Scientific Knowledge
Pat Langley
Institute for the Study of Learning and Expertise, Palo Alto, California
and Center for the Study of Language and Information, Stanford University, Stanford, California
http://www.isle.org/~langley
langley@isle.org
Thanks to S. Bay, V. Brooks, S. Klooster, A. Pohorille, C. Potter, K. Saito, J. Shrager, M. Schwabacher, and A. Torregrosa.
2
Outline of the Talk
1. History of machine learning applications
2. Traditional lessons from applied machine learning
3. History of computational scientific discovery
4. Two application efforts in scientific discovery
5. Lessons from these application efforts
6. Directions for future research
3
History of Machine Learning Applications
- Early 1980s: D. Michie et al. champion use of decision-tree induction on industrial problems.
- During 1980s: Parallel application developments in neural networks and case-based learning.
- Early 1990s: Initial reviews of machine learning applications.
- Mid 1993: First workshops on applications of machine learning.
- Mid 1995: CACM paper analyzes factors underlying success.
- Mid 1995: KDD conference becomes the default meeting for papers on machine learning applications.
- Early 1998: Special issue of Machine Learning, with editorial, on applications.
4
Steps in the Application of Machine Learning
- Formulating the Problem
- Engineering the Representation
- Collecting and Preparing Data
- Induction Process
- Evaluating the Learned Knowledge
- Gaining User Acceptance
5
Areas of Machine Learning Applications
There exist a number of application movements within the field of machine learning:
- data mining for classification/regression tasks
- empirical natural language processing
- applied reinforcement learning
- adaptive interfaces for personalized services
- computational scientific discovery
These types of applications differ in the demands they make and in the issues they raise.
6
Data Mining vs. Scientific Discovery
There exist two computational paradigms for discovering explicit knowledge from data:
- Data mining generates knowledge cast as decision trees, logical rules, or other notations invented by AI researchers;
- Computational scientific discovery instead uses equations, structural models, reaction pathways, or other formalisms invented by scientists and engineers.
Both approaches draw on heuristic search to find regularities in data, but they differ considerably in their emphases.
7
History of Research on Computational Scientific Discovery
[Figure: timeline from 1979 to 2000 of computational discovery systems, with a legend distinguishing four kinds of discovered knowledge: numeric laws, qualitative laws, structural models, and process models. Systems shown: Bacon.1–Bacon.5; Abacus, Coper; Fahrenheit, E*, Tetrad, IDS N; Hume, ARC; DST, GP N; Lagrange; SDS; SSF, RF5, Lagramge; Dalton, Stahl; RL, Progol; Gell-Mann; BR-3, Mendel; Pauli; Stahlp, Revolver; Dendral; AM; Glauber; NGlauber; IDS Q, Live; IE; Coast, Phineas, AbE, Kekada; Mechem, CDP; Astra, GP M; HR; BR-4. The position of each system on the timeline and its category are not recoverable from the extracted text.]
8
Successes of Computational Scientific Discovery
Over the past decade, systems of this type have helped discover new knowledge in many scientific fields:
- stellar taxonomies from infrared spectra (Cheeseman et al., 1989)
- qualitative chemical factors in mutagenesis (King et al., 1996)
- quantitative laws of metallic behavior (Sleeman et al., 1997)
- qualitative conjectures in number theory (Colton et al., 2000)
- temporal laws of ecological behavior (Todorovski et al., 2000)
- reaction pathways in catalytic chemistry (Valdes-Perez, 1994, 1997)
Each of these has led to publications in the refereed literature of the relevant scientific field (see Langley, 2000).
9
Steps in Applying Computational Scientific Discovery
- problem formulation
- representation engineering
- data collection/manipulation
- algorithm manipulation
- algorithm invocation
- filtering and interpretation
10
Two Applications for Scientific Discovery
Given: data on climate variables and carbon production over space and time.
Find: a model of the Earth's ecosystem that fits and explains these data.

Given: gene expression levels, over time, for wild and mutant organisms.
Find: a model of gene regulation that fits and explains these data.
11
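Lesson 1
Traditional notations from machine learning are not communicated easily to domain scientists.

Ecosystem model:
$$
\begin{aligned}
\mathrm{NPPc} &= \sum_{\text{month}} \max(E \cdot \mathrm{IPAR},\ 0) \\
E &= 0.56 \cdot T_1 \cdot T_2 \cdot W \\
T_1 &= 0.8 + 0.02 \cdot \mathrm{Topt} - 0.0005 \cdot \mathrm{Topt}^2 \\
T_2 &= 1.18 / \left[ \left(1 + e^{0.2 \cdot (\mathrm{Topt} - \mathrm{Tempc} - 10)}\right) \cdot \left(1 + e^{0.3 \cdot (\mathrm{Tempc} - \mathrm{Topt} - 10)}\right) \right] \\
W &= 0.5 + 0.5 \cdot \mathrm{EET} / \mathrm{PET} \\
\mathrm{PET} &= \begin{cases} 1.6 \cdot (10 \cdot \mathrm{Tempc} / \mathrm{AHI})^{A} \cdot \text{PET-TW-M} & \text{if } \mathrm{Tempc} > 0 \\ 0 & \text{if } \mathrm{Tempc} < 0 \end{cases} \\
A &= 0.00000068 \cdot \mathrm{AHI}^3 - 0.000077 \cdot \mathrm{AHI}^2 + 0.018 \cdot \mathrm{AHI} + 0.49 \\
\mathrm{IPAR} &= 0.5 \cdot \text{FPAR-FAS} \cdot \text{Monthly-Solar} \cdot \text{Sol-Conver} \\
\text{FPAR-FAS} &= \min\left[ (\text{SR-FAS} - 1.08) / \mathrm{SR}(\text{UMD-VEG}),\ 0.95 \right] \\
\text{SR-FAS} &= (\text{Mon-FAS-NDVI} + 1000) / (\text{Mon-FAS-NDVI} - 1000)
\end{aligned}
$$

Gene regulation model:
[Figure: signed causal graph over Light, DFR, NBLA, NBLR, RR, Photo, PBS, Health, psbA1, psbA2, and cpcB, with links labeled + or -. The individual links are not recoverable from the extracted text.]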
12
Lesson 2
Scientists often have initial models that should influence the discovery process.
[Figure: an initial model of gene regulation and a set of observations feed into a discovery step, which outputs a revised model. Both models are signed causal graphs over Light, DFR, NBLA, NBLR, RR, Photo, PBS, Health, psbA1, psbA2, and cpcB; the revised model carries × marks on some links and a different set of + and - labels.]
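The idea of letting an initial model constrain discovery can be illustrated with a very small numeric case. The sketch below is not the revision method used in the talk; it assumes a model of the form y = c * x whose coefficient c has an initial value supplied by a scientist, and it only replaces that value when a least-squares refit lowers the squared error on the observations. The function names and the data points are hypothetical.

```python
def sse(c, xs, ys):
    """Sum of squared errors of the model y = c * x on the observations."""
    return sum((c * x - y) ** 2 for x, y in zip(xs, ys))


def refit_scale(xs, ys):
    """Closed-form least-squares estimate of c in y = c * x."""
    denom = sum(x * x for x in xs)
    if denom == 0:
        raise ValueError("cannot refit: all x values are zero")
    return sum(x * y for x, y in zip(xs, ys)) / denom


def revise_parameter(initial_c, xs, ys):
    """Start from the scientist's value; accept the refit only if it fits better."""
    candidate = refit_scale(xs, ys)
    if sse(candidate, xs, ys) < sse(initial_c, xs, ys):
        return candidate
    return initial_c


if __name__ == "__main__":
    # Made-up observations, for illustration only.
    xs = [1.0, 2.0, 3.0]
    ys = [0.4, 0.8, 1.2]
    print(revise_parameter(0.56, xs, ys))
```

The revised model keeps the structure of the initial one and changes a single constant, so a domain scientist can see exactly what the data altered.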
13
Lesson 3
Scientific data are often rare and difficult to obtain rather than being plentiful.
[Table: size of each modeling problem.
Ecosystem model: number of variables, number of equations, number of parameters, number of samples.
Gene regulation model: number of variables, number of initial links, number of possible links, number of samples.
Values as extracted: 8 11 20 303 911 70 7020. Their assignment to the rows above is not recoverable from the extracted text.]
14
Lesson 4
Scientists want models that move beyond description to provide explanations of their data.
[Figure: the two models drawn as graphs. Ecosystem model: dependency graph over NPPc, IPAR, PET, T1, T2, W, e_max, E, EET, Tempc, Topt, NDVI, SOLAR, AHI, A, PETTWM, SR, FPAR, and VEG. Gene regulation model: signed causal graph over Light, DFR, NBLA, NBLR, RR, Photo, PBS, Health, psbA1, psbA2, and cpcB.]
15
Lesson 5
Scientists want computational assistance rather than automated discovery systems.
[Figure: an initial model of gene regulation and a set of observations feed into a discovery step, which outputs a revised model. Both models are signed causal graphs over Light, DFR, NBLA, NBLR, RR, Photo, PBS, Health, psbA1, psbA2, and cpcB; the revised model carries × marks on some links and a different set of + and - labels.]
16
An Environment for Interactive Modeling
In response, we are developing an environment that lets users:
- specify process models of static and dynamic systems;
- display and edit a model's structure and details graphically;
- utilize a model to simulate a system's behavior over time;
- incorporate background knowledge cast as generic processes;
- indicate which processes to consider during model revision;
- invoke a revision module that improves a model's fit to data.
The current environment focuses on quantitative processes, but future versions will also support qualitative models.
17
A Process Model for Carbon Production
model npp;
variables NPPc, E, IPAR, T1, T2, W, Topt, tempc, eet, PET, PETTWM, ahi, A, FPARFAS, monthlySolar, SolConver, MONFASNDVI, umd_veg;
observable ahi, eet, tempc, Topt, MONFASNDVI, monthlySolar, PETTWM, umd_veg;
process CarbonProd;
  equations NPPc = E * IPAR;
process PhotoEfficiency;
  equations E = (0.389 * (T1 * (T2 * W)));
process TempStress1;
  equations T1 = (0.8 + ((0.02 * Topt) - (0.0005 * (Topt ^ 2))));
process TempStress2;
  equations T2 = ((1.1814 / (1 + (2.718281828 ^ (0.2 * (Topt - 10 - tempc))))) / (1 + (2.718281828 ^ (0.3 * (tempc - 10 - Topt)))));
process WaterStress;
  conditions PET!=0;
  equations W = (0.5 + (0.5 * (eet / PET)));
process WSNoEvapoTrans;
  conditions PET==0;
  equations W = 0.5;
process EvapoTrans;
  conditions tempc>0;
  equations PET = 1.6 * (10 * tempc / ahi) ^ A * PETTWM;
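To make the conditional structure of this process model concrete, here is a minimal Python sketch that evaluates the processes shown on the slide for a single set of inputs. It is an illustration, not the environment's simulator. Several things are assumptions: the Python function names are hypothetical, `math.exp` stands in for `2.718281828 ^ x`, IPAR and A are taken as inputs because the slide shows no processes for them, and PET is set to 0 when `tempc <= 0` because the slide lists no process for that case.

```python
import math


def temp_stress1(topt):
    # process TempStress1
    return 0.8 + 0.02 * topt - 0.0005 * topt ** 2


def temp_stress2(topt, tempc):
    # process TempStress2; math.exp replaces 2.718281828 ^ x
    low = 1 + math.exp(0.2 * (topt - 10 - tempc))
    high = 1 + math.exp(0.3 * (tempc - 10 - topt))
    return (1.1814 / low) / high


def evapo_trans(tempc, ahi, a, pettwm):
    # process EvapoTrans, active only when tempc > 0
    if tempc > 0:
        return 1.6 * (10 * tempc / ahi) ** a * pettwm
    # Assumption: no process is shown for tempc <= 0, so PET is taken as 0.
    return 0.0


def water_stress(eet, pet):
    # process WaterStress when PET != 0, else process WSNoEvapoTrans
    if pet != 0:
        return 0.5 + 0.5 * (eet / pet)
    return 0.5


def npp(topt, tempc, eet, ahi, a, pettwm, ipar):
    """Chain the processes: PET -> W, T1, T2 -> E -> NPPc."""
    pet = evapo_trans(tempc, ahi, a, pettwm)
    w = water_stress(eet, pet)
    e = 0.389 * temp_stress1(topt) * temp_stress2(topt, tempc) * w  # PhotoEfficiency
    return e * ipar  # CarbonProd


if __name__ == "__main__":
    # Made-up input values, for illustration only.
    print(npp(topt=20.0, tempc=15.0, eet=40.0, ahi=50.0, a=1.2, pettwm=30.0, ipar=100.0))
```

Each `if` mirrors a `conditions` clause, which is how the model selects between WaterStress and WSNoEvapoTrans depending on whether PET is zero.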
18
Viewing and Editing a Process Model
19
Directions for Future Research
These lessons suggest the field needs increased research on:
- methods for discovering knowledge in scientific formalisms
- techniques for revising existing scientific models
- approaches to dealing with small data sets
- algorithms for discovering explanatory models
- interactive environments for scientific knowledge discovery
Taken together, these emphases should address the needs of domain scientists and produce interesting new methods.
20
In Memoriam
Early last year, computational scientific discovery lost two of its founding fathers:
- Herbert A. Simon (1916 – 2001)
- Jan M. Zytkow (1945 – 2001)
Both contributed to the field in many ways: posing new problems, inventing methods, training students, and organizing meetings.
Moreover, both were interdisciplinary researchers who contributed to computer science, psychology, philosophy, and statistics.
Herb Simon and Jan Zytkow were excellent role models that we should all aim to emulate.
22
The NPPc Portion of CASA
$$
\begin{aligned}
\mathrm{NPPc} &= \sum_{\text{month}} \max(E \cdot \mathrm{IPAR},\ 0) \\
E &= 0.56 \cdot T_1 \cdot T_2 \cdot W \\
T_1 &= 0.8 + 0.02 \cdot \mathrm{Topt} - 0.0005 \cdot \mathrm{Topt}^2 \\
T_2 &= 1.18 / \left[ \left(1 + e^{0.2 \cdot (\mathrm{Topt} - \mathrm{Tempc} - 10)}\right) \cdot \left(1 + e^{0.3 \cdot (\mathrm{Tempc} - \mathrm{Topt} - 10)}\right) \right] \\
W &= 0.5 + 0.5 \cdot \mathrm{EET} / \mathrm{PET} \\
\mathrm{PET} &= \begin{cases} 1.6 \cdot (10 \cdot \mathrm{Tempc} / \mathrm{AHI})^{A} \cdot \text{PET-TW-M} & \text{if } \mathrm{Tempc} > 0 \\ 0 & \text{if } \mathrm{Tempc} < 0 \end{cases} \\
A &= 0.00000068 \cdot \mathrm{AHI}^3 - 0.000077 \cdot \mathrm{AHI}^2 + 0.018 \cdot \mathrm{AHI} + 0.49 \\
\mathrm{IPAR} &= 0.5 \cdot \text{FPAR-FAS} \cdot \text{Monthly-Solar} \cdot \text{Sol-Conver} \\
\text{FPAR-FAS} &= \min\left[ (\text{SR-FAS} - 1.08) / \mathrm{SR}(\text{UMD-VEG}),\ 0.95 \right] \\
\text{SR-FAS} &= (\text{Mon-FAS-NDVI} + 1000) / (\text{Mon-FAS-NDVI} - 1000)
\end{aligned}
$$
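As a worked check on how the stress terms combine, the following evaluates the photosynthetic efficiency E at one illustrative point. The input values (Topt = Tempc = 20 and EET / PET = 0.5) are assumptions chosen for easy arithmetic, not data from the talk.

```latex
\begin{aligned}
T_1 &= 0.8 + 0.02 \cdot 20 - 0.0005 \cdot 20^2 = 0.8 + 0.4 - 0.2 = 1.0 \\
T_2 &= \frac{1.18}{\left(1 + e^{0.2 \cdot (20 - 20 - 10)}\right)\left(1 + e^{0.3 \cdot (20 - 20 - 10)}\right)}
     = \frac{1.18}{(1 + e^{-2})(1 + e^{-3})} \approx \frac{1.18}{1.1353 \cdot 1.0498} \approx 0.990 \\
W   &= 0.5 + 0.5 \cdot 0.5 = 0.75 \\
E   &= 0.56 \cdot T_1 \cdot T_2 \cdot W \approx 0.56 \cdot 1.0 \cdot 0.990 \cdot 0.75 \approx 0.416
\end{aligned}
```

With temperature at its optimum, both temperature stress terms are close to 1, so the water stress term W accounts for most of the reduction of E below the 0.56 coefficient.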
23
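The NPPc Portion of CASA
[Figure: dependency graph of the NPPc model over the variables NPPc, IPAR, PET, T1, T2, W, e_max, E, EET, Tempc, Topt, NDVI, SOLAR, AHI, A, PETTWM, SR, FPAR, and VEG. The individual links are not recoverable from the extracted text.]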
24
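A Model of Photosynthesis Regulation
How do plants modify their photosynthetic apparatus in high light?
[Figure: signed causal graph over Light, DFR, NBLA, NBLR, RR, Photo, PBS, Health, psbA1, psbA2, and cpcB, with links labeled + or -. The individual links are not recoverable from the extracted text.]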