Presentation is loading. Please wait.

Presentation is loading. Please wait.

Pat Langley Institute for the Study of Learning and Expertise Palo Alto, California and Center for the Study of Language and Information Stanford University,

Similar presentations


Presentation on theme: "Pat Langley Institute for the Study of Learning and Expertise Palo Alto, California and Center for the Study of Language and Information Stanford University,"— Presentation transcript:

1 Pat Langley Institute for the Study of Learning and Expertise Palo Alto, California and Center for the Study of Language and Information Stanford University, Stanford, California http://cll.stanford.edu/~langley langley@csli.stanford.edu Challenges in the Computational Discovery of Scientific Knowledge Thanks to K. Arrigo, S. Bay, L. Chrisman, D. George, A. Pohorille, C. Potter, J. Sanchez, K. Saito, and J. Shrager.

2 The Challenge of Systems Science focus on synthesis rather than analysis in their operation; focus on synthesis rather than analysis in their operation; rely on computer modeling as one of their central methods; rely on computer modeling as one of their central methods; develop system-level models with many variables and relations; develop system-level models with many variables and relations; evaluate their models on observational, not experimental, data. evaluate their models on observational, not experimental, data. Disciplines like Earth science and computational biology differ from traditional fields in that they: Developing and testing such models are complex tasks that would benefit from computational aids. Our research goal is to design, construct, evaluate, and understand such computational tools for systems science.

3 The Data Mining Paradigm emphasizes the availability of vast amounts of data; emphasizes the availability of vast amounts of data; focuses on business data, with some scientific applications; focuses on business data, with some scientific applications; uses formalisms like decision trees, association rules, and Bayesian networks to encode learned knowledge. uses formalisms like decision trees, association rules, and Bayesian networks to encode learned knowledge. One approach to computational discovery, known as data mining: I.e., data mining researchers favor their own formalisms over those used by scientists and engineers. As a result, their discoveries are seldom communicable to members of those communities.

4 Computational Scientific Discovery emphasizes use of heuristic search in the discovery process; emphasizes use of heuristic search in the discovery process; focuses on discovery of knowledge in scientific domains; focuses on discovery of knowledge in scientific domains; uses formalisms like numeric equations, structural models, and reaction pathways to describe regularities. uses formalisms like numeric equations, structural models, and reaction pathways to describe regularities. An older paradigm, computational scientific discovery, instead: I.e., researchers in this framework favor representations used by scientists and engineers. As a result, their systems discoveries are usually communicable to members of those communities.

5 Successes of Computational Scientific Discovery Over the past decade, systems of this type have helped discover new knowledge in many scientific fields: qualitative chemical factors in mutagenesis (King et al., 1996) qualitative chemical factors in mutagenesis (King et al., 1996) quantitative laws of metallic behavior (Sleeman et al., 1997) quantitative laws of metallic behavior (Sleeman et al., 1997) qualitative conjectures in number theory (Colton et al., 2000) qualitative conjectures in number theory (Colton et al., 2000) temporal laws of ecological behavior (Todorovski et al., 2000) temporal laws of ecological behavior (Todorovski et al., 2000) reaction pathways in catalytic chemistry (Valdes-Perez, 1994) reaction pathways in catalytic chemistry (Valdes-Perez, 1994) Each has led to publications in the refereed scientific literature (e.g., Langley, 2000), but they did not focus on systems science.

6 Two Discovery Problems in Systems Science Data on climate and organism variables over space and time An ecosystem model that fits these data and explains them Time-series data on gene expressions for specific organisms A model of gene regulation that fits and explains these data Given GivenFind Find These problems raise new challenges that require advances in our methods for computational discovery.

7 Challenge 1: Representing Scientific Models address observational rather than experimental data; address observational rather than experimental data; deal with dynamic systems that change over time; deal with dynamic systems that change over time; have an explanatory rather than a descriptive character; have an explanatory rather than a descriptive character; are causal in that they describe chains of effects; are causal in that they describe chains of effects; contain quantitative relations and qualitative structure. contain quantitative relations and qualitative structure. To assist system scientists modeling efforts, we must first encode candidate models that: We need some formal way to represent such models that can be interpreted computationally.

8 Why Are Existing Formalisms Inadequate? d[ice_mass,t] = (18 heat) / 6.02 d[water_mass,t] = (18 heat) / 6.02 systems of equations B>6 C>0 C>4 14.318.711.516.9 regression trees gcd(X,X,X). gcd(X,Y,D) :- X<Y,Z is Y–X,gcd(X,Z,D). gcd(X,Y,D) :- Y<X,gcd(Y,X,D). Horn clause programs x =12, x =1 x =12, x =1 y =18, x =2 y =18, x =2 x =12, x =1 x =12, x =1 y =10, x =2 y =10, x =2 x =16, x =2 x =16, x =2 y =13, x =1 y =13, x =1 x =19, x =1 x =19, x =1 y =11, x =2 y =11, x =2 0.3 0.7 1.0 1.0 hidden Markov models

9 A Process Model for an Aquatic Ecosystem model AquaticEcosystem variables: phyto, zoo, nitro, residue observables: phyto, nitro process phyto_exponential_decay equations:d[phyto,t,1] = 0.3 phyto equations:d[phyto,t,1] = 0.3 phyto d[residue,t,1] = 0.3 phyto process zoo_exponential_decay equations:d[zoo,t,1] = 0.3 zoo equations:d[zoo,t,1] = 0.3 zoo d[residue,t,1] = 0.3 zoo process zoo_phyto_predation equations:d[zoo,t,1] = 0.7 0.5 zoo equations:d[zoo,t,1] = 0.7 0.5 zoo d[residue,t,1] = 0.3 0.5 zoo d[phyto,t,1] = 0.5 zoo process nitro_uptake conditions:nitro > 0 conditions:nitro > 0 equations:d[phyto,t,1] = 0.4 phyto equations:d[phyto,t,1] = 0.4 phyto d[nitro,t,1] = 0.1 0.4 phyto process nitro_remineralization; equations:d[nitro,t,1] = 0.05 residue equations:d[nitro,t,1] = 0.05 residue d[residue,t,1 ] = 0.05 residue

10 Advantages of Quantitative Process Models they embed quantitative relations within qualitative structure; they embed quantitative relations within qualitative structure; that refer to notations and mechanisms familiar to scientists; that refer to notations and mechanisms familiar to scientists; they provide dynamical predictions of changes over time; they provide dynamical predictions of changes over time; they offer causal and explanatory accounts of phenomena; they offer causal and explanatory accounts of phenomena; while retaining the modularity needed to support induction. while retaining the modularity needed to support induction. Process models are a good target for discovery systems because: Quantitative process models provide an important alternative to formalisms used currently in computational discovery.

11 Challenge 2: Making Predictions from Models To utilize or evaluate a given process model, we must simulate its behavior over time: specify initial values for input variables and time step size; specify initial values for input variables and time step size; on each time step, determine which processes are active; on each time step, determine which processes are active; solve active algebraic/differential equations with known values; solve active algebraic/differential equations with known values; propagate values and recursively solve other active equations; propagate values and recursively solve other active equations; when multiple processes influence the same variable, assume their effects are additive. when multiple processes influence the same variable, assume their effects are additive. This performance method makes specific predictions that we can compare to observations.

12 Predictions from the Ecosystem Model

13 Challenge 3: Encoding Background Knowledge Horn clause programs (e.g., Towell & Shavlik, 1990) Horn clause programs (e.g., Towell & Shavlik, 1990) context-free grammars (e.g., Dzeroski & Todorovski, 1997) context-free grammars (e.g., Dzeroski & Todorovski, 1997) prior probability distributions (e.g., Friedman et al., 2000) prior probability distributions (e.g., Friedman et al., 2000) To constrain candidate models, we can utilize available backround knowledge about the domain. Previous work has cast background knowledge in terms of: However, none of these notations are familiar to domain scientists, which suggests the need for another approach.

14 Generic Processes as Background Knowledge the variables involved in a process and their types; the variables involved in a process and their types; the parameters appearing in a process and their ranges; the parameters appearing in a process and their ranges; the forms of conditions on the process; and the forms of conditions on the process; and the forms of associated equations and their parameters. the forms of associated equations and their parameters. Our framework casts background knowledge as generic processes that specify: Generic processes are building blocks from which one can compose a specific process model.

15 Generic Processes for Aquatic Ecosystems generic process exponential_decaygeneric process remineralization variables: S{species}, D{detritus} variables: N{nutrient}, D{detritus} variables: S{species}, D{detritus} variables: N{nutrient}, D{detritus} parameters: [0, 1] parameters: [0, 1] parameters: [0, 1] parameters: [0, 1] equations:d[S,t,1] = 1 S equations:d[N, t,1] = D equations:d[S,t,1] = 1 S equations:d[N, t,1] = D d[D,t,1] = Sd[D, t,1] = 1 D generic process predationgeneric process constant_inflow variables: S1{species}, S2{species}, D{detritus} variables: N{nutrient} variables: S1{species}, S2{species}, D{detritus} variables: N{nutrient} parameters: [0, 1], [0, 1] parameters: [0, 1] parameters: [0, 1], [0, 1] parameters: [0, 1] equations:d[S1,t,1] = S1 equations:d[N,t,1] = equations:d[S1,t,1] = S1 equations:d[N,t,1] = d[D,t,1] = (1 ) S1 d[S2,t,1] = 1 S1 generic process nutrient_uptake variables: S{species}, N{nutrient} variables: S{species}, N{nutrient} parameters: [0, ], [0, 1], [0, 1] parameters: [0, ], [0, 1], [0, 1] conditions:N > conditions:N > equations:d[S,t,1] = S equations:d[S,t,1] = S d[N,t,1] = 1 S

16 process exponential_growth variables: P {population} variables: P {population} equations: d[P,t] = [0, 1, ] P equations: d[P,t] = [0, 1, ] P process logistic_growth variables: P {population} variables: P {population} equations: d[P,t] = [0, 1, ] P (1 P / [0, 1, ]) equations: d[P,t] = [0, 1, ] P (1 P / [0, 1, ]) process constant_inflow variables: I {inorganic_nutrient} variables: I {inorganic_nutrient} equations: d[I,t] = [0, 1, ] equations: d[I,t] = [0, 1, ] process consumption variables: P1 {population}, P2 {population}, nutrient_P2 variables: P1 {population}, P2 {population}, nutrient_P2 equations: d[P1,t] = [0, 1, ] P1 nutrient_P2, equations: d[P1,t] = [0, 1, ] P1 nutrient_P2, d[P2,t] = [0, 1, ] P1 nutrient_P2 d[P2,t] = [0, 1, ] P1 nutrient_P2 process no_saturation variables: P {number}, nutrient_P {number} variables: P {number}, nutrient_P {number} equations: nutrient_P = P equations: nutrient_P = P process saturation variables: P {number}, nutrient_P {number} variables: P {number}, nutrient_P {number} equations: nutrient_P = P / (P + [0, 1, ]) equations: nutrient_P = P / (P + [0, 1, ]) Challenge 4: Inducing Process Models model AquaticEcosystem variables: nitro, phyto, zoo, nutrient_nitro, nutrient_phyto observables: nitro, phyto, zoo process phyto_exponential_growth equations: d[phyto,t] = 0.1 phyto equations: d[phyto,t] = 0.1 phyto process zoo_logistic_growth equations: d[zoo,t] = 0.1 zoo / (1 zoo / 1.5) equations: d[zoo,t] = 0.1 zoo / (1 zoo / 1.5) process phyto_nitro_consumption equations: d[nitro,t] = 1 phyto nutrient_nitro, equations: d[nitro,t] = 1 phyto nutrient_nitro, d[phyto,t] = 1 phyto nutrient_nitro d[phyto,t] = 1 phyto nutrient_nitro process phyto_nitro_no_saturation equations: nutrient_nitro = nitro equations: nutrient_nitro = nitro process zoo_phyto_consumption equations: d[phyto,t] = 1 zoo nutrient_phyto, equations: d[phyto,t] = 1 zoo nutrient_phyto, d[zoo,t] = 1 zoo nutrient_phyto d[zoo,t] = 1 zoo nutrient_phyto process zoo_phyto_saturation equations: nutrient_phyto = phyto / (phyto + 0.5) equations: nutrient_phyto = phyto / (phyto + 0.5) Induction training data generic processes process model

17 A Method for Process Model Induction 1. Find all ways to instantiate known generic processes with specific variables; 2. Combine subsets of instantiated processes into candidate generic models that specify explanatory structures; 2a. Limit the number of processes that can connect any two 2a. Limit the number of processes that can connect any two variables and the total number of processes; variables and the total number of processes; 2b. Limit the number of unobservable variables in the model; 2b. Limit the number of unobservable variables in the model; 3. For each generic model, carry out gradient descent search through parameter space to find good parameter values; 4. Return the parameterized model with the lowest mean error. The IPM algorithm that constructs process models from generic components in four stages:

18 Evaluation of the IPM Algorithm 1. We used the aquatic ecosystem model to generate data on 500 time steps for the variables nitro and phyto; 2. We replaced each true value x with x (1 + r 0.05), where r came from a Gaussian distribution ( = 0 and = 1); 3. We ran IPM on these noisy data, giving it type constraints and generic processes as background knowledge. To demonstrate IPM's ability to induce process models, we ran it on synthetic data for a known system. The IPM algorithm examined a space of 2212 generic models, each with an embedded parameter optimization.

19 Predictions from Induced Ecosystem Model

20 identifying conditions on processes (parameter optimization) identifying conditions on processes (parameter optimization) inferring initial values of unobservables (parameter optimization) inferring initial values of unobservables (parameter optimization) keeping the search space tractable (typing on variables) keeping the search space tractable (typing on variables) reducing variance to mitigate overfitting (min. desc. length) reducing variance to mitigate overfitting (min. desc. length) Process model induction raises a number of issues that have clear analogues in other paradigms: We have demonstrated promising responses to these four problems within the IPM framework. Robust Induction of Process Models

21 Best Model Fit to Actual Nitrate Data

22 Best Model Fit to Actual Phytoplankton Data

23 Collecting Data on Photosynthetic Processes External stimuli (e.g., light) Adaptation Period Sampling mRNA/cDNA Equlibrium Period MicroarrayTrace Continuous Culture (Chemostat) /wwwscience.murdoch.edu.au/teach www.affymetrix.com/ Health of Culture Time

24 Gene Expressions for Cyanobacteria

25 A Process Model for Photosynthetic Regulation model photo_reg variables: mRNA_abundance, light, photo_protein, energy_supply, energy_demand, Awake energy_supply, energy_demand, Awake observable: light, mRNA_abundance process wake_upprocess fall_asleep conditions: light > 0 conditions:light == 0 conditions: light > 0 conditions:light == 0 equations: Awake = 1 equations:Awake = 0 equations: Awake = 1 equations:Awake = 0 process basic_energy_demandprocess awake_energy_demand conditions: Awake == 0 conditions:Awake == 1 conditions: Awake == 0 conditions:Awake == 1 equations: energy_demand = 0.5 equations:energy_demand = 1.0 equations: energy_demand = 0.5 equations:energy_demand = 1.0 process photosynthesis; equations: energy_supply = light photo_protein equations: energy_supply = light photo_protein process protein_translation equations: d[photo_protein,t,1] = 0.2 mRNA_abundance equations: d[photo_protein,t,1] = 0.2 mRNA_abundance process protein_degradation equations: d[photo_protein,t,1] = 0.01 photo_protein equations: d[photo_protein,t,1] = 0.01 photo_protein process mRNA_transcription conditions: energy_demand > energy_supply, Awake == 1 conditions: energy_demand > energy_supply, Awake == 1 equations: d[mRNA_abundance,t,1] = 0.1 (energy_demand energy_supply) equations: d[mRNA_abundance,t,1] = 0.1 (energy_demand energy_supply) process mRNA_degradation equations: d[mRNA_abundance,t,1] = 0.01 mRNA_abundance equations: d[mRNA_abundance,t,1] = 0.01 mRNA_abundance

26 Predictions from Best Parameterized Model

27 Electic Power on the International Space Station

28 Telemetry Data from Space Station Batteries

29 Induced Process Model for Battery Behavior model Battery variables: Rs, soc, Vcb, temperature, Vt, i observable: Vt, i, temperature process voltage_chargeprocess voltage_discharge conditions: i > 0conditions:i 0conditions:i < 0 equations:Vt = Vcb + (Rs i)equations:Vt = Vcb 2.63 / (Rs + 2.63) equations:Vt = Vcb + (Rs i)equations:Vt = Vcb 2.63 / (Rs + 2.63) process charge_transfer equations:d[soc,t,1] = i (Vcb/302.71) equations:d[soc,t,1] = i (Vcb/302.71) process constant_influence_Vcb equations:Vcb = 215.00 equations:Vcb = 215.00 process linear_influence_Vcb_soc equations:Vcb = 205.87 soc equations:Vcb = 205.87 soc process quadratic_influence_Vcb_soc equations:Vcb = 154.87 soc soc equations:Vcb = 154.87 soc soc process linear_influence_Vcb_temp equations:Vcb = 1.38 temperature equations:Vcb = 1.38 temperature process constant_influence_Rs equations:Rs = 0.20 equations:Rs = 0.20

30 Results on Battery Test Data

31 Steps in Applying Computational Scientific Discovery problem formulation representation engineering data collection/ manipulation algorithm manipulation filtering and interpretation algorithm invocation

32 specify a quantitative process model of the target system; specify a quantitative process model of the target system; display and edit the models structure and details graphically; display and edit the models structure and details graphically; simulate the models behavior over time and situations; simulate the models behavior over time and situations; compare the models predicted behavior to observations; compare the models predicted behavior to observations; invoke a revision module in response to detected anomalies. invoke a revision module in response to detected anomalies. Because scientists do not want to be replaced, we are developing an interactive environment that lets users: The environment offers computational assistance in forming and evaluating models but lets the user retain control. Challenge 5: Interfacing with Scientists

33 Viewing and Editing a Process Model

34 Results of Revising the NPP Model Initial model: E = 0.56 · T1 · T2 · W E = 0.56 · T1 · T2 · W T2 = 1.18 / [(1 + e 0.2 · (Topt – Tempc – 10) ) · (1 + e 0.3 · (Tempc – Topt – 10) )] T2 = 1.18 / [(1 + e 0.2 · (Topt – Tempc – 10) ) · (1 + e 0.3 · (Tempc – Topt – 10) )] PET = 1.6 · (10 · Tempc / AHI) A · PET-TW-M PET = 1.6 · (10 · Tempc / AHI) A · PET-TW-M SR {3.06, 4.35, 4.35, 4.05, 5.09, 3.06, 4.05, 4.05, 4.05, 5.09, 4.05} SR {3.06, 4.35, 4.35, 4.05, 5.09, 3.06, 4.05, 4.05, 4.05, 5.09, 4.05} RMSE on training data = 465.212 and r 2 = 0.799 Revised model: E = 0.353 · T1 0.00 · T2 0.08 · W 0.00 E = 0.353 · T1 0.00 · T2 0.08 · W 0.00 T2 = 0.83 / [(1 + e 1.0 · (Topt – Tempc – 6.34) ) · (1 + e 1.0 · (Tempc – Topt – 11.52) )] T2 = 0.83 / [(1 + e 1.0 · (Topt – Tempc – 6.34) ) · (1 + e 1.0 · (Tempc – Topt – 11.52) )] PET = 1.6 · (10 · Tempc / AHI) A · PET-TW-M PET = 1.6 · (10 · Tempc / AHI) A · PET-TW-M SR {0.61, 3.99, 2.44, 10.0, 2.21, 2.13, 2.04, 0.43, 1.35, 1.85, 1.61} SR {0.61, 3.99, 2.44, 10.0, 2.21, 2.13, 2.04, 0.43, 1.35, 1.85, 1.61} Cross-validated RMSE = 397.306 and r 2 = 0.853 [ 15 % reduction ]

35 computational scientific discovery (e.g., Langley et al., 1983); computational scientific discovery (e.g., Langley et al., 1983); theory revision in machine learning (e.g., Towell, 1991); theory revision in machine learning (e.g., Towell, 1991); qualitative physics and simulation (e.g., Forbus, 1984); qualitative physics and simulation (e.g., Forbus, 1984); languages for scientific simulation (e.g., STELLA, MATLAB ); languages for scientific simulation (e.g., STELLA, MATLAB ); interactive tools for data analysis (e.g., Schneiderman, 2001). interactive tools for data analysis (e.g., Schneiderman, 2001). Intellectual Influences Our approach to computational discovery incorporates ideas from many traditions: Our work combines, in novel ways, insights from machine learning, AI, programming languages, and human-computer interaction.

36 Contributions of the Research a new formalism for representing scientific process models; a new formalism for representing scientific process models; a computational method for simulating these models behavior; a computational method for simulating these models behavior; an encoding for background knowledge as generic processes; an encoding for background knowledge as generic processes; an algorithm for inducing process models from time-series data; an algorithm for inducing process models from time-series data; an interactive environment for model construction/utilization. an interactive environment for model construction/utilization. In summary, our work on computational scientific discovery has, in responding to five challenges, produced: We have demonstrated this approach to model construction on four domains from Earth science, microbiology, and engineering.

37 Directions for Future Research produce additional results on other scientific data sets produce additional results on other scientific data sets develop more robust methods for fitting model parameters develop more robust methods for fitting model parameters extend the approach to handle data sets with missing values extend the approach to handle data sets with missing values implement heuristic methods for searching the model space implement heuristic methods for searching the model space utilize knowledge of subsystems to further constrain search utilize knowledge of subsystems to further constrain search augment the modeling environment to make it more usable augment the modeling environment to make it more usable Despite our progress to date, we need further work in order to: Inductive process modeling has great potential to speed progress in system science and engineering.

38 End of Presentation

39 In Memoriam Herbert A. Simon (1916 – 2001) Herbert A. Simon (1916 – 2001) Jan M. Zytkow (1945 – 2001) Jan M. Zytkow (1945 – 2001) Two years ago, computational scientific discovery lost two of its founding fathers: Both contributed to the field in many ways: posing new problems, inventing methods, training students, and organizing meetings. Moreover, both were interdisciplinary researchers who contributed to computer science, psychology, philosophy, and statistics. Herb Simon and Jan Zytkow were excellent role models that we should all aim to emulate.

40 Data Mining vs. Scientific Discovery Data mining generates knowledge cast as decision trees, logical rules, or other notations invented by AI researchers; Data mining generates knowledge cast as decision trees, logical rules, or other notations invented by AI researchers; Computational scientific discovery instead uses equations, structural models, reaction pathways, or other formalisms invented by scientists and engineers. Computational scientific discovery instead uses equations, structural models, reaction pathways, or other formalisms invented by scientists and engineers. There exist two computational paradigms for discovering explicit knowledge from data: Both approaches draw on heuristic search to find regularities in data, but they differ considerably in their emphases.

41 Time Line for Research on Computational Scientific Discovery 1989199019791980198119821983198419851986198719881991199219931994199519961997199819992000 Bacon.1–Bacon.5 Abacus, Coper Fahrehneit, E*, Tetrad, IDS N Hume, ARC DST, GP N LaGrange SDS SSF, RF5, LaGramge Dalton, Stahl RL, Progol Gell-Mann BR-3, Mendel Pauli Stahlp, Revolver Dendral AM GlauberNGlauber IDS Q, Live IE Coast, Phineas, AbE, Kekada Mechem, CDP Astra, GP M HR BR-4 Numeric lawsQualitative lawsStructural modelsProcess models Legend

42 Why Are Process Models Interesting? they incorporate scientific formalisms rather than AI notations; they incorporate scientific formalisms rather than AI notations; that are easily communicable to scientists and engineers; that are easily communicable to scientists and engineers; they move beyond descriptive generalization to explanation; they move beyond descriptive generalization to explanation; while retaining the modularity needed to support induction. while retaining the modularity needed to support induction. Process models are a crucial target for machine learning because: These reasons point to process models as an ideal representation for scientific and engineering knowledge. Process models are an important alternative to formalisms used currently in machine learning.

43 Challenges of Inductive Process Modeling process models characterize behavior of dynamical systems; process models characterize behavior of dynamical systems; variables are mainly continuous and data are unsupervised; variables are mainly continuous and data are unsupervised; observations are not independently and identically distributed; observations are not independently and identically distributed; process models contain unobservable processes and variables; process models contain unobservable processes and variables; multiple processes can interact to produce complex behavior. multiple processes can interact to produce complex behavior. Process model induction differs from typical learning tasks in that: Compensating factors include a focus on deterministic systems and the availability of background knowledge.

44 Making Predictions with Process Models Specify initial values for input variables and the size for time steps On each time step, check conditions to decide which processes are active Solve algebraic and differential equations with known values Propagate values and recurse to solve other equations Add the effects of different processes on each variable

45 Predictions from IPMs Induced Model

46 Observed values for a set of continuous variables as they vary over time or situations Generic processes that characterize causal relationships among variables in terms of conditional equations Inductive Process Modeling A specific process model that explains the observed values and predicts future data accurately Induction training data background knowledge learned model

47 Inductive Process Modeling as Search an initial state from which to start search; an initial state from which to start search; some operators that generate new states; some operators that generate new states; an evaluation function that selects among states; an evaluation function that selects among states; an overall control regime for the search; and an overall control regime for the search; and a halting criterion for ending the search. a halting criterion for ending the search. To construct a quantitative process model, we need an algorithm to search the space of models that assumes: We have implemented a four-stage method that takes positions on these design decisions.

48 The IPM Method for Process Model Induction Find all ways to instantiate known generic processes with specific variables Combine subsets of instantiated processes into generic models Remove candidates that are too complex or not connected graphs For each generic model, search for good parameter values Return parameterized model with the smallest error

49 http://www.bio.ic.ac.uk/research/barber/photosystemII.html A Biologists Depiction of Photosynthesis

50 An Interactive Environment for Biological Modeling

51 The NPPc Portion of CASA NPPc = month max (E · IPAR, 0) E = 0.56 · T1 · T2 · W E = 0.56 · T1 · T2 · W T1 = 0.8 + 0.02 · Topt – 0.0005 · Topt 2 T1 = 0.8 + 0.02 · Topt – 0.0005 · Topt 2 T2 = 1.18 / [(1 + e 0.2 · (Topt – Tempc – 10) ) · (1 + e 0.3 · (Tempc – Topt – 10) )] T2 = 1.18 / [(1 + e 0.2 · (Topt – Tempc – 10) ) · (1 + e 0.3 · (Tempc – Topt – 10) )] W = 0.5 + 0.5 · EET / PET W = 0.5 + 0.5 · EET / PET PET = 1.6 · (10 · Tempc / AHI) A · PET-TW-M if Tempc > 0 PET = 1.6 · (10 · Tempc / AHI) A · PET-TW-M if Tempc > 0 PET = 0 if Tempc < 0 PET = 0 if Tempc < 0 A = 0.00000068 · AHI 3 – 0.000077 · AHI 2 + 0.018 · AHI + 0.49 A = 0.00000068 · AHI 3 – 0.000077 · AHI 2 + 0.018 · AHI + 0.49 IPAR = 0.5 · FPAR-FAS · Monthly-Solar · Sol-Conver IPAR = 0.5 · FPAR-FAS · Monthly-Solar · Sol-Conver FPAR-FAS = min [(SR-FAS – 1.08) / SR (UMD-VEG), 0.95] FPAR-FAS = min [(SR-FAS – 1.08) / SR (UMD-VEG), 0.95] SR-FAS = (Mon-FAS-NDVI + 1000) / (Mon-FAS-NDVI – 1000) SR-FAS = (Mon-FAS-NDVI + 1000) / (Mon-FAS-NDVI – 1000)

52 The NPPc Portion of CASA NPPc IPAR PET T1T2We_max E EET Tempc Topt NDVI SOLAR AHI A PET TWM SR FPAR VEG

53 A Process Model for Carbon Production model npp; variables NPPc, E, IPAR, T1, T2, W, Topt, tempc, eet, PET, PETTWM, ahi, A, FPARFAS, monthlySolar, SolConver, MONFASNDVI, umd_veg; ahi, A, FPARFAS, monthlySolar, SolConver, MONFASNDVI, umd_veg; observable ahi,eet,tempc,Topt,MONFASNDVI,monthlySolar,PETTWM,umd_veg; process CarbonProd; equations NPPc = E * IPAR; equations NPPc = E * IPAR; process PhotoEfficiency; equations E = (0.389 * (T1 * (T2 * W))); equations E = (0.389 * (T1 * (T2 * W))); process TempStress1; equations T1 = (0.8 + ((0.02 * Topt) - (0.0005 * (Topt ^ 2)))); equations T1 = (0.8 + ((0.02 * Topt) - (0.0005 * (Topt ^ 2)))); process TempStress2; equations T2 = ((1.1814 / equations T2 = ((1.1814 / (1 + (2.718281828 ^ (0.2 * (Topt - 10 - tempc))))) / (1 + (2.718281828 ^ (0.2 * (Topt - 10 - tempc))))) / (1 + (2.718281828 ^ (0.3 * (tempc - 10 - Topt))))); (1 + (2.718281828 ^ (0.3 * (tempc - 10 - Topt))))); process WaterStress; conditions PET!=0; conditions PET!=0; equations W = (0.5 + (0.5 * (eet / PET))); equations W = (0.5 + (0.5 * (eet / PET))); process WSNoEvapoTrans; conditions PET==0; conditions PET==0; equations W = 0.5; equations W = 0.5; process EvapoTrans; conditions tempc>0; conditions tempc>0; equations PET = 1.6 * (10 * tempc / ahi) ^ A * PETTWM; equations PET = 1.6 * (10 * tempc / ahi) ^ A * PETTWM;

54 Visualizing an Improved Model One way to visualize a model involves plotting its rules spatially. Our Earth science collaborators found this useful, as regions often correspond to recognizable ecological zones.


Download ppt "Pat Langley Institute for the Study of Learning and Expertise Palo Alto, California and Center for the Study of Language and Information Stanford University,"

Similar presentations


Ads by Google