Knowledge Space Map for Organic Reactions


Knowledge Space Map for Organic Reactions
- Knowledge Space Theory
- Existing Rule Set Basis for Chemistry Knowledge Space Model
- Data Model Proposal
- Constructing and Learning the Map

Knowledge Space Map
- Isolate atomic knowledge units (nodes / elements).
- Determine the dependency graph of knowledge units, which defines a learning order by topological sort.
- This enables targeted and purposeful lesson plans based on the "fringes" of the student's current knowledge state.
- Example nodes from the diagram: Multiplication, Division, Logarithms, Exponents, Fractions, Subtraction, Addition; Vocabulary, Grammar, Spelling.
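The learning order and the "fringe" idea can be sketched in a few lines. This is a minimal illustration, not the presenter's implementation: the prerequisite edges in `PREREQS` are assumed for the example (the slide only names the nodes), and `learning_order` and `fringe` are hypothetical helper names.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed prerequisite edges: each unit maps to the set of units it depends on.
PREREQS = {
    "Addition": set(),
    "Subtraction": {"Addition"},
    "Multiplication": {"Addition"},
    "Division": {"Multiplication", "Subtraction"},
    "Fractions": {"Division"},
    "Exponents": {"Multiplication"},
    "Logarithms": {"Exponents"},
}


def learning_order(prereqs):
    """Return one valid learning order, prerequisites first."""
    return list(TopologicalSorter(prereqs).static_order())


def fringe(prereqs, known):
    """Units not yet known whose prerequisites are all already known."""
    return {
        unit
        for unit, deps in prereqs.items()
        if unit not in known and deps <= known
    }


order = learning_order(PREREQS)
next_units = fringe(PREREQS, {"Addition", "Multiplication"})
```

For a student who knows Addition and Multiplication, `next_units` contains Subtraction and Exponents, the units a lesson plan could target next.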

Chemistry Knowledge Space?
- Current system: the user selects which chapter(s) to work on, then the system randomly generates a problem.
- Idealized approach: assess the student's current knowledge state and auto-generate the next problem to target the next most useful subject.
- The existing tutorial is based on the predictive power of 80+ reagents, which are in turn based on elemental rules. These rules could be interpreted as knowledge units.

Rule Clustering
- Many rules are just variants of the same concept / knowledge unit:
  Alkene, Protic Acid Addition, Alkoxy
  Alkene, Protic Acid Addition, Benzyl
  Alkene, Protic Acid Addition, Allyl
  Alkene, Protic Acid Addition, Tertiary
  Alkene, Protic Acid Addition, Secondary
  Alkene, Protic Acid Addition, Generic
  …
- Some rules will always be used in conjunction with another (like "qu").
- There is then no real learning dependency order between these rules: you essentially know one of the rules if and only if (IFF) you know the others.

Data Model Proposal
- Want a general framework for representing relationships.
- Each reaction rule represents an elementary knowledge unit (node).
- A weighted, directed edge between two nodes represents a learning dependency relationship.
- A → B (90%): given that a student "knows" rule B, there is a 90% probability that they "know" rule A. Conversely, if they do NOT know rule A, there is a 90% probability that they do NOT know rule B.
- Define "know": a student should consistently answer correctly any problem that is based only on rules that they "know".
- Define the rule similarity measure as the average of the reciprocal dependency relationships.
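One way to hold this data model in code is a dictionary of directed edge weights. The sketch below is an assumption-laden illustration: the class name `RuleGraph`, its methods, and the 0.5 default for unseen edges are all hypothetical, chosen to match the slide's reading of A → B as the probability of knowing A given that B is known.

```python
class RuleGraph:
    """Directed, weighted edges between rules.

    weight[(a, b)] stores the "a -> b" value from the slide, read here as
    an estimate of P(student knows a | student knows b).
    """

    def __init__(self, default=0.5):
        # Assumed default: an unseen pair is treated as uncorrelated (50%).
        self.default = default
        self.weight = {}

    def set_edge(self, a, b, p):
        if not 0.0 <= p <= 1.0:
            raise ValueError("edge weight must be a probability")
        self.weight[(a, b)] = p

    def edge(self, a, b):
        return self.weight.get((a, b), self.default)

    def similarity(self, a, b):
        """Average of the two reciprocal dependency weights."""
        return (self.edge(a, b) + self.edge(b, a)) / 2


graph = RuleGraph()
graph.set_edge("A", "B", 0.9)   # A -> B (90%)
graph.set_edge("B", "A", 0.5)   # reverse direction at chance level
sim_ab = graph.similarity("A", "B")
```

Because the similarity is an average of both directions, it is symmetric even though the underlying edges are not.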

Major Relationship Cases
- Strong learning dependency: A → B (99%), A ← B (50%)
- Strong similarity / mutual dependency: A → B (99%), A ← B (99%)
- No relation (random correlation): A → B (50%), A ← B (50%)
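The three cases can be told apart mechanically from the pair of directed weights. The following is only a sketch: the function name `classify` and the cut-offs (`strong`, `chance`, `tol`) are assumed values for illustration, since the slide gives example percentages rather than thresholds.

```python
def classify(p_ab, p_ba, strong=0.9, chance=0.5, tol=0.1):
    """Label a rule pair from its two directed weights.

    p_ab and p_ba are the weights of the two opposite edges between A and B.
    The thresholds are illustrative assumptions, not values from the slides.
    """
    ab_strong = p_ab >= strong
    ba_strong = p_ba >= strong
    ab_chance = abs(p_ab - chance) <= tol
    ba_chance = abs(p_ba - chance) <= tol

    if ab_strong and ba_strong:
        return "similarity"
    if (ab_strong and ba_chance) or (ba_strong and ab_chance):
        return "dependency"
    if ab_chance and ba_chance:
        return "no relation"
    return "intermediate"


cases = [classify(0.99, 0.50), classify(0.99, 0.99), classify(0.50, 0.50)]
```

Pairs that fall between the thresholds are reported as "intermediate" rather than forced into one of the three named cases.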

Additional Enhancements
- Add a baseline probability of "knowing" each node, instead of assuming a uniform 50%. This is analogous to using background weights for the amino acid distribution in protein sequences.
- Add a confidence number for each of these probability weights to reflect how trustworthy our prior data is. This is analogous (maybe equal) to n, the number of data points that were used to arrive at the current estimate.
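A short calculation shows why the baseline matters. Under a uniform 50% baseline, a 90% chance of knowing A given B implies exactly a 90% chance of not knowing B given not knowing A; with other baselines the two numbers diverge. The sketch below derives the second number from the first using basic probability; the function name and parameters are hypothetical.

```python
def p_not_b_given_not_a(p_a_given_b, p_a=0.5, p_b=0.5):
    """P(not B | not A), derived from P(A | B) and the baselines P(A), P(B)."""
    p_a_and_b = p_a_given_b * p_b
    # Inclusion-exclusion: P(neither) = 1 - P(A) - P(B) + P(A and B)
    p_neither = 1.0 - p_a - p_b + p_a_and_b
    if p_a >= 1.0 or p_a_and_b > p_a or p_neither < 0.0:
        raise ValueError("inconsistent probabilities")
    return p_neither / (1.0 - p_a)


uniform = p_not_b_given_not_a(0.9)            # both baselines at 50%
skewed = p_not_b_given_not_a(0.9, p_a=0.8)    # rule A is widely known
```

With the uniform baseline the result is 0.9, matching the forward weight; raising the baseline for A to 80% lowers it to 0.75.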

Learning Relationship Map
- Give students assessment exams based on the rule sets, with criteria to distinguish problems that students get "right" vs. "wrong".
- This defines two sets of rules:
  R: all rules used in problems students got right
  W: all rules used in problems students got wrong (that are not in R)
- Adjust rule relation values:
  Decrease R_i → W_j relations
  Increase R_i → R_k relations
- Scale each adjustment based on confidence in the prior.
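A possible shape for this update step is sketched below. Several details are assumptions made for illustration: each relation is stored as a pair (estimate, n), the key `(a, b)` stands for the slide's "a → b" relation, unseen relations start at (0.5, 1), and the adjustment is a simple running average so that a large n damps the change. The function names are hypothetical.

```python
def split_rules(problems):
    """problems: iterable of (set of rules used, answered correctly?)."""
    right, in_wrong = set(), set()
    for rules, correct in problems:
        (right if correct else in_wrong).update(rules)
    # W excludes any rule that also appeared in a correctly answered problem.
    return right, in_wrong - right


def nudge(estimate, n, observed):
    """Running average: move the estimate toward observed (0.0 or 1.0).

    A large prior count n (high confidence) makes the step small.
    """
    return (estimate * n + observed) / (n + 1), n + 1


def update_relations(relations, right, wrong, default=(0.5, 1)):
    """Increase R_i -> R_k relations and decrease R_i -> W_j relations."""
    for r_i in right:
        for r_k in right:
            if r_i != r_k:
                relations[(r_i, r_k)] = nudge(*relations.get((r_i, r_k), default), 1.0)
        for w_j in wrong:
            relations[(r_i, w_j)] = nudge(*relations.get((r_i, w_j), default), 0.0)
    return relations


exam = [({"r1", "r2"}, True), ({"r2", "w1"}, False)]
R, W = split_rules(exam)
relations = update_relations({("r2", "w1"): (0.9, 99)}, R, W)
```

Here the fresh r1 → w1 relation drops from 0.5 to 0.25, while the well-established r2 → w1 relation (n = 99) only moves from 0.9 to about 0.891.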

Learning Propagation
- Each assessment exam may only cover a handful of specific rules in R and W.
- When updating the relation for rules R_1 → R_2, look for all rules similar to R_1 and all rules similar to R_2.
- Assume respective updates for all relations between similar rule pairs, scaled by the magnitude of similarity to R_1 and R_2.
- Technically, all rules are similar to all others to some degree, but we don't want to update every relation every time. Set a similarity threshold, which effectively defines clusters around rules.
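The propagation step might look like the sketch below. It makes assumptions the slide leaves open: the update is an additive `delta`, the scale factor is the product of the two similarities, similarities are stored per unordered pair, and the 0.8 threshold and 0.5 starting weight are placeholder values. The function name `propagate` is hypothetical.

```python
def propagate(relations, similarity, a, b, delta, threshold=0.8):
    """Apply delta to the a -> b relation and a scaled delta to similar pairs.

    similarity maps an unordered pair (x, y) to a value in [0, 1]; pairs
    below the threshold are ignored, which bounds the work per update.
    """

    def cluster(rule):
        # The rule itself counts as fully similar to itself.
        members = {rule: 1.0}
        for (x, y), s in similarity.items():
            if s >= threshold:
                if x == rule:
                    members[y] = s
                elif y == rule:
                    members[x] = s
        return members

    for a2, s_a in cluster(a).items():
        for b2, s_b in cluster(b).items():
            if a2 == b2:
                continue
            old = relations.get((a2, b2), 0.5)
            # Clamp so the weight stays a valid probability.
            relations[(a2, b2)] = min(1.0, max(0.0, old + delta * s_a * s_b))
    return relations


sim = {("R1", "R1b"): 0.9, ("R2", "X"): 0.3}
rel = propagate({}, sim, "R1", "R2", delta=0.1)
```

In this example the R1 → R2 relation receives the full update, R1b → R2 receives 90% of it, and X is untouched because its similarity to R2 is below the threshold.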

Constructing Relationship Map
- An initial pass should be able to find many "similarity" relationships automatically, based only on existing structured data:
  Rule names
  Combined usage in test examples
  Inclusion in common reagents, chapters, etc.
- Use book chapter order as an initial guess for dependency order.
- Similarity analysis could reduce the rules to ~100(?) rule "clusters", a more tractable number for manually assigning the major dependencies not automatically addressed by book chapter order.
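Similarity from rule names alone can be approximated with very little code. The sketch below assumes that rule names are comma-separated from general to specific, as in the examples shown in this presentation; the Jaccard measure, the prefix depth of 2, and the function names are illustrative choices rather than anything the slides specify.

```python
from collections import defaultdict

RULE_NAMES = [
    "Alkene, Protic Acid Addition, Alkoxy",
    "Alkene, Protic Acid Addition, Benzyl",
    "Alkene, Protic Acid Addition, Generic",
    "Carbocation, Halide Addition",
]


def name_parts(name):
    return [part.strip() for part in name.split(",")]


def name_similarity(a, b):
    """Jaccard overlap of the comma-separated parts of two rule names."""
    parts_a, parts_b = set(name_parts(a)), set(name_parts(b))
    return len(parts_a & parts_b) / len(parts_a | parts_b)


def cluster_by_prefix(names, depth=2):
    """Group rules whose first `depth` name parts are identical."""
    clusters = defaultdict(list)
    for name in names:
        clusters[tuple(name_parts(name)[:depth])].append(name)
    return dict(clusters)


clusters = cluster_by_prefix(RULE_NAMES)
sim_variants = name_similarity(RULE_NAMES[0], RULE_NAMES[1])
sim_unrelated = name_similarity(RULE_NAMES[0], RULE_NAMES[3])
```

The three alkene variants land in one cluster, which could then be treated as a single node when assigning dependencies by hand.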

Open Questions
- Student knowledge evolves over time, maybe even within one exam. How do we hit the "moving target" of their current knowledge state?
- Baseline probabilities of knowing a rule: take a random sample of all students? These will differ greatly based on the population sample chosen.

SMILES Extensions
- Atom mapping is necessary to map reactant atoms to product atoms.
- A proper transform requires balanced stoichiometry.
- Hydrogens generally must be explicitly specified.
- Example (amide formation): R1-C(=O)-OH + H-NH-R2 → R1-C(=O)-NH-R2 + H2O
  Carboxylic acid: [O:1]=[C:2]([*:9])[O:3][H:7]
  Primary amine: [H:8][N:4]([*:10])[H:5]
  Amide: [O:1]=[C:2]([*:9])[N:4]([*:10])[H:5]
  Water: [H:7][O:3][H:8]
  Full reaction: [O:1]=[C:2]([*:9])[O:3][H:7].[H:8][N:4]([*:10])[H:5]>>[O:1]=[C:2]([*:9])[N:4]([*:10])[H:5].[H:7][O:3][H:8]
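The balance requirement can be checked directly on the mapped string without a cheminformatics toolkit. The sketch below uses only a regular expression to collect each bracketed atom and its map number, then compares the two sides. It is a simplified check that assumes every atom is written in brackets with a map number and that the atom text is unchanged across the arrow, which holds for the amide example; `atom_maps` and `is_balanced` are hypothetical names.

```python
import re

REACTION = (
    "[O:1]=[C:2]([*:9])[O:3][H:7].[H:8][N:4]([*:10])[H:5]"
    ">>"
    "[O:1]=[C:2]([*:9])[N:4]([*:10])[H:5].[H:7][O:3][H:8]"
)

# Matches a bracketed atom such as [O:1] or [*:10]; captures atom text and map number.
MAPPED_ATOM = re.compile(r"\[([^\]:]*):(\d+)\]")


def atom_maps(side):
    """Return {map number: atom text} for one side of a mapped reaction."""
    return {int(number): atom for atom, number in MAPPED_ATOM.findall(side)}


def is_balanced(reaction):
    """True if both sides carry the same mapped atoms with the same labels."""
    reactants, products = reaction.split(">>")
    return atom_maps(reactants) == atom_maps(products)


balanced = is_balanced(REACTION)
reactant_maps = atom_maps(REACTION.split(">>")[0])
```

Dropping the water product from the right-hand side makes the check fail, because atoms 3, 7, and 8 would no longer appear among the products.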

Transformation Rules
- Chemical state machine modeling at a mechanistic level of detail.
- State information: molecular structure.
- State transition: transformation rules.
- Diagram labels: π-bond protic acid addition, carbocation halide addition.
- SMIRKS and descriptions:
  Alkene, Protic Acid Addition: [C:1]=[C:2].[H:3][Cl,Br,I:4]>>[+0:3][C:1][C+:2].[Cl,Br,I;-:4]
  Carbocation, Halide Addition: [C+:1].[Cl,Br,I;-:2]>>[C+0:1][+0:2]