Data Mining Applied to Chemistry and Chemical Engineering
陆文聪, Department of Chemistry, College of Sciences, Shanghai University, P. R. China



1 Introduction
1.1 Concept
Data mining is an analytic process designed to explore data in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data.

1.2 Main Focuses
(1) Materials design
How to find the best preparation conditions or the structure-property relationships of materials, in order to design experiments for preparing new materials or to predict the physico-chemical properties of unknown material systems.

(2) Molecular design
How to find the structure-activity relationships of molecules, in order to design new compounds with the expected biological activities or to predict the physico-chemical properties of unknown molecules.

(3) Industrial optimization
How to find the optimal processing conditions, in order to achieve good results in industrial production.

2 Methods in MASTER
(1) Optimal map recognition (OMR)
The projection map with the best separability is selected according to its rate of correct classification.
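The selection rule for the projection map can be illustrated with a minimal sketch. It rests on assumptions that do not come from the slides: the candidate maps are taken to be all pairs of original features, and the classification rate is measured with a nearest-class-centroid rule. How MASTER actually builds and scores its maps is not described here, and the names `classification_rate` and `best_projection` are hypothetical.

```python
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def classification_rate(points, labels):
    """Fraction of points lying nearest to their own class centroid."""
    centroids = {
        c: centroid([p for p, lab in zip(points, labels) if lab == c])
        for c in set(labels)
    }
    hits = sum(
        min(centroids, key=lambda c: sq_dist(p, centroids[c])) == label
        for p, label in zip(points, labels)
    )
    return hits / len(points)

def best_projection(X, y):
    """Return (feature pair, rate) for the best-separating 2-D map."""
    scored = []
    for i, j in combinations(range(len(X[0])), 2):
        projected = [(row[i], row[j]) for row in X]
        scored.append(((i, j), classification_rate(projected, y)))
    return max(scored, key=lambda item: item[1])

# Toy data: the classes differ in x0 + x2, while feature 1 is noise.
X = [
    [0.0, 3.0, 1.2], [1.2, 1.0, 0.0], [0.6, 2.0, 0.6],   # class 0
    [0.9, 2.0, 2.1], [2.1, 3.0, 0.9], [1.5, 1.0, 1.5],   # class 1
]
y = [0, 0, 0, 1, 1, 1]
pair, rate = best_projection(X, y)
```

On this toy set only the map spanned by features 0 and 2 separates the two classes completely, so that pair is the one returned.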

Fig. 1 Comparison of OMR with PCA: (a) classification diagram obtained by Optimal Map Recognition (OMR); (b) classification diagram obtained by Principal Component Analysis (PCA)

(2) Hyper-polyhedron (HP)
An HP model is built so that the optimal zone is expressed by a series of inequalities that describe the boundaries between two types of samples.
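As a rough illustration of an optimal zone written as inequalities, the sketch below uses the simplest possible polyhedron, an axis-aligned box of the form lower_k <= x_k <= upper_k fitted around the "good" samples. This is a simplification assumed for the example; the HP model in MASTER presumably allows more general boundaries, and `fit_box` and `in_zone` are hypothetical names.

```python
def fit_box(good_samples):
    """Per-feature (lower, upper) bounds enclosing all 'good' samples."""
    columns = list(zip(*good_samples))
    return [(min(col), max(col)) for col in columns]

def in_zone(sample, bounds):
    """True if the sample satisfies every inequality lower <= x_k <= upper."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(sample, bounds))

# Hypothetical two-feature samples of the desirable class.
good = [[1.0, 10.0], [1.5, 12.0], [1.2, 11.0]]
bounds = fit_box(good)
inside = in_zone([1.3, 11.5], bounds)    # satisfies all four inequalities
outside = in_zone([2.0, 11.0], bounds)   # violates x_0 <= 1.5
```

A new candidate is then accepted or rejected simply by checking it against the stored inequalities.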

Fig. 2 Conceptual HP model

(3) Optimal projection regression (OPR)
The OPR method is a quantitative model that fuses regression with the Optimal Map Recognition (OMR) method. It uses the classification information of the data set to select the most appropriate features for regression.

Fig. 3 Conceptual OPR model: projection from hyperspace to 2-dimensional space (axes X1 and X2)

(4) Inverse projection
Fig. 4 Projection from 2-dimensional space to hyperspace

(5) Hierarchical projection model
Fig. 5 Conceptual hierarchical projection

(6) Support Vector Machine
Support Vector Classification:

Support Vector Regression:
[Figure labels: regression hyperplane, support vectors, support-vector hypersurface, insensitive tube]
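The insensitive tube (不敏感通道) named in the Support Vector Regression figure corresponds to the standard epsilon-insensitive loss: deviations smaller than epsilon cost nothing, and larger ones are penalized linearly. A minimal sketch follows, with a tube half-width of 0.5 chosen only for illustration.

```python
def eps_insensitive_loss(y_true, y_pred, eps):
    """Zero inside the tube |y - f(x)| <= eps, linear outside it."""
    return max(0.0, abs(y_true - y_pred) - eps)

# (observed, predicted) pairs: inside the tube, on its edge, outside it.
pairs = [(1.0, 1.2), (1.0, 1.5), (1.0, 2.0)]
losses = [eps_insensitive_loss(y, f, eps=0.5) for y, f in pairs]
```

Only samples on or outside the tube contribute to the regression function, which is why they are the ones marked as support vectors.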

3 Examples of Application
3.1 Applications in Materials Design
(1) Optimization of a high-temperature superconductor
A nonlinear function based on 5 terms was obtained, with a PRESS value of [...]. By using inverse projection and the OPR method, the critical temperature was raised from 116 K to 121 K.
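PRESS (predicted residual sum of squares) is the sum of squared errors obtained when each sample is predicted by a model fitted without it. The sketch below computes it for a one-variable straight-line fit, which stands in for the 5-term nonlinear function mentioned above; the data are made up and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def press(xs, ys):
    """Sum of squared leave-one-out prediction errors."""
    total = 0.0
    for i in range(len(xs)):
        # Refit without sample i, then predict sample i.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (a + b * xs[i])) ** 2
    return total

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
exact = press(xs, [1.0, 3.0, 5.0, 7.0, 9.0])   # perfectly linear data
noisy = press(xs, [1.0, 3.2, 4.9, 7.3, 8.8])   # same trend with scatter
```

A smaller PRESS indicates better predictive ability, so it is a natural criterion for choosing among candidate regression models.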

Inverse projection result for the high-temperature superconductor

(2) Composition design of rare-earth-containing phosphors
By extrapolation we obtained a series of new compositions lying outside the scope of the German patents. Our experiments confirmed that the brightness of these newly designed phosphors was higher than that declared in the German patents.

Importance of features

Classification diagram using Fisher method

(3) Optimization of VPTC ceramic semiconductors
With MASTER, new compositions and technological conditions were proposed for VPTC materials that gave much better results: the ratio of the electrical resistance at 273 K to the minimum resistance rose from 20 to 27.3.

Partial Least Squares (PLS) result for VPTC ceramic semiconductors

(4) Composition design of cathode materials for Ni/H batteries
Using Support Vector Machine (SVM), mathematical models with strong predictive ability were built; new formulations were predicted and then confirmed by experiment.

Calculated vs. experimental values of C400/C0

(5) Formation conditions for the amorphous phase of ternary fluorides
The inequalities obtained with the OMR method were used to predict whether a new ternary fluoride could form an amorphous phase. The predictions agreed with the experimental results.

OMR result: formation conditions for the amorphous phase of ternary fluorides

(6) Formation conditions of ternary intermetallic compounds
Using 2400 known phase diagrams as the training set, regularities in the formation conditions of ternary intermetallic compounds were found. A series of newly discovered ternary intermetallic compounds were “predicted” in this way with good results.

OMR result: formation conditions of ternary intermetallic compounds

3.2 Applications in Molecular Design
(1) Molecular screening of guanidine compounds
The Hyper-polyhedron (HP) and Support Vector Classification (SVC) methods were used for computer-aided molecular screening of guanidine compounds. The predictions of HP and SVC were better than those of PCA, KNN, FDV and other methods.

(2) Structure-activity relationship of antagonists
SVC was used to investigate the structure-activity relationship (SAR) of 26 antagonist compounds. Leave-one-out cross-validation showed that the predictive ability of SVC was better than that of PCA, KNN, FDV and other methods.
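Leave-one-out cross-validation holds out each compound once, fits the model on the remaining ones, and counts how often the held-out compound is classified correctly. The sketch below uses a small k-nearest-neighbour classifier because it fits in a few lines of standard Python; the study itself evaluated SVC, and the descriptor values and activity labels here are invented for illustration.

```python
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples."""
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(X, y, k=3):
    """Hold out each sample once and predict it from all the others."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += knn_predict(rest_X, rest_y, X[i], k) == y[i]
    return hits / len(X)

# Hypothetical two-descriptor data forming two well-separated groups.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [0.3, 0.2],
     [2.0, 2.1], [2.2, 1.9], [1.9, 2.3], [2.1, 2.0]]
y = ["inactive"] * 4 + ["active"] * 4
accuracy = leave_one_out_accuracy(X, y)
```

With only 26 compounds, this scheme uses the data more economically than a single train/test split, since every compound serves as a test case exactly once.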

(3) Molecular screening of triazole compounds
(a) An OMR model was used for the molecular screening of new triazole compounds with probably higher antifungal activity.
(b) The predictions of SVC were better than those of PCA, KNN, FDV and other methods.

(4) Structure-property relationship of azo dyestuffs
The Support Vector Regression (SVR) method was used to predict the maximum absorption wavelength of 37 azo dye molecules. The mean relative error was 4.22% for the training set and 4.52% for the prediction set.
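The mean relative error quoted above is assumed here to follow the usual definition, the average of |predicted - observed| / |observed| expressed as a percentage; the slides do not spell the formula out. A minimal sketch with invented wavelengths:

```python
def mean_relative_error(observed, predicted):
    """Average of |predicted - observed| / |observed|, as a percentage."""
    ratios = [abs(p - o) / abs(o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical wavelengths in nm, not data from the study.
observed = [400.0, 500.0]
predicted = [420.0, 480.0]
mre = mean_relative_error(observed, predicted)
```

Here the two relative errors are 5% and 4%, so the mean comes to about 4.5%.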

3.3 Applications in Industrial Optimization
(1) Optimization of the nitriding technique for crankshaft production
The surface hardness of crankshafts produced at the Wuxi Diesel Engine factory was too low. An “optimal zone” was found in the multidimensional feature space. After optimization, the rejection rate fell from 1.7% to 0.3%.

(2) Springback prediction in sheet metal forming
MASTER, combined with FEA software (ANSYS/LS-DYNA 5.71), was used to predict springback in V-type sheet steel forming. The relative error of the predicted springback could be kept within 10% of the experimental values.

4 Conclusion
(1) The MASTER software package is a comprehensive system comprising orthogonal design, statistical analysis, data visualization, pattern recognition, regression analysis, artificial neural networks (ANN), support vector machines (SVM) and other methods.

(2) MASTER can be used to optimize formulas and technological conditions, predict biological activities and physico-chemical properties, improve product quality, and analyze faults in production processes.

Thank you