Direct Kernel Methods. Data mining is the process of automatically extracting valid, novel, potentially useful and ultimately comprehensible information.

Direct Kernel Methods

Data mining is the process of automatically extracting valid, novel, potentially useful and ultimately comprehensible information from very large databases

The Data Mining Process (flow diagram)
- database: data prospecting and surveying
- select: selected data
- preprocess & transform: transformed data
- make model
- interpretation & rule formulation

How is Data Mining Different?
Emphasis on large data sets
- Not all data fit in memory (necessarily)
- Outlier detection, rare events, errors, missing data, minority classes
- Scaling of computation time with data size is an issue
- Large data sets: i.e., large number of records and/or large number of attributes; fusion of databases
Emphasis on finding interesting, novel, non-obvious information
- It is not necessarily known what exactly one is looking for
- Models can be highly nonlinear
- Information nuggets can be valuable
Different methods
- Statistics
- Association rules & pattern recognition
- AI
- Computational intelligence (neural nets, genetic algorithms, fuzzy logic)
- Support vector machines and kernel-based methods
- Visualization (SOM, pharmaplots)
Emphasis on explaining and feedback
Interdisciplinary nature of data mining

Data Mining Challenges
Large data sets
- Data sets can be rich in the number of data
- Data sets can be rich in the number of attributes
Data preprocessing and feature definition
- Data representation
- Attribute/feature selection
- Transforms and scaling
Scientific data mining
- Classification, multiple classes, regression
- Continuous and binary attributes
- Large datasets
- Nonlinear problems
Erroneous data, outliers, novelty, and rare events
- Erroneous data
- Outliers
- Rare events
- Novelty detection
Smart visualization techniques
Feature selection & rule formulation

DATA → INFORMATION → KNOWLEDGE → UNDERSTANDING → WISDOM

A Brief History in Data Mining: Pascal → Bayes → Fisher → Werbos → Vapnik
The meaning of “Data Mining” changed over time:
- Pre 1993: “Data mining is the art of torturing the data into a confession”
- Post 1993: “Data mining is the art of charming the data into confession”
From the supermarket scanner to the human genome
- Pre 1998: Database marketing and marketing-driven applications
- Post 1998: The emergence of scientific data mining
From AI expert systems → data-driven expert systems:
- Pre 1990: The experts speak (AI systems)
- Post 1995: Attempts to let the data speak for themselves : The data speak …
A brief history of statistics and statistical learning theory:
- From the calculus of chance to the calculus of probabilities (Pascal → Bayes)
- From probabilities to statistics (Bayes → Fisher)
- From statistics to machine learning (Fisher & Tukey → Werbos → Vapnik)
From theory to application

Data Mining Applications and Operations
Operations:
- Data preparation: missing data, data cleansing, visualization, data transformation
- Clustering/Classification
- Statistics
- Factor analysis/Feature selection
- Associations
- Regression models
- Data-driven expert systems
- Meta-visualization/Interpretation
Applications: Database Marketing, Finance, Health Insurance, Medicine, Bioinformatics, Manufacturing, WWW Agents, Text Retrieval, “Homeland” “Security”, BioDefense

Direct Kernel Methods for Data Mining: Outline
Classical (linear) regression analysis and the learning paradox
Resolving the learning paradox by
- Resolving the rank deficiency (e.g., PCA)
- Regularization (e.g., Ridge Regression)
Linear and nonlinear kernels
Direct kernel methods for nonlinear regression
- Direct Kernel Principal Component Analysis → DK-PCA
- (Direct) Kernel Ridge Regression → Least Squares SVM (LS-SVM)
- Direct Kernel Partial Least Squares → Partial Least-Squares SVM
- Direct Kernel Self-Organizing Maps → DK-SOM
Feature selection, memory requirements, hyperparameter selection
Examples:
- Nonlinear toy examples (DK-PCA Haykin’s Spiral, LS-SVM for Cherkassky data)
- K-PLS for time series data
- K-PLS for QSAR drug design
- LS-SVM nerve agent classification with electronic nose
- K-PLS with feature selection on microarray gene expression data (leukemia)
- Direct Kernel SOM and DK-PLS for magnetocardiogram data
- Direct Kernel SOM for substance identification from spectrograms
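The rank deficiency and ridge regularization items in the outline can be illustrated with a small sketch. It assumes the rank deficiency arises when there are fewer records than attributes, and it uses random toy data; the sizes and the value of lam are illustrative choices, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 5, 8              # fewer records than attributes (toy sizes)
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)

# X^T X is 8x8 but has rank at most 5, so it cannot be inverted directly.
gram = X.T @ X
rank = np.linalg.matrix_rank(gram)

# Ridge regression: adding lam * I makes the linear system well-posed.
lam = 0.1                                  # illustrative regularization strength
w_ridge = np.linalg.solve(gram + lam * np.eye(n_features), X.T @ y)

# Equivalent dual form: it only needs the 5x5 matrix X X^T (a linear kernel).
alpha = np.linalg.solve(X @ X.T + lam * np.eye(n_samples), y)
w_dual = X.T @ alpha
```

Both routes give the same weights. The dual form is the one that carries over to nonlinear kernels, since X X^T can be swapped for another kernel matrix.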

Review: What is in a Kernel?
A kernel can be considered as a (nonlinear) data transformation
- Many different choices for the kernel are possible
- The Radial Basis Function (RBF) or Gaussian kernel is an effective nonlinear kernel
The RBF or Gaussian kernel is a symmetric matrix
- Entries reflect nonlinear similarities amongst data descriptions
- As defined by:
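The defining formula did not survive in this transcript. A minimal sketch, assuming the standard Gaussian form k_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), shows why the resulting matrix is symmetric with ones on the diagonal. The function name rbf_kernel and the toy data are illustrative.

```python
import numpy as np

def rbf_kernel(X, sigma):
    """Gaussian (RBF) kernel matrix, assuming k_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X * X, axis=1)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)   # clip tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Three toy data points in two dimensions.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel(X, sigma=1.0)
```

Each entry K[i, j] lies between 0 and 1 and shrinks as points i and j move apart, which is the sense in which the entries act as nonlinear similarities.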

Docking Ligands is a Nonlinear Problem

Electron Density-Derived TAE-Wavelet Descriptors
- Surface properties are encoded on e/au³ surface (Breneman, C.M. and Rhem, M. [1997] J. Comp. Chem., Vol. 18 (2), p )
- Histogram or wavelet encodings of surface properties give Breneman’s TAE property descriptors
- 10x16 wavelet descriptors
Figure labels: PIP (Local Ionization Potential), Histograms, Wavelet Coefficients

Binding affinities to human serum albumin (HSA): log K’hsa
- Gonzalo Colmenarejo, GlaxoSmithKline, J. Med. Chem. 2001, 44, molecules, descriptors
- 84 training, 10 testing (1 left out)
- 551 Wavelet + PEST + MOE descriptors
- Widely different compounds
Acknowledgements: Sean Ekins (Concurrent), N. Sukumar (Rensselaer)

Validation Model: 100x leave 10% out validations
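One way to read "100x leave 10% out" is as 100 independent random hold-out splits, each setting aside 10% of the records for testing. The sketch below assumes that reading; the function name and the sample count of 50 are hypothetical.

```python
import numpy as np

def leave_fraction_out_splits(n_samples, n_repeats=100, fraction=0.10, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated random hold-out validation."""
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(fraction * n_samples)))
    for _ in range(n_repeats):
        perm = rng.permutation(n_samples)
        # The first n_test shuffled indices are held out; the rest are used for training.
        yield perm[n_test:], perm[:n_test]

# Hypothetical data set of 50 records: each split trains on 45 and tests on 5.
splits = list(leave_fraction_out_splits(n_samples=50))
```

A model would be refit on each training index set and scored on the matching test set, with the 100 scores averaged to give the validation estimate.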

PLS, K-PLS, SVM, ANN
Feature Selection (data strip mining)

Direct Kernel Methods 511 features 32 features K-PLS Pharmaplots

Microarray Gene Expression Data for Detecting Leukemia
- 38 data for training
- 36 data for testing
- Challenge: select ~10 out of 6000 genes
- Used sensitivity analysis for feature selection (with Kristin Bennett)
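The slide does not say how the sensitivity analysis was carried out. The sketch below shows one common variant, assumed here for illustration only: hold every feature at its median, sweep one feature across its observed range, and rank features by how much the model output changes. The ridge model, the toy data, and all names are stand-ins, not the authors' procedure.

```python
import numpy as np

def sensitivity_ranking(predict, X):
    """Rank features by the output change when one feature sweeps its observed
    range while all other features are held at their median."""
    X = np.asarray(X, dtype=float)
    base = np.median(X, axis=0)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        lo, hi = base.copy(), base.copy()
        lo[j], hi[j] = X[:, j].min(), X[:, j].max()
        scores[j] = abs(predict(hi[None, :])[0] - predict(lo[None, :])[0])
    return np.argsort(scores)[::-1], scores

rng = np.random.default_rng(0)
X = rng.normal(size=(38, 6))                          # 38 samples, 6 toy "genes"
true_w = np.array([0.0, 3.0, 0.0, -2.0, 0.0, 0.0])    # only features 1 and 3 matter
y = X @ true_w
# Ridge regression as a stand-in model (illustrative regularization of 1e-3).
w = np.linalg.solve(X.T @ X + 1e-3 * np.eye(6), X.T @ y)
order, scores = sensitivity_ranking(lambda Z: Z @ w, X)
```

Here the two informative features come out on top of the ranking. With thousands of genes, the lowest-ranked features would typically be dropped and the model refit in several rounds.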

with Wunmi Osadik and Walker Land (Binghamton University) Acknowledgement: NSF

Magnetocardiography at CardioMag Imaging Inc.

Left: Filtered and averaged temporal MCG traces for one cardiac cycle in 36 channels (the 6x6 grid). Right upper: Spatial map of the cardiac magnetic field, generated at an instant within the ST interval. Right lower: T3-T4 sub-cycle in one MCG signal trace.

Magneto-cardiogram Data
with Karsten Sternickel (CardioMag Inc.) and Boleslaw Szymanski (Rensselaer)
Acknowledgement: NSF SBIR phase I project

SVMLib Linear PCA Direct Kernel PLS SVMLib

Direct Kernel PLS with 3 Latent Variables

Direct Kernel Methods Direct Kernel with Robert Bress and Thanakorn Naenna

GATCAATGAGGTGGACACCAGAGGCGGGGACTTGTAAATAACACTGGGCTGTAGGAGTGA TGGGGTTCACCTCTAATTCTAAGATGGCTAGATAATGCATCTTTCAGGGTTGTGCTTCTA TCTAGAAGGTAGAGCTGTGGTCGTTCAATAAAAGTCCTCAAGAGGTTGGTTAATACGCAT GTTTAATAGTACAGTATGGTGACTATAGTCAACAATAATTTATTGTACATTTTTAAATAG CTAGAAGAAAAGCATTGGGAAGTTTCCAACATGAAGAAAAGATAAATGGTCAAGGGAATG GATATCCTAATTACCCTGATTTGATCATTATGCATTATATACATGAATCAAAATATCACA CATACCTTCAAACTATGTACAAATATTATATACCAATAAAAAATCATCATCATCATCTCC ATCATCACCACCCTCCTCCTCATCACCACCAGCATCACCACCATCATCACCACCACCATC ATCACCACCACCACTGCCATCATCATCACCACCACTGTGCCATCATCATCACCACCACTG TCATTATCACCACCACCATCATCACCAACACCACTGCCATCGTCATCACCACCACTGTCA TTATCACCACCACCATCACCAACATCACCACCACCATTATCACCACCATCAACACCACCA CCCCCATCATCATCATCACTACTACCATCATTACCAGCACCACCACCACTATCACCACCA CCACCACAATCACCATCACCACTATCATCAACATCATCACTACCACCATCACCAACACCA CCATCATTATCACCACCACCACCATCACCAACATCACCACCATCATCATCACCACCATCA CCAAGACCATCATCATCACCATCACCACCAACATCACCACCATCACCAACACCACCATCA CCACCACCACCACCATCATCACCACCACCACCATCATCATCACCACCACCGCCATCATCA TCGCCACCACCATGACCACCACCATCACAACCATCACCACCATCACAACCACCATCATCA CTATCGCTATCACCACCATCACCATTACCACCACCATTACTACAACCATGACCATCACCA CCATCACCACCACCATCACAACGATCACCATCACAGCCACCATCATCACCACCACCACCA CCACCATCACCATCAAACCATCGGCATTATTATTTTTTTAGAATTTTGTTGGGATTCAGT ATCTGCCAAGATACCCATTCTTAAAACATGAAAAAGCAGCTGACCCTCCTGTGGCCCCCT TTTTGGGCAGTCATTGCAGGACCTCATCCCCAAGCAGCAGCTCTGGTGGCATACAGGCAA CCCACCACCAAGGTAGAGGGTAATTGAGCAGAAAAGCCACTTCCTCCAGCAGTTCCCTGT GATCAATGAGGTGGACACCAGAGGCGGGGACTTGTAAATAACACTGGGCTGTAGGAGTGA TGGGGTTCACCTCTAATTCTAAGATGGCTAGATAATGCATCTTTCAGGGTTGTGCTTCTA TCTAGAAGGTAGAGCTGTGGTCGTTCAATAAAAGTCCTCAAGAGGTTGGTTAATACGCAT GTTTAATAGTACAGTATGGTGACTATAGTCAACAATAATTTATTGTACATTTTTAAATAG CTAGAAGAAAAGCATTGGGAAGTTTCCAACATGAAGAAAAGATAAATGGTCAAGGGAATG GATATCCTAATTACCCTGATTTGATCATTATGCATTATATACATGAATCAAAATATCACA CATACCTTCAAACTATGTACAAATATTATATACCAATAAAAAATCATCATCATCATCTCC ATCATCACCACCCTCCTCCTCATCACCACCAGCATCACCACCATCATCACCACCACCATC ATCACCACCACCACTGCCATCATCATCACCACCACTGTGCCATCATCATCACCACCACTG 
TCATTATCACCACCACCATCATCACCAACACCACTGCCATCGTCATCACCACCACTGTCA TTATCACCACCACCATCACCAACATCACCACCACCATTATCACCACCATCAACACCACCA CCCCCATCATCATCATCACTACTACCATCATTACCAGCACCACCACCACTATCACCACCA CCACCACAATCACCATCACCACTATCATCAACATCATCACTACCACCATCACCAACACCA CCATCATTATCACCACCACCACCATCACCAACATCACCACCATCATCATCACCACCATCA CCAAGACCATCATCATCACCATCACCACCAACATCACCACCATCACCAACACCACCATCA CCACCACCACCACCATCATCACCACCACCACCATCATCATCACCACCACCGCCATCATCA TCGCCACCACCATGACCACCACCATCACAACCATCACCACCATCACAACCACCATCATCA CTATCGCTATCACCACCATCACCATTACCACCACCATTACTACAACCATGACCATCACCA CCATCACCACCACCATCACAACGATCACCATCACAGCCACCATCATCACCACCACCACCA CCACCATCACCATCAAACCATCGGCATTATTATTTTTTTAGAATTTTGTTGGGATTCAGT ATCTGCCAAGATACCCATTCTTAAAACATGAAAAAGCAGCTGACCCTCCTGTGGCCCCCT TTTTGGGCAGTCATTGCAGGACCTCATCCCCAAGCAGCAGCTCTGGTGGCATACAGGCAA CCCACCACCAAGGTAGAGGGTAATTGAGCAGAAAAGCCACTTCCTCCAGCAGTTCCCTGT WORK IN PROGRESS

Santa Fe Time Series Prediction Competition
- 1994 Santa Fe Institute Competition: 1000 chaotic laser data points, predict the next 100
- Competition is described in Time Series Prediction: Forecasting the Future and Understanding the Past, A. S. Weigend & N. A. Gershenfeld, eds., Addison-Wesley, 1994
Method:
- K-PLS with σ = 3 and 24 latent variables
- Used records with 40 past data points for training to predict the next point
- Predictions bootstrap on each other for the 100 real test data
Entry “would have won” the competition
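The windowing and bootstrapped prediction steps can be sketched as follows. This is not K-PLS: kernel ridge regression with an RBF kernel is swapped in as a simpler stand-in, a sine wave replaces the laser data, and the lam value and function names are assumptions. Only the 40-point window, the 100-step horizon and σ = 3 echo the slide.

```python
import numpy as np

def make_windows(series, n_lags):
    """Each row holds n_lags consecutive values; the target is the value that follows."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    return X, series[n_lags:]

def rbf(A, B, sigma):
    """RBF kernel between the rows of A and the rows of B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * (A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def iterated_forecast(series, n_lags, n_steps, sigma, lam):
    """Fit kernel ridge regression on lag windows, then feed predictions back in."""
    series = np.asarray(series, dtype=float)
    X, y = make_windows(series, n_lags)
    alpha = np.linalg.solve(rbf(X, X, sigma) + lam * np.eye(len(X)), y)
    window = list(series[-n_lags:])
    preds = []
    for _ in range(n_steps):
        k = rbf(np.array([window]), X, sigma)       # similarities to training windows
        nxt = float((k @ alpha)[0])
        preds.append(nxt)
        window = window[1:] + [nxt]                  # slide the window over the prediction
    return preds

series = np.sin(0.3 * np.arange(200))                # synthetic stand-in series
preds = iterated_forecast(series, n_lags=40, n_steps=100, sigma=3.0, lam=1e-3)
```

Because each prediction becomes an input for the next step, errors can compound over the 100-step horizon, which is what makes this kind of forecast demanding.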

Direct Kernel Methods Kristin Bennett and Mark Embrechts