KDD Cup 2004 Winning Model for Task 1: Particle Physics Prediction David S. Vogel: MEDai / AI Insight, University of Central Florida Eric Gottschalk: MEDai / AI Insight Morgan C. Wang: University of Central Florida Orlando, FL

What did we know?
- Given 12 million numbers.
- No information given about what these numbers represent.
- No knowledge of particle physics.
- Predict 100,000 ones and zeros.

Unsuccessful Modeling Packages
- Software #1: Tree-based boosting algorithms
- Software #2: Logistic Regression and Neural Networks
- Software #3: Support Vector Machines
- Software #4: Rule-finding algorithms

Key Modeling Tools
- MITCH (Multiple Intelligent Tasking Computer Heuristics): used for its visualizations, variable analysis, transformations, Neural Networks, and scoring tools.
- NICA (Numerical Interaction CAlibrator): used to detect interactions within the data.

Category Analysis
Nearly one tenth of records are 100% predictive.

| Values of Variable #63 | N           | # Class 0 | # Class 1 |
|------------------------|-------------|-----------|-----------|
| {-8, -2, 1, 14}        | 2350 (4.7%) | 0         | 2350      |
| {8, 2, -1, -14}        | 2294 (4.6%) | 2294      | 0         |
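The per-category counts in the table can be found with a simple group-by over the raw records. A minimal sketch, assuming the data arrives as (category value, label) pairs; the toy values below only echo the slide's pattern, not the real competition data:

```python
from collections import defaultdict

def category_purity(rows):
    """rows: iterable of (category_value, label) pairs with labels in {0, 1}.
    Returns {value: (n, n_class1, fully_predictive)}, where the flag marks
    categories whose records all fall in a single class."""
    counts = defaultdict(lambda: [0, 0])
    for value, label in rows:
        counts[value][0] += 1          # total records with this value
        counts[value][1] += label      # records in class 1
    return {v: (n, pos, pos in (0, n)) for v, (n, pos) in counts.items()}

# Toy data echoing the slide: -8 is always class 1, 8 is always class 0.
rows = [(-8, 1), (-8, 1), (8, 0), (8, 0), (2, 0), (-2, 1)]
print(category_purity(rows))
```

Summing the record counts of the fully predictive categories gives the "nearly one tenth of records" figure on the slide.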

Investigation of Variables
- Group 1: 8 variables with values {-1, 0, 1}; interactive and symmetric.
- Group 2: a key nominal variable.
- Group 3: 6 individually predictive variables.
- Group 4: all other variables, with no correlation to the dependent variable.

Complete Interaction Search

| Variable 1 | Variable 2 | Z-Score |
|------------|------------|---------|
| V01        | V          |         |
| V65        | V          |         |
| V01        | V          |         |
| V04        | V          |         |
| V05        | V          |         |
| V04        | V          |         |
| V23        | V          |         |
| V19        | V          |         |
| :          | :          | :       |
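The slides do not spell out the statistic NICA computes, so as a hedged sketch: one simple way to rank variable pairs by a Z-score is to correlate each pair's product term with the label and apply the Fisher transform, which is approximately standard normal when there is no real interaction. Everything below (column names, the XOR-style toy data) is illustrative only:

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Plain Pearson correlation, no external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def fisher_z(r, n):
    """Fisher-transformed correlation; roughly N(0, 1) when the true r is 0."""
    r = max(min(r, 0.999999), -0.999999)   # guard atanh against |r| = 1
    return math.atanh(r) * math.sqrt(n - 3)

def interaction_search(columns, y):
    """Score every variable pair by |Z| of its product term vs. the label."""
    n = len(y)
    scored = []
    for a, b in combinations(columns, 2):
        product = [xa * xb for xa, xb in zip(columns[a], columns[b])]
        scored.append((abs(fisher_z(pearson(product, y), n)), a, b))
    return sorted(scored, reverse=True)

# XOR-style toy data: neither V01 nor V04 predicts y alone, but their
# product does; V99 is an uninformative extra column.
cols = {
    "V01": [1, 1, -1, -1] * 25,
    "V04": [1, -1, 1, -1] * 25,
    "V99": [1, 1, 1, -1] * 25,
}
y = [1 if a * b > 0 else 0 for a, b in zip(cols["V01"], cols["V04"])]
print(interaction_search(cols, y)[0])   # the (V01, V04) pair ranks first
```

A full search over all pairs is quadratic in the number of variables, which is why the slide shows only the top of a long ranked list.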

Predictor V01: r=.006 [chart: Class 1 Probability vs. V01]

Predictor V01 where V04=1 [chart: Class 1 Probability vs. V01]

Predictor V01 where V04=-1 [chart: Class 1 Probability vs. V01]

Predictor V04*(V ): r=.23 [chart: Class 1 Probability vs. V01]
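The four charts show the key effect: V01's relationship to the class flips sign depending on V04, so V01 alone is nearly uncorrelated with the label, while the product term recovers the signal. A minimal sketch of that phenomenon on simulated data (the data-generating process here is an assumption for illustration, not the competition data):

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

random.seed(0)
n = 10_000
v04 = [random.choice([-1, 1]) for _ in range(n)]
v01 = [random.gauss(0, 1) for _ in range(n)]
# Hypothetical process: the direction of V01's effect flips with V04's sign,
# so marginally V01 looks useless.
y = [1 if a * b + random.gauss(0, 1) > 0 else 0 for a, b in zip(v01, v04)]

interaction = [a * b for a, b in zip(v01, v04)]
print(pearson(v01, y))          # near zero, like the r=.006 chart
print(pearson(interaction, y))  # substantial, like the r=.23 chart
```

This is exactly the kind of pair a marginal correlation screen misses and a complete interaction search finds.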

Interactions between variables [chart: interaction matrix]
- Red: Extremely Strong
- Green: Strong
- Yellow: Moderate (p < .01)

Details of 639 Predictors
- Majority of the original variables (after null-value replacement)
- 100% predictive groups
- High-volume categories of the nominal variable
- 2 variables indicating null values
- 72 first-order interactions
- 185 second-order interactions
- 301 third-order interactions

Model Details
- 40,000 training cases; 10,000 validation cases
- MITCH Self-Organizing Neural Network
- “Bernoulli” function optimization generally performed the best
- Generalized extremely well on the validation set, considering the number of variables
- Small secondary model based on residuals
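The "small secondary model based on residuals" is a stacking idea: fit the main model, then fit a second, simpler model to the errors the first one leaves behind, and add the two predictions. A minimal sketch with one-dimensional least-squares fits standing in for the actual MITCH network (the features, data, and linear stand-in are all assumptions for illustration):

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y ≈ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def predict(model, xs):
    a, b = model
    return [a * x + b for x in xs]

# Toy data: feature x1 explains most of y; x2 explains what is left over.
x1 = [0, 1, 2, 3, 4, 5, 6, 7]
x2 = [1, 0, 1, 0, 1, 0, 1, 0]
y  = [1, 1, 3, 3, 5, 5, 7, 7]          # y = x1 + x2

primary = fit_linear(x1, y)
residuals = [t - p for t, p in zip(y, predict(primary, x1))]
secondary = fit_linear(x2, residuals)  # small model fit to the leftovers
final = [p + q for p, q in zip(predict(primary, x1), predict(secondary, x2))]
```

The combined prediction can only reduce training error relative to the primary model alone; whether it helps on validation data depends on the secondary model staying small, which is presumably why the slide emphasizes "small."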

Customization
- Severe penalty for incorrect probabilities of 0 or 1: a “googol”!
- “Gimmees” forced to be at or
- Accept 9300 tiny penalties to avoid risking “disaster.”
- 14 teams had a “disaster.”
- Remaining predictions truncated at 0.01 and 0.99 to compensate for over-fitting at the extremes.
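The truncation trade-off above can be sketched in a few lines. The competition's actual penalty function isn't reproduced on the slide, so log loss is used here as a stand-in for a scoring rule that explodes on confidently wrong predictions; the numbers are illustrative:

```python
import math

def truncate(p, lo=0.01, hi=0.99):
    """Cap probabilities away from 0 and 1 so a single confidently wrong
    prediction cannot incur an essentially unbounded penalty."""
    return min(max(p, lo), hi)

def log_loss(p, y):
    """Stand-in for a penalty that explodes near 0 and 1."""
    return -math.log(p if y == 1 else 1.0 - p)

raw = [0.999999, 0.5, 1e-9]
labels = [0, 1, 1]                  # first and last are confidently wrong
clipped = [truncate(p) for p in raw]

worst_raw = max(log_loss(p, t) for p, t in zip(raw, labels))
worst_clipped = max(log_loss(p, t) for p, t in zip(clipped, labels))
```

Clipping adds a tiny, bounded penalty to every correct extreme prediction (the "9300 tiny penalties") in exchange for capping the worst case, which is how the 14 "disaster" teams were avoided.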

Customization (continued)
- Q-Score predictions were maximized by retraining with a “creative” optimization function: (Predicted − Actual)^6.
- Predictions re-calibrated using the function:
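The sixth-power objective is easy to state directly. Compared with squared error, it almost ignores small errors and concentrates the objective on the largest mistakes, pushing the model to fix near-misses at the extremes; the comparison values below are illustrative:

```python
def sixth_power_loss(preds, actuals):
    """The slide's "creative" objective: mean (predicted - actual)**6."""
    return sum((p - a) ** 6 for p, a in zip(preds, actuals)) / len(preds)

def squared_loss(preds, actuals):
    """Ordinary mean squared error, for comparison."""
    return sum((p - a) ** 2 for p, a in zip(preds, actuals)) / len(preds)

# An error of 0.1 contributes 1e-6 under the sixth power (vs. 0.01 squared);
# an error of 1.0 contributes 1.0 under both.
print(sixth_power_loss([0.6], [0.5]), squared_loss([0.6], [0.5]))
print(sixth_power_loss([1.0], [0.0]), squared_loss([1.0], [0.0]))
```

The re-calibration function mentioned on the slide was a formula/figure that did not survive in this transcript.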

Where do we go from here?
- Accuracy, independent of content
- Scientific & industry applications

Questions?