Statistical Learning Introduction: Data Mining Process and Modeling Examples

Data Mining Process

We can see associations between customer type and fraudulent behavior. Are they legitimate, or are they a sign of data leakage? Our goal is to build a model that predicts fraud in advance.
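One common guard against the leakage question raised here is to build every feature only from information recorded before the moment the prediction has to be made. A minimal sketch, assuming a hypothetical event log whose records carry a customer id, a feature name, a value, and a recording date (none of these names come from the original slide):

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    # Hypothetical record layout, for illustration only.
    customer_id: str
    feature: str
    value: float
    recorded_on: date


def features_as_of(events, cutoff):
    """Build per-customer features using only events recorded strictly before `cutoff`.

    Events recorded on or after the cutoff (for example, a flag set during a
    fraud investigation) are dropped, because they would not be available
    when the model has to predict fraud in advance. If a feature was recorded
    several times, the most recent value before the cutoff is kept.
    """
    latest = {}  # (customer_id, feature) -> (recorded_on, value)
    for e in events:
        if e.recorded_on >= cutoff:
            continue  # not yet known at prediction time
        key = (e.customer_id, e.feature)
        if key not in latest or e.recorded_on > latest[key][0]:
            latest[key] = (e.recorded_on, e.value)
    feats = {}
    for (cid, name), (_, value) in latest.items():
        feats.setdefault(cid, {})[name] = value
    return feats
```

With a cutoff in mid-February, an "account_frozen" flag recorded in March never reaches the feature set, which is exactly the kind of column that can make a fraud model look far better than it really is.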

Project evolution and relevance to our course:
- Business problem definition: targeting, sales force management
- Modeling problem definition: wallet / opportunity estimation
- Statistical problem definition: quantile estimation, latent variable estimation
- Modeling methodology design: quantile estimation, graphical model
- Model generation & validation: programming, simulation, IBM Wallets
- Implementation & application development: OnTarget, MAP

Diagram annotations: "Outside scope", "Keep in mind", "This is our domain!"
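Quantile estimation appears twice in the list above. The usual tool is the quantile (pinball) loss, whose minimizer is a quantile rather than a mean; a high quantile of spend is one way to think about a customer's "wallet". The sketch below only illustrates the loss on a constant predictor and is not the IBM Wallets methodology; the function names are hypothetical.

```python
def pinball_loss(y, q, tau):
    """Average quantile (pinball) loss of predicting the constant q at level tau.

    Residuals above q are weighted by tau, residuals below q by (1 - tau),
    so a large tau penalizes under-prediction more heavily.
    """
    total = 0.0
    for yi in y:
        r = yi - q
        total += tau * r if r >= 0 else (tau - 1) * r
    return total / len(y)


def best_constant_quantile(y, tau):
    """Minimize the pinball loss over constants.

    The loss is convex and piecewise linear with kinks at the data points,
    so a minimizer is always attained at one of the observed y values.
    """
    return min(sorted(y), key=lambda q: pinball_loss(y, q, tau))
```

At tau = 0.5 this recovers the median; replacing the constant with a linear function of covariates gives linear quantile regression.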

Predict whether someone will have a heart attack on the basis of demographic, diet and clinical measurements
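A binary outcome like this is commonly modeled with logistic regression. A minimal sketch fitted by batch gradient ascent on the log-likelihood; the two feature columns and the toy data in the check are hypothetical stand-ins for standardized clinical measurements, not the real data set:

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit P(y=1 | x) = sigmoid(b + w.x) by gradient ascent on the mean log-likelihood."""
    p, n = len(X[0]), len(X)
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = yi - sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(p):
                gw[j] += err * xi[j]
            gb += err
        w = [wj + lr * g / n for wj, g in zip(w, gw)]
        b += lr * gb / n
    return w, b


def predict_proba(w, b, x):
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
```

The fitted weights also give a first look at which measurements are associated with higher risk, which connects this example to the risk-factor question in the next one.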

ESL Chap. 1, Introduction: Identify the risk factors for prostate cancer, based on clinical and demographic variables. The response is lpsa, the log of prostate-specific antigen.
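A natural first model for a continuous response such as lpsa is linear regression fitted by least squares; the size and sign of each coefficient then suggest which variables act as risk factors. A minimal sketch that solves the normal equations directly; the predictor columns and data in the check are synthetic, not the prostate data:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def least_squares(X, y):
    """Fit y ~ b0 + b1*x1 + ... by solving (Z^T Z) beta = Z^T y, where Z = [1, X]."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(ZtZ, Zty)
```

For real work a QR-based solver is numerically safer than forming Z^T Z, but the normal equations keep the idea visible.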

Classify a recorded phoneme based on its log-periodogram. A restricted model (red) does much better than an unrestricted one (jumpy black).
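One way to build the restricted model is to force the coefficient curve over frequency to be smooth by writing β = Hθ, where H is a basis matrix with only a few columns. Because xᵀβ = (xᵀH)θ, any linear classifier can be fitted to the reduced features xᵀH and its coefficients mapped back. ESL uses a natural cubic spline basis here; this sketch swaps in a low-degree polynomial basis purely for illustration, and the function names are hypothetical:

```python
def poly_basis(n_freq, degree):
    """n_freq x (degree + 1) matrix with columns 1, f, f^2, ... on a grid f in [0, 1].

    Requires n_freq >= 2.
    """
    grid = [i / (n_freq - 1) for i in range(n_freq)]
    return [[f ** d for d in range(degree + 1)] for f in grid]


def project_features(X, H):
    """Map each log-periodogram x (length n_freq) to x^T H (length degree + 1)."""
    k = len(H[0])
    return [[sum(x[f] * H[f][j] for f in range(len(x))) for j in range(k)] for x in X]


def expand_coefficients(H, theta):
    """Recover the smooth coefficient curve beta = H theta over all frequencies."""
    return [sum(h * t for h, t in zip(row, theta)) for row in H]
```

The restriction trades a little bias for a large drop in variance: instead of one free coefficient per frequency, the model estimates only degree + 1 parameters.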

Customize a spam detection system. X = which words appear, and how often; Y = spam or not?
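The X described above is a bag-of-words representation: a fixed vocabulary and, for each email, how often each word occurs. A minimal sketch; the tokenizing regex and the three-word vocabulary are assumptions chosen for illustration:

```python
import re
from collections import Counter


def word_counts(text, vocabulary):
    """Return how often each vocabulary word occurs in `text` (case-insensitive)."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]
```

Each email becomes one row of X, and its spam / not-spam label is the corresponding Y; dividing by the total word count gives relative frequencies instead of raw counts.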

Identify the digits in a handwritten zip code from a digitized image. X = intensity (gray level) of each pixel; Y = which digit is it?
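With X as a flat vector of pixel intensities, one of the simplest classifiers is 1-nearest-neighbor: label a new image with the digit of the closest training image. A minimal sketch; the tiny 3x3 "images" in the check are made up, whereas real zip-code images are much larger:

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor(train, query):
    """train: list of (pixels, digit) pairs, pixels being a flat list of gray levels."""
    _, best_label = min(train, key=lambda pair: squared_distance(pair[0], query))
    return best_label
```

Euclidean distance on raw pixels is sensitive to small shifts and rotations of the digit, which is one reason more invariant distances and models are used in practice.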

Classify a tissue sample into one of several cancer classes based on a gene expression profile. X = expression levels of genes; Y = which cancer?
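A simple baseline for this multi-class problem is a nearest-centroid rule: average the expression profiles of each cancer class, then assign a new sample to the class whose average is closest. A minimal sketch; the three-gene profiles and class names "A" and "B" in the check are hypothetical:

```python
def centroids(samples):
    """samples: list of (expression_vector, cancer_class). Returns class -> mean vector."""
    sums, counts = {}, {}
    for x, c in samples:
        if c not in sums:
            sums[c] = [0.0] * len(x)
            counts[c] = 0
        sums[c] = [s + v for s, v in zip(sums[c], x)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def classify(cents, x):
    """Return the class whose centroid is closest to x in squared Euclidean distance."""
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(cents[c], x)))
```

With thousands of genes and few samples, most genes are noise for this rule, so practical variants shrink or select genes before computing distances.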

Classify the pixels in a LANDSAT image according to usage. Y = {red soil, cotton, vegetation stubble, mixture, gray soil, damp gray soil, very damp gray soil}; X = values of the pixels in several wavelength bands.
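Land usage varies smoothly in space, so a common way to build X for one pixel is to take its values in every wavelength band together with those of its 8 neighbors. A minimal sketch, assuming the image is stored as image[band][row][col] and that only interior pixels are used (border handling is left out on purpose):

```python
def neighborhood_features(image, row, col):
    """Concatenate, band by band, the 3x3 block of values centered on (row, col).

    image[band][r][c] holds the value of pixel (r, c) in one wavelength band.
    The result has 9 * n_bands entries, in row-major order within each band.
    """
    n_rows, n_cols = len(image[0]), len(image[0][0])
    if not (0 < row < n_rows - 1 and 0 < col < n_cols - 1):
        raise ValueError("pixel must not lie on the image border")
    feats = []
    for band in image:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                feats.append(band[row + dr][col + dc])
    return feats
```

Any of the classifiers sketched above can then be trained on these vectors, with the land-usage class of the center pixel as Y.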

October 2006: Announcement of the NETFLIX Competition
USA Today headline: “Netflix offers $1 million prize for better movie recommendations”
Details:
- Beat NETFLIX's current recommender model, ‘Cinematch’, by 10% in root mean squared error (RMSE) of the predicted ratings, before 2011
- $50K for the annual progress prize (improvement relative to the baseline)
- The data contain about 100 million movie ratings, a subset of all NETFLIX ratings, from 480,189 users on 17,770 movies
- Performance is evaluated on held-out movie-user pairs
- The NETFLIX competition has attracted contestants on teams from 180 different countries, with valid submissions from 4,336 different teams
- Leaderboard: the current best result is 9.63% better than the baseline (getting close!)
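The prize criterion compares the RMSE of predicted ratings on held-out pairs with Cinematch's RMSE on the same pairs and reports the relative reduction. A minimal sketch of both quantities; the function names are hypothetical:

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between predicted and true ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def improvement_pct(model_rmse, baseline_rmse):
    """Percentage reduction in RMSE relative to the baseline (e.g. Cinematch)."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse
```

For example, `improvement_pct(0.88, 0.95)` (made-up RMSE values) gives a bit over 7%, short of the 10% needed for the grand prize. Because RMSE squares the errors, a few badly mispredicted ratings weigh more than many slightly-off ones.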

Data Overview: NETFLIX
- All movies (80K), all users (6.8M)
- NETFLIX competition data: 17K movies (selection unclear), 480K users (at least 20 ratings by end), ~100M ratings
- Internet Movie Data Base fields: Title, Year, Actors, Awards, Revenue, …
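The "at least 20 ratings" condition on users is a simple filter on the rating log. A minimal sketch, assuming ratings are (user, movie, stars) tuples; it reproduces only this one selection rule, not NETFLIX's full (and, per the slide, unclear) sampling procedure:

```python
from collections import Counter


def users_with_min_ratings(ratings, min_count=20):
    """Return the set of users who have at least `min_count` ratings."""
    counts = Counter(user for user, _, _ in ratings)
    return {u for u, c in counts.items() if c >= min_count}


def filter_ratings(ratings, min_count=20):
    """Keep only the ratings made by users who pass the minimum-count threshold."""
    ratings = list(ratings)
    keep = users_with_min_ratings(ratings, min_count)
    return [r for r in ratings if r[0] in keep]
```

A filter like this changes the population: light users, who are common in the full 6.8M-user base, are absent from the competition data, so models tuned on it may not transfer directly to them.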

Figure: NETFLIX data generation process. Time axis from 1998 to 2005, showing movie arrival and user arrival over 17K movies; training data (known ratings) versus the qualifier dataset (3M ratings, values hidden as "?").
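The figure places the qualifier dataset at the late end of the time axis. A minimal sketch of such a time-based holdout, under the assumption (our reading of the figure, not a documented NETFLIX procedure) that each user's most recent ratings are hidden; the tuple layout and function name are hypothetical:

```python
def split_latest(ratings, k=3):
    """Per-user time-based split.

    ratings: list of (user, movie, stars, date) tuples.
    For each user with more than k ratings, the k most recent go to the
    holdout set and the rest stay in training. Users with k or fewer ratings
    are kept entirely in training, so every holdout user also has history.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    by_user = {}
    for r in ratings:
        by_user.setdefault(r[0], []).append(r)
    train, holdout = [], []
    for user_ratings in by_user.values():
        user_ratings.sort(key=lambda r: r[3])  # oldest first
        if len(user_ratings) <= k:
            train.extend(user_ratings)
        else:
            train.extend(user_ratings[:-k])
            holdout.extend(user_ratings[-k:])
    return train, holdout
```

A random split would let a model "see the future" of each user; splitting by time mirrors the real task of predicting ratings that have not happened yet.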

Netflix and us
We will hear talks about Netflix and also work with the data throughout our course:
- We will have a modeling challenge in our course that uses the Netflix data. The winners will get a grade boost!
  - You are all also welcome to try your hand at winning the $1M…
- Both yearly $50K progress prizes were awarded to a team from AT&T with an Israeli participant (Yehuda Koren)
  - He is now back in Israel and will give us a talk!
- While I was at IBM Research, our team won a related competition in 2007 (same data, more “standard” modeling tasks)
  - We will probably have a “case study” lecture on that as well