Data Science in Industry

What is the hardest part of data science?

What is the hardest part of data science?
- Dealing with people
- Figuring out what questions to ask (domain knowledge)
- Getting the data for those questions
- Organizing the data
- Dealing with missing data
- Training supervised machine learning models
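Dealing with missing data often starts with something as simple as imputation. As a minimal sketch (the record fields and values here are invented for illustration), mean imputation in plain Python might look like:

```python
def impute_mean(rows, key):
    """Replace None values of `key` with the mean of the observed values."""
    observed = [r[key] for r in rows if r[key] is not None]
    mean = sum(observed) / len(observed)
    return [{**r, key: mean if r[key] is None else r[key]} for r in rows]

# Hypothetical records with gaps in different columns
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 28, "income": None},
]
records = impute_mean(records, "age")     # missing age -> 31.0
records = impute_mean(records, "income")  # missing income -> 56500.0
```

In practice you would reach for pandas or scikit-learn imputers, but the idea is the same: decide, column by column, how a gap should be filled before training anything on the data.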

How to become successful?

Association Rules
- Gather all frequent itemsets (those whose support, the percentage of transactions they occur in, meets a specified threshold).
- From those frequent itemsets, consider each possible combination and calculate its support.
- The most frequent rules that match our support level are presented.
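The steps above can be sketched in plain Python. This is a naive Apriori-style enumeration (the `baskets` data is made up for illustration); it prunes by the Apriori property, stopping once no itemset of a given size is frequent:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Enumerate all itemsets whose support (fraction of transactions
    containing the itemset) meets min_support."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            count = sum(1 for t in transactions if set(combo) <= t)
            if count / n >= min_support:
                result[combo] = count / n
                found = True
        if not found:  # Apriori property: no superset can be frequent either
            break
    return result

baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk"},
]
sets_ = frequent_itemsets(baskets, min_support=0.5)
# e.g. ('bread', 'milk') appears in 2 of 4 baskets -> support 0.5
```

A real workload would use an optimized library (e.g. mlxtend's `apriori`), since this brute-force version is exponential in the number of distinct items.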

Basic idea behind rule generation

Association rule terminology
- Confidence: confidence(x → y) = support(x ∪ y) / support(x)
- Lift: lift(x → y) = support(x ∪ y) / (support(x) × support(y)). If a rule has a lift of 1, the probability of occurrence of the antecedent and that of the consequent are independent of each other. When two events are independent of each other, no rule can be drawn involving those two events.
- Conviction: conviction(x → y) = (1 − support(y)) / (1 − confidence(x → y)); interpreted as the ratio of the expected frequency that x occurs without y (that is to say, the frequency with which the rule makes an incorrect prediction) if x and y were independent, divided by the observed frequency of incorrect predictions.
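These three metrics are easy to compute directly from the definitions above. A minimal sketch (the `baskets` transactions are invented for illustration):

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

def confidence(x, y, transactions):
    return support(x | y, transactions) / support(x, transactions)

def lift(x, y, transactions):
    return support(x | y, transactions) / (
        support(x, transactions) * support(y, transactions)
    )

def conviction(x, y, transactions):
    conf = confidence(x, y, transactions)
    if conf == 1:  # rule is never wrong; conviction is infinite
        return float("inf")
    return (1 - support(y, transactions)) / (1 - conf)

baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk"},
]
x, y = {"milk"}, {"bread"}
# confidence = 0.5 / 0.75 = 2/3; lift = 0.5 / (0.75 * 0.75) = 8/9
```

A lift below 1, as here, means milk buyers are actually slightly *less* likely than average to buy bread, so the rule would not be worth acting on.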

Jupyter Example

Clustering
K-Means will give you clusters that you can label!
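To make the idea concrete, here is a deliberately tiny one-dimensional K-Means (the `spend` values are made up for illustration): assign each point to its nearest centroid, move each centroid to the mean of its points, repeat. Once the clusters stabilize, you can give each one a business label.

```python
def kmeans_1d(points, k, iters=20):
    """Minimal 1-D K-Means: nearest-centroid assignment, then centroid update."""
    # Spread the initial centroids across the sorted data
    centroids = sorted(points)[:: max(1, len(points) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[idx].append(p)
        # Move each centroid to the mean of its assigned points
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

spend = [10, 12, 11, 95, 100, 98]          # hypothetical customer spend
centroids, clusters = kmeans_1d(spend, k=2)
# -> two groups you might label "low spenders" and "high spenders"
```

For real work you would use scikit-learn's `KMeans`, which handles multiple dimensions, smarter initialization, and convergence checks; the labeling step, though, is always a human judgment call.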

Putting it together
- From the clusters, you can assign labels, which lets you engineer statistically meaningful roll-ups for the association rules.
- From there you can see all the rules and target a dependent variable.
- Set your decision tree, random forest, or neural network to predict that dependent variable from the independent variables.
- Update with feedback from the field, and you don't have to worry about changing any code. Besides, what code or variables would you change? That is the time savings of ML.

Questions 