Data Mining 101 with Scikit-Learn

Presentation transcript:

Data Mining 101 with Scikit-Learn: An informal introduction to data mining. Shuhan Yuan, sy005@uark.edu

What is data mining?
Data mining is the computing process of discovering patterns in large data sets involving methods at the intersection of machine learning, statistics, and database systems. (https://en.wikipedia.org/wiki/Data_mining)
Data mining is the process of discovering interesting knowledge from large amounts of data stored either in databases, data warehouses, or other information repositories. (Data Mining: Concepts and Techniques)

What is data mining? Figure: a naïve view of data mining (Data → Data Mining → Knowledge (Models)). Figure: knowledge discovery from data. http://hanj.cs.illinois.edu/bk1/

Six common classes of tasks
Classification: the task of generalizing known structure to apply to new data. For example, an e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
Clustering: the task of discovering groups and structures in the data that are in some way or another "similar", without using known structures in the data.
Regression: attempts to find a function that models the data with the least error, that is, to estimate the relationships among data or datasets.
Summarization: providing a more compact representation of the data set, including visualization and report generation.
Anomaly detection (outlier/change/deviation detection): the identification of unusual data records that might be interesting, or data errors that require further investigation.
Association rule learning (dependency modelling): searches for relationships between variables. For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
https://en.wikipedia.org/wiki/Data_mining

Classification: Supervised Learning https://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/

Regression: Supervised Learning https://quantdare.com/machine-learning-a-brief-breakdown/ https://medium.com/simple-ai/linear-regression-intro-to-machine-learning-6-6e320dbdaf06
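A minimal sketch of regression as supervised learning, assuming scikit-learn's LinearRegression and a synthetic data set invented here for illustration (the true line y = 3x + 2 and the noise level are assumptions, not from the slides):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): y = 3x + 2 plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 0.5, size=100)

# Fit a straight line by ordinary least squares.
model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_
print(f"estimated slope: {slope:.2f}, intercept: {intercept:.2f}")
```

The fitted slope and intercept should land close to the values used to generate the data, which is what "models the data with the least error" means in practice.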

Clustering: Unsupervised Learning. Clustering algorithms: https://machinelearningmastery.com/a-tour-of-machine-learning-algorithms/ https://apandre.wordpress.com/visible-data/cluster-analysis/

Anomaly Detection http://machine-learning-class-notes.readthedocs.io/en/latest/lecture16.html http://amid.fish/anomaly-detection-with-k-means-clustering
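One simple way to illustrate anomaly detection is a Gaussian model: estimate the mean and standard deviation from data assumed to be normal, then flag new points that fall more than three standard deviations away. This is only a sketch; the data, the threshold of 3, and the variable names are assumptions made for the example.

```python
import numpy as np

# Hypothetical "normal" training data: 200 readings around 50.
rng = np.random.default_rng(0)
normal_readings = rng.normal(loc=50.0, scale=5.0, size=200)

# Fit a one-dimensional Gaussian model to the normal data.
mean = normal_readings.mean()
std = normal_readings.std()

# Score new points by their distance from the mean in standard deviations.
new_points = np.array([48.0, 52.0, 5.0, 120.0])
z_scores = np.abs(new_points - mean) / std
is_anomaly = z_scores > 3.0  # threshold chosen for illustration

for value, flag in zip(new_points, is_anomaly):
    print(f"{value:6.1f} -> {'anomaly' if flag else 'normal'}")
```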

Association Rule: Market Basket Analysis http://www.kdnuggets.com/2016/04/association-rules-apriori-algorithm-tutorial.html https://blogs.adobe.com/digitalmarketing/analytics/shopping-for-kpis-market-basket-analysis-for-web-analytics-data/
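Market basket analysis rests on two quantities: the support of an itemset (the fraction of transactions that contain it) and the confidence of a rule A → B (the fraction of transactions containing A that also contain B). The sketch below computes both in plain Python on a toy set of baskets made up for this example; it is a brute-force pair count, not the Apriori algorithm itself.

```python
from itertools import combinations

# Toy transactions (hypothetical shopping baskets).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    """Fraction of transactions that contain itemset."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    return count(antecedent | consequent) / count(antecedent)

# Pairs of items that appear together in at least min_count baskets.
items = sorted(set().union(*transactions))
min_count = 3
frequent_pairs = [
    set(pair) for pair in combinations(items, 2) if count(set(pair)) >= min_count
]

print("frequent pairs:", frequent_pairs)
print("confidence(diapers -> beer):", confidence({"diapers"}, {"beer"}))  # 0.75
print("confidence(beer -> diapers):", confidence({"beer"}, {"diapers"}))  # 1.0
```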

Summarization: Know your data https://generalassemb.ly/blog/the-best-topical-data-visualizations-of-2015/

Pipeline for Data Mining: Data Preprocessing → Feature Engineering → Model Training → Testing → Prediction
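The pipeline stages can be sketched with scikit-learn's Pipeline class. The particular choices here (the built-in breast cancer data set, StandardScaler for preprocessing, PCA standing in for feature engineering, and logistic regression as the model) are assumptions made for illustration, not the steps used in the talk.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Data: hold out a quarter of the examples for testing.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

pipeline = Pipeline([
    ("scale", StandardScaler()),                 # preprocessing
    ("pca", PCA(n_components=5)),                # feature engineering (illustrative)
    ("clf", LogisticRegression(max_iter=1000)),  # model
])

# Model training: every step is fitted on the training data only.
pipeline.fit(X_train, y_train)

# Testing: accuracy on the held-out examples.
test_accuracy = pipeline.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")

# Prediction: label a few examples, here simply the first five test rows.
predictions = pipeline.predict(X_test[:5])
print("predictions:", predictions)
```

Wrapping the steps in one object means the scaler and PCA are never fitted on test data, which avoids leaking test information into training.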

Linus Torvalds: “Talk is cheap. Show me the code.” http://www.skilledup.com/articles/become-software-engineer

Python Ecosystem

Jupyter Notebook: documents that contain both computer code (e.g. Python) and rich text elements (paragraphs, equations, figures, links, etc.).

Scikit-Learn http://scikit-learn.org/stable/

http://peekaboo-vision.blogspot.de/2013/01/machine-learning-cheat-sheet-for-scikit.html

Like this graph? More here: https://unsupervisedmethods.com/cheat-sheet-of-machine-learning-and-python-and-math-cheat-sheets-a4afe4e791b6 http://peekaboo-vision.blogspot.de/2013/01/machine-learning-cheat-sheet-for-scikit.html

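Scikit-learn: simple and consistent API. Instantiate the model: m = Model(). Fit the model: m.fit(train_data, train_labels) (unsupervised models take only train_data). Predict: predicted_y = m.predict(test_data). Evaluate: m.score(test_data, test_labels); note that score takes the test inputs and the true labels, not the predictions. https://medium.com/towards-data-science/train-test-split-and-cross-validation-in-python-80b61beca4b6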
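The four API steps look like this with a concrete estimator. GaussianNB and the iris data set are stand-ins chosen for the example; any scikit-learn classifier follows the same pattern. Note that for supervised models fit takes the training labels as well, and score takes the test inputs plus the true labels.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=0
)

m = GaussianNB()                     # 1. instantiate the model
m.fit(train_X, train_y)              # 2. fit on training data and labels
predicted_y = m.predict(test_X)      # 3. predict labels for unseen data
accuracy = m.score(test_X, test_y)   # 4. evaluate: mean accuracy on the test set

print(f"accuracy: {accuracy:.3f}")
```

For classifiers, score is the fraction of test examples whose predicted label matches the true label, so it equals (predicted_y == test_y).mean().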

Classification: k-nearest neighbors (K-NN) http://bdewilde.github.io/blog/blogger/2012/10/26/classification-of-hand-written-digits-3/
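A minimal k-NN sketch on handwritten digits, assuming scikit-learn's bundled digits data set (8x8 images, not necessarily the data behind the linked post) and k = 5 as an arbitrary choice. Each test image is assigned the majority label among its five closest training images.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# k = 5 is an illustrative choice; in practice k is tuned on validation data.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)  # "training" just stores the labeled examples

knn_accuracy = knn.score(X_test, y_test)
print(f"k-NN test accuracy: {knn_accuracy:.3f}")
```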

Decision tree
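A decision tree can be sketched the same way. The iris data set and the depth limit of 3 are assumptions for this example; export_text prints the learned if/else rules, which is what makes trees easy to interpret.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)

# Limit the depth so the printed rules stay short and readable.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

tree_accuracy = tree.score(X_test, y_test)
print(f"decision tree test accuracy: {tree_accuracy:.3f}")

# Human-readable view of the learned splits.
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
```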

Clustering: k-means. Given a data set where each observed example has a set of features but no labels, k-means groups the examples into k clusters of similar points. http://stanford.edu/~cpiech/cs221/handouts/kmeans.html
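A short k-means sketch, assuming synthetic two-dimensional data generated with make_blobs around three hand-picked centers (both the data and k = 3 are choices made for the example). The true group labels are used only afterwards, to check how well the unlabeled clustering recovered them.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Three well-separated groups of points; labels are hidden from k-means.
X, true_groups = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=0,
)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # cluster index for every example

print("cluster centers:")
print(kmeans.cluster_centers_)

# Agreement with the hidden groups (1.0 means a perfect match up to relabeling).
agreement = adjusted_rand_score(true_groups, labels)
print(f"adjusted Rand index: {agreement:.3f}")
```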