Data Mining Basics Image Source: http://www.alfainternational.com/content/podcasts.aspx.

Slides:



Advertisements
Similar presentations
Renata Benda Prokeinova Department of Statistics and Operation Research FEM SUA in Nitra.
Advertisements

Chapter 9 Business Intelligence Systems
Data Mining.
Data Mining By Archana Ketkar.
Data Mining Ketaki Borkar CS157A November 29, 2007.
Data mining By Aung Oo.
DataMining By Guan Hang Su CS157A section 2 fall 2005.
Presented To: Madam Nadia Gul Presented By: Bi Bi Mariam.
DASHBOARDS Dashboard provides the managers with exactly the information they need in the correct format at the correct time. BI systems are the foundation.
TURKISH STATISTICAL INSTITUTE INFORMATION TECHNOLOGIES DEPARTMENT (Muscat, Oman) DATA MINING.
Enterprise systems infrastructure and architecture DT211 4
Data Mining By Andrie Suherman. Agenda Introduction Major Elements Steps/ Processes Tools used for data mining Advantages and Disadvantages.
April 11, 2008 Data Mining Competition 2008 The 4 th Annual Business Intelligence Symposium Hualin Wang Manager of Advanced.
Data Mining Techniques
Data Mining Chun-Hung Chou
Data Mining CS157B Section 2 Larry Varela. What is Data Mining? Data Mining is "The science of extracting useful information from large data sets or databases“.
Chapter 13 Genetic Algorithms. 2 Data Mining Techniques So Far… Chapter 5 – Statistics Chapter 6 – Decision Trees Chapter 7 – Neural Networks Chapter.
Data Mining and Application Part 1: Data Mining Fundamentals Part 2: Tools for Knowledge Discovery Part 3: Advanced Data Mining Techniques Part 4: Intelligent.
INTRODUCTION TO DATA MINING MIS2502 Data Analytics.
1 1 Slide Introduction to Data Mining and Business Intelligence.
Data MINING Data mining is the process of extracting previously unknown, valid and actionable information from large data and then using the information.
Fox MIS Spring 2011 Data Mining Week 9 Introduction to Data Mining.
1 Categories of data Operational and very short-term decision making data Current, short-term decision making, related to financial transactions, detailed.
Data Mining In contrast to the traditional (reactive) DSS tools, the data mining premise is proactive. Data mining tools automatically search the data.
1 Categories of data Operational and very short-term decision making data Current, short-term decision making, related to financial transactions, detailed.
CRM - Data mining Perspective. Predicting Who will Buy Here are five primary issues that organizations need to address to satisfy demanding consumers:
APPLICATION OF DATAMINING TOOL FOR CLASSIFICATION OF ORGANIZATIONAL CHANGE EXPECTATION Şule ÖZMEN Serra YURTKORU Beril SİPAHİ.
Chapter 5: Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization DECISION SUPPORT SYSTEMS AND BUSINESS.
Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.
Data Mining BY JEMINI ISLAM. Data Mining Outline: What is data mining? Why use data mining? How does data mining work The process of data mining Tools.
MIS2502: Data Analytics Advanced Analytics - Introduction.
Lesson 9: Types of information system. Introduction  An MIS is a decision support system in which the form of input query and response is predetermined.
DATA MINING PREPARED BY RAJNIKANT MODI REFERENCE:DOUG ALEXANDER.
Data Mining Basics. “Copyright and Terms of Service Copyright © Texas Education Agency. The materials found on this website are copyrighted © and trademarked.
1 Categories of data Operational and very short-term decision making data Current, short-term decision making, related to financial transactions, detailed.
Data Resource Management Agenda What types of data are stored by organizations? How are different types of data stored? What are the potential problems.
Data Mining. Overview the extraction of hidden predictive information from large databases Data mining tools predict future trends and behaviors, allowing.
Knowledge Discovery and Data Mining 19 th Meeting Course Name: Business Intelligence Year: 2009.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 28 Data Mining Concepts.
Data Mining NATE BUTLER, BRENT DAVIS, BROCK NOLAN, AND NICK THORNHILL.
Data Resource Management – MGMT An overview of where we are right now SQL Developer OLAP CUBE 1 Sales Cube Data Warehouse Denormalized Historical.
Data Mining Introduction to data mining concepts.
Department of Computer Science Sir Syed University of Engineering & Technology, Karachi-Pakistan. Presentation Title: DATA MINING Submitted By.
Data Mining is the process of analyzing data and summarizing it into useful information Data Mining is usually used for extremely large sets of data It.
Supplemental Chapter: Business Intelligence Information Systems Development.
Data Mining and its Application in Marketing and Business
AP CSP: Data and Trends.
Data Mining Functionalities
Data Mining.
Introduction BIM Data Mining.
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Data Mining – Intro.
By Arijit Chatterjee Dr
Data Mining Generally, (Sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it.
MIS2502: Data Analytics Advanced Analytics - Introduction
DATA MINING © Prentice Hall.
Data Mining 101 with Scikit-Learn
The Accounting Division
eCommerce Customer Behavior Analytics
NBA Draft Prediction BIT 5534 May 2nd 2018
MIS5101: Data Analytics Advanced Analytics - Introduction
DATA MINING.
Supporting End-User Access
Collecting and Analyzing Data
Data Mining: Introduction
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Big DATA.
Kenneth C. Laudon & Jane P. Laudon
Presentation transcript:

Data Mining Basics Image Source: http://www.alfainternational.com/content/podcasts.aspx

Discovery On any given day, what are some ways you gather information? Every two days now, we create as much information as we did from the dawn of civilization until 2003. ~Eric Schmidt On any given day, what are some ways you gather information? Do you think any of these ways are better than others? Why or why not? Do you realize what kind of information is being gathered on you on any given day? What methods do you think are being used to gather information on you? Ask and Say part of lesson. Have students brainstorm a list of ways they gather information. Some answers might include through sight, sound, asking questions, listening, looking something up on their phones, looking something up on their computers, reading, etc. Copyright © Texas Education Agency, 2014. All right reserved. 2

Activity Review If you can draw conclusions, are they accurate? What was your most used technique to gather information/data? Which technique was most accurate? Why? Can you view your data and draw any conclusions or assumptions based on your gathered data/information? Why or Why not? If you can draw conclusions, are they accurate? Have students answer these questions on their own or do a pair and share where you have them pair up and discuss their findings. Once students have been given the opportunity to answer these questions, have them share information with the class. Copyright © Texas Education Agency, 2014. All right reserved. 3

Instruction/Discussion Data Mining is the process of analyzing data from different perspectives and summarizing it into useful information. Based on our activity and our discussion of the findings, were you able to analyze data from different perspectives and summarize it into useful information? What were you analyses? Have students make assumptions about the information they gathered from the class activity—then have them share findings with the class. Copyright © Texas Education Agency, 2014. All right reserved. 4

Instruction/Discussion When did data mining start? Informal data mining has been around since time began, but it has evolved. Manual data mining Bayes’ Theorem (Thomas Bayes, 1700s) Regression Analysis (1800s) Electronic data mining 1990s Day after day, electronic data mining continues to evolve due to advancements in statistical analysis software, computer processing power, and disk storage. Information Source: http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm Bayes’ Theorem Probability measures a degree of belief and Bayes’ Theorem links the degree of belief in a proposition before and after accounting for evidence. Example: For example, suppose somebody proposes that a biased coin is twice as likely to land heads than tails. Degree of belief in this might initially be 50%. The coin is then flipped a number of times to collect evidence. Belief may rise to 70% if the evidence supports the proposition. Regression Analysis A statistical process for estimating the relationships among variables…between one dependent variable and one or more independent variables. It helps one understand how the typical value of the dependent variable (or 'Criterion Variable') changes when any one of the independent variables is varied, while the other independent variables are held fixed. Copyright © Texas Education Agency, 2014. All right reserved. 5

Data Mining Why use it? How does it work? What makes it data mining? Different levels of analysis. Required Technological Infrastructure Copyright © Texas Education Agency, 2014. All right reserved. 6

Instruction/Discussion Why use date mining? To determine market trends To save money To make money To determine future consumer spending To analyze consumer spending habits To help see determine patterns To save time Information Source: http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm Examples: Share with students the following taken from this article. How they use it to determine market trends, to save money, to make money, to determine consumer spending, and to analyze consumer spending habits: For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they only bought a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, they could move the beer display closer to the diaper display. And, they could make sure beer and diapers were sold at full price on Thursdays. How they use it to help people and save time. The National Basketball Association (NBA) is exploring a data mining application that can be used in conjunction with image recordings of basketball games. The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies. For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard position, John Williams attempted four jump shots and made each one! Advanced Scout not only finds this pattern, but explains that it is interesting because it differs considerably from the average shooting percentage of 49.30% for the Cavaliers during that game. By using the NBA universal clock, a coach can automatically bring up the video clips showing each of the jump shots attempted by Williams with Price on the floor, without needing to comb through hours of video footage. Those clips show a very successful pick-and-roll play in which Price draws the Knick's defense and then finds Williams for an open jump shot. Copyright © Texas Education Agency, 2014. All right reserved. 7

Instruction/Discussion How does it work? It analyzes relationships and patterns in stored transaction data based on open-ended user queries and generally four types of relationships are sought: Classes Clusters Associations Sequential Patterns Information Source: http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials. Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities. Associations: Data can be mined to identify associations. The beer-diaper example is an example of associative mining. Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes. Copyright © Texas Education Agency, 2014. All right reserved. 8

Instruction/Discussion What makes it data mining? Data mining consists of five major elements: Extract, transform, and load transaction data onto a warehouse data system Store and manage the data Provide data access to business analysts and information technology professionals Analyze the data by application software Present data in a useful format, such as a graph or table. Information Source: http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm Share with students how the PSAT is a data mining tool because it is used to help predict scores for the SAT…and STAAR Tests. Any time anyone is given a test and/or a questionnaire on how much he/she knows about a subject, that also is data mining. Copyright © Texas Education Agency, 2014. All right reserved. 9

Instruction/Discussion Different Levels of Analysis Artificial Neural Networks Genetic Algorithms Decision Trees Nearest Neighborhood Method Rule Induction Data Visualization Information Source: http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure. Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of natural evolution. Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID) . CART and CHAID are decision tree techniques used for classification of a dataset. They provide a set of rules that you can apply to a new (unclassified) dataset to predict which records will have a given outcome. CART segments a dataset by creating 2-way splits while CHAID segments using chi square tests to create multi-way splits. CART typically requires less data preparation than CHAID. Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k 1). Sometimes called the k-nearest neighbor technique. Rule induction: The extraction of useful if-then rules from data based on statistical significance. Data visualization: The visual interpretation of complex relationships in multidimensional data. Graphics tools are used to illustrate data relationships. Copyright © Texas Education Agency, 2014. All right reserved. 10