1 Ahmed K. Ezzat, Data, Text, and Web Mining for BI Data Mining and Big Data.

Slides:



Advertisements
Similar presentations
Data Mining Glen Shih CS157B Section 1 Dr. Sin-Min Lee April 4, 2006.
Advertisements

Chapter 9 DATA WAREHOUSING Transparencies © Pearson Education Limited 1995, 2005.
DATA, TEXT, AND WEB MINING
Week 9 Data Mining System (Knowledge Data Discovery)
© Prentice Hall1 DATA MINING TECHNIQUES Introductory and Advanced Topics Eamonn Keogh (some slides adapted from) Margaret Dunham Dr. M.H.Dunham, Data Mining,
Data Mining Knowledge Discovery in Databases Data 31.
DATA WAREHOUSING.
Data Mining By Archana Ketkar.
Building Knowledge-Driven DSS and Mining Data
Data Mining – Intro.
1 Data and Knowledge Management. 2 Data Management: A Critical Success Factor The difficulties and the process Data sources and collection Data quality.
CS157A Spring 05 Data Mining Professor Sin-Min Lee.
Overview of Web Data Mining and Applications Part I
DASHBOARDS Dashboard provides the managers with exactly the information they need in the correct format at the correct time. BI systems are the foundation.
Business Intelligence
Enterprise systems infrastructure and architecture DT211 4
Chapter 4 Data, Text, and Web Mining
OLAM and Data Mining: Concepts and Techniques. Introduction Data explosion problem: –Automated data collection tools and mature database technology lead.
Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization.
Chapter 5: Data Mining for Business Intelligence
CISB594 – Business Intelligence Data Mining. CISB594 – Business Intelligence Reference Materials used in this presentation are extracted mainly from the.
Data Mining Techniques
Data Mining. 2 Models Created by Data Mining Linear Equations Rules Clusters Graphs Tree Structures Recurrent Patterns.
Intelligent Systems Lecture 23 Introduction to Intelligent Data Analysis (IDA). Example of system for Data Analyzing based on neural networks.
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.
Data Mining Chun-Hung Chou
Data Management Turban, Aronson, and Liang Decision Support Systems and Intelligent Systems, Seventh Edition.
DATA, TEXT, AND WEB MINING
Chapter 4 Data, Text, and Web Mining
CISB594 – Business Intelligence Data Mining Part I.
Chapter 7 DATA, TEXT, AND WEB MINING Pages , 311, Sections 7.3, 7.5, 7.6.
Data Mining and Application Part 1: Data Mining Fundamentals Part 2: Tools for Knowledge Discovery Part 3: Advanced Data Mining Techniques Part 4: Intelligent.
Zhangxi Lin ISQS 3358 Texas Tech University 1.  Define data mining and list its objectives and benefits  Understand different purposes and applications.
Introduction to Data Mining Group Members: Karim C. El-Khazen Pascal Suria Lin Gui Philsou Lee Xiaoting Niu.
Introduction to Text and Web Mining. I. Text Mining is part of our lives.
Data Mining Chapter 1 Introduction -- Basic Data Mining Tasks -- Related Concepts -- Data Mining Techniques.
WebMining Web Mining By- Pawan Singh Piyush Arora Pooja Mansharamani Pramod Singh Praveen Kumar 1.
Data Mining Knowledge on rough set theory SUSHIL KUMAR SAHU.
Data Mining By Dave Maung.
Ihr Logo Chapter 5 Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization Turban, Aronson, and Liang.
Decision Trees. Decision trees Decision trees are powerful and popular tools for classification and prediction. The attractiveness of decision trees is.
Data Mining – Intro. Course Overview Spatial Databases Temporal and Spatio-Temporal Databases Multimedia Databases Data Mining.
CS157B Fall 04 Introduction to Data Mining Chapter 22.3 Professor Lee Yu, Jianji (Joseph)
6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.
Advanced Database Course (ESED5204) Eng. Hanan Alyazji University of Palestine Software Engineering Department.
Chapter 5: Business Intelligence: Data Warehousing, Data Acquisition, Data Mining, Business Analytics, and Visualization DECISION SUPPORT SYSTEMS AND BUSINESS.
Data Mining BY JEMINI ISLAM. Data Mining Outline: What is data mining? Why use data mining? How does data mining work The process of data mining Tools.
DATA MINING WITH CLUSTERING AND CLASSIFICATION Spring 2007, SJSU Benjamin Lam.
CHAPTER 4 Data Warehousing, Access, Analysis, Mining, and Visualization 2 1.
1 Introduction to Data Mining C hapter 1. 2 Chapter 1 Outline Chapter 1 Outline – Background –Information is Power –Knowledge is Power –Data Mining.
CISB594 – Business Intelligence Data Mining. CISB594 – Business Intelligence Reference Materials used in this presentation are extracted mainly from the.
An Introduction Student Name: Riaz Ahmad Program: MSIT( ) Subject: Data warehouse & Data Mining.
DATA MINING PREPARED BY RAJNIKANT MODI REFERENCE:DOUG ALEXANDER.
Data Mining and Decision Support
Academic Year 2014 Spring Academic Year 2014 Spring.
Chapter 2 Data, Text, and Web Mining. Data Mining Concepts and Applications  Data mining (DM) A process that uses statistical, mathematical, artificial.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 28 Data Mining Concepts.
Chapter 8: Web Analytics, Web Mining, and Social Analytics
Data Mining: Confluence of Multiple Disciplines Data Mining Database Systems Statistics Other Disciplines Algorithm Machine Learning Visualization.
Data mining in web applications
Data Mining – Intro.
MIS2502: Data Analytics Advanced Analytics - Introduction
DATA MINING © Prentice Hall.
Introduction C.Eng 714 Spring 2010.
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Chapter 13 – Data Warehousing
Data Warehousing and Data Mining
Supporting End-User Access
TEXTAND WEB MINING.
TEXT and WEB MINING.
Presentation transcript:

1 Ahmed K. Ezzat, Data, Text, and Web Mining for BI Data Mining and Big Data

2 Outline Data Mining Overview Text Mining Overview Web Mining Overview

333 Data Mining Overview

Learning Objectives Define data mining and list its objectives and benefits Understand different purposes and applications of data mining Understand different methods of data mining, especially clustering and decision tree models Build expertise in use of some data mining software 4

Learn the process of data mining projects Understand data mining pitfalls and myths Define text mining and its objectives and benefits Appreciate use of text mining in business applications Define Web mining and its objectives and benefits Learning Objectives 5

Data Mining Concepts and Applications Six factors behind the sudden rise in popularity of data mining 1. General recognition of the untapped value in large databases; 2. Consolidation of database records tending toward a single customer view; 3. Consolidation of databases, including the concept of an information warehouse; 4. Reduction in the cost of data storage and processing, providing for the ability to collect and accumulate data; 5. Intense competition for a customer’s attention in an increasingly saturated marketplace; and 6. The movement toward the de-massification of business practices 6

Data mining (DM) A process that uses statistical, mathematical, artificial intelligence and machine-learning techniques to extract and identify useful information and subsequent knowledge from large databases Data Mining Concepts and Applications 7

Major characteristics and objectives of data mining:  Data are often buried deep within very large databases, which sometimes contain data from several years; sometimes the data are cleansed and consolidated in a data warehouse  The data mining environment is usually client/server architecture or a Web-based architecture Data Mining Concepts and Applications 8

Major characteristics and objectives of data mining:  Sophisticated new tools help to remove the information buried in corporate files or archival public records; finding it involves massaging and synchronizing the data to get the right results.  The miner is often an end user, empowered by data drills and other power query tools to ask ad hoc questions and obtain answers quickly, with little or no programming skill Data Mining Concepts and Applications 9

Major characteristics and objectives of data mining:  Striking it rich often involves finding an unexpected result and requires end users to think creatively  Data mining tools are readily combined with spreadsheets and other software development tools; the mined data can be analyzed and processed quickly and easily  Parallel processing is sometimes used because of the large amounts of data and massive search efforts Data Mining Concepts and Applications 10

How data mining works  Data mining tools find patterns in data and may even infer rules from them  Three methods are used to identify patterns in data: 1. Simple models 2. Intermediate models 3. Complex models Data Mining Concepts and Applications 11

Classification Supervised induction used to analyze the historical data stored in a database and to automatically generate a model that can predict future behavior Common tools used for classification are:  Neural networks  Decision trees  If-then-else rules Data Mining Concepts and Applications 12

Clustering Partitioning a database into segments in which the members of a segment share similar qualities Association A category of data mining algorithm that establishes relationships about items that occur together in a given record Data Mining Concepts and Applications 13

Sequence discovery The identification of associations over time Visualization can be used in conjunction with data mining to gain a clearer understanding of many underlying relationships Data Mining Concepts and Applications 14

Regression is a well-known statistical technique that is used to map data to a prediction value Forecasting estimates future values based on patterns within large sets of data Data Mining Concepts and Applications 15

Hypothesis-driven data mining Begins with a proposition by the user, who then seeks to validate the truthfulness of the proposition Discovery-driven data mining Finds patterns, associations, and relationships among the data in order to uncover facts that were previously unknown or not even contemplated by an organization Data Mining Concepts and Applications 16

– Marketing – Banking – Retailing and sales – Manufacturing and production – Brokerage and securities trading – Insurance – Computer hardware and software – Government and defense – Airlines – Health care – Broadcasting – Police – Homeland security Data mining applications Data Mining Concepts and Applications 17

Data mining tools and techniques can be classified based on the structure of the data and the algorithms used:  Statistical methods  Decision trees Defined as a root followed by internal nodes. Each node (including root) is labeled with a question and arcs associated with each node cover all possible responses Data Mining Concepts and Applications 18

Data mining tools and techniques can be classified based on the structure of the data and the algorithms used:  Case-based reasoning  Neural computing  Intelligent agents  Genetic algorithms  Other tools Rule induction Data visualization Data Mining Concepts and Applications 19

A general algorithm for building a decision tree: 1. Create a root node and select a splitting attribute. 2. Add a branch to the root node for each split candidate value and label 3. Take the following iterative steps: a. Classify data by applying the split value. b. If a stopping point is reached, then create leaf node and label it. Otherwise, build another subtree Data Mining Concepts and Applications 20

Gini index Used in economics to measure the diversity of the population. The same concept can be used to determine the ‘purity’ of a specific class as a result of a decision to branch along a particular attribute/variable Data Mining Concepts and Applications 21

Data Mining Techniques and Tools 22

The ID3 algorithm decision tree approach  Entropy Measures the extent of uncertainty or randomness in a data set. If all the data in a subset belong to just one class, then there is no uncertainty or randomness in that dataset, therefore the entropy is zero Data Mining Techniques and Tools 23

Cluster analysis for data mining  Cluster analysis is an exploratory data analysis tool for solving classification problems  The object is to sort cases into groups so that the degree of association is strong between members of the same cluster and weak between members of different clusters Data Mining Techniques and Tools 24

Cluster analysis results may be used to:  Help identify a classification scheme  Suggest statistical models to describe populations  Indicate rules for assigning new cases to classes for identification, targeting, and diagnostic purposes  Provide measures of definition, size, and change in what were previously broad concepts  Find typical cases to represent classes Data Mining Techniques and Tools 25

Cluster analysis methods  Statistical methods  Optimal methods  Neural networks  Fuzzy logic  Genetic algorithms Each of these methods generally works with one of two general method classes:  Divisive  Agglomerative Data Mining Techniques and Tools 26

Hierarchical clustering method and example 1. Decide which data to record from the items 2. Calculate the distances between all initial clusters. Store the results in a distance matrix 3. Search through the distance matrix and find the two most similar clusters 4. Fuse those two clusters together to produce a cluster that has at least two items 5. Calculate the distances between this new cluster and all the other clusters 6. Repeat steps 3 to 5 until you have reached the pre- specified maximum number of clusters Data Mining Techniques and Tools 27

Classes of data mining tools and techniques as they relate to information and business intelligence (BI) technologies  Mathematical and statistical analysis packages  Personalization tools for Web-based marketing  Analytics built into marketing platforms  Advanced CRM tools  Analytics added to other vertical industry-specific platforms  Analytics added to database tools (e.g., OLAP)  Standalone data mining tools Data Mining Techniques and Tools 28

Data Mining Project Processes 29

Data Mining Project Processes 30

Knowledge discovery in databases (KDD) A comprehensive process of using data mining methods to find useful information and patterns in data Data Mining Project Processes 31

KDD process 1. Selection 2. Preprocessing 3. Transformation 4. Data mining 5. Interpretation/evaluation Data Mining Project Processes 32

33 Text Mining Overview

Classification of data mining vs. text mining Classification of data mining vs. text mining 34 Finding Patterns FindingNuggets NovelNon-Novel Non-textual Data Standard Data Mining AnalyticsDatabase Queries Textual DataCPL & NLPReal Text Mining Information Retrieval

Text Mining Overview Classification of data mining vs. text mining Classification of data mining vs. text mining 35

36 The Intelligence Pyramid 36

Text mining helps organizations:  Find the “hidden” content of documents, including additional useful relationships  Relate documents across previous unnoticed divisions  Group documents by common themes Text Mining Overview 37

Text Mining Overview Text mining Process Text mining Process 38  Text Preprocessing (NLP): text cleanup, tokenization, part-of-speech tagging, Word sense disambiguation, semantic structure  Text Transformation: text representation, doc feature selection  Attribute Selection: reduce dimensionality, remove irrelevant attributes  Pattern Discovery: classic data mining techniques  Interpretation Evaluation: terminate or iterate

How to mine text 1. Eliminate commonly used words (stop-words) 2. Replace words with their stems or roots (stemming algorithms) 3. Consider synonyms and phrases 4. Calculate the weights of the remaining terms Text Mining Overview 39

Text Mining Overview Text mining Text mining  Is a special case of data mining that aims at extracting useful information from non-structured or semi-structured textual data  Involves the process of structuring the input text and detect patterns and trends through data mining techniques such as statistical pattern learning  The above is typically done through using CPL/NLP engine to read free text, identify terms, classify the terms (noun, verb, adjective), disambiguate and perform fact extraction in the appropriate structure representation, simplify attributes, apply data mining techniques to discover patterns & trends, and evaluate for possible more iteration/refinement 40

Text Mining Overview Text mining Text mining  Typically text mining will entails the generation of meaningful indices from the unstructured text and then use various data mining algorithms leveraging these indices  Taxonomy – even with filtering the terms we still have very large number, and knowledge discovery typically relies on some hierarchy of terms to form results at higher level. Term taxonomy would enable the production of general association rules that capture relationship between groups of terms rather than individual terms  Ontology – would help us to do reasoning on the generated facts by associating terms with ontology concepts, e.g., OWL with semantic web triples. In this case we would be treating the triples as knowledge management system 41

Text Mining Overview Applications of text mining Applications of text mining  Automatic detection of spam or phishing through analysis of the document content  Automatic processing of messages or s to route a message to the most appropriate party to process that message  Analysis of warranty claims, help desk calls/reports, and so on to identify the most common problems and relevant responses  Analysis of related scientific publications in journals to create an automated summary view of a particular discipline  Creation of a “relationship view” of a document collection  Qualitative analysis of documents to detect deception 42

43 Web Mining Overview

Web mining Web mining The discovery and analysis of interesting and useful information from the Web, about the Web, and usually through Web-based tools 44

Web Mining Overview 45

Web Mining Overview Web content mining Web content mining The extraction of useful information from Web pages Web structure mining Web structure mining The development of useful information from the links included in the Web documents Web usage mining Web usage mining The extraction of useful information from the data being generated through webpage visits, transaction, etc. 46

Web Mining Overview Uses for Web mining Uses for Web mining:  Determine the lifetime value of clients  Design cross-marketing strategies across products  Evaluate promotional campaigns  Target electronic ads and coupons at user groups  Predict user behavior  Present dynamic information to users 47

Web Mining Overview 48

49 END