Chapter 7 DATA, TEXT, AND WEB MINING Pages 304-309, 311, Sections 7.3, 7.5, 7.6.


Data mining
– A process that uses statistical, mathematical, artificial intelligence, and machine-learning techniques to extract and identify new knowledge from large databases
– Recognizes the untapped value of data in large databases
– You may unexpectedly strike it rich in understanding relationships among data

Example Task: Find the best route to cover the territory

Challenge of finding relationships in large databases

Connect equal-elevation points to make a contour map. The dark vertical line shows the best route to cross the territory without falling off a cliff.

Once relationships are discovered, they can be used for prediction

Uses of Data Mining-1: Classification
– Identify the attribute of interest (e.g., you want to classify who is likely to pay late)
– Examine all other attribute values of the customer from the data warehouse and locate the one most closely related to the attribute of interest (e.g., monthly income level)
Mining algorithm
– The most common algorithm used for classification is the decision tree
– Gini index: helps determine where to place the split between two classes (e.g., at what income level); used in developing decision trees (see example on page 316)
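To make the Gini index concrete, here is a minimal sketch of how a decision tree might choose an income level to split on. The customer data, the labels "late" and "on_time", and the function names are hypothetical and not taken from the textbook example; the impurity formula (one minus the sum of squared class proportions) is the standard Gini definition.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Try every midpoint between adjacent values; return (threshold, weighted Gini)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue
        threshold = (low + high) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical monthly incomes and payment behaviour for eight customers.
incomes = [1500, 1800, 2200, 2600, 3100, 3500, 4200, 5000]
pays_late = ["late", "late", "on_time", "late",
             "on_time", "on_time", "on_time", "on_time"]

threshold, score = best_split(incomes, pays_late)
print(threshold, score)
```

A lower weighted Gini value means the two groups produced by the split are purer, so the tree prefers that income level as its decision point.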

Which product class is the best seller? Conclusion: Clay products with a price below $25!

Uses of Data Mining-2: Segmentation
– Partitioning a database into groups in which the members of each group share similar characteristics
Mining algorithm
– Clustering: the objective is to sort cases into groups so that similarity is strong among members of the same cluster and weak between members of different clusters
– E.g., companies with over 100 employees may have more in common with one another (e.g., revenue size) than with companies that have fewer than 100 employees
– This knowledge can help in developing different policies for dealing with different types of companies
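As a rough illustration of clustering, the sketch below runs a one-dimensional k-means on company sizes. K-means is only one of several clustering algorithms and is not named in the slides; the employee counts, the function name, and the evenly spaced starting centroids are assumptions made to keep the example small and deterministic. It expects k of at least 2.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Cluster numbers into k groups by repeatedly moving centroids to group means."""
    low, high = min(values), max(values)
    # Assumption: start centroids evenly spaced between the min and max value.
    centroids = [low + (high - low) * i / (k - 1) for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        new_centroids = [
            sum(c) / len(c) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids, clusters

# Hypothetical employee counts for nine companies.
employees = [10, 20, 35, 50, 80, 600, 650, 700, 900]
centroids, clusters = kmeans_1d(employees, k=2)
print(centroids)
print(clusters)
```

The algorithm is never told where the boundary between small and large companies lies; the two segments emerge from the data.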

Uses of Data Mining-3: Association
– A category of data mining algorithm that establishes relationships among items that occur together in a given record
– E.g., you may discover from the data that senior students take certain elective courses together in the final semester, which can be helpful in scheduling courses
– People who buy a suit may also buy a dress shirt
– People who buy swimwear may also buy fins, goggles, a cap, etc.
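One common way to quantify such relationships is with support (how often items appear together) and confidence (how often the second item appears given the first). These measures are not named in the slides, and the baskets below are hypothetical; this is a minimal sketch rather than a full association-rule miner.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical sales records, one set of items per purchase.
baskets = [
    {"suit", "dress shirt", "tie"},
    {"suit", "dress shirt"},
    {"swimwear", "goggles"},
    {"suit"},
    {"swimwear", "goggles", "cap"},
]

print(support(baskets, {"suit", "dress shirt"}))
print(confidence(baskets, {"suit"}, {"dress shirt"}))
print(confidence(baskets, {"swimwear"}, {"goggles"}))
```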

Uses of Data Mining-4: Sequence discovery
– The identification of associations over time
– Discovers the order in which events occur
– The algorithm can examine data and predict which event is most likely to occur next
– Widely used in studying how visitors navigate a Web site
– Helps to improve the chances of making a sale
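A very simple form of sequence discovery counts which page most often follows each page and uses that count to predict the next step. The sketch below assumes hypothetical visitor sessions and page names; real sequence-mining algorithms look at longer patterns than a single previous page.

```python
from collections import Counter, defaultdict

def build_transitions(sessions):
    """Count how often each page is immediately followed by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions

def predict_next(transitions, page):
    """Return the page most often seen after `page`, or None if it was never followed."""
    if page not in transitions:
        return None
    return transitions[page].most_common(1)[0][0]

# Hypothetical navigation paths of four visitors.
sessions = [
    ["home", "products", "cart", "checkout"],
    ["home", "products", "reviews"],
    ["home", "search", "products", "cart"],
    ["home", "products", "cart", "checkout"],
]

transitions = build_transitions(sessions)
print(predict_next(transitions, "products"))
```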

Uses of Data Mining-5: Regression and forecasting
– Regression is a statistical technique used to map data to a prediction value
– Forecasting estimates future values based on patterns within large sets of data
– E.g., gasoline prices this month may predict next month's sales of SUVs
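The gasoline example can be sketched with a simple least-squares line. The prices and sales figures below are invented for illustration and happen to fall exactly on a line; real data would scatter around it.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: this month's gasoline price vs. next month's SUV sales.
gas_price = [2.0, 2.5, 3.0, 3.5, 4.0]
suv_sales = [900, 800, 700, 600, 500]

slope, intercept = fit_line(gas_price, suv_sales)
forecast = slope * 3.2 + intercept  # predicted sales if the price is 3.20
print(slope, intercept, forecast)
```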

Data Mining Concepts and Applications
Data mining applications
– Marketing
– Banking
– Retailing and sales
– Manufacturing and production
– Brokerage and securities trading
– Insurance
– Computer hardware and software
– Government and defense
– Airlines
– Health care
– Broadcasting
– Police
– Homeland security

Text Mining
– Application of data mining to text files, typically free-form text material
– Discovers new knowledge that is not obvious
– Examples:
  – Examine all news services, cluster similar topics, and create a new summary for each topic
  – Find the "hidden" content of documents, including additional useful relationships, e.g., lies, deceptions, scams
– Not the same as a search engine on the Web

Text Mining – how is it done?
– It entails generating meaningful numerical indices/factors from the unstructured text and then processing these indices using various data mining algorithms
– Example:
  – Extract each word from the document being text mined
  – Eliminate commonly used words (the, and, other, etc.)
  – Combine synonyms and phrases
  – Calculate weights for each term:
    tf factor (term frequency): the actual number of times a word appears in a document
    idf factor (inverse document frequency): how rare the word is across multiple documents
– A high tf value for a given term indicates that the document's topic is probably related to the meaning of that term
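The weighting steps can be sketched as follows. This is a minimal illustration: the stop-word list and the three documents are hypothetical, the synonym-combining step is skipped, and idf is computed as log(N / document frequency), which is one common variant among several.

```python
import math
from collections import Counter

# Assumption: a tiny stop-word list for illustration only.
STOP_WORDS = {"the", "and", "other", "a", "of", "is", "to", "in"}

def tokenize(text):
    """Lowercase, strip punctuation, and drop commonly used words."""
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    return [w for w in words if w and w not in STOP_WORDS]

def tf_idf(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {term: count * math.log(n_docs / doc_freq[term])
             for term, count in tf.items()}
        )
    return weights

# Hypothetical help-desk notes.
docs = [
    "The warranty claim is about a broken screen and a broken hinge.",
    "The warranty covers the battery.",
    "Other customers report the battery drains.",
]

weights = tf_idf(docs)
print(sorted(weights[0].items(), key=lambda kv: -kv[1])[:3])
```

In the first document "broken" appears twice and in no other document, so it receives the highest weight, while "warranty" appears in two documents and is weighted lower.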

Text Mining - applications
– Automatic detection of spam or phishing through analysis of the document content
– Automatic processing of messages or e-mails to route each message to the most appropriate party to process it
– Analysis of warranty claims, help desk calls/reports, and so on to identify the most common problems and relevant responses

Web Mining The discovery and analysis of interesting and useful information from the Web

Web content mining
– The extraction of useful information from Web pages
– E.g., search with the help of keywords in the meta tags of the Web page
– You can analyze the document content of the first 10 links Google returns in a search response
– You can generate a summary of the contents automatically in a new document
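As a small illustration of reading keywords from meta tags, the sketch below uses Python's standard-library HTML parser. The page markup and the class name are hypothetical, and many real pages omit the keywords tag entirely, so this is only a starting point.

```python
from html.parser import HTMLParser

class MetaKeywordParser(HTMLParser):
    """Collect the comma-separated values of <meta name="keywords" content="...">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(
                k.strip().lower() for k in content.split(",") if k.strip()
            )

# Hypothetical page source.
page = (
    '<html><head>'
    '<meta name="keywords" content="Data Mining, Text Mining, web mining">'
    '</head><body></body></html>'
)

parser = MetaKeywordParser()
parser.feed(page)
print(parser.keywords)
```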

Web structure mining
– The development of useful information from the links included in Web documents
– If a Web site's pages predominantly link to each other, you may consider the site to be "independent"
– If a collection of Web sites link to each other heavily, this points to a Web community or clan that shares common interests
– Example application: Web structure mining can lead to a better understanding of extremist groups
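One rough way to measure how "independent" a site is would be the share of its outgoing links that stay on the same host. The sketch below assumes that reading of the slide; the function name and the example URLs are hypothetical.

```python
from urllib.parse import urlparse

def internal_link_ratio(site_host, links):
    """Fraction of links whose host is the site's own host."""
    hosts = [urlparse(link).netloc for link in links]
    if not hosts:
        return 0.0
    return sum(host == site_host for host in hosts) / len(hosts)

# Hypothetical links collected from the pages of one site.
links = [
    "https://example.org/about",
    "https://example.org/products",
    "https://example.org/contact",
    "https://example.com/partner",
]

ratio = internal_link_ratio("example.org", links)
print(ratio)
```

A ratio near 1 suggests a self-contained site, while many links shared among a group of hosts would hint at a community.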

Web usage mining
– The extraction of useful information from the data generated through Web page visits, transactions, etc.
– Clickstream analysis
– Uses cookies, number of logs, time of log, etc.
– Can help profile users
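A typical first step in clickstream analysis is to group log entries by cookie and cut them into visits. The sketch below assumes a hypothetical log format of (cookie ID, timestamp in seconds, URL) and a 30-minute inactivity timeout, which is a common convention rather than something stated in the slides.

```python
from collections import defaultdict

def sessionize(events, timeout=1800):
    """Split each visitor's hits into sessions separated by more than `timeout` seconds."""
    by_user = defaultdict(list)
    for cookie_id, timestamp, url in sorted(events, key=lambda e: (e[0], e[1])):
        by_user[cookie_id].append((timestamp, url))

    sessions = []
    for cookie_id, hits in by_user.items():
        current = [hits[0][1]]
        for (previous_ts, _), (timestamp, url) in zip(hits, hits[1:]):
            if timestamp - previous_ts > timeout:
                sessions.append((cookie_id, current))  # close the old visit
                current = []
            current.append(url)
        sessions.append((cookie_id, current))
    return sessions

# Hypothetical log entries: (cookie ID, seconds since midnight, page).
events = [
    ("u1", 0, "/home"),
    ("u1", 60, "/products"),
    ("u2", 30, "/home"),
    ("u1", 4000, "/home"),
    ("u2", 90, "/cart"),
]

sessions = sessionize(events)
print(sessions)
```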

Uses for Web mining
– Determine the lifetime value of clients
– Design cross-marketing strategies across products
– Evaluate promotional campaigns
– Target electronic ads and coupons at user groups
– Predict user behavior
– Present dynamic information to users

Data Mining Project Processes

Steps for Data Mining
– Problem definition: decide the measure to study and the suitable mining algorithm (see Exercise 11)
– Data preparation: design the cube and populate it with relevant data from the data warehouse
– Training: run the mining algorithm on a subset of the data warehouse data so the system learns to find segments, associations, etc. among the data
– Validation: run the "learnt" model from the previous step on the remaining subset of data and try to "predict"; since you have historical data, you can verify whether the "learnt" model is any good
– Deploy: use the model to predict in the real environment, where you do not know the actual results
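The training and validation steps can be sketched with a deliberately simple model: a single income threshold learned on one part of the historical data and checked on the rest. The records, the 6/4 split, and the function names are hypothetical; a real project would use the mining algorithm chosen in the problem-definition step.

```python
def train_threshold(rows):
    """Training: pick the income level that misclassifies the fewest training rows."""
    candidates = sorted({income for income, _ in rows})

    def errors(threshold):
        return sum((income <= threshold) != (label == "late")
                   for income, label in rows)

    return min(candidates, key=errors)

def predict(threshold, income):
    """Deploy: predict the outcome for a customer whose result is not yet known."""
    return "late" if income <= threshold else "on_time"

def accuracy(threshold, rows):
    """Validation: share of rows where the prediction matches the known outcome."""
    hits = sum(predict(threshold, income) == label for income, label in rows)
    return hits / len(rows)

# Hypothetical historical records: (monthly income, payment behaviour).
history = [
    (1500, "late"), (4200, "on_time"), (1800, "late"), (3500, "on_time"),
    (2200, "late"), (5000, "on_time"), (2600, "on_time"), (3100, "late"),
    (2000, "late"), (3900, "on_time"),
]

train, holdout = history[:6], history[6:]
threshold = train_threshold(train)
print(threshold, accuracy(threshold, train), accuracy(threshold, holdout))
```

The model fits the training subset perfectly but scores lower on the held-out records, which is exactly the gap the validation step is meant to expose before deployment.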