Subject Name: Data Warehousing and Data Mining

Subject Code: 10MCA542 & IS 74
Prepared By: V. Srikanth & Harkiran Preet
Department: MCA & IS
Date: 5/11/2014

UNIT 8: WEB MINING
- Introduction
- Web Content Mining
- Text Mining: Unstructured Text
- Text Clustering and Its Applications
- Mining Spatial and Temporal Databases

INTRODUCTION
The web has become an enormously important tool for communicating ideas, conducting business, and entertainment. Since its beginnings in the early 1990s, the web was estimated to have grown to almost 13 billion pages by 2011, with millions of people all over the world accessing them every day. The web has become the number one source of information for Internet users.
Web mining is the application of data mining techniques to find interesting and potentially useful knowledge from web data. It is normally expected that the hyperlink structure of the web, the web log data, or both are used in the mining process. Web mining may be divided into:
1. Web content mining
2. Web structure mining
3. Web usage mining

WEB CONTENT MINING
Web content mining deals with discovering useful information or knowledge from web page contents. It goes well beyond using keywords in a search engine. In contrast to web usage mining and web structure mining, web content mining focuses on the web page itself rather than on the links. Web content is a very rich information resource consisting of many types of information, for example unstructured free text, images, audio, video, and metadata, as well as hyperlinks. Portals are employed here to find what a user might be looking for.
Web structure mining deals with discovering and modeling the link structure of the web. Work has been carried out to model the web based on the topology of its hyperlinks.
Web usage mining deals with understanding user behavior when interacting with the web or with a web site.
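Web structure mining treats pages as nodes and hyperlinks as edges. One of the simplest structural statistics is a page's in-degree, the number of distinct pages linking to it. The sketch below computes in- and out-degrees over a small hypothetical link graph; the page names are made up, and real link analysis uses richer measures on much larger graphs.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
    "d.html": ["c.html"],
}

def in_degrees(graph):
    """Count how many distinct pages link to each page (self-links ignored)."""
    counts = Counter()
    for source, targets in graph.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts

def out_degrees(graph):
    """Number of distinct outgoing links per page."""
    return {page: len(set(targets)) for page, targets in graph.items()}

print(in_degrees(links).most_common(1))  # [('c.html', 3)]
```

Pages with many incoming links are candidates for "authoritative" content in this simple model.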

TEXT MINING: UNSTRUCTURED TEXT
Text mining is "the non-trivial extraction of implicit, previously unknown, and potentially useful information from (large amounts of) textual data": the exploration and analysis of textual (natural-language) data by automatic and semi-automatic means to discover new knowledge.
What is "previously unknown" information?
- Strict definition: information that not even the writer knows, e.g., discovering a new method for hair growth that is described as a side effect of a different procedure.
- Lenient definition: rediscovering the information that the author encoded in the text, e.g., automatically extracting a product's name from a web page.

TEXT MINING METHODS
- Information Retrieval: indexing and retrieval of textual documents
- Information Extraction: extraction of partial knowledge in the text
- Web Mining: indexing and retrieval of textual documents and extraction of partial knowledge using the web
- Clustering: generating collections of similar text documents
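For the "indexing and retrieval" method, a common data structure is an inverted index that maps each term to the documents containing it. The sketch below builds one over three hypothetical documents and answers a Boolean AND query; the tokenizer and the AND retrieval model are simplifying assumptions, not the only way retrieval is done.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Inverted index: map each term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Boolean AND retrieval: IDs of documents that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical document collection.
docs = {
    1: "Web mining discovers knowledge from web data",
    2: "Text mining extracts knowledge from textual data",
    3: "Spatial databases store geographic data",
}
index = build_index(docs)
print(sorted(search(index, "mining knowledge")))  # [1, 2]
```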

TEXT MINING PROCESS
1. Text preprocessing
   - Syntactic/semantic text analysis
2. Features generation
   - Bag of words
3. Features selection
   - Simple counting
   - Statistics
4. Text/data mining
   - Classification: supervised learning
   - Clustering: unsupervised learning
5. Analyzing results
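The first three steps of this process can be sketched in a few lines: preprocessing (tokenizing and dropping stop words), feature generation as a bag of words, and feature selection by simple counting (keeping terms that appear in at least `min_df` documents). The stop-word list, the `min_df` threshold, and the documents are illustrative assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "is", "in"}  # tiny illustrative list

def preprocess(text):
    """Text preprocessing: lower-case, tokenize, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def bag_of_words(tokens):
    """Feature generation: term -> frequency in one document."""
    return Counter(tokens)

def select_features(bags, min_df=2):
    """Feature selection by simple counting: keep terms found in >= min_df documents."""
    df = Counter()
    for bag in bags:
        df.update(bag.keys())
    return sorted(term for term, count in df.items() if count >= min_df)

def to_vectors(bags, vocabulary):
    """Represent each document as a count vector over the selected vocabulary."""
    return [[bag[term] for term in vocabulary] for bag in bags]

# Hypothetical documents.
docs = [
    "The price of the camera is low",
    "Camera price and camera reviews",
    "Reviews of the new phone",
]
bags = [bag_of_words(preprocess(d)) for d in docs]
vocab = select_features(bags)
print(vocab)                    # ['camera', 'price', 'reviews']
print(to_vectors(bags, vocab))  # [[1, 1, 0], [2, 1, 1], [0, 0, 1]]
```

The resulting vectors are the input to step 4, whether classification or clustering.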

SYNTACTIC/SEMANTIC TEXT ANALYSIS
- Part-of-speech (POS) tagging: find the corresponding POS for each word, e.g., John (noun) gave (verb) the (det) ball (noun); about 98% accurate.
- Word sense disambiguation: context-based or proximity-based; very accurate.
- Parsing: generates a parse tree (graph) for each sentence; each sentence is a stand-alone graph.

TEXT MINING: CLASSIFICATION DEFINITION
- Given: a collection of labeled records (the training set). Each record contains a set of features (attributes) and the true class (label).
- Find: a model for the class as a function of the values of the features.
- Goal: previously unseen records should be assigned a class as accurately as possible.
A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
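The definition does not name a particular model. As one common choice for text, the sketch below uses multinomial naive Bayes with Laplace smoothing (swapped in for illustration), trains it on a training set, and measures accuracy on a held-out test set. The tiny labeled data set and its class names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(records):
    """Train multinomial naive Bayes from (tokens, label) pairs."""
    class_counts, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for tokens, label in records:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):  # sorted for deterministic tie-breaking
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for t in tokens:
            if t in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, test_set):
    """Fraction of test records whose predicted class equals the true class."""
    correct = sum(predict_nb(model, tokens) == label for tokens, label in test_set)
    return correct / len(test_set)

# Hypothetical labeled records: (features, class).
data = [
    ("cheap camera discount offer".split(), "shopping"),
    ("discount price sale offer".split(), "shopping"),
    ("stock market shares fall".split(), "finance"),
    ("market inflation interest rates".split(), "finance"),
    ("camera sale cheap price".split(), "shopping"),
    ("shares stock inflation".split(), "finance"),
]
training_set, test_set = data[:4], data[4:]  # build on training, validate on test
model = train_nb(training_set)
print(accuracy(model, test_set))  # 1.0
```

With real data the split would normally be randomized and much larger; a perfect score on two test records says little about the model.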

TEXT CLUSTERING: APPLICATIONS
- Marketing: discover distinct groups of potential buyers according to a user's text-based profile, e.g., Amazon.
- Industry: identify groups of competitors' web pages, e.g., competing products and their prices.
- Job seeking: identify parameters in searching for jobs, e.g., www.flipdog.com.
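Applications like these need an unsupervised method that groups similar documents. The slides do not prescribe an algorithm; the sketch below assumes k-means over term-frequency vectors, with cosine similarity for assignment and fixed seed documents for deterministic initialization. The product and job snippets are hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Term-frequency vector of a document."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Mean vector of a cluster."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}

def kmeans(docs, seeds, max_iter=10):
    """k-means with cosine similarity; seeds are indices of the initial centroid documents."""
    vectors = [vectorize(d) for d in docs]
    centroids = [dict(vectors[i]) for i in seeds]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
                          for v in vectors]
        if new_assignment == assignment:
            break  # converged: no document changed cluster
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = centroid(members)
    return assignment

# Hypothetical snippets: product pages and job postings.
docs = [
    "camera price discount camera",
    "cheap camera price",
    "job vacancy software engineer",
    "software engineer job salary",
    "discount price offer",
    "salary job vacancy",
]
print(kmeans(docs, seeds=[0, 2]))  # [0, 0, 1, 1, 0, 1]
```

k-means depends on its initial seeds; in practice several random initializations are tried and the best result kept.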

MINING SPATIAL AND TEMPORAL DATABASES
- Geometric, geographic, or spatial data: space-related data. Examples: geographic space (a 2-D abstraction of the earth's surface), VLSI design, a model of the human brain, and 3-D space representing the arrangement of chains of protein molecules.
- Spatial database systems vs. image database systems:
  - Image database system: handles digital raster images (e.g., satellite sensing, computer tomography); it may also contain techniques for object analysis and extraction from images and some spatial database functionality.
  - Spatial (geometric, geographic) database system: handles objects in space that have identity and well-defined extents, locations, and relationships.
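Objects with identity and well-defined extents are often approximated by their minimum bounding rectangle (MBR) so that region searches can cheaply discard non-overlapping objects. The sketch below assumes 2-D polygons and hypothetical object names; it shows only the filter step, and candidates would normally be checked against their exact geometry afterwards.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle, used here as a minimum bounding rectangle (MBR)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "Rect") -> bool:
        return (self.xmin <= other.xmax and other.xmin <= self.xmax and
                self.ymin <= other.ymax and other.ymin <= self.ymax)

def mbr(points):
    """Smallest axis-aligned rectangle enclosing a list of (x, y) vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return Rect(min(xs), min(ys), max(xs), max(ys))

# Hypothetical spatial objects: identity (name) plus a polygon giving their extent.
objects = {
    "lake": [(2, 2), (4, 1), (5, 3), (3, 4)],
    "forest": [(8, 8), (10, 8), (9, 11)],
    "town": [(4, 4), (6, 4), (6, 6), (4, 6)],
}

def window_query(objects, window):
    """Filter step of a region search: objects whose MBR overlaps the query window."""
    return sorted(name for name, poly in objects.items() if mbr(poly).intersects(window))

print(window_query(objects, Rect(0, 0, 4.5, 4.5)))  # ['lake', 'town']
```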

GIS (Geographic Information System)
Analysis and visualization of geographic data. Common analysis functions of a GIS:
- Search (thematic search, search by region)
- Location analysis (buffer, corridor, overlay)
- Terrain analysis (slope/aspect, drainage network)
- Flow analysis (connectivity, shortest path)
- Distribution (nearest neighbor, proximity, change detection)
- Spatial analysis/statistics (pattern, centrality, similarity, topology)
- Measurements (distance, perimeter, shape, adjacency, direction)
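Three of the listed functions (distance measurement, nearest neighbor, and buffer) can be sketched on point features. This assumes planar (projected) coordinates in kilometres and hypothetical hospital locations; latitude/longitude data would need a geodesic distance instead of `math.dist`.

```python
import math

# Hypothetical point features in a planar coordinate system (units: km).
hospitals = {"H1": (1.0, 1.0), "H2": (5.0, 2.0), "H3": (9.0, 9.0)}

def distance(p, q):
    """Measurement: Euclidean distance between two points."""
    return math.dist(p, q)

def nearest_neighbor(point, features):
    """Distribution analysis: name of the feature closest to the point."""
    return min(features, key=lambda name: distance(point, features[name]))

def buffer_query(point, radius, features):
    """Location analysis: features within a circular buffer around the point."""
    return sorted(name for name, loc in features.items() if distance(point, loc) <= radius)

incident = (4.0, 3.0)
print(nearest_neighbor(incident, hospitals))   # H2
print(buffer_query(incident, 5.0, hospitals))  # ['H1', 'H2']
```

A brute-force scan like this is fine for a handful of features; larger data sets rely on a spatial index.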

SPATIAL DBMS
An SDBMS is a software system that:
- supports spatial data models, spatial abstract data types (ADTs), and a query language supporting them;
- supports spatial indexing, efficient spatial operations, and query optimization;
- can work with an underlying DBMS.
Examples: Oracle Spatial Data Cartridge, ESRI Spatial Data Engine.
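To show why spatial indexing matters, the sketch below uses a uniform grid: points are bucketed by cell, and a range query inspects only the cells overlapping the query rectangle instead of every point. The grid is a deliberately simple stand-in for the tree-based indexes that production SDBMSs use; the class name, cell size, and points are hypothetical.

```python
import math
from collections import defaultdict

class GridIndex:
    """Uniform-grid spatial index over points (a simplified illustrative index)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (col, row) -> list of (id, x, y)

    def _cell(self, x, y):
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, obj_id, x, y):
        self.cells[self._cell(x, y)].append((obj_id, x, y))

    def range_query(self, xmin, ymin, xmax, ymax):
        """IDs of points inside the rectangle, scanning only overlapping cells."""
        c0, r0 = self._cell(xmin, ymin)
        c1, r1 = self._cell(xmax, ymax)
        hits = []
        for col in range(c0, c1 + 1):
            for row in range(r0, r1 + 1):
                for obj_id, x, y in self.cells.get((col, row), []):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(obj_id)
        return sorted(hits)

index = GridIndex(cell_size=10)
for obj_id, x, y in [("a", 3, 4), ("b", 12, 7), ("c", 25, 25), ("d", 9, 11)]:
    index.insert(obj_id, x, y)
print(index.range_query(0, 0, 15, 10))  # ['a', 'b']
```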

MODELLING SPATIAL OBJECTS
What needs to be represented? There are two important alternative views:
- Single objects: distinct entities arranged in space, each of which has its own geometric description; used for modeling cities, forests, and rivers.
- Spatially related collections of objects: describe space itself (something about every point in space); used for modeling land use or the partition of a country into districts.
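The two views lead to different data structures. In the sketch below, the object view is a list of named entities with their own geometry, while the space view is assumed to be a raster grid that assigns a land-use value to every cell; a grid is only one possible way to partition space, and all names and values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SpatialObject:
    """Object view: a distinct entity with identity and its own geometry."""
    name: str
    kind: str
    geometry: list  # list of (x, y) vertices

features = [
    SpatialObject("Riverton", "city", [(1, 1)]),
    SpatialObject("Blue River", "river", [(0, 0), (2, 1), (4, 3)]),
]

# Space view: every location has a value; here each grid cell holds a land-use code.
land_use = [
    ["farm", "farm", "forest"],
    ["urban", "farm", "forest"],
    ["urban", "urban", "water"],
]

def land_use_at(grid, row, col):
    """Field query: what is at this location?"""
    return grid[row][col]

def cells_with(grid, value):
    """Field query: which locations have this value?"""
    return [(r, c) for r, cells in enumerate(grid) for c, v in enumerate(cells) if v == value]

print([f.name for f in features if f.kind == "river"])  # ['Blue River']
print(land_use_at(land_use, 1, 0))                      # urban
print(cells_with(land_use, "forest"))                   # [(0, 2), (1, 2)]
```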

TEMPORAL DATABASES
Time-series database
- Consists of sequences of values or events changing with time
- Data is recorded at regular intervals
- Characteristic time-series components: trend, cycle, seasonal, irregular
- Applications:
  - Financial: stock prices, inflation
  - Industry: power consumption
  - Scientific: experiment results
  - Meteorological: precipitation

TEMPORAL DATABASES
Categories of Time-Series Movements
- Long-term or trend movements (trend curve): the general direction in which a time series is moving over a long interval of time.
- Cyclic movements or cycle variations: long-term oscillations about a trend line or curve, e.g., business cycles; these may or may not be periodic.
- Seasonal movements or seasonal variations: almost identical patterns that a time series appears to follow during corresponding months of successive years.
- Irregular or random movements.
Time-series analysis is the decomposition of a time series into these four basic movements:
- Additive model: $TS = T + C + S + I$
- Multiplicative model: $TS = T \times C \times S \times I$
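A minimal sketch of the additive model: the trend $T$ is estimated with a centered moving average over one seasonal period, the seasonal component $S$ is the average detrended value at each position in the period (centered to sum to zero), and the irregular component $I$ is what remains. As a simplifying assumption, the cyclic component $C$ is not separated and stays inside the trend estimate. For the multiplicative model, the subtractions would become divisions. The quarterly series is synthetic.

```python
def centered_moving_average(series, period):
    """Trend estimate: centered moving average over one period (None at the ends)."""
    half = period // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        if period % 2:  # odd period: plain centered window
            trend[i] = sum(window) / period
        else:           # even period: 2 x period average, end points weighted 1/2
            trend[i] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    return trend

def additive_decompose(series, period):
    """Split series into trend, seasonal, and irregular parts (additive model)."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full periods")
    trend = centered_moving_average(series, period)
    sums, counts = [0.0] * period, [0] * period
    for i, (x, t) in enumerate(zip(series, trend)):
        if t is not None:  # average the detrended values per season position
            sums[i % period] += x - t
            counts[i % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    mean = sum(raw) / period
    seasonal_index = [r - mean for r in raw]  # centered so indices sum to zero
    seasonal = [seasonal_index[i % period] for i in range(len(series))]
    irregular = [None if t is None else x - t - s
                 for x, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, irregular

pattern = [2, -1, -3, 2]  # synthetic quarterly seasonal effect (sums to zero)
series = [10 + i + pattern[i % 4] for i in range(12)]
trend, seasonal, irregular = additive_decompose(series, period=4)
print([round(s, 2) for s in seasonal[:4]])  # [2.0, -1.0, -3.0, 2.0]
```

Because the synthetic series is an exact linear trend plus a zero-sum seasonal pattern, the irregular part comes out as zero; real data would leave a noisy residual.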

TEMPORAL DATABASES
Time-sequence query language
- Should be able to specify sophisticated queries, such as: find all of the sequences that are similar to some sequence in class A but not similar to any sequence in class B.
- Should be able to support various kinds of queries: range queries, all-pair queries, and nearest-neighbor queries.
Shape definition language
- Allows users to define and query the overall shape of time sequences
- Uses a human-readable series of sequence transitions or macros
- Ignores the specific details
- E.g., the pattern up, Up, UP can be used to describe increasing degrees of rising slopes
- Macros: spike, valley, etc.
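The shape definition idea can be sketched by encoding each step of a series as a symbol whose capitalization grows with the slope (up, Up, UP) and then matching macros as symbol patterns. The thresholds `small` and `large`, the falling and "stable" symbols, and the `SPIKE` macro definition are hypothetical choices; the slides do not fix a syntax.

```python
def to_shape(series, small=1.0, large=5.0):
    """Encode consecutive changes as 'up'/'Up'/'UP' or 'down'/'Down'/'DOWN'.

    Magnitude < small -> lower case; < large -> capitalized; >= large -> upper case.
    A change of exactly zero becomes 'stable'. Thresholds are illustrative.
    """
    symbols = []
    for prev, cur in zip(series, series[1:]):
        delta = cur - prev
        if delta == 0:
            symbols.append("stable")
            continue
        word = "up" if delta > 0 else "down"
        if abs(delta) >= large:
            word = word.upper()
        elif abs(delta) >= small:
            word = word.capitalize()
        symbols.append(word)
    return symbols

def find_macro(symbols, pattern):
    """Start positions where a macro (a pattern of symbols) occurs."""
    k = len(pattern)
    return [i for i in range(len(symbols) - k + 1) if symbols[i:i + k] == pattern]

SPIKE = ["UP", "DOWN"]  # hypothetical macro: sharp rise followed by a sharp fall
prices = [10, 10.5, 12, 20, 11, 11]
shape = to_shape(prices)
print(shape)                     # ['up', 'Up', 'UP', 'DOWN', 'stable']
print(find_macro(shape, SPIKE))  # [2]
```

Because only the symbols are compared, the query ignores the exact values, as the slide describes.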

SIMILAR TIME SERIES ANALYSIS
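The range, nearest-neighbor, and all-pair queries required of a time-sequence query language can be sketched with a distance function. This example assumes z-normalized Euclidean distance (one common choice that compares shape while ignoring offset and scale) and equal-length sequences; the stock series and the threshold `eps` are hypothetical.

```python
import math

def z_normalize(seq):
    """Remove offset and amplitude so that only the shape is compared."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / len(seq))
    return [(x - mean) / std if std else 0.0 for x in seq]

def distance(a, b):
    """Euclidean distance between z-normalized sequences of equal length."""
    return math.dist(z_normalize(a), z_normalize(b))

def range_query(query, database, eps):
    """All sequences within distance eps of the query."""
    return sorted(name for name, seq in database.items() if distance(query, seq) <= eps)

def nearest_neighbor(query, database):
    """Name of the sequence closest to the query."""
    return min(database, key=lambda name: distance(query, database[name]))

def all_pairs(database, eps):
    """All pairs of sequences within distance eps of each other."""
    names = sorted(database)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if distance(database[a], database[b]) <= eps]

# Hypothetical sequences.
database = {
    "stock_A": [10, 11, 12, 13, 14],
    "stock_B": [100, 110, 120, 130, 140],  # same shape as A, different scale
    "stock_C": [14, 13, 12, 11, 10],       # opposite trend
}
query = [1, 2, 3, 4, 5]
print(range_query(query, database, eps=0.5))  # ['stock_A', 'stock_B']
print(all_pairs(database, eps=0.5))           # [('stock_A', 'stock_B')]
```

A query such as "similar to some sequence in class A but not to any in class B" can be built by combining range queries against the two classes.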

MINING SPATIO-TEMPORAL DATA
- Data has spatial extensions and changes with time, e.g., forest fires, moving objects, hurricanes, and earthquakes.
- Automatic anomaly detection in massive moving-object data:
  - Moving objects are ubiquitous: GPS, radar, etc.
  - Example: maritime vessel surveillance
  - Problem: automatic anomaly detection
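As a very simple baseline for moving-object anomaly detection, the sketch below derives speeds from consecutive position reports of a vessel track and flags segments that exceed a plausibility threshold. The track, the planar km coordinates, and the 60 km/h threshold are hypothetical; practical surveillance systems typically learn normal behavior from historical tracks rather than relying on one fixed rule.

```python
import math

# Hypothetical vessel track: (time in hours, x in km, y in km) in a planar projection.
track = [
    (0.0, 0.0, 0.0),
    (1.0, 20.0, 0.0),
    (2.0, 40.0, 2.0),
    (3.0, 140.0, 2.0),  # implausible jump
    (4.0, 160.0, 4.0),
]

def speeds(track):
    """Speed (km/h) on each segment between consecutive position reports."""
    return [math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
            for (t0, x0, y0), (t1, x1, y1) in zip(track, track[1:])]

def speed_anomalies(track, max_speed):
    """Indices of segments whose speed exceeds the threshold."""
    return [i for i, v in enumerate(speeds(track)) if v > max_speed]

print(speed_anomalies(track, max_speed=60.0))  # [2]
```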