
1 Chapter 1 INTRODUCTION

2 What is Pattern Recognition?
- Pattern recognition by humans: perceptual, specialized, decision making
- Pattern recognition by computers: the benefit of automated pattern recognition, an advantage in complex calculations
- Pattern recognition from data (data mining)

3 Pattern Recognition from Data
Pattern recognition from data is the process of learning from historical data by finding dependencies in the data and extracting knowledge from them.

4 What is Data?
      Studies    Education   Works   Income (D)
1     Poor       SPM         Poor    None
2     Poor       SPM         Good    Low
3     Moderate   SPM         Poor    Low
4     Moderate   Diploma     Poor    Low
5     Poor       SPM         Poor    None
6     Moderate   Diploma     Poor    Low
7     Good       MSc         Good    Medium
:
99    Poor       SPM         Good    Low
100   Moderate   Diploma     Poor    Low

5 What is Knowledge?
studies(Poor) AND work(Poor) => income(None)
studies(Poor) AND work(Good) => income(Low)
education(Diploma) => income(Low)
education(MSc) => income(Medium) OR income(High)
studies(Mod) => income(Low)
studies(Good) => income(Medium) OR income(High)
education(SPM) AND work(Good) => income(Low)

6 Why is Data Mining prevalent?
1. Lots of data is collected and stored in data warehouses
- Business: Wal-Mart logs nearly 20 million transactions per day
- Astronomy: telescopes collect large amounts of data
- Space: NASA is collecting petabytes of data from satellites
- Physics: high-energy physics experiments are expected to generate 100 to 1000 terabytes in the next decade

7 Why is Data Mining prevalent?
2. The quality and richness of the data collected is improving
- Retailers: scanner data is much more accurate than other means
- E-commerce: rich data on customer browsing
- Science: the accuracy of sensors is improving

8 Why is Data Mining prevalent?
3. The gap between data and analysts is increasing
- Existence of hidden information
- High cost of human labor
- Much of the data is never analyzed at all

9 Origins of Data Mining
Draws ideas from Machine Learning, Pattern Recognition, Statistics, and Database Systems for applications that have:
- Enormous amounts of data
- High dimensionality of data
- Heterogeneous data
- Unstructured data

10 Data Mining: Confluence of Multiple Disciplines
Data mining draws on:
- Database technology
- Statistics
- Machine learning
- Information science
- Neural networks
- Pattern recognition
- Visualization
- Information retrieval
- High-performance computing
- Spatial data analysis

11 Data Mining – What it isn’t
- Small scale
  - Data mining methods are designed for large data sets
- Foolproof
  - Data mining techniques will discover patterns in any data
  - The patterns discovered may be meaningless
  - It is up to the user to determine how to interpret the results
  - “Make it foolproof and they’ll just invent a better fool”
- Magic
  - Data mining techniques cannot generate information that is not present in the data
  - They can only find the patterns that are already there

12 Example: Data Mining is not ...
- Generating multidimensional cubes of a relational table
- Searching for a phone number in a phone book
- Searching for keywords on Google (IR)
- Generating a histogram of salaries for different age groups
- Issuing an SQL query to a database and reading the reply

13 Data Mining – What it is
- Extracting knowledge from large amounts of data
- Uses techniques from:
  - Pattern Recognition
  - Machine Learning
  - Statistics
  - Plus techniques unique to data mining (association rules)
- Data mining methods must be efficient and scalable

14 Example: Data mining is ...
- What goods should be promoted to this customer?
- What is the probability that a certain customer will respond to a planned promotion?
- Can one predict the most profitable securities to buy/sell during the next trading session?
- Will this customer default on a loan or pay back on schedule?
- What medical diagnosis should be assigned to this patient?
- What kind of cars should be sold this year?
- Finding groups of people with similar hobbies
- Are the chances of getting cancer higher if you live near a power line?

15 Data Mining is simply...
- Finding relationships
- Making predictions

16 Data Mining: Definition
The non-trivial extraction of implicit, previously unknown, and potentially useful information from data (William J. Frawley, Gregory Piatetsky-Shapiro and Christopher J. Matheus)

17 Data Mining: One Step of KDD
KDD = Knowledge Discovery in Databases
- Databases, flat files
- Cleaning and integration
- Data warehouse
- Selection and transformation
- Data mining
- Patterns
- Evaluation and presentation
- Knowledge

18 Cont’d
- Data cleaning: to remove noise and inconsistent data
- Data integration: multiple data sources may be combined
- Data selection: data relevant to the analysis task are retrieved from the database
- Data transformation: data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations

19 Cont’d
- Data mining: an essential process where intelligent methods are applied in order to extract data patterns
- Pattern evaluation: to identify the truly interesting patterns representing knowledge, based on some interestingness measures
- Knowledge presentation: visualization and knowledge representation techniques are used to present the mined knowledge to the users

20 Early Steps of Data Mining
- Data preprocessing: handling incomplete data, noisy data, uncertain data
- Data discretization/representation: transforms data into suitable values for the mining algorithm to find patterns
- Data selection: selects the suitable data for mining purposes

21 Database Systems
Kinds of DB
- Relational
- Data warehouse
- Transactional DB
- Advanced DB system
- Flat files
- WWW
Kinds of knowledge
- Classification
- Association
- Clustering
- Prediction
- …

22 Data Mining – Types of Data
Mining can be performed on data in a variety of forms.
Relational database
- The traditional DBMS everyone is familiar with
- Data is stored in a series of tables (a collection of tables)
- Data is extracted via queries, typically with SQL
- SQL queries answer questions such as:
  - “Show me a list of items that were sold in the last quarter”
  - “Show me the total sales of the last month, grouped by branch”
  - “How many transactions occurred in the month of December?”
  - “Which salesperson had the highest amount of sales?”
- Relational languages provide aggregate functions such as sum, avg, count, max, min

23 Data Mining – Types of Data
Applying data mining goes further
- Searching for trends or data patterns
- Analyzing customer data to predict the credit risk of new customers based on their income
- Detecting deviations: items whose sales are far from those expected in comparison with the previous year (to be investigated further: a change in packaging? an increase in price?)
Transaction database
- Similar to a relational database (transactions are stored in a table)
- Each row (record) is a transaction with an id and a list of the items in the transaction
- Nested relation: can be unfolded into a relational database or stored in flat files, since nested relational structures are not supported by relational database systems
- Which items sold well together?

24 Data Mining – Types of Data
Data warehouse
- Stores historical data, potentially from multiple sources
- Organized around major subjects
- Contains summary statistics
Object / object-relational databases
- Database consisting of objects
- Object = set of variables + associated methods
- E.g. Intel uses regularity extraction in automatic circuit layout
Images
- Can mine features extracted from images, OR
- Can use mining techniques to extract features
- Content-based image retrieval

25 Data Mining – Types of Data
Vector geometries (spatial db)
- Include GIS and CAD data
- Raster data: n-dimensional bit maps / pixel maps
- Vector format: point, line, polygon
- Can find spatial patterns between features
  - Describing the characteristics of houses located near a specified kind of location
  - Describing the climate of mountainous areas located at various altitudes
Text
- Can be unstructured, semi-structured, or structured
- Documentation, newspaper articles, web sites etc.
- Can facilitate search by linking related documents / concepts

26 Data Mining – Types of Data
Video / audio
- Speech recognition: recognizing spoken commands
- Security applications
- Integrated with standard data mining methods (storage and searching)
Temporal databases / time series
- Global change databases (temperature records)
- Space shuttle telemetry
- Stock market data (stock exchange)
- Usually store relational data that include time-related attributes
- Find the trend of changes for objects, to support decision making and strategy planning

27 Data Mining – Types of Data
- Stock exchange data can be mined to uncover trends that could help in planning investment strategies (when is the best time to purchase TNB stock?)
Legacy databases
- A group of heterogeneous databases (relational, OO db, network db, multimedia db etc.)
- Connected by intra- or inter-computer networks
- Information exchange is very difficult, e.g. student academic performance among different schools/universities
- Data mining: transforming the given data into higher, more generalized, conceptual levels

28 The evolution of database technology
Data mining can be viewed as a result of the natural evolution of database technology (Fig. 1.1). The figure shows 5 stages of functionality:
- data collection and database creation
- database management systems
- advanced database systems
- web-based database systems
- data warehousing and data mining

30 The evolution of database technology..cont
- Database systems provide data storage and retrieval, and transaction processing.
- Data warehousing and data mining provide data analysis and understanding.
- A data warehouse is a database architecture that serves as a repository of multiple heterogeneous data sources. The sources are organized under a unified schema at a single site in order to facilitate management decision making.

31 The evolution of database technology..cont
Data warehouse technology includes:
- data cleansing
- data integration, and
- On-Line Analytical Processing (OLAP)
OLAP is an analysis technique for performing summarization, consolidation, and aggregation, with the ability to view information from different angles. OLAP tools support data analysis, but not in-depth analysis such as data classification, clustering, and the characterization of data changes over time.

32 DBMS, OLAP & Data Mining
Task
- DBMS: extraction of detailed and summary data
- OLAP: summaries, trends and forecasts
- Data Mining: knowledge discovery of hidden patterns and insight
Type of result
- DBMS: information
- OLAP: analysis
- Data Mining: insight and prediction
Method
- DBMS: deduction (ask the question, verify with data)
- OLAP: multidimensional data modeling, aggregation, statistics
- Data Mining: induction (build the model, apply it to new data, get the result)
Example question
- DBMS: Who purchased mutual funds in the last 3 years?
- OLAP: What is the average income of mutual fund buyers by region by year?
- Data Mining: Who will buy a mutual fund in the next 6 months and why?

33 Example: Weather data
- A record of the weather conditions during a two-week period, along with the decisions of a tennis player whether or not to play tennis on each particular day
- Generates tuples (or examples, instances) consisting of the values of 4 independent variables:
  - Outlook
  - Temperature
  - Humidity
  - Windy
- One dependent variable: play

34 Cont’d
Day   outlook    temperature   humidity   windy   play
1     sunny      85            85         false   no
2     sunny      80            90         true    no
3     overcast   83            86         false   yes
4     rainy      70            96         false   yes
5     rainy      68            80         false   yes
6     rainy      65            70         true    no
7     overcast   64            65         true    yes
8     sunny      72            95         false   no
9     sunny      69            70         false   yes
10    rainy      75            80         false   yes
11    sunny      75            70         true    yes
12    overcast   72            90         true    yes
13    overcast   81            75         false   yes
14    rainy      71            91         true    no

35 DBMS
We may answer questions by querying a DBMS containing the above table:
- What was the temperature on the sunny days?
- On which days was the humidity less than 75?
- On which days was the temperature greater than 70?
- On which days was the temperature greater than 70 and the humidity less than 75?
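The four questions are plain filters over the table. A minimal sketch in Python, assuming the weather table is held as an in-memory list of tuples rather than in a real DBMS; the names `WEATHER` and `days_where` are illustrative, and day 1's humidity is taken as 85, as in the commonly distributed version of this dataset:

```python
# Weather records: (day, outlook, temperature, humidity, windy, play).
# Assumption: day 1 humidity = 85, as in the standard weather dataset.
WEATHER = [
    (1, "sunny", 85, 85, False, "no"),
    (2, "sunny", 80, 90, True, "no"),
    (3, "overcast", 83, 86, False, "yes"),
    (4, "rainy", 70, 96, False, "yes"),
    (5, "rainy", 68, 80, False, "yes"),
    (6, "rainy", 65, 70, True, "no"),
    (7, "overcast", 64, 65, True, "yes"),
    (8, "sunny", 72, 95, False, "no"),
    (9, "sunny", 69, 70, False, "yes"),
    (10, "rainy", 75, 80, False, "yes"),
    (11, "sunny", 75, 70, True, "yes"),
    (12, "overcast", 72, 90, True, "yes"),
    (13, "overcast", 81, 75, False, "yes"),
    (14, "rainy", 71, 91, True, "no"),
]


def days_where(predicate):
    """Return the day numbers of all records that satisfy the predicate."""
    return [record[0] for record in WEATHER if predicate(record)]


# What was the temperature on the sunny days?
sunny_temps = [(r[0], r[2]) for r in WEATHER if r[1] == "sunny"]
# On which days was the humidity less than 75?
low_humidity = days_where(lambda r: r[3] < 75)
# On which days was the temperature greater than 70?
warm = days_where(lambda r: r[2] > 70)
# On which days were both conditions true?
warm_and_dry = days_where(lambda r: r[2] > 70 and r[3] < 75)

print(sunny_temps, low_humidity, warm, warm_and_dry)
```

Each answer is a deduction from the stored records: the query states the condition and the data either satisfies it or not. Nothing is learned or generalized.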

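36 OLAP (On-line analytical processing)
Using OLAP, create a multidimensional model (data cube). E.g. with the dimensions time, outlook and play, we can create the model below. Each cell gives the counts of play = yes / play = no; the corner cell 9/5 is the overall total.
9/5      sunny   rainy   overcast
Week 1   0/2     2/1     2/0
Week 2   2/1     1/1     2/0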

37 Cont’d
- By observing the data cube we can easily identify some important properties of the data and find regularities or patterns
- E.g. the 3rd column: if the outlook is overcast, the play attribute is always yes
  If outlook = overcast then play = yes

38 Drill-down: time dimension
Concept hierarchy
9/5   sunny   rainy   overcast
1     0/1     0/0     0/0
2     0/1     0/0     0/0
3     0/0     0/0     1/0
4     0/0     1/0     0/0
5     0/0     1/0     0/0
6     0/0     0/1     0/0
7     0/0     0/0     1/0
8     0/1     0/0     0/0
9     1/0     0/0     0/0
10    0/0     1/0     0/0
11    1/0     0/0     0/0
12    0/0     0/0     1/0
13    0/0     0/0     1/0
14    0/0     0/1     0/0

39 Roll-up (reverse of drill-down)
9/5      sunny   rainy   overcast
Week 1   0/2     2/1     2/0
Week 2   2/1     1/1     2/0
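The cube, its drill-down and its roll-up differ only in how the time dimension is bucketed. A minimal sketch in Python, assuming weeks are days 1-7 and 8-14; `build_cube` and the other names are illustrative and not part of any OLAP product:

```python
from collections import defaultdict

# (day, outlook, play) taken from the weather table.
RECORDS = [
    (1, "sunny", "no"), (2, "sunny", "no"), (3, "overcast", "yes"),
    (4, "rainy", "yes"), (5, "rainy", "yes"), (6, "rainy", "no"),
    (7, "overcast", "yes"), (8, "sunny", "no"), (9, "sunny", "yes"),
    (10, "rainy", "yes"), (11, "sunny", "yes"), (12, "overcast", "yes"),
    (13, "overcast", "yes"), (14, "rainy", "no"),
]


def build_cube(time_key):
    """Count (yes, no) for every (time bucket, outlook) cell that has data."""
    cube = defaultdict(lambda: [0, 0])
    for day, outlook, play in RECORDS:
        cell = cube[(time_key(day), outlook)]
        cell[0 if play == "yes" else 1] += 1
    return {key: tuple(counts) for key, counts in cube.items()}


by_day = build_cube(lambda day: day)                  # drill-down level
by_week = build_cube(lambda day: (day - 1) // 7 + 1)  # roll-up level

# Outlook values whose weekly cells never contain a "no".
always_yes = sorted(
    outlook
    for outlook in {"sunny", "rainy", "overcast"}
    if all(no == 0 for (_, o), (yes, no) in by_week.items() if o == outlook)
)

print(by_week[(1, "sunny")], by_week[(2, "rainy")], always_yes)
```

Cells with no records (the 0/0 entries of the drill-down table) are simply absent from the dictionaries in this sketch.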

40 Data Mining Tasks
Prediction methods
- Use some variables to predict unknown or future values of the same or other variables
- Perform inference on the current data in order to make predictions
Description methods
- Find human-interpretable patterns that describe the data
- Characterize the general properties of the data in the database
- Descriptive mining is complementary to predictive mining, but it is closer to decision support than to decision making

41 Cont’d
- Association Rule Mining (descriptive)
- Classification and Prediction (predictive)
- Clustering (descriptive)
- Sequential Pattern Discovery (descriptive)
- Regression (predictive)
- Deviation Detection (predictive)

42 Association Rule Mining
- Initially developed for market basket analysis
- The goal is to discover relationships between attributes
- Data is typically stored in very large databases, sometimes in flat files or images
- Uses include decision support, classification and clustering
- Application areas include business, medicine and engineering

43 Association Rule Mining
Given a set of transactions, each of which is a set of items, find all rules (X => Y) that satisfy user-specified minimum support and confidence constraints
- Support = (#T containing X and Y) / (#T)
- Confidence = (#T containing X and Y) / (#T containing X)
Applications
- Cross-selling and up-selling
- Supermarket shelf management
Some rules discovered
- Bread => Jem: sup=60%, conf=75%
- Jelly => Bread: sup=60%, conf=100%
- Jelly => Jem: sup=20%, conf=100%
- Jelly => Milk: sup=0%
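The two measures can be computed directly by counting transactions. A minimal sketch in Python; the five transactions are hypothetical and invented purely for illustration (they are not the data behind the rules quoted above), and the function names are illustrative:

```python
# Hypothetical transactions, invented for illustration only.
TRANSACTIONS = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "milk"},
    {"jam", "jelly"},
    {"bread", "jam", "jelly"},
]


def count(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for transaction in TRANSACTIONS if itemset <= transaction)


def support(x, y):
    """Support of X => Y: (#T containing X and Y) / (#T)."""
    return count(x | y) / len(TRANSACTIONS)


def confidence(x, y):
    """Confidence of X => Y: (#T containing X and Y) / (#T containing X)."""
    return count(x | y) / count(x)


def satisfies(x, y, min_support, min_confidence):
    """Check a rule against user-specified minimum thresholds."""
    return support(x, y) >= min_support and confidence(x, y) >= min_confidence


print(support({"bread"}, {"jam"}), confidence({"bread"}, {"jam"}))
```

On this toy data the rule bread => jam happens to hold in 3 of 5 transactions and in 3 of the 4 transactions that contain bread.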

44 Association Rule Mining: Definition
Given a set of records, each of which contains some number of items from a given collection:
- Produce dependency rules which will predict the occurrence of an item based on occurrences of other items
- Example:
  {Bread} => {Jem}
  {Jelly} => {Jem}

45 Association Rule Mining: Marketing and sales promotion
Say the rule discovered is {Bread, …} => {Jem}
- Jem as a consequent: can be used to determine what products will boost its sales
- Bread as an antecedent: can be used to see which products will be impacted if the store stops selling bread
- Bread as an antecedent and Jem as a consequent: can be used to see what products should be stocked along with Bread to promote the sale of Jem

46 Association Rule Mining: Supermarket shelf management
- Goal: to identify items that are bought together by a reasonable fraction of customers, so that they can be shelved together
- Data used: point-of-sale data collected with barcode scanners, to find dependencies among products
- Example: if a customer buys jelly, then he is very likely to buy Jem. So don’t be surprised if you find Jem next to Jelly on an aisle in the supermarket. Also salsa next to tortilla chips.

47 Association Rule Mining
Association rule mining will produce LOTS of rules. How can you tell which ones are important?
- High support
- High confidence
- Rules involving certain attributes of interest
- Rules with a specific structure
- Rules with support / confidence higher than expected
- Completeness: generating all interesting rules
- Efficiency: generating only rules that are interesting

48 Clustering
- Determine object groupings such that objects within the same cluster are similar to each other, while objects in different groups are not
- Typically objects are represented by data points in a multidimensional space, with each dimension corresponding to one or more attributes
- The clustering problem in this case reduces to the following: given a set of data points, each having a set of attributes, and a similarity measure, find clusters such that
  - Data points in one cluster are more similar to one another
  - Data points in separate clusters are less similar to one another

49 Cont’d
Similarity measures:
- Euclidean distance (continuous attributes)
- Other problem-specific measures
Types of clustering
- Group-based clustering
- Hierarchical clustering

50 Clustering Example
Euclidean-distance-based clustering in 3D space
- Intra-cluster distances are minimised
- Inter-cluster distances are maximised
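One common way to obtain such groupings is k-means, which repeatedly assigns each point to its nearest centroid and then moves each centroid to the mean of its points. The slides do not name a specific algorithm, so the choice of k-means here is an assumption; the 3D points and starting centroids are hypothetical:

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, centroids, iterations=10):
    """Plain k-means with fixed starting centroids (no random restarts)."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: euclidean(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters, centroids


# Hypothetical 3D data with two visibly separate groups.
POINTS = [(1, 1, 1), (1, 2, 1), (2, 1, 2), (8, 8, 9), (9, 8, 8), (8, 9, 9)]
clusters, centroids = kmeans(POINTS, centroids=[(0, 0, 0), (10, 10, 10)])
print(clusters, centroids)
```

With well-separated data the result does not depend much on the starting centroids; on real data the initialization and the number of clusters are design choices that need care.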

51 Clustering: Market Segmentation
- Goal: to subdivide a market into distinct subsets of customers, where each subset can be targeted with a distinct marketing mix
- Approach:
  - Collect different attributes of customers based on their geographical and lifestyle-related information
  - Find clusters of similar customers
  - Measure the clustering quality by observing the buying patterns of customers in the same cluster vs. those from different clusters

52 Clustering: Document Clustering
- Goal: to find groups of documents that are similar to each other based on the important terms appearing in them
- Approach: identify frequently occurring terms in each document, form a similarity measure based on the frequencies of different terms, and use it to generate clusters
- Gain: information retrieval can utilize the clusters to relate a new document or search to clustered documents

53 Clustering: Document Clustering Example
- Clustering points: 3204 articles of the LA Times
- Similarity measure: number of common words in documents (after some word filtering)

54 Classification: Definition
- Given a set of records (called the training set)
  - Each record contains a set of attributes
  - One of the attributes is the class
- Find a model for the class attribute as a function of the values of the other attributes
- Goal: previously unseen records should be assigned to a class as accurately as possible
- Usually, the given data set is divided into a training set and a test set, with the training set used to build the model and the test set used to validate it. The accuracy of the model is determined on the test set.

55 Classification: cont’d
- Classifiers are created using labeled training samples
- Classifiers are evaluated using independent labeled samples (test set)
- Training samples are created by ground truth / experts
- The classifier is later used to classify unknown samples
- Measurements must be able to predict the phenomenon!
Examples
- Direct marketing
- Fraud detection
- Customer churn
- Sky survey cataloging
- Classifying galaxies

56 Classification Example

57 Classification: Direct Marketing
- Goal: reduce the cost of mailing by targeting a set of consumers likely to buy a new cell phone product
- Approach:
  - Use the data collected for a similar product introduced in the recent past
  - Use the profiles of consumers along with their {buy, didn’t buy} decision. The latter becomes the class attribute
  - The profile information may consist of demographic, lifestyle and company-interaction data
    - Demographic: Age, Gender, Geography, Salary
    - Psychographic: Hobbies
    - Company interaction: Recentness, Frequency, Monetary
  - Use this information as input attributes to learn a classifier model

58 Classification: Fraud Detection
- Goal: predict fraudulent cases in credit card transactions
- Approach:
  - Use credit card transactions and the information on their account holders as attributes (important: when and where the card was used)
  - Label past transactions as {fraud, fair} transactions. This forms the class attribute
  - Learn a model for the class of transactions
  - Use this model to detect fraud by observing credit card transactions on an account

59 Regression
- Predict the value of a given continuous-valued variable based on the values of other variables, assuming a linear or non-linear model of dependency
- Extensively studied in the fields of Statistics and Neural Networks
- Examples:
  - Predicting the sales of a new product based on advertising expenditure
  - Predicting wind velocities based on temperature, humidity, air pressure, etc.
  - Time series prediction of stock market indices
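For the linear case with a single predictor, the model can be fitted with ordinary least squares. A minimal sketch in Python, assuming one input variable; the advertising and sales figures are hypothetical and chosen to lie exactly on a line so the result is easy to check by hand:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising expenditure vs. sales of a product.
ADVERTISING = [1.0, 2.0, 3.0, 4.0]
SALES = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ADVERTISING, SALES)
# Predict sales for an expenditure that was not in the data.
predicted = slope * 5.0 + intercept
print(slope, intercept, predicted)
```

Real data will not lie on a line, so the fitted model should be judged on held-out data in the same way as a classifier.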

60 Deviation/Anomaly Detection
- Some data objects do not comply with the general behavior or model of the data. Data objects that are different from or inconsistent with the remaining set are called outliers
- Outliers can be caused by measurement or execution errors, or they may represent some kind of fraudulent activity
- The goal of deviation/anomaly detection is to detect significant deviations from normal behavior

61 Deviation/Anomaly Detection: Definition
Given a set of n points or objects, and k, the expected number of outliers, find the top k objects that are considerably dissimilar, exceptional or inconsistent with the remaining data
This can be viewed as two subproblems:
- Define what data can be considered inconsistent in a given data set
- Find an efficient method to mine the outliers
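The two subproblems map onto a scoring function and a ranking. A minimal sketch in Python, assuming "inconsistent" is defined as "far from the mean of all points" (one possible definition among many, chosen here only for simplicity); the transaction amounts are hypothetical:

```python
import math


def top_k_outliers(points, k):
    """Return the k points farthest (Euclidean) from the mean of all points."""
    dims = len(points[0])
    mean = [sum(p[d] for p in points) / len(points) for d in range(dims)]

    def score(point):
        # Subproblem 1: define inconsistency as distance from the mean.
        return math.sqrt(sum((point[d] - mean[d]) ** 2 for d in range(dims)))

    # Subproblem 2: rank by score and keep the top k.
    return sorted(points, key=score, reverse=True)[:k]


# Hypothetical one-dimensional data: card transaction amounts.
AMOUNTS = [(12.0,), (15.0,), (14.0,), (13.0,), (950.0,), (16.0,), (11.0,), (400.0,)]
outliers = top_k_outliers(AMOUNTS, k=2)
print(outliers)
```

Because the mean is itself pulled towards extreme values, a more robust score (for example one based on the median or on nearest-neighbour distances) may be preferable in practice.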

62 Deviation: Credit Card Fraud Detection
- Goal: to detect fraudulent credit card transactions
- Approach:
  - Based on past usage patterns, develop a model for authorized credit card transactions
  - Check for deviation from the model before authenticating new credit card transactions
  - Hold payment and verify the authenticity of “doubtful” transactions by other means (phone call, etc.)

63 Anomaly detection: Network Intrusion Detection
- Goal: to detect intrusion into a computer network
- Approach:
  - Define and develop a model for normal user behavior on the computer network
  - Continuously monitor the behavior of users to check if it deviates from the defined normal behavior
  - Raise an alarm if such a deviation is found

64 Sequential pattern discovery: definition
- Given a set of objects, with each object associated with its own timeline of events, find rules that predict strong sequential dependencies among different events
- Sequence discovery aims at extracting sets of events that commonly occur over a period of time
  (A B) (C) => (D E)

65 Sequential pattern discovery: Telecommunication Alarm Logs
Telecommunication alarm logs
(Inverter_Problem Excessive_Line_Current) (Rectifier_Alarm) => (Fire_Alarm)

66 Sequential pattern discovery: Point-of-Sale Up-Sell / Cross-Sell
Point-of-sale transaction sequences
- Computer bookstore
  (Intro_to_Visual_C) (C++ Primer) => (Perl_For_Dummies, Tcl_Tk)
  60% of customers who buy Intro to Visual C and C++ Primer also buy Perl for Dummies and Tcl Tk within a month
- Athletic apparel store
  (Shoes) (Racket, Racket ball) => (Sport_Jacket)

67 Example: Data Mining (Weather data)
By applying various data mining techniques, we can find associations and regularities in our data
- Extract knowledge in the form of rules, decision trees etc.
- Predict the value of the dependent variable in new situations
Some examples
- Mining association rules
- Classification by decision trees and rules
- Prediction methods

68 Mining association rules
- First, discretize the numeric attributes (a part of the data preprocessing stage)
- Group the temperature values into three intervals (hot, mild, cool) and the humidity values into two (high, normal)
- Substitute the values in the data with the corresponding names
- Apply the Apriori algorithm and get the following rules
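The discretization step can be sketched as two threshold functions. The slides do not state the interval boundaries, so the cut points below (temperature 70 and 80, humidity 80) are assumptions, chosen because they map the numeric weather values onto the labels used in the discretized weather table; day 1's humidity is taken as 85, as in the commonly distributed version of this dataset:

```python
# (day, temperature, humidity) from the numeric weather table.
# Assumption: day 1 humidity = 85, as in the standard weather dataset.
NUMERIC = [
    (1, 85, 85), (2, 80, 90), (3, 83, 86), (4, 70, 96), (5, 68, 80),
    (6, 65, 70), (7, 64, 65), (8, 72, 95), (9, 69, 70), (10, 75, 80),
    (11, 75, 70), (12, 72, 90), (13, 81, 75), (14, 71, 91),
]


def temperature_label(value):
    """Assumed cut points: below 70 is cool, 70-79 is mild, 80 and up is hot."""
    if value >= 80:
        return "hot"
    if value >= 70:
        return "mild"
    return "cool"


def humidity_label(value):
    """Assumed cut point: above 80 is high, otherwise normal."""
    return "high" if value > 80 else "normal"


discretized = [
    (day, temperature_label(temp), humidity_label(hum))
    for day, temp, hum in NUMERIC
]
print(discretized)
```

Other cut points would give a different nominal table and therefore different rules, which is why discretization is treated as part of preprocessing rather than as a neutral formatting step.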

69 Discretized weather data
Day   outlook    temperature   humidity   windy   play
1     sunny      hot           high       false   no
2     sunny      hot           high       true    no
3     overcast   hot           high       false   yes
4     rainy      mild          high       false   yes
5     rainy      cool          normal     false   yes
6     rainy      cool          normal     true    no
7     overcast   cool          normal     true    yes
8     sunny      mild          high       false   no
9     sunny      cool          normal     false   yes
10    rainy      mild          normal     false   yes
11    sunny      mild          normal     true    yes
12    overcast   mild          high       true    yes
13    overcast   hot           normal     false   yes
14    rainy      mild          high       true    no

70 Cont’d
1. humidity=normal windy=false => play=yes (4, 1)
2. temperature=cool => humidity=normal (4, 1)
3. outlook=overcast => play=yes (4, 1)
4. temperature=cool play=yes => humidity=normal (3, 1)
5. outlook=rainy windy=false => play=yes (3, 1)
6. outlook=rainy play=yes => windy=false (3, 1)
7. outlook=sunny humidity=high => play=no (3, 1)
8. outlook=sunny play=no => humidity=high (3, 1)
9. temperature=cool windy=false => humidity=normal play=yes (2, 1)
10. temperature=cool humidity=normal windy=false => play=yes (2, 1)

71 Cont’d
- These rules show some sets of attribute values (itemsets) that appear frequently in the data
- The two numbers after each rule are:
  - Support (the number of occurrences of the itemset in the data)
  - Confidence (accuracy) of the rule
- Rule 3 is the same as the one that is produced by observing the data cube
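The (support, confidence) pair of any single rule can be checked by counting rows of the discretized weather data. This is a minimal sketch of that check only, not of the Apriori search itself; the helper names are illustrative:

```python
COLUMNS = ("outlook", "temperature", "humidity", "windy", "play")
# Discretized weather data, days 1 to 14.
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]
DATA = [dict(zip(COLUMNS, row)) for row in ROWS]


def matches(record, conditions):
    """True if the record has every attribute=value pair in conditions."""
    return all(record[attr] == value for attr, value in conditions.items())


def rule_stats(antecedent, consequent):
    """Return (support count, confidence) of antecedent => consequent."""
    covered = [r for r in DATA if matches(r, antecedent)]
    correct = [r for r in covered if matches(r, consequent)]
    if not covered:
        return 0, 0.0
    return len(correct), len(correct) / len(covered)


rule_1 = rule_stats({"humidity": "normal", "windy": "false"}, {"play": "yes"})
rule_3 = rule_stats({"outlook": "overcast"}, {"play": "yes"})
rule_7 = rule_stats({"outlook": "sunny", "humidity": "high"}, {"play": "no"})
# A rule that is not in the list because its confidence is below 1.
weaker = rule_stats({"outlook": "sunny"}, {"play": "no"})
print(rule_1, rule_3, rule_7, weaker)
```

The last rule has the same support count as rule 7 but is right for only 3 of the 5 sunny days, which illustrates why support alone is not enough to rank rules.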

72 Classification by Decision Trees and Rules
Using the ID3 algorithm, the following decision tree is produced:
Outlook=sunny
    Humidity=high: no
    Humidity=normal: yes
Outlook=overcast: yes
Outlook=rainy
    Windy=true: no
    Windy=false: yes
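ID3 chooses the attribute to test at each node by information gain, the reduction in the entropy of the class labels after splitting on that attribute. A minimal sketch in Python that computes the gain of each attribute on the discretized weather data and so shows why Outlook ends up at the root; it covers only the root-selection step, not the full recursive algorithm, and the names are illustrative:

```python
import math
from collections import Counter

COLUMNS = ("outlook", "temperature", "humidity", "windy", "play")
# Discretized weather data, days 1 to 14.
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )


def information_gain(attribute):
    """Entropy of play minus the weighted entropy after splitting on attribute."""
    idx = COLUMNS.index(attribute)
    remainder = 0.0
    for value in {row[idx] for row in ROWS}:
        subset = [row[-1] for row in ROWS if row[idx] == value]
        remainder += len(subset) / len(ROWS) * entropy(subset)
    return entropy([row[-1] for row in ROWS]) - remainder


gains = {attr: information_gain(attr) for attr in COLUMNS[:-1]}
root = max(gains, key=gains.get)
print(gains, root)
```

The same computation is then repeated inside each branch on the rows that reach it, which is how Humidity and Windy are selected below the sunny and rainy branches.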

73 Cont’d
A decision tree consists of:
- Decision nodes that test the values of their corresponding attribute
- Each value of this attribute leads to a subtree, and so on, until the leaves of the tree are reached
- The leaves determine the value of the dependent variable
Using a decision tree we can classify new tuples
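Classifying a tuple means walking from the root to a leaf. A minimal sketch in Python that hard-codes the weather decision tree as nested tests (the function name `classify` is illustrative) and checks it against the 14 training tuples:

```python
def classify(outlook, humidity, windy):
    """Walk the weather decision tree from the root to a leaf."""
    if outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    if outlook == "overcast":
        return "yes"
    if outlook == "rainy":
        return "no" if windy == "true" else "yes"
    raise ValueError(f"unknown outlook: {outlook!r}")


# Discretized weather data: (outlook, temperature, humidity, windy, play).
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]

# Fraction of training tuples the tree labels correctly.
correct = sum(
    1 for outlook, _temp, humidity, windy, play in ROWS
    if classify(outlook, humidity, windy) == play
)
training_accuracy = correct / len(ROWS)

# A new, unseen tuple: (sunny, mild, normal, false, ?).
new_prediction = classify("sunny", "normal", "false")
print(training_accuracy, new_prediction)
```

Temperature never appears in the tree, so it is not needed to classify a tuple. Accuracy on the training tuples says nothing about unseen data; that requires a separate test set.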

74 Cont’d
- A decision tree can be presented as a set of rules
- Each rule represents a path through the tree from the root to a leaf
- Other data mining techniques can produce rules directly, e.g. the Prism algorithm:
  if outlook=overcast then yes
  if humidity=normal and windy=false then yes
  if temperature=mild and humidity=normal then yes
  if outlook=rainy and windy=false then yes
  if outlook=sunny and humidity=high then no
  if outlook=rainy and windy=true then no

75 Prediction methods
- DM offers techniques to predict the value of the dependent variable directly, without first generating a model
- The most popular approach is based on statistical methods
- It uses the Bayes rule to predict the probability of each value of the dependent variable given the values of the independent variables

76 Cont’d
E.g. applying Bayes to the new tuple (sunny, mild, normal, false, ?):
P(play=yes | outlook=sunny, temperature=mild, humidity=normal, windy=false) = 0.8
P(play=no | outlook=sunny, temperature=mild, humidity=normal, windy=false) = 0.2
=> The predicted value is “yes”
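These two probabilities can be reproduced from the discretized weather data. The sketch below assumes the naive Bayes form of the rule, in which the attributes are treated as conditionally independent given the class, and uses raw frequency estimates without smoothing; both are assumptions, since the slides only say that the Bayes rule is used:

```python
# Discretized weather data: (outlook, temperature, humidity, windy, play).
ROWS = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def posterior(new_tuple):
    """Naive Bayes posterior of play=yes / play=no for the given attribute values."""
    scores = {}
    for label in ("yes", "no"):
        rows = [r for r in ROWS if r[-1] == label]
        score = len(rows) / len(ROWS)  # prior P(play=label)
        for idx, value in enumerate(new_tuple):
            # Likelihood P(attribute=value | play=label), estimated by counting.
            score *= sum(1 for r in rows if r[idx] == value) / len(rows)
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


probs = posterior(("sunny", "mild", "normal", "false"))
prediction = max(probs, key=probs.get)
print(probs, prediction)
```

Under these assumptions the normalized scores round to 0.8 and 0.2. With raw counts, an attribute value never seen with a class would force that class's probability to zero, which is why smoothing is usually added in practice.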

77 Data Mining: Problems and Challenges
- Noisy data
- Difficult training set
- Incomplete data
- Dynamic databases
- Large databases

78 Noisy data
- Many of the attribute values will be inexact or incorrect
  - erroneous instruments measuring some property
  - human errors occurring at data entry
- Two forms of noise in the data
  - corrupted values: some of the values in the training set are altered from their original form
  - missing values: one or more of the attribute values may be missing, both for examples in the training set and for objects which are to be classified

79 Difficult Training Set
- Non-representative data
  - Learning is based on only a few examples
  - With a large database, the rules are more likely to be representative
- Absence of boundary cases
  - Boundary cases are needed to find the real differences between two classes
- Limited information
  - Two objects to be classified have the same conditional attributes but belong to different classes
  - There is not enough information to distinguish the two types of objects

80 Dynamic databases
- Databases change continually
- Rules that reflect the content of the database at all times are preferred
- If some changes are made, the whole learning process may have to be conducted again

81 Large databases
- The size of databases is ever increasing
- Machine learning algorithms were designed to handle small training sets (a few hundred examples)
- Much care is needed when using similar techniques on larger databases
- Large databases provide more knowledge (e.g. the number of rules may be enormous)

82 Data Mining – Issues in Data Mining
- User interaction / visualization
- Incorporation of background knowledge
- Noisy or incomplete data
- Determining interestingness of patterns
- Efficiency and scalability
- Parallel and distributed mining
- Incremental learning / mining time-changing phenomena
- Mining from image / video / audio data
- Mining unstructured data
