DATA MINING: Extracting Knowledge From Data
Petr Olmer, CERN
Data Management and Database Technologies



Motivation
● What if we do not know what to ask?
● How can we discover knowledge in databases without a specific query?
"Computers are useless. They can only give you answers."

Many terms, one meaning
● Data mining
● Knowledge discovery in databases
● Data exploration
● A non-trivial extraction of novel, implicit, and actionable knowledge from large databases.
  – without a specific hypothesis in mind!
● Techniques for discovering structural patterns in data.

What is inside?
● Databases
  – data warehousing
● Statistics
  – methods
  – but a different data source!
● Machine learning
  – output representations
  – algorithms

CRISP-DM
CRoss Industry Standard Process for Data Mining
A cycle of phases:
task specification → data understanding → data preparation → modeling → evaluation → deployment

Input data: Instances, attributes

A    B   C  D
Mon  21  a  yes
Wed  19  b  yes
Mon  23  a  no
Sun  23  d  yes
Fri  24  c  no
Fri  18  d  yes
Sat  20  c  no
Tue  21  b  no
Mon  25  b  no

example: input data

outlook   temp.  humidity  windy  play
sunny     hot    high      false  no
sunny     hot    high      true   no
overcast  hot    high      false  yes
rainy     mild   high      false  yes
rainy     cool   normal    false  yes
rainy     cool   normal    true   no
overcast  cool   normal    true   yes
sunny     mild   high      false  no
sunny     cool   normal    false  yes
rainy     mild   normal    false  yes
sunny     mild   normal    true   yes
overcast  mild   high      true   yes
overcast  hot    normal    false  yes
rainy     mild   high      true   no

Output data: Concepts
● Concept description = what is to be learned
● Classification learning
● Association learning
● Clustering
● Numeric prediction

Task classes
● Predictive tasks
  – Predict an unknown value of the output attribute for a new instance.
● Descriptive tasks
  – Describe structures or relations of attributes.
  – Instances are not related!

Models and algorithms
● Decision trees
● Classification rules
● Association rules
● k-nearest neighbors
● Cluster analysis

Decision trees
● Inner nodes test a particular attribute against a constant.
● Leaf nodes classify all instances that reach the leaf.
[Example tree: the root tests a (>= 5: class C1; < 5: test b); b = blue leads to a test on a (> 0: class C1; <= 0: class C2); b = red leads to a test on c (hot: class C2; cold: class C1).]

Classification rules
If precondition then conclusion
● An alternative to decision trees
● Rules can be read off a decision tree
  – one rule for each leaf
  – unambiguous, not ordered
  – more complex than necessary
If (a >= 5) then class C1
If (a < 5) and (b = "blue") and (a > 0) then class C1
If (a < 5) and (b = "red") and (c = "hot") then class C2

Classification rules: Ordered or not ordered execution?
● Ordered
  – rules out of context can be incorrect
  – widely used
● Not ordered
  – different rules can lead to different conclusions
  – mostly used in boolean closed worlds
    ● only yes rules are given
    ● one rule in DNF

Decision trees / Classification rules: the 1R algorithm

for each attribute:
    for each value of that attribute:
        count how often each class appears
        find the most frequent class
        rule = assign that class to this attribute-value pair
    calculate the error rate of the attribute's rules
choose the rules with the smallest error rate

example: 1R (on the weather data above)

attribute  rules            errors  total
outlook    sunny → no       2/5     4/14
           overcast → yes   0/4
           rainy → yes      2/5
temp.      hot → no*        2/4     5/14
           mild → yes       2/6
           cool → yes       1/4
humidity   high → no        3/7     4/14
           normal → yes     1/7
windy      false → yes      2/8     5/14
           true → no*       3/6
(* tie broken arbitrarily)
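
To make the procedure concrete, here is a minimal Python sketch of 1R on the weather data (my own illustration, not code from the course; ties are broken arbitrarily by Counter, as the starred entries above suggest):

    from collections import Counter

    # The weather data from the input-data example: (outlook, temp., humidity, windy, play).
    data = [
        ("sunny", "hot", "high", "false", "no"),
        ("sunny", "hot", "high", "true", "no"),
        ("overcast", "hot", "high", "false", "yes"),
        ("rainy", "mild", "high", "false", "yes"),
        ("rainy", "cool", "normal", "false", "yes"),
        ("rainy", "cool", "normal", "true", "no"),
        ("overcast", "cool", "normal", "true", "yes"),
        ("sunny", "mild", "high", "false", "no"),
        ("sunny", "cool", "normal", "false", "yes"),
        ("rainy", "mild", "normal", "false", "yes"),
        ("sunny", "mild", "normal", "true", "yes"),
        ("overcast", "mild", "high", "true", "yes"),
        ("overcast", "hot", "normal", "false", "yes"),
        ("rainy", "mild", "high", "true", "no"),
    ]
    attributes = ["outlook", "temp.", "humidity", "windy"]

    def one_r(data):
        best = None  # (errors, attribute index, rules)
        for i in range(len(attributes)):
            # Count how often each class appears for each value of attribute i.
            counts = {}
            for row in data:
                counts.setdefault(row[i], Counter())[row[-1]] += 1
            # One rule per value: assign the most frequent class.
            rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
            errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
            if best is None or errors < best[0]:
                best = (errors, i, rules)
        return best

    errors, i, rules = one_r(data)
    print(attributes[i], rules, f"errors: {errors}/{len(data)}")
    # outlook {'sunny': 'no', 'overcast': 'yes', 'rainy': 'yes'} errors: 4/14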

Decision trees / Classification rules: the Naïve Bayes algorithm
● Attributes are assumed to be
  – equally important
  – independent
● For a new instance, we compute the probability of each class.
● Assign the most probable class.
● We use the Laplace estimator in case of zero probability.
● Attribute dependencies reduce the power of NB.

example: Naïve Bayes (on the weather data above)

new instance: sunny, cool, high, true → play = ?

play = yes: sunny 2/9, cool 3/9, high 3/9, true 3/9, overall 9/14
            likelihood = 2/9 · 3/9 · 3/9 · 3/9 · 9/14 ≈ 0.0053 → 20.5 %
play = no:  sunny 3/5, cool 1/5, high 4/5, true 3/5, overall 5/14
            likelihood = 3/5 · 1/5 · 4/5 · 3/5 · 5/14 ≈ 0.0206 → 79.5 %

→ play = no
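
A hedged sketch of this computation (it reuses the `data` list from the 1R sketch above; the Laplace estimator mentioned on the previous slide is omitted for brevity, which is safe here because no count is zero):

    from collections import Counter

    # `data` as defined in the 1R sketch above.

    def naive_bayes(data, instance):
        classes = Counter(row[-1] for row in data)
        scores = {}
        for c, n_c in classes.items():
            p = n_c / len(data)  # prior P(class)
            for i, v in enumerate(instance):
                # conditional P(attribute value | class); no Laplace smoothing here
                p *= sum(1 for row in data if row[-1] == c and row[i] == v) / n_c
            scores[c] = p
        total = sum(scores.values())
        return {c: round(p / total, 3) for c, p in scores.items()}

    print(naive_bayes(data, ("sunny", "cool", "high", "true")))
    # {'no': 0.795, 'yes': 0.205}  -> play = no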

Decision trees: ID3, a recursive algorithm
● Select the attribute with the biggest information gain to place at the root node.
● Make one branch for each possible value.
● Build the subtrees recursively.
● Information required to specify the class:
  – when a branch is empty: zero
  – when the branches are equal: a maximum
  – f(a, b, c) = f(a, b + c) + g(b, c)
● Entropy: entropy(p1, …, pn) = −p1 log2 p1 − p2 log2 p2 − … − pn log2 pn
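
As a worked instance (the root of the weather data has 9 yes and 5 no instances; the numbers below agree with the Witten & Frank presentation):

    info([9,5]) = −(9/14) log2(9/14) − (5/14) log2(5/14) ≈ 0.940 bits
    info after splitting on outlook = (5/14)·info([2,3]) + (4/14)·info([4,0]) + (5/14)·info([3,2]) ≈ 0.693 bits
    gain(outlook) = 0.940 − 0.693 = 0.247 bits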

example: ID3 (on the weather data above; class counts yes:no)

root split:
outlook   sunny 2:3   overcast 4:0   rainy 3:2
temp.     hot 2:2     mild 4:2       cool 3:1
humidity  high 3:4    normal 6:1
windy     false 6:2   true 3:3

within the outlook = sunny branch:
temp.     hot 0:2     mild 1:1       cool 1:0
humidity  high 0:3    normal 2:0
windy     false 1:2   true 1:1

example: ID3 (the resulting tree)

outlook = sunny    → humidity = high → no
                     humidity = normal → yes
outlook = overcast → yes
outlook = rainy    → windy = false → yes
                     windy = true → no
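
A small Python sketch of the gain computation that drives this choice (again reusing `data` and `attributes` from the 1R sketch; the gains it prints match the textbook values):

    from collections import Counter
    from math import log2

    # `data` and `attributes` as defined in the 1R sketch above.

    def entropy(rows):
        counts = Counter(row[-1] for row in rows)
        total = sum(counts.values())
        return -sum(n / total * log2(n / total) for n in counts.values())

    def gain(rows, i):
        # Expected reduction in class entropy after splitting on attribute i.
        subsets = {}
        for row in rows:
            subsets.setdefault(row[i], []).append(row)
        after = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
        return entropy(rows) - after

    for i, name in enumerate(attributes):
        print(name, round(gain(data, i), 3))
    # outlook 0.247, temp. 0.029, humidity 0.152, windy 0.048

ID3 then places the winning attribute at the node and recurses on each branch with the remaining attributes.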

Classification rules: PRISM, a covering algorithm
● For each class, seek a way of covering all instances in it.
● Start with: If ? then class C1.
● Choose an attribute-value pair to maximize the probability of the desired classification.
  – include as many positive instances as possible
  – exclude as many negative instances as possible
● Improve the precondition.
● There can be more rules for a class!
  – Delete the covered instances and try again.
● PRISM produces only correct, unordered rules.

example: PRISM (on the weather data above)

If ? then P=yes
candidate accuracies:
O = sunny 2/5    O = overcast 4/4    O = rainy 3/5
T = hot 2/4      T = mild 4/6        T = cool 3/4
H = high 3/7     H = normal 6/7
W = false 6/8    W = true 3/6
→ If O=overcast then P=yes

example: PRISM (covered instances deleted, next rule)

If ? then P=yes
candidate accuracies:
O = sunny 2/5    O = rainy 3/5
T = hot 0/2      T = mild 3/5    T = cool 2/3
H = high 1/5     H = normal 4/5
W = false 4/6    W = true 1/4
→ If H=normal then P=yes

example: PRISM (improving the precondition)

If H=normal and ? then P=yes
candidate accuracies:
O = sunny 2/2    O = rainy 2/3
T = mild 2/2     T = cool 2/3
W = false 3/3    W = true 1/2
→ If H=normal and W=false then P=yes

example: PRISM (the final rule set for P=yes)

If O=overcast then P=yes
If H=normal and W=false then P=yes
If T=mild and H=normal then P=yes
If O=rainy and W=false then P=yes
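
The whole covering loop fits in a short sketch (reusing `data` and `attributes` from the 1R sketch; attribute-value pairs are scored by accuracy with ties broken by positive coverage, which reproduces the four rules above on this data):

    def prism(data, target="yes", n_attrs=4):
        rules, pool = [], list(data)
        while any(r[-1] == target for r in pool):
            covered, rule = list(pool), []
            # Specialize until the rule covers only target-class instances.
            while any(r[-1] != target for r in covered):
                best = max(
                    ((i, v) for i in range(n_attrs)
                     for v in {r[i] for r in covered} if (i, v) not in rule),
                    key=lambda t: (
                        sum(r[t[0]] == t[1] and r[-1] == target for r in covered)
                        / sum(r[t[0]] == t[1] for r in covered),
                        sum(r[t[0]] == t[1] and r[-1] == target for r in covered),
                    ),
                )
                rule.append(best)
                covered = [r for r in covered if r[best[0]] == best[1]]
            rules.append(rule)
            # Delete the covered instances and try again.
            pool = [r for r in pool if not all(r[i] == v for i, v in rule)]
        return rules

    for rule in prism(data):
        print("If", " and ".join(f"{attributes[i]}={v}" for i, v in rule), "then P=yes")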

Association rules
● Structurally the same as C-rules: If … then …
● Can predict any attribute or their combination
● Not intended to be used together
● Characteristics of a rule "If P then C", given the contingency counts below:
  – Support = a
  – Accuracy = a / (a + b)

         C    non C
P        a    b
non P    c    d

Association rules: Multiple consequences
● If A and B then C and D
● If A and B then C
● If A and B then D
● If A and B and C then D
● If A and B and D then C

Association rules: Algorithm
● Algorithms for C-rules can be used
  – but they are very inefficient
● Instead, we seek rules with a given minimum support, and test their accuracy.
● Item sets: combinations of attribute-value pairs
● Generate item sets with the given support.
● From them, generate rules with the given accuracy.
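
A toy sketch of this item-set approach (reusing `data` from the 1R sketch; the `min_support` and `min_accuracy` values are made up for illustration, and only item sets of size 2 and 3 are enumerated):

    from itertools import combinations

    # `data` as defined in the 1R sketch above.
    cols = ["outlook", "temp.", "humidity", "windy", "play"]

    def support(items, data):
        # items: tuple of (attribute index, value) pairs
        return sum(all(row[i] == v for i, v in items) for row in data)

    min_support, min_accuracy = 4, 1.0
    pairs = [(i, v) for i in range(5) for v in {row[i] for row in data}]
    for size in (2, 3):
        for items in combinations(pairs, size):
            if len({i for i, _ in items}) < size:
                continue  # at most one value per attribute
            s = support(items, data)
            if s < min_support:
                continue
            # Each item in turn becomes the consequence of a candidate rule.
            for k in range(size):
                premise = items[:k] + items[k + 1:]
                if s / support(premise, data) >= min_accuracy:
                    lhs = " and ".join(f"{cols[i]}={v}" for i, v in premise)
                    i, v = items[k]
                    print(f"If {lhs} then {cols[i]}={v} (support {s})")

Among its output are the familiar rules "If outlook=overcast then play=yes" and "If humidity=normal and windy=false then play=yes", each with support 4 and accuracy 1.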

k-nearest neighbor
● Instance-based representation
  – no explicit structure
  – lazy learning
● A new instance is compared with existing ones
  – distance metric, per attribute:
    ● a = b: d(a, b) = 0
    ● a ≠ b: d(a, b) = 1
  – the closest k instances are used for classification
    ● majority
    ● average

example: kNN (on the weather data above)

new instance: sunny, cool, high, true → play = ?

outlook   temp.  humidity  windy  play  distance
sunny     hot    high      false  no    2
sunny     hot    high      true   no    1
overcast  hot    high      false  yes   3
rainy     mild   high      false  yes   3
rainy     cool   normal    false  yes   3
rainy     cool   normal    true   no    2
overcast  cool   normal    true   yes   2
sunny     mild   high      false  no    2
sunny     cool   normal    false  yes   2
rainy     mild   normal    false  yes   4
sunny     mild   normal    true   yes   2
overcast  mild   high      true   yes   2
overcast  hot    normal    false  yes   4
rainy     mild   high      true   no    2

With k = 1, the single nearest instance (distance 1) classifies the new day as play = no.
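
The table corresponds to the following sketch (reusing `data` from the 1R sketch; the distance is the number of attributes whose values differ):

    from collections import Counter

    # `data` as defined in the 1R sketch above.

    def knn(data, instance, k=1):
        # 0/1 distance per attribute; zip ignores the class column of each row.
        ranked = sorted(data, key=lambda row: sum(a != b for a, b in zip(row, instance)))
        # Majority class among the k closest instances.
        return Counter(row[-1] for row in ranked[:k]).most_common(1)[0][0]

    print(knn(data, ("sunny", "cool", "high", "true")))  # no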

Cluster analysis
● Diagram: how the instances fall into clusters.
● One instance can belong to more than one cluster.
● Belonging can be probabilistic or fuzzy.
● Clusters can be hierarchical.

Data mining: Conclusion
● Different algorithms discover different knowledge in different formats.
● Simple ideas often work very well.
● There's no magic!

Text mining
● Data mining discovers knowledge in structured data.
● Text mining works with unstructured text.
  – Groups similar documents
  – Classifies documents into a taxonomy
  – Finds the probable author of a document
  – …
● Is it a different task?

How do mathematicians work?

Setting 1:
– an empty kettle
– fire
– a source of cold water
– a tea bag
How to prepare tea:
– put water into the kettle
– put the kettle on the fire
– when the water boils, put the tea bag into the kettle

Setting 2:
– a kettle with boiling water
– fire
– a source of cold water
– a tea bag
How to prepare tea:
– empty the kettle
– follow the previous case

Text mining: Is it different?
● Maybe it is, but we do not care.
● We convert free text to structured data…
● … and "follow the previous case".

Google News: How does it work?
● Search the web for news.
  – Parse the content of given web sites.
● Convert the news (documents) to structured data.
  – Documents become vectors.
● Cluster analysis.
  – Similar documents are grouped together.
● Importance analysis.
  – Important documents are at the top.

From documents to vectors
● We match documents with terms
  – can be given (an ontology)
  – can be derived from the documents
● Documents are described as vectors of weights
  – d = (1, 0, 0, 1, 1)
  – t1, t4, t5 are in d
  – t2, t3 are not in d

TFIDF: Term Frequency / Inverse Document Frequency
TF(t, d) = how many times t occurs in d
DF(t) = in how many documents t occurs at least once
IDF(t) = the inverse of document frequency, commonly log(N / DF(t)) for N documents
A term is important if its
– TF is high
– IDF is high
Weight(d, t) = TF(t, d) · IDF(t)
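
A self-contained sketch of this weighting, assuming the common convention IDF(t) = log(N / DF(t)); the three "documents" are made up for illustration:

    from collections import Counter
    from math import log

    docs = [
        "data mining extracts knowledge from data".split(),
        "text mining works with unstructured text".split(),
        "computers can only give you answers".split(),
    ]

    def tfidf(docs):
        n = len(docs)
        # DF(t): in how many documents t occurs at least once.
        df = Counter(t for doc in docs for t in set(doc))
        out = []
        for doc in docs:
            tf = Counter(doc)  # TF(t, d): how many times t occurs in d
            out.append({t: tf[t] * log(n / df[t]) for t in tf})
        return out

    for weights in tfidf(docs):
        print({t: round(w, 2) for t, w in weights.items()})
    # "data" scores highest in the first document: its TF is 2 and it occurs nowhere else,
    # while "mining" is damped because it appears in two of the three documents.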

Cluster analysis
● Vectors
  – cosine similarity
● On-line analysis
  – A new document arrives.
  – Try k-nearest neighbors.
  – If the neighbors are too far, leave it alone.
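
A sketch of that on-line step: cosine similarity between weight vectors, attaching a new document to the closest cluster centroid only if it is similar enough (the threshold and the vectors are made up for illustration):

    from math import sqrt

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norms = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
        return dot / norms if norms else 0.0

    def assign(vec, centroids, threshold=0.3):
        # Index of the most similar centroid, or None if every cluster is too far.
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        return best if best is not None and sims[best] >= threshold else None

    centroids = [(1.0, 0.2, 0.0), (0.0, 0.9, 0.8)]
    print(assign((0.9, 0.1, 0.0), centroids))                 # 0
    print(assign((0.0, 0.0, 1.0), centroids))                 # 1
    print(assign((0.0, 0.0, 1.0), centroids, threshold=0.9))  # None: start a new cluster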

Text mining: Conclusion
● Text mining is very young.
  – Research is heavily ongoing.
● We convert text to data.
  – Documents to vectors
  – Term weights: TFIDF
● We can use data mining methods.
  – Classification
  – Cluster analysis
  – …

References
● Ian H. Witten, Eibe Frank: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann.
● Michael W. Berry (ed.): Survey of Text Mining: Clustering, Classification, and Retrieval. Springer.

Questions?
Petr Olmer

"Computers are useless. They can only give you answers."