
1 Berendt: Knowledge and the Web, 2014, http://www.cs.kuleuven.be/~berendt/teaching/
Knowledge and the Web – Inferring new knowledge from data(bases): Knowledge Discovery in Databases I
Bettina Berendt, KU Leuven, Department of Computer Science
http://people.cs.kuleuven.be/~bettina.berendt/teaching
Last update: 15 November 2014

2 Agenda
- Motivation I: Application examples
- Motivation II: Types of reasoning
- The process of knowledge discovery (KDD)
- Input: Concepts and instances
- Input: Attributes and levels of measurement
- Some data preparation for WEKA
- Output: Patterns; e.g. decision tree
- Algorithm: e.g. ID3 for decision-tree learning

3 Fictitious example: What determines whether a photo on Flickr is good? Can we predict this from the data?

Weather   Temp  Angle   FamousFlickrUser  GoodPhoto
Rainy     Mild  High    True              No
Overcast  Hot   Normal  False             Yes
Overcast  Mild  High    True              Yes
Sunny     Mild  Normal  True              Yes
Rainy     Mild  Normal  False             Yes
Sunny     Cool  Normal  False             Yes
Sunny     Mild  High    False             No
Overcast  Cool  Normal  True              Yes
Rainy     Cool  Normal  True              No
Rainy     Cool  Normal  False             Yes
Rainy     Mild  High    False             Yes
Overcast  Hot   High    False             Yes
Sunny     Hot   High    True              No
Sunny     Hot   High    False             No

4 What's spam and what isn't?

5 What makes people happy?

6 What "circles" of friends do you have?

7 What should we recommend to a customer/user?

8 What topics exist in a collection of texts, and how do they evolve? News texts, scientific publications, ...

9 Agenda (as on slide 2).

10 Styles of reasoning: "All swans are white"
- Deductive: towards the consequences.
  All swans are white. Tessa is a swan. => Tessa is white.
- Inductive: towards a generalisation of observations.
  Joe and Lisa and Tex and Wili and ... (all observed swans) are swans. Joe and Lisa and Tex and Wili and ... (all observed swans) are white. => All swans are white.
- Abductive: towards the (most likely) explanation of an observation.
  Tessa is white. All swans are white. => Tessa is a swan.

11 What about truth?
- Deductive: given the truth of the assumptions, a valid deduction guarantees the truth of the conclusion.
- Inductive: the premises of an argument (are believed to) support the conclusion but do not ensure it; induction has been attacked several times by logicians and philosophers.
- Abductive: formally equivalent to the logical fallacy of affirming the consequent.

12 What about new knowledge? C.S. Peirce:
- introduced "abduction" to modern logic
- (after 1900) used "abduction" to mean: creating new rules to explain new observations (this meaning is actually closest to induction)
- => essential for scientific discovery

13 Agenda (as on slide 2).

14 "Data mining" and "knowledge discovery"
- (informal definition): data mining is about discovering knowledge in (huge amounts of) data
- Therefore, it is clearer to speak about "knowledge discovery in data(bases)" (KDD)
- Second reason for preferring the term "KDD": "data mining" is not uniquely defined; some people use it to denote only certain types of knowledge discovery (e.g., finding association rules, but not classifier learning)

15 "Data mining" is generally inductive (informal definition: discovering knowledge in huge amounts of data): looking at all the empirically observed swans ... finding they are white ... and concluding that all swans are white.

16 The KDD process: the output
"The non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data" – Fayyad, Piatetsky-Shapiro, Smyth (1996)
- non-trivial process: multiple steps
- valid: justified patterns/models
- novel: previously unknown
- potentially useful: can be used
- understandable: by human and machine

17 The process part of knowledge discovery: CRISP-DM (CRoss Industry Standard Process for Data Mining), a data mining process model that describes commonly used approaches that expert data miners use to tackle problems.

18 Knowledge discovery, machine learning, data mining
- Knowledge discovery = the whole process
- Machine learning = the application of induction algorithms and other algorithms that can be said to "learn" = the "modelling" phase
- Data mining: sometimes = KD, sometimes = ML

19 Data mining: confluence of multiple disciplines – database technology, statistics, machine learning, pattern recognition, algorithms, visualization, and other disciplines.

20 Recall: Knowledge discovery and styles of reasoning
1. Business understanding through Evaluation: learn a model from the data (observed instances); generally involves induction (during Modelling).
2. Deployment: apply the model to new instances; corresponds to deduction (if one assumes that the model is true).

21 Phases talked about today: Business understanding through Evaluation – learn a model from the data (observed instances); generally involves induction (during Modelling).

22 Decision tree learning (1): Decision rules. What contact lenses to give to a patient? (Could be based on background knowledge, but can also be learned from the WEKA contact lens data.)

23 Decision tree learning (2): Classification / prediction. In which weather will someone play (tennis etc.)? (Learned from the WEKA weather data.)

24 Learned from data like this:

Outlook   Temp  Humidity  Windy  Play
Rainy     Mild  High      True   No
Overcast  Hot   Normal    False  Yes
Overcast  Mild  High      True   Yes
Sunny     Mild  Normal    True   Yes
Rainy     Mild  Normal    False  Yes
Sunny     Cool  Normal    False  Yes
Sunny     Mild  High      False  No
Overcast  Cool  Normal    True   Yes
Rainy     Cool  Normal    True   No
Rainy     Cool  Normal    False  Yes
Rainy     Mild  High      False  Yes
Overcast  Hot   High      False  Yes
Sunny     Hot   High      True   No
Sunny     Hot   High      False  No

25 Agenda (as on slide 2).

26 What's a concept? Styles of learning:
- Classification learning: predicting a discrete class
- Association learning: detecting associations between features
- Clustering: grouping similar instances into clusters
- Numeric prediction: predicting a numeric quantity
Concept: the thing to be learned. Concept description: the output of the learning scheme.

27 Classification learning
- Example problems: weather data, contact lenses, irises, labor negotiations
- Classification learning is supervised: the scheme is provided with the actual outcome
- The outcome is called the class of the example
- Measure success on fresh data for which class labels are known (test data)
- In practice success is often measured subjectively

28 Example weather data (the 14-instance table shown on slide 24).

29 Association learning
- Can be applied if no class is specified and any kind of structure is considered "interesting"
- Difference to classification learning: can predict any attribute's value, not just the class, and more than one attribute's value at a time
- Hence: far more association rules than classification rules
- Thus: constraints are necessary, e.g. minimum coverage and minimum accuracy
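Coverage and accuracy can be made concrete with a small sketch (plain Python; the five-row toy dataset and the helper name rule_stats are illustrative, not from the slides):

```python
# Coverage = number of instances matching the rule's antecedent;
# accuracy = fraction of those that also match the consequent.
data = [
    {"Outlook": "Sunny",    "Humidity": "High",   "Play": "No"},
    {"Outlook": "Sunny",    "Humidity": "High",   "Play": "No"},
    {"Outlook": "Overcast", "Humidity": "High",   "Play": "Yes"},
    {"Outlook": "Rainy",    "Humidity": "High",   "Play": "Yes"},
    {"Outlook": "Rainy",    "Humidity": "Normal", "Play": "Yes"},
]

def rule_stats(data, antecedent, consequent):
    covered = [r for r in data if all(r[a] == v for a, v in antecedent.items())]
    correct = [r for r in covered if all(r[a] == v for a, v in consequent.items())]
    accuracy = len(correct) / len(covered) if covered else 0.0
    return len(covered), accuracy

# The candidate rule "if Humidity = High then Play = No":
cov, acc = rule_stats(data, {"Humidity": "High"}, {"Play": "No"})
```

A mining algorithm would enumerate many such candidate rules and keep only those whose coverage and accuracy exceed the chosen minima.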

30 Clustering
- Finding groups of items that are similar
- Clustering is unsupervised: the class of an example is not known
- Success is often measured subjectively

     Sepal length  Sepal width  Petal length  Petal width  Type
1    5.1           3.5          1.4           0.2          Iris setosa
2    4.9           3.0          1.4           0.2          Iris setosa
...
51   7.0           3.2          4.7           1.4          Iris versicolor
52   6.4           3.2          4.5           1.5          Iris versicolor
...
101  6.3           3.3          6.0           2.5          Iris virginica
102  5.8           2.7          5.1           1.9          Iris virginica
...

31 Numeric prediction
- Variant of classification learning where the "class" is numeric (also called "regression")
- Learning is supervised: the scheme is provided with the target value
- Measure success on test data

Outlook   Temperature  Humidity  Windy  Play-time
Sunny     Hot          High      False  5
Sunny     Hot          High      True   0
Overcast  Hot          High      False  55
Rainy     Mild         Normal    False  40
...       ...          ...       ...    ...

32 What's in an example?
- Instance: specific type of example; the thing to be classified, associated, or clustered; an individual, independent example of the target concept; characterized by a predetermined set of attributes
- Input to the learning scheme: a set of instances (a dataset), represented as a single relation / flat file
- Rather restricted form of input: no relationships between objects
- Most common form in practical data mining

33 Instances in the weather data (the 14-instance table shown on slide 24).

34 Agenda (as on slide 2).

35 What's in an attribute?
- Each instance is described by a fixed predefined set of features, its "attributes"
- But: the number of attributes may vary in practice; possible solution: an "irrelevant value" flag
- Related problem: the existence of an attribute may depend on the value of another one
- Possible attribute types ("levels of measurement"): nominal, ordinal, interval, and ratio

36 Task: align example measures, scale of measurement, and allowed operations
Examples: Temperature (Celsius); Grades at school/university; Metres; Temperature ("warm", "cold", ...); Weather ("good", "bad"); Weather ("sunny", "windy", "cold crisp day", ...); Likert-scale values ("on a scale of 1-7, ..."); Duration of work tasks (in minutes); ECTS credits
Scale levels: nominal, ordinal, interval, ratio
Operations: =, ≠; +, -; *, /; %; mode; median; arithmetic mean; geometric mean

37 Nominal quantities
- Values are distinct symbols; the values themselves serve only as labels or names ("nominal" comes from the Latin word for "name")
- Example: attribute "outlook" from the weather data; values: "sunny", "overcast", and "rainy"
- No relation is implied among nominal values (no ordering or distance measure)
- Only equality tests can be performed

38 Ordinal quantities
- Impose an order on values, but: no distance between values is defined
- Example: attribute "temperature" in the weather data; values: "hot" > "mild" > "cool"
- Note: addition and subtraction don't make sense
- Example rule: temperature < hot => play = yes
- The distinction between nominal and ordinal is not always clear (e.g. attribute "outlook")

39 Interval quantities
- Interval quantities are not only ordered but measured in fixed and equal units
- Example 1: attribute "temperature" expressed in degrees Fahrenheit; Example 2: attribute "year"
- The difference of two values makes sense
- A sum or product doesn't make sense: the zero point is not defined!

40 Ratio quantities
- Ratio quantities are ones for which the measurement scheme defines a zero point
- Example: attribute "distance"; the distance between an object and itself is zero
- Ratio quantities are treated as real numbers: all mathematical operations are allowed
- But: is there an "inherently" defined zero point? The answer depends on scientific knowledge (e.g. Fahrenheit knew no lower limit to temperature)

41 Task: What issues does this data collection have? (Curriculum mapping: crosses for the Bachelor course "Databases".) BTW, I think it does make a lot of sense for instructors to reflect on what they cover and what they test, and such lists can be helpful for this exercise.


44 Agenda (as on slide 2).

45 Notes on preparing the input (CRISP-DM Data preparation stage)
- Denormalization is not the only issue
- Problem: different data sources (e.g. sales department, customer billing department, ...); differences in styles of record keeping, conventions, time periods, data aggregation, primary keys, errors
- Data must be assembled, integrated, cleaned up; a "data warehouse" provides a consistent point of access
- External data may be required ("overlay data")
- Critical: type and level of data aggregation

46 The ARFF format

%
% ARFF file for weather data with some numeric features
%
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true, false}
@attribute play? {yes, no}

@data
sunny, 85, 85, false, no
sunny, 80, 90, true, no
overcast, 83, 86, false, yes
...
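As a sketch of how such a file is structured, here is a tiny reader for just the ARFF subset shown above (nominal and numeric attributes only; no string, date, or sparse support, and the function name read_arff is ours, not WEKA's):

```python
def read_arff(lines):
    """Parse @attribute declarations, then read @data rows into dicts.
    Numeric attributes are converted to float; everything else stays a string."""
    attrs, rows, in_data = [], [], False
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        if line.lower().startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            attrs.append((name, spec))
        elif line.lower().startswith("@data"):
            in_data = True
        elif in_data:
            vals = [v.strip() for v in line.split(",")]
            rows.append({name: (float(v) if spec.strip().lower() == "numeric" else v)
                         for (name, spec), v in zip(attrs, vals)})
    return attrs, rows

sample = """@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true, false}
@attribute play? {yes, no}
@data
sunny, 85, 85, false, no
sunny, 80, 90, true, no
overcast, 83, 86, false, yes""".splitlines()

attrs, rows = read_arff(sample)
```

WEKA itself does this parsing for you; the point of the sketch is only that the header fully determines how each comma-separated value in the data section is interpreted.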

47 Additional attribute types
ARFF supports string attributes: similar to nominal attributes, but the list of values is not pre-specified. It also supports date attributes, using the ISO-8601 combined date and time format yyyy-MM-dd'T'HH:mm:ss.

@attribute description string
@attribute today date

48 Sparse data
- In some applications most attribute values in a dataset are zero, e.g. word counts in a text categorization problem
- ARFF supports sparse data
- This also works for nominal attributes (where the first value corresponds to "zero")

0, 26, 0, 0, 0, 0, 63, 0, 0, 0, "class A"
0, 0, 0, 42, 0, 0, 0, 0, 0, 0, "class B"

becomes

{1 26, 6 63, 10 "class A"}
{3 42, 10 "class B"}
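The dense-to-sparse conversion can be sketched in a few lines (0-based indices as in the example above; the helper name to_sparse is ours):

```python
def to_sparse(row):
    """Render a dense row in sparse-ARFF style: zeros are simply omitted,
    and each remaining value is written as an 'index value' pair."""
    parts = []
    for i, v in enumerate(row):
        if v == 0:
            continue  # zeros are not stored
        parts.append(f'{i} "{v}"' if isinstance(v, str) else f"{i} {v}")
    return "{" + ", ".join(parts) + "}"

line = to_sparse([0, 26, 0, 0, 0, 0, 63, 0, 0, 0, "class A"])
```

For the word-count use case the saving is large: a document touching 50 of 10,000 vocabulary words stores 50 pairs instead of 10,000 values.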

49 Attribute types
- The interpretation of attribute types in ARFF depends on the learning scheme
- Numeric attributes are interpreted as ordinal scales if less-than and greater-than are used, and as ratio scales if distance calculations are performed (normalization/standardization may be required)
- Instance-based schemes define a distance between nominal values (0 if the values are equal, 1 otherwise)
- Integers in some given data file: nominal, ordinal, or ratio scale?

50 Nominal vs. ordinal
Attribute "age" nominal:
  If age = young and astigmatic = no and tear production rate = normal then recommendation = soft
  If age = pre-presbyopic and astigmatic = no and tear production rate = normal then recommendation = soft
Attribute "age" ordinal (e.g. "young" < "pre-presbyopic" < "presbyopic"):
  If age ≤ pre-presbyopic and astigmatic = no and tear production rate = normal then recommendation = soft

51 Missing values
- Frequently indicated by out-of-range entries
- Types: unknown, unrecorded, irrelevant
- Reasons: malfunctioning equipment, changes in experimental design, collation of different datasets, measurement not possible
- A missing value may have significance in itself (e.g. a missing test in a medical examination); most schemes assume that is not the case, so "missing" may need to be coded as an additional value
- "?" in ARFF denotes a missing value

52 Inaccurate values
- Reason: the data has not been collected for the purpose of mining it
- Result: errors and omissions that don't affect the original purpose of the data (e.g. age of customer)
- Typographical errors in nominal attributes: values need to be checked for consistency
- Typographical and measurement errors in numeric attributes: outliers need to be identified
- Errors may be deliberate (e.g. wrong zip codes)
- Other problems: duplicates, stale data

53 Getting to know the data
- Simple visualization tools are very useful: histograms for nominal attributes (is the distribution consistent with background knowledge?), graphs for numeric attributes (any obvious outliers?)
- 2-D and 3-D plots show dependencies
- Need to consult domain experts
- Too much data to inspect? Take a sample!

54 Agenda (as on slide 2).

55 Constructing decision trees
Strategy: top down, in recursive divide-and-conquer fashion
- First: select an attribute for the root node and create a branch for each possible attribute value
- Then: split the instances into subsets, one for each branch extending from the node
- Finally: repeat recursively for each branch, using only the instances that reach the branch
- Stop if all instances have the same class
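The recursive strategy can be sketched as a minimal ID3-style learner (plain Python; the information-gain criterion is introduced on the following slides, and the dict-of-dicts tree representation is our own choice, not WEKA's):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, attr, target):
    """Information gain = entropy before the split minus weighted entropy after."""
    before = entropy([r[target] for r in rows])
    after = sum(len(s) / len(rows) * entropy(s)
                for v in {r[attr] for r in rows}
                for s in [[r[target] for r in rows if r[attr] == v]])
    return before - after

def id3(rows, attrs, target):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: (majority) class
    best = max(attrs, key=lambda a: gain(rows, a, target))
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in {r[best] for r in rows}}}

# The 14-instance weather data from slide 24:
raw = [("Rainy","Mild","High",True,"No"),   ("Overcast","Hot","Normal",False,"Yes"),
       ("Overcast","Mild","High",True,"Yes"),("Sunny","Mild","Normal",True,"Yes"),
       ("Rainy","Mild","Normal",False,"Yes"),("Sunny","Cool","Normal",False,"Yes"),
       ("Sunny","Mild","High",False,"No"),  ("Overcast","Cool","Normal",True,"Yes"),
       ("Rainy","Cool","Normal",True,"No"), ("Rainy","Cool","Normal",False,"Yes"),
       ("Rainy","Mild","High",False,"Yes"), ("Overcast","Hot","High",False,"Yes"),
       ("Sunny","Hot","High",True,"No"),    ("Sunny","Hot","High",False,"No")]
names = ["Outlook", "Temp", "Humidity", "Windy", "Play"]
data = [dict(zip(names, t)) for t in raw]
tree = id3(data, ["Outlook", "Temp", "Humidity", "Windy"], "Play")
```

On this data the sketch reproduces the tree from the slides: Outlook at the root, an all-Yes leaf for Overcast, and Humidity and Windy tested in the Sunny and Rainy subtrees respectively.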

56 Which attribute to select?

57 Which attribute to select?

58 Criterion for attribute selection
- Which is the best attribute? We want to get the smallest tree
- Heuristic: choose the attribute that produces the "purest" nodes
- Popular impurity criterion: information gain, which increases with the average purity of the subsets
- Strategy: choose the attribute that gives the greatest information gain

59 Computing information
- Measure information in bits
- Given a probability distribution, the info required to predict an event is the distribution's entropy
- Entropy gives the information required in bits (can involve fractions of bits!)
- Formula for computing the entropy: entropy(p1, ..., pn) = -p1 log2 p1 - p2 log2 p2 - ... - pn log2 pn
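A minimal sketch of that formula (log base 2 gives bits; zero probabilities contribute nothing, hence the `if p` guard):

```python
from math import log2

def entropy(probs):
    # entropy(p1, ..., pn) = -p1 log2 p1 - ... - pn log2 pn, in bits
    return -sum(p * log2(p) for p in probs if p)

fair_coin = entropy([0.5, 0.5])          # a fair coin needs exactly 1 bit
weather = entropy([9/14, 5/14])          # the weather data's 9-Yes/5-No class split
```

The second value is the 0.940 bits that appears as info([9,5]) on the next slides.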

60 Example: attribute Outlook

61 Computing information gain
- Information gain = information before splitting - information after splitting
- gain(Outlook) = info([9,5]) - info([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 bits
- Information gain for the attributes from the weather data:
  gain(Outlook) = 0.247 bits
  gain(Temperature) = 0.029 bits
  gain(Humidity) = 0.152 bits
  gain(Windy) = 0.048 bits
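The 0.247-bit figure can be reproduced from the class counts alone (a sketch; the count lists are exactly those on the slide):

```python
from math import log2

def info(counts):
    """Entropy of a class-count distribution, in bits (info([9,5]) on the slide)."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

before = info([9, 5])                 # before splitting: 9 Yes, 5 No
subsets = [[2, 3], [4, 0], [3, 2]]    # after splitting on Outlook: Sunny, Overcast, Rainy
n = sum(sum(s) for s in subsets)
after = sum(sum(s) / n * info(s) for s in subsets)  # size-weighted average entropy
gain_outlook = before - after         # ~0.247 bits
```

The all-Yes Overcast subset [4, 0] contributes zero entropy, which is why Outlook scores so much better than Temperature or Windy.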

62 Continuing to split (within the Outlook = Sunny branch):
gain(Temperature) = 0.571 bits
gain(Humidity) = 0.971 bits
gain(Windy) = 0.020 bits

63 Final decision tree
- Note: not all leaves need to be pure; sometimes identical instances have different classes
- Splitting stops when the data can't be split any further

64 Wishlist for a purity measure
Properties we require from a purity measure:
- When a node is pure, the measure should be zero
- When impurity is maximal (i.e. all classes equally likely), the measure should be maximal
- The measure should obey the multistage property (i.e. decisions can be made in several stages)
Entropy is the only function that satisfies all three properties!

65 Properties of the entropy
- The multistage property, e.g.: info([2,3,4]) = info([2,7]) + (7/9) × info([3,4])
- Simplification of computation: info([2,3,4]) = -(2/9) log(2/9) - (3/9) log(3/9) - (4/9) log(4/9) = [-2 log 2 - 3 log 3 - 4 log 4 + 9 log 9] / 9
- Note: instead of maximizing info gain we could just minimize information

66 Discussion / outlook: decision trees
- Top-down induction of decision trees: ID3, an algorithm developed by Ross Quinlan
- Various improvements, e.g. C4.5: deals with numeric attributes, missing values, noisy data; gain ratio instead of information gain (see Witten & Frank slides, ch. 4, pp. 40-45)
- Similar approach: CART ...

67 Weather data with mixed attributes
Some attributes have numeric values:

Outlook   Temperature  Humidity  Windy  Play
Sunny     85           85        False  No
Sunny     80           90        True   No
Overcast  83           86        False  Yes
Rainy     75           80        False  Yes
...       ...          ...       ...    ...

If outlook = sunny and humidity > 83 then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity < 85 then play = yes
If none of the above then play = yes

68 Dealing with numeric attributes
- Discretize numeric attributes: divide each attribute's range into intervals
- Sort the instances according to the attribute's values
- Place breakpoints where the class changes (majority class); this minimizes the total error
- Example: temperature from the weather data:
  64   65   68  69  70   71  72  72   75  75   80   81  83   85
  Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No
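The breakpoint placement can be sketched as follows (candidate cuts at midpoints where the class label changes; two adjacent equal values cannot be separated, and the minimum-count refinement from the next slide is omitted):

```python
# Value-sorted temperature sequence and its class labels, as on the slide.
values  = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
classes = ["Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes",
           "Yes", "Yes", "No", "Yes", "Yes", "No"]

# A cut is placed at the midpoint between neighbours whose class differs,
# but only where the two values themselves differ (equal values can't be split).
breakpoints = [(values[i - 1] + values[i]) / 2
               for i in range(1, len(values))
               if classes[i] != classes[i - 1] and values[i] != values[i - 1]]
```

Note that the two 72s (one No, one Yes) produce no cut: the tie is exactly the kind of case the majority-class treatment on the slide has to resolve.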

69 The problem of overfitting
- This procedure is very sensitive to noise: one instance with an incorrect class label will probably produce a separate interval
- Also: a time stamp attribute will have zero errors
- Simple solution: enforce a minimum number of instances in the majority class per interval
- Example (with min = 3):
  64   65   68  69  70   71  72  72   75  75   80   81  83   85
  Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No
  becomes
  64  65  68  69  70   71  72  72  75  75   80  81  83  85
  Yes No  Yes Yes Yes | No  No  Yes Yes Yes | No  Yes Yes No

70 Nominal and numeric attributes
- Nominal: the number of children is usually equal to the number of values, so an attribute won't get tested more than once; other possibility: division into two subsets
- Numeric: test whether the value is greater or less than a constant, so an attribute may get tested several times; other possibility: a three-way split (or multi-way split) – integer: less than, equal to, greater than; real: below, within, above

71 Missing values
- Does the absence of a value have some significance? Yes => "missing" is a separate value. No => "missing" must be treated in a special way
- Solution A: assign the instance to the most popular branch
- Solution B: split the instance into pieces; the pieces receive weight according to the fraction of training instances that go down each branch, and the classifications from the leaf nodes are combined using the weights that have percolated to them

72 Classification rules
- Popular alternative to decision trees
- Antecedent (pre-condition): a series of tests, just like the tests at the nodes of a decision tree; the tests are usually logically ANDed together (but may also be general logical expressions)
- Consequent (conclusion): the class, set of classes, or probability distribution assigned by the rule
- Individual rules are often logically ORed together: conflicts arise if different conclusions apply

73 An example
If outlook = sunny and humidity = high then play = no
If outlook = rainy and windy = true then play = no
If outlook = overcast then play = yes
If humidity = normal then play = yes
If none of the above then play = yes
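Applying such an ordered rule list is straightforward: the first matching rule fires and the last one acts as a default (a sketch; the function name classify is ours):

```python
def classify(inst):
    """Apply the five rules above in order; the first match decides."""
    if inst["outlook"] == "sunny" and inst["humidity"] == "high":
        return "no"
    if inst["outlook"] == "rainy" and inst["windy"] == "true":
        return "no"
    if inst["outlook"] == "overcast":
        return "yes"
    if inst["humidity"] == "normal":
        return "yes"
    return "yes"  # "if none of the above"

result = classify({"outlook": "sunny", "humidity": "high", "windy": "false"})
```

Ordering the rules this way sidesteps the conflict problem mentioned on the previous slide: an overcast, high-humidity day never reaches the first rule's conclusion because the rules are tried top to bottom.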

74 Trees for numeric prediction
- Regression: the process of computing an expression that predicts a numeric quantity
- Regression tree: a "decision tree" where each leaf predicts a numeric quantity; the predicted value is the average value of the training instances that reach the leaf
- Model tree: a "regression tree" with linear regression models at the leaf nodes; linear patches approximate the continuous function
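The regression-tree leaf behaviour is simple enough to sketch directly (the targets below are illustrative; a model-tree leaf would fit a linear model instead of taking a mean):

```python
def regression_leaf(targets):
    # A regression-tree leaf predicts the mean of the training targets that reach it.
    return sum(targets) / len(targets)

# e.g. two play-time instances routed to the same leaf:
prediction = regression_leaf([40, 55])
```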

75 An example

Outlook   Temperature  Humidity  Windy  Play-time
Sunny     Hot          High      False  5
Sunny     Hot          High      True   0
Overcast  Hot          High      False  55
Rainy     Mild         Normal    False  40
...       ...          ...       ...    ...

76 Next lecture: Relevant KDD concepts and methods for your projects.

77 References / background reading; acknowledgements
- The slides are based on Witten, I.H., & Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. 2nd ed. Morgan Kaufmann. http://www.cs.waikato.ac.nz/%7Eml/weka/book.html
- In particular, pp. 8-57 are based on the instructor slides for that book (chapters 1-4), available at http://books.elsevier.com/companions/9780120884070/ (PDF and ODP versions, chapter1 through chapter4)
- Scales (aka levels) of measurement are explained well at http://en.wikipedia.org/wiki/Level_of_measurement [15 Nov 2014]

