COMP5331 Knowledge Discovery in Databases: Overview
Prepared by Raymond Wong
Presented by Raymond Wong

1 COMP5331 Knowledge Discovery in Databases: Overview
Prepared by Raymond Wong
Presented by Raymond Wong
raywong@cse

2 COMP5331 Course Details
Reference books/materials:
- Papers
- Data Mining: Concepts and Techniques. Jiawei Han and Micheline Kamber. Morgan Kaufmann Publishers (3rd edition)
- Introduction to Data Mining. Pang-Ning Tan, Michael Steinbach, Vipin Kumar. Boston: Pearson Addison Wesley (2006)

3 COMP5331 Area
DB or AI
This course can count towards ONLY one of these areas and cannot be double-counted towards the required credits.

4 COMP5331 Course Details
Grading Scheme:
- Assignment 30%
- Project 30%
- Final Exam 40%

5 COMP5331 Assignment
If a student correctly answers a selected question in class, I will give him/her one coupon for each correct answer.
A coupon can be used to waive one question in an assignment, which means that s/he gets full marks for that question without answering it.

6 COMP5331 Assignment Guideline
- For each assignment, each student can waive at most one question.
- S/he can waive any question s/he wants and obtain full marks for it (no matter whether s/he answers it or not).
- S/he may still answer the waived question. We will mark it, but will give it full marks regardless.
- When submitting the assignment, please staple the coupon to the submitted assignment and write on the coupon the number of the question to be waived.

7 COMP5331 Project
Each project is completed by a group.
The number of students in a group depends on the class size.
The duration of each presentation depends on the class size.
It will be announced soon.

8 COMP5331 Project
Project Type (one of the following):
- Survey: your group only needs to read about 2~5 papers.
- Implementation-oriented Project: your group only needs to read about 1~2 papers.
- Research-oriented Project: you can read some papers and conduct research.

9 COMP5331 Project
Project Type (one of the following):
- Survey (Full Score = 80%)
  1. Proposal
  2. Presentation
  3. Final report
- Implementation-oriented Project (Full Score = 90%)
  1. Proposal
  2. Presentation
  3. Final report
  4. Coding
- Research-oriented Project (Full Score = 100%)
  1. Proposal
  2. Presentation
  3. Final report (containing your proposed methodology)
  4. Coding (if any)

10 COMP5331 Project
Project Topic:
- Some pre-selected topics/papers
- Your own choice
For fairness, please do not choose a topic that is closely related to your own research.

11 COMP5331 Exam
You are allowed to bring a calculator with you.
Please remember to prepare a calculator for the exam.

12 COMP5331 Major Topics
1. Association
2. Clustering
3. Classification
4. Data Warehouse
5. Data Mining over Data Streams
6. Web Databases
7. Multi-criteria Decision Making

13 COMP5331 1. Association

Customer   Apple   Orange   Milk
Raymond    Apple   Orange
Ada                Orange   Milk
Grace      Apple   Orange
…          …       …        …

Items/Itemsets     Frequency
Apple              2           Frequent Pattern (or Frequent Item)
Orange             3           Frequent Pattern (or Frequent Item)
Milk               1
{Apple, Orange}    2           Frequent Pattern (or Frequent Itemset)
{Orange, Milk}     1

We are interested in the items/itemsets with frequency >= 2.

14 COMP5331 1. Association

Customer   Apple   Orange   Milk
Raymond    Apple   Orange
Ada                Orange   Milk
Grace      Apple   Orange
…          …       …        …

Items/Itemsets     Frequency
Apple              2
Orange             3
Milk               1
{Apple, Orange}    2
{Orange, Milk}     1

We are interested in the items/itemsets with frequency >= 2.

Association Rule:
1. Apple → Orange (customers who buy apple will probably buy orange): 2/2 = 100%
2. Orange → Apple (customers who buy orange will probably buy apple): 2/3 = 67%

Problem: to find all frequent patterns and association rules
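The counts and percentages on this slide can be reproduced with a short script. This is only a minimal sketch: it enumerates every itemset by brute force, which works for three customers but is not how a real frequent-pattern miner would do it. The function names (`support_count`, `frequent_itemsets`, `rule_percentage`) are hypothetical, and the data is limited to the three customers listed in the table.

```python
from itertools import combinations

# The three customers listed on the slide (the "…" rows are not included).
transactions = {
    "Raymond": {"Apple", "Orange"},
    "Ada": {"Orange", "Milk"},
    "Grace": {"Apple", "Orange"},
}

def support_count(itemset, transactions):
    """Count the customers whose purchases contain every item in itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)

def frequent_itemsets(transactions, min_count):
    """Brute-force enumeration of all itemsets with frequency >= min_count."""
    all_items = sorted(set().union(*transactions.values()))
    result = {}
    for size in range(1, len(all_items) + 1):
        for combo in combinations(all_items, size):
            count = support_count(frozenset(combo), transactions)
            if count >= min_count:
                result[frozenset(combo)] = count
    return result

def rule_percentage(lhs, rhs, transactions):
    """Fraction of customers buying lhs who also buy rhs (the slide's 2/2 and 2/3)."""
    return support_count(lhs | rhs, transactions) / support_count(lhs, transactions)

frequent = frequent_itemsets(transactions, min_count=2)
apple_to_orange = rule_percentage(frozenset({"Apple"}), frozenset({"Orange"}), transactions)
orange_to_apple = rule_percentage(frozenset({"Orange"}), frozenset({"Apple"}), transactions)
```

With the threshold of 2 used on the slide, `frequent` holds Apple (2), Orange (3) and {Apple, Orange} (2), and the two rules evaluate to 2/2 and 2/3.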

15 COMP5331 Major Topics
1. Association
2. Clustering
3. Classification
4. Data Warehouse
5. Data Mining over Data Streams
6. Web Databases
7. Multi-criteria Decision Making

16 COMP5331 2. Clustering

           Computer   History
Raymond    100        40
Louis      90         45
Wyman      20         95
…          …          …

[Scatter plot; axes: Computer, History]
Cluster 1 (e.g. High Score in Computer and Low Score in History)
Cluster 2 (e.g. High Score in History and Low Score in Computer)

Problem: to find all clusters
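As an illustration of what "finding clusters" can mean for this table, here is a minimal sketch using k-means. The slide does not name any clustering algorithm, so k-means is a technique swapped in purely for illustration, and the choice of two clusters and of the starting centroids (the first and last student) are assumptions made for this example.

```python
import math

# Scores from the slide: (Computer, History).
scores = {"Raymond": (100, 40), "Louis": (90, 45), "Wyman": (20, 95)}

def kmeans(points, centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

names = list(scores)
points = [scores[n] for n in names]
# Assumed initialization: the first and the last student as starting centroids.
labels, centroids = kmeans(points, [points[0], points[-1]])
clusters = dict(zip(names, labels))
```

Under these assumptions Raymond and Louis end up in one cluster (high Computer, low History) and Wyman in the other, matching the two clusters sketched on the slide.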

17 COMP5331 Major Topics
1. Association
2. Clustering
3. Classification
4. Data Warehouse
5. Data Mining over Data Streams
6. Web Databases
7. Multi-criteria Decision Making

18 COMP5331 3. Classification

Decision tree:
root
  child=yes: 100% Yes, 0% No
  child=no:
    Income=high: 100% Yes, 0% No
    Income=low: 0% Yes, 100% No

Suppose there is a person.

Race    Income   Child   Insurance
white   high     no      ?
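Classifying the person on this slide amounts to walking the tree from the root to a leaf. The sketch below assumes the tree reads as: child=yes leads to a "Yes" leaf, while child=no splits on Income, with high leading to "Yes" and low to "No". That reading is inferred from the order of the labels in the transcript, and the function name `predict_insurance` is hypothetical.

```python
def predict_insurance(person):
    """Walk the assumed decision tree and return the predicted Insurance label."""
    if person["child"] == "yes":
        return "yes"   # leaf: 100% Yes, 0% No
    if person["income"] == "high":
        return "yes"   # leaf under child=no: 100% Yes, 0% No
    return "no"        # leaf under child=no, Income=low: 0% Yes, 100% No

# The person from the slide; Race is not used by this tree.
person = {"race": "white", "income": "high", "child": "no"}
prediction = predict_insurance(person)
```

Under that reading of the tree, the person (child=no, Income=high) reaches the "100% Yes" leaf.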

19 COMP5331 Major Topics
1. Association
2. Clustering
3. Classification
4. Data Warehouse
5. Data Mining over Data Streams
6. Web Databases
7. Multi-criteria Decision Making

20 COMP5331 4. Warehouse
[Diagram 1] Users send a Query to the Databases: need to wait for a long time (e.g., 1 day to 1 week)
[Diagram 2] Users send a Query to a Data Warehouse built over the Databases: pre-computed results

21 COMP5331 Major Topics
1. Association
2. Clustering
3. Classification
4. Data Warehouse
5. Data Mining over Data Streams
6. Web Databases
7. Multi-criteria Decision Making

22 COMP5331 5. Data Mining over Static Data
[Diagram] Static Data → (1. Association, 2. Clustering, 3. Classification) → Output (Data Mining Results)

23 COMP5331 5. Data Mining over Data Streams
[Diagram] … Unbounded Data → (1. Association, 2. Clustering, 3. Classification) → Output (Data Mining Results)
Real-time Processing

24 COMP5331 Major Topics
1. Association
2. Clustering
3. Classification
4. Data Warehouse
5. Data Mining over Data Streams
6. Web Databases
7. Multi-criteria Decision Making

25 COMP5331 6. Web Databases
Raymond Wong

26 COMP5331 How to rank the webpages?

27 COMP5331 Major Topics
1. Association
2. Clustering
3. Classification
4. Data Warehouse
5. Data Mining over Data Streams
6. Web Databases
7. Multi-criteria Decision Making

28 COMP5331 7. Multi-criteria Decision Making

3 hotels:
Hotel   Price   Distance to beach (km)
a       1000    4
b       2400    5
c       3000    1

Suppose we want to look for a hotel which is close to a beach.
We have two attributes. Which hotel should we select?

Suppose we compare hotel a and hotel b.
We know that hotel a is "better" than hotel b because
1. Price of hotel a is smaller
2. Distance of hotel a is smaller

29 COMP5331 7. Multi-criteria Decision Making

3 hotels:
Hotel   Price   Distance to beach (km)
a       1000    4
b       2400    5
c       3000    1

Suppose we want to look for a hotel which is close to a beach.
We have two attributes. Which hotel should we select?

Suppose we compare hotel a and hotel c.
We cannot determine that hotel a is "better" than hotel c (wrt the two attributes).
We cannot determine that hotel c is "better" than hotel a (wrt the two attributes).
This is because
1. Price of hotel a is smaller
2. Distance of hotel c is smaller
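The pairwise comparisons on these two slides can be written as a small "better than" test. The sketch below is an assumption-laden illustration: it treats smaller values as better for both attributes, and it uses the common relaxed definition (no worse in every attribute and strictly smaller in at least one), which agrees with the slides on these three hotels. The names `dominates` and `non_dominated` are hypothetical.

```python
# Hotels from the slide: (Price, Distance to beach in km).
hotels = {"a": (1000, 4), "b": (2400, 5), "c": (3000, 1)}

def dominates(x, y):
    """True if x is "better" than y: no larger in any attribute, smaller in at least one."""
    no_worse = all(xi <= yi for xi, yi in zip(x, y))
    strictly_better = any(xi < yi for xi, yi in zip(x, y))
    return no_worse and strictly_better

def non_dominated(items):
    """Names of the items that no other item is "better" than."""
    return sorted(
        name
        for name, values in items.items()
        if not any(
            dominates(other_values, values)
            for other, other_values in items.items()
            if other != name
        )
    )

a_beats_b = dominates(hotels["a"], hotels["b"])   # a is cheaper and closer
a_beats_c = dominates(hotels["a"], hotels["c"])   # a is cheaper, but c is closer
c_beats_a = dominates(hotels["c"], hotels["a"])
candidates = non_dominated(hotels)
```

Hotel a is "better" than hotel b, while a and c cannot be ranked against each other, so under this definition a and c are the two hotels left as candidates.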

