COMP1942 Exploring and Visualizing Data Overview

Presentation on theme: "COMP1942 Exploring and Visualizing Data Overview"— Presentation transcript:

1 COMP1942 Exploring and Visualizing Data Overview
Prepared by Raymond Wong Presented by Raymond Wong COMP1942

2 Course Details
Instructor: Dr. Raymond Wong
TA: Kai Ho CHAN, Dandan LIN, Junqiu WEI

3 Course Details
Webpage

4 Course Details
Lecture
Time: Monday (1:30pm - 2:50pm) and Friday (9:00am - 10:20am)
Venue: G010 (CYT Building)
Tutorial
Time: Monday (12:30pm - 1:20pm)
Venue: Room 5583 (LT 29-30) (Academic Building) or CSE Lab 3 (Rm 4213 (Academic Building))
Time: Tuesday (12:30pm - 1:20pm)
Venue: Rm 2302 (LT 17-18) (Academic Building) or CSE Lab 3 (Rm 4213 (Academic Building))
Tutorial will be announced via .

5 Course Details
Textbook
Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. Galit Shmueli, Nitin R. Patel and Peter C. Bruce. Wiley, 2010 (2nd edition).

6 Course Details
Reference books/materials:
Data Mining: Concepts and Techniques. Jiawei Han, Micheline Kamber and Jian Pei. Morgan Kaufmann Publishers (3rd edition).
Introduction to Data Mining. Pang-Ning Tan, Michael Steinbach and Vipin Kumar. Boston: Pearson Addison Wesley (2006).

7 Common Core Requirement
My ability to use quantitative methods to define, analyze and solve problems in daily life has been enhanced.
I am more able to process quantitative data and to use the data to reach a conclusion in a logical way.
The course has aroused my interest in learning more about mathematical models or quantitative methods.

8 Course Details
Grading Scheme:
Assignment 10%
Project 20%
In-class Participation 10%
Mid-Term Exam 20%
Final Exam 40%

9 Assignment
2 assignments
Assignment 1: Content before the mid-term exam
Assignment 2: Content after the mid-term exam
NOTE: No late submissions are allowed.

10 Assignment
If a student answers a selected question in class correctly, I will give him/her a coupon for each correct answer.
A coupon can be used to waive one question in an assignment, which means that s/he gets full marks for that question without answering it.

11 Assignment
Guideline
For each assignment, each student can waive at most one question.
S/he can waive any question s/he wants and obtain full marks for it (no matter whether s/he answers it or not).
S/he may still answer the waived question. We will mark it, but will give full marks for it regardless.
When submitting the assignment, the student should:
staple the coupon to the submitted assignment
write down on the coupon the number of the question s/he wants to waive

12 Project
Phase 1 (Excel file)
Phase 2 (Design Report)
Phase 3 (Final Report and Output files)

13 Project
You are required to form a group.
Each group contains 1 or 2 members. A 3-member group is NOT allowed.
Please fill in the following information for each member in the link:
student ID
student name
Each group needs to submit the grouping information only ONCE.
The group forming deadline is 15 Feb (Wed) 1pm.

14 Project
Data Mining Tool: XLMiner (in MS Excel)
Installed in CSE Lab 3 (Rm 4213)
All non-CSE and non-CPEG students need to apply for a CSE account.
You can find the details on our course webpage.

15 Project
In Phase 3 (the last phase), you are required to hand in some output files.
We will check the output files.
A coupon can be used to obtain full marks for all output files.
Each group can use at most one coupon.
Please staple your coupon to your final report.

16 In-class Participation
In each lesson, you are required to bring one of the following with you:
your smart phone installed with iPRS (Internet enabled Personal Response System), or
your PRS device

17 In-class Participation
If you have a smart phone (Android/iOS), please install an app called “HKUST iLearn” on it.
If you do not have a smart phone, you have to borrow a PRS device. Please visit the ITSC Service Desk at Rm 2021 (Lift 2) to borrow one.

18 In-class Participation
In each lesson, you may be asked some multiple-choice questions (e.g., 1-3 questions).
You have to use your iPRS to answer the questions.

19 In-class Participation
You can obtain 1 unit for in-class participation when you answer a question in class with your iPRS (no matter whether you answer it correctly or not).
Those questions may appear in the mid-term exam and the final exam.

20 In-class Participation
In some cases:
Some students may be absent from class for some reason.
The iPRS system could not record your answer (e.g., your smart device and the iPRS system crash).
You are required to obtain 20 units in order to obtain the full score (10%) for in-class participation.
We will give at least 25 questions in the course.

21 Midterm and Final Exam
You are allowed to bring a calculator with you.
Please remember to prepare a calculator for the exam.

22 Midterm Exam
In-class Midterm
Date: 17 March, 2017 (Fri)
Time: 9:00-10:20
Venue:
G010 (CYT Building)
Rm 5619 (LT 31/32) (Academic Building)

23 Major Topics
In this course, you are expected to learn something related to “Exploring and Visualizing Data”.
Not only this! In this course, you are expected to learn how to solve problems and how to analyze problems.
This is very important to your future.

24 Major Topics
1. Association
2. Clustering
3. Classification
4. Data Warehouse
5. Dimension Reduction
6. Web Databases

25 1. Association
Purchase table with columns Customer, Apple, Orange, Milk and rows Raymond, Ada, Grace, …
We are interested in the items/itemsets with frequency >= 2.
Items/Itemsets and Frequency: Apple 2; Orange 3; Milk 1; {Apple, Orange}; {Orange, Milk}
Frequent Pattern (or Frequent Item): Apple, Orange
Frequent Pattern (or Frequent Itemset): {Apple, Orange}

26 1. Association
Purchase table with columns Customer, Apple, Orange, Milk and rows Raymond, Ada, Grace, …
We are interested in the items/itemsets with frequency >= 2.
Items/Itemsets and Frequency: Apple 2; Orange 3; Milk 1; {Apple, Orange}; {Orange, Milk}
Association Rule:
1. Apple → Orange (customers who buy apple will probably buy orange.) 2/2 = 100%
2. Orange → Apple (customers who buy orange will probably buy apple.) 2/3 = 67%
Problem: to find all frequent patterns and association rules
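
The counting behind this slide can be sketched in a few lines of Python. This is only an illustration, not the course's method (the course tool is XLMiner). The per-customer baskets are hypothetical: the transcript lost the table cells, so they are chosen only to reproduce the slide's counts (Apple 2, Orange 3, Milk 1) under the assumption of exactly three customers. The function names are likewise made up for this sketch.

```python
from itertools import combinations

# Hypothetical baskets (assumption): chosen only so the item counts match
# the slide (Apple 2, Orange 3, Milk 1) with exactly three customers.
baskets = {
    "Raymond": {"Apple", "Orange"},
    "Ada": {"Apple", "Orange"},
    "Grace": {"Orange", "Milk"},
}


def frequency(itemset):
    """Number of customers whose basket contains every item in itemset."""
    return sum(1 for items in baskets.values() if itemset <= items)


def frequent_itemsets(min_frequency):
    """Brute-force search: count every non-empty combination of items."""
    all_items = sorted(set().union(*baskets.values()))
    result = {}
    for size in range(1, len(all_items) + 1):
        for combo in combinations(all_items, size):
            count = frequency(set(combo))
            if count >= min_frequency:
                result[frozenset(combo)] = count
    return result


def rule_strength(lhs, rhs):
    """Fraction of customers buying lhs who also buy rhs."""
    return frequency(lhs | rhs) / frequency(lhs)


frequent = frequent_itemsets(2)
for itemset, count in frequent.items():
    print(sorted(itemset), count)
print("Apple -> Orange:", rule_strength({"Apple"}, {"Orange"}))
print("Orange -> Apple:", rule_strength({"Orange"}, {"Apple"}))
```

Brute-force enumeration is fine for three items but grows exponentially with the number of items, which is why finding all frequent patterns is posed as a problem in its own right.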

28 2. Clustering
          Computer  History
Raymond   100       40
Louis     90        45
Wyman     20        95
Cluster 1 (e.g. High Score in Computer and Low Score in History): Raymond, Louis
Cluster 2 (e.g. High Score in History and Low Score in Computer): Wyman
Problem: to find all clusters
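
As a rough sketch of how such clusters could be found automatically, the snippet below runs a small k-means loop on the three score pairs from the slide. The slide does not name a clustering algorithm, so k-means, the choice of two clusters, and the starting centroids are all assumptions made for illustration.

```python
import math

# Scores from the slide: (Computer, History).
scores = {"Raymond": (100, 40), "Louis": (90, 45), "Wyman": (20, 95)}


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its members, and repeat."""
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for name, point in points.items():
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            groups[nearest].append(name)
        updated = []
        for i, members in enumerate(groups):
            if members:
                updated.append(tuple(
                    sum(points[m][d] for m in members) / len(members)
                    for d in range(2)
                ))
            else:
                updated.append(centroids[i])  # keep an empty cluster in place
        centroids = updated
    return groups


# Assumed starting centroids: one near each corner of the score range.
clusters = k_means(scores, centroids=[(100, 40), (20, 95)])
print(clusters)
```

With real data the result depends on the starting centroids and on the number of clusters requested, so both would need to be chosen with more care than here.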

30 3. Classification
Suppose there is a person.
Race   Income  Child  Insurance
white  high    no     ?
Decision tree (figure): the root splits on Child (child=yes, child=no), with a further split on Income (Income=high, Income=low). Leaves shown: "100% Yes, 0% No" and "0% Yes, 100% No".
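
To make the idea of classifying with a decision tree concrete, here is a minimal sketch that walks a tree for the person above. The tree layout is an assumption: the transcript keeps only the node labels, so placing the Income split under child=no, pairing Income=high with "100% Yes", and leaving the child=yes leaf unknown are guesses made for illustration.

```python
# Assumed tree layout (not confirmed by the transcript): Income is tested
# under child=no, and the child=yes leaf is unknown, so it is set to None.
tree = {
    "attribute": "Child",
    "branches": {
        "yes": None,
        "no": {
            "attribute": "Income",
            "branches": {
                "high": {"Yes": 1.0, "No": 0.0},
                "low": {"Yes": 0.0, "No": 1.0},
            },
        },
    },
}


def classify(node, person):
    """Follow the branch matching the person's value at each internal node
    until a leaf (a class distribution, or None if unknown) is reached."""
    while isinstance(node, dict) and "attribute" in node:
        node = node["branches"][person[node["attribute"]]]
    return node


person = {"Race": "white", "Income": "high", "Child": "no"}
leaf = classify(tree, person)
prediction = max(leaf, key=leaf.get)  # most likely Insurance value
print(leaf, prediction)
```

Note that Race is never consulted in this assumed tree: a decision tree only tests the attributes that appear on the path to a leaf.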

32 4. Warehouse
Without a data warehouse: Users send a Query to the Databases and need to wait for a long time (e.g., 1 day to 1 week).
With a Data Warehouse: Users query the Data Warehouse, which holds pre-computed results built from the Databases.
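
The benefit of pre-computed results can be illustrated with a toy example. Everything here is hypothetical (the records, the per-region total, and the function names); it only contrasts scanning the raw data at query time with looking up an answer that was computed ahead of time.

```python
from collections import defaultdict

# Hypothetical raw records (assumption): (region, product, amount).
sales = [
    ("North", "Apple", 120),
    ("North", "Milk", 80),
    ("South", "Apple", 200),
    ("South", "Orange", 50),
]


def slow_query(region):
    """Scan every raw record at query time (the 'long wait' path)."""
    return sum(amount for r, _, amount in sales if r == region)


def build_warehouse(records):
    """Pre-compute per-region totals once, before any query arrives."""
    totals = defaultdict(int)
    for region, _, amount in records:
        totals[region] += amount
    return dict(totals)


warehouse = build_warehouse(sales)


def fast_query(region):
    """Answer from the pre-computed totals with a single lookup."""
    return warehouse.get(region, 0)


print(slow_query("North"), fast_query("North"))
```

The trade-off is that the pre-computed totals must be rebuilt or updated whenever the underlying records change.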

34 5. Dimension Reduction
Suppose we have the following data set (figure)

35 According to the data, we find the following vectors (marked in red)
(figure: vectors e1 and e2)

36 Consider that the data points are projected on e1

37 Suppose all data points are projected on vector e1
This corresponds to the information loss
This corresponds to another information loss

38 After all data points are projected on vector e1
Thus, the total information loss is small.

39 We can use only 1 dimension to represent all data points (i.e., vector e1)
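
The projection described on these slides can be sketched as follows. The data points are hypothetical, since the original data set survives only as a figure. Treating e1 as the direction of largest variance and measuring information loss as the squared distance from each point to e1 are assumptions for this sketch. For two dimensions the direction has a closed form, so no linear algebra library is needed.

```python
import math

# Hypothetical 2-D data set (assumption): points lying close to a line.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]


def principal_direction(data):
    """Return the data mean and a unit vector e1 along the direction of
    largest variance of the centred data."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in data) / n
    var_y = sum((y - mean_y) ** 2 for _, y in data) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in data) / n
    # Closed-form angle of the top eigenvector of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    return (mean_x, mean_y), (math.cos(angle), math.sin(angle))


def project(data):
    """Project each point on e1; return the 1-D coordinates, the total
    squared distance to e1 (the loss), and the total squared spread."""
    (mean_x, mean_y), (ex, ey) = principal_direction(data)
    coords, loss, total = [], 0.0, 0.0
    for x, y in data:
        dx, dy = x - mean_x, y - mean_y
        t = dx * ex + dy * ey              # coordinate along e1
        coords.append(t)
        total += dx * dx + dy * dy
        loss += dx * dx + dy * dy - t * t  # squared distance from e1
    return coords, loss, total


coords, loss, total = project(points)
print(f"share of spread lost by keeping only e1: {loss / total:.2%}")
```

Each point is now described by a single number (its coordinate along e1), and the printed share indicates how much of the original spread that one number fails to capture.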

41 6. Web Databases
Raymond Wong

42 How to rank the webpages?

