1
COMP1942 Exploring and Visualizing Data Overview
Prepared by Raymond Wong
Presented by Raymond Wong
2
Course Details
Instructor: Dr. Raymond Wong
TA: Kai Ho CHAN, Dandan LIN, Junqiu WEI
3
Course Details
Webpage
4
Course Details
Lecture
  Time: Monday (1:30pm - 2:50pm) and Friday (9:00am - 10:20am)
  Venue: G010 (CYT Building)
Tutorial
  Time: Monday (12:30pm - 1:20pm)
  Venue: Room 5583 (LT 29-30) (Academic Building) or CSE Lab 3 (Rm 4213, Academic Building)
  Time: Tuesday (12:30pm - 1:20pm)
  Venue: Rm 2302 (LT 17-18) (Academic Building) or CSE Lab 3 (Rm 4213, Academic Building)
Tutorial arrangements will be announced.
5
Course Details
Textbook
Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. Galit Shmueli, Nitin R. Patel and Peter C. Bruce. Wiley, 2010 (2nd edition)
6
Course Details
Reference books/materials:
Data Mining: Concepts and Techniques. Jiawei Han, Micheline Kamber and Jian Pei. Morgan Kaufmann Publishers (3rd edition)
Introduction to Data Mining. Pang-Ning Tan, Michael Steinbach and Vipin Kumar. Boston: Pearson Addison Wesley (2006)
7
Common Core Requirement
My ability to use quantitative methods to define, analyze and solve problems in daily life has been enhanced.
I am more able to process quantitative data and to use the data to reach a conclusion in a logical way.
The course has aroused my interest in learning more about mathematical models or quantitative methods.
8
Course Details
Grading Scheme:
  Assignment 10%
  Project 20%
  In-class Participation 10%
  Mid-Term Exam 20%
  Final Exam 40%
9
Assignment
2 assignments
  Assignment 1: Content before the mid-term exam
  Assignment 2: Content after the mid-term exam
NOTE: No late submissions are allowed.
10
Assignment
If a student answers a selected question in class correctly, I will give him/her one coupon for each correct answer.
A coupon can be used to waive one question in an assignment: s/he gets full marks for that question without answering it.
11
Assignment Guideline
For each assignment, each student can waive at most one question.
S/he can waive any question s/he wants and obtain full marks for it (no matter whether s/he answers it or not).
S/he may still answer the waived question. We will mark it, but it will receive full marks regardless.
When submitting the assignment, please
  staple the coupon to the submitted assignment
  write down on the coupon the number of the question to be waived
12
Project
  Phase 1 (Excel file)
  Phase 2 (Design Report)
  Phase 3 (Final Report and Output files)
13
Project
You are required to form a group.
Each group contains 1 or 2 members. A 3-member group is NOT allowed.
Please fill in the following information for each member via the link:
  student ID
  student name
Each group needs to submit the grouping information only ONCE.
The group-forming deadline is 15 Feb (Wed), 1pm.
14
Project
Data Mining Tool: XLMiner (in MS Excel)
Installed in CSE Lab 3 (Rm 4213)
All non-CSE and non-CPEG students need to apply for a CSE account.
You can find the details on our course webpage.
15
Project
In Phase 3 (the last phase), you are required to hand in some output files.
We will check the output files.
You can use at most one coupon to obtain full marks for all output files.
Each group can use at most one coupon.
Please staple your coupon to your final report.
16
In-class Participation
In each lesson, you are required to bring one of the following with you:
  your smart phone with iPRS (Internet-enabled Personal Response System) installed, or
  your PRS device
17
In-class Participation
If you have a smart phone (Android/iOS), please install the app called “HKUST iLearn” on it.
If you do not have a smart phone, you have to borrow a PRS device. Please visit the ITSC Service Desk at Rm 2021 (Lift 2) to borrow one.
18
In-class Participation
In each lesson, you may be asked some multiple-choice questions (e.g., 1-3 questions).
You have to use your iPRS to answer the questions.
19
In-class Participation
You obtain 1 unit of in-class participation each time you answer a question in class with your iPRS (no matter whether you answer it correctly or not).
Those questions may appear in the mid-term exam and the final exam.
20
In-class Participation
In some cases,
  some students may be absent from class for some reason
  the iPRS system may fail to record your answer (e.g., your smart device and the iPRS system crash)
You are required to obtain 20 units in order to obtain the full score (10%) for in-class participation.
We will give at least 25 questions in the course.
21
Midterm and Final Exam
You are allowed to bring a calculator with you.
Please remember to prepare a calculator for the exam.
22
Midterm Exam
In-class Midterm
Date: 17 March, 2017 (Fri)
Time: 9:00-10:20
Venue:
  G010 (CYT Building)
  Rm 5619 (LT 31/32) (Academic Building)
23
Major Topics
In this course, you are expected to learn about “Exploring and Visualizing Data”.
Not only this! You are also expected to learn how to analyze problems and how to solve them.
This is very important to your future.
24
Major Topics
  Association
  Clustering
  Classification
  Data Warehouse
  Dimension Reduction
  Web Databases
25
1. Association
Purchase table: one row per customer (Raymond, Ada, Grace, …), one column per item (Apple, Orange, Milk)
We are interested in the items/itemsets with frequency >= 2

Items/Itemsets     Frequency
Apple              2    Frequent Pattern (or Frequent Item)
Orange             3    Frequent Pattern (or Frequent Item)
Milk               1
{Apple, Orange}    2    Frequent Pattern (or Frequent Itemset)
{Orange, Milk}
26
1. Association
Purchase table: one row per customer (Raymond, Ada, Grace, …), one column per item (Apple, Orange, Milk)
We are interested in the items/itemsets with frequency >= 2

Items/Itemsets     Frequency
Apple              2
Orange             3
Milk               1
{Apple, Orange}    2
{Orange, Milk}

Association Rule:
1. Apple → Orange (customers who buy apple will probably buy orange): 2/2 = 100%
2. Orange → Apple (customers who buy orange will probably buy apple): 2/3 = 67%
Problem: to find all frequent patterns and association rules
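The counting behind this slide can be sketched in a few lines of Python. This is only an illustration, not the method the course will teach: the per-customer baskets below are hypothetical, chosen so that they reproduce the frequencies shown on the slide, and the helper names (`itemset_frequencies`, `frequent_itemsets`, `confidence`) are made up for this example. The ratio computed for a rule is what is commonly called its confidence.

```python
from itertools import combinations

# Hypothetical baskets (the slide's table cells are not available);
# chosen only so that the counts match the slide's frequency table.
baskets = {
    "Raymond": {"Apple", "Orange"},
    "Ada": {"Apple", "Orange"},
    "Grace": {"Orange", "Milk"},
}

def itemset_frequencies(baskets, max_size=2):
    """Count how many customers bought every itemset of up to max_size items."""
    counts = {}
    for items in baskets.values():
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(items), size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return counts

def frequent_itemsets(counts, min_frequency):
    """Keep only the itemsets whose frequency reaches the threshold."""
    return {s: c for s, c in counts.items() if c >= min_frequency}

def confidence(counts, antecedent, consequent):
    """Fraction of customers buying the antecedent who also buy the consequent."""
    both = counts.get(frozenset(antecedent) | frozenset(consequent), 0)
    return both / counts[frozenset(antecedent)]

counts = itemset_frequencies(baskets)
frequent = frequent_itemsets(counts, min_frequency=2)
apple_to_orange = confidence(counts, {"Apple"}, {"Orange"})
orange_to_apple = confidence(counts, {"Orange"}, {"Apple"})
```

With these assumed baskets, `frequent` holds Apple, Orange and {Apple, Orange}, and the two rule ratios come out as 2/2 and 2/3, matching the 100% and 67% on the slide.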
28
2. Clustering
           Computer   History
Raymond    100        40
Louis      90         45
Wyman      20         95
…
Cluster 1 (e.g. High Score in Computer and Low Score in History)
Cluster 2 (e.g. High Score in History and Low Score in Computer)
Problem: to find all clusters
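One way to find such clusters is sketched below. The slide does not name an algorithm, so the use of k-means here is an assumption, as are the function name `k_means` and the choice of starting centroids. The three score rows are the ones shown on the slide.

```python
import math

# (Computer, History) scores taken from the slide.
scores = {"Raymond": (100, 40), "Louis": (90, 45), "Wyman": (20, 95)}

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for name, point in points.items():
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(name)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                # Mean of the member points, one coordinate at a time.
                new_centroids.append(tuple(
                    sum(points[m][d] for m in members) / len(members)
                    for d in range(2)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return clusters, centroids

# Starting centroids are an arbitrary choice for this sketch.
clusters, centroids = k_means(scores, centroids=[(100, 40), (20, 95)])
```

Here `clusters[0]` groups the students with a high Computer score and a low History score, and `clusters[1]` the opposite profile.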
30
3. Classification
Suppose there is a person.
Race     Income   Child   Insurance
white    high     no      ?
Decision tree (nodes as labelled on the slide): root, child=yes, child=no, Income=high, Income=low
Leaves: 100% Yes / 0% No; 0% Yes / 100% No
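How a decision tree answers the "?" can be sketched as follows. The exact shape of the tree is not fully recoverable from the slide text, so the layout below is an assumption: the Income split is placed under the child=no branch, Income=high leads to the "100% Yes" leaf, Income=low to the "100% No" leaf, and the child=yes leaf is left unknown. The dictionary encoding and the `classify` helper are hypothetical.

```python
# Assumed tree layout (see the note above); leaves map Insurance labels to proportions.
tree = {
    "attribute": "child",
    "branches": {
        "yes": None,  # leaf distribution not recoverable from the slide text
        "no": {
            "attribute": "income",
            "branches": {
                "high": {"Yes": 1.0, "No": 0.0},
                "low": {"Yes": 0.0, "No": 1.0},
            },
        },
    },
}

def classify(node, person):
    """Follow the branch matching the person's attribute value until a leaf is reached."""
    while node is not None and "attribute" in node:
        node = node["branches"][person[node["attribute"]]]
    return node

# The person from the slide: Race=white, Income=high, Child=no, Insurance=?
person = {"race": "white", "income": "high", "child": "no"}
prediction = classify(tree, person)
```

Under this assumed layout, the person follows child=no and then Income=high, so the predicted Insurance value is Yes.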
32
4. Data Warehouse
Users query the databases directly: need to wait for a long time (e.g., 1 day to 1 week)
Users query a data warehouse built from the databases: the data warehouse holds pre-computed results
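The idea of pre-computed results can be illustrated with a toy example. Everything here is hypothetical (the records, the region codes and the function names); a real data warehouse is far more elaborate, but the trade-off is the same: pay the aggregation cost once, then answer queries by lookup.

```python
from collections import defaultdict

# Hypothetical raw records: (region, sales amount).
sales = [("HK", 120), ("HK", 80), ("SG", 50), ("SG", 70), ("JP", 200)]

def total_by_scan(records, region):
    """Answer a query by scanning every raw record (slow on large databases)."""
    return sum(amount for r, amount in records if r == region)

def build_warehouse(records):
    """Pre-compute the total per region once, ahead of any query."""
    totals = defaultdict(int)
    for region, amount in records:
        totals[region] += amount
    return dict(totals)

warehouse = build_warehouse(sales)
hk_total = warehouse["HK"]  # a lookup instead of a full scan
```

Both paths give the same answer. The warehouse path simply moves the expensive work to build time.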
34
5. Dimension Reduction
Suppose we have the following data set
[Figure: plot of the data points]
35
According to the data, we find the following vectors (marked in red)
[Figure: the data points with vectors e1 and e2 marked in red]
36
Consider that the data points are projected on e1
[Figure: the data points projected on e1]
37
Suppose all data points are projected on vector e1
[Figure: two data points annotated "This corresponds to the information loss" and "This corresponds to another information loss"; vectors e1 and e2 shown]
38
After all data points are projected on vector e1
Thus, the total information loss is small.
39
We can use only 1 dimension to represent all data points (i.e., vector e1)
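The projection idea can be sketched numerically. The data points below are hypothetical (the slide's data set is only available as a picture), and the way e1 and e2 are obtained, as the principal axes of the centred data, is an assumption about how the slide's vectors were found. The function names are made up for this sketch.

```python
import math

# Hypothetical 2-D data points lying roughly along one direction.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]

def principal_directions(points):
    """Centre the data and return it with unit vectors e1 (most spread) and e2 (orthogonal)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the symmetric 2x2 scatter matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered)
    b = sum(x * y for x, y in centered)
    c = sum(y * y for _, y in centered)
    # Angle of the direction that maximises the spread of the projected points.
    theta = 0.5 * math.atan2(2 * b, a - c)
    e1 = (math.cos(theta), math.sin(theta))
    e2 = (-math.sin(theta), math.cos(theta))
    return centered, e1, e2

def projection_loss(centered, e1, e2):
    """Squared length kept along e1 and squared length lost along e2."""
    kept = sum((x * e1[0] + y * e1[1]) ** 2 for x, y in centered)
    lost = sum((x * e2[0] + y * e2[1]) ** 2 for x, y in centered)
    return kept, lost

centered, e1, e2 = principal_directions(points)
kept, lost = projection_loss(centered, e1, e2)
coordinates_1d = [x * e1[0] + y * e1[1] for x, y in centered]  # one number per point
loss_ratio = lost / (kept + lost)
```

For data like this, `loss_ratio` is a small fraction of the total spread, which is the sense in which one dimension (the coordinate along e1) is enough to represent the points.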
41
6. Web Databases
Raymond Wong
42
How to rank the webpages?
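One well-known answer to this question is link-based ranking. The slide does not say which method the course covers, so the PageRank-style iteration below is only an assumed illustration; the three-page link graph, the damping value of 0.85 and the function name `page_rank` are all hypothetical. The sketch also assumes every page has at least one outgoing link.

```python
# Hypothetical link graph: page -> pages it links to.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}

def page_rank(links, damping=0.85, iterations=100):
    """Repeatedly pass each page's score along its outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small base score, plus shares from pages linking to it.
        new_rank = {page: (1 - damping) / n for page in pages}
        for page, outgoing in links.items():
            share = damping * rank[page] / len(outgoing)
            for target in outgoing:
                new_rank[target] += share
        rank = new_rank
    return rank

rank = page_rank(links)
ordering = sorted(rank, key=rank.get, reverse=True)
```

In this toy graph, page C receives links from both A and B and ends up ranked first, while B, which is linked only from A, ends up last.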