COMP1942 Exploring and Visualizing Data Overview

COMP1942 Exploring and Visualizing Data Overview Prepared by Raymond Wong Presented by Raymond Wong raywong@cse COMP1942

Course Details
Instructor: Dr. Raymond Wong
TAs: Kai Ho CHAN, Dandan LIN, Junqiu WEI

Course Details
Webpage: http://course.cse.ust.hk/comp1942/

Course Details
Lecture
Time: Monday (1:30pm - 2:50pm) and Friday (9:00am - 10:20am)
Venue: G010 (CYT Building)
Tutorial
Time: Monday (12:30pm - 1:20pm)
Venue: Room 5583 (LT 29-30) (Academic Building) or CSE Lab 3 (Rm 4213, Academic Building)
Time: Tuesday (12:30pm - 1:20pm)
Venue: Rm 2302 (LT 17-18) (Academic Building) or CSE Lab 3 (Rm 4213, Academic Building)
Tutorials will be announced via email.

Course Details
Textbook: Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. Galit Shmueli, Nitin R. Patel and Peter C. Bruce. Wiley, 2010 (2nd edition).

Course Details
Reference books/materials:
Data Mining: Concepts and Techniques. Jiawei Han, Micheline Kamber and Jian Pei. Morgan Kaufmann Publishers (3rd edition).
Introduction to Data Mining. Pang-Ning Tan, Michael Steinbach and Vipin Kumar. Boston: Pearson Addison Wesley (2006).

Common Core Requirement
My ability to use quantitative methods to define, analyze and solve problems in daily life has been enhanced.
I am more able to process quantitative data and to use the data to reach a conclusion in a logical way.
The course has aroused my interest in learning more about mathematical models or quantitative methods.

Course Details
Grading Scheme:
Assignment 10%
Project 20%
In-class Participation 10%
Mid-Term Exam 20%
Final Exam 40%
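As a quick illustration of how the grading weights combine, here is a minimal Python sketch. The weights come from the grading scheme; the function name `course_total` and the example scores are hypothetical and only serve to show the arithmetic.

```python
# Weights taken from the grading scheme on the slide (they sum to 100%).
WEIGHTS = {
    "assignment": 0.10,
    "project": 0.20,
    "participation": 0.10,
    "midterm": 0.20,
    "final": 0.40,
}


def course_total(scores):
    """Return the weighted course total, given per-component scores in [0, 100]."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student scores, for illustration only.
example_scores = {
    "assignment": 90,
    "project": 80,
    "participation": 100,
    "midterm": 70,
    "final": 75,
}
total = course_total(example_scores)  # 9 + 16 + 10 + 14 + 30, i.e. about 79
```

Because the final exam carries 40%, a change of 10 points there moves the total by 4 points, twice the effect of the same change in the project or the mid-term.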

Assignment
2 assignments
Assignment 1: content before the mid-term exam
Assignment 2: content after the mid-term exam
NOTE: No late submissions are allowed.

Assignment
If a student answers a selected question in class correctly, I will give him/her a coupon for each correct answer.
A coupon can be used to waive one question in an assignment, which means that s/he gets full marks for that question without answering it.

Assignment Guideline
For each assignment, each student can waive at most one question.
S/he can waive any question s/he wants and obtain full marks for it (no matter whether s/he answers it or not).
S/he may still answer the waived question. We will mark it, but will give full marks for it regardless.
When submitting the assignment, please staple the coupon to the submitted assignment and write down on the coupon the number of the question to be waived.

Project
Phase 1 (Excel file)
Phase 2 (Design Report)
Phase 3 (Final Report and Output files)

Project
You are required to form a group. Each group contains 1 or 2 members. A 3-member group is NOT allowed.
Please fill in the following information for each member at https://goo.gl/forms/WguJdkHO8TkpOFYn1
student ID
student name
email
Each group needs to submit the grouping information only ONCE.
The group forming deadline is 15 Feb (Wed) 1pm.

Project
Data Mining Tool: XLMiner (in MS Excel)
Installed in CSE Lab 3 (Rm 4213)
All non-CSE and non-CPEG students need to apply for a CSE account. You can see the details on our course webpage.

Project
In Phase 3 (the last phase), you are required to hand in some output files.
We will check the output files.
You can use at most one coupon to obtain full marks for all output files.
Each group can use at most one coupon.
Please staple your coupon to your final report.

In-class Participation
In each lesson, you are required to bring one of the following with you:
your smart phone with iPRS (Internet-enabled Personal Response System) installed, or
your PRS device

In-class Participation
If you have a smart phone (Android/iOS), please install the app called “HKUST iLearn” on it.
If you do not have a smart phone, you have to borrow a PRS device. Please visit the ITSC Service Desk at Rm 2021 (Lift 2) to borrow one.

In-class Participation
In each lesson, you may be asked some multiple-choice questions (e.g., 1-3 questions).
You have to use your iPRS to answer the questions.

In-class Participation
You obtain 1 unit for in-class participation each time you answer a question in class with your iPRS (no matter whether you answer it correctly or not).
Those questions may appear in the mid-term exam and the final exam.

In-class Participation
In some cases,
some students may be absent from class for some reason, or
the iPRS system may fail to record your answer (e.g., your smart device or the iPRS system crashes).
You are required to obtain 20 units in order to obtain the full score (10%) for in-class participation.
We will give at least 25 questions in the course.

Midterm and Final Exam
You are allowed to bring a calculator with you.
Please remember to prepare a calculator for the exam.

Midterm Exam
In-class Midterm
Date: 17 March, 2017 (Fri)
Time: 9:00-10:20
Venue: G010 (CYT Building)
Rm 5619 (LT 31/32) (Academic Building)

Major Topics
In this course, you are expected to learn about “Exploring and Visualizing Data”.
But not only this! You are also expected to learn how to analyze problems and how to solve them. This is very important to your future.

Major Topics
1. Association
2. Clustering
3. Classification
4. Data Warehouse
5. Dimension Reduction
6. Web Databases

1. Association
Purchase table (columns: Customer, Apple, Orange, Milk; rows: Raymond, Ada, Grace, …)
We are interested in the items/itemsets with frequency >= 2.
Items/Itemsets     Frequency
Apple              2    Frequent Pattern (or Frequent Item)
Orange             3    Frequent Pattern (or Frequent Item)
Milk               1
{Apple, Orange}    2    Frequent Pattern (or Frequent Itemset)
{Orange, Milk}

1. Association
Purchase table (columns: Customer, Apple, Orange, Milk; rows: Raymond, Ada, Grace, …)
We are interested in the items/itemsets with frequency >= 2.
Items/Itemsets     Frequency
Apple              2
Orange             3
Milk               1
{Apple, Orange}    2
{Orange, Milk}
Association Rules:
1. Apple → Orange, 2/2 = 100% (customers who buy apple will probably buy orange)
2. Orange → Apple, 2/3 = 67% (customers who buy orange will probably buy apple)
Problem: to find all frequent patterns and association rules
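The counting behind the frequencies and rule percentages can be sketched in a few lines of Python. The individual baskets below are an assumption: the purchase table did not survive in this transcript, so the rows were chosen only to agree with the listed item frequencies (Apple 2, Orange 3, Milk 1). The brute-force enumeration is for illustration and is not necessarily the method taught in the course.

```python
from itertools import combinations

# Hypothetical baskets: the slide's purchase table is not preserved, so these
# rows are an assumption chosen to match the listed item frequencies.
transactions = {
    "Raymond": {"Apple", "Orange"},
    "Ada": {"Orange", "Milk"},
    "Grace": {"Apple", "Orange"},
}
MIN_FREQUENCY = 2  # threshold stated on the slide


def frequency(itemset):
    """Count the customers whose basket contains every item in itemset."""
    return sum(1 for basket in transactions.values() if itemset <= basket)


items = sorted(set().union(*transactions.values()))

# Brute-force enumeration of every 1- and 2-itemset (fine for tiny data).
frequent = {}
for size in (1, 2):
    for combo in combinations(items, size):
        count = frequency(set(combo))
        if count >= MIN_FREQUENCY:
            frequent[frozenset(combo)] = count


def confidence(lhs, rhs):
    """Confidence of the rule lhs -> rhs: frequency(lhs and rhs) / frequency(lhs)."""
    return frequency(lhs | rhs) / frequency(lhs)


apple_to_orange = confidence({"Apple"}, {"Orange"})  # 2 / 2
orange_to_apple = confidence({"Orange"}, {"Apple"})  # 2 / 3
```

With these assumed baskets, the frequent patterns are Apple, Orange and {Apple, Orange}, and the two confidences reproduce the 100% and 67% shown for the rules.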

2. Clustering
           Computer   History
Raymond    100        40
Louis      90         45
Wyman      20         95
…
Cluster 1 (e.g., High Score in Computer and Low Score in History)
Cluster 2 (e.g., High Score in History and Low Score in Computer)
Problem: to find all clusters
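One way to find such clusters automatically is k-means, sketched below in plain Python. The slide does not name a clustering algorithm, so k-means is used here only as a common example; the starting centroids are an arbitrary assumption.

```python
import math

# Scores from the slide: (Computer, History).
scores = {
    "Raymond": (100, 40),
    "Louis": (90, 45),
    "Wyman": (20, 95),
}


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    assignment = {}
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        assignment = {
            name: min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
            for name, p in points.items()
        }
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [points[n] for n, c in assignment.items() if c == i]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return assignment, centroids


# Starting centroids are an arbitrary assumption (two of the data points).
assignment, centroids = kmeans(scores, [(100.0, 40.0), (20.0, 95.0)])
```

With this starting point, Raymond and Louis end up in one cluster (high Computer, low History) and Wyman in the other, matching the two clusters described on the slide.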

3. Classification
Suppose there is a person.
Race     Income   Child   Insurance
white    high     no      ?
Decision tree (nodes as listed on the slide):
root
branches: child=yes, child=no
branches: Income=high, Income=low
leaves: 100% Yes / 0% No, 0% Yes / 100% No
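A decision tree is applied by following one branch per test until a leaf is reached. The Python sketch below makes that concrete, but the tree layout is an assumption: the figure is flattened in this transcript, so the code assumes that the Income test sits under child=no, that Income=high leads to the "100% Yes" leaf and Income=low to the "100% No" leaf, and it leaves the child=yes branch undecided. The function name `predict_insurance` is hypothetical.

```python
def predict_insurance(person):
    """Walk the assumed decision tree; return "Yes", "No", or None if undecided."""
    if person["child"] == "yes":
        # The leaf under child=yes is not recoverable from the transcript.
        return None
    # child == "no": the next test is on income (assumed layout).
    if person["income"] == "high":
        return "Yes"  # leaf: 100% Yes, 0% No
    return "No"       # leaf under Income=low: 0% Yes, 100% No


# The person from the slide: race=white, income=high, child=no.
person = {"race": "white", "income": "high", "child": "no"}
prediction = predict_insurance(person)
```

Under these assumptions, the person on the slide follows child=no, then Income=high, and is predicted to buy insurance.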

4. Data Warehouse
Without a data warehouse: Users send a query to the Databases and need to wait for a long time (e.g., 1 day to 1 week).
With a data warehouse: Users query the Data Warehouse, which holds pre-computed results built from the Databases.
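The idea of pre-computed results can be sketched in Python as follows. The records, field layout and function names are all hypothetical; the point is only the contrast between scanning the raw data for every query and looking up an answer that was computed ahead of time.

```python
from collections import defaultdict

# Hypothetical raw records (month, item, quantity), standing in for rows
# held in the operational databases.
sales = [
    ("2017-01", "Apple", 3),
    ("2017-01", "Orange", 5),
    ("2017-02", "Apple", 2),
    ("2017-02", "Orange", 4),
    ("2017-02", "Apple", 1),
]


def total_by_scan(item):
    """Answer a query the slow way: scan every raw record each time."""
    return sum(qty for _, name, qty in sales if name == item)


def build_warehouse(records):
    """Pre-compute per-item totals once, ahead of any user query."""
    totals = defaultdict(int)
    for _, name, qty in records:
        totals[name] += qty
    return dict(totals)


warehouse = build_warehouse(sales)


def total_from_warehouse(item):
    """Answer the same query with a single lookup into the pre-computed results."""
    return warehouse.get(item, 0)
```

Both functions return the same totals; the warehouse version pays the scanning cost once when it is built, so each later query is a single lookup.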

5. Dimension Reduction
Suppose we have the following data set. [Figure: scatter plot of the data points]

According to the data, we find the following vectors, e1 and e2 (marked in red).

Consider that the data points are projected on e1.

Suppose all data points are projected on vector e1. Each projection corresponds to some information loss.

After all data points are projected on vector e1, the total information loss is small.

Thus, we can use only 1 dimension (i.e., vector e1) to represent all data points.
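The projection onto e1 can be worked through numerically. The sketch below uses principal component analysis on a small hypothetical 2-D data set; the slides do not name the method or give the data, so both are assumptions. For two dimensions, the direction of largest variance has a closed form, which keeps the example free of external libraries.

```python
import math

# Hypothetical 2-D data set lying close to a straight line.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Entries of the 2x2 covariance matrix.
sxx = sum(x * x for x, _ in centered) / n
syy = sum(y * y for _, y in centered) / n
sxy = sum(x * y for x, y in centered) / n

# Closed-form angle of the largest-variance axis for a 2x2 covariance matrix.
theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
e1 = (math.cos(theta), math.sin(theta))
e2 = (-math.sin(theta), math.cos(theta))  # perpendicular to e1

# 1-D coordinates after projecting onto e1, and the part lost along e2.
coords_1d = [x * e1[0] + y * e1[1] for x, y in centered]
residuals = [x * e2[0] + y * e2[1] for x, y in centered]

total_variance = sxx + syy
kept_variance = sum(c * c for c in coords_1d) / n
lost_variance = sum(r * r for r in residuals) / n
loss_ratio = lost_variance / total_variance
```

Here `coords_1d` is the one-dimensional representation of the data, and `loss_ratio` measures the share of the variance discarded by dropping e2. For data that lies near a line, as in this example, that share is a small fraction of the total.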

6. Web Databases Raymond Wong

How to rank the webpages?
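The slide poses the ranking question without naming a method. As one well-known possibility, the Python sketch below runs a PageRank-style power iteration over a tiny link graph. The graph, the damping factor of 0.85 and the function name are assumptions for illustration, not details taken from the course.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping page -> list of linked pages."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: "about" and "blog" both link to "home".
links = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home"],
}
ranks = pagerank(links)
ranking = sorted(ranks, key=ranks.get, reverse=True)
```

In this graph "home" receives links from both other pages and comes out first, while "blog", which nothing links to, comes out last.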