Published by Rodger Webster. Modified over 8 years ago.
1
CSC 4740 / 6740 Data Mining, Fall 2016
Instructor: Yubao Wu
2
Welcome!
Instructor: Yubao Wu
Office: 25 Park Place, Suite 737
Phone: 404-413-6125 (office)
E-mail: ywu@cs.gsu.edu
Website: http://www.robwu.net/teaching
Office Hours: 4:00 pm - 5:30 pm, Wednesday; 3:30 pm - 5:00 pm, Friday; or by appointment
3
Classroom and Date
Classroom: Petit Science Center 230
Date/Time: Monday/Wednesday, 10:00 am - 11:45 am
4
Textbook
Data Mining: Concepts and Techniques, Third Edition, by Jiawei Han, Micheline Kamber, and Jian Pei, Morgan Kaufmann Publishers, 2011. (ISBN: 978-0123814791)
5
References
Introduction to Data Mining, by Tan, Steinbach, and Kumar, Addison Wesley, 2006. (ISBN: 0-321-32136-7)
Principles of Data Mining, by Hand, Mannila, and Smyth, MIT Press, 2001. (ISBN: 0-262-08290-X)
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Hastie, Tibshirani, and Friedman, Springer, 2001. (ISBN: 0-387-95284-5)
6
Course Content
Basic data mining techniques
  Association rule mining
  Sequential patterns
  Classification and prediction
  Clustering and outlier detection
  Regression
  Pattern interestingness
  Dimensionality reduction
  ……
Big data mining applications
  Web data mining
  Bioinformatics
  Social networks
  Text mining
  Visualization
  Financial data analysis
  Software engineering
  ……
7
Course Requirements
Course Requirements:
  Basic theoretical principles
  Practical hands-on experience
Prerequisite: CSC 3410 Data Structures
Assignments
Mid-term Exam
Final Exam
Research Project
The department will strictly enforce all prerequisites. Students without proper prerequisites will be dropped from the class, without any prior notice, at any time during the semester.
8
Assignments and Exams
Mid-Term Exam: Open Textbook
Final Exam: Open Textbook
The problems for CSC 4740 and CSC 6740 may be different.
9
Research Projects
CSC 4740: One or two undergraduate students form a group. Each group does a project and submits one project report.
CSC 6740: Each graduate student does a project and submits one project report.
10
Research Projects
A research project discovers interesting relationships within a significant amount of data.
Some project ideas (examples only; it is best to propose your own):
  Statistical Computing (speed up traditional statistical methods, such as correlation computation)
  Data Mining in Business Applications (customer segmentation, accounting, marketing)
  Literature Survey
  Mining Biological Datasets
  Social Network Analysis
  Your own ideas
11
Research Projects
Project proposal (2 - 4 pages, ACM SIGKDD or IEEE ICDM template)
  Title, project idea, survey of related work, data source, key algorithms/technology, and what you expect to submit at the end of the semester.
Final report (6 - 12 pages, ACM SIGKDD or IEEE ICDM template)
  A comprehensive description of your project.
  Project idea, extended survey of related work, detailed algorithm/technology, specific implementation, key results
  What worked, what did not work, what surprised you, and why
12
Research Projects
CSC 4740:
  Project Proposal
  Final Report
  Software, user manual, and sample dataset
CSC 6740:
  Project Proposal
  Final Report
  Software, user manual, and sample dataset
  Slides
13
Research Projects
Final presentation
  In the last few classes, each graduate student presents his/her project to the rest of the class.
  About 15 minutes of presentation + 2 minutes of questions
Checkpoints
  Proposal (due Sep 21): ~ 1 month
  Final Report (due Dec 5): ~ 2 months
14
Class Policy:
Attendance: Students are required to attend all classes.
Academic honesty: Plagiarism will result in a score of zero on the test or project. The instructor has the right to make a decision.
Assignments and Projects: They must be handed in on time and will not be accepted past the due date.
Withdrawals: Tuesday, Oct 11 is the last day to withdraw and possibly receive a W.
Make-ups: Require the instructor's special permission.
15
Grading Policy:
                 CSC 4740   CSC 6740
Mid-term Exam    25%        20%
Final Exam       25%        20%
Assignments      30%        25%
Project          15%        30%
Attendance       5% (both courses)

A+ [97, 100]   A [93, 97)   A- [90, 93)
B+ [87, 90)    B [83, 87)   B- [80, 83)
C+ [77, 80)    C [73, 77)   C- [70, 73)
D  [60, 70)    F [0, 60)

A score of 97 or higher earns an A+. The scores may be adjusted if the average is low.
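The grading policy above amounts to a weighted sum followed by an interval lookup. The following is a minimal sketch of that arithmetic, not an official grade calculator. It assumes each component is scored on a 0 to 100 scale and that the 5% attendance weight applies to both courses; the names `WEIGHTS`, `CUTOFFS`, `weighted_score`, and `letter_grade` are hypothetical, and the possible adjustment for a low class average is not modeled.

```python
# Component weights in percent, as listed on the grading slide.
# Assumption: the single "Attendance 5%" entry applies to both courses.
WEIGHTS = {
    "CSC 4740": {"midterm": 25, "final": 25, "assignments": 30,
                 "project": 15, "attendance": 5},
    "CSC 6740": {"midterm": 20, "final": 20, "assignments": 25,
                 "project": 30, "attendance": 5},
}

# Lower bound of each letter-grade interval, highest first.
CUTOFFS = [(97, "A+"), (93, "A"), (90, "A-"),
           (87, "B+"), (83, "B"), (80, "B-"),
           (77, "C+"), (73, "C"), (70, "C-"),
           (60, "D"), (0, "F")]


def weighted_score(course, component_scores):
    """Combine per-component scores (assumed 0-100) into a final score."""
    weights = WEIGHTS[course]
    return sum(weights[name] * component_scores[name] for name in weights) / 100


def letter_grade(score):
    """Return the letter whose interval contains the score."""
    for lower_bound, letter in CUTOFFS:
        if score >= lower_bound:
            return letter
    raise ValueError("score must be non-negative")


if __name__ == "__main__":
    scores = {"midterm": 90, "final": 80, "assignments": 100,
              "project": 80, "attendance": 100}
    for course in WEIGHTS:
        total = weighted_score(course, scores)
        print(course, total, letter_grade(total))
```

With the same component scores, the two courses can yield different totals because the project carries 15% in CSC 4740 and 30% in CSC 6740.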
16
Tentative Course Outline and Schedule:
Chapter 1 Introduction                                   Aug. 22
Chapter 2 Getting to Know Your Data
Chapter 3 Data Preprocessing                             Aug. 24
Chapter 6 Mining Frequent Patterns, Associations, and
  Correlations: Basic Concepts and Methods               Aug. 29, 31, Sep. 7
Chapter 8 Classification: Basic Concepts                 Sep. 12
Chapter 9 Classification: Advanced Methods               Sep. 14, 19, 21
Project Proposal Due                                     6 pm Eastern Time, Sep. 21
17
Tentative Course Outline and Schedule:
Chapter 10 Cluster Analysis: Basic Concepts and Methods  Sep. 26, 28, Oct. 5, 10
Mid-term Exam                                            Oct. 3
Chapter 11 Advanced Cluster Analysis                     Oct. 12, 17, 19, 24
Chapter 13 Data Mining Trends and Research Frontiers     Oct. 26, 31, Nov. 2, 7, 9, 14
Project Presentations                                    Nov. 16, 28, 30
Final Exam                                               Dec. 5
Research Project Due                                     6 pm Eastern Time, Dec. 8
18
KDD References
Data mining and KDD
  Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
  Journals: ACM-KDD, Data Mining and Knowledge Discovery, KDD Explorations
Database systems
  Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA
  Journals: ACM-TODS, IEEE-TKDE, JIIS, J. ACM, etc.
AI & Machine Learning
  Conferences: Machine learning (ICML), AAAI, IJCAI, COLT (Learning Theory), etc.
  Journals: Machine Learning, Artificial Intelligence, etc.
19
KDD References
Statistics
  Conferences: Joint Stat. Meeting, etc.
  Journals: Annals of Statistics, etc.
Bioinformatics
  Conferences: ISMB, RECOMB, PSB, CSB, BIBE, etc.
  Journals: J. of Computational Biology, Bioinformatics, PLoS Computational Biology, etc.
20
Questions?