CS 685G – Spring 2017 Special Topics in Data mining

CS 685G – Spring 2017 Special Topics in Data mining Instructor: Dr. Jinze Liu

Welcome! Instructor: Jinze Liu Homepage: http://www.cs.uky.edu/~liuj Office: 235 Hardymon Building Email: liuj@cs.uky.edu

Overview Time: TR 9:30am - 10:45am Office hour: By Appointment Credit: 3 Preferred Prerequisite: At least one of the following: Data structure, Algorithms, Database, Statistics.

Overview Textbook: Data Mining and Analysis: http://www.dataminingbook.info/ Other References: Mining of Massive Datasets, free at http://infolab.stanford.edu/~ullman/mmds/book.pdf; Data Mining: Concepts and Techniques, by Han and Kamber, Morgan Kaufmann (ISBN: 1-55860-901-6); Principles of Data Mining, by Hand, Mannila, and Smyth, MIT Press (ISBN: 0-262-08290-X)

Overview Grading scheme: 3 Homeworks (30%), 1 Exam (20%), 1 Presentation, 1 Project

Data + Mining Data (pronounced "Day-Ta" or "Dah-Ta"): Plural of datum. Information, especially in a scientific or computational context, or with the implication that it is organized. A representation of facts or ideas in a formalized manner capable of being communicated or manipulated by some process. Mining: The activity of removing solid valuables from the earth. Any activity that extracts or undermines. The activity of placing explosives underground, rigged to explode.

Promise of Data Data revolution: massive amounts of data being collected in different disciplines. Data Driven Science, Digital Government & Humanities, Smart Health, Smart Cities, etc. Speaking to Data and Letting Data Speak!

Social Media Facebook Statistics: 1.35 billion monthly active users; 864 million daily active users; 21 minutes per day on average; 300 petabytes of user data; 300 friends on avg for teens; age groups: 15-34 (66%), 12-17 (28%). Twitter Statistics: 1 billion registered users; 100 million daily active users; 208 followers on avg per tweet. http://www.internetlivestats.com/twitter-statistics/

Smart Health Fitbit – everybody?

Bioinformatics

Chem-informatics Structural Descriptors Physiochemical Descriptors Topological Descriptors Geometrical Descriptors AAACCTCATAGGAAGCATACCAGGAATTACATCA…

Eco-informatics Analyze complex ecological data from a highly-distributed set of field stations, laboratories, research sites, and individual researchers

Astro-Informatics New Astronomy: Local vs. distant universe; rare/exotic objects; census of active galactic nuclei; search for extra-solar planets. National Virtual Observatory: Rise of the citizen scientist!

Geo-Informatics location-based services, humanitarian efforts What is data with geo-science?

Materials Informatics (Materials Genome Initiative)

Linked Open Data 570 Datasets and 2909 Interconnections

The Data Deluge: Rise of Complex Interlinked Data Massive amounts of DATA Various modalities: Tables, Text, Images, Video, Ontologies, Graphs Enriched Data: Weighted, Multi-labeled, Temporal/spatial attributes Distributed, Uncertain, Dynamic Massive: Tera/peta-scale & beyond Data Data Everywhere, Not Any Drop of Insight!

Data Mining Enabling the New Science of Data Study of DATA in its own right Develop methods and frameworks across various fields New data models: dynamic, streaming, etc. New mining algorithms that offer timely and reliable inference and information extraction: online, approximate Self-aware, intelligent continuous data analysis and mining Data Language(s) Data and model compression Data provenance Data security and privacy Data sensation: visual, aural, tactile

What is Data Mining? The iterative and interactive process of discovering valid, novel, useful, and understandable patterns or models in massive databases.

What is Data Mining? Valid: generalize to the future Novel: what we don't know Useful: be able to take some action Understandable: leading to insight Iterative: takes multiple passes Interactive: human in the loop

Data Mining: Main Goals Prediction: What? Opaque. Description: Why? Transparent. [Figure: a model maps Age, Salary, and CarType to High/Low Risk; one record is marked as an outlier]

Data Mining: Main Techniques Classification: assign a new data record to one of several predefined categories or classes. Also called supervised learning. Regression: predict real-valued fields. Clustering: partition the dataset into subsets or groups such that elements of a group share a common set of properties, with high within-group similarity and low inter-group similarity. Also called unsupervised learning.
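
The three techniques above can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration, not material from the slides: the toy records, the function names (classify_1nn, fit_line, kmeans_1d), and the choice of algorithms (1-nearest-neighbor, a least-squares line, and Lloyd's k-means on 1-D values).

```python
from statistics import mean

# Hypothetical toy data: (age, salary in $1000s) -> risk label.
labeled = [((25, 30), "High"), ((30, 40), "High"),
           ((45, 90), "Low"), ((50, 110), "Low")]

def classify_1nn(point, training):
    """Classification: return the label of the closest labeled record."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training, key=lambda rec: sq_dist(rec[0], point))[1]

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def kmeans_1d(values, centers, iterations=10):
    """Clustering: Lloyd's k-means on 1-D values from given start centers."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return sorted(centers)

print(classify_1nn((28, 35), labeled))           # label of nearest neighbor
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))      # slope and intercept
print(kmeans_1d([1, 2, 3, 10, 11, 12], [0, 5]))  # final cluster centers
```

Classification and regression both need labeled or target values in the training data, while the clustering function sees only the raw values, which is the supervised/unsupervised distinction the slide draws.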

Data Mining: Main Techniques Pattern Mining: detect set, sequence, or interlinked/graph patterns among entities and their attributes. Discover rules, for example: people who buy book X also buy book Y. Other examples are patterns of website visits or of social search. Outlier/anomaly detection: find the record(s) most different from the other records, i.e., find all outliers. These may be thrown away as noise or may be the “interesting” ones.
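
As a rough sketch of both ideas, the Python below counts frequently co-occurring item pairs, computes the confidence of a rule such as "book X implies book Y", and flags outliers by z-score. The basket data, the function names, and the z-score threshold of 2.0 are all hypothetical choices for illustration; real pattern miners such as Apriori prune the search space rather than counting every pair.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical purchase baskets.
baskets = [
    {"book X", "book Y", "pen"},
    {"book X", "book Y"},
    {"book X", "lamp"},
    {"book Y", "pen"},
]

def frequent_pairs(transactions, min_support):
    """Count item pairs and keep those in at least min_support baskets."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

def confidence(transactions, antecedent, consequent):
    """Fraction of baskets containing antecedent that also hold consequent."""
    with_a = [t for t in transactions if antecedent in t]
    return sum(consequent in t for t in with_a) / len(with_a)

def zscore_outliers(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

print(frequent_pairs(baskets, min_support=2))
print(confidence(baskets, "book X", "book Y"))
print(zscore_outliers([10, 11, 9, 10, 12, 10, 50]))
```

Whether a flagged value is noise or the "interesting" record is a judgment the analyst still has to make; the code only ranks how unusual it is.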

Data Mining Process: Original Data -> (Selection) -> Target Data -> (Preprocessing) -> Preprocessed Data -> (Transformation) -> Transformed Data -> (Data Mining) -> Patterns -> (Interpretation) -> Knowledge

Data Mining Process Understand application domain: prior knowledge, user goals. Create target dataset: select data, focus on subsets. Data cleaning and transformation: remove noise, outliers, missing values; select features, reduce dimensions.

Data Mining Process Apply data mining algorithm: associations, sequences, classification, clustering, etc. Interpret, evaluate and visualize patterns: What's new and interesting? Iterate if needed. Manage discovered knowledge: close the loop.
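
The process steps can be sketched as a chain of small Python functions, one per stage. This is only an illustration under stated assumptions: the records, field names, and function names are hypothetical, and the "mining" step is a deliberately trivial stand-in (records above the mean) for a real algorithm.

```python
from statistics import mean

# Hypothetical raw records; None marks a missing value.
raw = [
    {"age": 25, "salary": 30, "city": "A"},
    {"age": 40, "salary": None, "city": "B"},
    {"age": 52, "salary": 110, "city": "A"},
    {"age": 31, "salary": 45, "city": "C"},
]

def select(records, fields):
    """Selection: keep only the fields relevant to the goal."""
    return [{f: r[f] for f in fields} for r in records]

def preprocess(records):
    """Preprocessing: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records, field):
    """Transformation: min-max scale one field into [0, 1]."""
    lo = min(r[field] for r in records)
    hi = max(r[field] for r in records)
    span = (hi - lo) or 1  # avoid division by zero for constant fields
    return [{**r, field: (r[field] - lo) / span} for r in records]

def mine(records, field):
    """Data mining (stand-in): find records above the mean of a field."""
    avg = mean(r[field] for r in records)
    return [r for r in records if r[field] > avg]

def interpret(patterns, total):
    """Interpretation: turn the pattern into a readable statement."""
    return f"{len(patterns)} of {total} records have above-average salary"

target = select(raw, ["age", "salary"])
cleaned = preprocess(target)
scaled = transform(cleaned, "salary")
patterns = mine(scaled, "salary")
print(interpret(patterns, len(scaled)))
```

In practice the chain is rarely run once: an uninteresting result at the interpretation step usually sends the analyst back to selection or transformation, which is the "iterate if needed" point above.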

Components of Data Mining Methods Representation: language for patterns/models, expressive power. Evaluation: scoring methods for deciding what is a good fit of model to data. Search: method for enumerating patterns/models.
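
One way to see the three components side by side is a tiny threshold classifier in Python. The model family, the data, and the function names below are assumptions chosen for illustration: the representation is a single rule "x > threshold", the evaluation is accuracy, and the search is exhaustive enumeration of candidate thresholds.

```python
# Hypothetical labeled data: (feature value, class label).
data = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]

def predict(threshold, x):
    """Representation: the model is one rule, 'x > threshold means class 1'."""
    return 1 if x > threshold else 0

def accuracy(threshold, records):
    """Evaluation: fraction of records the rule labels correctly."""
    return sum(predict(threshold, x) == y for x, y in records) / len(records)

def search(records):
    """Search: try every observed value as a threshold, keep the best one."""
    candidates = sorted({x for x, _ in records})
    return max(candidates, key=lambda t: accuracy(t, records))

best = search(data)
print(best, accuracy(best, data))
```

Swapping any one component changes the method: a richer representation (for example a decision tree) needs a smarter search, and a different evaluation score can select a different model from the same candidates.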

Kaggle: Data Science Challenges

Data Mining Tasks Prediction Methods: Use some variables to predict unknown or future values of other variables. Description Methods: Find human-interpretable patterns that describe the data. From [Fayyad et al.] Advances in Knowledge Discovery and Data Mining, 1996

Data Mining Tasks... Classification [Predictive] Clustering [Descriptive] Association Rule Discovery [Descriptive] Regression [Predictive] Semi-supervised Learning Semi-supervised Clustering Semi-supervised Classification

Data Mining Tasks Covered in This Course: Classification [Predictive], Association Rule Discovery [Descriptive], Clustering [Descriptive], Deviation Detection [Predictive], Semi-supervised Learning (Semi-supervised Clustering, Semi-supervised Classification)

Survey Why are you taking this course? What would you like to gain from this course? What topics are you most interested in learning about from this course? Any other suggestions?

Reading assignment Chapter 1: data mining and analysis