Published by Barnard Stokes. Modified over 8 years ago.
1
Introduction
2
Instructor: Cengiz Örencik
E-mail: cengizorencik@beykent.edu.tr
Course materials: myweb.sabanciuniv.edu/cengizo/courses
3
Reference Books
◦ Veri Madenciliği: Kavram ve Algoritmaları, Doç. Dr. Gökhan Silahtaroğlu, 2013
◦ Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber, 2010
4
1 midterm: 30%
2 in-class quizzes: 20%
1 final: 50%
Homework: ?
5
Fundamental data mining tools and concepts
Classification, clustering, and association and correlation algorithms
Real-life examples and implementations
6
Data preprocessing
Data warehouses
◦ Data from different sources and with different structures is unified under a single schema and resides at a single site
◦ Periodic data summaries
Associations and correlations
◦ Market basket analysis, etc.
Classification and prediction
◦ E.g., is an applicant trustworthy enough for a credit application?
7
Cluster analysis
◦ People with similar spending patterns
Text and Web mining
Privacy-preserving data mining
◦ Protect personal information
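To make "people with similar spending patterns" concrete, here is a minimal sketch of one possible clustering approach: k-means (Lloyd's algorithm) on a single feature. The spending figures, the starting centers, and the function name `kmeans_1d` are assumptions made up for illustration; the slides do not prescribe a particular algorithm.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on one-dimensional data (iterations must be >= 1)."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical monthly spending (TL) of six customers.
spending = [200, 250, 300, 4000, 4200, 4500]
centers, clusters = kmeans_1d(spending, centers=[0, 5000])
print(clusters)
```

With these made-up numbers the low spenders and the high spenders end up in separate groups, without anyone having labeled them in advance.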
8
“Necessity is the mother of invention” Plato
9
Petabytes of new data are produced continuously
◦ 90% of the world's data was generated over the last two years
◦ Twitter, Facebook, online shopping, MOBESE (city surveillance) cameras, etc.
Data is easy to access and store, e.g., customer voice records
Web crawlers, e.g., collecting tweets that contain the terms “election” and “party”
The hard part is extracting knowledge from the data
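The crawler example above boils down to a keyword filter. The sketch below shows only that filtering step, assuming the posts have already been collected; the sample posts and the helper name `contains_all_terms` are hypothetical.

```python
import re


def contains_all_terms(text, terms):
    """Return True if every term occurs in text as a whole word (case-insensitive)."""
    words = set(re.findall(r"\w+", text.lower()))
    return all(term.lower() in words for term in terms)


# Hypothetical sample posts; a real crawler would fetch these from the web.
posts = [
    "The election results surprised every party",
    "Watching the election coverage tonight",
    "Birthday party this weekend!",
    "Which PARTY will win the Election?",
]

matches = [p for p in posts if contains_all_terms(p, ["election", "party"])]
print(matches)
```

Collecting the matching posts is the easy part; deciding what they say about the election is where data mining begins.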
10
Data mining is the extraction of non-trivial (previously unknown) and valid knowledge from large amounts of data, knowledge that can be used in decision making
Non-trivial
◦ It is not worth a huge cost to obtain predictable information
◦ The aim is not to prove something you already know
◦ E.g., the diaper-beer correlation
Large data
◦ Validity
Decision making
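One common way to quantify a pattern such as the diaper-beer correlation is with the standard association-rule measures support, confidence, and lift. The sketch below computes them for the rule "diapers, therefore beer" on a handful of invented baskets; the data and the function name `rule_metrics` are assumptions for illustration only.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence and lift of the rule antecedent -> consequent."""
    n = len(transactions)
    both = sum(1 for t in transactions if antecedent in t and consequent in t)
    ante = sum(1 for t in transactions if antecedent in t)
    cons = sum(1 for t in transactions if consequent in t)
    support = both / n                            # P(antecedent and consequent)
    confidence = both / ante if ante else 0.0     # P(consequent | antecedent)
    lift = confidence / (cons / n) if cons else 0.0  # confidence / P(consequent)
    return support, confidence, lift


# Hypothetical market baskets.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

support, confidence, lift = rule_metrics(baskets, "diapers", "beer")
print(support, confidence, lift)
```

A lift above 1 means the two items appear together more often than they would if they were bought independently, which is what makes the pattern worth a closer look.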
11
         | Databases                     | Data Mining
Query    | Suitable (SQL, relational DB) | Not suitable; no common language
Data     | Dynamic                       | Static
Output   | Known; a subset of the data   | Not known; not a subset of the data
12
Database queries
◦ List the people who keep a boat at Kalamış Marina and are named “Ahmet”
◦ Credit card owners under 30 who spend more than 5000 TL per month
Data mining queries
◦ Credit applications with low risk (classification)
◦ Card owners with similar buying patterns (clustering)
◦ Products purchased together with PS4 games (association rules)
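The second database query above can be written directly in SQL, because its answer is simply the subset of stored rows that meet a precise condition. Here is a minimal sketch using Python's built-in `sqlite3` module; the table name `card_owners`, its columns, and the rows are assumptions invented for this example.

```python
import sqlite3

# In-memory database with a hypothetical table of credit card owners.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE card_owners (name TEXT, age INTEGER, monthly_spending_tl REAL)"
)
conn.executemany(
    "INSERT INTO card_owners VALUES (?, ?, ?)",
    [
        ("Ahmet", 27, 6200.0),
        ("Elif", 34, 7100.0),
        ("Deniz", 22, 1800.0),
        ("Zeynep", 29, 5400.0),
    ],
)

# Credit card owners under 30 who spend more than 5000 TL per month.
rows = conn.execute(
    "SELECT name FROM card_owners "
    "WHERE age < 30 AND monthly_spending_tl > 5000 "
    "ORDER BY name"
).fetchall()
names = [row[0] for row in rows]
conn.close()
print(names)
```

The data mining queries have no such one-line formulation: "low risk" and "similar buying patterns" are not stored anywhere and have to be learned from the data.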
13
[Figure: Databases → cleaning → Data Warehouse → selection, transformation → Data Mining → patterns → evaluation, presentation → Knowledge]
14
Increasing potential to support business decisions (figure, levels from top to bottom)
◦ Decision Making
◦ Data Presentation: Visualization Techniques
◦ Data Mining: Information Discovery
◦ Data Exploration: Statistical Summary, Querying, and Reporting
◦ Data Preprocessing/Integration, Data Warehouses
◦ Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
Roles, from top to bottom: End User, Business Analyst, Data Analyst, DBA
15
Market analysis
◦ Target audience, customer relations
Risk analysis
◦ Resource management, monitoring competing enterprises
Fraud detection
◦ Insurance, banking
◦ Modeling using historical data
Document similarity
◦ Plagiarism
16
The goal is to fit the data to a model
Predictive mining
◦ Classify people who may fail to make mortgage payments
◦ Predict which people will leave your company for another
◦ Predict the stock market
Descriptive mining
◦ Reveals hidden information
◦ Shows your best customers
◦ Which products sell together
◦ Which customers have similar shopping trends
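As an illustration of the predictive side, the sketch below fits a very simple model (a nearest-centroid classifier) to labeled history and uses it to classify new mortgage customers. The two features (debt-to-income ratio and missed payments last year), the records, and the function names are all hypothetical; this is one of many possible classifiers, not the one the course prescribes.

```python
import math


def fit_centroids(samples, labels):
    """Compute the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


# Hypothetical history: (debt-to-income ratio, missed payments last year).
history = [(0.2, 0), (0.3, 1), (0.25, 0), (0.7, 3), (0.8, 4), (0.65, 2)]
labels = ["pays", "pays", "pays", "defaults", "defaults", "defaults"]

centroids = fit_centroids(history, labels)
risky = predict(centroids, (0.75, 3))
safe = predict(centroids, (0.3, 0))
print(risky, safe)
```

The model is built from cases with known outcomes and then applied to unseen ones, which is what separates predictive mining from the descriptive tasks listed above.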
17
Classification [Predictive]
Clustering [Descriptive]
Association Rules [Descriptive]