Machine Learning and Data Mining Course Summary

Presentation transcript:

Machine Learning and Data Mining Course Summary

2 Outline
- Data Mining and Society
  - Discrimination, Privacy, and Security
- Hype Curve
- Future Directions
- Course Summary

3 Controversial Issues
- Data mining (or simple analysis) on people may come with a profile that would raise controversial issues of:
  - Discrimination
  - Privacy
  - Security
- Examples:
  - Should males between 18 and 35 from countries that produced terrorists be singled out for search before flight?
  - Can people be denied a mortgage based on age, sex, or race?
  - Women live longer. Should they pay less for life insurance?

4 Data Mining and Discrimination
- Can discrimination be based on features like sex, age, national origin?
- In some areas (e.g. mortgages, employment), some features cannot be used for decision making
- In other areas, these features are needed to assess the risk factors
  - E.g. people of African descent are more susceptible to sickle cell anemia

5 Data Mining and Privacy
- Can information collected for one purpose be used for mining data for another purpose?
  - In Europe, generally no, without explicit consent
  - In US, generally yes
- Companies routinely collect information about customers and use it for marketing, etc.
- People may be willing to give up some of their privacy in exchange for some benefits
- See Data Mining and Privacy Symposium

6 Data Mining with Privacy
- Data Mining looks for patterns, not people!
- Technical solutions can limit privacy invasion
  - Replacing sensitive personal data with an anonymous ID
  - Giving randomized outputs, e.g. return salary + random()
  - …
- See Bayardo & Srikant, Technological Solutions for Protecting Privacy, IEEE Computer, Sep 2003
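The two technical solutions named on this slide can be sketched as follows. This is a minimal illustration, not the method from the cited paper: the function names, the salted SHA-256 hash used to build the anonymous ID, and the uniform noise range are all assumptions made for the example.

```python
import hashlib
import random


def pseudonymize(name: str, salt: str) -> str:
    """Replace a personal identifier with an anonymous ID.

    Assumption: a salted SHA-256 digest, truncated to 12 hex characters.
    The slide only says "anon. ID"; it does not prescribe a scheme.
    """
    return hashlib.sha256((salt + name).encode("utf-8")).hexdigest()[:12]


def randomized_salary(salary: float, spread: float, rng: random.Random) -> float:
    """Return salary + random(), here zero-mean uniform noise in [-spread, +spread]."""
    return salary + rng.uniform(-spread, spread)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical salaries for 10,000 people.
salaries = [40_000 + 10 * i for i in range(10_000)]
noisy = [randomized_salary(s, 20_000, rng) for s in salaries]

true_mean = sum(salaries) / len(salaries)
noisy_mean = sum(noisy) / len(noisy)

print("anonymous ID:", pseudonymize("Alice Example", salt="demo-salt"))
print("true mean salary: ", round(true_mean))
print("noisy mean salary:", round(noisy_mean))
```

Each individual salary is distorted by up to 20,000, yet the mean over the whole population stays close to the true mean, which is the sense in which mining "looks for patterns, not people". A salted hash alone is not a strong anonymization guarantee; it only stands in for the idea of an anonymous ID.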

7 Data Mining and Security: Controversy in the News
- TIA: Terrorism (formerly Total) Information Awareness Program
  - DARPA program closed by Congress, Sep 2003
  - some functions transferred to intelligence agencies
- CAPPS II – screen all airline passengers
  - controversial
- …
- Invasion of Privacy or Defensive Shield?

8 Criticism of analytic approach to Threat Detection
Data Mining will:
- invade privacy
- generate millions of false positives
But can it be effective?

9 Is criticism sound?
- Criticism: Databases have 5% errors, so analyzing 100 million suspects will generate 5 million false positives
- Reality: Analytical models correlate many items of information to reduce false positives.
- Example: Identify one biased coin from 1,000.
  - After one throw of each coin, we cannot tell which one is biased
  - After 30 throws, one biased coin will stand out with high probability.
  - Can identify 19 biased coins out of 100 million with sufficient number of throws
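The coin example can be checked with exact binomial tail probabilities. The slide gives neither the strength of the bias nor a decision rule, so the heads-probability of 0.9 for the biased coin and the threshold of 25 heads out of 30 are assumptions chosen for this sketch.

```python
from math import comb


def tail_prob(n: int, k: int, p: float) -> float:
    """P(at least k heads in n throws) for a coin with heads-probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_COINS = 1000
P_BIASED = 0.9   # assumption: the slide does not say how biased the coin is
THRESHOLD = 25   # assumption: flag a coin that shows >= 25 heads in 30 throws

# The criticism's arithmetic: a 5% error rate applied to 100 million records.
naive_false_positives = 100_000_000 * 5 // 100

# One throw per coin: a fair coin shows heads half the time, so about half
# of the 999 fair coins look exactly like the biased one.
fair_false_alarm_1 = tail_prob(1, 1, 0.5)

# Thirty throws per coin: many observations are combined per coin.
fair_false_alarm_30 = tail_prob(30, THRESHOLD, 0.5)
biased_detect_30 = tail_prob(30, THRESHOLD, P_BIASED)
expected_false_positives = (N_COINS - 1) * fair_false_alarm_30

print("naive false positives:", naive_false_positives)
print("P(fair coin flagged after 1 throw):   ", fair_false_alarm_1)
print("P(fair coin flagged after 30 throws): ", fair_false_alarm_30)
print("P(biased coin flagged after 30 throws):", biased_detect_30)
print("expected fair coins flagged out of 999:", expected_false_positives)
```

Under these assumptions the expected number of fair coins flagged after 30 throws is well below one, while the biased coin is flagged with probability above 0.9. Combining many observations per item, rather than applying a single error rate once, is what drives the false positives down.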

10 Another Approach: Link Analysis
Can find unusual patterns in the network structure

11 Analytic technology can be effective
- Combining multiple models and link analysis can reduce false positives
- Today there are millions of false positives with manual analysis
- Data mining is just one additional tool to help analysts
- Analytic technology has the potential to reduce the current high rate of false positives

12 Data Mining and Society
- No easy answers to controversial questions
- Society and policy-makers need to make an educated choice
  - Benefits and efficiency of data mining programs vs. cost and erosion of privacy

13 The Hype Curve for Data Mining and Knowledge Discovery
[Figure: hype curve. Stages shown: rising expectations, over-inflated expectations]

14 The Hype Curve for Data Mining and Knowledge Discovery
[Figure: hype curve. Stages shown: rising expectations, over-inflated expectations, disappointment, growing acceptance and mainstreaming]

15 Data Mining Future Directions
- Currently, most data mining is on flat tables
- Richer data sources
  - text, links, web, images, multimedia, knowledge bases
- Advanced methods
  - Link mining, Stream mining, …
- Applications
  - Web, Bioinformatics, Customer modeling, …

16 Challenges for Data Mining
- Technical
  - terabytes and petabytes
  - complex, multimedia, structured data
  - integration with domain knowledge
- Business
  - finding good application areas
- Societal
  - Privacy issues

17 Data Mining Central Quest
Find true patterns and avoid overfitting (false patterns due to randomness)
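A false pattern due to randomness is easy to produce on purpose. In the sketch below every feature and every label is an independent coin flip, so no feature carries any real signal. The sizes (20 training rows, 2,000 held-out rows, 500 features) and all names are assumptions picked for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable
N_TRAIN, N_TEST, N_FEATURES = 20, 2000, 500


def random_row():
    """One record: N_FEATURES random binary features and an unrelated random label."""
    features = [rng.random() < 0.5 for _ in range(N_FEATURES)]
    label = rng.random() < 0.5
    return features, label


train = [random_row() for _ in range(N_TRAIN)]
test = [random_row() for _ in range(N_TEST)]


def accuracy(rows, j):
    """Fraction of rows where feature j equals the label."""
    return sum(features[j] == label for features, label in rows) / len(rows)


# "Mine" the training data: keep whichever feature predicts the label best.
best = max(range(N_FEATURES), key=lambda j: accuracy(train, j))

train_acc = accuracy(train, best)
test_acc = accuracy(test, best)

print("best feature:", best)
print("accuracy on training rows:", train_acc)
print("accuracy on held-out rows:", test_acc)
```

With this many candidate features and so few training rows, the winning feature looks clearly predictive on the training data, but on held-out rows it falls back to roughly chance level. Evaluating on data that was not used for the search is what exposes the pattern as false.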

18 Knowledge Discovery Process
- Start with Business (Problem) Understanding
- Data Preparation usually takes the most effort
- Knowledge Discovery is an Iterative Process
[Figure: process diagram. Recoverable stage labels: Data Preparation, Monitoring]

19 Key Ideas
- Avoid Overfitting!
- Data Preparation
  - catch false predictors
- evaluation: train, validate, test subset
- Classification: C4.5, Bayes, CART
- Targeted Marketing: Lift, Gains, GPS Lift estimate
- Clustering, Association, Other tasks
- Knowledge Discovery is a Process
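Lift, listed above under targeted marketing, compares the response rate among the customers a model ranks highest with the response rate over everyone. The sketch below uses a common definition of lift; the function name, the scores, and the response data are hypothetical, and it does not reproduce the GPS Lift estimate mentioned on the slide.

```python
def lift_at(scores, responded, fraction):
    """Lift for the top `fraction` of customers when ranked by model score.

    lift = (response rate in the targeted top segment) / (overall response rate)
    """
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(r for _, r in ranked[:n_top]) / n_top
    overall_rate = sum(responded) / len(responded)
    return top_rate / overall_rate


# Hypothetical model scores and actual responses (1 = responded) for ten customers.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

for fraction in (0.2, 0.5, 1.0):
    print(f"lift at top {fraction:.0%}: {lift_at(scores, responded, fraction):.2f}")
```

A lift above 1 means that targeting the top-scored segment reaches responders more efficiently than mailing at random, and lift always returns to 1 when the whole list is targeted. As with any evaluation measure, it should be computed on a test subset that was not used to build the model.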

20 Final Exam Topics

21 Happy Discoveries!
- Data Mining and Knowledge Discovery site
- Data Mining and Knowledge Discovery Society: ACM SIGKDD