Data Mining: Potentials and Challenges
Rakesh Agrawal & Jeff Ullman

Observations
- Transfer of data mining research into deployed applications and commercial products
  - Greater success in vertical applications
  - Horizontal tools. Examples:
    - SAS Enterprise Miner: sophisticated statisticians segment
    - DB2 Intelligent Miner: database applications requiring mining
- Emergence of the application of data mining in non-conventional domains
  - Combination of structured and unstructured data
- New challenges due to security/privacy concerns
- DARPA initiative to fund data mining research

Identifying Social Links Using Association Rules
Input: crawl of about 1 million pages
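
The slide gives only the input. As a minimal sketch, assume each crawled page has already been reduced to the set of person names it mentions (the extraction step is not shown, and the names below are made up). A rule A -> B can then be scored by its support (fraction of pages mentioning both names) and confidence (fraction of pages mentioning A that also mention B); pairs clearing both thresholds are candidate social links. The function `social_link_rules` and its `min_support` / `min_confidence` parameters are hypothetical, not taken from the original work.

```python
from collections import Counter
from itertools import combinations


def social_link_rules(pages, min_support, min_confidence):
    """Return rules (lhs, rhs, support, confidence) for frequently co-occurring names.

    pages: list of sets, each holding the person names found on one crawled page
    (hypothetical preprocessed input).
    """
    n = len(pages)
    single = Counter()
    pair = Counter()
    for names in pages:
        single.update(names)
        # Count each unordered pair once per page; sorting gives a stable key.
        pair.update(combinations(sorted(names), 2))

    rules = []
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        # A frequent pair yields a candidate rule in each direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / single[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


pages = [
    {"alice", "bob"},
    {"alice", "bob", "carol"},
    {"alice", "bob"},
    {"carol", "dave"},
]
for rule in social_link_rules(pages, min_support=0.5, min_confidence=0.8):
    print(rule)
```

Here only alice and bob co-occur on enough pages (3 of 4) to pass the support threshold, and each predicts the other with confidence 1.0.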

Website Profiling using Classification
Input: example pages for each category during training
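
The slide does not say which classifier was used. As a stand-in, the sketch below uses multinomial naive Bayes with add-one smoothing, a simple and standard text classifier, trained on labeled example pages and then used to profile a new page. The class name, the categories, and the training texts are all hypothetical; real pages would need HTML stripping and better tokenization than `split()`.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesPageClassifier:
    """Multinomial naive Bayes over page words, with add-one (Laplace) smoothing."""

    def fit(self, examples):
        # examples: list of (category, text) pairs, i.e. the training pages.
        self.doc_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for category, text in examples:
            words = text.lower().split()
            self.doc_counts[category] += 1
            self.word_counts[category].update(words)
            self.vocab.update(words)
        self.total_docs = sum(self.doc_counts.values())
        return self

    def predict(self, text):
        words = text.lower().split()
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus the smoothed log likelihood of every word on the page.
            score = math.log(n_docs / self.total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best


training = [
    ("sports", "match score team league goal"),
    ("sports", "team coach season playoff score"),
    ("finance", "stock market price earnings"),
    ("finance", "market bond interest rate price"),
]
clf = NaiveBayesPageClassifier().fit(training)
print(clf.predict("league team goal"))        # sports
print(clf.predict("interest rate on stock"))  # finance
```

Working in log space avoids underflow when a page has many words; smoothing keeps unseen words (such as "on" above) from zeroing out a category.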

Discovering Trends Using Sequential Patterns & Shape Queries
Input: (i) patent database, (ii) shape of interest
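
A minimal sketch of the shape-query step, assuming the sequential-pattern step has already produced per-year counts of how many patents contain each pattern. The shape is encoded here as a string of moves, u (up), d (down), s (stable), and a pattern matches when the query appears as a contiguous run of moves. This encoding, the `tolerance` parameter, and the example counts are illustrative assumptions, not the shape language used in the original system.

```python
def to_shape(counts, tolerance=0.1):
    """Encode a yearly count series as moves: u (up), d (down), s (stable).

    A step is 'stable' when its relative change is within +/- tolerance.
    """
    symbols = []
    for prev, cur in zip(counts, counts[1:]):
        base = max(prev, 1)  # avoid dividing by zero for zero counts
        change = (cur - prev) / base
        if change > tolerance:
            symbols.append("u")
        elif change < -tolerance:
            symbols.append("d")
        else:
            symbols.append("s")
    return "".join(symbols)


def matches_shape(counts, shape, tolerance=0.1):
    """True if the query shape occurs as a contiguous run of moves in the series."""
    return shape in to_shape(counts, tolerance)


# Hypothetical yearly counts of patents containing two mined patterns.
trends = {
    "pattern A": [5, 9, 15, 14, 15],
    "pattern B": [20, 18, 12, 8, 8],
}
query = "uus"  # rises twice, then levels off
print([name for name, c in trends.items() if matches_shape(c, query)])
```

Pattern A encodes to "uuss" and therefore matches; pattern B is flat or falling throughout and does not.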

Discovering Micro-communities
- Frequently co-cited pages are related.
- Pages with large bibliographic overlap are related.
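
The two rules correspond to the classic co-citation measure (two pages are linked to together by many pages) and bibliographic coupling (two pages share many out-links). The sketch below computes both over a tiny hypothetical link graph, given as a dict from each page to the set of pages it links to; the `min_shared` threshold is an illustrative parameter.

```python
from collections import Counter
from itertools import combinations


def co_citation(links):
    """For each pair of pages, count how many pages link to both of them."""
    counts = Counter()
    for targets in links.values():
        counts.update(combinations(sorted(targets), 2))
    return counts


def bibliographic_coupling(links):
    """For each pair of linking pages, count how many out-links they share."""
    counts = Counter()
    for a, b in combinations(sorted(links), 2):
        shared = len(links[a] & links[b])
        if shared:
            counts[(a, b)] = shared
    return counts


def related_pairs(counts, min_shared):
    """Keep the pairs whose overlap reaches the threshold."""
    return sorted(pair for pair, c in counts.items() if c >= min_shared)


links = {
    "p1": {"x", "y", "z"},
    "p2": {"x", "y"},
    "p3": {"x", "y", "w"},
    "p4": {"w"},
}
print(related_pairs(co_citation(links), min_shared=3))
print(related_pairs(bibliographic_coupling(links), min_shared=2))
```

Pages x and y are cited together by three pages, and p1, p2, p3 each share at least two out-links with one another, so both groups surface as candidate micro-communities.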

New Challenges
- Privacy-preserving data mining
- Data mining over compartmentalized databases

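Inducing Classifiers over Privacy Preserved Numeric Data
[Figure: records such as 30 | 25K | … (Alice's age, Alice's salary, …) and 50 | 40K | … (John's age, …) each pass through a Randomizer, producing 65 | 50K | … and 35 | 60K | …; for example, Alice's age 30 becomes 65 (30 + 35). From the randomized values the miner reconstructs the age distribution and the salary distribution, which feed a decision tree algorithm that outputs the model.]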
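
A minimal sketch of the "randomize, then reconstruct the distribution" steps, using a discretized form of the iterative Bayesian reconstruction described by Agrawal & Srikant (SIGMOD 2000) with Gaussian noise. The decision-tree step is not shown. The ages, noise level `sigma`, bin midpoints, and iteration count are illustrative assumptions; the point is that the miner recovers the shape of the age distribution without seeing any individual's true age.

```python
import math
import random


def randomize(values, sigma, rng):
    """Client side: add independent Gaussian noise to each value before release."""
    return [v + rng.gauss(0.0, sigma) for v in values]


def reconstruct(perturbed, midpoints, sigma, iterations=50):
    """Miner side: estimate the original distribution over bins from noisy values.

    Each iteration sets the mass of bin k to the average, over all perturbed
    values w, of the posterior probability that w originated in bin k.
    """
    def noise_density(t):
        # Gaussian kernel; the normalizing constant cancels in the ratio below.
        return math.exp(-t * t / (2 * sigma * sigma))

    k = len(midpoints)
    probs = [1.0 / k] * k  # start from a uniform guess
    for _ in range(iterations):
        new = [0.0] * k
        for w in perturbed:
            weights = [noise_density(w - a) * p for a, p in zip(midpoints, probs)]
            total = sum(weights)
            if total == 0.0:
                continue  # value too far from every bin to carry information
            for j, wt in enumerate(weights):
                new[j] += wt / total
        s = sum(new)
        probs = [x / s for x in new]
    return probs


rng = random.Random(0)
# Hypothetical ages: half the users near 30, half near 60.
ages = [rng.choice([25, 30, 35]) for _ in range(500)]
ages += [rng.choice([55, 60, 65]) for _ in range(500)]
noisy = randomize(ages, sigma=10.0, rng=rng)
midpoints = list(range(20, 75, 10))  # bins centered at 20, 30, ..., 70
estimate = reconstruct(noisy, midpoints, sigma=10.0)
for a, p in zip(midpoints, estimate):
    print(a, round(p, 2))
```

The reconstructed masses should concentrate around 30 and 60, roughly half each, even though individual noisy ages can be off by 10 or more years.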

Other recent work
- Cryptographic approach to privacy-preserving data mining
  - Lindell & Pinkas, Crypto 2000
- Privacy-preserving discovery of association rules
  - Vaidya & Clifton, KDD 2002
  - Evfimievski et al., KDD 2002
  - Rizvi & Haritsa, VLDB 2002
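
To give a flavor of the randomization-based association-rule work, here is a minimal single-item sketch in the spirit of the bit-flipping scheme of Rizvi & Haritsa: each client keeps each item bit with probability p and flips it otherwise, and the miner inverts this known distortion to estimate the item's true support. Estimating the support of multi-item itemsets requires a larger inversion that is not shown; the data and the value of p are illustrative.

```python
import random


def distort(bits, p, rng):
    """Client side: keep each 0/1 bit with probability p, flip it otherwise."""
    return [b if rng.random() < p else 1 - b for b in bits]


def estimate_support(distorted, p):
    """Miner side: invert the distortion for a single item.

    If s is the true fraction of 1s, the expected observed fraction is
    s * p + (1 - s) * (1 - p), so s = (observed - (1 - p)) / (2p - 1).
    """
    if p == 0.5:
        raise ValueError("p = 0.5 destroys all information")
    observed = sum(distorted) / len(distorted)
    return (observed - (1 - p)) / (2 * p - 1)


rng = random.Random(42)
true_bits = [1] * 3000 + [0] * 7000  # hypothetical item bought by 30% of customers
noisy = distort(true_bits, p=0.9, rng=rng)
print(round(estimate_support(noisy, p=0.9), 3))  # close to 0.3
```

No single distorted bit reveals whether that customer bought the item, yet the aggregate estimate stays close to the true 30%.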

Computation over Compartmentalized Databases

Some Hard Problems
- The past may be a poor predictor of the future
  - Abrupt changes
  - Wrong training examples
- Actionable patterns (principled use of domain knowledge?)
- Over-fitting vs. not missing the rare nuggets
- Richer patterns
- Simultaneous mining over multiple data types
- When to use which algorithm?
- Automatic, data-dependent selection of algorithm parameters

Discussion
- Should data mining be viewed as rich querying and deeply integrated with database systems?
  - Most current work makes little use of database functionality.
- Should analytics be an integral concern of database systems?
- Issues in data mining over heterogeneous data repositories (relationship to the heterogeneous systems discussion)
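
One way to make the first question concrete: counting frequent item pairs can be expressed as a relational query and executed inside the database engine rather than in a separate mining tool. The sketch below uses Python's built-in sqlite3 module with a hypothetical `baskets(tid, item)` table; the schema, data, and self-join formulation are illustrative assumptions, not a description of any specific system from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE baskets (tid INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO baskets VALUES (?, ?)",
    [(1, "bread"), (1, "milk"), (2, "bread"), (2, "milk"), (2, "eggs"),
     (3, "milk"), (3, "eggs"), (4, "bread"), (4, "milk")],
)

# Self-join on transaction id to enumerate item pairs (a.item < b.item keeps
# each unordered pair once), let the engine group and count them, and apply
# the minimum-support threshold in HAVING.
MIN_SUPPORT = 2
rows = conn.execute(
    """
    SELECT a.item, b.item, COUNT(*) AS support
    FROM baskets AS a
    JOIN baskets AS b ON a.tid = b.tid AND a.item < b.item
    GROUP BY a.item, b.item
    HAVING COUNT(*) >= ?
    ORDER BY support DESC, a.item, b.item
    """,
    (MIN_SUPPORT,),
).fetchall()
print(rows)  # [('bread', 'milk', 3), ('eggs', 'milk', 2)]
conn.close()
```

The query pushes the counting into the engine's join and aggregation operators; whether such formulations can compete with specialized in-memory algorithms is part of what the discussion question asks.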

Summary
Data mining has shown promise but needs much more research.
"We stand on the brink of great new answers, but even more, of great new questions." -- Matt Ridley