PAKDD Panel: What Next. Ramakrishnan Srikant.

Presentation transcript:

PAKDD Panel: What Next
Ramakrishnan Srikant

What Next
Electronic Commerce
  – Catalog Integration (WWW 2001, with R. Agrawal)
  – Searching with Numbers (WWW 2002, with R. Agrawal)
Security
Privacy

Catalog Integration
B2B electronics portal: 2000 categories, 200K datasheets
[Diagram: Master Catalog, New Catalog]

Intuition
Use affinity information in new catalog.
  – Products in same category are similar.
Bias Naïve Bayes classifier to incorporate this information.
  – Accuracy boost depends on match between two categorizations.
  – Use tuning set to determine weight given to affinity information.
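The intuition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes a bag-of-words Naïve Bayes with Laplace smoothing, and it assumes the bias is applied by multiplying each master category's prior by (first-pass votes from the same source category + 1) raised to a weight `w`. The function names, the `+ 1` smoothing, and the toy catalogs are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train(master):
    """Collect per-category word counts from (tokens, category) pairs."""
    word_counts, cat_sizes, vocab = defaultdict(Counter), Counter(), set()
    for tokens, cat in master:
        word_counts[cat].update(tokens)
        cat_sizes[cat] += 1
        vocab.update(tokens)
    return word_counts, cat_sizes, vocab

def classify(tokens, model, log_prior):
    """Return the category maximizing log prior + smoothed log likelihood."""
    word_counts, cat_sizes, vocab = model

    def score(cat):
        total = sum(word_counts[cat].values())
        return log_prior[cat] + sum(
            math.log((word_counts[cat][t] + 1) / (total + len(vocab)))
            for t in tokens if t in vocab)

    return max(cat_sizes, key=score)

def integrate(new_catalog, model, w):
    """Place each new-catalog item, biasing priors by source-category affinity.

    w = 0 reduces to plain Naive Bayes; larger w trusts the new catalog's
    grouping more. In practice w would be chosen on a tuning set.
    """
    _, cat_sizes, _ = model
    base = {c: math.log(n) for c, n in cat_sizes.items()}
    placed = {}
    for src, docs in new_catalog.items():
        # First pass: where does plain Naive Bayes send this source category?
        votes = Counter(classify(d, model, base) for d in docs)
        # Second pass: boost master categories that attracted many siblings.
        biased = {c: base[c] + w * math.log(votes[c] + 1) for c in cat_sizes}
        placed[src] = [classify(d, model, biased) for d in docs]
    return placed

# Hypothetical toy data: a master catalog and one new-catalog category.
master = [
    (["laptop", "pentium", "ram"], "Computers"),
    (["laptop", "disk", "ram"], "Computers"),
    (["camera", "lens", "zoom"], "Cameras"),
    (["camera", "flash", "lens"], "Cameras"),
]
new_catalog = {"Digital Imaging": [["camera", "lens"], ["camera", "zoom"], ["ram", "flash"]]}

model = train(master)
plain = integrate(new_catalog, model, 0.0)   # third item goes to Computers
biased = integrate(new_catalog, model, 2.0)  # third item follows its siblings to Cameras
```

In this toy case the ambiguous item ("ram", "flash") is pulled into Cameras only when `w` is large enough, which mirrors the slide's point that the weight has to be tuned.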

Yahoo & Google
5 slices of the hierarchy: Autos, Movies, Outdoors, Photography, Software
  – Typical match: 69%, 15%, 3%, 3%, 1%, …
Merging Yahoo into Google
  – 30% fewer errors (14.1% absolute difference in accuracy)
Merging Google into Yahoo
  – 26% fewer errors (14.3% absolute difference)
Open Problems: SVM, Decision Tree, ...

Data Extraction is hard
Synonyms for attribute names and units.
  – "lb" and "pounds", but no "lbs" or "pound".
Attribute names are often missing.
  – No "Speed", just "MHz Pentium III"
  – No "Memory", just "MB SDRAM"
Example:
  850 MHz Intel Pentium III
  192 MB RAM
  15 GB Hard Disk
  DVD Recorder: Included; Windows Me
  14.1 inch display
  8.0 pounds

Searching with Numbers

Why does it work?
Conjecture: If we get a close match on numbers, it is likely that we have correctly matched attribute names.
Non-overlapping attributes:
  – Memory: Mb, Disk: Gb
Correlations:
  – Memory: Mb, Disk: Gb still fine.
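A rough way to see the conjecture at work is to rank documents using only their numbers and ignoring attribute names. The sketch below is an assumption-laden illustration, not the algorithm from the WWW 2002 paper. It scores a document by summing, for each query number, the relative distance to the nearest number in the document. The `relative_distance` and `score` helpers and the second laptop are hypothetical; the first laptop's numbers come from the datasheet example earlier in the talk.

```python
def relative_distance(q, d):
    """Scale-free distance between two numbers (0.0 means identical)."""
    largest = max(abs(q), abs(d))
    return abs(q - d) / largest if largest else 0.0

def score(query_numbers, doc_numbers):
    """Sum of each query number's distance to its nearest document number.

    Lower is better. This simple version lets two query numbers match the
    same document number, which a real system would likely forbid.
    """
    return sum(min(relative_distance(q, d) for d in doc_numbers)
               for q in query_numbers)

docs = {
    "laptop_a": [850, 192, 15, 14.1, 8.0],  # MHz, MB, GB, inch, pounds
    "laptop_b": [600, 64, 6, 12.1, 5.5],    # hypothetical second datasheet
}
query = [800, 200, 15]  # roughly: 800 MHz, 200 MB, 15 GB, with no names given

ranked = sorted(docs, key=lambda name: score(query, docs[name]))
```

Because clock speed, memory, and disk size occupy different numeric ranges here, each query number lands on the intended attribute without any name matching.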

Empirical Results

Incorporating Hints
Use simple data extraction techniques to get hints.
Names/Units in query matched against Hints.
Open Problem: Rethink data extraction in this context.
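As one possible reading of "simple data extraction techniques", the sketch below pulls (number, unit) pairs out of datasheet text with a regular expression and uses the unit as a hint when the query supplies one. The regex, the `extract_hints` and `hinted_distance` helpers, and the fallback rule are assumptions made for illustration; the talk does not specify them.

```python
import re

# A number followed by a word is treated as a (value, unit) hint.
PAIR = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)")

def extract_hints(text):
    """Return (value, lowercased unit) pairs found in free text."""
    return [(float(num), unit.lower()) for num, unit in PAIR.findall(text)]

def hinted_distance(q_num, q_unit, hints):
    """Relative distance from a query number to the best-matching hint.

    If the query names a unit, only hints with that unit are considered.
    If none match (hint missing, or unit spelled differently), fall back
    to all numbers, as in plain number matching.
    """
    candidates = [n for n, u in hints if q_unit is None or u == q_unit.lower()]
    if not candidates:
        candidates = [n for n, _ in hints]
    return min(abs(q_num - n) / max(abs(q_num), abs(n), 1e-12)
               for n in candidates)

spec = ("850 MHz Intel Pentium III 192 MB RAM 15 GB Hard Disk "
        "14.1 inch display 8.0 pounds")
hints = extract_hints(spec)

with_unit = hinted_distance(14, "GB", hints)     # compared against 15 GB
without_unit = hinted_distance(14, None, hints)  # drifts to the 14.1 inch screen
```

Here the unit hint keeps a query for "14 GB" from matching the 14.1 inch display, while the fallback keeps the search usable when a unit synonym such as "lbs" is not recognized.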

Security

Some Hard Problems
Past may be a poor predictor of future
  – Abrupt changes
Reliability and quality of data
  – Wrong training examples
Simultaneous mining over multiple data types
Richer patterns

Privacy Preserving Data Mining
Have your cake and mine it too!
  – Preserve privacy at the individual level, but still build accurate models.
Challenges
  – Privacy Breaches
  – Clustering & Associations
  – Privacy-sensitive Security Applications
Opportunities
  – Web Demographics
  – Inter-Enterprise Data Mining
  – Privacy-sensitive Security Applications
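The slide does not say how individual privacy and accurate models can coexist. One commonly described approach, assumed here purely for illustration, is value perturbation: each person adds random noise to a value before reporting it, so single records are unreliable while aggregates remain estimable. The uniform noise, its spread, and the synthetic ages below are all hypothetical.

```python
import random

def perturb(value, spread, rng):
    """Report the value plus zero-mean uniform noise in [-spread, spread]."""
    return value + rng.uniform(-spread, spread)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic "true" ages that the data miner never sees directly.
true_ages = [rng.randint(18, 80) for _ in range(10_000)]

# What each individual actually discloses.
reported = [perturb(age, 30.0, rng) for age in true_ages]

true_mean = sum(true_ages) / len(true_ages)
estimated_mean = sum(reported) / len(reported)

# Individual reports can be far from the truth...
largest_individual_error = max(abs(r - a) for r, a in zip(reported, true_ages))
# ...while the aggregate estimate stays close, because the noise averages out.
aggregate_error = abs(estimated_mean - true_mean)
```

Recovering more than a mean, for example a full distribution to train a classifier on, needs additional reconstruction steps that this sketch does not attempt.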