ICDM 2003 Review Data Analysis (with a comparison between 2002 and 2003)
Xindong Wu and Alex Tuzhilin
Analyzed by Shusaku Tsumoto

Basic Statistics (Country)
37 countries, 486 submissions
Regular papers: 58 (12%)
Short papers: 67 (14%)
High acceptance ratio (regular papers):
 – Israel: 4/11 (36%)
 – Hong Kong: 4/12 (33%)

Country      Total  Regular  Short  Acceptance Ratio
USA            …       …       …       …%
China         45       2       0        4%
Australia     29       3       5       28%
Canada        28       0       6       21%
Germany       19       2       4       32%
Japan         19       4       3       37%
France        18       1       2       17%
Taiwan        16       0       3       19%
Brazil        15       0       0        0%
Hong Kong     12       4       2       50%
UK            12       1       2       25%
Israel        11       4       2       55%
Italy          8       1       1       25%
Finland        7       1       1       29%
India          7       0       1       14%
Korea          6       0       1       17%
Top            …       …       …       …%
Total          …       …       …       …%
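Every surviving row of the country table is consistent with Acceptance Ratio = (Regular + Short) / Total, while the "High acceptance ratio (regular papers)" figures use Regular / Total. The short sketch below recomputes both from the recovered rows; the USA row is left out because its figures did not survive, and the function names are illustrative only.

```python
# (country, total, regular, short) rows recovered from the 2003 country table.
# USA is omitted because its figures are missing from the transcript.
COUNTRY_ROWS = [
    ("China", 45, 2, 0),
    ("Australia", 29, 3, 5),
    ("Canada", 28, 0, 6),
    ("Germany", 19, 2, 4),
    ("Japan", 19, 4, 3),
    ("France", 18, 1, 2),
    ("Taiwan", 16, 0, 3),
    ("Brazil", 15, 0, 0),
    ("Hong Kong", 12, 4, 2),
    ("UK", 12, 1, 2),
    ("Israel", 11, 4, 2),
    ("Italy", 8, 1, 1),
    ("Finland", 7, 1, 1),
    ("India", 7, 0, 1),
    ("Korea", 6, 0, 1),
]


def acceptance_ratio(total: int, regular: int, short: int) -> float:
    """Share of submissions accepted as either a regular or a short paper."""
    return (regular + short) / total


def regular_ratio(total: int, regular: int) -> float:
    """Share of submissions accepted as regular papers only."""
    return regular / total


if __name__ == "__main__":
    for name, total, regular, short in COUNTRY_ROWS:
        print(f"{name:<10} accepted {acceptance_ratio(total, regular, short):4.0%}"
              f"  regular only {regular_ratio(total, regular):4.0%}")
```

Rounded to whole percentages, the first ratio reproduces the table's last column, and the second gives 36% for Israel (4/11) and 33% for Hong Kong (4/12).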

Comparison with 2002 (Top 5)

Basic Statistics (Topics)
Top 5 topics by number of submissions:
 – Mining text and semi-structured data, and mining temporal, spatial and multimedia data
 – Data mining and machine learning algorithms and methods in traditional areas and in new areas
 – Data mining applications in electronic commerce, bioinformatics, computer security, Web intelligence
 – Soft computing and uncertainty management
 – Data pre-processing, data reduction, feature selection and feature transformation
High acceptance ratio (regular papers):
 – Statistics and probability in large-scale data mining
 – Security, privacy and social impact of data mining

Topic: Total / Regular / Short / Acceptance Ratio
Mining text and semi-structured data, and mining temporal, spatial and multimedia data: … / … / … / …%
Data mining and machine learning algorithms and methods in traditional areas (such as classification, regression, clustering, probabilistic modeling, and association analysis), and in new areas: … / … / … / …%
Data mining applications in electronic commerce, bioinformatics, computer security, Web intelligence, intelligent learning database systems: … / … / … / …%
Soft computing (including neural networks, fuzzy logic, evolutionary computation, and rough sets) and uncertainty management for data mining: … / … / … / …%
Data pre-processing, data reduction, feature selection and feature transformation: 41 / 3 / 5 / 20%
Complexity, efficiency, and scalability issues in data mining: 30 / 4 / 4 / 27%
Others: 21 / 1 / 4 / 24%
Foundations of data mining: 18 / 2 / 1 / 17%
Data and knowledge representation for data mining: 16 / 3 / 1 / 25%
Human-machine interaction and visualization in data mining, and visual data mining: … / … / … / …%
Quality assessment and interestingness metrics of data mining results: 16 / 2 / 3 / 31%
Statistics and probability in large-scale data mining: 15 / 6 / 1 / 47%
High performance and distributed data mining: 12 / 1 / 2 / 25%
Post-processing of data mining results: 11 / 1 / 3 / 36%
Pattern recognition and scientific discovery: 8 / 1 / 0 / 13%
Security, privacy and social impact of data mining: 7 / 2 / 2 / 57%
Integration of data warehousing, OLAP and data mining: 5 / 0 / 0 / 0%
Process-centric data mining and models of data mining process: 5 / 1 / 3 / 80%
Total: … / … / … / …%
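The "High acceptance ratio (regular papers)" topics can be checked against the topic table by ranking on Regular / Total. The sketch below does this for the rows whose figures survived. The four largest topics and the visualization topic are missing, so this is only a partial check. The helper name is illustrative.

```python
# (topic, total, regular, short) for the topic rows whose figures survived.
TOPIC_ROWS = [
    ("Data pre-processing, data reduction, feature selection and feature transformation", 41, 3, 5),
    ("Complexity, efficiency, and scalability issues in data mining", 30, 4, 4),
    ("Others", 21, 1, 4),
    ("Foundations of data mining", 18, 2, 1),
    ("Data and knowledge representation for data mining", 16, 3, 1),
    ("Quality assessment and interestingness metrics of data mining results", 16, 2, 3),
    ("Statistics and probability in large-scale data mining", 15, 6, 1),
    ("High performance and distributed data mining", 12, 1, 2),
    ("Post-processing of data mining results", 11, 1, 3),
    ("Pattern recognition and scientific discovery", 8, 1, 0),
    ("Security, privacy and social impact of data mining", 7, 2, 2),
    ("Integration of data warehousing, OLAP and data mining", 5, 0, 0),
    ("Process-centric data mining and models of data mining process", 5, 1, 3),
]


def rank_by_regular_ratio(rows):
    """Sort topics by regular / total, highest first (sorted() keeps ties stable)."""
    return sorted(rows, key=lambda row: row[2] / row[1], reverse=True)


if __name__ == "__main__":
    for topic, total, regular, _short in rank_by_regular_ratio(TOPIC_ROWS)[:3]:
        print(f"{regular}/{total} = {regular / total:.0%}  {topic}")
```

Among the recovered rows, the two topics at the top are the same two that the Basic Statistics (Topics) slide lists: statistics and probability (6/15) and security and privacy (2/7).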

Comparison with 2002 (Top 5)

Review Scores (N, Average, SD)

Box Plot

Comparison with 2002 (2002 => 2003)
Country vs Final Decision
 – Regular: Hong Kong => Hong Kong, Israel
 – Short: USA => ?
 – Reject: Japan, Taiwan => most of the countries
Topics vs Final Decision
 – Regular: Temporal, Text => Statistics and Probability, Visualization
 – Short: Similarity => Post-processing
 – Reject: Bayesian => Feature Selection

Correspondence Analysis (Country vs Final Decision)
[Plot: Reject, Regular and Short with Belgium, Israel, Hong Kong, USA, China, Brazil, France, Poland, Japan; r1 = 0.325, r2 = 0.235]
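The slides do not define r1 and r2. Assuming they are the singular values of the first two correspondence-analysis axes (the square roots of the principal inertias), the following sketch shows how such values come out of a country × decision contingency table. It uses only the rows recovered from the country table, with Reject = Total − Regular − Short. USA and the countries not shown in the table are missing, so it will not reproduce the slide's r1 = 0.325 and r2 = 0.235. Because the table has three decision columns, at most two non-trivial axes exist, and their inertias can be obtained exactly from a quadratic without a full SVD. The function name is hypothetical.

```python
import math

# (reject, regular, short) per country, derived from the recovered country
# table rows; USA and unlisted countries are missing (illustration only).
COUNTS = {
    "China": (43, 2, 0), "Australia": (21, 3, 5), "Canada": (22, 0, 6),
    "Germany": (13, 2, 4), "Japan": (12, 4, 3), "France": (15, 1, 2),
    "Taiwan": (13, 0, 3), "Brazil": (15, 0, 0), "Hong Kong": (6, 4, 2),
    "UK": (9, 1, 2), "Israel": (5, 4, 2), "Italy": (6, 1, 1),
    "Finland": (5, 1, 1), "India": (6, 0, 1), "Korea": (5, 0, 1),
}


def principal_inertias_3col(table):
    """Return (lam1, lam2): the two non-trivial principal inertias of a
    correspondence analysis on an n x 3 table of counts.

    S[i][j] = (p_ij - r_i c_j) / sqrt(r_i c_j) is the standardized residual
    matrix. S @ sqrt(c) = 0, so G = S^T S is a singular 3 x 3 matrix and its
    nonzero eigenvalues solve x^2 - trace(G) x + (sum of 2x2 principal minors) = 0.
    """
    n = sum(sum(row) for row in table)
    r = [sum(row) / n for row in table]                          # row masses
    c = [sum(row[j] for row in table) / n for j in range(3)]     # column masses
    s = [[(row[j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(3)]
         for i, row in enumerate(table)]
    g = [[sum(s_i[a] * s_i[b] for s_i in s) for b in range(3)] for a in range(3)]
    trace = g[0][0] + g[1][1] + g[2][2]                          # = chi^2 / n
    minors = (g[0][0] * g[1][1] - g[0][1] ** 2
              + g[0][0] * g[2][2] - g[0][2] ** 2
              + g[1][1] * g[2][2] - g[1][2] ** 2)
    root = math.sqrt(max(trace * trace - 4 * minors, 0.0))
    return (trace + root) / 2, max((trace - root) / 2, 0.0)


if __name__ == "__main__":
    lam1, lam2 = principal_inertias_3col(list(COUNTS.values()))
    print(f"r1 = {math.sqrt(lam1):.3f}, r2 = {math.sqrt(lam2):.3f}")
```

The two inertias always sum to the table's chi-square statistic divided by n. A table whose rows are proportional gives zero for both.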

Correspondence Analysis (Topics vs Final Decision)
[Plot: Reject, Short and Regular with Statistics and probability; Security, privacy; Process-centric DM; Integration of data warehousing, OLAP and DM; Post-processing; Human-machine interaction and visualization; Feature Selection; r1 = 0.218, r2 = 0.200]

Correspondence Analysis (# of Authors vs Final Decision)
[Plot: Reject, Short and Regular against the number of authors]

Correspondence Analysis: Summary
Country vs Final Decision
 – Regular: Hong Kong, Israel
 – Short: ?
 – Reject: most countries lie near this region of the plot
Topics vs Final Decision
 – Regular: Statistics and Probability, Visualization
 – Short: Post-processing
 – Reject: Feature Selection
# of Authors vs Final Decision
 – 1 or 4 authors: Regular
 – 2 or 3 authors: between Short and Regular

Correspondence Analysis (2002) (Country vs Final Decision)
Rule: [R_1 = 0] → [R_2 = 0], viewed through the supporting sets [R_1 = 0] and [R_2 = 0]
Rules as relations between sets
 – Relations between supporting sets are very important (rough sets / granular computing).
 – Indices for rule induction: P(R_2|R_1), P(R_1|R_2), or f(P(R_2|R_1))
 – Relations between information granules
[Plot: Reject, Short and Regular with Hong Kong, Austria, Japan, Taiwan, Australia, Finland, USA, Canada, China, Thailand]
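The first two indices can be written with the cardinalities of the supporting sets, assuming the probabilities are estimated as relative frequencies. P(R_2|R_1) corresponds to what association-rule tools usually call the confidence of [R_1 = 0] → [R_2 = 0]. P(R_1|R_2) measures how much of [R_2 = 0] the rule covers:

$$
P(R_2 \mid R_1) = \frac{\lvert [R_1 = 0] \cap [R_2 = 0] \rvert}{\lvert [R_1 = 0] \rvert},
\qquad
P(R_1 \mid R_2) = \frac{\lvert [R_1 = 0] \cap [R_2 = 0] \rvert}{\lvert [R_2 = 0] \rvert}
$$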

Correspondence Analysis in 2002 (Category vs Final Decision)
[Plot: Reject, Short and Regular with Bayesian, Statistics, Similarity, Interestingness, Active Learning, Theory, Temporal, Web Mining, Structured, Text Mining, SVM, Rule, Tree, Applications, Association R]

Rule Mining
Dataset
 – Sample size: 486
 – Attributes (5):
   Paper No. (ordered by submission date)
   # of Authors
   # of Characters in Title
   Country
   Category
 – Analyzed with Clementine 7.1

Rule Mining (2)
C5.0
 – [FINAL = long] … 2] & [# of Chars in Title <= 75.0] (Confidence: 0.667, Support: 3)
 – [FINAL = Reject] … 4] & [Paper No. > 117] & [# of Chars in Title > 71.0] (Confidence: 0.857, Support: 10)
Important features: # of Authors, Paper No., # of Chars in Title

Rule Mining (3)
Generalized Rule Induction
 – [FINAL = Reject] <= [Paper No. < …] (Confidence: 90%, Support: 10.7%)
 – [FINAL = Reject] … 49.5] (Confidence: 100%, Support: 4.73%)
 – [FINAL = long] … ] (Confidence: 60%, Support: 1.03%)
Important features: Paper No., # of Chars in Title
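The rule slides report a confidence and a support for each rule. C5.0 gives support as a count, and Generalized Rule Induction gives it as a percentage. The slides do not say which records each tool counts. The sketch below assumes support counts the records that satisfy the antecedent, and confidence is the share of those records that also satisfy the consequent. The `Submission` record type mirrors the five attributes on the dataset slide plus the decision. The toy records and the threshold 50 are hypothetical and are not ICDM data. Real tools may define or correct these measures differently.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Submission:
    """One submission: the five mined attributes plus the final decision."""
    paper_no: int       # ordered by submission date
    n_authors: int
    title_chars: int
    country: str
    category: str
    final: str          # e.g. "Reject", "Short", "Regular"


Predicate = Callable[[Submission], bool]


def rule_stats(records: List[Submission], antecedent: Predicate,
               consequent: Predicate) -> Tuple[int, float, float]:
    """Return (matches, support, confidence) for `consequent <= antecedent`."""
    covered = [r for r in records if antecedent(r)]
    if not covered:
        return 0, 0.0, 0.0
    hits = sum(1 for r in covered if consequent(r))
    return len(covered), len(covered) / len(records), hits / len(covered)


# Hypothetical toy records, for illustration only.
TOY_RECORDS = [
    Submission(12, 2, 80, "Japan", "Text mining", "Reject"),
    Submission(35, 1, 90, "China", "Web mining", "Reject"),
    Submission(48, 3, 76, "France", "Clustering", "Short"),
    Submission(210, 4, 40, "Israel", "Statistics", "Regular"),
    Submission(305, 2, 55, "Canada", "Text mining", "Reject"),
]

if __name__ == "__main__":
    # [FINAL = Reject] <= [Paper No. < 50] & [# of Chars in Title > 71]
    print(rule_stats(TOY_RECORDS,
                     lambda r: r.paper_no < 50 and r.title_chars > 71,
                     lambda r: r.final == "Reject"))
```

On the toy data, three of the five records satisfy the antecedent and two of them are rejects. That gives a support of 3 records (60%) and a confidence of about 0.667.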

Rule Mining in 2002
C5.0
 – [# of Chars in Title > 43] => Rejected (Conf: …, Support: 303)
 – [Paper No. … => Regular (Conf: …, Support: 4)

Rule Mining in 2002 (Association)
Rules
 – Rejected <= [Paper No. < 542.5] (Conf: 0.88, Support: 41)
 – Rejected … 53.5] (Conf: 0.833, Support: 29)
 – Regular <= [Country = Canada] & [Category = Text Mining] (Conf: 0.6, Support: 5)
Important features: Paper No., Country, Category

Comparison with 2002
Important features in 2003
 – # of Authors, Paper No., # of Chars in Title
 – Early 57 papers, long titles, 2 authors
Important features in 2002
 – Paper No., # of Chars in Title, Country, Category
 – Early 52 papers, long titles

Conclusions
 – Do not submit a paper too early! Reflect not only on the contents but also on the title.
 – Mining text, Web and semi-structured data is very popular now.
 – Statistics and probability is a very strong topic.
 – Security and privacy issues are becoming stronger.
 – Visualization and interaction topics are emerging in ICDM 2003:
   – Visualization / human-machine interaction
   – Post-processing of DM results
   – Process-centric DM