Published by Diana Sparks. Modified over 9 years ago.
1
ICDM 2003 Review Data Analysis: with comparison between 2002 and 2003
Xindong Wu and Alex Tuzhilin
Analyzed by Shusaku Tsumoto
2
Basic Statistics (Country)
37 countries, 486 submissions
Regular papers: 58 (12%)
Short papers: 67 (14%)
High acceptance ratio (regular papers)
– Israel: 4/11 (36%)
– Hong Kong: 4/12 (33%)
3
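| Country | Total | Regular | Short | Acceptance Ratio |
| --- | --- | --- | --- | --- |
| USA | 189 | 35 | 28 | 33% |
| China | 45 | 2 | 0 | 4% |
| Australia | 29 | 3 | 5 | 28% |
| Canada | 28 | 0 | 6 | 21% |
| Germany | 19 | 2 | 4 | 32% |
| Japan | 19 | 4 | 3 | 37% |
| France | 18 | 1 | 2 | 17% |
| Taiwan | 16 | 0 | 3 | 19% |
| Brazil | 15 | 0 | 0 | 0% |
| Hong Kong | 12 | 4 | 2 | 50% |
| UK | 12 | 1 | 2 | 25% |
| Israel | 11 | 4 | 2 | 55% |
| Italy | 8 | 1 | 1 | 25% |
| Finland | 7 | 1 | 1 | 29% |
| India | 7 | 0 | 1 | 14% |
| Korea | 6 | 0 | 1 | 17% |
| Top 15 | 441 | 58 | 61 | 27% |
| Total | 486 | 58 | 67 | 26% |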
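The acceptance ratio column appears to be (Regular + Short) / Total. The "High Acceptance Ratio (Regular)" figures appear to be Regular / Total. The minimal Python sketch below recomputes both from the transcribed counts. The dictionary name and the two helper functions are illustrative only; they are not part of the original analysis.

```python
# (total, regular, short) per country, transcribed from the table above.
COUNTRY_COUNTS = {
    "USA": (189, 35, 28),
    "China": (45, 2, 0),
    "Australia": (29, 3, 5),
    "Canada": (28, 0, 6),
    "Germany": (19, 2, 4),
    "Japan": (19, 4, 3),
    "France": (18, 1, 2),
    "Taiwan": (16, 0, 3),
    "Brazil": (15, 0, 0),
    "Hong Kong": (12, 4, 2),
    "UK": (12, 1, 2),
    "Israel": (11, 4, 2),
    "Italy": (8, 1, 1),
    "Finland": (7, 1, 1),
    "India": (7, 0, 1),
    "Korea": (6, 0, 1),
}


def acceptance_ratio(total: int, regular: int, short: int) -> float:
    """Share of submissions accepted as either a regular or a short paper."""
    return (regular + short) / total


def regular_ratio(total: int, regular: int, short: int) -> float:
    """Share of submissions accepted as a regular paper only."""
    return regular / total


if __name__ == "__main__":
    for country, counts in COUNTRY_COUNTS.items():
        print(f"{country:10s} all: {acceptance_ratio(*counts):4.0%}"
              f"  regular: {regular_ratio(*counts):4.0%}")
```

Summing the three columns reproduces the "Top 15" row (441, 58, 61). The table lists 16 rows under that label.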
4
Comparison with 2002 (Top 5)
5
Basic Statistics (Topics)
Top 5 of submissions:
– Mining text and semi-structured data, and mining temporal, spatial and multimedia data
– Data mining and machine learning algorithms and methods in traditional areas and in new areas
– Data mining applications in electronic commerce, bioinformatics, computer security, Web intelligence
– Soft computing and uncertainty management
– Data pre-processing, data reduction, feature selection and feature transformation
High acceptance ratio (regular papers):
– Statistics and probability in large-scale data mining
– Security, privacy and social impact of data mining
6
| Topic | Total | Regular | Short | Acceptance Ratio |
| --- | --- | --- | --- | --- |
| Mining text and semi-structured data, and mining temporal, spatial and multimedia data | 81 | 10 | 12 | 27% |
| Data mining and machine learning algorithms and methods in traditional areas (such as classification, regression, clustering, probabilistic modeling, and association analysis), and in new areas | 77 | 11 | 8 | 25% |
| Data mining applications in electronic commerce, bioinformatics, computer security, Web intelligence, intelligent learning database system | 61 | 5 | 6 | 18% |
| Soft computing (including neural networks, fuzzy logic, evolutionary computation, and rough sets) and uncertainty management for data mining | 46 | 2 | 9 | 24% |
| Data pre-processing, data reduction, feature selection and feature transformation | 41 | 3 | 5 | 20% |
| Complexity, efficiency, and scalability issues in data mining | 30 | 4 | 4 | 27% |
| Others | 21 | 1 | 4 | 24% |
| Foundations of data mining | 18 | 2 | 1 | 17% |
| Data and knowledge representation for data mining | 16 | 3 | 1 | 25% |
| Human-machine interaction and visualization in data mining, and visual data mining | 16 | 3 | 3 | 38% |
| Quality assessment and interestingness metrics of data mining results | 16 | 2 | 3 | 31% |
| Statistics and probability in large-scale data mining | 15 | 6 | 1 | 47% |
| High performance and distributed data mining | 12 | 1 | 2 | 25% |
| Post-processing of data mining results | 11 | 1 | 3 | 36% |
| Pattern recognition and scientific discovery | 8 | 1 | 0 | 13% |
| Security, privacy and social impact of data mining | 7 | 2 | 2 | 57% |
| Integration of data warehousing, OLAP and data mining | 5 | 0 | 0 | 0% |
| Process-centric data mining and models of data mining process | 5 | 1 | 3 | 80% |
| Total | 486 | 58 | 67 | 26% |
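The "High acceptance ratio (regular papers)" topics can be checked against this table by ranking topics on Regular / Total. Below is a short Python sketch of that ranking. The abbreviated topic labels and the `rank_by_regular_ratio` helper are illustrative; they are not taken from the original analysis.

```python
# (total, regular, short) per topic, transcribed from the table above.
# The short labels are abbreviations chosen for this sketch.
TOPIC_COUNTS = {
    "Text/semi-structured/temporal/spatial/multimedia": (81, 10, 12),
    "DM and ML algorithms": (77, 11, 8),
    "Applications": (61, 5, 6),
    "Soft computing": (46, 2, 9),
    "Pre-processing/feature selection": (41, 3, 5),
    "Complexity/efficiency/scalability": (30, 4, 4),
    "Others": (21, 1, 4),
    "Foundations": (18, 2, 1),
    "Data/knowledge representation": (16, 3, 1),
    "Human-machine interaction/visualization": (16, 3, 3),
    "Quality/interestingness": (16, 2, 3),
    "Statistics and probability": (15, 6, 1),
    "High performance/distributed": (12, 1, 2),
    "Post-processing": (11, 1, 3),
    "Pattern recognition/scientific discovery": (8, 1, 0),
    "Security/privacy/social impact": (7, 2, 2),
    "DW/OLAP integration": (5, 0, 0),
    "Process-centric DM": (5, 1, 3),
}


def rank_by_regular_ratio(counts, top=None):
    """Return (topic, regular/total) pairs sorted from highest to lowest ratio."""
    ranked = sorted(
        ((topic, regular / total) for topic, (total, regular, _short) in counts.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:top]  # a slice with top=None keeps the whole list


if __name__ == "__main__":
    for topic, ratio in rank_by_regular_ratio(TOPIC_COUNTS, top=5):
        print(f"{ratio:5.1%}  {topic}")
```

Statistics and probability (6/15) and Security/privacy (2/7) come out first and second. This is consistent with the topics singled out on the previous slide.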
7
Comparison with 2002 (Top 5)
8
Review Scores

| | 2002 | 2003 |
| --- | --- | --- |
| N | 347 | 486 |
| Average | 2.39 | 2.32 |
| SD | 0.90 | 0.92 |
9
Box Plot
10
Comparison with 2002
Country vs Final Decision (2002 => 2003)
– Regular: Hong Kong => Hong Kong, Israel
– Short: USA => ?
– Reject: Japan, Taiwan => most of the countries
Topics vs Final Decision (2002 => 2003)
– Regular: Temporal, Text => Statistics and Probability, Visualization
– Short: Similarity => Postprocessing
– Reject: Bayesian => Feature Selection
11
Correspondence Analysis (Country vs Final Decision)
[Plot: the decisions Reject, Regular and Short together with the countries Belgium, Israel, Hong Kong, USA, China, Brazil, France, Poland and Japan; r1 = 0.325, r2 = 0.235]
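These slides appear to use correspondence analysis. That technique places the rows and columns of a contingency table on a few shared axes. The Python/NumPy sketch below is a textbook version of the technique: an SVD of the standardized residuals, applied to Reject/Regular/Short counts derived from the country table. It is an assumption-laden illustration, not the author's code. The slide does not say how r1 and r2 were computed, and it plots countries (such as Belgium and Poland) that are not in the table, so these numbers are not expected to match the plot.

```python
import numpy as np

# (total, regular, short) per country, transcribed from the country table.
# Reject counts are derived as total - regular - short.
COUNTRY_COUNTS = {
    "USA": (189, 35, 28), "China": (45, 2, 0), "Australia": (29, 3, 5),
    "Canada": (28, 0, 6), "Germany": (19, 2, 4), "Japan": (19, 4, 3),
    "France": (18, 1, 2), "Taiwan": (16, 0, 3), "Brazil": (15, 0, 0),
    "Hong Kong": (12, 4, 2), "UK": (12, 1, 2), "Israel": (11, 4, 2),
    "Italy": (8, 1, 1), "Finland": (7, 1, 1), "India": (7, 0, 1),
    "Korea": (6, 0, 1),
}
DECISIONS = ("Reject", "Regular", "Short")


def contingency_table(counts):
    """Build a country x decision matrix with columns (Reject, Regular, Short)."""
    return np.array(
        [[total - regular - short, regular, short]
         for total, regular, short in counts.values()],
        dtype=float,
    )


def correspondence_analysis(table):
    """Classical correspondence analysis via SVD of the standardized residuals.

    Returns the principal inertias (squared singular values) and the
    principal coordinates of the rows and of the columns.
    """
    p = table / table.sum()                  # correspondence matrix
    r = p.sum(axis=1)                        # row masses
    c = p.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    s = (p - expected) / np.sqrt(expected)   # D_r^-1/2 (P - r c^T) D_c^-1/2
    u, sigma, vt = np.linalg.svd(s, full_matrices=False)
    row_coords = (u * sigma) / np.sqrt(r)[:, None]
    col_coords = (vt.T * sigma) / np.sqrt(c)[:, None]
    return sigma ** 2, row_coords, col_coords


if __name__ == "__main__":
    table = contingency_table(COUNTRY_COUNTS)
    inertias, rows, cols = correspondence_analysis(table)
    print("principal inertias:", np.round(inertias[:2], 4))
    for name, (x, y) in zip(DECISIONS, cols[:, :2]):
        print(f"{name:8s} {x:+.3f} {y:+.3f}")
    for name, (x, y) in zip(COUNTRY_COUNTS, rows[:, :2]):
        print(f"{name:10s} {x:+.3f} {y:+.3f}")
```

With only three decision categories, at most two axes carry information. The total inertia equals the chi-square statistic of the table divided by the number of submissions.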
12
Correspondence Analysis (Topics vs Final Decision)
[Plot: the decisions Reject, Short and Regular together with the topics Statistics and probability; Security, privacy; Process-centric DM; Integration of data warehousing, OLAP and DM; Post-processing; Human-machine interaction and visualization; Feature Selection; r1 = 0.218, r2 = 0.200]
13
Correspondence Analysis (# of Authors vs Final Decision)
[Plot: the decisions Reject, Short and Regular together with author counts 1 to 6; also labeled Process-centric DM and Human-machine interaction and visualization; r1 = 0.218, r2 = 0.200]
14
Correspondence Analysis Summary
Country vs Final Decision
– Regular: Hong Kong, Israel
– Short: ?
– Reject: most of the countries lie near the Reject point.
Topics vs Final Decision
– Regular: Statistics and Probability, Visualization
– Short: Postprocessing
– Reject: Feature Selection
# of Authors vs Final Decision
– 1 or 4 authors: Regular
– 2 or 3 authors: between Short and Regular
15
Correspondence Analysis (2002) (Country vs Final Decision)
Rule: [R_1 = 0] → [R_2 = 0], with supporting sets |[R_1 = 0]| and |[R_2 = 0]|
Rule Relations between Sets
– Relations between supporting sets are very important.
– Rough Set / Granular Computing
Index for Rule Induction:
– P(R_2|R_1), P(R_1|R_2), or f(P(R_2|R_1))
– Relation between Information Granules
[Plot: the decisions Reject, Short and Regular together with the countries Hong Kong, Austria, Japan, Taiwan, Australia, Finland, USA, Canada, China and Thailand]
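The rule-induction indices P(R2|R1) and P(R1|R2) can be computed directly from supporting sets. Below is a minimal Python sketch. It assumes that the supporting set of a condition is the set of records satisfying it, and that probabilities are estimated by counting. For example, P(R2|R1) = |[R1=0] ∩ [R2=0]| / |[R1=0]|. In rough-set rule induction, these two directions are often called accuracy and coverage. The records are hypothetical, and f(·) is omitted because the slide does not define it.

```python
def supporting_set(records, condition):
    """Indices of the records that satisfy `condition` (its supporting set)."""
    return {i for i, record in enumerate(records) if condition(record)}


def conditional_probability(given, target):
    """Estimate P(target | given) as |given ∩ target| / |given|."""
    if not given:
        raise ValueError("conditioning set is empty")
    return len(given & target) / len(given)


# Hypothetical toy records; R1 and R2 stand for two attributes.
records = [
    {"R1": 0, "R2": 0},
    {"R1": 0, "R2": 0},
    {"R1": 0, "R2": 1},
    {"R1": 1, "R2": 0},
    {"R1": 1, "R2": 1},
    {"R1": 1, "R2": 0},
]

s1 = supporting_set(records, lambda rec: rec["R1"] == 0)   # [R1 = 0]
s2 = supporting_set(records, lambda rec: rec["R2"] == 0)   # [R2 = 0]

p_r2_given_r1 = conditional_probability(s1, s2)  # accuracy of [R1=0] -> [R2=0]
p_r1_given_r2 = conditional_probability(s2, s1)  # coverage of the same rule

print(f"P(R2|R1) = {p_r2_given_r1:.3f}, P(R1|R2) = {p_r1_given_r2:.3f}")
```

The two indices differ because they divide the same overlap by different supporting sets. This is why the slide stresses the relation between the sets rather than a single number.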
16
Correspondence Analysis (2002) (Category vs Final Decision)
[Plot: the decisions Reject, Short and Regular together with the categories Bayesian, Statistics, Similarity, Interestingness, Active Learning, Theory, Temporal, Web Mining, Structured, Text Mining, SVM, Rule, Tree, Applications, Association R]
18
Rule Mining
Datasets
– Sample size: 486
– Attributes: 5
  Paper No. (ordered by submission date), # of Authors, # of Characters in Title, Country, Category
– Analyzed by Clementine 7.1
19
Rule Mining (2)
C5.0
– [FINAL = long] <= [… 2] & [# of Chars in Title <= 75.0] (Confidence: 0.667, Support: 3)
– [FINAL = Reject] <= [… 4] & [Paper No. > 117] & [# of Chars in Title > 71.0] (Confidence: 0.857, Support: 10)
Important features: # of Authors, Paper No., # of Chars in Title
20
Rule Mining (3)
Generalized Rule Induction
– [FINAL = Reject] <= [Paper No. < 67.500] (Confidence: 90%, Support: 10.7%)
– [FINAL = Reject] <= [… 49.5] (Confidence: 100%, Support: 4.73%)
– [FINAL = long] <= [… 61.500] (Confidence: 60%, Support: 1.03%)
Important features: Paper No., # of Chars in Title
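Each rule above is reported with a confidence and a support. Below is a minimal Python sketch of how such figures can be computed for the first rule, [FINAL = Reject] <= [Paper No. < 67.500]. It runs over records carrying the five attributes listed for the dataset. The slides do not define support: C5.0 reports it as a count and GRI as a percentage. Treating it as the fraction of records matching the antecedent is therefore an assumption. The `Submission` class, the `rule_stats` helper, the decision labels and the toy records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """One row of the rule-mining dataset: five attributes plus the decision."""
    paper_no: int       # ordered by submission date
    num_authors: int
    title_chars: int    # number of characters in the title
    country: str
    category: str
    final: str          # hypothetical labels: "Reject", "short" or "long"


def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) for the rule `consequent <= antecedent`.

    Support is taken here as the fraction of records satisfying the antecedent
    (an assumption); confidence is the fraction of those records that also
    satisfy the consequent.
    """
    covered = [r for r in records if antecedent(r)]
    if not covered:
        return 0.0, 0.0
    hits = sum(1 for r in covered if consequent(r))
    return len(covered) / len(records), hits / len(covered)


# Hypothetical toy data, not the ICDM 2003 submissions.
records = [
    Submission(12, 2, 80, "USA", "Others", "Reject"),
    Submission(30, 1, 45, "Japan", "Others", "Reject"),
    Submission(50, 3, 60, "USA", "Others", "short"),
    Submission(90, 2, 40, "Israel", "Others", "long"),
    Submission(120, 4, 75, "USA", "Others", "Reject"),
]

support, confidence = rule_stats(
    records,
    antecedent=lambda s: s.paper_no < 67.5,
    consequent=lambda s: s.final == "Reject",
)
print(f"support = {support:.1%}, confidence = {confidence:.1%}")
```

Passing the antecedent and consequent as predicates keeps the helper reusable. The multi-condition C5.0 rules would simply combine several tests with `and` in one lambda.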
21
Rule Mining in 2002
C5.0
– [# of Chars in Title > 43] => Rejected (Confidence: 0.669, Support: 303)
– [Paper No. < …] => Regular (Confidence: 0.833, Support: 4)
22
Rule Mining in 2002 (Association)
Rules
– Rejected <= [Paper No. < 542.5] (Confidence: 0.88, Support: 41)
– Rejected <= [… 53.5] (Confidence: 0.833, Support: 29)
– Regular <= [Country = Canada] & [Category = Text Mining] (Confidence: 0.6, Support: 5)
Important features: Paper No., Country, Category
23
Comparison with 2002
Important features in 2003
– # of Authors, Paper No., # of Chars in Title
– Early 57 papers, long titles, 2 authors
Important features in 2002
– Paper No., # of Chars in Title, Country, Category
– Early 52 papers, long titles
24
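Conclusions
– Do not submit a paper too early! In 2003, early submissions (Paper No. < 67.5) were rejected with 90% confidence.
  – Reflect not only on the contents but also on the title; long titles were associated with rejection.
– Mining text, Web and semi-structured data is very popular now.
– Statistics and probability is a very strong topic (7 of 15 submissions accepted, 47%).
– Security and privacy issues are becoming stronger (4 of 7 submissions accepted, 57%).
– Visualization and interaction are emerging in ICDM 2003:
  – Visualization/human-machine interaction
  – Postprocessing of DM results
  – Process-centric DM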