Analysis of scientific research Mario Sangiorgio Giordano Tamburrelli
The origin of this work Carlo Ghezzi’s keynote: Reflections on 40+ years of software engineering research and beyond: an insider’s view Analysis based on papers Lack of tools to perform the analysis WHAT: research topics WHO: contributors HOW/WHEN: trends
The origin of this work Time consuming Boring Requires an expert Lack of tools to perform the analysis
Automatic analysis Faster Scalable General method One-click (After training) Feasible with data mining techniques BUT still not perfect (it is not semantic-based)
Steps of the analysis Identification of subtopics Interpretation of paper content Trend analysis (So far) CLUSTERING CLASSIFICATION CLUSTERING STATISTICS
Clustering
Hierarchical Expectation Maximization Algorithm The tool used is Crossbow Thanks to Gianluca Staffiero and Gabriele Valentini Abstracts of papers from both general and specific conferences and journals
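To make the clustering step concrete, here is a minimal sketch of Expectation Maximization over bag-of-words abstracts. It is a flat (non-hierarchical) mixture of multinomials written in plain Python, not the Crossbow tool named on the slide. The function name `em_cluster`, the smoothing constant `alpha`, and the choice to seed cluster j with document j are assumptions made for illustration.

```python
import math
from collections import Counter


def em_cluster(docs, k, iterations=20, alpha=0.1):
    """Cluster tokenized documents with a flat mixture of multinomials.

    Illustrative only: the slides use a hierarchical EM variant in Crossbow.
    """
    counts = [Counter(doc) for doc in docs]
    vocab = sorted({w for doc in docs for w in doc})

    def smoothed(weighted_counts, total):
        # Additive smoothing keeps every word probability strictly positive.
        denom = total + alpha * len(vocab)
        return {w: (weighted_counts.get(w, 0.0) + alpha) / denom for w in vocab}

    # Assumption: seed cluster j with document j (deterministic start).
    thetas = [smoothed(counts[j], sum(counts[j].values())) for j in range(k)]
    priors = [1.0 / k] * k

    for _ in range(iterations):
        # E-step: responsibility of each cluster for each document.
        resp = []
        for c in counts:
            logp = [
                math.log(priors[j])
                + sum(n * math.log(thetas[j][w]) for w, n in c.items())
                for j in range(k)
            ]
            top = max(logp)
            weights = [math.exp(lp - top) for lp in logp]
            z = sum(weights)
            resp.append([w / z for w in weights])

        # M-step: re-estimate word distributions and cluster priors.
        for j in range(k):
            weighted = Counter()
            for c, r in zip(counts, resp):
                for w, n in c.items():
                    weighted[w] += r[j] * n
            thetas[j] = smoothed(weighted, sum(weighted.values()))
            priors[j] = max(sum(r[j] for r in resp) / len(docs), 1e-12)

    # Hard assignment: the most responsible cluster for each document.
    return [max(range(k), key=lambda j: r[j]) for r in resp]


abstracts = [
    "testing coverage testing mutation".split(),
    "formal verification model checking".split(),
    "mutation testing coverage".split(),
    "model checking formal proof".split(),
]
labels = em_cluster(abstracts, k=2)
```

With real abstracts the result depends heavily on the seeding and on the number of clusters, which is consistent with the slides' remark that clustering was iterated until the results were good.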
The clustering process
Classification
Bayesian classifier Ad hoc tool using Mallet Analysis based on the abstract of the papers
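The classification step can be sketched as a multinomial naive Bayes classifier over abstract text. This is a plain-Python illustration under stated assumptions, not the ad hoc Mallet-based tool: the function names `train_naive_bayes` and `classify`, whitespace tokenization, and add-one smoothing are all choices made for the example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_abstracts):
    """Collect per-topic document and word counts from (topic, text) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for topic, text in labeled_abstracts:
        tokens = text.lower().split()  # assumption: naive whitespace tokens
        doc_counts[topic] += 1
        word_counts[topic].update(tokens)
        vocab.update(tokens)
    return {"docs": doc_counts, "words": word_counts, "vocab": vocab}


def classify(model, text):
    """Return the topic with the highest posterior log-probability."""
    total_docs = sum(model["docs"].values())
    vocab_size = len(model["vocab"])
    best_topic, best_score = None, float("-inf")
    for topic, n_docs in model["docs"].items():
        counts = model["words"][topic]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in text.lower().split():
            if token in model["vocab"]:  # skip words never seen in training
                # Add-one (Laplace) smoothed log likelihood.
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_topic, best_score = topic, score
    return best_topic


model = train_naive_bayes([
    ("testing", "mutation testing improves test coverage"),
    ("formal methods", "model checking verifies temporal logic properties"),
])
predicted = classify(model, "model checking of logic")
```

In the pipeline described on the slides, the training topics would come from the clustering step rather than from hand-written labels like the two used here.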
Result evaluation Clustering was iterated until the results were good Classification performs well: high precision and recall values human expert agrees with the classifier
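For reference, precision and recall for one topic can be computed by comparing the classifier's labels with those of a human expert. The sketch below assumes two parallel lists of labels; the function name `precision_recall` and the sample labels are hypothetical and do not reproduce the authors' evaluation data.

```python
def precision_recall(expert_labels, predicted_labels, topic):
    """Precision and recall of the classifier for a single topic."""
    pairs = list(zip(expert_labels, predicted_labels))
    true_pos = sum(1 for e, p in pairs if e == topic and p == topic)
    false_pos = sum(1 for e, p in pairs if e != topic and p == topic)
    false_neg = sum(1 for e, p in pairs if e == topic and p != topic)
    # Guard against empty denominators when a topic never occurs.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


expert = ["testing", "testing", "formal methods", "formal methods"]
predicted_labels = ["testing", "formal methods", "formal methods", "formal methods"]
testing_precision, testing_recall = precision_recall(expert, predicted_labels, "testing")
```

Precision answers "of the papers assigned to this topic, how many belong there", while recall answers "of the papers that belong to this topic, how many were found".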
Outcomes Research analysis: trends on main conferences and journals Tools to support research: automatic bidding
Some trends found Data from IEEE Transactions on Software Engineering
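One simple way to obtain such trends, once every paper has been classified, is to compute the share of each topic per publication year. The sketch below assumes the input is a list of `(year, topic)` pairs; the function name `topic_shares_by_year` and the sample records are hypothetical and are not data from IEEE Transactions on Software Engineering.

```python
from collections import Counter, defaultdict


def topic_shares_by_year(classified_papers):
    """Map each year to the fraction of its papers assigned to each topic."""
    per_year = defaultdict(Counter)
    for year, topic in classified_papers:
        per_year[year][topic] += 1
    return {
        year: {topic: n / sum(counter.values()) for topic, n in counter.items()}
        for year, counter in sorted(per_year.items())
    }


papers = [
    (2000, "testing"),
    (2000, "testing"),
    (2000, "formal methods"),
    (2001, "formal methods"),
]
shares = topic_shares_by_year(papers)
```

Using shares rather than raw counts keeps years with different numbers of published papers comparable.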
Automatic bidding Build upon analysis methodologies and results
Bidding process Grouping the submissions by topic Creation of a profile for the reviewers Matching papers’ topic with reviewers’ interests CLASSIFICATION SELECTION
Grouping the submissions
Creation of the reviewer profile
Matching profiles and submissions
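The profile creation and matching steps can be sketched as follows, assuming a reviewer's profile is the list of most frequent topics among their own classified papers and that each submission has already been assigned one topic. The function names `build_profile` and `suggest_bids`, the `max_topics` cutoff, and the sample identifiers are hypothetical; the actual selection rule used by the authors is not given in the slides.

```python
from collections import Counter


def build_profile(past_paper_topics, max_topics=5):
    """Keep the reviewer's most frequent topics, most frequent first."""
    return [topic for topic, _ in Counter(past_paper_topics).most_common(max_topics)]


def suggest_bids(profile, submissions):
    """Return ids of submissions whose topic appears in the reviewer profile.

    submissions: dict mapping a submission id to its classified topic.
    """
    wanted = set(profile)
    return sorted(pid for pid, topic in submissions.items() if topic in wanted)


profile = build_profile(
    ["models", "models", "web-services", "web-services", "web-services", "education"],
    max_topics=2,
)
submissions = {
    "paper-1": "web-services",
    "paper-2": "software mining",
    "paper-3": "models",
}
suggested = suggest_bids(profile, submissions)
```

Limiting the profile to a few topics trades recall for focus, which fits the slides' point that the suggestions target the most relevant topics.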
Result evaluation ICSM 2010
Reviewers’ profiles Carlo Ghezzi Profile: web-services formal methods middleware for distributed systems models software components education CONFIRMED Harald Gall Profile: software mining middleware for distributed systems models empirical studies Do you agree?
Comparison with actual bids Results apparently not so good: recall is about 53% BUT The actual bid is not an oracle We are suggesting papers for the most relevant topics
Live Testing: ICSE 2011 Propose our bids to the reviewers Get feedback on our suggestions, based on reviewer impressions
Future work Improvement of the system Ranking of the suggested papers Deeper statistical analysis Paper assignment based on genetic algorithms