The Small World of Software Reverse Engineering
Ahmed E. Hassan and Richard C. Holt
SoftWare Architecture Group (SWAG), University of Waterloo
Publications
We study the evolution of a field through its publications. Publications give a picture of:
– Collaboration: a high degree of collaboration in academia, in contrast to industry
– Emergence of topics: hot topics and their effects on collaboration
DBLP
DBLP:
– DataBase systems and Logic Programming
– Digital Bibliography and Library Project
Tracks publications in several conferences and communities, such as:
– WCRE
– Reengineering and maintenance
– Software engineering
Records for each publication:
– Title
– Authors
– Conference: name and year
– Abstract
Data is available online as an XML file
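A minimal sketch of pulling the fields above out of DBLP-style XML, assuming conference papers appear as `inproceedings` elements with `author`, `title`, `year` and `booktitle` children. The sample records and the `load_records` helper are hypothetical, and only title, authors, venue and year are read. The full dblp.xml relies on character entities declared in its DTD, which this standard-library parser does not resolve, so the real file needs a DTD-aware parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like DBLP conference records (not real data).
SAMPLE = """<dblp>
  <inproceedings key="conf/example/A01">
    <author>Alice Example</author>
    <author>Bob Example</author>
    <title>A Hypothetical Paper on Fact Extraction.</title>
    <year>2001</year>
    <booktitle>WCRE</booktitle>
  </inproceedings>
  <inproceedings key="conf/example/B02">
    <author>Bob Example</author>
    <title>Another Hypothetical Paper.</title>
    <year>2002</year>
    <booktitle>ICSM</booktitle>
  </inproceedings>
</dblp>"""

def load_records(xml_text, venues):
    """Return (title, authors, venue, year) for papers at the given venues."""
    records = []
    for paper in ET.fromstring(xml_text).iter("inproceedings"):
        venue = paper.findtext("booktitle", default="")
        if venue not in venues:
            continue  # keep only the community being studied
        authors = [a.text for a in paper.findall("author")]
        records.append((paper.findtext("title"), authors, venue,
                        int(paper.findtext("year"))))
    return records

print(load_records(SAMPLE, {"WCRE"}))
```

Filtering by a set of venue names makes it easy to build the larger communities (for example, several maintenance conferences at once) from the same file.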
Studying Collaboration
We build a social collaboration network from DBLP co-authorship data:
– A node exists for each author
– An edge connects two nodes (authors) if they co-authored a paper
– The size of a node is proportional to the author's number of publications
– The weight of an edge is proportional to the number of papers the two authors co-authored
We use a force-directed algorithm to lay out the network
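The node and edge rules above can be sketched as follows, assuming each paper is given simply as a list of author names. The function name and the toy author names are hypothetical; the layout step is not shown.

```python
from collections import Counter
from itertools import combinations

def build_coauthor_graph(papers):
    """papers: list of author-name lists. Returns (pub_count, edge_weight)."""
    pub_count = Counter()    # node size: number of publications per author
    edge_weight = Counter()  # edge weight: number of co-authored papers per pair
    for authors in papers:
        unique = sorted(set(authors))
        pub_count.update(unique)
        # Sorting first gives every author pair a single canonical key.
        for a, b in combinations(unique, 2):
            edge_weight[(a, b)] += 1
    return pub_count, edge_weight

# Hypothetical toy data: three papers by authors A to D.
pubs, weights = build_coauthor_graph([["A", "B"], ["B", "C"], ["A", "B", "D"]])
print(pubs["B"], weights[("A", "B")])  # 3 2
```

Storing edges in a `Counter` keyed by sorted pairs keeps the graph undirected without duplicating each edge.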
Co-Authorship Graph
Co-Authorship Graph (labeled nodes: 1: E. Burd, 2: E. Stroulia, 3: M. Munro, 4: M. Harman, 5: E. Merlo, 6: G. Canfora, 7: K. Kontogiannis, 8: R. Koschke, 9: R. Holt)
The Largest Component over Time
Small World Graphs
Large graphs in which short paths connect the nodes
Stanley Milgram studied them in the 1960s:
– Letters were given to people in Nebraska
– Each person passed the letter to someone they knew and who they believed could eventually deliver it to a stockbroker in Boston
– The average chain of people between the two cities was about 6: "six degrees of separation"
A collaboration network that is a small world graph:
– is a good indicator of how easily knowledge is communicated among the members of a community
Small World Graph
Characteristic Path Length (L): measures how many individuals, on average, an author has to go through to reach another author
– The average shortest-path length between pairs of nodes in the largest component of the graph
Clustering Coefficient (C): measures how much, on average, the co-authors of an author collaborate with one another
– For a node $v$ with $k_v$ neighbors, $C_v$ is the number of edges $E_v$ among those neighbors divided by the maximum possible number of such edges: $C_v = \frac{2E_v}{k_v(k_v - 1)}$; $C$ is the average of $C_v$ over all nodes
Watts and Strogatz give a formal definition of small world graphs by comparing C and L with those of a random graph with the same number of nodes and edges: $L \gtrsim L_{\text{random}}$ and $C \gg C_{\text{random}}$
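Both measures can be computed directly from the definitions above. This is a minimal sketch, assuming the network is stored as a dict mapping each author to the set of co-authors (a hypothetical layout; edge weights are ignored because L counts hops and C counts edges). Giving nodes with fewer than two neighbours $C_v = 0$ is an assumed convention; the slide does not say how they are treated.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def largest_component(adj):
    """Node set of the largest connected component."""
    seen, best = set(), set()
    for node in adj:
        if node not in seen:
            comp = set(bfs_distances(adj, node))
            seen |= comp
            if len(comp) > len(best):
                best = comp
    return best

def characteristic_path_length(adj):
    """L: mean shortest-path length over all pairs in the largest component."""
    comp = largest_component(adj)
    total = pairs = 0
    for u in comp:
        for v, d in bfs_distances(adj, u).items():
            if v != u:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0

def clustering_coefficient(adj):
    """C: mean over all nodes of C_v = 2*E_v / (k_v * (k_v - 1))."""
    scores = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            scores.append(0.0)  # assumed convention for degree 0 or 1
            continue
        # E_v: edges among the neighbours, each pair counted once.
        e_v = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        scores.append(2 * e_v / (k * (k - 1)))
    return sum(scores) / len(scores)

# Hypothetical toy network: a triangle A-B-C plus D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(f"L = {characteristic_path_length(adj):.3f}, C = {clustering_coefficient(adj):.3f}")
# L = 1.333, C = 0.583
```

Running one breadth-first search per node costs O(n·m), which is acceptable for networks of a few thousand authors.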
WCRE is a small world graph!
– Clustering coefficient: 0.76
– Characteristic path length: 4.3
Towards a Standard Schema for C/C++, by Ferenc, Sim, Holt, Koschke and Gyimothy: 3.94, 4.32
Author      Centrality
Canfora     2.76
Koschke     2.88
Merlo       2.94
De Lucia    3.1
Holt        3.2
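The slide does not define "centrality". The listed values (2.76 to 3.2) are below the characteristic path length of 4.3, which is consistent with an average-distance measure. The sketch below therefore assumes that an author's centrality is the mean shortest-path distance to the other authors in the same component, with lower values meaning more central. This is an assumption, not the authors' stated method, and the helper names and toy data are hypothetical.

```python
from collections import deque

def average_distance(adj, source):
    """Mean hop distance from source to every other node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for node, d in dist.items() if node != source]
    return sum(others) / len(others) if others else float("inf")

def most_central(adj, top=5):
    """Authors ranked by average distance, smallest (most central) first.
    Assumes adj has already been restricted to the largest component."""
    scores = {author: average_distance(adj, author) for author in adj}
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))[:top]

# Hypothetical chain of five authors: A-B-C-D-E.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "E"}, "E": {"D"}}
print(most_central(chain, top=3))  # [('C', 1.5), ('B', 1.75), ('D', 1.75)]
```

Ties are broken alphabetically so that the ranking is deterministic.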
Analysis of Paper Titles for Emerging Terms
Bigger Small Worlds in SE
We compare the WCRE results against two other research communities:
– Maintenance and Reengineering (MR): WCRE, IWPC, CSMR, ICSM
– Software Engineering (SE): MR plus 17 other conferences
DBLP data is less complete for these conferences
The Largest Component over Time
Slow, steady growth, then rapid growth once researchers know each other
SE has slower growth:
– Fewer conferences in the early days
– Incomplete DBLP data
– Wider scope
MR and SE have grown rapidly since 1996:
– The Internet and ?
MR and WCRE have been growing since the late 90s:
– Y2K?
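One way to produce a curve of the largest component over time is to add papers year by year to a cumulative co-authorship graph and track its connected components with a union-find structure. This is a minimal sketch, assuming papers are given as `(year, author_list)` pairs; the function name and toy data are hypothetical, and the slide may have plotted a normalized value rather than the raw size.

```python
from itertools import groupby

def largest_component_by_year(papers):
    """papers: list of (year, author_list).
    Returns {year: size of the largest connected component} of the cumulative
    co-authorship graph built from every paper published up to that year."""
    parent, size = {}, {}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            if size[ra] < size[rb]:  # union by size
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]

    result = {}
    by_year = sorted(papers, key=lambda paper: paper[0])
    for year, group in groupby(by_year, key=lambda paper: paper[0]):
        for _, authors in group:
            for a in authors:
                if a not in parent:
                    parent[a], size[a] = a, 1
            for other in authors[1:]:
                union(authors[0], other)  # co-authors join one component
        result[year] = max((size[find(a)] for a in parent), default=0)
    return result

# Hypothetical toy data: two separate pairs in 1993 are joined in 1995.
print(largest_component_by_year(
    [(1993, ["A", "B"]), (1993, ["C", "D"]), (1995, ["B", "C"]), (1995, ["E"])]))
# {1993: 2, 1995: 4}
```

Union-find avoids recomputing the components from scratch each year, because edges are only ever added to the cumulative graph.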
Most Central Authors over Time
WCRE, MR, SE vs. other fields
Title Spectrograph (joint work with J. Wu)
(Figure: spectrograph of stemmed title terms; terms shown: reverse, Java, compon, orient, object, software, program, experi, system, engin, data, design, abstract)
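The slide gives only the stemmed terms. This minimal sketch shows the kind of counts such a spectrograph could be drawn from. It assumes that each cell is the share of a year's paper titles containing a term, and it matches stems as word prefixes ("compon" matches "component"). Prefix matching is a crude stand-in for a real stemmer such as Porter's. The function name and the titles are hypothetical.

```python
import re
from collections import defaultdict

def title_spectrogram(papers, stems):
    """papers: list of (year, title); stems: lowercase stems.
    Returns {year: {stem: fraction of that year's titles containing a word
    that starts with the stem}}, with years in ascending order."""
    counts = defaultdict(lambda: defaultdict(int))
    totals = defaultdict(int)
    for year, title in papers:
        totals[year] += 1
        words = re.findall(r"[a-z]+", title.lower())
        for stem in stems:
            if any(w.startswith(stem) for w in words):  # prefix match as a crude stem
                counts[year][stem] += 1
    return {year: {stem: counts[year][stem] / totals[year] for stem in stems}
            for year in sorted(totals)}

# Hypothetical titles, not taken from DBLP.
titles = [
    (1998, "Reverse Engineering of Component Systems"),
    (1998, "Object-Oriented Design Recovery"),
    (2002, "Java Components in Practice"),
]
spec = title_spectrogram(titles, ["reverse", "compon", "java", "orient"])
print(spec[1998])  # {'reverse': 0.5, 'compon': 0.5, 'java': 0.0, 'orient': 0.5}
```

Normalizing by the number of titles per year keeps years with many papers from dominating the picture.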
Conclusion
A meta-paper on publications and collaboration networks in the WCRE, MR and SE communities
Small world collaboration networks facilitate the exchange of ideas and results in a community
Many of the techniques presented could be used to study the evolution of software systems (with files or developers as nodes)
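Generating Small World Graphs Using Random Rewiring
(Figure: a regular lattice has large L and large C; a fully random graph has small L and small C; randomly rewiring a few lattice edges gives a small world graph with small L and large C)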
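The rewiring idea can be sketched as a simplified variant of the Watts and Strogatz procedure. The sketch starts from a ring lattice in which each node links to its k nearest neighbours. It then visits each original edge once and, with probability p, moves one endpoint to a uniformly chosen node that is not already a neighbour, which avoids self-loops and duplicate edges. All names are hypothetical. `path_length` averages over all reachable pairs, which only approximates L if rewiring happens to disconnect a node.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """n nodes on a ring, each linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj

def rewire(adj, p, rng):
    """With probability p, move each original edge's endpoint v to a random
    new neighbour w of u. Modifies adj in place and returns it."""
    n = len(adj)
    for u, v in sorted((u, v) for u in adj for v in adj[u] if u < v):
        if rng.random() < p:
            candidates = [w for w in range(n) if w != u and w not in adj[u]]
            if candidates:
                w = rng.choice(candidates)
                adj[u].discard(v)
                adj[v].discard(u)
                adj[u].add(w)
                adj[w].add(u)
    return adj

def clustering(adj):
    """Average C_v; nodes with fewer than two neighbours count as 0."""
    total = 0.0
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
            total += 2 * links / (k * (k - 1))
    return total / len(adj)

def path_length(adj):
    """Average BFS hop distance over all reachable ordered pairs."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(42)  # fixed seed so runs are repeatable
for p in (0.0, 0.01, 0.1, 1.0):
    g = rewire(ring_lattice(200, 6), p, rng)
    print(f"p={p:<5} C={clustering(g):.3f} L={path_length(g):.2f}")
```

At p = 0 the lattice gives C = 0.600 exactly. As p grows, L usually drops well before C does, and that intermediate range is the small world regime shown in the figure.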
Percentage of Papers in a year
Number of new co-authors in a year