Mining Logical Clones in Software: Revealing High-Level Business & Programming Rules Wenyi Qian 1, Xin Peng 1, Zhenchang Xing 2, Stan Jarzabek 3, Wenyun Zhao 1 1 Fudan University, China 2 Nanyang Technological University, Singapore 3 National University of Singapore, Singapore
Logical Clones
–May not be well documented
–Reveal high-level rules
Logical Clones
Logical clones consist of:
–Similar methods
–Similar code fragments
–Similar entity classes
–Persistent data objects
Logical Clones
Today's clone/similarity detection techniques:
–Simple clones (text, token, AST…)
–Structural clones (recurring configurations of simple clones)
–Similar design structures (similarity metrics, machine learning)
They are not enough to detect high-level clones:
–they lack high-level information
–they need predefined templates, such as specific design patterns
Approach Overview (figure: input → abstraction → output)
Program Model
–Methods & functional clusters
–Entity classes
–Code clones
–Persistent data objects
Program Model Methods & functional clusters –Semantic clustering
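The slide does not say how semantic clustering is performed. As a stand-in, here is a minimal sketch that groups methods by the lexical similarity of their names: identifiers are split into words, compared with Jaccard similarity, and linked into clusters (single-link, via union-find) when the similarity reaches a threshold. The functions `tokens`, `jaccard`, and `cluster`, the 0.4 threshold, and the example method names are all hypothetical; a real semantic clustering might also draw on comments, method bodies, or call relations.

```python
import re


def tokens(name: str) -> set[str]:
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return {p.lower() for p in parts}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster(methods: list[str], threshold: float = 0.4) -> list[list[str]]:
    """Single-link clustering: link two methods if their name similarity >= threshold."""
    parent = {m: m for m in methods}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    toks = {m: tokens(m) for m in methods}
    for i, a in enumerate(methods):
        for b in methods[i + 1:]:
            if jaccard(toks[a], toks[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for m in methods:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


print(cluster(["payByCheck", "payByGiftCard", "clearPayment",
               "getCustomerName", "setCustomerName"]))
# [['clearPayment'], ['getCustomerName', 'setCustomerName'], ['payByCheck', 'payByGiftCard']]
```

Single-link keeps the sketch short; it can chain loosely related methods together, so a stricter linkage or a higher threshold may suit larger code bases.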
Program Model Entity classes –Encapsulate information with getters/setters
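A minimal sketch of one way to flag entity classes, assuming each class is summarized only by its field names and method names (the hypothetical `ClassInfo` type). The heuristic itself is an assumption, not taken from the slides: every field must have a getter (`get…` or `is…`) and a setter, and most methods must be such accessors; the 0.8 ratio is an arbitrary illustrative default.

```python
from dataclasses import dataclass


@dataclass
class ClassInfo:
    """Hypothetical summary of a class: its field and method names."""
    name: str
    fields: list[str]
    methods: list[str]


def _cap(field_name: str) -> str:
    # "name" -> "Name", used to build accessor names.
    return field_name[:1].upper() + field_name[1:]


def is_entity_class(c: ClassInfo, min_accessor_ratio: float = 0.8) -> bool:
    """Heuristic: every field has a getter and setter, and most methods are accessors."""
    if not c.fields:
        return False
    methods = set(c.methods)
    accessors: set[str] = set()
    for f in c.fields:
        cap = _cap(f)
        getters = {"get" + cap, "is" + cap}
        setter = "set" + cap
        # Each field needs at least one getter form and a setter.
        if not (getters & methods) or setter not in methods:
            return False
        accessors |= getters | {setter}
    ratio = sum(m in accessors for m in c.methods) / len(c.methods)
    return ratio >= min_accessor_ratio


customer = ClassInfo("Customer", ["name", "active"],
                     ["getName", "setName", "isActive", "setActive"])
screen = ClassInfo("ReportScreen", ["session"], ["render", "refresh"])
print(is_entity_class(customer), is_entity_class(screen))  # True False
```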
Program Model Code clones –Simple clones in different methods
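To illustrate simple clones across different methods, here is a small detector in the general style of token-based clone detection, not necessarily what the tool uses: identifiers and numbers are normalized, each method body is cut into windows of k consecutive tokens, and two methods are reported as a clone pair if they share at least one window. `normalize`, `clone_pairs`, the keyword set, k = 8, and the example method bodies are hypothetical choices.

```python
import re
from itertools import combinations

# Hypothetical keyword set; identifiers outside it are normalized to "ID".
KEYWORDS = {"if", "else", "for", "while", "return", "new", "null", "true", "false"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(code: str) -> list[str]:
    """Tokenize code, mapping identifiers to ID and number literals to NUM."""
    out = []
    for t in TOKEN_RE.findall(code):
        if t[0].isdigit():
            out.append("NUM")
        elif (t[0].isalpha() or t[0] == "_") and t not in KEYWORDS:
            out.append("ID")
        else:
            out.append(t)
    return out


def windows(toks: list[str], k: int) -> set[tuple[str, ...]]:
    """All runs of k consecutive tokens (empty if the body is shorter than k)."""
    return {tuple(toks[i:i + k]) for i in range(len(toks) - k + 1)}


def clone_pairs(methods: dict[str, str], k: int = 8) -> list[tuple[str, str]]:
    """Report method pairs whose normalized bodies share a k-token window."""
    win = {name: windows(normalize(body), k) for name, body in methods.items()}
    return [(a, b) for a, b in combinations(sorted(methods), 2) if win[a] & win[b]]


methods = {
    "payCheck": "total = total + amount; if (total > 100) return null;",
    "payGiftCard": "sum = sum + value; if (sum > 50) return null;",
    "clear": "items.clear();",
}
print(clone_pairs(methods))  # [('payCheck', 'payGiftCard')]
```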
Program Model Persistent data objects –Database tables or data entries in files
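Taken together, the program model can be pictured as a typed graph. The sketch below assumes that each node has one of the five kinds listed on the Program Model slide and that nodes are connected by named relations; the `ProgramModel` class, the relation names (`member_of`, `uses`, `writes`), and the example nodes are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    """Node kinds listed on the Program Model slide."""
    METHOD = "method"
    FUNCTIONAL_CLUSTER = "functional cluster"
    ENTITY_CLASS = "entity class"
    CODE_CLONE = "code clone"
    PERSISTENT_DATA = "persistent data object"


@dataclass
class ProgramModel:
    # node name -> kind
    kinds: dict[str, Kind] = field(default_factory=dict)
    # (source, relation, target) triples; relation names are hypothetical
    edges: set[tuple[str, str, str]] = field(default_factory=set)

    def add_node(self, name: str, kind: Kind) -> None:
        self.kinds[name] = kind

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        # Reject edges whose endpoints were never declared as nodes.
        for n in (src, dst):
            if n not in self.kinds:
                raise KeyError(f"unknown node: {n}")
        self.edges.add((src, relation, dst))

    def targets(self, src: str, relation: str) -> list[str]:
        """Sorted targets reachable from src via one relation."""
        return sorted(d for s, r, d in self.edges if s == src and r == relation)


m = ProgramModel()
m.add_node("payByCheck", Kind.METHOD)
m.add_node("PaymentCluster", Kind.FUNCTIONAL_CLUSTER)
m.add_node("Payment", Kind.ENTITY_CLASS)
m.add_node("PAYMENT_TABLE", Kind.PERSISTENT_DATA)
m.add_edge("payByCheck", "member_of", "PaymentCluster")
m.add_edge("payByCheck", "uses", "Payment")
m.add_edge("payByCheck", "writes", "PAYMENT_TABLE")
print(m.targets("payByCheck", "writes"))  # ['PAYMENT_TABLE']
```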
Mining Process (figure: example with nodes PosScreen, processPay, PosPayCheck, PosPayGiftCard, PosClearPayment)
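The mining step can be approximated as finding recurring patterns over abstracted nodes. In the simplified sketch below, each concrete node is mapped to an abstract label (for example, its functional cluster), simple paths are enumerated, and a label sequence is reported when it has at least 3 nodes and at least 2 distinct instances, the thresholds reported in the case study. Restricting patterns to paths, the `max_nodes` bound, the function name `mine_path_patterns`, and the example graph are assumptions; the actual approach may mine more general structures.

```python
from collections import defaultdict


def mine_path_patterns(edges, label, min_nodes=3, max_nodes=4, min_instances=2):
    """Group simple paths by their abstract label sequence.

    edges: iterable of (src, dst) pairs between concrete nodes.
    label: dict mapping every concrete node to an abstract label.
    Returns {label_sequence: sorted list of concrete path instances}
    for sequences with at least min_instances distinct instances.
    """
    succ = defaultdict(list)
    for src, dst in edges:
        succ[src].append(dst)

    instances = defaultdict(set)

    def extend(path):
        # Record every path that is long enough to count as a pattern.
        if len(path) >= min_nodes:
            instances[tuple(label[n] for n in path)].add(tuple(path))
        if len(path) == max_nodes:
            return
        for nxt in succ[path[-1]]:
            if nxt not in path:  # simple paths only: no node repeats
                extend(path + [nxt])

    for start in sorted(label):
        extend([start])

    return {
        pattern: sorted(found)
        for pattern, found in instances.items()
        if len(found) >= min_instances
    }


# Hypothetical example: two payment flows that differ only in the handler used.
edges = [("ScreenA", "processPayA"), ("processPayA", "payByCheck"),
         ("ScreenB", "processPayB"), ("processPayB", "payByGiftCard"),
         ("ScreenA", "logout")]
label = {"ScreenA": "Screen", "ScreenB": "Screen",
         "processPayA": "processPay", "processPayB": "processPay",
         "payByCheck": "PayHandler", "payByGiftCard": "PayHandler",
         "logout": "Session"}
print(mine_path_patterns(edges, label))
```

The abstraction step does the real work here: without the label map, the two payment flows share no nodes and no pattern would be found.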
Tool: MiLoCo
Case Study
Project: Opentaps
–14,351 classes & interfaces
–253,743 methods
1,690 logical clones mined
–each with at least 3 nodes & 2 instances
Case Study
Categories of Logical Clones
Categories of mined logical clones (manually classified):
–Programming Convention (37%)
–Design Structure (24%)
–Business Task (23%)
–Business Process (16%)
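As a quick check on the category breakdown, the percentages can be turned back into approximate counts over the 1,690 mined clones. Because the slide gives rounded percentages, these counts are estimates derived here, not figures reported by the study; `approx_counts` is a hypothetical helper.

```python
TOTAL_CLONES = 1690
# Category shares in percent, as listed on the slide.
SHARES = {
    "Programming Convention": 37,
    "Design Structure": 24,
    "Business Task": 23,
    "Business Process": 16,
}


def approx_counts(total: int, shares: dict[str, int]) -> dict[str, int]:
    """Convert rounded percentage shares into approximate absolute counts."""
    return {name: round(total * pct / 100) for name, pct in shares.items()}


print(approx_counts(TOTAL_CLONES, SHARES))
# {'Programming Convention': 625, 'Design Structure': 406, 'Business Task': 389, 'Business Process': 270}
```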
Categories of Logical Clones Programming Convention –Similar ways to implement similar functions
Categories of Logical Clones Design Structure –Similar interaction structures
Categories of Logical Clones Business Task –Similar ways to implement similar business tasks
Categories of Logical Clones Business Process –Similar business processes or sub-processes
Human Study
5 senior graduate students, 2 questions:
–Helpful for program understanding? YES
–Helpful for reuse/evolution? YES
Discussion
–Logical clones help reuse, even without knowledge of code details
–Developers with good domain knowledge make better use of logical clones
–Integrating MiLoCo with IDEs would make logical clones more useful
Conclusion
–The concept of logical clones
–An approach for mining logical clones
–The tool: MiLoCo
–A case study showing that logical clones help software understanding, reuse and maintenance
Thanks for your attention!