Finding File Clones in FreeBSD Ports Collection


1 Finding File Clones in FreeBSD Ports Collection
Yusuke Sasaki Tetsuo Yamamoto Yasuhiro Hayase Katsuro Inoue

2 File Clones
Two or more files with the same content
* Comments and code indentation ignored
* Inside a project or between different projects
Research about file-clones is scarce
* Get new knowledge about file-clones
Example (the same file in Project A and Project B):
int main() { printf("Hello msr!"); return 0; }

3 FCFinder
Input: .c and .h files
Output: File-clone sets
Detection: Tokenization, MD5 Hash Calculation, Exact Matching
Faster than other tools:
Tool         Speed                    Relative speed   Machines
CCFinder     1.4M files / 960 hours   x1               1 PC
D-CCFinder   1.4M files / 51 hours    x19              80 PCs
FCFinder     1.4M files / hours       x55
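The three detection steps listed on this slide (tokenization, MD5 hash calculation, exact matching) can be illustrated with a minimal Python sketch. It is not FCFinder's implementation: the regex tokenizer is a crude stand-in for a real C lexer, and the names `tokenize`, `fingerprint`, and `find_file_clone_sets` are hypothetical. Files whose token streams are identical after comments and whitespace are dropped receive the same digest and land in the same file-clone set.

```python
import hashlib
import re
from collections import defaultdict

# Crude C tokenizer (an assumption, not FCFinder's actual lexer).
# Comments come first in the alternation so they can be discarded;
# string and character literals are kept as single tokens.
_TOKEN = re.compile(
    r'/\*.*?\*/|//[^\n]*'      # block and line comments
    r'|"(?:\\.|[^"\\])*"'      # string literal
    r"|'(?:\\.|[^'\\])*'"      # character literal
    r'|\w+|[^\s\w]',           # identifiers/numbers, or one punctuation char
    re.DOTALL,
)


def tokenize(source):
    """Return the token list of a C source string, without comments."""
    return [
        m.group(0)
        for m in _TOKEN.finditer(source)
        if not m.group(0).startswith(("/*", "//"))
    ]


def fingerprint(source):
    """MD5 digest of the token stream; whitespace and comments do not affect it."""
    joined = "\x00".join(tokenize(source))  # NUL separator keeps token boundaries
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def find_file_clone_sets(files):
    """Group paths by digest; `files` maps a path to its source text."""
    groups = defaultdict(list)
    for path, source in files.items():
        groups[fingerprint(source)].append(path)
    # Only groups with two or more files are file-clone sets.
    return sorted(sorted(paths) for paths in groups.values() if len(paths) > 1)
```

Hashing makes the exact-matching step a dictionary lookup, so the cost grows roughly linearly with the number of files rather than with the number of file pairs.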

4 Experiment
Target: Only .c and .h files in the FreeBSD Ports Collection
* ~1.4M files
* ~12 GB
* 17.16 hours
We measured:
* File size
* Number of files in each project
* Size of each file-clone set
* Number of file-clones in a project
These values follow the power law
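One common way to check a power-law claim such as the one on this slide is a least-squares line fit on log-log axes, reporting the exponent and R^2. The sketch below assumes that approach; the slides do not say how the fits were computed, and `fit_power_law` is a hypothetical helper. It fits y = c * x^k to positive data.

```python
import math


def fit_power_law(xs, ys):
    """Fit y = c * x**k by least squares on log-log axes.

    Returns (k, c, r_squared). All xs and ys must be positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    k = sxy / sxx                      # slope of the log-log line = exponent
    c = math.exp(mean_y - k * mean_x)  # intercept, mapped back from log space
    r_squared = (sxy * sxy) / (sxx * syy)
    return k, c, r_squared
```

For example, `xs` could be file-clone set sizes and `ys` the number of sets of each size. An R^2 close to 1 means the points lie near a straight line on log-log axes, which is consistent with a power law but does not prove one.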

5 File-clone Set Size
[Figure: two plots of file clone set size (axis ticks from 5 to 100), each with a fit label "R^2 =". Point labels: D, E. Annotations: Left: used in PHP5; Right: used in PHP4; used in both PHP4 and 5; L: 650 sets, R: 500 sets; 419 sets; 120 file clones; L: 61 file clones, R: 59 file clones.]

6 File-clones per Project
[Figure: plot of the number of file clone sets (axis ticks from 5 to 10K), with a fit label "R^2 =". Point label: G. Annotations: Left: PHP5 modules; Center: projects related to bin-utils; Right: PHP4 modules.]

7 File-clones Between Projects (1/3)
* Nodes show the projects
* Edges between projects show the number of file clones between two projects
Ex) gcc41 and gfortran share 7691 file clones
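The edge weights of such a graph can be derived from the file-clone sets. The sketch below assumes one plausible reading of "number of file clones between two projects": the number of clone sets that contain at least one file from each project. The slides do not define the count precisely, and `project_edges` and the `(project, path)` input format are hypothetical.

```python
from collections import Counter
from itertools import combinations


def project_edges(clone_sets):
    """Count, for every pair of projects, the clone sets they both appear in.

    `clone_sets` is an iterable of clone sets; each clone set is a list of
    (project, path) tuples. Keys of the result are sorted project pairs.
    """
    edges = Counter()
    for clone_set in clone_sets:
        # Each project counts once per set, however many copies it holds.
        projects = sorted({project for project, _path in clone_set})
        for pair in combinations(projects, 2):
            edges[pair] += 1
    return edges
```

Clone sets that lie entirely inside one project add no edge, which matches a graph whose edges only connect different projects.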

8 File-clones Between Projects (2/3)
* Nodes show the projects
* Edges between projects show the number of file clones between two projects

9 File-clones Between Projects (3/3)
* Nodes show the projects
* Edges between projects show the number of file clones between two projects

10 Conclusions & Future Work
Conclusions
* Measured several features of the FreeBSD Ports Collection
* Found that the measured features follow the power law
Future Work
* Investigation of logical coupling between projects

