Speaker: Liu Shuchang, Osaka University

Presentation transcript:

Extraction of Evolution History from Software Source Code Using Linear Counting. Speaker: Liu Shuchang, Osaka University 1

Background: in daily software development, existing code is copied to create a product variant; repeated copying and editing leads to software evolution. 2

Evolution History Example (figure): only source code is available. 3

Introduction. Evolution History Recovery: product variants, using only source code. Evolution Tree: vertex = variant; edge = derived relation (most similar pair); key = product similarity. Previous Study: diff-based (file-to-file similarity); time-consuming (worst case: 2 days). Linear Counting Algorithm: estimating instead of calculating. 4

Linear Counting Algorithm (example). Bitmap Size: 8; Zero bits: 2; Cardinality: 11. Estimate: -8 × ln(2/8) ≈ 11.09. 5
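
The estimate on this slide can be reproduced with a minimal Python sketch. The function names `build_bitmap` and `linear_count` are hypothetical, and SHA-256 from the standard library is used only as a stand-in hash; the talk's best configuration names MurmurHash3, which is not in the Python standard library.

```python
import hashlib
import math


def build_bitmap(items, m):
    """Hash every item into an m-slot bitmap (True = slot hit at least once).

    Assumption: SHA-256 stands in for the hash function used in the talk.
    """
    bitmap = [False] * m
    for item in items:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        bitmap[int.from_bytes(digest[:8], "big") % m] = True
    return bitmap


def linear_count(bitmap):
    """Linear Counting estimate: -m * ln(zero_slots / m)."""
    m = len(bitmap)
    zeros = bitmap.count(False)
    if zeros == 0:
        # The estimator is undefined for a saturated bitmap.
        raise ValueError("bitmap is saturated; use a larger bitmap")
    return -m * math.log(zeros / m)


# The slide's example: 8 slots, 2 of them still zero.
slide_bitmap = [True] * 6 + [False] * 2
print(round(linear_count(slide_bitmap), 2))
```

Duplicates in the input set the same slot again and do not change the bitmap, so the value estimated is the number of distinct items.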

Estimate Product Similarity. Initialization: Multiset A → hash function → Bit Map A; Multiset B → hash function → Bit Map B. Bitwise operators combine them into Bit Map A∩B and Bit Map A∪B. Similarity: Jaccard Index |A∩B| / |A∪B|, estimated as LC(A∩B) / LC(A∪B). 6
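
A rough Python sketch of this slide follows, under several assumptions: bitwise AND and OR are taken to be the "bitwise operators" that produce Bit Map A∩B and Bit Map A∪B, SHA-256 stands in for the hash function, and all function names are hypothetical. Python integers serve as bitmaps so that `&` and `|` apply directly.

```python
import hashlib
import math


def bitmap_of(items, m):
    """Return an integer whose low m bits form the bitmap of the items."""
    bits = 0
    for item in items:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        bits |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return bits


def linear_count(bits, m):
    """Linear Counting estimate for an m-bit bitmap stored in an int."""
    zeros = m - bin(bits).count("1")
    if zeros == 0:
        raise ValueError("bitmap is saturated; use a larger bitmap")
    return -m * math.log(zeros / m)


def estimated_jaccard(a_items, b_items, m=1 << 16):
    """Estimate |A∩B| / |A∪B| as LC(A∩B) / LC(A∪B)."""
    a = bitmap_of(a_items, m)
    b = bitmap_of(b_items, m)
    union = linear_count(a | b, m)  # assumed: OR gives Bit Map A∪B
    if union == 0:
        return 1.0  # both inputs empty: treat as identical
    return linear_count(a & b, m) / union  # assumed: AND gives Bit Map A∩B


variant_a = ["int x = 0;", "x += 1;", "return x;"]
variant_b = ["int x = 0;", "x += 2;", "return x;"]
print(estimated_jaccard(variant_a, variant_b))
```

The AND of two bitmaps also keeps slots that different lines happened to hit by collision, so the intersection estimate is only approximate; a larger bitmap makes such collisions rarer.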

Process Flow. Variant A (Source Code) → Initialization → Initial Multiset A; Variant B (Source Code) → Initialization → Initial Multiset B. Initialization options: 1. n-gram modeling; 2. each line of the code. Linear Counting Algorithm → Jaccard Index |A∩B| / |A∪B| for every pair (A, B), (A, C), (A, D), … Prim's Algorithm, taking the most similar pair at each step → Evolution Tree. 7
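
The last step of the flow can be sketched as Prim's algorithm run on similarities instead of distances, so that each step adds the most similar pair. This is an illustration under assumptions, not the speaker's implementation: the function name `evolution_tree`, the `frozenset`-keyed similarity table, and the example scores are all hypothetical.

```python
def evolution_tree(variants, similarity):
    """Build a spanning tree that greedily prefers the most similar pairs.

    variants:   list of variant names
    similarity: dict mapping frozenset({u, v}) to a similarity score
    Returns a list of (tree_vertex, new_vertex) edges in the order added.
    """
    if not variants:
        return []
    in_tree = {variants[0]}
    edges = []
    while len(in_tree) < len(variants):
        # Prim's step: best edge crossing from the tree to a new vertex.
        best = max(
            ((u, v) for u in in_tree for v in variants if v not in in_tree),
            key=lambda edge: similarity[frozenset(edge)],
        )
        edges.append(best)
        in_tree.add(best[1])
    return edges


# Made-up scores for three variants, for illustration only.
scores = {
    frozenset({"A", "B"}): 0.9,
    frozenset({"B", "C"}): 0.8,
    frozenset({"A", "C"}): 0.3,
}
print(evolution_tree(["A", "B", "C"], scores))
```

The order of each returned pair only records which vertex was already in the tree; similarity alone does not say which variant was derived from which.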

Research Data (table): a description of the datasets we dealt with. 8

Final Result of dataset5 (figures): the Evolution Tree we extracted (with the Best Configuration); the existing actual evolution history. 9

Analysis on Bitmap Size (figure): part of the experiment results of dataset5. 10

Best Configuration. Main Factors: N-gram Modeling: no (each line of code); Bitmap Size: 128,000,000 bits; Hashing Function: MurmurHash3. Results: Proper Edges: 86.5% (on average); Time: 10 s to 5 min. 11

Contributions and Future Work. Contributions: extract an ideal Evolution Tree efficiently; influence of various factors; best configuration; faster, with better accuracy. Future Work: larger datasets; other programming languages; solve the remaining problems. 12