2014 Semantic-based Code and Documentation Search Engine Reshma Thumma Oct 10,2014 #GHC14 2014.


1 Semantic-based Code and Documentation Search Engine. Reshma Thumma, Oct 10, 2014, #GHC14

2 The Web is big. Courtesy: Wiki, WordPress

3 Data on the Web
• Data is locked up in small data islands.
• Other applications usually cannot access this data.

4 Semantic Web
• Apply semantics to the data.
  - Change it into linked data.

5 Project Idea
• We tried to come up with a semantic data search for one such database.
  - People in software often rely on these databases/repositories.

6 Code search engines
• Open source software repositories

7 Why Source Code Search?
• Large amounts of source code are added online all the time.
• Documentation of the source code is generally not attached to it.
• Code is unorganized and distributed among different sources.
• Many versions of software systems, needed for similarity analysis.

8 Why Source Code Search? (cont.)
• Enormous amount of source code (GitHub alone has approximately 10 million projects).
• Very large and complex.
• Most query systems available online use either keyword-based or meta-information-based search.
• Not easy to search and analyze.
• How to understand code in growing open source repositories?

9 Related Work
• Searching mechanisms
  - Search by keyword
  - Search by tag
  - Search by meta information

10 Comparing existing systems

| Search engine | Query type | Uses structure | Search used for |
|---|---|---|---|
| Google Code | Tag based | No | Projects |
| SourceForge | Tag based | No | Projects |
| GitHub | Keyword and meta-information based search | No | Projects, source code |
| Koders | Structure based search | Yes | Source code |
| Codase | Structure based search | Yes | Source code |
| Krugle | Keyword and meta-information based search | No | Source code |
| SparsJ | Keyword based | No | Classes |

11 Proposed system
• Semantic Code Search
  - Structure is extracted from source code.
  - Semantic queries are generated by a query generator.
  - Semantic query execution searches source code based on program structure.
• Documentation Search
  - Semantic query execution searches documentation in the DBpedia datasets.

12 Architecture

13 Document Search

14 DBpedia

15 DBpedia

16 DBpedia

17 Documentation Search
• Collect datasets from DBpedia (structured data from Wikipedia).
• Generate SPARQL queries on the datasets using Apache Jena (query engine).
• Determine a resource.
• Generate a query based on the user's choice of the sub-resource.
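
The query-generation step above can be sketched roughly as follows. This is a minimal illustration, not the presenter's implementation: the class `DocumentationQueryBuilder` and its `build` method are hypothetical names, and the sketch only assembles the SPARQL text that a query engine such as Apache Jena could then execute. The two namespaces are the public DBpedia resource and ontology namespaces; treating the user's "sub-resource" choice as an ontology property such as `abstract` is an assumption.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: builds a SPARQL query for one DBpedia resource and one property. */
final class DocumentationQueryBuilder {
    // Public DBpedia namespaces; the class and method names here are illustrative only.
    private static final String RESOURCE_NS = "http://dbpedia.org/resource/";
    private static final String ONTOLOGY_NS = "http://dbpedia.org/ontology/";
    // Accept only a conservative character set so user input cannot alter the query shape.
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z0-9_]+");

    private DocumentationQueryBuilder() {}

    /** Returns a SELECT query for the values of the given property on the given resource. */
    static String build(String resource, String property) {
        if (!SAFE_NAME.matcher(resource).matches() || !SAFE_NAME.matcher(property).matches()) {
            throw new IllegalArgumentException("Unsupported resource or property name");
        }
        return "SELECT ?value WHERE {\n"
             + "  <" + RESOURCE_NS + resource + "> <" + ONTOLOGY_NS + property + "> ?value .\n"
             // Keep IRIs and English-tagged literals; other literals are dropped.
             + "  FILTER (!isLiteral(?value) || langMatches(lang(?value), \"en\"))\n"
             + "}";
    }
}
```

For example, `DocumentationQueryBuilder.build("Apache_Hadoop", "abstract")` yields a query asking for the English abstract of the Apache Hadoop resource.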

18 Semantic Code Search
• Source code structure extraction
  - Crawl source code from different open source software repositories.
  - Extract the structure of the source code data.

19 Structure extraction
• Uses the Kabbalah model.
• Saves the extracted source code structure as an RDF model.
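
To make "structure saved as RDF" concrete, here is a small sketch that turns a class description into N-Triples lines. Everything about the vocabulary is an assumption: the `http://example.org/code#` namespace and the predicates `inPackage`, `extends`, and `hasMethod` are placeholders invented for illustration and are not the Kabbalah model's actual terms. Only the `rdf:type` IRI is standard.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: serializes one class's structure as N-Triples. */
final class StructureToRdf {
    // Placeholder namespace and predicates; NOT the Kabbalah model's real vocabulary.
    static final String NS = "http://example.org/code#";
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    private StructureToRdf() {}

    private static String triple(String subject, String predicate, String object) {
        return "<" + subject + "> <" + predicate + "> <" + object + "> .";
    }

    /** Describes one class: its package, an optional superclass, and its method names. */
    static List<String> describeClass(String pkg, String cls, String superClass, List<String> methods) {
        String classIri = NS + pkg + "." + cls;
        List<String> triples = new ArrayList<>();
        triples.add(triple(classIri, RDF_TYPE, NS + "Class"));
        triples.add(triple(classIri, NS + "inPackage", NS + pkg));
        if (superClass != null) {
            triples.add(triple(classIri, NS + "extends", NS + superClass));
        }
        for (String method : methods) {
            // Overloaded methods would share an IRI here; a fuller model would add signatures.
            String methodIri = classIri + "." + method;
            triples.add(triple(methodIri, RDF_TYPE, NS + "Method"));
            triples.add(triple(classIri, NS + "hasMethod", methodIri));
        }
        return triples;
    }
}
```

Once the structure is stored this way, a question such as "which classes extend Mapper" becomes a pattern match on the `extends` triples rather than a text search.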

20 Query Generator
• Query generation based on the user's choice
• 5 types of queries
  - Package
  - Class
  - Method
  - Constructor
  - Interface
• Use SPARQL query processing on the RDF data
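
A query generator covering the five query types might look something like the sketch below. The enum mirrors the list on the slide; the rest is assumed for illustration. In particular, the `code:` prefix, the class names `code:Package` through `code:Interface`, and the `code:name` predicate are hypothetical and do not come from the presentation.

```java
import java.util.Locale;

/** Hypothetical sketch: maps a query type and a search term to SPARQL text. */
final class CodeQueryGenerator {
    /** The five query types listed on the slide. */
    enum Kind { PACKAGE, CLASS, METHOD, CONSTRUCTOR, INTERFACE }

    // Assumed vocabulary, for illustration only.
    private static final String PREFIXES =
          "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        + "PREFIX code: <http://example.org/code#>\n";

    private CodeQueryGenerator() {}

    /** Builds a query listing every element of the given kind whose name contains the term. */
    static String generate(Kind kind, String term) {
        // "PACKAGE" -> "Package", matching the assumed class names code:Package, code:Class, ...
        String name = kind.name();
        String type = name.charAt(0) + name.substring(1).toLowerCase(Locale.ROOT);
        // Escape characters that would otherwise end or break the SPARQL string literal.
        String literal = term.replace("\\", "\\\\")
                             .replace("\"", "\\\"")
                             .replace("\n", "\\n")
                             .replace("\r", "\\r");
        return PREFIXES
             + "SELECT ?element ?name WHERE {\n"
             + "  ?element rdf:type code:" + type + " ;\n"
             + "           code:name ?name .\n"
             + "  FILTER (CONTAINS(LCASE(?name), LCASE(\"" + literal + "\")))\n"
             + "}";
    }
}
```

Calling `CodeQueryGenerator.generate(CodeQueryGenerator.Kind.METHOD, "write")` produces a query for methods whose name contains "write", in the spirit of the method-level queries in the evaluation.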

21 Implementation
• Web Crawler
  - Used crawler4j to crawl the open source software repositories.
• Parser
  - Used RdfCoder to parse the source and extract the source code structure.
  - Stored the extracted source code structure as RDF.

22 Implementation
• SPARQL Endpoint
  - Used the Apache Jena query engine to query the RDF data.
  - Used the Apache Jena query engine to query the DBpedia endpoint.
• Web Interface
  - Used Jersey REST web services.
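
Querying a remote endpoint such as DBpedia's comes down to an HTTP request defined by the SPARQL Protocol, with the query text sent in a `query` parameter. The sketch below shows only that URL construction using the JDK; it is not Apache Jena code, and the class name `SparqlRequestUrl` is hypothetical. In the system described here, Jena would handle this step.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical sketch: builds a SPARQL Protocol GET URL without any RDF library. */
final class SparqlRequestUrl {
    private SparqlRequestUrl() {}

    /** Appends the form-encoded query text as the "query" parameter of the endpoint URL. */
    static String forQuery(String endpoint, String sparql) {
        try {
            // URLEncoder uses form encoding, so spaces become '+'.
            return endpoint + "?query=" + URLEncoder.encode(sparql, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }
}
```

For instance, `SparqlRequestUrl.forQuery("https://dbpedia.org/sparql", "SELECT ?s WHERE { ?s ?p ?o }")` gives a URL that could be fetched with any HTTP client.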

23 Web Interface

24 Querying Package Level

25 Results

26 Querying Method Level

27 Results

28 Documentation Search

29 Results

30 Evaluation
• Used Hadoop core as test data for evaluating semantic code search.
• Used GitHub for comparing systems.
• We tested our system and GitHub with specific source code queries and found our system to be comprehensive and accurate.
• Evaluated the system using seven popular queries.

31 Evaluation

| Name | Query | Code Search | GitHub |
|---|---|---|---|
| Q1 | Class Map extends Mapper | Map class | Listed all the files which contained Mapper or Map |
| Q2 | List all the classes in package org.apache.hadoop.conf | All classes in org.apache.hadoop.conf | Listed files containing org, org.apache, org.apache.hadoop, conf |
| Q3 | List all classes which extend Mapper | All classes which extended Mapper | Listed all files which contained Mapper |
| Q4 | Class throwing UnknownHostException | DFShost class | Not found |

32 Evaluation (continued)

| Name | Query | Code Search | GitHub |
|---|---|---|---|
| Q5 | Class implements Closeable | JavaSerializationDeserializer class returned | Not found |
| Q6 | Show me all the public methods in the package org.apache.hadoop.conf | Listed all the methods | Listed 1139 code results which had keyword public or org.apache.hadoop.conf |
| Q7 | List classes containing method write | Listed methods | Not found |
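
The difference the evaluation points at, keyword matching versus structure matching, can be illustrated with a toy in-memory index. This is only a sketch on made-up data: `SearchComparison` and `ClassInfo` are hypothetical names, and neither method reflects how GitHub or the presented system is actually implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch contrasting keyword search with a structural "extends" query. */
final class SearchComparison {
    /** Toy stand-in for one indexed class: its name, its superclass, and its raw source text. */
    static final class ClassInfo {
        final String name;
        final String superClass; // null when no superclass is declared
        final String source;

        ClassInfo(String name, String superClass, String source) {
            this.name = name;
            this.superClass = superClass;
            this.source = source;
        }
    }

    private SearchComparison() {}

    /** Keyword search: any class whose source text mentions the keyword is a hit. */
    static List<String> keywordSearch(List<ClassInfo> index, String keyword) {
        List<String> hits = new ArrayList<>();
        for (ClassInfo c : index) {
            if (c.source.contains(keyword)) {
                hits.add(c.name);
            }
        }
        return hits;
    }

    /** Structural search: only classes whose declared superclass matches are hits. */
    static List<String> extendsSearch(List<ClassInfo> index, String superClass) {
        List<String> hits = new ArrayList<>();
        for (ClassInfo c : index) {
            if (superClass.equals(c.superClass)) {
                hits.add(c.name);
            }
        }
        return hits;
    }
}
```

With an index holding a class `Map` that extends `Mapper` and another class that merely mentions `Mapper` in a comment, the keyword search returns both while the structural search returns only `Map`, which is the pattern seen in Q1 and Q3.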

33 Performance Evaluation

34 Conclusion and Limitations
• The Semantic Code Search and Documentation Search is a unique and efficient way of searching source code with the help of source code semantics.
• Limitations:
  - Supports only Java-based software systems.

35 Future Work
• Rank search results based on structure matching.
• Improve scalability with the help of big data technologies.
• Release as an Eclipse plugin.

36 Got Feedback? Rate and review the session using the GHC Mobile App. To download, visit www.gracehopper.org

