Published by Geoffrey Hunt. Modified over 8 years ago.
1
Semantic-based Code and Documentation Search Engine
Reshma Thumma
Oct 10, 2014
#GHC14
2
The Web is big.
Courtesy: Wiki, WordPress
3
Data on the Web
Data is locked up in small data islands.
Other applications usually cannot access this data.
4
Semantic Web
Apply semantics to the data.
−Change it into linked data.
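As a rough illustration of what "changing data into linked data" can mean, the sketch below turns one flat record into subject-predicate-object triples and prints them in N-Triples form. The record, the `example.org` namespaces, and the function names are all hypothetical; none of them come from the talk.

```python
# Hypothetical record from an isolated "data island".
record = {"id": "hadoop-core", "language": "Java", "license": "Apache-2.0"}

# Assumed namespaces, used only for this illustration.
BASE = "http://example.org/project/"
VOCAB = "http://example.org/vocab#"


def record_to_triples(rec):
    """Turn every non-id field of the record into one (s, p, o) triple."""
    subject = f"<{BASE}{rec['id']}>"
    triples = []
    for key, value in rec.items():
        if key == "id":
            continue  # the id becomes the subject IRI, not a property
        triples.append((subject, f"<{VOCAB}{key}>", f'"{value}"'))
    return triples


def to_ntriples(triples):
    """Serialize triples one per line, N-Triples style."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(record_to_triples(record)))
```

Once the data is in this shape, other applications can follow the IRIs and query the properties instead of parsing a private format.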
5
Project Idea
We set out to build a semantic data search for one such database.
−People in software often rely on these databases/repositories.
6
Code search engines
Open source software repositories
7
Why Source Code Search?
Large amounts of source code are added online continuously.
Documentation is generally not attached to the source code.
Code is unorganized and distributed among different sources.
Many versions of each software system exist, which are needed for similarity analysis.
8
Why Source Code Search? (cont.)
The volume of source code is enormous (GitHub alone has approximately 10 million projects).
The code is very large and complex.
Most query systems available online use either keyword-based or meta-information-based search.
The code is not easy to search and analyze.
How can we understand code in growing open source repositories?
9
Related Work
Searching mechanisms
−Search by keyword
−Search by tag
−Search by meta information
10
Comparing existing systems
| Search engine | Query type | Uses structure | Search used for |
|---|---|---|---|
| Google Code | Tag based | No | Projects |
| SourceForge | Tag based | No | Projects |
| GitHub | Keyword and meta-information-based search | No | Projects, source code |
| Koders | Structure-based search | Yes | Source code |
| Codase | Structure-based search | Yes | Source code |
| Krugle | Keyword and meta-information-based search | No | Source code |
| SPARS-J | Keyword based | No | Classes |
11
Proposed system
Semantic Code Search
−Structure is extracted from source code.
−Semantic queries are generated by a query generator.
−Semantic query execution searches source code based on program structure.
Documentation Search
−Semantic query execution searches documentation in the DBpedia datasets.
12
Architecture
13
Document Search
14
DBpedia
15
DBpedia
16
DBpedia
17
Documentation Search
Collect datasets from DBpedia (structured data from Wikipedia).
Generate SPARQL queries on the datasets using Apache Jena (query engine).
Determine a resource.
Generate a query based on the user's choice of the subResource.
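The two query-generation steps above (determine a resource, then build a query from the chosen subResource) could look roughly like the sketch below. It only builds the SPARQL text; running it against the DBpedia endpoint with Apache Jena is left out. The mapping from subResource names to properties and the function name are assumptions for illustration; `dbo:abstract`, `rdfs:comment`, and `rdfs:label` are existing properties in DBpedia data, but the talk does not say which ones the system uses.

```python
PREFIXES = (
    "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
)

# Assumed mapping from the user's subResource choice to an RDF property.
SUB_RESOURCES = {
    "abstract": "dbo:abstract",
    "comment": "rdfs:comment",
    "label": "rdfs:label",
}


def build_documentation_query(resource_name, sub_resource, lang="en"):
    """Return SPARQL text that fetches one property of a DBpedia resource."""
    if sub_resource not in SUB_RESOURCES:
        raise ValueError(f"unknown subResource: {sub_resource!r}")
    # DBpedia resource IRIs use underscores in place of spaces.
    resource_iri = (
        "http://dbpedia.org/resource/" + resource_name.strip().replace(" ", "_")
    )
    prop = SUB_RESOURCES[sub_resource]
    return (
        PREFIXES
        + "SELECT ?value WHERE {\n"
        + f"  <{resource_iri}> {prop} ?value .\n"
        + f'  FILTER (lang(?value) = "{lang}")\n'
        + "}"
    )


if __name__ == "__main__":
    print(build_documentation_query("Apache Hadoop", "abstract"))
```

The language filter is a design choice of this sketch: DBpedia stores text properties in many languages, so an unfiltered query would return one row per language.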
18
Semantic Code Search
Source code structure extraction
−Crawl source code from different open source software repositories.
−Extract the structure of the source code data.
19
Structure extraction
Uses the Kabbalah model.
Saves the extracted source code structure as an RDF model.
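To give a feel for what "source code structure as RDF" means, here is a deliberately simplified sketch that pulls package, class, superclass, and interface names out of Java text with regular expressions and records them as triples. It is not RDFCoder and does not reproduce the Kabbalah model; the `example.org/code#` vocabulary is hypothetical, and the regex ignores generics, nested types, and comments.

```python
import re

VOCAB = "http://example.org/code#"  # hypothetical vocabulary, not Kabbalah

CLASS_RE = re.compile(
    r"\bclass\s+(\w+)"                        # class name
    r"(?:\s+extends\s+(\w+))?"                # optional superclass
    r"(?:\s+implements\s+([\w\s,]+?))?"       # optional interface list
    r"\s*\{"
)


def extract_structure(java_source):
    """Return (subject, predicate, object) triples for each class found."""
    package = re.search(r"^\s*package\s+([\w.]+)\s*;", java_source, re.MULTILINE)
    pkg_name = package.group(1) if package else ""
    triples = []
    for match in CLASS_RE.finditer(java_source):
        name, parent, interfaces = match.groups()
        subject = f"{pkg_name}.{name}" if pkg_name else name
        triples.append((subject, VOCAB + "type", "Class"))
        if pkg_name:
            triples.append((subject, VOCAB + "inPackage", pkg_name))
        if parent:
            triples.append((subject, VOCAB + "extends", parent))
        if interfaces:
            for iface in interfaces.split(","):
                triples.append((subject, VOCAB + "implements", iface.strip()))
    return triples


SAMPLE = """
package org.example;

public class Map extends Mapper implements Closeable {
}
"""

if __name__ == "__main__":
    for triple in extract_structure(SAMPLE):
        print(triple)
```

A real extractor would use a Java parser rather than regular expressions; the point here is only the shape of the output, where each structural fact becomes one queryable triple.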
20
Query Generator
Query generation is based on the user's choice.
5 types of queries
−Package
−Class
−Method
−Constructor
−Interface
Uses SPARQL query processing on the RDF data.
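One way to picture the query generator is as a table of SPARQL templates, one per query type, filled in with the name the user supplies. The sketch below assumes a hypothetical `code:` vocabulary (`code:inPackage`, `code:hasMethod`, and so on); the predicates the actual system uses are not given in the talk.

```python
import re

CODE_PREFIX = "PREFIX code: <http://example.org/code#>\n"  # hypothetical

# One template per query type listed above; predicates are assumptions.
QUERY_TEMPLATES = {
    "package": 'SELECT ?class WHERE {{ ?class code:inPackage "{name}" . }}',
    "class": 'SELECT ?class WHERE {{ ?class a code:Class ; code:name "{name}" . }}',
    "method": (
        "SELECT ?class ?method WHERE {{ ?class code:hasMethod ?method . "
        '?method code:name "{name}" . }}'
    ),
    "constructor": (
        'SELECT ?ctor WHERE {{ ?class code:name "{name}" ; '
        "code:hasConstructor ?ctor . }}"
    ),
    "interface": (
        "SELECT ?class WHERE {{ ?class code:implements ?iface . "
        '?iface code:name "{name}" . }}'
    ),
}


def generate_query(kind, name):
    """Build SPARQL text for one of the five query types."""
    template = QUERY_TEMPLATES.get(kind.lower())
    if template is None:
        raise ValueError(f"unsupported query type: {kind!r}")
    # Accept only Java-style identifiers so user input cannot alter the query.
    if not re.fullmatch(r"[A-Za-z_$][\w.$]*", name):
        raise ValueError(f"not a valid Java name: {name!r}")
    return CODE_PREFIX + template.format(name=name)


if __name__ == "__main__":
    print(generate_query("package", "org.apache.hadoop.conf"))
```

Keeping the templates in a dictionary means adding a sixth query type is a one-line change, and the identifier check guards against malformed input reaching the SPARQL engine.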
21
Implementation
Web Crawler
−Used crawler4j to crawl the open source software repositories.
Parser
−Used RDFCoder to parse the source code and extract its structure.
−Stored the extracted source code structure as RDF.
22
Implementation
SPARQL Endpoint
−Used the Apache Jena query engine to query RDF.
−Used the Apache Jena query engine to query the DBpedia endpoint.
Web Interface
−Used Jersey REST web services.
23
Web Interface
24
Querying Package Level
25
Results
26
Querying Method Level
27
Results
28
Documentation Search
29
Results
30
Evaluation
Used Hadoop core as test data for evaluating semantic code search.
Used GitHub as the system to compare against.
Evaluated both systems using seven popular source code queries.
For these queries, we found our system's results to be comprehensive and accurate.
31
Evaluation
| Name | Query | Code Search | GitHub |
|---|---|---|---|
| Q1 | Class Map extends Mapper | Map class | Listed all the files which contained Mapper or Map |
| Q2 | List all the classes in package org.apache.hadoop.conf | All classes in org.apache.hadoop.conf | Listed files containing org, org.apache, org.apache.hadoop, conf |
| Q3 | List all classes which extend Mapper | All classes which extended Mapper | Listed all files which contained Mapper |
| Q4 | Class throwing UnknownHostException | DFShost class | Not found |
32
Evaluation (cont.)
| Name | Query | Code Search | GitHub |
|---|---|---|---|
| Q5 | Class implements Closeable | JavaSerializationDeserializer class returned | Not found |
| Q6 | Show me all the public methods in the package org.apache.hadoop.conf | Listed all the methods | Listed 1139 code results which had keyword public or org.apache.hadoop.conf |
| Q7 | List classes containing method write | Listed methods | Not found |
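The difference the tables show for a query such as Q3 can be reproduced on a toy scale. The sketch below contrasts a plain keyword match with a match on the `extends` relationship over two made-up files; it illustrates the idea only and does not reflect how either evaluated system is implemented.

```python
import re

# Two hypothetical files: only the first actually extends Mapper.
FILES = {
    "WordCountMap.java": "public class WordCountMap extends Mapper { }",
    "JobRunner.java": "public class JobRunner { /* configures the Mapper */ }",
}


def keyword_search(files, term):
    """Return every file whose text contains the term anywhere."""
    return sorted(name for name, text in files.items() if term in text)


def structural_search(files, parent):
    """Return only files declaring a class that extends the given parent."""
    pattern = re.compile(
        r"\bclass\s+\w+\s+extends\s+" + re.escape(parent) + r"\b"
    )
    return sorted(name for name, text in files.items() if pattern.search(text))


if __name__ == "__main__":
    print("keyword:   ", keyword_search(FILES, "Mapper"))
    print("structural:", structural_search(FILES, "Mapper"))
```

The keyword search reports both files because the word appears in a comment, while the structural search reports only the class that really has Mapper as its superclass.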
33
Performance Evaluation
34
Conclusion and Limitations
The Semantic Code Search and Documentation Search is a unique and efficient way of searching source code with the help of source code semantics.
Limitations:
−Supports only Java-based software systems.
35
Future Work
Ranking search results based on structure matching.
Scalability with the help of big data technologies.
Release as an Eclipse plugin.
36
Got Feedback?
Rate and review the session using the GHC Mobile App.
To download, visit www.gracehopper.org