Semantic-based Code and Documentation Search Engine, Reshma Thumma, Oct 10, 2014, #GHC14

Presentation transcript:

The Web is big. (Courtesy: Wiki, WordPress)

Data on the Web
• Data is locked up in small data islands.
• Other applications usually cannot access this data.

Semantic Web
• Apply semantics to the data.
  − Change it into linked data.

Project Idea
• We set out to build a semantic data search for one such database.
  − People in software often rely on these databases/repositories.

Code Search Engines
• Open source software repositories

Why Source Code Search?
• Large amounts of source code are continually added online.
• Documentation is generally not attached to the source code.
• Source code is unorganized and distributed among different sources.
• Many versions of software systems exist, and they are needed for similarity analysis.

Why Source Code Search? (continued)
• There is an enormous amount of source code (GitHub alone has approximately 10 million projects).
• It is very large and complex.
• Most query systems available online use either keyword-based or meta-information-based search.
• It is not easy to search and analyze.
• How can we understand the code in growing open source repositories?

Related Work
• Searching mechanisms
  − Search by keyword
  − Search by tag
  − Search by meta-information

Comparing Existing Systems
Search engine | Query type | Uses structure | Search used for
Google Code | Tag based | No | Projects
SourceForge | Tag based | No | Projects
GitHub | Keyword and meta-information based | No | Projects, source code
Koders | Structure based | Yes | Source code
Codase | Structure based | Yes | Source code
Krugle | Keyword and meta-information based | No | Source code
SparsJ | Keyword based | No | Classes

Proposed System
• Semantic Code Search
  − Structure is extracted from the source code.
  − Semantic queries are generated by a query generator.
  − The semantic queries are executed to search source code based on program structure.
• Documentation Search
  − Semantic queries are executed to search documentation in the DBpedia datasets.

Architecture

Document Search

DBpedia

Documentation Search
• Collect datasets from DBpedia (structured data extracted from Wikipedia).
• Generate SPARQL queries over the datasets using Apache Jena (query engine).
• Determine a resource.
• Generate a query based on the user's choice of sub-resource.
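The documentation-search steps above (determine a DBpedia resource, then generate a query from the user's chosen sub-resource) can be sketched as plain query construction. This is a minimal sketch under assumptions: the mapping from a "sub-resource" choice to a DBpedia ontology property, the example title, and the class and method names are hypothetical. In the described system Apache Jena executes the query; here only the query text and the request URL for the public DBpedia endpoint are produced, and nothing is sent over the network.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical sketch: build a SPARQL query for one DBpedia resource and one
// user-selected "sub-resource", mapped here to a DBpedia ontology property.
class DocQueryBuilder {
    // Assumed mapping from UI choices to DBpedia property URIs (illustrative only).
    static final Map<String, String> SUB_RESOURCES = Map.of(
        "abstract", "http://dbpedia.org/ontology/abstract",
        "developer", "http://dbpedia.org/ontology/developer",
        "license", "http://dbpedia.org/ontology/license");

    static final String ENDPOINT = "https://dbpedia.org/sparql";

    // Step 1: determine the resource URI from a Wikipedia-style title.
    static String resourceUri(String title) {
        return "http://dbpedia.org/resource/" + title.trim().replace(' ', '_');
    }

    // Step 2: generate a query for the chosen sub-resource.
    static String buildQuery(String title, String subResource) {
        String property = SUB_RESOURCES.get(subResource);
        if (property == null) {
            throw new IllegalArgumentException("Unknown sub-resource: " + subResource);
        }
        StringBuilder q = new StringBuilder("SELECT ?value WHERE {\n");
        q.append("  <").append(resourceUri(title)).append("> <")
         .append(property).append("> ?value .\n");
        if (subResource.equals("abstract")) {
            // Abstracts are stored in many languages; keep only English.
            q.append("  FILTER (lang(?value) = \"en\")\n");
        }
        return q.append("}").toString();
    }

    // URL that would send the query to the public endpoint (not executed here).
    static String requestUrl(String query) {
        return ENDPOINT + "?query=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String query = buildQuery("Apache Hadoop", "abstract");
        System.out.println(query);
        System.out.println(requestUrl(query));
    }
}
```

Rejecting unknown sub-resources keeps arbitrary user text out of the property position of the query.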

Semantic Code Search
• Source code structure extraction
  − Crawl source code from different open source software repositories.
  − Extract the structure of the source code.

Structure Extraction
• Uses the kabbalah model.
• Saves the extracted source code structure as an RDF model.
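To illustrate what "saving extracted source code structure as an RDF model" can look like, here is a toy extractor that turns one Java source file into RDF triples in N-Triples syntax. It is not RdfCoder and does not use the kabbalah model: it relies on regular expressions, handles only simple class declarations, and its vocabulary (`http://example.org/code#` with `inPackage`, `extends`, `implements`, `hasMethod`) is a hypothetical stand-in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy structure extractor (NOT RdfCoder): regular expressions over one Java
// source file, emitting N-Triples in a hypothetical vocabulary.
class StructureExtractor {
    static final String NS = "http://example.org/code#"; // hypothetical namespace
    static final String RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";

    static final Pattern PACKAGE =
        Pattern.compile("^\\s*package\\s+([\\w.]+)\\s*;", Pattern.MULTILINE);
    // class Name [extends Super<...>] [implements A, B] {
    static final Pattern CLASS = Pattern.compile(
        "\\bclass\\s+(\\w+)(?:\\s+extends\\s+([\\w.]+)(?:<[^{]*>)?)?"
        + "(?:\\s+implements\\s+([\\w.,\\s]+?))?\\s*\\{");
    // visibility [static] ReturnType name(   (constructors are not matched)
    static final Pattern METHOD = Pattern.compile(
        "\\b(?:public|protected|private)\\s+(?:static\\s+)?[\\w<>\\[\\]]+\\s+(\\w+)\\s*\\(");

    static String uri(String localName) {
        return "<" + NS + localName + ">";
    }

    static String triple(String subject, String predicate, String object) {
        return subject + " " + predicate + " " + object + " .";
    }

    static List<String> extract(String source) {
        List<String> triples = new ArrayList<>();
        Matcher pm = PACKAGE.matcher(source);
        String pkg = pm.find() ? pm.group(1) : "";
        Matcher cm = CLASS.matcher(source);
        if (!cm.find()) {
            return triples; // no simple class declaration recognised
        }
        String cls = uri(pkg.isEmpty() ? cm.group(1) : pkg + "." + cm.group(1));
        triples.add(triple(cls, RDF_TYPE, uri("Class")));
        if (!pkg.isEmpty()) {
            triples.add(triple(cls, uri("inPackage"), uri(pkg)));
        }
        if (cm.group(2) != null) {
            triples.add(triple(cls, uri("extends"), uri(cm.group(2))));
        }
        if (cm.group(3) != null) {
            for (String iface : cm.group(3).split(",")) {
                triples.add(triple(cls, uri("implements"), uri(iface.trim())));
            }
        }
        Matcher mm = METHOD.matcher(source);
        while (mm.find()) {
            triples.add(triple(cls, uri("hasMethod"), "\"" + mm.group(1) + "\""));
        }
        return triples;
    }

    public static void main(String[] args) {
        String src = "package org.example.wc;\n"
            + "public class Map extends Mapper<Object, Text, Text, IntWritable> {\n"
            + "  public void map(Object key, Text value, Context ctx) {}\n"
            + "}\n";
        extract(src).forEach(System.out::println);
    }
}
```

A real extractor would work from a parsed syntax tree rather than regular expressions; the point here is only the shape of the output: every structural fact becomes a subject-predicate-object triple that SPARQL can query.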

Query Generator
• Query generation based on the user's choice
• 5 types of queries:
  − Package
  − Class
  − Method
  − Constructor
  − Interface
• Uses SPARQL query processing on the RDF data
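A query generator for the five query types can be sketched as a mapping from (query type, name) to a SPARQL pattern. The vocabulary below (`code:inPackage`, `code:extends`, `code:hasMethod`, `code:hasConstructor`, `code:paramType`, `code:implements`) and the meaning given to a constructor query are assumptions for illustration, not the ontology the original system used. The comments pair each type with a similar query from the evaluation.

```java
import java.util.regex.Pattern;

// Hypothetical query generator for the five query types. The vocabulary is
// assumed for illustration; it is not the original system's ontology.
class QueryGenerator {
    enum QueryType { PACKAGE, CLASS, METHOD, CONSTRUCTOR, INTERFACE }

    static final String NS = "http://example.org/code#"; // hypothetical namespace
    // Accept only Java-name-like input so user text cannot alter the query shape.
    static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z_$][\\w$.]*");

    static String generate(QueryType type, String name) {
        if (!SAFE_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("Not a Java name: " + name);
        }
        String term = "<" + NS + name + ">";
        String pattern;
        switch (type) {
            case PACKAGE:     // "List all the classes in package org.apache.hadoop.conf"
                pattern = "?class code:inPackage " + term + " .";
                break;
            case CLASS:       // "List all classes which extend Mapper"
                pattern = "?class code:extends " + term + " .";
                break;
            case METHOD:      // "List classes containing method write"
                pattern = "?class code:hasMethod \"" + name + "\" .";
                break;
            case CONSTRUCTOR: // assumed: classes with a constructor taking this parameter type
                pattern = "?class code:hasConstructor ?ctor .\n  ?ctor code:paramType " + term + " .";
                break;
            case INTERFACE:   // "Class implements Closeable"
                pattern = "?class code:implements " + term + " .";
                break;
            default:
                throw new IllegalStateException("Unhandled type: " + type);
        }
        return "PREFIX code: <" + NS + ">\n"
            + "SELECT DISTINCT ?class WHERE {\n  " + pattern + "\n}";
    }

    public static void main(String[] args) {
        System.out.println(generate(QueryType.CLASS, "Mapper"));
    }
}
```

Generating queries from a fixed set of templates, rather than letting users write SPARQL, keeps the interface simple and limits user input to a validated name.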

Implementation
• Web crawler
  − Used the crawler4j software to crawl the open source software repositories.
• Parser
  − Used RdfCoder to parse the source and extract the source code structure.
  − Stored the extracted structure as RDF.
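The crawling step can be illustrated with a toy breadth-first crawler. This is not crawler4j: as an assumption for offline illustration, pages and links come from an in-memory map instead of HTTP, the repository URLs are hypothetical, and a `.java` suffix stands in for detecting source files. The split between a `shouldVisit` filter and the per-page handling loosely mirrors how crawler4j organizes a crawl, but the class and its methods here are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy breadth-first crawler (NOT crawler4j). Links come from an in-memory map
// instead of HTTP so the traversal logic can run offline.
class SourceCrawler {
    private final Map<String, List<String>> links; // page URL -> outgoing links
    private final String allowedPrefix;            // stay inside one repository
    private final int maxPages;                    // crawl budget

    SourceCrawler(Map<String, List<String>> links, String allowedPrefix, int maxPages) {
        this.links = links;
        this.allowedPrefix = allowedPrefix;
        this.maxPages = maxPages;
    }

    // Filter applied to every discovered link before it enters the frontier.
    boolean shouldVisit(String url) {
        return url.startsWith(allowedPrefix);
    }

    // Visits up to maxPages pages breadth-first and returns the .java URLs seen.
    List<String> crawl(String seed) {
        List<String> javaFiles = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        seen.add(seed);
        frontier.add(seed);
        int visited = 0;
        while (!frontier.isEmpty() && visited < maxPages) {
            String url = frontier.poll();
            visited++;
            if (url.endsWith(".java")) {
                javaFiles.add(url); // a real crawler would fetch the file and pass it to the parser
            }
            for (String next : links.getOrDefault(url, List.of())) {
                if (shouldVisit(next) && seen.add(next)) {
                    frontier.add(next);
                }
            }
        }
        return javaFiles;
    }

    public static void main(String[] args) {
        Map<String, List<String>> site = Map.of(
            "https://repo.example/hadoop/", List.of(
                "https://repo.example/hadoop/conf/", "https://ads.example/banner"),
            "https://repo.example/hadoop/conf/", List.of(
                "https://repo.example/hadoop/conf/Configuration.java",
                "https://repo.example/hadoop/"));
        SourceCrawler crawler = new SourceCrawler(site, "https://repo.example/hadoop/", 100);
        System.out.println(crawler.crawl("https://repo.example/hadoop/"));
    }
}
```

The `seen` set prevents revisiting pages that link back to each other, and the prefix filter keeps the crawl from wandering off the repository.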

Implementation
• SPARQL endpoint
  − Used the Apache Jena query engine to query the RDF data.
  − Used the Apache Jena query engine to query the DBpedia endpoint.
• Web interface
  − Used Jersey REST web services.

Web Interface

Querying at Package Level

Results

Querying at Method Level

Results

Documentation Search

Results

Evaluation
• Used Hadoop core as test data for evaluating semantic code search.
• Used GitHub as the system to compare against.
• We ran the same specific source code queries against our system and GitHub and found our system's results to be comprehensive and accurate.
• Evaluated the system using seven popular queries.

Evaluation
Name | Query | Code Search | GitHub
Q1 | Class Map extends Mapper | Map class | Listed all the files which contained Mapper or Map
Q2 | List all the classes in package org.apache.hadoop.conf | All classes in org.apache.hadoop.conf | Listed files containing org, org.apache, org.apache.hadoop, conf
Q3 | List all classes which extend Mapper | All classes which extended Mapper | Listed all files which contained Mapper
Q4 | Class throwing UnknownHostException | DFShost class | Not found

Evaluation (continued)
Name | Query | Code Search | GitHub
Q5 | Class implements Closeable | JavaSerializationDeserializer class returned | Not found
Q6 | Show me all the public methods in the package org.apache.hadoop.conf | Listed all the methods | Listed 1139 code results which had the keyword public or org.apache.hadoop.conf
Q7 | List classes containing method write | Listed methods | Not found
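The gap on queries such as Q3 comes from what each engine matches: a keyword engine returns any file whose text contains "Mapper", while a structural query matches only recorded "extends" facts. The sketch below contrasts the two on a few hypothetical files and hypothetical extracted facts; it illustrates the difference in matching and is not a reimplementation of either evaluated system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative contrast (not the evaluated systems): keyword matching over raw
// text versus matching over extracted "extends" facts, for a Q3-style query.
class KeywordVsStructure {
    // Keyword search: every file whose text mentions the word anywhere.
    static List<String> keywordSearch(Map<String, String> files, String word) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : files.entrySet()) {
            if (e.getValue().contains(word)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // Structural search: only classes whose recorded superclass is the target.
    static List<String> structuralSearch(Map<String, String> superclassOf, String target) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : superclassOf.entrySet()) {
            if (target.equals(e.getValue())) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical files; TreeMap keeps the output order deterministic.
        Map<String, String> files = new TreeMap<>(Map.of(
            "WordCount.java", "class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {}",
            "MapperUtil.java", "class MapperUtil { /* helpers for Mapper setup */ }",
            "Job.java", "class Job { void setMapperClass(Class<?> c) {} }"));
        // Structure facts as an extractor might record them: class -> superclass.
        Map<String, String> superclassOf = new TreeMap<>(Map.of(
            "TokenizerMapper", "Mapper",
            "MapperUtil", "Object",
            "Job", "Object"));
        System.out.println("keyword:    " + keywordSearch(files, "Mapper"));
        System.out.println("structural: " + structuralSearch(superclassOf, "Mapper"));
    }
}
```

Here the keyword search returns all three files, because "Mapper" also appears in a class name, a comment, and a method name, while the structural search returns only `TokenizerMapper`.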

Performance Evaluation

Conclusion and Limitations
• Semantic Code Search and Documentation Search offer a unique and efficient way of searching source code using source code semantics.
• Limitations:
  − Supports only Java-based software systems.

Future Work
• Rank search results based on structure matching.
• Improve scalability using Big Data technologies.
• Release as an Eclipse plugin.
