Scalable Keyword Search on Large RDF Data. Abstract Keyword search is a useful tool for exploring large RDF datasets. Existing techniques either rely.

Slides:



Advertisements
Similar presentations
Abstract There is significant need to improve existing techniques for clustering multivariate network traffic flow record and quickly infer underlying.
Advertisements

Abstract Shortest distance query is a fundamental operation in large-scale networks. Many existing methods in the literature take a landmark embedding.
On the Node Clone Detection inWireless Sensor Networks.
Optimizing Cloud Resources for Delivering IPTV Services Through Virtualization.
Toward a Statistical Framework for Source Anonymity in Sensor Networks.
Annotating Search Results from Web Databases. Abstract An increasing number of databases have become web accessible through HTML form-based search interfaces.
A Secure Protocol for Spontaneous Wireless Ad Hoc Networks Creation.
Personalized QoS-Aware Web Service Recommendation and Visualization.
Abstract Provable data possession (PDP) is a probabilistic proof technique for cloud service providers (CSPs) to prove the clients' data integrity without.
Crowdsourcing Predictors of Behavioral Outcomes. Abstract Generating models from large data sets—and deter¬mining which subsets of data to mine—is becoming.
Secure Encounter-based Mobile Social Networks: Requirements, Designs, and Tradeoffs.
Cross-Domain Privacy-Preserving Cooperative Firewall Optimization.
NICE :Network Intrusion Detection and Countermeasure Selection in Virtual Network Systems.
Dynamic Resource Allocation Using Virtual Machines for Cloud Computing Environment.
Fast Nearest Neighbor Search with Keywords. Abstract Conventional spatial queries, such as range search and nearest neighbor retrieval, involve only conditions.
Security Evaluation of Pattern Classifiers under Attack.
A Framework for Mining Signatures from Event Sequences and Its Applications in Healthcare Data.
BestPeer++: A Peer-to-Peer Based Large-Scale Data Processing Platform.
Improving Network I/O Virtualization for Cloud Computing.
Privacy Preserving Data Sharing With Anonymous ID Assignment
Mobile Relay Configuration in Data-Intensive Wireless Sensor Networks.
m-Privacy for Collaborative Data Publishing
PACK: Prediction-Based Cloud Bandwidth and Cost Reduction System
Tweet Analysis for Real-Time Event Detection and Earthquake Reporting System Development.
Optimal Client-Server Assignment for Internet Distributed Systems.
Protecting Sensitive Labels in Social Network Data Anonymization.
Identity-Based Secure Distributed Data Storage Schemes.
Enabling Dynamic Data and Indirect Mutual Trust for Cloud Computing Storage Systems.
Hiding in the Mobile Crowd: Location Privacy through Collaboration.
LARS*: An Efficient and Scalable Location-Aware Recommender System.
Anonymization of Centralized and Distributed Social Networks by Sequential Clustering.
Accuracy-Constrained Privacy-Preserving Access Control Mechanism for Relational Data.
Content Sharing over Smartphone-Based Delay- Tolerant Networks.
Abstract Link error and malicious packet dropping are two sources for packet losses in multi-hop wireless ad hoc network. In this paper, while observing.
A System for Denial-of- Service Attack Detection Based on Multivariate Correlation Analysis.
Modeling the Pairwise Key Predistribution Scheme in the Presence of Unreliable Links.
Scalable Distributed Service Integrity Attestation for Software-as-a-Service Clouds.
Anomaly Detection via Online Over-Sampling Principal Component Analysis.
A Generalized Flow-Based Method for Analysis of Implicit Relationships on Wikipedia.
Keyword Query Routing.
Document Clustering for Forensic Analysis: An Approach for Improving Computer Inspection.
Towards Online Shortest Path Computation. Abstract The online shortest path problem aims at computing the shortest path based on live traffic circumstances.
Abstract With the advent of cloud computing, data owners are motivated to outsource their complex data management systems from local sites to the commercial.
Facilitating Document Annotation using Content and Querying Value.
Privacy Preserving Back- Propagation Neural Network Learning Made Practical with Cloud Computing.
Clustering Sentence-Level Text Using a Novel Fuzzy Relational Clustering Algorithm.
Two tales of privacy in online social networks. Abstract Privacy is one of the friction points that emerges when communications get mediated in Online.
Participatory Privacy: Enabling Privacy in Participatory Sensing
Preventing Private Information Inference Attacks on Social Networks.
Video Dissemination over Hybrid Cellular and Ad Hoc Networks.
Abstract We propose two novel energy-aware routing algorithms for wireless ad hoc networks, called reliable minimum energy cost routing (RMECR) and reliable.
DCIM: Distributed Cache Invalidation Method for Maintaining Cache Consistency in Wireless Mobile Networks.
Supporting Privacy Protection in Personalized Web Search.
Twitsper: Tweeting Privately. Abstract Although online social networks provide some form of privacy controls to protect a user's shared content from other.
m-Privacy for Collaborative Data Publishing
A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using MapReduce on Cloud.
Multiparty Access Control for Online Social Networks : Model and Mechanisms.
A New Algorithm for Inferring User Search Goals with Feedback Sessions.
Data Mining with Big Data. Abstract Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development.
Harnessing the Cloud for Securely Outsourcing Large- Scale Systems of Linear Equations.
Securing Broker-Less Publish/Subscribe Systems Using Identity-Based Encryption.
Privacy-Enhanced Web Service Composition. Abstract Data as a Service (DaaS) builds on service-oriented technologies to enable fast access to data resources.
Privacy-Preserving and Content-Protecting Location Based Queries.
Mona: Secure Multi-Owner Data Sharing for Dynamic Groups in the Cloud.
Whole Test Suite Generation. Abstract Not all bugs lead to program crashes, and not always is there a formal specification to check the correctness of.
Distributed Processing of Probabilistic Top-k Queries in Wireless Sensor Networks.
Facilitating Document Annotation Using Content and Querying Value.
Fast Transmission to Remote Cooperative Groups: A New Key Management Paradigm.
Dynamic Query Forms for Database Queries. Abstract Modern scientific databases and web databases maintain large and heterogeneous data. These real-world.
Spatial Approximate String Search. Abstract This work deals with the approximate string search in large spatial databases. Specifically, we investigate.
Presentation transcript:

Scalable Keyword Search on Large RDF Data

Abstract Keyword search is a useful tool for exploring large RDF datasets. Existing techniques either rely on constructing a distance matrix for pruning the search space or building summaries from the RDF graphs for query processing. In this work, we show that existing techniques have serious limitations in dealing with realistic, large RDF data with tens of millions of triples. Furthermore, the existing summarization techniques may lead to incorrect/incomplete results. To address these issues, we propose an effective summarization algorithm to summarize the RDF data.

Abstract con… Given a keyword query, the summaries lend significant pruning powers to exploratory keyword search and result in much better efficiency compared to previous works. Unlike existing techniques, our search algorithms always return correct results. Besides, the summaries we built can be updated incrementally and efficiently. Experiments on both benchmark and large real RDF data sets show that our techniques are scalable and efficient.

Existing System The RDF (Resource Description Framework) is the de-facto standard for data representation on the Web. So, it is no surprise that we are inundated with large amounts of rapidly growing RDF data from disparate domains. For instance, the Linked Ope n Data (LOD) initiative integrates billions of entities from hundreds of sources. Just one of these sources, the DBpedia dataset, describes more than 3.64 million things using more than 1 billion RDF triples; and it contains numerous keywords, as shown in Figure 1. Keyword search is an important tool for exploring and searching large data corpuses whose structure is either un­known, or constantly changing. So, keyword search has al­ready been studied in the context of relational databases [3], [7], [15], [20], XML documents [8], [23], and more recently over graphs [14], [17] and RDF data [11], [24].

ArchitectureDiagram

System Specification HARDWARE REQUIREMENTS Processor : intel Pentium IV Ram : 512 MB Hard Disk : 80 GB HDD SOFTWARE REQUIREMENTS Operating System : windows XP / Windows 7 FrontEnd : Java BackEnd : MySQL 5

CONCLUSION We studied the problem of scalable keyword search on big RDF data and proposed a new summary-based solution: (i) we construct a concise summary at the type level from RDF data; (ii) during query evaluation, we leverage the summary to prune away a significant portion of RDF data from the search space, and formulate SPARQL queries for efficiently accessing data. Furthermore, the proposed summary can be incrementally updated as the data get updated. Experiments on both RDF benchmark and real RDF datasets showed that our solution is efficient, scalable, and portable across RDF engines. An interesting future direction is to leverage the summary for optimizing generic SPARQL queries on large RDF datasets.

THANK YOU