Annotating Search Results from Web Databases. Abstract An increasing number of databases have become web accessible through HTML form-based search interfaces.

Slides:



Advertisements
Similar presentations
Abstract Shortest distance query is a fundamental operation in large-scale networks. Many existing methods in the literature take a landmark embedding.
Advertisements

Optimizing Cloud Resources for Delivering IPTV Services Through Virtualization.
Toward a Statistical Framework for Source Anonymity in Sensor Networks.
Abstract Cloud data center management is a key problem due to the numerous and heterogeneous strategies that can be applied, ranging from the VM placement.
A Secure Protocol for Spontaneous Wireless Ad Hoc Networks Creation.
Back-Pressure-Based Packet-by-Packet Adaptive Routing in Communication Networks.
Personalized QoS-Aware Web Service Recommendation and Visualization.
Abstract Provable data possession (PDP) is a probabilistic proof technique for cloud service providers (CSPs) to prove the clients' data integrity without.
WARNINGBIRD: A Near Real-time Detection System for Suspicious URLs in Twitter Stream.
Discovering Emerging Topics in Social Streams via Link Anomaly Detection.
Crowdsourcing Predictors of Behavioral Outcomes. Abstract Generating models from large data sets—and deter¬mining which subsets of data to mine—is becoming.
Secure Encounter-based Mobile Social Networks: Requirements, Designs, and Tradeoffs.
Cross-Domain Privacy-Preserving Cooperative Firewall Optimization.
A Survey of Mobile Cloud Computing Application Models
NICE :Network Intrusion Detection and Countermeasure Selection in Virtual Network Systems.
Dynamic Resource Allocation Using Virtual Machines for Cloud Computing Environment.
Fast Nearest Neighbor Search with Keywords. Abstract Conventional spatial queries, such as range search and nearest neighbor retrieval, involve only conditions.
Understanding the External Links of Video Sharing Sites: Measurement and Analysis.
Security Evaluation of Pattern Classifiers under Attack.
A Framework for Mining Signatures from Event Sequences and Its Applications in Healthcare Data.
BestPeer++: A Peer-to-Peer Based Large-Scale Data Processing Platform.
Improving Network I/O Virtualization for Cloud Computing.
Privacy Preserving Data Sharing With Anonymous ID Assignment
Mobile Relay Configuration in Data-Intensive Wireless Sensor Networks.
m-Privacy for Collaborative Data Publishing
Tweet Analysis for Real-Time Event Detection and Earthquake Reporting System Development.
A Fast Clustering-Based Feature Subset Selection Algorithm for High- Dimensional Data.
Optimal Client-Server Assignment for Internet Distributed Systems.
Protecting Sensitive Labels in Social Network Data Anonymization.
Enabling Dynamic Data and Indirect Mutual Trust for Cloud Computing Storage Systems.
LARS*: An Efficient and Scalable Location-Aware Recommender System.
Cooperative Caching for Efficient Data Access in Disruption Tolerant Networks.
Anonymization of Centralized and Distributed Social Networks by Sequential Clustering.
Accuracy-Constrained Privacy-Preserving Access Control Mechanism for Relational Data.
Content Sharing over Smartphone-Based Delay- Tolerant Networks.
Abstract Link error and malicious packet dropping are two sources for packet losses in multi-hop wireless ad hoc network. In this paper, while observing.
A System for Denial-of- Service Attack Detection Based on Multivariate Correlation Analysis.
Modeling the Pairwise Key Predistribution Scheme in the Presence of Unreliable Links.
Scalable Distributed Service Integrity Attestation for Software-as-a-Service Clouds.
Anomaly Detection via Online Over-Sampling Principal Component Analysis.
A Method for Mining Infrequent Causal Associations and Its Application in Finding Adverse Drug Reaction Signal Pairs.
A Generalized Flow-Based Method for Analysis of Implicit Relationships on Wikipedia.
Keyword Query Routing.
Document Clustering for Forensic Analysis: An Approach for Improving Computer Inspection.
Towards Online Shortest Path Computation. Abstract The online shortest path problem aims at computing the shortest path based on live traffic circumstances.
Abstract With the advent of cloud computing, data owners are motivated to outsource their complex data management systems from local sites to the commercial.
Facilitating Document Annotation using Content and Querying Value.
Clustering Sentence-Level Text Using a Novel Fuzzy Relational Clustering Algorithm.
Participatory Privacy: Enabling Privacy in Participatory Sensing
Preventing Private Information Inference Attacks on Social Networks.
Scalable Keyword Search on Large RDF Data. Abstract Keyword search is a useful tool for exploring large RDF datasets. Existing techniques either rely.
DCIM: Distributed Cache Invalidation Method for Maintaining Cache Consistency in Wireless Mobile Networks.
Supporting Privacy Protection in Personalized Web Search.
Twitsper: Tweeting Privately. Abstract Although online social networks provide some form of privacy controls to protect a user's shared content from other.
m-Privacy for Collaborative Data Publishing
Attribute-Based Encryption With Verifiable Outsourced Decryption.
A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using MapReduce on Cloud.
Multiparty Access Control for Online Social Networks : Model and Mechanisms.
A New Algorithm for Inferring User Search Goals with Feedback Sessions.
Data Mining with Big Data. Abstract Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development.
Privacy-Enhanced Web Service Composition. Abstract Data as a Service (DaaS) builds on service-oriented technologies to enable fast access to data resources.
Privacy-Preserving and Content-Protecting Location Based Queries.
Mona: Secure Multi-Owner Data Sharing for Dynamic Groups in the Cloud.
Whole Test Suite Generation. Abstract Not all bugs lead to program crashes, and not always is there a formal specification to check the correctness of.
Load Rebalancing for Distributed File Systems in Clouds.
Facilitating Document Annotation Using Content and Querying Value.
Fast Transmission to Remote Cooperative Groups: A New Key Management Paradigm.
Dynamic Query Forms for Database Queries. Abstract Modern scientific databases and web databases maintain large and heterogeneous data. These real-world.
Spatial Approximate String Search. Abstract This work deals with the approximate string search in large spatial databases. Specifically, we investigate.
Presentation transcript:

Annotating Search Results from Web Databases

Abstract An increasing number of databases have become web accessible through HTML form-based search interfaces. The data units returned from the underlying database are usually encoded into the result pages dynamically for human browsing. For the encoded data units to be machine processable, which is essential for many applications such as deep web data collection and Internet comparison shopping, they need to be extracted out and assigned meaningful labels. In this paper, we present an automatic annotation approach that first aligns the data units on a result page into different groups such that the data in the same group have the same semantic. Then, for each group we annotate it from different aspects and aggregate the different annotations to predict a final annotation label for it. An annotation wrapper for the search site is automatically constructed and can be used to annotate new result pages from the same web database. Our experiments indicate that the proposed approach is highly effective.

Existing system Alarge portion of the deep web is database based, i.e., for many search engines, data encoded in the returned result pages come from the underlying structured data¬bases. Such type of search engines is often referred as Web databases (WDB). A typical result page returned from a WDB has multiple search result records (SRRs). Each SRR contains multiple data units each of which describes one aspect of a real-world entity. Fig. 1 shows three SRRs on a result page from a book WDB. Each SRR represents one book with several data units, e.g., the first book record in Fig. 1 has data units “Talking Back to the Machine: Computers and Human Aspiration," “Peter J. Denning," etc.

Architecture Diagram

System Specification HARDWARE REQUIREMENTS Processor : Intel Pentium IV Ram : 512 MB Hard Disk : 80 GB HDD SOFTWARE REQUIREMENTS Operating System : Windows XP / Windows 7 FrontEnd : Java BackEnd : MySQL 5

C ONCLUSION In this paper, we studied the data annotation problem and proposed a multiannotator approach to automatically con¬structing an annotation wrapper for annotating the search result records retrieved from any given web database. This approach consists of six basic annotators and a probabilistic method to combine the basic annotators. Each of these annotators exploits one type of features for annotation and our experimental results show that each of the annotators is useful and they together are capable of generating high- quality annotation. A special feature of our method is that, when annotating the results retrieved from a web database, it utilizes both the LIS of the web database and the IIS of multiple web databases in the same domain. We also explained how the use of the IIS can help alleviate the local interface schema inadequacy problem and the inconsistent label problem.

THANK YOU