Versus: A Web Repository
Daniel Gomes, João P. Campos, Mário J. Silva
XLDB Research Group, University of Lisbon


Versus is a repository that enables storage, management and access to Web data. It enables quick development of Web applications that need to process large amounts of information in a short period of time, using a simple and extensible Java API.

1st Prototype Features
Generic data model
– enables storage and management of multi-purpose Web data
Time support
– enables past views of objects through a versioning system
Scalability and distributed operation
– enables applications to increase processing capabilities through the parallel processing of data partitions
Extensible meta-data
– enables extension of the meta-data information stored in Versus
Extensible code
– to fulfill specific requirements of Web processing applications, with minimum code generation
Well defined API
– provides access methods according to the needs of Web applications, supporting different levels of access and performance

Versus is especially useful for applications that can profit from parallel processing of Web data.

Building a Web Application with Versus
1. Define the application's unique identifier;
2. Define the partitioning function;
3. Define a conflict resolution policy;
4. Define timeouts for the processing of working units;
5. Write application-specific code for data processing using the Versus API.
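The slides do not show the Versus API itself, so the following is only a minimal sketch of how the five steps above might look in Java. Every name in it (VersusAppSketch, AppConfig, ConflictPolicy, partitionByHost, the "link-extractor" identifier, the 30-second timeout) is a hypothetical placeholder, not part of Versus. Partitioning by host and a last-writer-wins policy are assumptions chosen for illustration.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical sketch: none of these names are taken from the real Versus API.
public class VersusAppSketch {

    /** Step 3: decides which value survives when two updates hit the same object. */
    public interface ConflictPolicy {
        String resolve(String stored, String incoming);
    }

    /** The five choices an application makes, gathered in one place. */
    public static final class AppConfig {
        public final String appId;                          // step 1
        public final Function<String, Integer> partitioner; // step 2
        public final ConflictPolicy conflictPolicy;         // step 3
        public final long timeoutMillis;                    // step 4
        public final Function<String, String> processor;    // step 5

        public AppConfig(String appId, Function<String, Integer> partitioner,
                         ConflictPolicy conflictPolicy, long timeoutMillis,
                         Function<String, String> processor) {
            this.appId = appId;
            this.partitioner = partitioner;
            this.conflictPolicy = conflictPolicy;
            this.timeoutMillis = timeoutMillis;
            this.processor = processor;
        }
    }

    /** Assumed partitioning function: URLs from the same host share a partition.
     *  Expects absolute URLs with a host component. */
    public static int partitionByHost(String url, int partitions) {
        String host = URI.create(url).getHost();
        return Math.floorMod(host.hashCode(), partitions);
    }

    /** Groups URLs into working units keyed by partition number. */
    public static Map<Integer, List<String>> buildWorkingUnits(AppConfig cfg, List<String> urls) {
        Map<Integer, List<String>> units = new TreeMap<>();
        for (String url : urls) {
            units.computeIfAbsent(cfg.partitioner.apply(url), k -> new ArrayList<>()).add(url);
        }
        return units;
    }

    /** True when a working unit has been checked out for longer than the timeout. */
    public static boolean timedOut(AppConfig cfg, long checkedOutAtMillis, long nowMillis) {
        return nowMillis - checkedOutAtMillis > cfg.timeoutMillis;
    }

    public static AppConfig exampleConfig() {
        return new AppConfig(
                "link-extractor",                 // step 1: unique identifier
                url -> partitionByHost(url, 4),   // step 2: partitioning function
                (stored, incoming) -> incoming,   // step 3: last writer wins
                30_000L,                          // step 4: timeout per working unit
                url -> url.toLowerCase());        // step 5: stand-in processing code
    }

    public static void main(String[] args) {
        AppConfig cfg = exampleConfig();
        List<String> urls = Arrays.asList("http://a.pt/1", "http://a.pt/2", "http://b.pt/x");
        System.out.println(buildWorkingUnits(cfg, urls));
    }
}
```

Keeping the five decisions in a single configuration object makes it easy to see which parts are application policy and which part is the actual processing code.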

Data Model
An object has a reference to a Web document;
A version is a snapshot of an object at a given instant;
A layer represents a time unit in the repository;
An objectKey is a property associated with an object and therefore with every version of it;
A versionProperty is a property associated with a certain version of an object;
Xmeta-Data is a container for XML data associated with each version.
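The data model above could be represented roughly as follows. This is a sketch under stated assumptions, not the Versus schema: the class and field names are invented for illustration, layers are modelled as plain integers, and the rule that a past view returns the most recent version at or before the requested layer is an assumption about how the versioning system behaves.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the data model; names are not from the Versus API.
public class DataModelSketch {

    /** A snapshot of an object at a given layer (time unit). */
    public static final class Version {
        public final int layer;
        public final String content;
        // versionProperty: applies to this version only
        public final Map<String, String> versionProperties = new HashMap<>();
        // Xmeta-Data: container for XML data attached to this version
        public String xmetaData = "";

        public Version(int layer, String content) {
            this.layer = layer;
            this.content = content;
        }
    }

    /** An object holds a reference to a Web document and its version history. */
    public static final class WebObject {
        public final String url;
        // objectKey: applies to the object and therefore to all its versions
        public final Map<String, String> objectKeys = new HashMap<>();
        private final TreeMap<Integer, Version> versions = new TreeMap<>();

        public WebObject(String url) {
            this.url = url;
        }

        public Version addVersion(int layer, String content) {
            Version v = new Version(layer, content);
            versions.put(layer, v);
            return v;
        }

        /** Assumed past-view rule: latest version at or before the layer, or null. */
        public Version versionAt(int layer) {
            Map.Entry<Integer, Version> e = versions.floorEntry(layer);
            return e == null ? null : e.getValue();
        }
    }

    public static void main(String[] args) {
        WebObject page = new WebObject("http://example.pt/index.html");
        page.objectKeys.put("host", "example.pt");
        page.addVersion(1, "<html>first crawl</html>");
        Version second = page.addVersion(3, "<html>second crawl</html>");
        second.versionProperties.put("httpStatus", "200");
        second.xmetaData = "<meta><lang>pt</lang></meta>";
        System.out.println(page.versionAt(2).content);
    }
}
```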

Operational Model
Archive Workspace: stores data permanently, keeping the version history of objects;
Group Workspace: maintains a shared view of the data common to all application threads;
Private Workspace: provides local storage and fast access to one application thread;
Working Units: containers for a partition of data, which can be independently processed;
Check In/Check Out: operations that move the working units from one workspace to another.
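A rough illustration of the operational model above, assuming that check-out and check-in are simple moves of a working unit between two workspaces. The names (WorkspaceSketch, Workspace, WorkingUnit, checkOut, checkIn) are hypothetical, and the sketch ignores persistence, version history and concurrency, which a real repository would have to handle.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the operational model; names are not from the Versus API.
public class WorkspaceSketch {

    /** A container for one partition of data that can be processed independently. */
    public static final class WorkingUnit {
        public final String id;
        public final List<String> items;

        public WorkingUnit(String id, List<String> items) {
            this.id = id;
            this.items = items;
        }
    }

    /** Archive, group and private workspaces are all modelled the same way here. */
    public static final class Workspace {
        public final String name;
        private final Map<String, WorkingUnit> units = new HashMap<>();

        public Workspace(String name) {
            this.name = name;
        }

        public void add(WorkingUnit unit) {
            units.put(unit.id, unit);
        }

        public boolean holds(String unitId) {
            return units.containsKey(unitId);
        }
    }

    /** Moves a working unit from one workspace to another. */
    private static void move(Workspace from, Workspace to, String unitId) {
        WorkingUnit unit = from.units.remove(unitId);
        if (unit == null) {
            throw new IllegalStateException(unitId + " is not in workspace " + from.name);
        }
        to.units.put(unitId, unit);
    }

    /** Check out: bring a unit closer to the thread that will process it. */
    public static void checkOut(Workspace source, Workspace target, String unitId) {
        move(source, target, unitId);
    }

    /** Check in: hand a processed unit back towards shared or permanent storage. */
    public static void checkIn(Workspace source, Workspace target, String unitId) {
        move(source, target, unitId);
    }

    public static void main(String[] args) {
        Workspace archive = new Workspace("archive");
        Workspace group = new Workspace("group");
        Workspace priv = new Workspace("private-thread-1");
        archive.add(new WorkingUnit("unit-0", Arrays.asList("http://a.pt/1", "http://a.pt/2")));

        checkOut(archive, group, "unit-0");  // archive -> group
        checkOut(group, priv, "unit-0");     // group -> private
        checkIn(priv, group, "unit-0");      // private -> group
        checkIn(group, archive, "unit-0");   // group -> archive
        System.out.println("unit-0 back in archive: " + archive.holds("unit-0"));
    }
}
```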

Current & Future Work
New Versus
– Native XML meta-data
– Improve performance
– P2P storage server for massive data management and scalable performance
Versus applications
– TUMBA! (our Web search engine for the Portuguese Web):
    Web pages repository with history
    Tarântula V.2 (Web crawler)
    Tumba's Index Generator and PageRanker
– Performance measurement and analysis
– Validation of the Versus API
XQuery + XSLT engine tightly coupled with Versus
Query engine with access methods and adaptive algorithms

Research on data management.
We study and develop systems for data analysis, information integration and user access to large quantities of complex data from heterogeneous sources.
Our current main project is TUMBA!, a new search engine for the Portuguese Web.
Research on:
– Integration with Semantic Web
– Location-awareness
– Web Archiving