Scaling Heterogeneous Information Access for Wide Area Environments
Michael Franklin and Louiqa Raschid
University of Maryland


Wide-Area Data Access

Problems
- Scalability of Wrapper-Mediator Systems
- Publishing and Discovery of Sources
- Dissemination of Relevant Information

Relevant Technologies
- Flexible Architectures
- Adaptive Systems
- Metadata Management

The Big Picture

The little picture

[Architecture diagram. Component labels: Predator O-R DBMS, Remote wrapper interface, Planner, Scrambler, MDT, Wrapper interface, Web sources]

Querying Web Sources
- Generating wrappers for Web-accessible sources to provide an API for queries and structured answers.
- Obtaining and representing source capability and content descriptions to use in query planning.
- Estimating the response time for cost-based optimization.

Web application wrapper toolkit
- Define the capabilities of Web sources
- A wrapper interface to publish source capability
- A wrapper toolkit
  - Translation from query + bindings -> URL
  - Declarative language to specify extractors
    - Simple extractors: HTML or XML data -> structured object
    - Complex extractors: customizable crawler utility for extraction of meta-information
- Generator for JDBC-compliant wrappers
- Metadata and query and answer interface
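The two toolkit steps named on this slide, translating a query plus bindings into a URL and extracting a structured object from the returned HTML, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the base URL, the field names, the regex-based extractor format, and the function names `build_url` and `extract` are hypothetical and are not the toolkit's actual declarative language or API.

```python
import re
from urllib.parse import urlencode


def build_url(base_url, bindings):
    """Translate attribute bindings of a query into a request URL.

    Bindings are sorted so that the generated URL is deterministic.
    """
    return base_url + "?" + urlencode(sorted(bindings.items()))


# Hypothetical declarative extractor: field name -> regex with one capture group.
WEATHER_EXTRACTOR = {
    "city": r"<h1>([^<]+)</h1>",
    "temp_f": r'<span class="temp">(-?\d+)</span>',
}


def extract(html, spec):
    """Apply a simple extractor spec to an HTML page and return a record.

    Fields whose pattern does not match are set to None.
    """
    record = {}
    for field_name, pattern in spec.items():
        match = re.search(pattern, html)
        record[field_name] = match.group(1).strip() if match else None
    return record


# Usage with made-up values (no network access is performed here).
url = build_url("http://example.org/weather", {"city": "College Park", "state": "MD"})
page = '<h1>College Park</h1><span class="temp">72</span>'
row = extract(page, WEATHER_EXTRACTOR)
```

A real wrapper would fetch `url`, run the extractor on the response, and expose the resulting rows through the wrapper's query and answer interface.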

Weather source

Results from the Weather source

Query Planning for Web sources
Objective: Generate safe, optimal plans with possibly replicated sources
- Multiple heterogeneous sources
  - Limited capability (bindings)
  - Possible replication of contents
  - Complete / incomplete sources
- Use meta-information to construct lattices
- Generate safe plans with alternatives
- Mediator algebra and rules for optimization
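One common reading of a "safe" plan under limited binding capabilities is that every source is called only after all of its required input attributes are bound, either by the query or by the outputs of sources called earlier. The Python sketch below orders sources under that assumption. It is not the planner from the talk: the function name `safe_order` and the binding patterns given to S1 and S5 are hypothetical, and lattices, replication, and cost are ignored.

```python
def safe_order(sources, query_bound):
    """Return an order of source calls in which every call is safe, or None.

    sources: dict mapping source name -> (required_inputs, outputs), both sets.
    query_bound: attributes bound by the query itself.
    A call is treated as safe when all required inputs are already bound.
    """
    bound = set(query_bound)
    remaining = dict(sources)
    plan = []
    while remaining:
        # Sources whose binding requirements are satisfied right now.
        ready = sorted(name for name, (required, _) in remaining.items()
                       if required <= bound)
        if not ready:
            return None  # no safe plan exists for these bindings
        name = ready[0]
        plan.append(name)
        # Outputs of the chosen source become available as bindings.
        bound |= remaining.pop(name)[1]
    return plan


# Hypothetical binding patterns, for illustration only.
example_sources = {
    "S1": ({"Time", "Date"}, {"Channel", "Title"}),
    "S5": ({"Title"}, {"Category"}),
}
example_plan = safe_order(example_sources, {"Time", "Date"})
```

Because the set of bound attributes only grows, this greedy loop finds a safe order whenever one exists. Choosing among several safe orders is where cost-based optimization would come in.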

Content and Capability Descriptions
- Domain information
- Capability descriptions:
  - I/O relationships: Time, Date -> Channel, Title, Category
  - Content: Date: CurrentYear; Time: {0, …, 23}; Channel: CNW
  - Completeness information: Complete. Source S3 provides a complete answer when Time and Date are bound and Channel=ppv and Category=Movies.
    - Explicitly provided by the source DBA.
    - Augmented by inference.
    - Augmented by learning based on query feedback.
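A capability description of this kind could be represented as a small record, sketched below in Python. The class `SourceDescription` and its methods are hypothetical. The sketch reads the I/O relationship as Time and Date being inputs and Channel, Title, and Category being outputs, which is an assumption, and it encodes the slide's completeness statement for S3 as a set of required selection values.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDescription:
    """Hypothetical record for a source's capability and completeness."""
    name: str
    inputs: frozenset    # attributes that must be bound to call the source
    outputs: frozenset   # attributes the source returns
    complete_when: dict = field(default_factory=dict)  # attribute -> required value

    def can_answer(self, bound):
        """True if every required input attribute is among the bound ones."""
        return self.inputs <= set(bound)

    def is_complete_for(self, selections):
        """True if the source is callable and its completeness condition holds.

        selections: dict mapping attribute -> value requested by the query.
        """
        return self.can_answer(selections) and all(
            selections.get(attr) == value
            for attr, value in self.complete_when.items()
        )


# S3 as described on the slide: complete when Time and Date are bound
# and Channel=ppv and Category=Movies.
s3 = SourceDescription(
    name="S3",
    inputs=frozenset({"Time", "Date"}),
    outputs=frozenset({"Channel", "Title", "Category"}),
    complete_when={"Channel": "ppv", "Category": "Movies"},
)
```

A planner could use `is_complete_for` to decide whether one source suffices or whether alternative sources must also be queried.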

Sources in Lattices

Display pay-per-view movies shown on August 14th, 1998 at 9:30am.
Using Buckets (S1|S3) in AlternatePartition and (S5  S1) and (S5  S3) in SimilarPartition

Web Source Response Time Estimation Tool - MDT
Problem: Difficulty in determining evaluation costs
- Physical implementation details unknown
- Load on network and source unknown
Objective: Tool to estimate response time based on query feedback and estimate confidence. To be used in a combined cost model and to choose between alternate sources.
- MDT is a tool that estimates response time based on Day, Time, Quantity, etc.

Configuring and learning in the MDT
MDT is configured for some hierarchy of dimensions
- Calibration of each dimension
  - min / max / scale
  - Allowed deviation
  - Confidence window
- Learning algorithm
  - Cell splitting algorithm
  - Value correction algorithm
  - Estimate response time and confidence
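The slides do not spell out the MDT algorithms, so the following Python sketch only illustrates the general idea of learning response times from query feedback in a table indexed by calibrated dimensions. It is a simplified stand-in, not the MDT itself: the class `ResponseTimeTable`, the running-mean value correction, and the definition of confidence as the fraction of recent observations within the allowed deviation are all assumptions, and cell splitting is omitted.

```python
from collections import defaultdict, deque


class ResponseTimeTable:
    """Hypothetical grid of response-time estimates learned from feedback."""

    def __init__(self, dims, allowed_deviation=0.25, window=10):
        # dims: dimension name -> (min, max, scale); scale is the cell width.
        self.dims = dims
        self.allowed_deviation = allowed_deviation  # relative tolerance
        self.mean = {}                              # cell -> running mean (seconds)
        self.count = defaultdict(int)               # cell -> observations seen
        # cell -> recent hit/miss flags, bounded by the confidence window
        self.hits = defaultdict(lambda: deque(maxlen=window))

    def _cell(self, point):
        """Map a point (dimension name -> value) to its grid cell."""
        key = []
        for name, (lo, hi, scale) in self.dims.items():
            value = min(max(point[name], lo), hi)   # clamp into [min, max]
            key.append(int((value - lo) // scale))
        return tuple(key)

    def observe(self, point, seconds):
        """Record query feedback and correct the cell's estimate."""
        cell = self._cell(point)
        if cell in self.mean:
            estimate = self.mean[cell]
            within = abs(seconds - estimate) <= self.allowed_deviation * estimate
            self.hits[cell].append(within)
        self.count[cell] += 1
        previous = self.mean.get(cell, 0.0)
        self.mean[cell] = previous + (seconds - previous) / self.count[cell]

    def estimate(self, point):
        """Return (estimated seconds, confidence in [0, 1]) for a point."""
        cell = self._cell(point)
        if cell not in self.mean:
            return None, 0.0
        recent = self.hits[cell]
        confidence = sum(recent) / len(recent) if recent else 0.0
        return self.mean[cell], confidence
```

A mediator could call `observe` after every source access and `estimate` during optimization, preferring the alternate source with the lower estimate when its confidence is acceptable.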

Correcting the confidence of estimated value

Conclusions
- Extend the Predator O-R DBMS with scalable mediator functionality
- Current implementation status
  - Scrambling-enabled optimizer
  - Mediator algebra and logical optimizer
  - Cost-based optimizer based on MDT estimation
- Toolkit for generating wrappers for Web sources

Still to come …
- Publishing source metadata
- Discovering sources
- Source selection using metadata
- User profiles
- Dissemination of relevant data