Published by Leon Bradley. Modified over 8 years ago.
1
MEDIATORS
2
Mediation

Typical file-sharing systems have a single global schema for describing their data. P2P networks have to consider heterogeneous schemas in the network and have to rely on local transformation mechanisms and rules.

Mediation: TRIPLE. As has been pointed out in [8], defining views appears to be the right means for mediation, especially in the case of schemas or ontologies modeled with the help of description logics. Since current approaches for schema/ontology languages on the Semantic Web build on description logics, e.g., DAML+OIL and its W3C successor OWL [11, 34], a powerful rule language with the capability to define views seems to be a promising candidate for mediation. Especially relevant in our Edutella TRIPLE peer is its capability to define parameterized views, which add the flexibility to define multi-step mappings (by nesting/sequencing such views). This peer (currently being developed as part of the ELENA project) allows advanced querying, inferencing and mediation, and also provides reasoning services to be used in ELENA for providing personalization in the context of a smart learning space. It can also be used to express query correspondence assertions and model correspondences as flexible mechanisms to express mappings between heterogeneous schemas, as discussed in [21].

The expansion of these abstract query plans can be based on different strategies, related to the quality of clustering in the P2P network. If the data are clustered well with respect to the queries, it is most efficient to push joins in the query as near as possible to the data sources, and then take the union of the results for these joins. If the clustering does not reflect the partitions needed by the …
3
Information Integration (Ullman, 1997)

Aerosol-relevant information arises from a multiplicity of sources, each having a specific evolution history, driving forces, formats, etc. Data analysis, i.e. the transformation of raw data into ‘actionable’ knowledge, requires data from numerous sources. Hence, there is value in combining information from various sources, but there are problems:
–Legacy data systems cannot be altered to support integration
–Data systems use different terms, or different meanings for similar terms
–Distributed sources, such as the web, may not have a schema at all.
4
Integration Architecture (Ullman, 1997)

Heterogeneous sources are wrapped by software that translates between the source's local language, model and concepts and the shared global concepts. Mediators obtain information from one or more components (wrappers or other mediators) and pass it on to other mediators or to external users. In a sense, a mediator is a view of the data found in one or more sources; it does not hold the data, but it acts as if it did. The job of the mediator is to go to the sources and provide an answer to the query.
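The wrapper/mediator split described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Wrapper` and `Mediator`, the field names `site` and `value`, and the in-memory lists standing in for legacy sources are all hypothetical and do not come from Ullman's paper or from Dvoy.

```python
from typing import Callable, Dict, Iterable, List

# A record expressed in the shared global vocabulary (field names are assumptions).
Record = Dict[str, object]


class Wrapper:
    """Translates one source's local field names into the shared global ones."""

    def __init__(self, rows: Iterable[dict], field_map: Dict[str, str]):
        self._rows = list(rows)      # stands in for a real legacy source
        self._field_map = field_map  # local name -> global name

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        translated = (
            {self._field_map[k]: v for k, v in row.items() if k in self._field_map}
            for row in self._rows
        )
        return [rec for rec in translated if predicate(rec)]


class Mediator:
    """A view over several components; it holds no data of its own."""

    def __init__(self, components):
        self._components = list(components)  # wrappers or other mediators

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        results: List[Record] = []
        for component in self._components:
            # The mediator goes to the sources each time it is asked.
            results.extend(component.query(predicate))
        return results


source_a = Wrapper([{"stn": "A1", "pm25": 12.0}], {"stn": "site", "pm25": "value"})
source_b = Wrapper([{"SiteCode": "B7", "Conc": 30.5}], {"SiteCode": "site", "Conc": "value"})
mediator = Mediator([source_a, source_b])
high = mediator.query(lambda rec: rec["value"] > 20)
```

Because `Mediator` exposes the same `query` method as `Wrapper`, a mediator can itself be passed as a component of another mediator, which mirrors the "wrappers or other mediators" wording above.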
5
Mediation (Wiederhold, slide show on Mediation)

We have worked for many years on information architectures which are intended to provide extra value to the customers, beyond the value provided by the information sources. The modules, called Mediators, are interposed between databases and other information sources, and client applications. They often carry out the roles that used to be performed by human intermediaries: reviewers, abstracters, critics, writers of surveys and anthologies, staff experts, advice givers such as consumer organizations and colleagues, librarians, and the person sitting next to you on a bus. We have less access to such human resources now. The resulting disintermediation leads to problems of access, information overload, and maintenance of sharable information resources. We observed that there were many ad-hoc tools and approaches being developed to deal with this issue, including work in our own DARPA-funded KBMS project. An early paper, mapping earlier work into a simple concept and architecture, was written in 1991: Mediators.

Information pops up on the web spontaneously, with no master plan, but it is awkward to obtain due to the lack of a catalog. We need services, composed of software and people, that select, filter, digest, integrate, and abstract data for specific topics of interest [Resnick:97]. There will be meta-services as well, helping to locate those services and reporting on their quality. We refer to the combination of experts and software that performs these functions as mediators.

Functions. The role of mediators is to translate data to information for multiple customers by intelligent processing, statistics or by other means [W:91]. Traditional middleware [Kleinrock:94] connects and transports data, but a mediator also transforms the content.
6
Functional Service Layers

[Figure: layered diagram with Client, MEDIATION Services and Available Sources. Labels: user interface, service interface, resource access interface, real-world interface; human-computer interaction; application-specific code, domain-specific code, source-specific code.]
7
Architecture instances

[Figure: layers of Applications, Mediators and Resources. Resources include computational resources.]
8
Architecture of Dvoy Federated Information System (after Busse et al., 1999)

The main software components of Dvoy are wrappers, which encapsulate sources and remove technical heterogeneity, and mediators, which resolve the logical heterogeneity. Wrapper classes are available for geo-spatial (incl. satellite) images, SQL servers, text files, etc. The mediator classes are implemented as web services for uniform data access to n-dimensional data.
9
NSF Middleware

Middleware: The Basics. Middleware is software that connects two or more otherwise separate applications across the Internet or local area networks. More specifically, the term refers to an evolving layer of services that resides between the network and more traditional applications for managing security, access and information exchange to:
–Let scientists, engineers, and educators transparently use and share distributed resources, such as computers, data, networks, and instruments,
–Develop effective collaboration and communications tools such as Grid technologies, desktop video, and other advanced services to expedite research and education, and
–Develop a working architecture and approach that can be extended to the larger set of Internet and network users.

Middleware makes resource sharing seem transparent to the end user, providing consistency, security, privacy, and capabilities. The diagram below represents the relationship between Middleware and the technical and policy components of an information technology system that are required to work with it.
10
GLOBAL SCHEMA
11
4D Geo-Environmental Data Cube (X, Y, Z, T)

Environmental data represent measurements in the physical world, which has space (X, Y, Z) and time (T) as its dimensions. The specific inherent dimensions for geo-environmental data are: Longitude X, Latitude Y, Elevation Z and DateTime T. The need for finding, sharing and integrating geo-environmental data requires that data are ‘coded’ in this 4D data space, at the minimum. Additional for
12
Hierarchy of Data Objects: DataGranule, DataSeries, DataCube

Measure: A measure (in OLAP terminology) represents numerical values for a specific entity to be analyzed (e.g. temperature, wind speed, pollutant). A collection of measures forms a special dimension ‘Measures’ (??Can Measures be Dimensions??).

Data Granules: Data granules are discrete, atomic data entities that cannot be further broken down. A DataSeries is an ordered collection of DataGranules having common attributes.

All data points in a measure represent the same measured parameter, e.g. temperature. Hence, they share the same units and dimensionality. The data points of a measure are enclosed in a conceptual multidimensional data cube; each data point occupies a volume (slice or point) in the data cube. Data points in a measure share the same dimensions; conversely, each data point has to have the dimensional coordinates in the data cube of the measure that it belongs to.

[Figure: a DataCube with Dimension X, Dimension Y and Dimension Z, containing a DataSeries and a DataGranule.]
13
[Figure: a DataCube with Dimension X, Dimension Y and Dimension Z, showing a DataGranule and 1D, 2D and 3D data arrays.]
14
Multi-Dimensional Data Model

Data can be distributed over 1, 2, … n dimensions:
–1-dimensional, e.g. Time
–2-dimensional, e.g. Location & Time
–3-dimensional, e.g. Location, Time & Parameter

Views are orthogonal slices through multidimensional data cubes. Spatial and temporal slices through the data are most common.

[Figure: a Data Granule indexed by i, j, k in the Data Space, with View 1 and View 2 as slices.]
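A view as an orthogonal slice can be illustrated with a small Python sketch. The cube layout (a dictionary keyed by location, time and parameter), the function name `slice_cube` and all sample values are assumptions made for illustration; they are not taken from the Dvoy software.

```python
from typing import Dict, Optional, Tuple

# Hypothetical cube: each granule is keyed by (location, time, parameter).
Key = Tuple[str, str, str]
cube: Dict[Key, float] = {
    ("StLouis", "2001-07-01", "PM25"): 14.2,
    ("StLouis", "2001-07-02", "PM25"): 18.9,
    ("Fresno", "2001-07-01", "PM25"): 22.4,
    ("Fresno", "2001-07-01", "O3"): 61.0,
}


def slice_cube(
    data: Dict[Key, float],
    location: Optional[str] = None,
    time: Optional[str] = None,
    parameter: Optional[str] = None,
) -> Dict[Key, float]:
    """Return the granules matching every dimension that is fixed (not None)."""
    fixed = (location, time, parameter)
    return {
        key: value
        for key, value in data.items()
        if all(f is None or f == k for f, k in zip(fixed, key))
    }


# Temporal slice: location and parameter fixed, time left free.
time_series = slice_cube(cube, location="StLouis", parameter="PM25")
# Spatial slice: time and parameter fixed, location left free.
spatial = slice_cube(cube, time="2001-07-01", parameter="PM25")
```

Fixing all but one dimension yields a one-dimensional view; fixing fewer dimensions yields a higher-dimensional sub-cube.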
15
WRAPPERS
16
src_img_width src_img_height src_margin_right src_margin_left src_margin_top src_margin_bottom src_lon_max src_lat_max src_lon_min src_lat_min
17
Dvoy: Components and Image Data Flow

Dvoy components: Distributed Image Data, Image Registration, Data Catalog, Data Query, Image Delivery (Web Service), Image Data Browser (Image Viewer).

[Figure: image data flow between the components. Data Selection: Measure, X, Y, Z, T. XY MAP: Z, T fixed. Measures shown: Elevation, TOMS, SeaWiFS.]
18
The ‘Minimal’ Star Schema

The minimal Site table includes SiteID, Name and Lat/Lon. The minimal Parameter table consists of ParameterID, Description and Unit. The time dimensional table is usually skipped since time is self-describing. The minimal Fact (Data) table consists of the Obs_Value and the three dimensional codes for Obs_DateTime, Site_ID and Parameter_ID.

For integrative, cross-Supersite analysis with data queries by time, location and parameter, the database has to have time, location and parameter as dimensions. The above minimal (multidimensional) schema was used in the CAPITA data exploration software, Voyager, for the past 22 years, encoding 1000+ datasets. Most Supersite data require a more elaborate schema to fully capture the content.
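The minimal star schema can be written out as a small SQLite database using Python's standard `sqlite3` module. The table and column names follow the slide; the column types, the primary and foreign keys, and the sample rows are assumptions added for illustration and are not the actual Voyager schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Site (
        Site_ID INTEGER PRIMARY KEY,
        Name    TEXT NOT NULL,
        Lat     REAL NOT NULL,
        Lon     REAL NOT NULL
    );
    CREATE TABLE Parameter (
        Parameter_ID INTEGER PRIMARY KEY,
        Description  TEXT NOT NULL,
        Unit         TEXT NOT NULL
    );
    -- No Time table: Obs_DateTime is self-describing.
    CREATE TABLE Fact (
        Obs_DateTime TEXT    NOT NULL,
        Site_ID      INTEGER NOT NULL REFERENCES Site(Site_ID),
        Parameter_ID INTEGER NOT NULL REFERENCES Parameter(Parameter_ID),
        Obs_Value    REAL,
        PRIMARY KEY (Obs_DateTime, Site_ID, Parameter_ID)
    );
    """
)

# Made-up sample rows.
conn.execute("INSERT INTO Site VALUES (1, 'Example Site', 38.6, -90.2)")
conn.execute("INSERT INTO Parameter VALUES (1, 'PM2.5 mass', 'ug/m3')")
conn.executemany(
    "INSERT INTO Fact VALUES (?, ?, ?, ?)",
    [("2001-07-01T00:00", 1, 1, 14.2), ("2001-07-02T00:00", 1, 1, 18.9)],
)

# A query by time and parameter that joins the fact table to both dimensions.
rows = conn.execute(
    """
    SELECT s.Name, p.Description, f.Obs_DateTime, f.Obs_Value, p.Unit
    FROM Fact f
    JOIN Site s ON s.Site_ID = f.Site_ID
    JOIN Parameter p ON p.Parameter_ID = f.Parameter_ID
    WHERE f.Obs_DateTime >= ? AND p.Description = ?
    ORDER BY f.Obs_DateTime
    """,
    ("2001-07-02", "PM2.5 mass"),
).fetchall()
```

Storing `Obs_DateTime` as ISO 8601 text keeps string comparison consistent with chronological order, which is what makes the time filter above work without a separate time table.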
19
Database Schema Design

Fact Table: A fact table (yellow) contains the main data of interest, i.e. the pollutant concentration by location, day, pollutant and measurement method.

Star Schema consists of a central fact table surrounded by de-normalized dimensional tables (blue) describing the sites, parameters, methods, etc.

Snowflake Schema is an extension of the star schema where each point of the star ‘explodes’ into further fully normalized tables, expanding the description of each dimension. A snowflake schema can capture all the key data content and relationships in full detail. It is well suited for capturing and encoding complex monitoring data into a robust relational database.
20
Extended Star Schema for SRDS

The Supersite program employs a variety of instruments, sampling methods and procedures. Hence, at least one additional dimension table is needed for Methods. An example extended star schema encodes the IMPROVE relational database (B. Schichtel).
21
Snowflake Example: Central Calif. AQ Study, CCAQS

The CCAQS schema incorporates a rich set of parameters needed for QA/QC (e.g. sample tracking) as well as for data analysis. The fully relational CCAQS schema permits the enforcing of integrity constraints, and it has been demonstrated to be useful for data entry/verification. However, no two snowflakes are identical. Similarly, the rich snowflake schemata for one sampling/analysis environment cannot be easily transplanted elsewhere. More importantly, many of the recorded parameters ‘on the fringes’ are not particularly useful for integrative, cross-Supersite, regional analyses. Hence the shared (exposed) subset of the entire data set may consist of a small subset of the ‘snowflake’.
22
From Heterogeneous to Homogeneous Schema

Individual Supersite SQL databases can be queried along spatial, temporal and parameter dimensions. However, the query to retrieve the same information depends on the schema of the particular database. A way to homogenize the distributed data is to access all the data through a Data Adapter, using only a subset of the tables/fields from any particular database (red). The proposed extracted uniform (abstract) schema is the Minimal Star Schema (possibly expanded). The final form of the uniformly extracted data schema will be arrived at by consensus.

[Figure: extraction of homogeneous data from heterogeneous sources. A Data Adapter maps the subset used from each database onto the Fact Table of the Uniform Schema.]
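The Data Adapter idea can be sketched as a per-source field mapping onto the uniform fact-table columns. This is a hedged illustration only: the class `DataAdapter`, its `extract` method and the source column names are hypothetical; only the uniform column names come from the Minimal Star Schema slide.

```python
from typing import Dict, Iterable, Iterator

# Uniform fact-table columns, as named on the Minimal Star Schema slide.
UNIFORM_FIELDS = ("Obs_DateTime", "Site_ID", "Parameter_ID", "Obs_Value")


class DataAdapter:
    """Exposes only the mapped subset of a source's fields, renamed uniformly."""

    def __init__(self, field_map: Dict[str, str]):
        missing = set(UNIFORM_FIELDS) - set(field_map.values())
        if missing:
            raise ValueError(f"adapter does not cover: {sorted(missing)}")
        self._field_map = field_map  # source column -> uniform column

    def extract(self, rows: Iterable[dict]) -> Iterator[dict]:
        for row in rows:
            # Columns outside the map (e.g. QA/QC fields) are dropped.
            yield {uniform: row[source] for source, uniform in self._field_map.items()}


# Two made-up sources with different column names and extra 'fringe' fields.
site_a_rows = [
    {"SampleTime": "2001-07-01T00:00", "Station": "A1", "Species": "SO4",
     "Conc": 3.1, "TrackingNo": 991}
]
site_b_rows = [
    {"dt": "2001-07-01T00:00", "loc": "B7", "param": "SO4", "val": 2.4, "flag": "V"}
]

adapter_a = DataAdapter({"SampleTime": "Obs_DateTime", "Station": "Site_ID",
                         "Species": "Parameter_ID", "Conc": "Obs_Value"})
adapter_b = DataAdapter({"dt": "Obs_DateTime", "loc": "Site_ID",
                         "param": "Parameter_ID", "val": "Obs_Value"})

uniform = list(adapter_a.extract(site_a_rows)) + list(adapter_b.extract(site_b_rows))
```

After extraction, the same query logic can run over `uniform` regardless of which database a row came from.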
23
MEDIATOR
24
Service Chaining in Spatio-Temporal Data Browser

[Figure: service chain from Data Sources (Vector GIS Data, XDim Data, SQL Table, OLAP, Satellite Images) through Wrapper, Catalog, Homogenizer and Mediator to the Client Browser. Labels: Find/Bind Data, nDim Data Cube, Spatial Slice, Time Slice, Spatial Portrayal, Time Portrayal, Spatial Overlay, Time Overlay, Portray, Overlay, Render, OGC-Compliant GIS Services, Time-Series Services, Cursor/Controller, Maintain Data.]
25
Overlay of multiple Datasets

–Each DataCube may have 0-n dimensions
–Each dimension is assigned a view
–In a view, the number of layers is the number of datasets
–If a DataCube does not have data for a view, a Null Layer is assigned

[Figure: a 3D DataCube and a 2D DataCube feeding Layer 1 and Layer 2 of DataView 1, DataView 2 and DataView 3; the 2D DataCube contributes a Null Layer to the view it has no data for.]
26
Overlay of multiple Datasets

–Each DataCube may have 0-n dimensions
–Each dimension is assigned a view
–In a view, the number of layers is the number of datasets
–If a DataCube does not have data for a view, a Null Layer is assigned

[Figure: a 3D DataCube linked to DataView 1, DataView 2 and DataView 3 by Data Access Connections and Data Render Connections.]
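The layer rule on the two overlay slides can be expressed as a short Python sketch. The dataset names, the use of dimension names as view names, and the use of `None` to stand for a Null Layer are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical datasets: name -> the dimensions its DataCube actually has.
datasets: Dict[str, Sequence[str]] = {
    "SurfaceObs": ("X", "Y", "T"),   # a 3D DataCube
    "SatelliteImage": ("X", "Y"),    # a 2D DataCube with no time dimension
}
views = ("X", "Y", "T")  # one view per dimension


def build_layers(
    cubes: Dict[str, Sequence[str]], view_names: Sequence[str]
) -> Dict[str, List[Optional[str]]]:
    """For every view, emit one layer per dataset; None marks a Null Layer."""
    return {
        view: [name if view in dims else None for name, dims in cubes.items()]
        for view in view_names
    }


layers = build_layers(datasets, views)
```

Every view ends up with exactly as many layers as there are datasets, so layer positions stay aligned across views even when a dataset contributes nothing to one of them.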
27
Dvoy Data Flow and Processes

Physical Data: Physical data reside in servers. Data are accessed by view-specific wrappers, yielding homogeneous abstract data ‘slices’.

Abstract Data: Abstract data are virtual. Abstract data are requested by viewers; homogeneous real data are delivered by the abstract interface.

View Data (enriched for portrayal): View data from the abstract interface are enriched by parameters useful for portrayal/processing.

[Figure: data flow for DataView 1, DataView 2 and DataView 3. Labels: Physical Data, View Wrapper, Abstract Data Access, Abstract Data, View Data, Abstract Portrayal, View Portrayal, Device Portrayal, Device Driver, Transmission, Render.]
28
Three-Tier Federated Data Warehouse Architecture

(Note: In this context, ‘Federated’ differs from ‘Federal’ in the direction of the driving force. Federated is meant to indicate a driving force for sharing from ‘bottom up’, i.e. from the members, not dictated from ‘above’ by the Feds.)

1. Provider Tier: Back-end servers containing heterogeneous data, maintained by the federation members
2. Proxy Tier: Retrieves designated Provider data and homogenizes it into common, uniform Datasets
3. User Tier: Accesses the Proxy Server and uses the uniform data for presentation, integration or processing

[Figure: Federated Data Warehouse. Provider Tier: heterogeneous data in distributed SQL Servers. Proxy Tier: data homogenization, transformation. User Tier: data presentation, processing.]
29
Federated Data Warehouse Interactions

The Provider servers interact only with the Proxy Server, in accordance with the Federation Contract
–The contract sets the rules of interaction (accessible data subsets, types of queries)
–Strong server security measures are enforced, e.g. through Secure Socket Layer

The data User interacts only with the generic Proxy Server, using a flexible Web Services interface
–Generic data queries, applicable to all data in the Warehouse (e.g. data sub-cube by space, time, parameter)
–The data query is addressed to the Web Service provided by the Proxy Server
–Uniform, self-describing data packages are passed to the user for presentation or further processing

[Figure: Federated Data Warehouse. Provider Tier (heterogeneous data): Member Servers SQLServer1, SQLServer2, LegacyServer. Proxy Tier (data homogenization, etc.): Proxy Server with SQLDataAdapter1, SQLDataAdapter2, CustomDataAdapter, behind the Firewall and Federation Contract. User Tier (data consumption): Presentation, Processing, Integration, reached through the Web Service with uniform query and data.]
30
Federated Data Warehouse Architecture

Three-tier architecture consisting of
–Provider Tier: Back-end servers containing heterogeneous data, maintained by the federation members
–Proxy Tier: Retrieves Provider data and homogenizes it into common, uniform Datasets
–User Tier: Accesses the Proxy Server and uses the uniform data for presentation, integration or further processing

The Provider servers interact only with the Proxy Server, in accordance with the Federation Contract
–The contract sets the rules of interaction (accessible data subsets; types of queries submitted by the Proxy)
–The Proxy layer allows strong security measures, e.g. through Secure Socket Layer

The data User interacts only with the generic Proxy Server, using a flexible Web Services interface
–Generic data queries, applicable to all data in the Warehouse (e.g. space, time, parameter data sub-cube)
–The data query is addressed to a Web Service provided by the Proxy Server of the Federation
–Uniformly formatted, self-describing XML data packages are handed to the user for presentation or further machine processing

[Figure: Federated Data Warehouse. Provider Tier (heterogeneous data): Member Servers SQLServer1, ImageServer2, LegacyServer. Proxy Tier (data homogenization, etc.): Proxy Server with SQLDataAdapter1, ImageDataAdapter2, CustomDataAdapter, behind the Firewall and Federation Contract. User Tier (data consumption): Presentation, Processing, Integration, reached through the Web Service with uniform query and data.]
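The interaction rules of the three tiers can be mimicked in a short Python sketch. Everything named here is hypothetical: the `ProxyServer` class, the reduction of the Federation Contract to a set of allowed parameters, and the provider functions are assumptions for illustration. The slide specifies XML data packages delivered over a Web Service; a plain dictionary stands in for the package so the example stays self-contained.

```python
from typing import Callable, Dict, List, Set

Fetch = Callable[[dict], List[dict]]


class ProxyServer:
    """Proxy tier: the only party that talks to the provider servers."""

    def __init__(self) -> None:
        self._providers: Dict[str, Fetch] = {}
        self._contracts: Dict[str, Set[str]] = {}

    def register(self, name: str, fetch: Fetch, allowed_parameters: Set[str]) -> None:
        # The federation contract is reduced here to the set of parameters
        # a provider agrees to expose (an assumption for illustration).
        self._providers[name] = fetch
        self._contracts[name] = set(allowed_parameters)

    def query(self, parameter: str, t_min: str, t_max: str) -> dict:
        records: List[dict] = []
        for name, fetch in self._providers.items():
            if parameter not in self._contracts[name]:
                continue  # outside the contract: the request is never sent
            request = {"parameter": parameter, "t_min": t_min, "t_max": t_max}
            records.extend(fetch(request))
        # Self-describing package: the data travel with their own field list.
        return {"fields": ["time", "site", "parameter", "value"], "records": records}


def provider_one(request: dict) -> List[dict]:
    """Made-up member server that already returns uniform records."""
    data = [
        {"time": "2001-07-01", "site": "A1", "parameter": "PM25", "value": 14.2},
        {"time": "2001-08-01", "site": "A1", "parameter": "PM25", "value": 9.9},
    ]
    return [
        r for r in data
        if r["parameter"] == request["parameter"]
        and request["t_min"] <= r["time"] <= request["t_max"]
    ]


def provider_two(request: dict) -> List[dict]:
    """Made-up member server whose contract covers ozone only."""
    return [{"time": "2001-07-01", "site": "B7",
             "parameter": request["parameter"], "value": 61.0}]


proxy = ProxyServer()
proxy.register("member1", provider_one, {"PM25"})
proxy.register("member2", provider_two, {"O3"})

# User tier: one generic sub-cube query, addressed only to the proxy.
package = proxy.query("PM25", "2001-07-01", "2001-07-31")
```

The user code never holds a reference to a provider, which is the point of the design: access rules live in one place, and the query shape is the same for every dataset in the warehouse.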