Use of the SPSS MR Data Model at ATP, 12 January 2004


Introduction
John Lyon, ATP - IT Solutions Department
How the SPSS Data Model has changed the development process at ATP
Vector and an example project

What counts as the Data Model?
The Metadata Model
The Case Data Model
The SPSS MR OLE DB Provider
The SPSS Evaluate Component, SPSS Function Libraries and the SQL Aggregation Engine

Example project
Browser-based front end to a legacy in-house cross-tabulation system
Complex continuous research project, 1,000,000+ records
Hierarchical data
Multiple end-user clients with complex access control
Simple filter definition available in the front end
Complex filter and derived-variable definition at the back end

Other considerations
Minimum interruption of existing production processes
Client is committed to Dimensions products and the system can take advantage of a server version of the Data Model
Compatibility with other systems, including mrTables
ATP had already developed an aggregation engine that works on top of the Data Model with most of the functionality required - Vector

Vector Architecture (diagram)
Data sources: Dimensions DSCs, ODBC, triple-s
Vector aggregation component
Vector manipulation component
Outputs: OLAP, table object models (e.g. Winyaps, mrTables), web tables, intranet/extranet pages, automated charting, etc.

Demo: example project

Can we use the Data Model?
Does the Data Model's conceptual view of a data set support all the structures I need?
Does the Data Model's API provide functionality that makes it worth using?
Do all the layers of the actual implementation I am planning to use support the functionality I need in an efficient way? - TEST THEM
Can we really afford to ignore the compatibility and future-proofing that we gain from using the Data Model?

Metadata Model - Structure
Thinking of the Data Model as a data format analogous to SSS - does the metadata object model definition support all the basic structures I need?
In our experience, for reporting and at an individual project level, the answer will almost certainly be yes.
Key features are: a full set of data types including categorical variables, support for hierarchical data, versioning, languages, text variations, loops and grids.

What's missing
Routing
No structures for defining user access rights
Because the structures tend to be project-based, I think it lacks the structures that would be needed to implement a global research metadata repository
There is no concept of hierarchical categorical structures like product groupings, employee structures, retail outlet networks, etc.

Example project
The legacy system supports the definition of derived variables that are composites of numeric and categorical variables - these could not be resolved into Data Model variable types
Need to control access to variables down to an individual user level - the Data Model does not support any structures to define this
We also needed to control access based on time
The obvious solution was to take advantage of the open structure of the Data Model and control access through the API
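
As a purely hypothetical illustration of "control access through the API", one approach is to keep the per-user rights outside the Data Model and filter what the front end is shown. The permissions table, user name and file path below are invented, MDM.Document is assumed to be the COM class registered by the SPSS MR installation, and the sketch uses Python with pywin32 only because it is compact; it is not how ATP's system is implemented.

    # Hypothetical sketch: user-level access control layered on top of the open API.
    # The permissions store, user id and .mdd path are invented for illustration.
    import win32com.client

    PERMISSIONS = {"jsmith": {"gender", "age", "region"}}    # hypothetical rights store

    def fields_visible_to(user, mdd_path):
        """Return only the variable names this user is allowed to see."""
        allowed = PERMISSIONS.get(user, set())
        mdm = win32com.client.Dispatch("MDM.Document")       # assumed ProgID
        mdm.Open(mdd_path)
        try:
            return [f.Name for f in mdm.Fields if f.Name in allowed]
        finally:
            mdm.Close()

    print(fields_visible_to("jsmith", r"C:\projects\project.mdd"))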

Metadata Model API
By this I mean the COM object that is used to load and manipulate the metadata structures - the MDM Document
The object exposes a comprehensive set of properties allowing developers to access, modify and create new instances of all the structures supported by the Data Model
It handles the complexities of controlling multiple versions, languages, contexts and merging documents
It's open - you can develop your own MDSCs
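
A minimal sketch of driving the MDM Document from Python over COM, assuming pywin32 is installed and the SPSS MR components have registered the MDM.Document class; the .mdd path is hypothetical and the property names follow the MDM object model as I understand it, so check them against the Data Model documentation.

    # Illustrative sketch only: load an MDM document and walk its structure.
    # The path is hypothetical; property names should be verified against the
    # MDM object model documentation.
    import win32com.client

    mdm = win32com.client.Dispatch("MDM.Document")
    mdm.Open(r"C:\projects\project.mdd")             # loads the whole document
    try:
        for field in mdm.Fields:                     # variables, loops and grids
            print(field.Name)
        print("Versions :", mdm.Versions.Count)      # version history held in the document
        print("Languages:", [lang.Name for lang in mdm.Languages])
    finally:
        mdm.Close()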

What's missing
It can be very slow to load
The load is all or nothing - there's no concept of a partial load
You can only lock the document for read/write operations at a document level
In short, the MDM Document is exactly what it says - a DOCUMENT, with all the limitations that implies. What I really want is an object database system that gives me full multi-user access to the metadata

Example project
The API structure of the Data Model makes it possible to work around most problems
Decided to build an MDSC that understands the metadata structures used by the legacy system, including the structure for user-level access control to variables and back data
Developer's point of view - each user connects to a different data source and the MDSC deals with returning a partial view of the database for that user
Client's point of view - they can use their existing production processes and access control procedures to manage the system

Case Data Model
CDSCs, which map a native data source to something understood by the Data Model
The SPSS MR OLE DB Provider, which processes SQL-based requests for the data and returns a series of virtual tables to the client application
To achieve this, the OLE DB Provider uses the Expression Evaluation Component, an SPSS MR Function Library and the SQL Aggregation Engine
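
A hedged sketch of going through the OLE DB Provider from Python using ADO over COM. The provider name, CDSC, file paths and the assumption that the flat VDATA virtual table contains a gender variable are illustrative guesses rather than details from the presentation.

    # Illustrative sketch only: run an aggregated query through the SPSS MR
    # OLE DB Provider via ADO (pywin32). Provider/CDSC names, paths and the
    # variable name are assumptions; adjust them to the installed version.
    import win32com.client

    conn_str = (
        "Provider=mrOleDB.Provider.2;"               # assumed provider name
        "Data Source=mrDataFileDsc;"                 # assumed CDSC for the case data
        r"Location=C:\projects\project.ddf;"
        r"Initial Catalog=C:\projects\project.mdd;"
    )

    conn = win32com.client.Dispatch("ADODB.Connection")
    conn.Open(conn_str)
    rs = win32com.client.Dispatch("ADODB.Recordset")
    try:
        # The provider parses the SQL; the SQL Aggregation Engine does the counting.
        rs.Open("SELECT gender, COUNT(*) FROM vdata GROUP BY gender", conn)
        while not rs.EOF:
            print(rs.Fields("gender").Value, rs.Fields(1).Value)
            rs.MoveNext()
        rs.Close()
    finally:
        conn.Close()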

The Evaluation Component
For me, the expression evaluation component and the function library it uses are the very heart of the Data Model and the best reason for using it
They parse and evaluate expressions involving the market-research-specific data structures supported by the Case Data Model
Prior to version 2.8, the most significant feature was the ability to handle categorical variables as sets
The Data Model defines how these sets behave with the usual range of operators and also provides a useful range of set-based functions to manipulate them
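
To make the set behaviour concrete, here is an illustrative pair of filter expressions of the kind the evaluation component handles, written as queries that could be sent through the OLE DB Provider. The variable and category names are invented, and ContainsAny/ContainsAll are functions I believe the SPSS MR function library provides, so verify the exact names before relying on them.

    # Illustrative only: categorical variables treated as sets in filter
    # expressions. Names are hypothetical; function names are believed to come
    # from the SPSS MR function library (please verify).
    any_of_two = (
        "SELECT serial FROM vdata "
        "WHERE brands_used.ContainsAny({brand_a, brand_b})"    # overlaps the set
    )
    both_but_not_third = (
        "SELECT serial FROM vdata "
        "WHERE brands_used.ContainsAll({brand_a, brand_b}) "   # contains the whole set
        "AND NOT brands_used.ContainsAny({brand_c})"
    )
    # Either string could be executed exactly as in the previous sketch,
    # e.g. rs.Open(any_of_two, conn).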

Version 2.8
With release 2.8, the Data Model also supports a very neat way of defining expressions on hierarchical data by introducing a syntax to uplev and downlev variables
The uplev operator is used in conjunction with an aggregate function to control how the uplev is performed
Another new feature of Data Model 2.8 is support for hierarchical SQL queries, which implement a hierarchical syntax and return hierarchical recordsets
Syntax for running sub-queries
The syntax is a clever compromise between the ideas of standard SQL and the need to support a concise expression syntax that can be evaluated outside the context of an SQL query

Example project
Involves hierarchical data and on-the-fly evaluation of derived variables involving expressions across different levels of the hierarchy
With support for hierarchical data, our plan is to:
Develop a CDSC to load the base data variables into HDATA tables
Develop the MDSC to load the base variables as usual but map any derived variables in the legacy system into derived variables in the MDM. These will then be evaluated on the fly by the Case Data Model
Unfortunately, we started development a while ago and we decided to build the logic for evaluating hierarchical expressions into the CDSC

Problems with the Case Data
Performance - the Case Data Model may use SQL to retrieve the data, but it's not an RDBMS
You can't really ignore the underlying structure of the data file - which DSC you use in any situation makes a big difference
Test everything
In Vector we cache the data into inverted binary files
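
In the spirit of "test everything", one hedged way to see what a particular connection actually exposes, before assuming a DSC can service a given query, is a plain ADO schema call. This assumes the provider supports the standard schema rowsets, and it reuses the same hypothetical connection details as the earlier sketch.

    # Hedged sketch: list the virtual tables a Data Model connection exposes
    # (the flat VDATA table and, where supported, hierarchical levels) before
    # relying on them. Connection details are the same hypothetical values as above.
    import win32com.client

    adSchemaTables = 20                              # standard ADO enum value

    conn = win32com.client.Dispatch("ADODB.Connection")
    conn.Open(
        "Provider=mrOleDB.Provider.2;Data Source=mrDataFileDsc;"
        r"Location=C:\projects\project.ddf;Initial Catalog=C:\projects\project.mdd;"
    )
    try:
        rs = conn.OpenSchema(adSchemaTables)
        while not rs.EOF:
            print(rs.Fields("TABLE_NAME").Value)
            rs.MoveNext()
    finally:
        conn.Close()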

Conclusion
Thinking about this application in isolation, you could argue that we don't need to go through the Data Model
Some of the reasons are defensive: our clients expect it; we want to be able to integrate with other SPSS MR applications; we want to be able to support a wide range of exports; etc.
The most important reason is that the Data Model APIs include a wealth of functionality for handling research data that is exposed to developers in a flexible way
In the long term, it saves time!
ATP will always look to build future projects around the Data Model wherever possible