Stefania Bergamasco, Cecilia Colasanti An integrated approach to turn statistics into knowledge combining data warehouse, controlled vocabularies and advanced.


An integrated approach to turn statistics into knowledge combining data warehouse, controlled vocabularies and advanced search engine
Authors: Stefania Bergamasco, Cecilia Colasanti, Stefano De Francisci, Paola Giacché, Paolo Giacomi
NTTS 2009, Brussels, February 2009

The main issues

The aim of this session is to illustrate the main components of the Integrated Output Management System of ISTAT (ISTAR):
a. Data Module
b. Doc Module
c. Glossary Module
d. GSA Module

in order to show:
a. the solution adopted to integrate the statistical data warehouse with the new ways to organize and retrieve information on the Web;
b. the use of the principles of controlled vocabularies to manage the glossary of the system;
c. the technical solution adopted to optimize the search engine so that it can scan the dynamic Web pages generated by the information system.

ISTAR: the Integrated Output Management System of ISTAT

The integrated system is based on the construction of several metadata layers. They cover not only the description, design and referencing of the contents, but also the management of navigation, retrieval, interchange and the semantics of the data.

[Diagram: ISTAR and its four modules: Data module, Doc module, Glossary module, GSA module]

ISTAR: the Integrated Output Management System of ISTAT – Data Module

The Data module of Istar is a collection of tools specifically designed to support statisticians in all the phases required to disseminate aggregate statistical data on the Web. From the functional point of view, the collection is structured in two kinds of toolkits: modelling tools, and analysis and reporting tools. Modelling tools are used to design the semantic layers of the system; analysis and reporting tools support navigation of the data warehouse contents, either in-house or published on the Web.

The application architecture of the Data module of Istar is layered as follows:

[Diagram: Data module layers: Metadata layers, Administration module, Statistical data warehouse, OLAP engine]

ISTAR: the Integrated Output Management System of ISTAT – Doc Module

The Doc module manages the non-structured reference data and documents linked to the subject matter areas of the system. This module allows connection and interchange between Istar and the centralised system for survey documentation (SIDI) and, in particular, its component dedicated to Web dissemination (SIQual). The integration is possible through two navigation paths:
- links to the statistical sources which feed the system;
- links to documentation materials of the specific domain of interest.

[Diagram: Doc module linked to SIQual through statistical sources and documents]

ISTAR: the Integrated Output Management System of ISTAT – Glossary module

[Diagram: the Glossary module consists of a system glossary and a table glossary. Callouts: table glossary button; link to the data source; link to the glossary items; specification of the statistical kind of the items (Statistical Classifications, Analysis unit). Glossary item fields: Definition, Category, Source of definition, Related terms, Statistical sources]
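The glossary item fields named on this slide (definition, category, source of definition, related terms, statistical sources) can be pictured as a simple record type. The sketch below is only an illustration: the class name `GlossaryItem`, the helper `related`, and all sample values are assumptions, not taken from ISTAR.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GlossaryItem:
    """One controlled-vocabulary entry (hypothetical structure)."""
    term: str
    definition: str
    category: str                 # e.g. "Analysis unit" or "Statistical Classifications"
    source_of_definition: str
    related_terms: List[str] = field(default_factory=list)
    statistical_sources: List[str] = field(default_factory=list)


def related(glossary: Dict[str, GlossaryItem], term: str) -> List[GlossaryItem]:
    """Return the items listed as related to `term` that exist in the glossary."""
    item = glossary[term]
    return [glossary[t] for t in item.related_terms if t in glossary]


# Invented sample entries, for illustration only.
glossary = {
    "Household": GlossaryItem(
        term="Household",
        definition="(placeholder definition)",
        category="Analysis unit",
        source_of_definition="(placeholder source)",
        related_terms=["Family", "Dwelling"],
        statistical_sources=["Census"],
    ),
    "Family": GlossaryItem(
        term="Family",
        definition="(placeholder definition)",
        category="Analysis unit",
        source_of_definition="(placeholder source)",
    ),
}

related_to_household = [g.term for g in related(glossary, "Household")]
```

Restricting related terms to entries that exist in the glossary is what keeps the vocabulary controlled: "Dwelling" is ignored here because it has no entry of its own.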

ISTAR: the Integrated Output Management System of ISTAT – GSA module

The search engine

ISTAT has chosen Google products and services to meet the search needs of both internal and external users. The Google Search Appliance (GSA) is packaged as an appliance that includes both hardware and software.

ISTAR: the Integrated Output Management System of ISTAT – GSA module (continued)

- GSA provides universal search across many sources (file shares, intranets, databases, applications and content management systems) through a single, easy-to-use search box.
- Users can customize the service to their specific needs, and administrators can configure sets of search profiles.
- Users can view a search result only if they have access to the original content, so company data are always protected from unauthorized access.
- GSA supports several authentication mechanisms and Single Sign-On.
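The access rule described above (a result is shown only to users who can read the original content) can be illustrated with a small filter. This is a minimal sketch of the idea, not the GSA implementation: the `SearchHit` type, the group-based access model, and the sample URLs are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class SearchHit:
    title: str
    url: str
    allowed_groups: FrozenSet[str]  # hypothetical access list of the source content


def visible_hits(hits: Iterable[SearchHit], user_groups: Iterable[str]) -> List[SearchHit]:
    """Keep only the hits whose source content the user is allowed to read."""
    groups = set(user_groups)
    return [h for h in hits if h.allowed_groups & groups]


# Invented sample hits.
hits = [
    SearchHit("Press release", "https://example.org/press/1", frozenset({"public"})),
    SearchHit("Internal draft table", "https://example.org/draft/7", frozenset({"staff"})),
]

public_view = visible_hits(hits, ["public"])
staff_view = visible_hits(hits, ["public", "staff"])
```

Filtering at result time means the index can cover restricted content without exposing it to users who lack access.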

ISTAR: the Integrated Output Management System of ISTAT – GSA module (continued)

The main features

We configured the product with four disjoint collections: electronic documents, press releases, statistical tables in spreadsheet format, and databases. The metadata can optionally be shown or hidden.

ISTAR: the Integrated Output Management System of ISTAT – GSA module (continued)

By choosing "Advanced search" it is also possible to combine an item of the taxonomies with one or more facets (thematic area, statistical source, year, territory).
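As an illustration of combining a taxonomy item with one or more facets, the sketch below filters a list of records by every requested value. The function name `facet_search`, the dictionary keys, and the sample records are assumptions made for the example.

```python
from typing import Any, Dict, List


def facet_search(records: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Return the records that match every requested taxonomy/facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


# Invented sample records; "typology" plays the role of the taxonomy item,
# the other keys are the four facets named on the slide.
records = [
    {"typology": "table", "title": "Resident population",
     "thematic_area": "Population", "statistical_source": "Census",
     "year": 2001, "territory": "Italy"},
    {"typology": "table", "title": "Employment rate",
     "thematic_area": "Labour", "statistical_source": "Labour force survey",
     "year": 2007, "territory": "Italy"},
    {"typology": "press release", "title": "Census results",
     "thematic_area": "Population", "statistical_source": "Census",
     "year": 2001, "territory": "Lazio"},
]

population_tables = facet_search(records, typology="table", thematic_area="Population")
census_2001 = facet_search(records, statistical_source="Census", year=2001)
```

Each extra facet narrows the result set, and the order in which facets are supplied does not matter.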

ISTAR: the Integrated Output Management System of ISTAT – GSA module (continued)

The solution adopted

We faced the following problems:
1. how to let the GSA search engine index the objects reachable through the dynamic Web pages of a data warehouse without saturating the database with scans;
2. how to exploit the concepts of taxonomy and facet so that users can retrieve information independently of where it is stored;
3. how to provide search results with the related metadata within the snippet.

We adopted a solution based on three concepts:
1. we associated with each "object" stored in the system all the metadata needed to manage the taxonomy, the facets and the snippet;
2. the capabilities of the search engine are applied to the scanned database rather than to scanned Web pages;
3. we tagged all the visited Web pages as "non crawling".
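The slides do not say how the pages were tagged as "non crawling". One common mechanism is the standard robots meta tag, and the sketch below assumes that approach; the function name `render_head` is hypothetical.

```python
from html import escape


def render_head(title: str, crawlable: bool = False) -> str:
    """Build an HTML <head> whose robots meta tag tells crawlers whether to index the page.

    Assumption: pages are excluded with the standard robots meta tag;
    the original system may have used a different mechanism.
    """
    robots = "index, follow" if crawlable else "noindex, nofollow"
    return (
        "<head>\n"
        f"  <title>{escape(title)}</title>\n"
        f'  <meta name="robots" content="{robots}">\n'
        "</head>"
    )


dynamic_page_head = render_head("Table: resident population")
```

With the dynamic pages excluded in this way, the crawler has nothing to follow on the Web side, and all indexing goes through the database metadata.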

ISTAR: the Integrated Output Management System of ISTAT – GSA module (continued)

Going into more detail:
1. within each system a relational table, called search_engine, was built. This table is populated through specific procedures invoked by "insert, modify, cancel" events. The table has the following field structure (Clob): typology / theme / data source / territory / time / title / other (e.g. the modes of the classifications associated with the table) / url;
2. a specific functionality that lists the databases and tables to be scanned and indexed has been parameterised, and the URLs of the related objects have been listed. In this way the search engine does not scan each dynamic Web page of the system, but scans the contents of the fields in the database;
3. the search interfaces have been customized using:
   - the field typology as a hierarchical-enumerative classification,
   - the fields theme, data source, territory and time for the faceted classification,
   - the fields time and title for the snippet.
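The three steps above can be sketched with an in-memory SQLite database. This is a minimal illustration under stated assumptions, not ISTAT's implementation: the source table `published_object`, the trigger names, the use of plain TEXT columns instead of Clob, the helper functions, and all sample rows are invented. Only the table name search_engine and its field list come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE published_object (          -- hypothetical source table
    id INTEGER PRIMARY KEY,
    typology TEXT, theme TEXT, data_source TEXT, territory TEXT,
    time TEXT, title TEXT, other TEXT, url TEXT
);
CREATE TABLE search_engine (             -- metadata table read by the search engine
    object_id INTEGER PRIMARY KEY,
    typology TEXT, theme TEXT, data_source TEXT, territory TEXT,
    time TEXT, title TEXT, other TEXT, url TEXT
);
-- "insert, modify, cancel" events keep search_engine in step with the source.
CREATE TRIGGER se_insert AFTER INSERT ON published_object BEGIN
    INSERT INTO search_engine VALUES (NEW.id, NEW.typology, NEW.theme,
        NEW.data_source, NEW.territory, NEW.time, NEW.title, NEW.other, NEW.url);
END;
CREATE TRIGGER se_update AFTER UPDATE ON published_object BEGIN
    UPDATE search_engine SET typology = NEW.typology, theme = NEW.theme,
        data_source = NEW.data_source, territory = NEW.territory,
        time = NEW.time, title = NEW.title, other = NEW.other, url = NEW.url
    WHERE object_id = OLD.id;
END;
CREATE TRIGGER se_delete AFTER DELETE ON published_object BEGIN
    DELETE FROM search_engine WHERE object_id = OLD.id;
END;
""")

FACETS = {"theme", "data_source", "territory", "time"}


def facet_counts(conn, facet):
    """Count indexed objects per value of one facet field."""
    if facet not in FACETS:  # whitelist, since the name is put into the SQL text
        raise ValueError(f"unknown facet: {facet}")
    rows = conn.execute(
        f"SELECT {facet}, COUNT(*) FROM search_engine GROUP BY {facet} ORDER BY {facet}"
    )
    return dict(rows.fetchall())


def snippets(conn):
    """Build the result snippet from the title and time fields."""
    rows = conn.execute("SELECT title, time FROM search_engine ORDER BY object_id")
    return [f"{title} ({time})" for title, time in rows]


# Invented sample rows.
insert_sql = ("INSERT INTO published_object "
              "(typology, theme, data_source, territory, time, title, other, url) "
              "VALUES (?, ?, ?, ?, ?, ?, ?, ?)")
conn.executemany(insert_sql, [
    ("table", "Population", "Census", "Italy", "2001",
     "Resident population", "", "https://example.org/t/1"),
    ("table", "Labour", "Labour force survey", "Italy", "2007",
     "Employment rate", "", "https://example.org/t/2"),
    ("press release", "Population", "Census", "Lazio", "2001",
     "Census results", "", "https://example.org/p/3"),
])
conn.execute("UPDATE published_object SET title = 'Employment rate by sex' WHERE id = 2")
conn.execute("DELETE FROM published_object WHERE id = 3")

theme_counts = facet_counts(conn, "theme")
result_snippets = snippets(conn)
```

Because the triggers maintain search_engine on every change, the search engine only needs to read one small table per system instead of crawling each dynamic page.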

Conclusions

The strategy adopted to increase the value of the information is based on a complex integration scenario:
- data warehouses
- metadata information systems
- descriptive and textual information

The technical solutions include:
- the construction of specific metadata layers
- the optimization of search, in order to improve the performance of the scanning operations
- the combination of the opportunities offered by new Web technologies with the capabilities of dynamic Web warehouses

Two lessons:
- it is possible to integrate and share knowledge even when information is organized in various ways (from legacy databases or data warehouses to textual documents, volumes, etc.)
- … paying more and more attention to the information needs of the users.