1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH-6963-01 Week 9, April 5, 2011 Information management, workflow and discovery /check-in for project.

Presentation transcript:

1 Peter Fox Xinformatics – ITEC 6961/CSCI 6960/ERTH Week 9, April 5, 2011 Information management, workflow and discovery /check-in for project definitions

Contents Review of assignment 3 presentations Reading (re-organized) Information management Information workflow Information discovery Checking in for project definitions? Next class 2

Discussion – A3 What did you learn? What was easy/ hard? What did you learn from others? Will you ever look at an information system the same again? –Sorry? –Happy? 3

Logical Collections The primary goal of a Management system is to abstract the physical collection into logical collections. The resulting view is a uniform homogeneous collection. –Identifying naming conventions and organization –Aligning cataloguing and naming to facilitate search, access, use –Provision of contextual information 4

Physical Handling This layer maps between the physical and the logical views. Here you find items like replication, backup, caching, etc. –Where and from whom does it come? –How is it transferred into a physical form? –Backup, archiving, and caching… –Formats –More --- naming conventions 5
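
The mapping between logical names and physical copies can be sketched as a small catalog. This is only an illustration, not something from the lecture: the class `LogicalCatalog`, its methods, and the storage locations are all hypothetical.

```python
from collections import defaultdict


class LogicalCatalog:
    """Hypothetical catalog: one logical name maps to one or more physical copies."""

    def __init__(self):
        self._replicas = defaultdict(list)

    def register(self, logical_name, physical_location):
        """Record that a physical copy exists for a logical name."""
        if physical_location not in self._replicas[logical_name]:
            self._replicas[logical_name].append(physical_location)

    def resolve(self, logical_name):
        """Return every known physical location (replica, backup, cache)."""
        return list(self._replicas.get(logical_name, []))

    def list_collection(self, prefix):
        """List logical names that share a naming-convention prefix."""
        return sorted(n for n in self._replicas if n.startswith(prefix))


# Made-up locations, used only to show the mapping.
catalog = LogicalCatalog()
catalog.register("/class/week9/slides", "file:///disk1/xinfo/w9.ppt")
catalog.register("/class/week9/slides", "file:///backup/xinfo/w9.ppt")
catalog.register("/class/week9/reading", "file:///disk2/reading.pdf")
```

A user only ever asks for `/class/week9/slides`; which disk or backup serves the request is a physical-handling decision hidden behind `resolve`.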

Interoperability Support Normally the information does not reside in the same place, or various collections (like catalogues) should be put together in the same logical collection. –Bit/byte and platform/ wire neutral encodings –Programming or application interface access –Structure and vocabulary (metadata) conventions and standards 6

Security Access authorization and change verification. This is the basis of trusting your information. –What mechanisms exist for securing? –Who performs this task? –Change and versioning (yes, the information may change), who does this, how? –Who has access? –How are access methods controlled, audited? –Who and what – authentication and authorization? –Encryption and integrity 7
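
One common way to approach change verification and integrity is to record a cryptographic fingerprint of the information and compare it later. A minimal sketch, assuming SHA-256 from Python's standard `hashlib`; the function names and sample content are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest to use as an integrity fingerprint."""
    return hashlib.sha256(content).hexdigest()


def has_changed(content: bytes, recorded_fingerprint: str) -> bool:
    """True if the content no longer matches the fingerprint recorded earlier."""
    return fingerprint(content) != recorded_fingerprint


# Hypothetical content; the fingerprint would be stored alongside the metadata.
original = b"Volcano activity report, version 1"
recorded = fingerprint(original)
```

This detects change but says nothing about who made it or whether it was authorized; those questions still need authentication, authorization and versioning.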

Ownership Define who is responsible for quality and meaning –Rights and policies – definition and enforcement –Limitations on access and use –Requirements for acknowledgement and use –Who defines and ensures quality, and how? –To whom may ownership migrate? –How to address replication? –How to address revised/ derivative products? 8

Metadata Metadata are data about data. Metainformation are information about information –How to know what conventions, standards, best practices exist? –How to use them, what tools? –Understanding costs of incomplete and inconsistent metadata –Understanding the line between metadata and data and when it is blurred –Knowing where and how to manage metadata and where to store it (and where not to) 9
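
The cost of incomplete metadata is easier to manage when it can be detected automatically. A small sketch, assuming a hypothetical convention of four required fields; real conventions and standards define their own.

```python
# Hypothetical convention: the fields every record is expected to carry.
REQUIRED_FIELDS = ("title", "creator", "date", "format")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in a metadata record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Example record with an empty date and no format at all.
record = {"title": "Week 9 slides", "creator": "Peter Fox", "date": ""}
```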

Persistence Definition of lifetime. Deployment of mechanisms to counteract technology obsolescence. –Where will you put your information so that someone else (e.g. one of your class members) can access it? –What happens after the class, the semester, after you graduate? –What other factors are there to consider? 10

Discovery Ability to identify useful relations and information inside the collection –If you choose (see ownership and security), how does someone find your information? –How would you provide discovery of collections, versus files, versus ‘bits’? –How to enable the narrowest/ broadest discovery? 11

Dissemination Mechanism to make interested parties aware of changes and additions to the collections. –Who should do this? –How and what needs to be put in place? –How to advertise? –How to inform about updates? –How to track use, significance? 12

Summary of Management Creation of logical collections Physical handling Interoperability support Security support Ownership Metadata collection, management and access. Persistence Knowledge and information discovery Dissemination and publication 13

Information Workflow What is it? Why would you use it? Some pointers to workflow systems 14

15 What is a workflow? General definition: series of tasks performed to produce a final outcome Information workflow – “analysis pipeline” –Automate tedious jobs that users traditionally performed by hand for each dataset –Process large volumes of data/ information faster than one could do by hand

16 Background: Business Workflows Example: planning a trip Need to perform a series of tasks: book a flight, reserve a hotel room, arrange for a rental car, etc. Each task may depend on outcome of previous task –Days you reserve the hotel depend on days of the flight –If hotel has shuttle service, may not need to rent a car
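
The trip-planning dependencies can be written down as a graph and ordered automatically. A minimal sketch using `graphlib.TopologicalSorter` from the Python standard library (3.9+); the task names are hypothetical labels for the steps on the slide.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks whose outcome it depends on.
dependencies = {
    "book_flight": set(),
    "reserve_hotel": {"book_flight"},   # hotel days depend on flight days
    "arrange_car": {"reserve_hotel"},   # car may be unnecessary if hotel has a shuttle
}

# A valid execution order that respects every dependency.
order = list(TopologicalSorter(dependencies).static_order())
```

Workflow systems do essentially this bookkeeping for the user, and add scheduling, monitoring and reuse on top.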

17 What about information workflows? Perform a set of transformations/ operations on a data or information source Examples –Generating images from raw data –Identifying areas of interest from a large information source –Classifying set of objects –Querying a web service for more information on a set of objects –Many others…

18 More on Workflows Formal models of the flow of data/ information among processing components May be simple and linear or more complex Can process many data/ information types: –Archives –Web pages –Streaming/ real time –Images (e.g., medical or satellite) –Simulation output –Observational data
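
A simple linear information workflow is just a chain of transformations, each feeding the next. The sketch below is illustrative only: the step functions, the threshold and the sample readings are made up.

```python
from functools import reduce


def run_pipeline(source, steps):
    """Apply each step in order, passing each output to the next step."""
    return reduce(lambda data, step: step(data), steps, source)


def drop_missing(values):
    """Remove missing observations."""
    return [v for v in values if v is not None]


def to_celsius(values):
    """Convert Fahrenheit readings to Celsius, rounded to one decimal."""
    return [round((v - 32) * 5 / 9, 1) for v in values]


def flag_areas_of_interest(values, threshold=30.0):
    """Keep only readings above a (hypothetical) threshold of interest."""
    return [v for v in values if v > threshold]


raw_fahrenheit = [50, None, 95, 104]
result = run_pipeline(
    raw_fahrenheit, [drop_missing, to_celsius, flag_areas_of_interest]
)
```

Because each step is a separate function, part or all of the chain can be reused in a different project, which is one of the benefits claimed for workflows.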

19 Challenges Questions: –What are some challenges for users in implementing workflows? –What are some challenges to executing these workflows? –What are limitations of writing a program? Mastering a programming language Visualizing workflow Sharing/exchanging workflow Formatting issues Locating datasets, services, or functions

20 Workflow Management Systems Graphical interfaces for developing and executing scientific workflows Users can create workflows by dragging and dropping Automates low-level processing tasks Provides access to repositories, compute resources, workflow libraries

21 Workflow Management Systems

22 Benefits of Workflows Documentation of aspects of analysis Visual communication of analytical steps Ease of testing/debugging Reproducibility Reuse of part or all of workflow in a different project

23 Additional Benefits Integration of multiple computing environments Automated access to distributed resources via other architectural components, e.g. web services and Grid technologies System functionality to assist with integration of heterogeneous components

Why not just use a script? Script does not specify low-level task scheduling and communication May be platform-dependent Can’t be easily reused May not have sufficient documentation to be adapted for another purpose 24

Why is a GUI useful? No need to learn a programming language Visual representation of what workflow does Allows you to monitor workflow execution Enables user interaction Facilitates sharing of workflows 25

Some workflow systems Kepler SCIRun Sciflo Triana Taverna Pegasus Some commercial tools: –Windows Workflow Foundation –Mac OS X Automator See reading for this week 26

Recall forms of information Structured/ un-structured Presentation and organization Syntax-semantics-pragmatics Managed, designed and architected. Goal of this part of the class is to understand how discovery is enabled or disabled based on these factors 27

Discovery How does someone find your information? How would you provide discovery of –collections –files –‘bits’ How would you find -> 28

Discovery o Federated Search o Folksonomies (user contributed) o Intelligent Agents o Search Engines o Taxonomies o Find photos of Kim o Boy or girl? 29

Use cases Find a sound recording of a swallow. Excuse me? 30

Use cases Find a sound recording of an African Swallow Find a sound recording of a bird that sounds like an African Swallow Media types – how can you discover them? 31

Use cases Find the movie that Jeanne Tripplehorn first starred in/ that was her most successful/ in which she was lead actress? Has anyone gene sequenced a mouse? Discovery can often involve information integration 32

33 Three level ‘metadata’ solution for DATA Level 1: Data Registration at the Discovery Level, e.g. Volcano location and activity Level 2: Data Registration at the Inventory Level, e.g. list of datasets, times, products Level 3: Data Registration at the Item Detail Level, e.g. access to individual quantities Diagram labels: Data Discovery; Data Integration; Ontology based Data Integration Using scientific workflows; Earth Sciences Virtual Database: A Data Warehouse where Schema heterogeneity problem is Solved; schema based integration. A.K. Sinha, Virginia Tech, 2006

34 Three level ‘metadata’ solution? Level 1: Registration at the Discovery Level, e.g. Find the upper level entry point to a source Level 2: Registration at the Inventory Level, e.g. list of datasets, using the logical organization Level 3: Registration at the Item Detail Level, i.e. annotation e.g. tagging Diagram labels: Information Discovery; Information Integration; Integration using mapping management; Catalog/ Index; Schema based integration. A.K. Sinha, Virginia Tech, 2006
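
The three registration levels can be pictured as a nested registry, with one lookup function per level. This is a sketch under assumptions: the source name, dataset name, item name and tags are invented, and the functions are hypothetical.

```python
# Hypothetical registry; all names and values are made up for illustration.
registry = {
    "volcano-observatory": {                 # Level 1: discovery entry point
        "description": "Volcano location and activity",
        "inventory": {                       # Level 2: list of datasets
            "eruptions-2010": {
                "items": {                   # Level 3: item detail / annotation
                    "record-001": {"tags": ["ash", "plume"]},
                },
            },
        },
    },
}


def discover(keyword):
    """Level 1: find sources whose description mentions the keyword."""
    return sorted(
        name for name, source in registry.items()
        if keyword.lower() in source["description"].lower()
    )


def inventory(source):
    """Level 2: list the datasets registered under a source."""
    return sorted(registry[source]["inventory"])


def item_tags(source, dataset, item):
    """Level 3: return the annotations (tags) on a single item."""
    return registry[source]["inventory"][dataset]["items"][item]["tags"]
```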

Information discovery What makes discovery work? –Metadata –Logical organization –Attention to the fact that someone would want to discover it –It turns out that file types are a key enabler or inhibitor to discovery What does not work? –Result ranking using *any* conventional algorithm 35

Federated search “is the simultaneous search of multiple online databases or web resources and is an emerging feature of automated, web-based library and information retrieval systems. It is also often referred to as a portal or a federated search engine.” (Wikipedia) Libraries have been doing this for a long time (Z39.50, ISO 23950) Key is consistent search metadata fields (keywords) E.g. Geospatial One Stop 36
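
The idea can be sketched with in-memory catalogs standing in for remote databases. This is not Z39.50 or any real protocol: the catalogs, records and function names are hypothetical, and the only point is that every source is queried at once on the same `keywords` field.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical catalogs that share one consistent metadata field: "keywords".
CATALOGS = {
    "library_a": [{"title": "Swallow songs", "keywords": ["bird", "sound"]}],
    "library_b": [
        {"title": "African birds", "keywords": ["bird", "africa"]},
        {"title": "Volcano maps", "keywords": ["geology"]},
    ],
}


def search_one(source, keyword):
    """Search a single catalog and label each hit with its source."""
    return [
        dict(record, source=source)
        for record in CATALOGS[source]
        if keyword in record["keywords"]
    ]


def federated_search(keyword):
    """Query every catalog concurrently and merge the hits into one list."""
    with ThreadPoolExecutor() as pool:
        per_source = pool.map(lambda s: search_one(s, keyword), sorted(CATALOGS))
        return [hit for hits in per_source for hit in hits]
```

If one catalog called the field `subject` and another `keywords`, the merge would silently miss records, which is why consistent search fields are the key.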

Search engines (1) Contains an automated spider or crawler No theoretical limits in the amount of indexing (limited by hardware) Support remote indexing Continual background indexing of content Custom metatag support (some low-end products do not support this feature) Support for indexing PDF, .doc, etc. (some low-end products do not support this feature) Supports URL and word exclusions & inclusions 37

Search engines (2) Server-Side Includes (SSI) supported Search by custom metatags Case sensitive or insensitive searching Simple Customizable search/results pages Boolean Searching capabilities Provide users meta description and page title in search results Inexpensive – ~$200 (2010) Easily customizable search/results interface 38

Search engines (3) Result weighting feature URL Inclusion list Require significant memory (RAM) and disk space as the collection grows Low-end alternatives often do not possess the capabilities to do phrase or natural language searching. 39
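
Indexing and Boolean searching come down to an inverted index: a map from each word to the pages that contain it. A minimal sketch with case-insensitive AND/OR queries; the pages and function names are hypothetical, and real engines add crawling, ranking and phrase search on top.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search_and(index, *terms):
    """Boolean AND: URLs that contain every term (case-insensitive)."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []


def search_or(index, *terms):
    """Boolean OR: URLs that contain at least one term (case-insensitive)."""
    return sorted(set().union(*(index.get(t.lower(), set()) for t in terms)))


# Made-up pages, used only to exercise the index.
pages = {
    "/a": "African swallow sound recording",
    "/b": "European swallow photo",
    "/c": "Sound archive",
}
index = build_index(pages)
```

The index grows with the collection, which is where the memory and disk requirements come from.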

Improve www discovery? Implement metatags on your and your partners' web sites Update content frequently Register your site with the major search engines (tools exist to aid in this process) Perform a basic study of where your site ranks within the major search engine providers Do not spam the search engine providers Re-evaluate your web site directory structure to ensure information is appropriately categorized/ described within your URL strings 40
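
To check what metatags a crawler would actually see on a page, the tags can be pulled out with Python's standard `html.parser`. A small sketch; the class name and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content


# Made-up page with the two metatags most often used for discovery.
page = (
    "<html><head><title>Swallows</title>"
    '<meta name="description" content="Sound recordings of swallows">'
    '<meta name="keywords" content="bird, swallow, sound">'
    "</head><body></body></html>"
)
parser = MetaTagParser()
parser.feed(page)
```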

Improve www discovery Look through your server log files to determine what users are trying to find on your site and/or the path they are using to find information Perform basic usability testing of your site to determine what users expect and can easily gather from your site. This also may determine why users go to an Internet search engine provider versus accessing your site directly. Realize that Internet search engines don’t all act the same, index at the same time period, and often value a particular metatag, document date, etc. more than another vendor product. 41
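
Looking through server logs for what users try to find can be partly automated. The sketch below assumes log lines in the Common Log Format and a site search that passes the query in a `q` parameter; both are assumptions, and the sample lines are invented.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlparse

# Matches the request part of a Common Log Format line.
REQUEST = re.compile(r'"(?:GET|POST) (\S+) HTTP/[\d.]+"')


def top_search_terms(log_lines, param="q"):
    """Count on-site search terms (case-insensitive), most frequent first."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if not match:
            continue
        query = parse_qs(urlparse(match.group(1)).query)
        for term in query.get(param, []):
            counts[term.lower()] += 1
    return counts.most_common()


# Invented log lines for illustration.
log_lines = [
    '10.0.0.1 - - [05/Apr/2011:10:00:00 -0400] "GET /search?q=swallow HTTP/1.1" 200 512',
    '10.0.0.2 - - [05/Apr/2011:10:01:00 -0400] "GET /search?q=Swallow HTTP/1.1" 200 512',
    '10.0.0.3 - - [05/Apr/2011:10:02:00 -0400] "GET /search?q=volcano HTTP/1.1" 200 256',
    '10.0.0.4 - - [05/Apr/2011:10:03:00 -0400] "GET /index.html HTTP/1.1" 200 1024',
]
```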

Smart search Semantically aware search, e.g. (Water -> Semantic Search) Faceted search, e.g. mspace, Earth System Grid, exhibit (MIT) 42

NOESIS 43

Faceted search Semantically aware search, e.g. Faceted search, e.g. mspace, Earth System Grid 44
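
Faceted search lets a user narrow a collection by picking values of metadata fields, with counts shown for each value. A minimal sketch, not modelled on mspace or Earth System Grid; the records, facets and function names are hypothetical.

```python
from collections import Counter

# Made-up records with two facets: "media" and "region".
RECORDS = [
    {"title": "African swallow call", "media": "audio", "region": "Africa"},
    {"title": "European swallow call", "media": "audio", "region": "Europe"},
    {"title": "African swallow photo", "media": "image", "region": "Africa"},
]


def facet_counts(records, facet):
    """Count how many records fall under each value of a facet."""
    return dict(Counter(record[facet] for record in records))


def refine(records, **selected):
    """Keep only records that match every selected facet value."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in selected.items())
    ]
```

Selecting `region="Africa"` and then recomputing the `media` counts on the remaining records is the narrowing step a faceted interface repeats after every click.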

Summary - discovery Useful to write a few discovery use cases to drive how your design is developed Evolution of your role in facilitating discovery and what/ how others implement access to your information 45

Reading for this week Is retrospective 46

Check in for Project Assignment Analysis of existing information system content and architecture, critique, redesign and prototype redeployment Or a new use case, development, etc. 47

What is next No class next week April 12 – GM week Week 10 – Information integration, life-cycle and visualization 48