SEASR & Meandre for Second Generation Digital Libraries

SEASR & Meandre for Second Generation Digital Libraries National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Automated Learning Group

SEASR
This project focuses on developing, integrating, deploying, and sustaining a set of reusable and expandable software components and a supporting framework. SEASR will benefit a broad set of data mining applications for scholars in the humanities.
The key goals established for this effort are a set of software-centric directives:
- Support the development of a state-of-the-art software environment for unstructured data management and analysis of digital libraries, repositories, and archives, as well as educational platforms that are expected to contribute to many of the humanities breakthroughs of the 21st century.
- Support the continued development, expansion, and maintenance of an end-to-end software system (user interfaces, workflow engines, data management, analysis and visualization tools, collaborative tools, and other software integrated into a complete environment, SEASR) to bring the full power of data analytics to scholars.
- Support education and training in the use of this software environment for analysis, through workshops that promote its usage among scholars.

The SEASR Picture

SEASR Enables Scholarly Research Discovery
- What hypotheses or rules can be generated from the “features” of the corpus?
- What “features” or language of the corpus best describe the corpus?
- What are the “similarities” between elements, documents, or corpora?

SEASR Architecture

SEASR: Reach + Relevance + Reuse + Repeatability
- SEASR emphasizes flexibility, scalability, modularity, and access to heterogeneous data and computational systems
- Semantic-driven environment for SOA interoperability
- Encourages sharing and participation for building communities
- Modular construction allows flows to be modified and configured, encouraging reuse within and across domains
- Enables mashups and integration of tools
- Data-intensive flows can be executed on a simple desktop or on one or more large clusters without modification
- Computation can be created for distributed execution on servers where the content lives

SEASR @ Work – Tag Cloud
- Counts tokens
- Several different filtering options supported
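The token counting behind a tag cloud can be sketched in a few lines. This is a minimal Python illustration, not SEASR's component code: the `token_counts` helper, the regular-expression tokenizer, the small stop word set, and the minimum-length filter are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop word list; a real flow would use a much larger one.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "it", "was"}


def token_counts(text, stop_words=STOP_WORDS, min_length=2):
    """Count lowercase word tokens, skipping stop words and very short tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(
        t for t in tokens if t not in stop_words and len(t) >= min_length
    )


counts = token_counts("It was the best of times, it was the worst of times")
# The most frequent tokens would be drawn largest in the tag cloud.
print(counts.most_common(3))
```

Each filtering option (stop words, token length) is just a predicate applied before counting, so further filters can be added in the same way.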

SEASR @ Work – Dunning Log-likelihood
Feature comparison of tokens:
- Specify an analysis document/collection
- Specify a reference document/collection
- Perform a statistical comparison using Dunning log-likelihood
Example showing tokens over-represented in the analysis set:
- Analysis set: The Project Gutenberg EBook of A Tale of Two Cities, by Charles Dickens
- Reference set: The Project Gutenberg EBook of Great Expectations, by Charles Dickens
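For readers unfamiliar with the statistic, the sketch below computes a log-likelihood ratio (G²) for a single token from a 2x2 contingency table of observed counts. It is one common formulation, written in Python for illustration; the slides do not say which variant SEASR implements, and the function names and the counts in the usage lines are made up.

```python
import math


def log_likelihood(a, b, n1, n2):
    """G2 statistic for one token.

    a, b   -- occurrences of the token in the analysis and reference sets
    n1, n2 -- total token counts of the analysis and reference sets
    """
    observed = [a, b, n1 - a, n2 - b]
    total = n1 + n2
    with_token = a + b
    without_token = total - with_token
    # Expected counts if the token were equally frequent in both sets.
    expected = [
        n1 * with_token / total,
        n2 * with_token / total,
        n1 * without_token / total,
        n2 * without_token / total,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )


def is_over_represented(a, b, n1, n2):
    """True when the token is relatively more frequent in the analysis set."""
    return a / n1 > b / n2


# Hypothetical counts: 30 hits in 100 analysis tokens, 10 in 100 reference tokens.
score = log_likelihood(30, 10, 100, 100)
print(round(score, 2), is_over_represented(30, 10, 100, 100))
```

The statistic itself is symmetric, so the direction (over- or under-represented) has to be read from the relative frequencies, as `is_over_represented` does.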

SEASR @ Work – Date Entities to Simile Timeline
- Entity extraction with OpenNLP
- Dates viewed on Simile Timeline
- Locations viewed on Google Map

SEASR @ Work – Text Clustering
- Clustering of text by token counts
- Filtering options for stop words and part of speech
- Dendrogram visualization

SEASR @ Work – Audio Analysis
NEMA: executes a SEASR flow for each run
- Loads audio data
- Extracts features for every 10-second moving window of audio
- Loads and applies the models
- Sends results back to the WebUI
NESTER: Annotation of Audio via Spectral Analysis

SEASR @ Work – Zotero
- Plugin to Firefox
- Zotero manages the collection
- Launch SEASR Analytics
- Citation Analysis uses the JUNG network importance algorithms to rank the authors in the citation network, which is exported as RDF data from Zotero to SEASR
- Zotero Export to Fedora through SEASR
- Saves results from SEASR Analytics to a Collection

SEASR @ Work – Fedora Repository Search & Browse Interactive Web Application Web Service Zotero Upload to Repository

Meandre: Infrastructure
SEASR/Meandre infrastructure:
- Dataflow execution paradigm
- Semantic-web driven
- Web oriented
- Supports publishing services
- Modular components
- Encapsulation and execution mechanism
- Promotes reuse, sharing, and collaboration
The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation.

Meandre Data-Intensive Flows
[Architecture diagram. Labels, in reading order: SEASR Apps (SEASR Plugins, SEASR Web Apps, SEASR Services); Meandre Data-Intensive Flows; SEASR Components (Data, Analytics, Visualization, Developer Tools), with the items Gateway Connections, Data Persistence, Data Transformation, Natural Lang Processing, Descriptive Statistics, Predictive Modeling, Discovery, Graphing, Modeling Vis, Info Vis (small multiples), Component Repository, Component Discovery; Meandre Infrastructure (Shared Stores, File Systems, Metadata Stores, SOA Gateways, Virtualization Infrastructure).]

Meandre: Data Driven Execution
Execution paradigms:
- Conventional programs perform computational tasks by executing a sequence of instructions.
- Data-driven execution applies transformation operations to a flow or stream of data as the data becomes available.
Dataflow approach. A component:
- May have zero to many inputs
- May have zero to many outputs
- Performs a logical operation when data is available

Meandre: Dataflow Example
[Diagram: inputs Value1 and Value2 feed a logical operation whose output is Sum.]
Dataflow addition example:
- The logical operation ‘+’ requires two inputs and produces one output.
- When two inputs are available, the logical operation can be performed and the sum is output.
- When the output is produced, the component resets its internal values and waits for two new input values to become available.
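The firing rule in the addition example can be sketched as a small Python class. This is an illustration of the data-driven idea only, not Meandre's component API (Meandre components are described elsewhere in this deck): the class name `AddComponent`, the `push` method, and the port names `value1` and `value2` are hypothetical.

```python
class AddComponent:
    """Toy dataflow component: fires '+' once both input ports hold a value."""

    def __init__(self):
        # One slot per input port; None means "no data has arrived yet".
        self._inputs = {"value1": None, "value2": None}

    def push(self, port, value):
        """Deliver a value to an input port.

        Returns the sum if this delivery made the component fire, else None.
        """
        if port not in self._inputs:
            raise KeyError(f"unknown input port: {port}")
        self._inputs[port] = value
        if all(v is not None for v in self._inputs.values()):
            result = self._inputs["value1"] + self._inputs["value2"]
            # Reset internal values and wait for two new inputs.
            self._inputs = dict.fromkeys(self._inputs)
            return result
        return None


adder = AddComponent()
print(adder.push("value1", 2))  # no output yet: only one input is available
print(adder.push("value2", 3))  # both inputs available, so the component fires
```

Note that nothing calls the addition explicitly; arrival of the second value is what triggers it, which is the difference from a conventional instruction sequence.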

Meandre: The Dataflow Component
[Diagram: Component P with inputs and outputs.]
- Data dictates component execution semantics
- Descriptor in RDF of its behavior
- The component implementation

Meandre: Semantic Web Concepts
- Relies on the Resource Description Framework (RDF), which uses a simple notation to express graph relations, usually written as XML, to provide a set of conventions and a common means of exchanging information
- Provides a common framework to share and reuse data across application, enterprise, and community boundaries
- Focuses on common formats for integration and combination of data drawn from diverse sources
- Pays special attention to the language used for recording how the data relates to real-world objects
- Allows navigation to sets of data resources that are semantically connected

Meandre: Metadata Ontologies
Meandre's metadata relies on three ontologies:
- The RDF ontology serves as a base for defining Meandre descriptors
- The Dublin Core Elements ontology provides basic publishing and descriptive capabilities in the description of Meandre descriptors

Meandre: Components in RDF
    @prefix meandre: <http://www.meandre.org/ontology/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix dc: <http://purl.org/dc/elements/1.1/> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix : <#> .

    <http://dita.ncsa.uiuc.edu/meandre/e2k/components/limited-iterations>
        meandre:name "Limited iterations"^^xsd:string ;
        rdf:type meandre:executable_component ;
        dc:creator "Xavier Llora"^^xsd:string ;
        dc:date "2007-11-17T00:32:35"^^xsd:date ;
        dc:description "Allows only a limited number of iterations"^^xsd:string ;
        dc:format "java/class"^^xsd:string ;
        dc:rights "University of Illinois/NCSA Open Source License"^^xsd:string ;
        meandre:execution_context
            <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/colt.jar> ,
            <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/gacore.jar> ,
            <http://dita.ncsa.uiuc.edu/meandre/e2k/components/limited-iterations/implementation/> ,
            <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/gacore-meandre.jar> ,
            <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/formj2.0.jar> ;
        ...
Existing Standards

Meandre: Component Types
Components are the basic building blocks of any computational task. There are two kinds of Meandre components:
- Executable components perform computational tasks that require no human interaction at runtime. Processes are initialized during flow startup and are fired in accordance with the policies defined for them.
- Control components are used to pause the dataflow during user interaction cycles. The WebUI may be an HTML form, an applet, or another user interface.

Meandre: Create, Publish, & Share
- “Components” and “Flows” have RDF descriptors
- Descriptors are easy to share, which fosters reuse
- Descriptors allow machines to read and interpret them
- Descriptors are independent of the implementations
- Combine different implementations and platforms
- Components: Java, Python, Lisp, Web Services
- Execution: on a laptop or a high-performance cluster
- A “Location” is an RDF descriptor of one to many components, one to many flows, and their implementations

Meandre: Repository & Locations
- Each location represents a set of components/flows
- Users can combine different locations together, create components, assemble flows, and share components and flows
- Repositories help administer complex environments and organize components and flows

Meandre: Metadata Properties
- Components and flows share properties such as name, creator, creation date, description, tags, and rights.
- Component-specific metadata describes the component's behavior: its location, type of implementation, firing policy, runnable, format, resource location, and execution context.
- Flow-specific metadata describes the directed graph of components: component instances, connectors, connector instance data port source, connector instance data port target, connector instance source, connector instance target, and instance name.
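To make the flow metadata more concrete, the sketch below models a flow as a directed graph of component instances joined by connectors, using plain Python dictionaries rather than RDF. The instance names, port names, and dictionary keys are hypothetical and are not Meandre's vocabulary; `graphlib` is the Python standard-library module (3.9+) used here only to derive an upstream-before-downstream ordering for inspection.

```python
from graphlib import TopologicalSorter

# Hypothetical connectors: each one joins an output port of a source
# instance to an input port of a target instance.
connectors = [
    {"source": "fetch", "source_port": "text",
     "target": "tokenize", "target_port": "text"},
    {"source": "tokenize", "source_port": "tokens",
     "target": "count", "target_port": "tokens"},
    {"source": "count", "source_port": "counts",
     "target": "view", "target_port": "counts"},
]


def upstream_first(connectors):
    """List instance names so every source appears before its targets."""
    sorter = TopologicalSorter()
    for c in connectors:
        # The target depends on data produced by the source.
        sorter.add(c["target"], c["source"])
    return list(sorter.static_order())


print(upstream_first(connectors))
```

In a data-driven engine the components do not run in a fixed sequence; the ordering here is just a convenient way to read the graph that the connector metadata describes.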

Meandre: Programming Paradigm
- The programming paradigm creates complex tasks by linking together specialized components.
- Meandre's publishing mechanism allows components developed by third parties to be assembled into a new flow.
- There are two ways to develop flows: Meandre’s Workbench visual programming tool and Meandre’s ZigZag scripting language.

Meandre: Workbench
- Web-based UI
- Components and flows are retrieved from the server
- Additional locations of components and flows can be added to the server
- Create a flow using a graphical drag-and-drop interface
- Change property values
- Execute the flow
[Screenshot labels: Existing Flow, Components, Flows, Locations]

Meandre: ZigZag Script Language
- ZigZag is a simple language for describing data-intensive flows, modeled on Python for simplicity.
- ZigZag is a declarative language for expressing the directed graphs that describe flows.
- Command-line tools allow ZigZag files to be compiled and executed.
- A compiler is provided to transform a ZigZag program (.zz) into a Meandre archive unit (.mau).
- The resulting .mau files can then be executed by a Meandre engine.

DEMO
Perform analysis using "Tag Cloud Viewer" on a hard-coded web page:
- Open a browser and go to http://seasr.org/documentation/example-flows/tag-cloud-viewer/
- Click the "Execute" button to launch the creation of a tag cloud view for "Emma" by Jane Austen, retrieved from Project Gutenberg

Discussion Questions
- What data repositories do you use in your scholarly research?
- What tools or applications are used against these repositories?