SEASR Overview Loretta Auvil, Boris Capitanu


SEASR Overview Loretta Auvil, Boris Capitanu University of Illinois at Urbana-Champaign lauvil@illinois.edu, capitanu@ncsa.uiuc.edu

Overview of Course
Monday Tuesday Wednesday Thursday Friday
MORNING
SEASR Overview: Overview of Course; SEASR Overview and Motivation Example; SEASR Analytics and Applications; SEASR Architecture; Introduction of Meandre; SEASR Community Hub; Attendee Project Plan
Text Analytics: Dunning Loglikelihood Comparison; Entity Extraction; Spell Checking; Text Analytics: Monk; Attendee Project Work
SEASR Analytics for Zotero: Demonstrations of SEASR Analytics; Use of SEASR services with Zotero and VUE; Creating Zotero Flows; Configuration Mechanism; Web Service Components; Zotero-enabled Flows; VUE-enabled Flows
Visualizations, Mashups & Dashboards: Visualization; SEASR as a service; Other services; Text App: Correlation & Ngram Viewer; Text App: ProseVis
SEASR Apps & Future Work: Audio Analytics: NEMA; Text Analytics: HTRC; SEASR Central; Future Meandre Features; Future Meandre Workbench Features; Attendee Plan Presentations
AFTERNOON
Meandre Workbench, Installation & NLP Overview: Overview of Workbench; Overview of Repositories; Designing and Constructing Flows; Installation; NLP Overview & Examples
More Text Analytics: Emotion Tracking; Concept Tracking; Topic Modeling; Modify project plan; Identify data set; Identify analysis
Tools & Deployment: Community Collaboration Tools; Architecture Details; Overview of Development Tools; Overview of ZigZag; Parallelization Example; ZigZag flows with Zotero, VUE and Fedora

Outline Overview of Course SEASR Overview and Motivation Example SEASR Analytics and Applications Attendee Plan SEASR Architecture Introduction of Meandre SEASR Community Hub Hands-On

SEASR Overview

SEASR Project This project will focus on developing, integrating, deploying, and sustaining a set of reusable and expandable software components and a supporting framework, to benefit a broad set of data mining applications for scholars in the humanities.

Key Goals Support the development of a state-of-the-art software environment for unstructured data management and analysis of digital libraries, repositories and archives, as well as educational platforms that are expected to contribute to many of the humanities breakthroughs of the 21st century. Support the continued development, expansion, and maintenance of an end-to-end software system (user interfaces, workflow engines, data management, analysis and visualization tools, collaborative tools, and other software integrated into a complete environment, SEASR) to bring the full power of data analytics to scholars. Support education and training in the use of this software environment for analysis, through workshops that promote its usage among scholars.

The SEASR Picture

Workshop Objective The objective of the workshop is to explain and demonstrate the utility of SEASR for digital humanities, and to bring you to a point where you can deploy, contribute to, and use the SEASR environment.

Workshop Goals The goals of the workshop are: LEARN: Provide a detailed understanding of the SEASR framework; LEARN: Provide a foundation and examples for participant teams to use SEASR in a study or inquiry; ADOPT: Share participant-generated research plans to utilize SEASR; INSTALL: Provide detailed instructions on how to install, build components, integrate existing applications, and maintain the SEASR environment; SUPPORT: Develop plans for resolution of issues raised by the user community in utilization of SEASR; SUSTAIN: Develop a plan for community-driven future development and dissemination of SEASR

SEASR @ Work – Tag Cloud Count tokens Filter options supported Stem words
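The three steps named on this slide (count tokens, filter, stem) can be sketched in a few lines of Python. This is an illustrative sketch only, not the SEASR component: the stop-word list, the `min_length` filter, and the crude suffix-stripping `naive_stem` helper are all assumptions standing in for whatever filters and stemmer the real flow uses.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def naive_stem(token):
    """Strip a few common English suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tag_cloud_counts(text, stem=True, min_length=3):
    """Count tokens after lower-casing, stop-word filtering, and optional stemming."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS and len(t) >= min_length]
    if stem:
        kept = [naive_stem(t) for t in kept]
    return Counter(kept)


counts = tag_cloud_counts("The cat chased the cats and the dog chased it.")
```

The resulting counts would drive the font size of each word in the cloud; with stemming on, "cat" and "cats" are merged into one entry.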

SEASR @ Work – Ngram Tag Cloud Count multiple words Filter options Stem
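Counting multi-word sequences works the same way as counting single tokens, except that the unit is a sliding window of `n` adjacent tokens. The sketch below is an assumption about the general technique, not the SEASR implementation, and it omits filtering and stemming for brevity.

```python
import re
from collections import Counter


def ngram_counts(text, n=2):
    """Count every run of n adjacent tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # zip over n shifted copies of the token list yields each n-token window.
    ngrams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in ngrams)


bigrams = ngram_counts("it was the best of times, it was the worst of times", n=2)
```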

SEASR @ Work – Dunning Loglikelihood Feature comparison of tokens. Specify an analysis document/collection. Specify a reference document/collection. Perform a statistical comparison using Dunning log-likelihood. Example showing over-represented tokens. Analysis Set: The Project Gutenberg EBook of A Tale of Two Cities, by Charles Dickens. Reference Set: The Project Gutenberg EBook of Great Expectations, by Charles Dickens.
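To make the comparison concrete, here is a sketch of the log-likelihood statistic in the two-term form commonly used for corpus word-frequency comparison. Whether the SEASR component uses this form or the full four-cell contingency table is not stated in the slides, so treat the formula choice as an assumption. For a token with count a in the analysis set (c tokens in total) and count b in the reference set (d tokens in total), the expected counts are E1 = c(a+b)/(c+d) and E2 = d(a+b)/(c+d), and the score is 2(a ln(a/E1) + b ln(b/E2)). The function names and the toy counts are hypothetical.

```python
import math


def log_likelihood(a, b, c, d):
    """Two-term log-likelihood score for one token.

    a, b: token count in the analysis / reference set
    c, d: total token count of the analysis / reference set
    """
    e1 = c * (a + b) / (c + d)  # expected count in the analysis set
    e2 = d * (a + b) / (c + d)  # expected count in the reference set
    score = 0.0
    if a > 0:
        score += a * math.log(a / e1)
    if b > 0:
        score += b * math.log(b / e2)
    return 2.0 * score


def compare(analysis, reference):
    """Return (token, score, over_represented) rows, highest score first."""
    c = sum(analysis.values())
    d = sum(reference.values())
    rows = []
    for token in set(analysis) | set(reference):
        a = analysis.get(token, 0)
        b = reference.get(token, 0)
        over = a / c > b / d  # relatively more frequent in the analysis set
        rows.append((token, log_likelihood(a, b, c, d), over))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy counts, invented for illustration.
rows = compare({"guillotine": 30, "the": 70}, {"guillotine": 10, "the": 90})
```

A score of zero means the token is used at the same relative rate in both sets; the `over` flag separates over-represented from under-represented tokens, since the score itself is always non-negative.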

SEASR @ Work – HITS Summarizer

SEASR @ Work – Entity Mash-up Entity Extraction with OpenNLP or Stanford NER Locations viewed on Google Map Dates viewed on Simile Timeline

SEASR @ Work – Entities To Network Identify entities Define relationships between entities within same sentence
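The relationship rule on this slide (two entities are linked when they appear in the same sentence) can be sketched as a co-occurrence count. The sketch assumes entity extraction has already produced one list of entity names per sentence; the example names are invented, and the edge weighting by co-occurrence count is an assumption.

```python
from collections import Counter
from itertools import combinations


def entity_network(sentence_entities):
    """Build weighted undirected edges from per-sentence entity lists."""
    edges = Counter()
    for entities in sentence_entities:
        # Sort so that (a, b) and (b, a) map to the same undirected edge.
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return edges


edges = entity_network([
    ["Darnay", "Lucie"],
    ["Lucie", "Carton", "Darnay"],
    ["Carton"],  # a lone entity creates no edge
])
```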

SEASR @ Work – Topic Modeling Given: a set of documents. Find: the semantic content in a large collection of documents. Usage: Mallet topic modeling tools. Output: shows the percentage of relevance for each document in each cluster, and the key words and their counts for each topic.

SEASR @ Work – Concept Mapping Goal is to have this type of Visualization to track emotions across a text document (Leveraging flare.prefuse.org)

SEASR @ Work – Audio Analysis NEMA: Executes a SEASR flow for each run Loads audio data Extracts features for every 10 sec moving window of audio Loads and applies the models Sends results back to the WebUI NESTER: Annotation of Audio via Spectral Analysis
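The windowing step can be sketched as follows. The slide only says that features are extracted for every 10-second moving window; the hop size between windows and the feature itself (root-mean-square energy here) are assumptions chosen to keep the example small, and `window_features` is a hypothetical name.

```python
import math


def window_features(samples, sample_rate, window_sec=10.0, hop_sec=10.0):
    """Return (start_time_sec, rms_energy) for each full window of audio."""
    win = int(window_sec * sample_rate)
    hop = int(hop_sec * sample_rate)
    if win <= 0 or hop <= 0:
        raise ValueError("window and hop must cover at least one sample")
    features = []
    for start in range(0, len(samples) - win + 1, hop):
        frame = samples[start:start + win]
        rms = math.sqrt(sum(x * x for x in frame) / win)
        features.append((start / sample_rate, rms))
    return features


# 20 seconds of constant signal at a toy sample rate of 2 Hz.
feats = window_features([1.0] * 40, sample_rate=2)
```

Setting `hop_sec` smaller than `window_sec` produces overlapping windows, which is one common reading of "moving window".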

SEASR @ Work – MONK Executes flows for each analysis requested Predictive modeling using Naïve Bayes Predictive modeling using Support Vector Machines (SVM) Feature comparisons

SEASR @ Work – Zotero Plugin to Firefox. Zotero manages the collection. Launch SEASR Analytics. Citation Analysis uses the JUNG network importance algorithms to rank the authors in the citation network that is exported as RDF data from Zotero to SEASR. Zotero Export to Fedora through SEASR. Saves results from SEASR Analytics to a Collection. Launch MONK Processing. MONK DB Ingestion Workflow.
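The slide does not say which JUNG importance algorithm the citation analysis uses. As a stand-in, the sketch below ranks authors with a plain PageRank power iteration over a toy citation graph; the choice of PageRank, the damping factor, and the author names are all assumptions made for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Rank nodes of a directed graph given as {node: [cited nodes]}."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = graph.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A node that cites nobody spreads its rank evenly.
                share = damping * rank[v] / n
                for u in nodes:
                    new_rank[u] += share
        rank = new_rank
    return rank


# Hypothetical authors: A and B both cite C.
ranks = pagerank({"Author A": ["Author C"], "Author B": ["Author C"], "Author C": []})
top_author = max(ranks, key=ranks.get)
```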

TEI components for SEASR by Brown U http://teicomponents.wordpress.com/sample-flows/

Attendee Project Plan Explore tool usage during learning exercises Participate in discussion Design a project plan to use SEASR this week for some analysis Modify and develop the project plan over the week Present and discuss project plan and results on Friday

Attendee Project Plan (2) Study/Project Title Team Members and their Affiliation Procedural Outline of Study/Project Research Question/Purpose of Study Data Sources Analysis Tools Activity Timeline or Milestones Report or Project Outcome(s) Ideas on what your team needs from SEASR staff to help you achieve your goal.

SEASR Architecture

SEASR Architecture (diagram; labels as extracted): Visualizations, User Interfaces, Apps, Plugins, Web Apps, Services, Meandre Workbench, Repositories, Data, Analysis, Components, Flows, Meandre Data-Intensive Flows, Components, Developer Tools, Data, Analytics, Visualization, Component Repository, Component Discovery, Meandre Infrastructure, Java Virtual Machine

Data Driven Models

SEASR: Reach + Relevance + Reuse + Repeatability SEASR emphasizes flexibility, scalability, and modularity, and provides a community hub and access to heterogeneous data and computational systems. Semantic-driven environment for SOA interoperability. Encourages sharing and participation for building communities. Modular construction allows flows to be modified and configured to encourage reusability within and across domains. Enables mashups and integration of tools. Data-intensive flows can be executed on a simple desktop or on one or more large clusters without modification. Computation can be created for distributed execution on servers where the content lives. User accessibility to control trust and compliance with the required copyright license of the content. Relies on the standardized Resource Description Framework (RDF) to define components and flows.

Enables Humanists to ask key questions: What recurrent patterns would be of interest to literary scholars? Which patterns are characteristic of the English language and which are characteristic of a particular author, work, topic, or time? Patterns based on words can be extracted from literary bodies; however, can patterns be extracted based on grammar or plot constructs? When are correlated patterns meaningful? Can they be organized based on such criteria? How can an author’s intentionality be assessed given an extracted pattern?

SEASR Enables Scholarly Research Discovery What hypotheses or rules can be generated by the “features” of the corpus? What “features” or language of the corpus best describe the corpus? What are the “similarities” among elements, documents, or corpora?

Meandre: Infrastructure SEASR/Meandre Infrastructure: Dataflow execution paradigm Semantic-web driven Web Oriented Supports publishing services Modular components Encapsulation and execution mechanism Promotes reuse, sharing, and collaboration

Meandre: Semantic Web Concepts Relies on the Resource Description Framework (RDF), which uses a simple notation to express graph relations, usually written as XML, to provide a set of conventions and a common means to exchange information. Provides a common framework to share and reuse data across application, enterprise, and community boundaries. Focuses on common formats for integration and combination of data drawn from diverse sources. Pays special attention to the language used for recording how the data relates to real-world objects. Allows navigation to sets of data resources that are semantically connected.

Meandre: Metadata Ontologies Meandre's metadata relies on three ontologies: The RDF ontology serves as a base for defining Meandre descriptors The Dublin Core Elements ontology provides basic publishing and descriptive capabilities in the description of Meandre descriptors The Meandre ontology describes a set of relationships that model valid components, as understood by the Meandre execution engine architecture

Meandre: Components in RDF
@prefix meandre: <http://www.meandre.org/ontology/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <#> .
<http://dita.ncsa.uiuc.edu/meandre/e2k/components/limited-iterations>
    meandre:name "Limited iterations"^^xsd:string ;
    rdf:type meandre:executable_component ;
    dc:creator "Xavier Llora"^^xsd:string ;
    dc:date "2007-11-17T00:32:35"^^xsd:date ;
    dc:description "Allows only a limited number of iterations"^^xsd:string ;
    dc:format "java/class"^^xsd:string ;
    dc:rights "University of Illinois/NCSA Open Source License"^^xsd:string ;
    meandre:execution_context
        <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/colt.jar> ,
        <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/gacore.jar> ,
        <http://dita.ncsa.uiuc.edu/meandre/e2k/components/limited-iterations/implementation/> ,
        <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/gacore-meandre.jar> ,
        <http://norma.ncsa.uiuc.edu/public-dav/Meandre/demos/E2K/V1/resources/formj2.0.jar> ;
    ...
Existing Standards
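To show how little is needed to write a descriptor of this shape, the sketch below renders a minimal Turtle document from a Python dictionary. It uses only predicates that appear on the slide (`meandre:name`, `rdf:type`, and the Dublin Core elements); the `to_turtle` helper and the example URI are hypothetical, and this is not how Meandre itself generates descriptors.

```python
# Prefixes copied from the slide's descriptor.
PREFIXES = {
    "meandre": "http://www.meandre.org/ontology/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def to_turtle(subject_uri, rdf_type, string_props):
    """Render one subject with an rdf:type and xsd:string-typed literals."""
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{subject_uri}>")
    statements = [f"    rdf:type {rdf_type}"]
    for predicate, value in string_props.items():
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        statements.append(f'    {predicate} "{escaped}"^^xsd:string')
    # Predicates of one subject are separated by ';' and closed with '.'.
    lines.append(" ;\n".join(statements) + " .")
    return "\n".join(lines)


doc = to_turtle(
    "http://example.org/components/limited-iterations",  # hypothetical URI
    "meandre:executable_component",
    {
        "meandre:name": "Limited iterations",
        "dc:creator": "Xavier Llora",
        "dc:format": "java/class",
    },
)
```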

Meandre: Component Types Components are the basic building blocks of any computational task. There are two kinds of Meandre components. Executable components perform computational tasks that require no human interaction during runtime; their processes are initialized during flow startup and are fired in accordance with the policies defined for them. Control components are used to pause the dataflow during user interaction cycles; the WebUI may be an HTML form, an applet, or another user interface.

Meandre: Dataflow Example
Dataflow addition example: the logical operation ‘+’ requires two inputs (Value1, Value2) and produces one output (Sum). When two inputs are available, the logical operation can be performed and the sum is output. When output is produced, the component resets its internal values and waits for two new input values to become available.
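The firing rule in the addition example can be sketched as a small Python class. This illustrates the dataflow idea only and is not the Meandre component API: the class name, the `push` method, and the port names are hypothetical.

```python
class AddComponent:
    """Fires only when both input ports hold a value, then resets."""

    def __init__(self):
        self._inputs = {"value1": None, "value2": None}

    def push(self, port, value):
        """Deliver a value to a port; return the sum if the component fires."""
        if port not in self._inputs:
            raise KeyError(port)
        self._inputs[port] = value
        if all(v is not None for v in self._inputs.values()):
            total = self._inputs["value1"] + self._inputs["value2"]
            # Reset internal values and wait for two new inputs.
            self._inputs = dict.fromkeys(self._inputs)
            return total
        return None  # still waiting for the other input


adder = AddComponent()
first = adder.push("value1", 2)   # only one input available, no output yet
second = adder.push("value2", 3)  # both inputs available, component fires
```

Because the component resets after firing, the order in which the next pair of values arrives does not matter.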

Meandre: Create, Publish, & Share “Components” and “Flows” have RDF descriptors. Easily shared, which fosters sharing and reuse. Allow machines to read and interpret. Independent of the implementations. Combine different implementations and platforms. Components: Java, Python, Lisp, Web Services. Execution: on a laptop or a high-performance cluster. A “Location” is an RDF descriptor of one to many components, one to many flows, and their implementations.

Meandre: Repository & Locations Each location represents a set of components/flows. Users can combine different locations together, create components, assemble flows, and share components and flows. Repositories help administer complex environments and organize components and flows.

Meandre: Metadata Properties Components and flows share properties such as name, creator, creation date, description, tags, and rights. Component-specific metadata describes the component's behavior: its location, type of implementation, firing policy, runnable, format, resource location, and execution context. Flow-specific metadata describes the directed graph of components: component instances, connectors, connector instance data port source, connector instance data port target, connector instance source, connector instance target, and instance name.
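One way to picture the flow-specific metadata is as a small data structure: named component instances plus connectors that each join a source instance and port to a target instance and port. The field names below paraphrase the properties listed on the slide; the classes, the `urn:example:` component URIs, and the port names are hypothetical and do not reproduce the Meandre ontology.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Connector:
    source_instance: str
    source_port: str
    target_instance: str
    target_port: str


@dataclass
class Flow:
    # Properties shared with components.
    name: str
    creator: str
    description: str = ""
    tags: list = field(default_factory=list)
    # Flow-specific properties: the directed graph.
    instances: dict = field(default_factory=dict)  # instance name -> component URI
    connectors: list = field(default_factory=list)

    def connect(self, src, src_port, dst, dst_port):
        """Add a connector, refusing instances the flow does not declare."""
        for instance in (src, dst):
            if instance not in self.instances:
                raise ValueError(f"unknown component instance: {instance}")
        self.connectors.append(Connector(src, src_port, dst, dst_port))

    def downstream(self, instance):
        """Names of the instances fed directly by the given instance."""
        return [c.target_instance for c in self.connectors
                if c.source_instance == instance]


flow = Flow(
    name="Token counter",
    creator="workshop attendee",
    tags=["text"],
    instances={
        "reader": "urn:example:read-text",      # hypothetical component URIs
        "tokenizer": "urn:example:tokenize",
        "counter": "urn:example:count-tokens",
    },
)
flow.connect("reader", "text", "tokenizer", "text")
flow.connect("tokenizer", "tokens", "counter", "tokens")
```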

Meandre: Programming Paradigm The programming paradigm creates complex tasks by linking together a bunch of specialized components. Meandre's publishing mechanism allows components developed by third parties to be assembled in a new flow. There are two ways to develop flows : Meandre’s Workbench visual programming tool Meandre’s ZigZag scripting language

Meandre: Workbench Existing Flow Web-based UI Components and flows are retrieved from server Additional locations of components and flows can be added to server Create flow using a graphical drag and drop interface Change property values Execute the flow Components Flows Locations

Meandre: ZigZag Script Language ZigZag is a simple language for describing data-intensive flows, modeled on Python for simplicity. ZigZag is a declarative language for expressing the directed graphs that describe flows. Command-line tools allow ZigZag files to be compiled and executed. A compiler is provided to transform a ZigZag program (.zz) into a Meandre archive unit (.mau). MAUs can then be executed by a Meandre engine.

Community Hub Explore existing flows to find others of interest Keyword Cloud Connections Find related flows Execute flow Comments

Community Hub: Keyword Cloud Design

Community Hub

Keyword Cloud Implementation Keyword Cloud functionality is currently implemented as a WordPress plugin

Detail View of Application Detail View with Related Flows

Community Hub: Connections Design

Demonstration Community Hub NEMA's Son of Blinkie Keyword Cloud Functionality Tag Cloud Viewer Ngram Tag Cloud Viewer HITS Summarizer Date Entity to Simile Timeline Location Entity to Google Map Google Search to Tag Cloud Viewer Entity to Protovis Network Graph Readability NEMA's Son of Blinkie

Learning Exercises: Community Hub Explore Community Hub's Keyword Cloud Functionality. Open browser and go to http://seasr.org. Click on "Keyword Cloud" (top left side, under Download). Click on "visualization" to see all the existing applications that have a tag of "visualization". Click on "tag cloud" to see all the existing applications that have a tag of "visualization" and "tag cloud". Click on the delete button to remove "tag cloud" from the selection. Click on the "Tag Cloud Viewer" for more detailed information about this application.
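The narrowing behaviour in this exercise (each selected keyword further restricts the list of applications) amounts to keeping the applications whose tags include every selected keyword. The sketch below illustrates that logic only; the tag assignments are invented and this is not the Community Hub's WordPress plugin code.

```python
def filter_by_tags(apps, selected):
    """Return the names of apps that carry every selected tag."""
    wanted = set(selected)
    return sorted(name for name, tags in apps.items() if wanted <= set(tags))


# Hypothetical tag assignments, for illustration only.
apps = {
    "Tag Cloud Viewer": {"visualization", "tag cloud"},
    "Date Entity to Simile Timeline": {"visualization", "entity"},
    "HITS Summarizer": {"summarization"},
}

step1 = filter_by_tags(apps, ["visualization"])
step2 = filter_by_tags(apps, ["visualization", "tag cloud"])
```

Removing a keyword from the selection, as in the delete step of the exercise, simply reruns the filter with the shorter list.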

Learning Exercises: Tag Cloud Viewer Perform analysis using "Tag Cloud Viewer" on a hard-coded web page. Use Community Hub to open the "Tag Cloud Viewer" page or open browser and go to http://seasr.org/documentation/example-flows/tag-cloud-viewer/ Click on the "Execute" button to launch the creation of a tag cloud view for "Emma" by Jane Austen retrieved from Project Gutenberg

Learning Exercises: Tag Cloud Viewer Perform analysis using "Tag Cloud Viewer" on a webpage of your choice. Use Community Hub to open the "Tag Cloud Viewer" page or open browser and go to http://seasr.org/documentation/example-flows/tag-cloud-viewer/ Find a web URL that you are interested in analyzing. Click on the "Custom Execute" button to launch the application, where you can copy and paste the URL

Learning Exercises: Google Search Perform analysis using "Google Search to Tag Cloud Viewer" on a topic of your choice. Use Community Hub to open the "Google Search to Tag Cloud Viewer" page or open browser and go to http://seasr.org/documentation/example-flows/google-search-to-tag-cloud-viewer/ Click on the "Custom Execute" button to launch the application where you can type your Google query for analysis

Attendee Project Plan Identify Research Question Study/Project Title Team Members and their Affiliation Procedural Outline of Study/Project Research Question/Purpose of Study Data Sources Analysis Tools Activity Timeline or Milestones Report or Project Outcome(s) Ideas on what your team needs from SEASR staff to help you achieve your goal.

Discussion Questions Which kinds of data repositories do you utilize in your scholarly research? What analytical tools or applications do you utilize with these repositories?