SEASR Overview
Loretta Auvil and Bernie Acs, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign
The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation.

Presentation transcript:

SEASR Overview
Loretta Auvil and Bernie Acs
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign [lauvil or

Outline
Overview of Workshop
SEASR Overview and Motivation
Team Presentations
– Digital Humanities Observatory, Susan Schreibman
– Brown University, Andrew Ashton
– JSTOR, Clare Llewellyn, Michael Krot
– VUE, Anoop Kumar
New SEASR Data Flows and Components
Future

SEASR Overview

SEASR
This project will focus on developing, integrating, deploying, and sustaining a set of reusable and expandable software components and a supporting framework, SEASR, that will benefit a broad set of data mining applications for scholars in the humanities.
The key goals established for this effort are a set of software-centric directives:
– Support the development of a state-of-the-art software environment for unstructured data management and analysis of digital libraries, repositories, and archives, as well as educational platforms that are expected to contribute to many of the humanities breakthroughs of the 21st century.
– Support the continued development, expansion, and maintenance of an end-to-end software system (user interfaces, workflow engines, data management, analysis and visualization tools, collaborative tools, and other software integrated into a complete environment, SEASR) to bring the full power of data analytics to scholars.
– Support education and training in the use of this software environment for analysis, through workshops that promote its usage among scholars.

Workshop Objective
The objective of the workshop is to:
– Provide current status of SEASR
– Indicate where SEASR is headed
– Learn what you have done or are planning to do with SEASR

The SEASR Picture

SEASR Architecture

Data Driven Models

SEASR: Reach + Relevance + Reuse + Repeatability
SEASR emphasizes flexibility, scalability, and modularity, and provides a community hub and access to heterogeneous data and computational systems:
– Semantic-driven environment for SOA interoperability
– Encourages sharing and participation for building communities
– Modular construction allows flows to be modified and configured to encourage reusability within and across domains
– Enables mashup and integration of tools
– Data-intensive flows can be executed on a simple desktop or on one or more large clusters without modification
– Computation can be created for distributed execution on servers where the content lives
– User access control for trust and for compliance with the copyright licenses that content requires
– Relies on the standardized Resource Description Framework (RDF) to define components and flows

SEASR Enables Scholarly Research
Discovery
– What hypotheses or rules can be generated from the “features” of the corpus?
– What “features” or language of the corpus best describe the corpus?
– What are the “similarities” among elements, documents, or corpora?
– What patterns can be identified?

Enables Humanists to Ask…
Pattern identification using automated learning
– Which patterns are characteristic of the English language?
– Which patterns are characteristic of a particular author, work, topic, or time?
– Which patterns based on words, phrases, sentences, etc. can be extracted from literary bodies?
– Which patterns are identified based on grammar or plot constructs?
– When are correlated patterns meaningful?
– Can they be categorized based on specific criteria?
– Can an author’s intent be identified given an extracted pattern?

Work – Tag Cloud
Counts tokens
Several different filtering options supported
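The counting step behind a tag cloud can be sketched in a few lines of Python. This is only an illustration, not the SEASR component itself: the function name `count_tokens`, the stop-word list, and the `min_length` filter are assumptions chosen to show what "filtering options" might look like.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real flow would expose this as a property.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "it", "was"}

def count_tokens(text, stop_words=STOP_WORDS, min_length=1):
    """Count lowercase word tokens, skipping stop words and short tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(
        t for t in tokens if t not in stop_words and len(t) >= min_length
    )

counts = count_tokens("It was the best of times, it was the worst of times")
# The most frequent tokens would be drawn largest in the cloud.
print(counts.most_common(3))
```

A tag cloud renderer would then scale each token's font size by its count.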

Work – Dunning Log-likelihood
Feature Comparison of Tokens
Specify an analysis document/collection
Specify a reference document/collection
Perform statistical comparison using Dunning log-likelihood
Example showing over-represented tokens
– Analysis Set: The Project Gutenberg EBook of A Tale of Two Cities, by Charles Dickens
– Reference Set: The Project Gutenberg EBook of Great Expectations, by Charles Dickens
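As a rough sketch of the comparison, the snippet below scores each token with the two-term form of the log-likelihood statistic that is commonly used for corpus comparison. The function names and the choice of the two-term form are assumptions made for illustration; the SEASR component may compute the statistic differently.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """G2 for a word seen a times in c analysis tokens and b times in d reference tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the analysis set
    e2 = d * (a + b) / (c + d)  # expected count in the reference set
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2

def over_represented(analysis_tokens, reference_tokens, top=10):
    """Rank tokens that are relatively more frequent in the analysis set.

    Both token lists are assumed to be non-empty.
    """
    analysis = Counter(analysis_tokens)
    reference = Counter(reference_tokens)
    c = sum(analysis.values())
    d = sum(reference.values())
    scored = []
    for word, a in analysis.items():
        b = reference[word]
        if a / c > b / d:  # keep only over-represented tokens
            scored.append((log_likelihood(a, b, c, d), word))
    scored.sort(reverse=True)
    return [(word, score) for score, word in scored[:top]]
```

Calling `over_represented(tale_tokens, expectations_tokens)` with the tokenized texts of the two novels (hypothetical variable names) would list the words that most distinguish the analysis set from the reference set.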

Work – Date Entities to Simile Timeline
Entity Extraction with OpenNLP
Dates viewed on Simile Timeline
Locations viewed on Google Map

Work – Text Clustering
Clustering of text by token counts
Filtering options for stop words, part of speech
Dendrogram visualization
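To make the idea concrete, here is a minimal sketch of agglomerative clustering over token-count vectors. The slide does not say which distance or linkage SEASR uses, so cosine distance and single linkage are assumptions, as are the function names. The returned merge list is the information a dendrogram draws.

```python
import math
from collections import Counter

def cosine_distance(u, v):
    """1 - cosine similarity between two token-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return 1.0 - dot / norm if norm else 1.0

def cluster(docs):
    """Single-linkage agglomerative clustering; returns the merges in order."""
    vectors = {name: Counter(text.lower().split()) for name, text in docs.items()}
    clusters = [(name,) for name in docs]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters (closest pair of members).
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = min(
                    cosine_distance(vectors[a], vectors[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        dist, i, j = best
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

docs = {
    "a": "whale sea ship",
    "b": "whale sea captain",
    "c": "marriage estate ball",
}
for left, right, dist in cluster(docs):
    print(left, right, round(dist, 3))
```

Documents "a" and "b" share two of three tokens, so they merge first; "c" joins last at the maximum distance.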

Meandre: Infrastructure
SEASR/Meandre Infrastructure:
– Dataflow execution paradigm
– Semantic-web driven
– Web oriented
– Supports publishing services
– Modular components
– Encapsulation and execution mechanism
– Promotes reuse, sharing, and collaboration

Meandre: Semantic Web Concepts
– Relies on the Resource Description Framework (RDF), which uses a simple notation to express graph relations, usually written as XML, to provide a set of conventions and a common means to exchange information
– Provides a common framework to share and reuse data across application, enterprise, and community boundaries
– Focuses on common formats for integration and combination of data drawn from diverse sources
– Pays special attention to the language used for recording how the data relates to real-world objects
– Allows navigation to sets of data resources that are semantically connected

Meandre: Dataflow Example
Dataflow Addition Example
– Logical operation ‘+’
– Requires two inputs
– Produces one output
When two inputs are available
– Logical operation can be performed
– Sum is output
When output is produced
– Reset internal values
– Wait for two new input values to become available
Figure labels: Value1, Value2, Sum
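The firing rule in the addition example can be sketched as a small Python class. This is not Meandre's component API; the class name `AddComponent`, the `push` method, and the use of `None` as the "no value yet" marker are assumptions made for illustration.

```python
class AddComponent:
    """Fires when both inputs are present, emits the sum, then resets."""

    def __init__(self):
        self._reset()

    def _reset(self):
        # None marks an input port that has not received a value yet.
        self.value1 = None
        self.value2 = None

    def push(self, port, value):
        """Deliver a value to 'value1' or 'value2'.

        Returns the sum once both inputs have arrived, otherwise None.
        """
        if port not in ("value1", "value2"):
            raise ValueError(f"unknown port: {port}")
        setattr(self, port, value)
        if self.value1 is None or self.value2 is None:
            return None  # still waiting for the other input
        total = self.value1 + self.value2
        self._reset()  # wait for two new input values
        return total

adder = AddComponent()
print(adder.push("value1", 2))  # only one input so far
print(adder.push("value2", 3))  # both inputs present, so the sum is emitted
```

In this sketch a second value pushed to the same port before the component fires simply overwrites the first; a real engine would more likely queue it.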

Meandre: Create, Publish, & Share
“Components” and “Flows” have RDF descriptors
– Easily shared, which fosters sharing and reuse
– Allow machines to read and interpret
– Independent of the implementations
– Combine different implementations and platforms
– Components: Java, Python, Lisp, Web Services
– Execution: on a laptop or a high-performance cluster
A “Location” is an RDF descriptor of one to many components, one to many flows, and their implementations

Meandre: Repository & Locations
Each location represents a set of components/flows
Users can
– Combine different locations together
– Create components
– Assemble flows
– Share components and flows
Repositories help
– Administer complex environments
– Organize components and flows

Meandre: Programming Paradigm
The programming paradigm creates complex tasks by linking together a bunch of specialized components.
Meandre's publishing mechanism allows components developed by third parties to be assembled in a new flow.
There are two ways to develop flows:
– Meandre’s Workbench visual programming tool
– Meandre’s ZigZag scripting language

Meandre: Workbench
Web-based UI
Components and flows are retrieved from server
Additional locations of components and flows can be added to server
Create flow using a graphical drag and drop interface
Change property values
Execute the flow
Figure labels: Locations, Components, Flows, Existing Flow

Meandre: ZigZag Script Language
ZigZag is a simple language for describing data-intensive flows
– Modeled on Python for simplicity.
– ZigZag is a declarative language for expressing the directed graphs that describe flows.
Command-line tools compile and execute ZigZag files.
– A compiler is provided to transform a ZigZag program (.zz) into a Meandre archive unit (.mau).
– MAU files can then be executed by a Meandre engine.

Work – Zotero
Plugin to Firefox
Zotero manages the collection
Launch SEASR Analytics
– Citation Analysis uses the JUNG network importance algorithms to rank the authors in the citation network that is exported as RDF data from Zotero to SEASR
– Zotero Export to Fedora through SEASR
– Saves results from SEASR Analytics to a Collection

Work – Fedora
Repository Search & Browse Web Service Interactive Web Application Zotero Upload to Repository

Community Hub
Explore existing flows to find others of interest
– Keyword Cloud
– Connections
Find related flows
Execute flow
Comments

Detail View of Application
Detail View with Related Flows

DHSI Course Materials
Monday, Tuesday, Wednesday, Thursday, Friday
SEASR Overview
– Overview of Course
– SEASR Overview and Motivation
– SEASR Architecture
– Introduction of Meandre
– SEASR Community Hub
– Example Applications
Meandre Workbench
– Meandre Data Flows
– Overview of Workbench
– Overview of Repositories
– Constructing Flows
SEASR Analytics for Zotero
– Demonstrations of SEASR Analytics
– Interaction between Zotero and SEASR
Installation and Development Tools
– Installation
– Community Collaboration Tools
– Architecture Details
– Overview of Development Tools
Future
– SEASR Central
– Future Meandre Features
– Future Meandre Workbench Features
– Google Books
Attendee Plan Presentations
Course Wrap-up
Text Analytics
– Overview of Text Analytics
– Text Clustering
– Frequent Patterns Analysis
– Entity Extraction
Meandre Server Interface
SEASR Applications
– Audio Analytics: NEMA: Blinkie
– Text Analytics: Monk
– Emotion Tracking
Creating Zotero Flows
– Configuration Mechanism
– Specific Web Service Components
– Zotero-enabled Flows
Deployment of Flows
– Overview of ZigZag
– Parallelization
– Example ZigZag flows
Zotero and Fedora