SEASR Analytics Loretta Auvil Automated Learning Group Data-Intensive Technologies and Applications, National Center for Supercomputing.

Presentation transcript:

SEASR Analytics Loretta Auvil Automated Learning Group Data-Intensive Technologies and Applications, National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign The SEASR project and its Meandre infrastructure are sponsored by The Andrew W. Mellon Foundation

SEASR Overview

SEASR Focus
The project's focus is supporting a framework for developing, integrating, deploying, and sustaining a set of reusable and expandable software components. SEASR can benefit a broad set of data mining applications for scholars in the humanities.

SEASR Goals
The key goals are:
–Support the development of a state-of-the-art software environment for unstructured data management and analysis of digital libraries, repositories and archives
–Develop user interfaces, a data-flow engine and the data-flows that support data management, analysis and visualization
–Support education and training through workshops to promote its usage among scholars

Workshop Objective
The objective of the workshop is to:
–Introduce SEASR
–Learn what analytics SEASR can do

The SEASR Picture

SEASR Enables Scholarly Research
Discovery
–What are the words used in the corpus?
–What named entities (people, locations, dates) can be extracted?
–What hypotheses or rules can be generated by the “features” of the corpus?
–What “features” or language of the corpus best describe the corpus?
–What are the “similarities” among elements, documents, or corpora?
–What patterns can be identified?

Enables Scholars to Ask…
Pattern identification using automated learning
–Which patterns are characteristic of the English language?
–Which patterns are characteristic of a particular author, work, topic, or time?
–Which patterns based on words, phrases, sentences, etc. can be extracted from literary bodies?
–Which patterns are identified based on grammar or plot constructs?
–When are correlated patterns meaningful?
–Can they be categorized based on specific criteria?
–Can an author’s intent be identified given an extracted pattern?

Tag Cloud
Counts tokens
Several different filtering options supported
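The token counting behind a tag cloud can be sketched in a few lines of Python. This is only an illustration, not the SEASR component: the function name `token_counts`, the tiny stop-word list, and the minimum-length filter are all assumptions chosen to show what "filtering options" might look like.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; a real filter would be longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "it", "was", "is"}

def token_counts(text, min_length=1, stop_words=STOP_WORDS):
    """Count lowercase word tokens, skipping stop words and short tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t not in stop_words and len(t) >= min_length]
    return Counter(kept)

counts = token_counts("It was the best of times, it was the worst of times.")
# The counts would drive the font size of each word in the cloud.
print(counts.most_common(3))
```

A tag cloud renderer would then scale each word by its count; changing `stop_words` or `min_length` changes which tokens survive to be drawn.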

Flesch-Kincaid Readability Test
Results show scores for each item selected
–Designed to indicate comprehension difficulty when reading a passage of contemporary academic English
–Flesch Reading Ease: higher scores indicate material that is easier to read; lower numbers mark passages that are more difficult to read
–Flesch-Kincaid Grade Level: result is a number that corresponds with a grade level
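Both scores come from the standard published formulas, which depend only on words per sentence and syllables per word. The sketch below applies them in Python. The syllable counter is a rough vowel-group heuristic and the sentence splitter is naive, both assumptions made for illustration, so scores will differ somewhat from any production tool, including the SEASR flow.

```python
import re

def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text):
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Standard Flesch Reading Ease and Flesch-Kincaid Grade Level formulas.
    ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return ease, grade

ease, grade = readability("The cat sat on the mat.")
print(f"Reading ease: {ease:.1f}, grade level: {grade:.1f}")
```

Short sentences of one-syllable words push the ease score up and the grade level down, which matches the interpretation given on the slide.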

Dunning Loglikelihood
Feature comparison of tokens
Specify an analysis document/collection
Specify a reference document/collection
Perform statistical comparison using Dunning Loglikelihood
Example showing over-represented tokens
Analysis Set: The Project Gutenberg EBook of A Tale of Two Cities, by Charles Dickens
Reference Set: The Project Gutenberg EBook of Great Expectations, by Charles Dickens
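For a single token, the comparison can be sketched with the two-term log-likelihood form commonly used for corpus comparison: observed counts in each collection are compared with the counts expected if the token were equally frequent in both. The function name `dunning_g2` and the signed return value (positive when the token is over-represented in the analysis set) are assumptions for illustration; the SEASR flow may compute or report the statistic differently.

```python
import math

def dunning_g2(count_analysis, total_analysis, count_reference, total_reference):
    """Signed log-likelihood (G2) for one token across two collections.

    Totals are the token counts of each whole collection and must be positive.
    """
    combined = count_analysis + count_reference
    total = total_analysis + total_reference
    # Expected counts if the token had the same relative frequency in both sets.
    expected_analysis = total_analysis * combined / total
    expected_reference = total_reference * combined / total
    g2 = 0.0
    for observed, expected in ((count_analysis, expected_analysis),
                               (count_reference, expected_reference)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    g2 *= 2
    # Assumed sign convention: positive means over-represented in the analysis set.
    over = count_analysis / total_analysis > count_reference / total_reference
    return g2 if over else -g2

# Toy numbers, not taken from the Dickens example.
print(dunning_g2(30, 1000, 10, 1000))
```

Ranking all tokens by this score puts the most over-represented tokens of the analysis set at the top and the most under-represented at the bottom.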

Date Entities to Simile Timeline
Entity Extraction with OpenNLP
Dates viewed on Simile Timeline

Frequent Patterns
Given: Set of documents
Find: Frequent patterns, that is, common word patterns used in the collection
Evaluation: What are good patterns?
Results: 1060 patterns discovered
322: Lincoln
147: Abe
117: man
100: Mr.
100: time
98: Lincoln Abe
91: father
85: Lincoln Mr.
85: Lincoln man
75: day
70: Abraham
70: President
68: boy
67: Lincoln time
65: Lincoln Abraham
65: life
63: Lincoln father
57: men
57: work
52: Lincoln day
…
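The kind of result listed above (a support count followed by a set of words) can be reproduced in miniature with a brute-force count of word sets. This is a naive sketch, not FP-Growth or the SEASR flow: the function name, the `min_support` and `max_size` parameters, and the three toy documents are all made up for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_patterns(documents, min_support=2, max_size=2):
    """Return word sets (up to max_size words) found in at least min_support documents."""
    counts = Counter()
    for words in documents:
        unique = sorted(set(words))  # each word set counts once per document
        for size in range(1, max_size + 1):
            for itemset in combinations(unique, size):
                counts[itemset] += 1
    return {pattern: n for pattern, n in counts.items() if n >= min_support}

# Toy documents, already tokenized and stripped of stop words.
docs = [
    ["Lincoln", "Abe", "father"],
    ["Lincoln", "Abe", "boy"],
    ["Lincoln", "President"],
]
patterns = frequent_patterns(docs, min_support=2)
for pattern, support in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(f"{support}: {' '.join(pattern)}")
```

Enumerating every combination grows quickly with document length, which is why dedicated algorithms such as FP-Growth are used on real collections.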

HITS Summarizer
Finds the top sentences and tokens from all items submitted
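One way to read "top sentences and tokens" is as the HITS hub and authority scores on a sentence-token graph: a sentence scores highly when it contains important tokens, and a token scores highly when it appears in important sentences. The Python sketch below follows that reading. It is an assumption about how such a summarizer could work, not a description of the SEASR component, and `hits_summary` and its parameters are hypothetical names.

```python
import math
import re

def hits_summary(sentences, iterations=20, top_n=2):
    """Rank sentences (hubs) and tokens (authorities) with HITS-style updates."""
    token_sets = [set(re.findall(r"[a-z']+", s.lower())) for s in sentences]
    vocab = set().union(*token_sets)
    hub = [1.0] * len(sentences)
    auth = {t: 1.0 for t in vocab}
    for _ in range(iterations):
        # A token's authority is the summed hub score of sentences containing it.
        auth = {t: sum(h for h, toks in zip(hub, token_sets) if t in toks)
                for t in vocab}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {t: v / norm for t, v in auth.items()}
        # A sentence's hub score is the summed authority of its tokens.
        hub = [sum(auth[t] for t in toks) for toks in token_sets]
        norm = math.sqrt(sum(v * v for v in hub)) or 1.0
        hub = [v / norm for v in hub]
    best_sentences = sorted(range(len(sentences)), key=lambda i: -hub[i])[:top_n]
    best_tokens = sorted(vocab, key=lambda t: (-auth[t], t))[:top_n]
    return [sentences[i] for i in best_sentences], best_tokens

top_sentences, top_tokens = hits_summary(["The cat sat.", "The cat ran.", "Dogs bark."])
print(top_sentences, top_tokens)
```

In practice stop words would be filtered first, since otherwise very common words such as "the" dominate the authority scores.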

Text Clustering
Clustering of text by token counts
Filtering options for stop words, part of speech
Dendrogram visualization

Audio Analysis
NEMA: Executes a SEASR flow for each run
–Loads audio data
–Extracts features for every 10-second moving window of audio
–Loads and applies the models
–Sends results back to the WebUI
NESTER: Annotation of Audio via Spectral Analysis

Emotion Tracking
Goal: a visualization that tracks emotions across a text document (leveraging flare.prefuse.org)

Future: Application for Meme “MemeTracker builds maps of the daily news cycle by analyzing around 900,000 news stories and blog posts per day from 1 million online sources, ranging from mass media to personal blogs”

Where Can I Run SEASR Analysis?
Services can be executed from:
–SEASR website
–Zotero
–MONK
–VUE

SEASR Community Hub
Explore existing flows to find others of interest
–Keyword Cloud
–Connections
Find related flows
Execute flow
Comments

What is Zotero? (from Zotero Quick Start Guide)
A citation manager. It is designed to store, manage, and cite bibliographic references, such as books and articles. In Zotero, each of these references constitutes an item.
An extension for the Firefox web browser by the Center for History and New Media at George Mason University.
Installed by visiting zotero.org and clicking the download button on the page.

SEASR Analytics for Zotero
An extension for the Firefox web browser by the SEASR Team
Uses your Zotero Collections
Performs analysis using SEASR Services

The Value Add for SEASR & Zotero
Analytical results are saved as Zotero items (View Snapshot)
–Includes metadata
–Item naming strategy identifies the item or collection processed
–Creator indicates the Menu Label of the SEASR Analysis
Related Tab links to the items processed in the Analysis
No need to install the analysis; it runs as a web service

MONK
Executes flows for each analysis requested
–Predictive modeling using Naïve Bayes
–Predictive modeling using Support Vector Machines (SVM)
–Feature comparisons

SEASR Support in VUE
Goal: Provide functionality in VUE to use SEASR flows
Implementations:
–Add content to map
–Get metadata for content
–Get information about content

Meandre Workbench
Web-based UI
Components and flows are retrieved from server
Additional locations of components and flows can be added to server
Create flow using a graphical drag and drop interface
Change property values
Execute the flow

Extensible to Analysis that You Create
You can leverage the flows we have on your server or ask your university to host this analysis
You can modify these flows and redeploy
You can create new flows
–Perhaps you want to see only nouns or verbs
–Perhaps you want to see a list of extracted entities
You can share these flows back to the community

Zotero to SEASR: Fedora
[Diagram labels: Zotero, Upload to Repository, Repository Search & Browse Web Service, Interactive Web Application]

JSTOR Data for Research: SEASR Accesses APIs
Access JSTOR API in SEASR components
Use the output of these components with existing SEASR components

SEASR Central
Sharing and finding flows and components
[Mock-up of the SEASR Central web page: navigation (Categories, Recently Added, Top 50, Submit, About, RSS); a featured component, Word Counter by Jane Doe, which counts the different words that appear in a text stream (rights: NCSA/UofI open source license); a featured flow, FPGrowth; browsing by type (Component, Flows) and by category (Image, JSTOR, Zotero).]

Discussion Questions
What kinds of data assets are you interested in?
What analysis would you like to use against this data?