John Porter, Sheng Shan Lu, M. Gastil Gastil-Buhl. With special thanks to Chau-Chin Lin and Chi-Wen Hsaio.



• Maximizing the potential of LTER data to be used to make new ecological discoveries
• Moving from the era of single datasets to large-scale data integration
  - Tens to hundreds of datasets
• A first step to achieving this goal is to automate the mechanical processes associated with data ingestion into analytical software

• We want to:
  - Identify a dataset in the LTER Network Information System
  - Download it
  - Write an R statistical program to read the data
  - Produce basic statistical summaries of the ingested data
• How long should that process take?
  - With our tools we can do that in less than 1 minute!

Columns: Tool / Description / Works with Metacat / Works with PASTA

TFRI – R module
  Description: Web-form-based system that takes you through a multistep process to ingest data, do a basic quality assurance analysis and simple analyses
  Works with Metacat / Works with PASTA: Manual data download

StatProg
  Description: Web-form-based system that generates R, SAS, SPSS or Matlab programs that can be edited to process data
  Works with Metacat / Works with PASTA: Manual data download

PASTAprog
  Description: Web service that returns a ready-to-use R, SAS or Matlab program. Can be run directly from inside R for 1-minute analyses!
  Works with Metacat: Variable – some automated, some manual
  Works with PASTA: Fully automated download

Note: You do NOT need to have R installed on your PC to use this. It is entirely web-based. Don’t be worried by the buttons! A fully English version is available at the URL above

Metadata Display
Statistical Functions
Raw Data Upload
Select the number type of the field
Include the field in R code (select at least one)
EML metadata transformed into HTML by an XSL stylesheet

No field header Upload

Only for numerical attributes!

Data Check Functions
• Correct domain (real, integer)
• Range checks

Action Options:
• Edit records with bad values
• Set all the bad values to missing (NA)
• Eliminate all the records with bad values
• Ignore all the range check problems (just for value range errors)
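The range check and the "set all the bad values to missing (NA)" action can be sketched in R roughly as follows. This is an illustration only, not the module's own code: the helper name `set_out_of_range_to_na`, the column `temp_c`, the limits and the data are all hypothetical.

```r
# Hypothetical helper: replace values outside [min_ok, max_ok] with NA.
set_out_of_range_to_na <- function(x, min_ok, max_ok) {
  bad <- !is.na(x) & (x < min_ok | x > max_ok)
  x[bad] <- NA
  x
}

# Made-up data: air temperature in degrees C with two implausible values.
obs <- data.frame(temp_c = c(12.5, 14.1, -999, 13.8, 250))

# Apply the range check; out-of-range values become missing.
obs$temp_c <- set_out_of_range_to_na(obs$temp_c, min_ok = -50, max_ok = 60)

print(obs)
summary(obs$temp_c)
```

The "eliminate all the records with bad values" action would then correspond to dropping the rows where the checked column is NA, for example with `obs[!is.na(obs$temp_c), , drop = FALSE]`.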

Data Type Error : Value Range Error : Select 'Set all the bad values to missing ( NA )' option 3 Update The message for No data error

This line cannot be modified.
The rest of the R program CAN be modified to reflect your analyses.

Select program type
Specify Metadata Document to Use
You can get the Package ID from the LTER Metadata catalog. Download a copy of the data while you are there!
Or, you can specify a metadata document on a site server by giving the full URL.

Importantly, you need to edit the program to point to where the data is stored on YOUR computer, so the program can find it!
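As a rough illustration of the kind of edit involved, the sketch below sets a file path and then reads and summarizes the data. The variable names `infile` and `dt1`, the file name and the column layout are assumptions made for this example; a generated program will use its own names. A temporary file stands in for the downloaded data so the sketch runs anywhere.

```r
# Stand-in for a data file you downloaded; with real data, skip this step.
infile <- file.path(tempdir(), "mydata.csv")
writeLines(c("site,depth_cm", "A,10", "B,25", "C,40"), infile)

# In a generated program, this is the kind of line to edit, e.g.
#   infile <- "C:/Users/me/Downloads/mydata.csv"   # hypothetical path
stopifnot(file.exists(infile))  # fail early if the path is wrong

# Read the data and produce basic summaries.
dt1 <- read.csv(infile, header = TRUE, stringsAsFactors = FALSE)
str(dt1)
summary(dt1)
```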

• The previous form-based programs have been available for several years
• Their performance has improved as metadata has gotten better
• But they still can be slower to use than we would like, requiring manual editing and steps
• The advent of the LTER PASTA system makes possible truly automated ingestion and analysis using a web service

R “source” function, specifying the web service URL and that we want to “echo” our commands to the screen
Package ID from the PASTA Data Portal
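A minimal sketch of how `source()` with `echo = TRUE` behaves is shown below. Because the web service URL is not given here, a small local script (a hypothetical stand-in written to a temporary file) plays the role of the program the service would return; `source()` also accepts a URL in place of a file path.

```r
# Stand-in for the program the web service would return.
prog <- file.path(tempdir(), "generated_program.R")
writeLines(c(
  "x <- c(2, 4, 6, 8)",
  "x_mean <- mean(x)",
  "summary(x)"
), prog)

# echo = TRUE prints each command before its output.
# With the real service, pass the web service URL instead of `prog`.
source(prog, echo = TRUE)
```

After the call, objects created by the sourced program (here `x` and `x_mean`) remain in the workspace, which is what allows further commands to be added afterwards.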

DONE! Our analysis has been run, and basic statistical summaries have been created for each of the attributes.

You can now add additional commands to generate graphics etc., or to merge with other datasets.

• Base URL:
• Plus – a Package ID (available on the PASTA portal)
  - E.g., knb-lter-vcr
    - Scope: knb-lter-vcr
    - ID: 26
    - Revision: 14
• Plus – a suffix indicating the type of program you want (e.g., .r, .sas, .spss, .m for R, SAS, SPSS or Matlab)
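The way these pieces combine can be sketched in R as below. This is an assumption-laden illustration: the helper `build_program_url` is hypothetical, the base URL is a placeholder rather than the real service address, and joining scope, ID and revision with dots directly after the base URL is assumed, not confirmed by the slides.

```r
# Hypothetical helper: join the parts of a program request.
# How the package ID attaches to the base URL is an assumption.
build_program_url <- function(base_url, scope, id, revision, suffix) {
  # Only the program types named on the slide are accepted.
  suffix <- match.arg(suffix, c("r", "sas", "spss", "m"))
  package_id <- paste(scope, id, revision, sep = ".")
  paste0(base_url, package_id, ".", suffix)
}

# Placeholder base URL, not the real service address.
url <- build_program_url("https://example.org/pastaprog?packageid=",
                         scope = "knb-lter-vcr", id = 26, revision = 14,
                         suffix = "r")
print(url)
```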

knb-lter-vcr r
You can also use the web service URL in a web browser to get a text copy of your program.
Note: There are other options that will let you use the web service for data OUTSIDE PASTA by specifying the URL of the EML metadata separately.

• Problems with Metadata
  - Lead to lack of congruency between the description of the data and the data itself*
  - Bad practices in metadata, e.g., using special characters, spaces or mathematical operations as part of the attribute names
  - Links to data in the metadata may not properly lead directly to data*
• Problems with Data
  - Inconsistent coding (character data where numbers are expected) – causes conversion of numerical data into R “factors”
  - Dates – often are handled in different ways
• ????? – these systems need additional testing on a wide array of data – and you can help!

* Much improved by PASTA system over earlier Metacat
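Two of these problems can be reproduced in a few lines of R. The sketch below uses made-up values and attribute names and is not taken from the tools themselves. Note that since R 4.0.0, `read.csv` reads such columns as character rather than factor by default, but the column still fails to be numeric.

```r
# Made-up column where one value was coded as text instead of a number.
raw <- c("1.2", "3.4", "missing", "5.6")

# Treated as a factor, the values are category labels, not numbers.
f <- factor(raw)
is.numeric(f)

# Convert via character; the non-numeric code becomes NA.
num <- suppressWarnings(as.numeric(as.character(f)))
num

# Attribute names with spaces or operators are not valid R names;
# make.names() rewrites them into syntactically valid ones.
clean_names <- make.names(c("Temp (C)", "N+P", "site id"))
clean_names
```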