Using R in Kepler Dan Higgins – NCEAS Prepared for: Ecoinformatics Training for Ecologists LTER (Albuquerque) January 8-12, 2007



What is R?
R is a language and environment for data manipulation, statistical computing, and graphics.
R is open source and can be downloaded and used at no cost.
The R Project for Statistical Computing
NCEAS R Programming Language Resource Center

RGui

RGui (Windows)

R Example
With only three lines of R, one can read a data table, plot all pairwise combinations of its columns, and summarize the data.
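The three steps named above might look like the following in plain R. This is a minimal sketch, not the script shown on the slide: the inline table and its column names are made up so that the example runs without an external file.

```r
# Hypothetical inline data so the example is self-contained; the original
# slide read a table from a file that is not part of this transcript.
csv_text <- "length,width,height
1.0,2.1,3.5
2.0,3.9,1.5
3.0,6.2,4.0
4.0,7.8,2.5"

dat <- read.csv(text = csv_text)  # 1. read a data table
plot(dat)                         # 2. plot every pair of columns against each other
summary(dat)                      # 3. summarize each column
```

Calling `plot()` on a data frame with more than two columns produces a matrix of pairwise scatterplots, which is why a single call covers "all combinations of column data".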

Kepler and R
The R language has many similarities to the Kepler expression language.
R emphasizes operations on vectors, matrices, and tables (‘data frames’) rather than on scalars, which eliminates many explicit looping statements.
Many detailed statistical operations and data manipulation routines already exist in R.
R can create sophisticated graphic displays.
Being able to call R routines from Kepler greatly simplifies many workflows.
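The point about vectors replacing explicit loops can be illustrated with a small comparison. The values below are hypothetical; the sketch only shows that a whole-vector expression gives the same result as an element-by-element loop.

```r
x <- c(1.5, 2.0, 3.5, 4.0)  # hypothetical measurements

# Explicit loop, as one would write when working with scalars.
squared_loop <- numeric(length(x))
for (i in seq_along(x)) {
  squared_loop[i] <- x[i]^2
}

# Vectorized form: the operation applies to every element at once.
squared_vec <- x^2

# Check that both approaches agree.
all.equal(squared_loop, squared_vec)
```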

Simple R Workflow
Drag an RExpression actor to the work area, add a director, and connect the actor's outputs to a Display actor and an ImageJ actor.
The text shown in the display is the same as what one sees when running the R script from the command line.

RExpression Actor Parameters

Arrays and Graphical Output
R Script: ccc <- aaa + bbb; ccc; plot(aaa, bbb)
Adding a port automatically creates an R object with the port's name [e.g. aaa <- c(1,2,3,4)].
Graphics are automatically saved as images, and the image file name is sent to the ‘graphicsFileName’ output port.
R text output is automatically sent to the ‘output’ port.
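Outside Kepler, the same script can be tried by defining the two input objects by hand. This is a sketch under assumptions: the value of `bbb` is invented, and the `pdf()` call is only a stand-in for the actor's automatic image capture, whose file format and mechanism are not described in the source.

```r
# Inside Kepler these objects would be created from the tokens arriving on
# input ports named 'aaa' and 'bbb'; here they are defined by hand.
aaa <- c(1, 2, 3, 4)      # value taken from the slide
bbb <- c(10, 20, 30, 40)  # hypothetical value for the second port

ccc <- aaa + bbb          # element-wise sum of the two arrays
print(ccc)                # text output; prints [1] 11 22 33 44

# Stand-in for the actor's image capture: write the plot to a file and keep
# the file name, which is what a 'graphicsFileName'-style port would carry.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(aaa, bbb)
invisible(dev.off())
```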

RExpression – Ports & Parameters
Adding ports creates R objects from Kepler tokens.
The R script is a parameter of the RExpression actor and refers to these objects by their port names.

Array Records and Data Frames
Tables are represented as ‘Data Frame’ objects in R.
A Ptolemy ‘Record of Arrays’ can also represent a table.
R Script: summary(df), where ‘df’ is the R dataframe created automatically when a record of arrays is passed to an input port.
AAA    BBB
one    1
two    2
three  3
four   4
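The correspondence between a record of arrays and a data frame can be mimicked in plain R with a named list of equal-length vectors. This is only an analogy: how Kepler performs the conversion internally is not shown in the source, and `record_of_arrays` is a name chosen for illustration.

```r
# A named list of equal-length vectors plays the role of the record of arrays.
record_of_arrays <- list(
  AAA = c("one", "two", "three", "four"),
  BBB = c(1, 2, 3, 4)
)

# Convert to a data frame: one column per record field.
df <- as.data.frame(record_of_arrays, stringsAsFactors = FALSE)

summary(df)  # the R script from the slide
str(df)      # shows column names, types, and the first values
```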

R Dataframes
In R, a ‘dataframe’ represents a table.
A dataframe is a list of column vectors.
All values in a column have the same type (e.g. number or string).
Each column can have a name (e.g. ‘AAA’ or ‘BBB’).
AAA    BBB
one    1
two    2
three  3
four   4
Creating a dataframe df:
AAA <- c("one","two","three","four")
BBB <- c(1,2,3,4)
df <- data.frame(AAA, BBB)
Selecting parts of a dataframe (the row index comes first, then the column index):
1st column: df[,1], df[,"AAA"]
2nd row: df[2,], df[df$AAA == "two",]
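Row and column selection on the small table above can be checked with a short script. In `df[i, j]` the first index selects rows and the second selects columns; the variable names on the left-hand side are chosen for illustration.

```r
AAA <- c("one", "two", "three", "four")
BBB <- c(1, 2, 3, 4)
df <- data.frame(AAA, BBB, stringsAsFactors = FALSE)

first_col  <- df[, 1]                # first column, by position (a vector)
col_aaa    <- df[, "AAA"]            # the same column, by name
second_row <- df[2, ]                # second row, by position (a one-row data frame)
row_two    <- df[df$AAA == "two", ]  # the same row, by a condition on column AAA

print(first_col)
print(second_row)
```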

Using Multiple R Actors

Using Multiple R Actors - Result

R Summarize Table By Species

R Pairs Plot

Configuring an EML Datasource for Use in RExpression
Use “As Column Vector” to pass an entire column at one time (i.e. as an R vector).

Custom RExpression Actors
RExpression actors with pre-built R scripts can be added to the Kepler actor list.
Examples of current custom actors are shown here.
These give users who are unfamiliar with R scripts ready-made tools.

Acknowledgements
This material is based upon work supported by:
The National Science Foundation under Grant Numbers , , , , , and
Collaborators: NCEAS (UC Santa Barbara), University of New Mexico (Long Term Ecological Research Network Office), San Diego Supercomputer Center, University of Kansas (Center for Biodiversity Research), University of Vermont, University of North Carolina, Napier University, Arizona State University, UC Davis
The National Center for Ecological Analysis and Synthesis, a Center funded by NSF (Grant Number ), the University of California, and the UC Santa Barbara campus
The Andrew W. Mellon Foundation
Kepler contributors: SEEK, Ptolemy II, SDM/SciDAC, GEON