Experiences in Integration of the 'R' System into Kepler Dan Higgins – National Center for Ecological Analysis and Synthesis (NCEAS), UC Santa Barbara.

Slides:



Advertisements
Similar presentations
Network II.5 simulator ..
Advertisements

How to improve your Data Analysis Processes in your Web Application / ERP using RClass Juan Antonio Breña Moral
Input and Output ENGR 1181 MATLAB 5. Input and Output In The Real World Script files (which provide outputs given inputs) are important tools in MATLAB.
Programming Types of Testing.
AN INTRODUCTION TO PRAAT Tina John M.A. Institute of Phonetics and digital Speech Processing - University Kiel Institute of Phonetics and Speech Processing.
Chapter 10.
Chad Berkley National Center for Ecological Analysis and Synthesis (NCEAS), University of California, Santa Barbara February.
A Guide to Oracle9i1 Introduction To Forms Builder Chapter 5.
Program Design and Development
Computational Physics Kepler Dr. Guy Tel-Zur. This presentations follows “The Getting Started with Kepler” guide. A tutorial style manual for scientists.
DT228/3 Web Development JSP: Directives and Scripting elements.
GIS Actors in Kepler - Java-based, GDAL-JNI, and C++(Grass) Routines Dan Higgins - UC Santa Barbara (NCEAS) Chad Berkley – UC Santa Barbara (NCEAS) Jianting.
Guide To UNIX Using Linux Third Edition
Russell Taylor Lecturer in Computing & Business Studies.
About the Presentations The presentations cover the objectives found in the opening of each chapter. All chapter objectives are listed in the beginning.
Chapter 7 Managing Data Sources. ASP.NET 2.0, Third Edition2.
Introduction to Array The fundamental unit of data in any MATLAB program is the array. 1. An array is a collection of data values organized into rows and.
What is R Muhammad Omer. What is R  R is the programing language software for statistical computing and data analysis  The R language is extensively.
Graphical Tree-Based Scientific Calculator: CalcuWiz Will Ryan Christian Braunlich.
What is R By: Wase Siddiqui. Introduction R is a programming language which is used for statistical computing and graphics. “R is a language and environment.
A First Program Using C#
Introduction to FORTRAN
Web Application Architecture and Communication. Displaying a Web page in a Browser
Intro to R R is a free version of S-plus R is a free version of S-plus Can be used interactively but script or syntax files are commonly used to record.
Parser-Driven Games Tool programming © Allan C. Milne Abertay University v
Outline Overview Video Format Conversion Connection with An authentication Streaming media Transferring media.
Introduction of Geoprocessing Topic 7a 4/10/2007.
© 2001 Business & Information Systems 2/e1 Chapter 8 Personal Productivity and Problem Solving.
Introduction to Computer Programming Using C Session 23 - Review.
What is MATLAB? MATLAB is one of a number of commercially available, sophisticated mathematical computation tools. Others include Maple Mathematica MathCad.
Chad Berkley NCEAS National Center for Ecological Analysis and Synthesis (NCEAS), University of California Santa Barbara Long Term Ecological Research.
Chapter 6 Server-side Programming: Java Servlets
Term 2, 2011 Week 1. CONTENTS Problem-solving methodology Programming and scripting languages – Programming languages Programming languages – Scripting.
Convert generic gUSE Portal into a science gateway Akos Balasko 02/07/
240-Current Research Easily Extensible Systems, Octave, Input Formats, SOA.
XP Tutorial 10New Perspectives on HTML and XHTML, Comprehensive 1 Working with JavaScript Creating a Programmable Web Page for North Pole Novelties Tutorial.
C Language: Introduction
Using R in Kepler Dan Higgins – NCEAS Prepared for: Ecoinformatics Training for Ecologists LTER (Albuquerque) January 8-12, 2007
Using Desktop Data in Kepler Dan Higgins – NCEAS Prepared for: Ecoinformatics Training for Ecologists LTER (Albuquerque) January 8-12, 2007
Scientific Workflow systems: Summary and Opportunities for SEEK and e-Science.
Lecture 20: Choosing the Right Tool for the Job. What is MATLAB? MATLAB is one of a number of commercially available, sophisticated mathematical computation.
Intermediate 2 Computing Unit 2 - Software Development.
Dharmen Mehta (Project Manager) Nimai Buch (Language Guru) Yash Parikh (System Architect) Amol Joshi (System Integrator) Chaitanya Korgaonkar (Verifier.
Internet Applications (Cont’d) Basic Internet Applications – World Wide Web (WWW) Browser Architecture Static Documents Dynamic Documents Active Documents.
Lecture 4 Mechanisms & Kernel for NOSs. Mechanisms for Network Operating Systems  Network operating systems provide three basic mechanisms that support.
ASP-2-1 SERVER AND CLIENT SIDE SCRITPING Colorado Technical University IT420 Tim Peterson.
General Computer Science for Engineers CISC 106 Lecture 12 James Atlas Computer and Information Sciences 08/03/2009.
Lecture1 Instructor: Amal Hussain ALshardy. Introduce students to the basics of writing software programs including variables, types, arrays, control.
PHP Form Processing * referenced from
Visualization in Kepler Dan Higgins – NCEAS Prepared for: Ecoinformatics Training for Ecologists LTER (Albuquerque) January 8-12, 2007
Project Planning Defining the project Software specification Development stages Software testing.
Introduction of Geoprocessing Lecture 9 3/24/2008.
JavaScript Introduction and Background. 2 Web languages Three formal languages HTML JavaScript CSS Three different tasks Document description Client-side.
CIS 595 MATLAB First Impressions. MATLAB This introduction will give Some basic ideas Main advantages and drawbacks compared to other languages.
“Moh’d Sami” AshhabSummer 2008University of Jordan MATLAB By (Mohammed Sami) Ashhab University of Jordan Summer 2008.
Computer Science: A Structured Programming Approach Using C1 Objectives ❏ To understand the differences between text and binary files ❏ To write programs.
Lecture 11 Introduction to R and Accessing USGS Data from Web Services Jeffery S. Horsburgh Hydroinformatics Fall 2013 This work was funded by National.
Your Interactive Guide to the Digital World Discovering Computers 2012 Chapter 13 Computer Programs and Programming Languages.
Foundations of Programming: Java
R programming language
Objectives You should be able to describe: Interactive Keyboard Input
Matlab Training Session 4: Control, Flow and Functions
Introduction to R Programming with AzureML
R Programming.
Course Name: QTP Trainer: Laxmi Duration: 25 Hrs Session: Daily 1 Hr.
Fall 2010 Slide 1.
Use of Mathematics using Technology (Maltlab)
Computational Physics Kepler
Scientific Workflows Lecture 15
Presentation transcript:

Experiences in Integration of the 'R' System into Kepler Dan Higgins – National Center for Ecological Analysis and Synthesis (NCEAS), UC Santa Barbara Prepared for Sixth Biennial Ptolemy Miniconference, May 12, 2005 at UC Berkeley SEEK: Science Environment for Ecological Knowledge This material is based upon work supported by the National Science Foundation under award

What is ‘R’ ? “R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.” “R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering,...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.” From the R Project Web page -

Ptolemy/Kepler and R R language has many similarities to the PTII/Kepler expression language R language emphasizes operations on vectors, matrices, and tables (in R, ‘data frames’) rather than scalars. (This eliminates many explicit looping statements) Many detailed statistical operations and data manipulation routines already exist in R R has ability to create sophisticated graphic displays Being able to call R routines from Kepler would greatly simplify many workflows

R Example Show a table, R script commands, and resulting output With only 3 lines, one can read a data table, plot all combinations of column data, and summarize the data

Interactive R in Kepler

Efforts to R in Kepler First Effort --- Interactive R actor –No real advantage over existing R console Use of Command Line Actor –Problems: R initialization –How to get data in/out ? (files) –How to display graphics ? (files) RExpression actor –Use concepts from Kepler/PT Expression language/actor Using RServer

RExpression Actor R Script: ccc <- aaa + bbb ccc plot(aaa,bbb) Adding ports automatically creates R objects with the port name [e.g. aaa <- c(1,2,3,4)] Graphics automatically saved as images and sent to ‘graphicsFileName’ output port (as file name) R text output automatically sent to ‘output’ port

RExpression – Ports & Parameters Adding ports creates R objects from Kepler tokens R script is a parameter of the RExpression actor which uses port names

Array Records and Data Frames Tables are represented as ‘Data Frame’ objects in ‘R’ A Ptolemy ‘Record of Arrays’ can also represent a table R Script: summary(df) where ‘ df ’ is the R dataframe created automatically when a record of arrays is passed to an input port AAABBB one1 two2 three3 four4

RExpression Output Ports R vectors can also be assigned to output ports R Script: CCC <- df$BBB where ‘ CCC ’ is the R name of the second column of the dataframe

EML DataSource Sequence Inputs EML DataSource actor provides table data from SEEK Ecogrid Column data from table can be supplied in various ways Sequences of tokens from EML DataSource can be converted to arrays and then to a Record for input to RExpression

EML DataSource as Column Record EML DataSource can be configured to create a “Column Based Record’ directly for input to RExpression

R Regression Analysis Example

R Summarize Table By Species

RExpression Implementation Input Ports Kepler tokens are converted to R string expressions e.g. If port AAA has token {1,2,3} it is converted to the R expression ‘ AAA <- c(1,2,3) ’ Automatically handles strings, numbers, arrays, and records with arrays of the same length 2. R Command Line Process R is started as a Java subprocess with text streams attached to standard in, out, and error

RExpression Implementation Input Block of R Commands A set of R commands are sent to the input stream of the R subprocess Initialization User Script Finalization Create graphics device (jpeg file); Create input port objects Whatever is in user’s script BBB <- 2 * AAA R commands for output ports (e.g. BBB)

RExpression Implementation Execute R Send input block to R subprocess and get output 5. Put R results on appropriate output ports Graphics Device File Name R output stream (text) ‘BBB’ R object converted to Kepler token e.g. {2,4,6} User script BBB <- 2 * AAA AAA = {1,2,3)

RServe RServe “Rserve is a TCP/IP server which allows other programs to use facilities of R without the need to initialize R or link against R library.” Client-side implementations are available for C/C++ and Java Java code example ---- Rconnection c = new Rconnection(); double d[]=c.eval("rnorm(10)").asDoubleArray(); Use of RServe would avoid each actor ‘re-starting’ R and allow remote execution of R scripts

Summary An RExpression actor that operates similarly to the existing Expression actor looks like a good way of integrating R into Kepler Using R in Kepler provides powerful extensions to the Ptolemy expression language that allows operations on complex structures (e.g. tables) Existing implementation is inefficient in some ways and incomplete, but is relatively easy to use and does not require detailed knowledge of R for simple operations