The R language and its Dynamic Runtime


The R language and its Dynamic Runtime Carlos Ordonez

Acknowledgments
- AT&T Labs
- Simon Urbanek (AT&T Labs, R core team)
- Mike Stonebraker (MIT)
- Hadley Wickham (formerly at Rice U)
- Bryan Lewis (SciDB team)
- Divesh Srivastava (my "boss" at AT&T)

Outline
- History
- R features
- R runtime
- R programming
- Research: analyzing streams

History
- Originally the S language, invented at AT&T Bell Labs (Chambers received the 1998 ACM Software System Award for S)
- The core runtime subsystem is still based on S expressions
- First solid version in 1979: ported to Unix and programmed in C
- Two branches: commercial = S-plus; open source = R (New Zealand)

Other analytic systems
- SAS: more a scripting language, but with well-tested libraries and external tools
- Matlab: numerical analysis, optimization, mathematical modeling
- DBMSs interacting with math libraries: SQL is the #1 language for writing queries
- Spark: new generation of MapReduce
- Pure C or C++; Java; Python growing (flat files)

Features
- Interpreted
- Functional; recursion
- Object-oriented
- Lists, vectors and matrices
- Goal: statistical computing, but also numerical analysis and data pre-processing
- Garbage collector

Pros
- Robust core interpreter system; portable
- More RAM makes things easier: 64-bit memory addressing (but integers are still 32-bit)
- Growing user population: expected to surpass SAS in 2015; already passed S-plus
- Machine learning now often uses R instead of Matlab, but Julia (MIT) is growing
- Scalable systems and libraries exist: Revolution (bought by Microsoft), pbdR, snow, biglm

Drawbacks
- Syntax is OK, but run-time R semantics are not formally specified: GNU R is the current de facto standard
- Can be slow, especially because there are many ways to program the same task
- Difficult to integrate data structures (e.g. trees, hash tables, binary files)
- String manipulation is acceptable, but sometimes cumbersome
- Dynamically typed: unexpected errors
- Highly variable quality of libraries in CRAN
- Does not scale well for large n; block-based processing is feasible, but needs to be reprogrammed per library (I/O tools)

R runtime
- Single-threaded
- Text file I/O
- Garbage collector
- Environments; variable generations

R internals
- S expressions
- Data types: integer (32-bit), real, string, POSIX timestamp
- Memory allocation: lists, vectors, matrices, data frames (most general)
- Memory deallocation: automatic, but calls to the garbage collector can be forced in embedded code
- Bash script-based interpreter: easy integration into diverse Unix environments
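
The data types and the garbage collector listed above can be inspected from an R session. The following is a minimal sketch, assuming a standard GNU R installation; the variable names are illustrative and not taken from the slides.

```r
# Integers are 32-bit: the largest representable value is 2^31 - 1.
max_int <- .Machine$integer.max

# Integer overflow yields NA (with a warning) rather than wrapping around.
overflowed <- suppressWarnings(max_int + 1L)

# Adding a double promotes the result to double, which holds larger magnitudes.
as_double <- max_int + 1

# Storage types reported by typeof() for the types named on the slide.
types <- c(
  integer   = typeof(1L),
  real      = typeof(1.5),
  string    = typeof("a"),
  timestamp = typeof(Sys.time())
)

# A POSIX timestamp is a double with a class attribute.
stamp_class <- class(Sys.time())

# Deallocation is automatic, but a collection can be requested explicitly.
x <- numeric(1e6)
rm(x)
invisible(gc())
```

The 32-bit integer limit matters in practice when counting rows or summing integer columns of large data sets, where converting to double first avoids the overflow.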

Programming in R
- Examples
- Interactive debugging
- Reusable and maintainable code
- Faster processing
- Extending R

Examples

Debugging
- Tracking variable contents
- List, vector, matrix sizes
- Ranges
- Environments

Tracking variable content
- Initialization is commonly not needed
- The data type can change at any time with a new assignment
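
The behaviour described above can be seen in a few lines. This is an illustrative sketch only; `check_numeric` is a hypothetical helper, not something from the slides.

```r
# No declaration or initialization: the first assignment creates the variable.
x <- 42L
type_1 <- class(x)        # "integer"

# A new assignment can rebind the same name to a value of a different type.
x <- "forty-two"
type_2 <- class(x)        # "character"

x <- c(1.5, 2.5)
type_3 <- class(x)        # "numeric"

# str() prints a compact view of the current contents while tracking a variable.
str(x)

# Hypothetical defensive check: fail early on an unexpected type.
check_numeric <- function(v) {
  if (!is.numeric(v)) stop("expected a numeric vector, got ", class(v)[1])
  invisible(TRUE)
}
ok <- check_numeric(x)
failed <- inherits(try(check_numeric("text"), silent = TRUE), "try-error")
```

Explicit checks of this kind are one way to limit the "unexpected errors" that dynamic typing can cause.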

Sizes
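
The original slide content is not part of this transcript. As a rough sketch of the size, range and environment checks named on the Debugging slide, using only base R functions and made-up data:

```r
v <- c(3, 1, 4, 1, 5)
m <- matrix(1:6, nrow = 2, ncol = 3)
l <- list(a = 1, b = "two", c = v)

len_v <- length(v)     # number of elements in the vector
dim_m <- dim(m)        # rows and columns of the matrix
len_l <- length(l)     # number of top-level list entries
rng_v <- range(v)      # minimum and maximum of the vector

# Approximate memory footprint of an object, in bytes.
size_m <- object.size(m)

# Environments: list the names visible in a function's local environment.
f <- function() {
  local_a <- 1
  local_b <- 2
  ls(environment())
}
local_names <- f()
```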

Reusable and maintainable code
- Functions
- Closures
- Functionals
- Named arguments, defaults
- Libraries
- R embedded
- R embedded C

Functional
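
The original slide content is not part of this transcript. The sketch below illustrates the functions, named arguments, defaults, closures and functionals listed earlier; all function names (`scale_by`, `make_counter`) are hypothetical.

```r
# Function with a default argument; callers may pass arguments by name.
scale_by <- function(x, factor = 2) {
  x * factor
}
a <- scale_by(1:3)                   # uses the default factor
b <- scale_by(factor = 10, x = 1:3)  # named arguments, in any order

# Closure: make_counter() returns a function that remembers its own count.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}
counter <- make_counter()
first  <- counter()
second <- counter()

# Functional: sapply() takes a function and applies it to each element.
squares <- sapply(1:4, function(i) i^2)

# Reduce() folds a binary function over a vector.
total <- Reduce(`+`, 1:4)
```

Each call to `make_counter()` creates a fresh environment, so separate counters do not share state.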

Faster processing
- Profiling code
- Direct calls to C math library
- Vectorized code
- Avoid type casting
- Chunk-based processing

Faster processing
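
The original slide content is not part of this transcript. The sketch below illustrates three of the techniques listed above (quick profiling, vectorized code, chunk-based processing) on made-up data. The function names are hypothetical, and the chunked version works on an in-memory vector only to stay self-contained; in practice each block would be read from a file.

```r
n <- 10000
x <- seq_len(n) / n

# Interpreted loop: one element per iteration.
loop_sum_sq <- function(v) {
  s <- 0
  for (i in seq_along(v)) s <- s + v[i]^2
  s
}

# Vectorized version: the loop runs inside compiled code.
vec_sum_sq <- function(v) sum(v^2)

# system.time() gives a quick first profile; Rprof() offers finer detail.
t_loop <- system.time(r_loop <- loop_sum_sq(x))
t_vec  <- system.time(r_vec  <- vec_sum_sq(x))

# Chunk-based processing: accumulate the result block by block, so only
# one block has to be held at a time. Assumes length(v) >= 1.
chunk_sum_sq <- function(v, chunk_size = 1000) {
  starts <- seq(1, length(v), by = chunk_size)
  total <- 0
  for (s in starts) {
    e <- min(s + chunk_size - 1, length(v))
    total <- total + sum(v[s:e]^2)
  }
  total
}
r_chunk <- chunk_sum_sq(x)
```

All three versions compute the same sum of squares; timings depend on the machine and are therefore not shown.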

Extending R
- New functions
- Libraries
- Embedded code

Research goal: analyzing network data streams
- Stream data warehouse, constantly refreshed every 1-5 minutes from multiple streams
- Time windows
- Intermittent feeds
- Enable complex analytics for network monitoring

Embedded code
- Main motivation: bypass ODBC, JDBC, JSON
- Embedding R code inside C code
  - Vectors and matrices
  - Exploit existing R functions
  - May be faster than host language
- Embedding C code inside R code
  - Better performance
  - More flexibility
  - Algorithm already programmed in C or C++

Embedded R inside C
- Set up libraries
- Set up Unix environment
- Convert external data to list, vector or data frame: memcpy() when possible
- Retrieve results: transformed data set (most common), model (harder), associated statistical metrics (model-specific)

Embedded R inside C: main guidelines
- Avoid reprogramming an existing R function
- Consider tradeoffs between data set size and RAM
- Two subsystems will compete for RAM
- Single-threaded, but feasible to call R multiple times as different Unix processes

Embedded R

Embedded R generate time series

Embedded R create data frame

Embedded R final: call R from C

Embedded R direct binding to DBMS

Embedded R main

Embedded C code guidelines
- Identify bottlenecks
- Substitute nested interpreted loops
- Eliminate or reduce dynamic type checking

Embedded C code programming
- Understand data type manipulation, especially C arrays and ** pointers
- Memory management
- Function argument binding
- Linker

Improve efficiency of R
- Alternative 1: built-in matrix ops
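
The original slide content is not part of this transcript. As a sketch of what replacing interpreted loops with built-in matrix operations can look like, the example below computes the Gram matrix of a made-up data set in three ways. The choice of the Gram matrix and the name `gram_loop` are assumptions made for illustration.

```r
set.seed(1)
X <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)

# Interpreted triple loop: G[i, j] = sum over k of X[k, i] * X[k, j].
gram_loop <- function(X) {
  d <- ncol(X)
  G <- matrix(0, d, d)
  for (i in seq_len(d)) {
    for (j in seq_len(d)) {
      for (k in seq_len(nrow(X))) {
        G[i, j] <- G[i, j] + X[k, i] * X[k, j]
      }
    }
  }
  G
}

G_loop  <- gram_loop(X)     # nested interpreted loops
G_mult  <- t(X) %*% X       # built-in matrix product
G_cross <- crossprod(X)     # built-in, avoids forming t(X) explicitly
```

The built-in operators hand the work to compiled linear algebra routines, which is why they usually outperform the interpreted loop.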

Improve R efficiency
- Alternative 2: C code for the operator (10X faster)