An Introduction to R: Free software for repeatable statistics, visualisation and modeling
Dr Andy Pryke, The Data Mine Ltd



Outline
1. Overview
   - What is R?
   - When to use R?
   - Wot no GUI?
   - Help and Support
2. Examples
   - Simple Commands
   - Statistics
   - Graphics
   - Modeling and Mining
   - SQL Database Interface
3. Going Forward
   - Relevant Libraries
   - Online Courses etc.

What is R?
- Open source, well supported, command-line driven statistics package
- 100s of extra packages available free
- Large number of users, particularly in bioinformatics and social science
- Good design: John Chambers received the 1998 ACM Software System Award for S, the language that R implements. Dr. Chambers' work "will forever alter the way people analyze, visualize, and manipulate data…"

When Should I Use R?
To do a full cycle of:
- data import
- data pre-processing
- exploratory statistics and graphics
- modeling and data mining
- report production
- integration into other systems
Or any one of these steps, e.g. just to standardise pre-processing of data.
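The full cycle listed above can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the tiny height/weight data set, the variable names and the output file name are all made up for the example and do not come from the slides.

```r
# 1. Data import: read CSV text (assumed toy data; a file path works the same way).
raw_csv <- "height,weight\n150,52\n160,58\n170,NA\n180,77\n190,85"
people <- read.csv(text = raw_csv)

# 2. Pre-processing: drop rows that contain missing values.
clean <- na.omit(people)

# 3. Exploratory statistics.
print(summary(clean))

# 4. Modeling: linear regression of weight on height.
fit <- lm(weight ~ height, data = clean)

# 5. Report production: write the coefficients to a CSV file
#    that another system could pick up (hypothetical file name).
coefs <- data.frame(term = names(coef(fit)), estimate = unname(coef(fit)))
report_file <- file.path(tempdir(), "model_report.csv")
write.csv(coefs, report_file, row.names = FALSE)
```

Because every step is a line in a script, re-running it on new data repeats exactly the same processing.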

Wot no GUI? or The Advantages of Scripting
- Repeatable
- Debug-able
- Documentable
- Build on previous work
- Automation
  - Report generation
  - Website or system integration
  - Links from Perl, Python, Java, C, TCP/IP…

Help and Support
- Built-in help/example system (e.g. type ?plot)
- Many tutorials available free
- R-Help mailing list
  - Archived online
  - Key R developers respond
  - Contributors understand statistical concepts
- Large user community
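A minimal sketch of the built-in help system mentioned above. Only `?plot` appears on the slide; the other calls are standard base R functions chosen here for illustration.

```r
# Open the help page for a function; equivalent to typing ?plot at the prompt.
help("plot")

# Run the examples that ship with a help page.
example("mean")

# List objects whose names match a pattern, useful when the
# exact function name is not known.
matches <- apropos("^cor")
print(matches)
```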

Simple Commands
    …*3
    30
    c(1,2,3)
    c(1,2,3)*…
    x <- 5
    x*x
    25
    exp(1)
    q()
    Save workspace image? [y/n/c]: n
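The commands on this slide did not all survive transcription, so the following is a hedged reconstruction rather than the original listing. The operands `10` are assumptions picked for illustration.

```r
10 * 3              # scalar arithmetic (the operand 10 is an assumption)

v <- c(1, 2, 3)     # c() combines values into a vector
v * 10              # arithmetic is applied to each element in turn

x <- 5              # <- assigns a value to a name
x * x

exp(1)              # Euler's number e
```

Typing `q()` ends the session; R then asks whether to save the workspace image.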

Simple Statistics
    colnames(iris)
    "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

    plot(iris$Sepal.Length, iris$Petal.Length)

    # Pearson Correlation
    cor(iris$Sepal.Length, iris$Petal.Length)

    # Spearman Correlation
    cor(rank(iris$Sepal.Length), rank(iris$Petal.Length))
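The slide computes the Spearman correlation by ranking both variables first. As a small check, the sketch below compares that with the `method = "spearman"` argument of `cor()`; the variable names are introduced here for illustration.

```r
# Pearson correlation (the default method).
pearson <- cor(iris$Sepal.Length, iris$Petal.Length)

# Spearman correlation, computed as on the slide: Pearson on the ranks.
spearman_by_rank <- cor(rank(iris$Sepal.Length), rank(iris$Petal.Length))

# The same statistic requested directly.
spearman_builtin <- cor(iris$Sepal.Length, iris$Petal.Length,
                        method = "spearman")

# TRUE if the two Spearman values agree.
print(all.equal(spearman_by_rank, spearman_builtin))
```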

Graphics
[Figure: pie chart titled "January Pie Sales" with slices Blueberry, Cherry, Apple, Boston Cream, Other, Vanilla Cream]
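A chart like the "January Pie Sales" figure can be drawn with base R's `pie()`. The slide does not give the code or the numbers, so the proportions below are assumed values for illustration only; the slice names and title are taken from the figure.

```r
# Assumed proportions (not from the slides); names and title match the figure.
pie_sales <- c(0.12, 0.30, 0.26, 0.16, 0.04, 0.12)
names(pie_sales) <- c("Blueberry", "Cherry", "Apple",
                      "Boston Cream", "Other", "Vanilla Cream")

# When run as a script, draw to a null device so no file is written;
# in an interactive session the chart appears on screen as usual.
if (!interactive()) pdf(NULL)

pie(pie_sales, main = "January Pie Sales")
```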

Linear Models
    ## Scatterplot of Sepal and Petal Length
    plot(iris$Sepal.Length, iris$Petal.Length)

    ## Make a Model of Petals in terms of Sepals
    irisModel <- lm(iris$Petal.Length ~ iris$Sepal.Length)

    ## plot the model as a line
    abline(irisModel)
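A possible follow-up to the model above, sketched with the `data =` form of `lm()`. Writing the formula with bare column names, rather than `iris$...`, is what lets `predict()` accept new data. The `new_flowers` values are made up for illustration.

```r
# Same model as on the slide, written with a data argument.
irisModel <- lm(Petal.Length ~ Sepal.Length, data = iris)

# Intercept and slope of the fitted line.
print(coef(irisModel))

# Proportion of variance in petal length explained by sepal length.
r_squared <- summary(irisModel)$r.squared
print(r_squared)

# Predict petal length for two hypothetical flowers.
new_flowers <- data.frame(Sepal.Length = c(5.0, 6.5))
predicted <- predict(irisModel, newdata = new_flowers)
print(predicted)
```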

Classification Trees
    # ctree() comes from the party package
    library("party")

    # Model Species
    irisct <- ctree(Species ~ ., data = iris)

    # Show the model tree
    plot(irisct)

    # Compare predictions
    table(predict(irisct), iris$Species)
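The last step, comparing predictions with the true species, can be sketched without any extra packages. Note the swap: instead of a fitted `ctree`, the example uses a hand-written threshold rule (`predict_species`, a hypothetical helper with assumed cut-offs) purely to show how the comparison table and an accuracy figure are built.

```r
# Hypothetical stand-in for a fitted tree: two assumed threshold splits.
predict_species <- function(df) {
  label <- ifelse(df$Petal.Length < 2.5, "setosa",
           ifelse(df$Petal.Width < 1.75, "versicolor", "virginica"))
  factor(label, levels = levels(iris$Species))
}

predicted <- predict_species(iris)

# Rows are predictions, columns are the true species.
confusion <- table(predicted = predicted, actual = iris$Species)
print(confusion)

# Share of flowers on the diagonal, i.e. classified correctly.
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

The same two lines for `confusion` and `accuracy` apply unchanged to `predict(irisct)` from the slide.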

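SQL Interface
Connect to databases with ODBC
    library("RODBC")

    # Open a connection to an ODBC data source
    channel <- odbcConnect("PostgreSQL30w", case="postgresql")

    # Write the iris data frame to a database table
    sqlSave(channel, iris, tablename="iris")

    # Read it back with a SQL query
    myIris <- sqlQuery(channel, "select * from iris")

    # Close the connection when finished
    odbcClose(channel)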

Data Mining Libraries (i)
- randomForest
  - Random forests: robust prediction
- party
  - Conditional inference trees: statistically principled
  - Model-based partitioning: advanced regression
  - cForests: random forests with ctrees
- e1071
  - Naïve Bayes, Support Vector Machines, Fuzzy Clustering and more...

Data Mining Libraries (ii)
- nnet
  - Feed-forward Neural Networks
  - Multinomial Log-Linear Models
- BayesTree
  - Bayesian Additive Regression Trees
- gafit & rgenoud
  - Genetic Algorithm based optimisation
- varSelRF
  - Variable selection using random forests

Data Mining Libraries (iii)
- arules
  - Association Rules (links to C code)
- RWeka
  - Access to the many data mining algorithms found in the open source package Weka
- dprep
  - Data pre-processing
  - You can easily write your own functions too.
- Bioconductor
  - Multiple packages for analysis of genomic (and biological) data

Sources of Further Information
Download these slides + the examples & find links to online courses in R here:


Editors which Link to R
- Rgui (not really a GUI)
- Emacs (with ESS mode)
- Rcmdr
- Tinn-R
- jgr - Ja
- SciViews
- and more...