Published by Aldous Davis. Modified over 8 years ago.
1
Lecy ∙ R MeetUp Group
LECTURE 00: R Overview
2
MOTIVATING THE MATERIAL
3
WHAT IS R?
4
R: Two guys in New Zealand (Ross Ihaka and Robert Gentleman) who do not know how to program invent a language and give it away for free. It develops a cult following and takes on billion-dollar industry giants like SAS and Stata.
5
R IS MANY THINGS
R is a hybrid of a programming language and a stats package
R is a platform
– Operating system (environment) for programs (packages) written by users
– Data engine
– Graphing engine
R is an ecosystem
– Packages can build on each other; code can be adapted
R is a community
R is a response to the commercialization of scientific knowledge at the expense of science
6
R IS GOOD AT SOME THINGS
Rapid development and deployment of programs
Customized professional graphics
Open-source paradigm allows you to build on others' work
– For example, the fix() command, which opens an existing object or function in an editor so you can modify it
Breaking through cost barriers for small companies and students
There is an amazing variety of packages and datasets (over 7,000)
– http://cran.r-project.org/web/views/
Documentation is fairly good
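Because R functions are ordinary objects whose source you can read, building on someone else's work can be as simple as copying a function and changing it. fix() does this interactively in an editor; the sketch below does something similar non-interactively. The name sd.narm is hypothetical, chosen only for this illustration.

```r
# Typing a function's name without parentheses prints its source code
sd

# Copy the function, then change a default argument in the copy only
sd.narm <- sd
formals( sd.narm )$na.rm <- TRUE

sd( c( 1, 2, 3, NA ) )       # NA: the original still has na.rm = FALSE
sd.narm( c( 1, 2, 3, NA ) )  # 1: the copy drops the missing value
```

The original sd() is untouched; only the new copy behaves differently.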
7
R IS NOT GOOD AT OTHERS
R is not built for large datasets, because it holds data in memory (although there are now many ways to adapt it to these purposes)
R is not as fast as compiled programming languages
Distributed development means that uniform conventions for function names, arguments, and documentation are often not followed
Output is not automatically pretty, so it takes some extra time to format (though there are good packages for these purposes)
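A rough back-of-envelope sketch of the memory point, assuming standard 8-byte double-precision numbers: since R keeps whole objects in RAM, the size of a dataset you can load is bounded by available memory. The 100-million-row table is a made-up scale for illustration, not a benchmark.

```r
# Each element of a numeric (double) vector takes 8 bytes of RAM
x.big <- numeric( 1e6 )
print( object.size( x.big ), units = "auto" )

# Hypothetical table: 100 million rows x 10 numeric columns, size in gigabytes
rows <- 1e8
cols <- 10
rows * cols * 8 / 1e9
```

At roughly 8 GB before any copies are made, a table like that already exceeds the RAM of many laptops.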
8
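R EMBRACES OBJECT-ORIENTED PROGRAMMING
# example of plot O-O behavior
x <- 1:100
y <- 2*x + rnorm( 100, 0, 10 )
plot( x, y )          # numeric x: scatterplot
x2 <- cut( x, 5 )     # convert x into a factor with 5 groups
plot( x2, y )         # factor x: one boxplot per group
m.01 <- lm( y ~ x )
plot( m.01 )          # lm object: regression diagnostic plots
# example with variance O-O behavior:
dat <- data.frame( x, y )
var( x )              # vector: a single variance
var( dat )            # data frame: variance-covariance matrix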
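One way to see what is going on with plot(): it is an S3 generic function that checks class() of its first argument and hands off to a class-specific method (plot.lm for regression models, for example). The describe() generic and its methods below are hypothetical, written only to show the same dispatch mechanism on a small scale.

```r
# class() reports what kind of object something is
x <- 1:100
class( x )              # "integer"
class( cut( x, 5 ) )    # "factor"

# plot() is an S3 generic; this lists the class-specific methods it can dispatch to
methods( "plot" )

# A hypothetical generic of our own, built the same way
describe <- function( obj, ... ) UseMethod( "describe" )
describe.default    <- function( obj, ... ) "some other object"
describe.factor     <- function( obj, ... ) paste( "factor with", nlevels( obj ), "levels" )
describe.data.frame <- function( obj, ... ) paste( "data frame with", ncol( obj ), "columns" )

describe( cut( x, 5 ) )              # "factor with 5 levels"
describe( data.frame( x, y = 2*x ) ) # "data frame with 2 columns"
describe( x )                        # "some other object"
```

The same call, describe(), gives a different answer depending on the class of the object passed in, which is the behavior the plot() examples rely on.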
9
WHY R?
10
Statistics
Network Analysis
Machine Learning
Text Analysis
GIS
Dynamic Reports
11
R IS GROWING
http://r4stats.com/articles/popularity/
12
API Shiny
13
MEETUP OBJECTIVES
Expose you to new and interesting developments in the data programming world.
Ability to use RStudio, read R documentation, and write R scripts.
Ability to write technical notes and report results using R Markdown documents.
Familiarity with R conventions and the object-oriented framework.
Understanding of core data structures of R.
Understanding of core data programming operations.
Comfort with the R graphics engine.
Work with raw data using text functions.
Understanding of programming fundamentals.
Create a data dashboard using R Shiny.
Collaborate in teams using GitHub.
14
MY DDM COURSE OVERVIEW:
Weeks 1-5: Core Data Operations
1 – Intro
2 – Data Structures
3 – Merge Data
4 – Descriptive Statistics
5 – Data Input
Weeks 6-9: Visualization
6 – Principles of Visualization
7 – Core Graphics
8 – Advanced Graphics
9 – Maps and GIS
Weeks 10-12: Programming and Text
10 – Basic Programming
11 – Text Analysis
12 – Text Analysis
13 – Thanksgiving Break
Weeks 14-15: Building a Dashboard in Shiny
14 – Intro to Shiny & GitHub
15 – More Shiny
http://www.lecy.info/data-driven-management
15
HELPFUL TEXTS
R Cookbook (Paul Teetor)
The Art of R Programming (Norman Matloff)
16
REQUIRED SOFTWARE
17
WE WILL BE USING
The latest version of R (3.2.2 or higher)
RStudio development environment
GitHub (as much as we can)
R Shiny web toolkit
Packages:
– The Lahman package – data structures and visualization
– devtools – integration with GitHub
– shiny – build Shiny apps
– maps / ggmap / maptools – GIS operations
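A minimal setup check, sketched under the assumption that you only want to see what is missing before installing anything. The variable names pkgs and pkgs.missing are illustrative. Note that maptools has since been retired from CRAN, so it may no longer install from there.

```r
# Minimum R version listed on the slide
getRversion() >= "3.2.2"

# Packages named on the slide
pkgs <- c( "Lahman", "devtools", "shiny", "maps", "ggmap", "maptools" )

# Which of them are not yet installed on this machine?
pkgs.missing <- setdiff( pkgs, rownames( installed.packages() ) )
pkgs.missing

# To install them, uncomment the next line (requires an internet connection)
# install.packages( pkgs.missing )
```

Leaving install.packages() commented out keeps the check safe to run offline.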
18
GITHUB
"Software engineers will pay monthly fees for the rest of their lives in order to create free software out of other free software!"
Some examples:
A short tutorial for using the 'twitteR' package:
– https://sites.google.com/site/miningtwitter/questions/talking-about
– https://github.com/gastonstat/Mining_Twitter
Hadley Wickham (Chief Scientist at RStudio): https://github.com/hadley
19
VERSION CONTROL 101
20
This code was added
This code was deleted
21
SUPPORTS CONCURRENT DEVELOPMENT
22
GRAPHICS
23
Two population density measures compared.
Migration patterns of birds.
25
OBJECTIVES
Reflect on good visualization practices
Understand ground, figure, and narrative on charts
Learn the core functions of the graphics suite
Learn how to customize graphs and create high-quality images
Touch on some nice mapping packages
26
WRITING CLEAR CODE
27
THE R STYLE GUIDE
Donaudampfschiffahrtsgesellschaftskapitän ("Danube steamship company captain")
summary(lm(dat$crime[20:50]~bin(dat[20:50,"pop"],10)))
VS.
y.sub <- dat[ 20:50, "crime" ]
x.sub <- dat[ 20:50, "pop" ]
x.bin <- bin( x.sub, 10 )    # bin() is not base R; assumed to be defined elsewhere
lm.01 <- lm( y.sub ~ x.bin )
summary( lm.01 )
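To make the comparison runnable, here is a sketch with two loud assumptions: bin() is a hypothetical helper (the slide never defines it) that we build on cut() to split a numeric vector into n equal-width groups, and dat is made-up data with pop and crime columns. Both versions fit the same model; only readability differs.

```r
# Hypothetical helper: cut a numeric vector into n equal-width intervals (a factor)
bin <- function( x, n ) cut( x, breaks = n )

# Made-up data standing in for the slide's 'dat' (columns pop and crime)
set.seed( 42 )
dat <- data.frame( pop = runif( 60, 1000, 50000 ) )
dat$crime <- 0.002 * dat$pop + rnorm( 60, 0, 10 )

# Hard to read: everything nested on one line
summary(lm(dat$crime[20:50]~bin(dat[20:50,"pop"],10)))

# Easier to read: one step per line, each intermediate result named
y.sub <- dat[ 20:50, "crime" ]
x.sub <- dat[ 20:50, "pop" ]
x.bin <- bin( x.sub, 10 )
lm.01 <- lm( y.sub ~ x.bin )
summary( lm.01 )
```

With only 31 rows and 10 bins, some bins may be empty, in which case summary() reports those coefficients as not defined.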
28
THE ‘LAHMAN’ PACKAGE
29
THE ART OF CREATING GRAPHICS: http://chartsnthings.tumblr.com/post/22471358872/sketches-how-mariano-rivera-compares-to-baseballs
30
FROM THE NYT BLOG, CHARTSNTHINGS http://chartsnthings.tumblr.com/post/47670081904/climate-change-crowbars-and-strikeouts
31
MISCELLANEOUS ANALYSIS
33
WHAT IS OBJECT-ORIENTED?
34
R EMBRACES OBJECT-ORIENTED PROGRAMMING
# A function to make cookies:
make.cookies <- function( flour, eggs, sugar )
{
  # these steps give the operations
  batter <- mix( flour, eggs, sugar )
  baked.goods <- bake( batter, temp=450 )
  return( baked.goods )
}
# Each step of the recipe is a separate
# function. Here "mix" and "bake" are
# defined elsewhere as "mix.R" and "bake.R".
35
R EMBRACES OBJECT-ORIENTED PROGRAMMING
# When you want to call the function you give
# specific instances of the inputs
cookies.01 <- make.cookies( flour.01, eggs.01, sugar.01 )
# Because R is object-oriented, you not only need
# to call the function but also to give a name
# to the final product (otherwise the result is
# printed and then discarded). A new data object
# is created after each function is performed.
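The recipe above only runs once mix() and bake() exist. Below is a sketch with hypothetical stand-ins for them (the originals live in mix.R and bake.R, which would normally be loaded with source()). Each step returns a new object with its own class, which mirrors the "new data object after each function" idea. The ingredient quantities are made up.

```r
# Hypothetical stand-in for mix(): returns a new object of class "batter"
mix <- function( flour, eggs, sugar )
{
  structure( list( flour = flour, eggs = eggs, sugar = sugar ), class = "batter" )
}

# Hypothetical stand-in for bake(): takes a batter, returns a "cookies" object
bake <- function( batter, temp )
{
  structure( list( ingredients = unclass( batter ), temp = temp ), class = "cookies" )
}

make.cookies <- function( flour, eggs, sugar )
{
  batter <- mix( flour, eggs, sugar )
  baked.goods <- bake( batter, temp=450 )
  return( baked.goods )
}

# Specific instances of the inputs (made-up quantities)
flour.01 <- 2.25
eggs.01  <- 2
sugar.01 <- 0.75

cookies.01 <- make.cookies( flour.01, eggs.01, sugar.01 )
class( cookies.01 )   # "cookies"
cookies.01$temp       # 450
```

Giving each intermediate result a class is a design choice here, not something the slide requires; it lets generic functions such as print() treat batter and cookies differently later on.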