Published by Jair Pownell. Modified over 10 years ago.
2
© 2011 Deloitte Touche Tohmatsu About me Educational background – Applied Econometrics 4 years statistical modelling experience R experience – 2 years Currently Senior Analyst at Deloitte Hobby – rock climbing, data mining competitions Why? - Early retirement Current interest – Text analytics
3
© 2011 Deloitte Touche Tohmatsu 2 Topic: The benefits of R from a data mining competitor’s point of view and from the point of view of an employee at Deloitte Work Professional and pragmatic Home The playful scientist
4
© 2011 Deloitte Touche Tohmatsu 3 Agenda 1.Quick introduction to R 2.What I use R for 3.R at work Introduction to Deloitte Frequently used tools Some of the work we do using R Examples Challenges: Data Storage Challenges: Standardisation How Deloitte is addressing this issue 4.R at home: Some of the work I do using R, at home Flexibility and convenience Examples Prototyping and experimenting Examples 5.Questions 6.Essential R packages for everyday use
5
© 2011 Deloitte Touche Tohmatsu 4 Quick introduction to R “A statistical software created by statisticians, for statisticians” Personally, I use R for data analysis and statistical modelling Unique features worth noting: Open source – free, easy to find help in the active community Understands mathematical computations and matrix operations naturally Thousands of packages, implementations of almost any algorithm
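
To illustrate the point about matrix operations, here is a minimal sketch in base R that fits a straight line by solving the normal equations directly. The data and variable names are made up for illustration and do not come from the talk.

```r
# Small design matrix: an intercept column plus one predictor.
X <- cbind(1, c(1, 2, 3, 4))
y <- c(3, 5, 7, 9)              # constructed so that y = 1 + 2 * x exactly

# Ordinary least squares via the normal equations: solve (X'X) b = X'y.
beta <- solve(t(X) %*% X, t(X) %*% y)

# Vectorised arithmetic: fitted values and residuals without an explicit loop.
fitted_y <- as.vector(X %*% beta)
residuals_y <- y - fitted_y
```

In everyday work `lm(y ~ x)` would do the same job; the point here is only that matrix algebra reads almost like the mathematics.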
6
© 2011 Deloitte Touche Tohmatsu 5 Introduction to R Thousands of packages, implementations of almost any algorithm ggplot2 EBImage randomForest etc N = 500+ Packages
7
© 2011 Deloitte Touche Tohmatsu 6 R at work
8
© 2011 Deloitte Touche Tohmatsu 7 Introduction to Deloitte 1.We help clients capture, manage and analyse data to help solve important business problems to make informed decisions 2.A holistic process of data mining
9
© 2011 Deloitte Touche Tohmatsu 8 Introduction to Deloitte: Typical activity involved in a project at Deloitte Initiating processes Planning processes Modeling Closing processes Level of Activity Time line Data loading Data preparation But not everything is R 20% - 40% time spent on modelling
10
© 2011 Deloitte Touche Tohmatsu 9 Frequently used tools Geospatial analytics - Tactician Segmentation - Self Organising maps Modelling Visualisation SQL server
11
© 2011 Deloitte Touche Tohmatsu 10 Some of the work we do using R In Deloitte Statistical Analysis and Predictive modelling Time series analysis Social Network Analysis Data visualisation Text analytics (NEW!)
12
© 2011 Deloitte Touche Tohmatsu 11 Examples: Time Series y – retail activity? Time (days) --- Estimate Actual Fitted R package: forecast
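
The slide credits the forecast package but does not show the model used. As a rough sketch of the same idea (actual values, fitted values and a forward estimate), the example below swaps in Holt-Winters smoothing from base R's stats package so that it runs without extra installs. The series is simulated and every name is an assumption.

```r
# Simulated daily series with a weekly pattern (illustrative data only).
set.seed(42)
days <- 1:140
weekly <- rep(c(100, 95, 98, 105, 120, 150, 130), length.out = length(days))
activity <- ts(weekly + 0.2 * days + rnorm(length(days), sd = 5), frequency = 7)

# Holt-Winters exponential smoothing with level, trend and weekly seasonality.
fit <- HoltWinters(activity)

# In-sample fitted values, and a 14-day-ahead estimate with 95% bounds.
fitted_values <- fit$fitted[, "xhat"]
fc <- predict(fit, n.ahead = 14, prediction.interval = TRUE, level = 0.95)

# plot(fit, fc) would overlay actual, fitted and estimated values.
```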
13
© 2011 Deloitte Touche Tohmatsu 12 Challenges: Data Storage We have a dedicated tool to store and clean data – SQL R cannot handle large data sets Error: cannot allocate vector of size 2097151 Kb
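
To put the error message in perspective, the arithmetic below converts the reported size into gigabytes and into a count of numeric values. It assumes that "Kb" means 1024 bytes and that the data is stored as 8-byte doubles; the helper `estimate_gb` and the table dimensions are hypothetical.

```r
# Size reported in the error message, in Kb (assumed to be 1024 bytes each).
failed_kb <- 2097151
failed_bytes <- failed_kb * 1024
failed_gb <- failed_bytes / 1024^3      # just under 2 GB in a single allocation

# A double-precision value takes 8 bytes, so this is roughly how many
# numeric values the failed vector would have held.
n_doubles <- floor(failed_bytes / 8)

# Rough in-memory footprint of a hypothetical all-numeric table.
estimate_gb <- function(n_rows, n_cols, bytes_per_value = 8) {
  n_rows * n_cols * bytes_per_value / 1024^3
}
estimate_gb(50e6, 20)                   # 50 million rows by 20 numeric columns
```

An estimate like this is a quick way to decide whether to aggregate or filter in SQL before pulling data into R.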
14
© 2011 Deloitte Touche Tohmatsu 13 Challenges: Standardisation ‘You’re not the only one using it” One of the reason’s why other commercial tools are preferred over R Transferable skills across the team Reliability of packages Standardised functions and procedures
15
© 2011 Deloitte Touche Tohmatsu 14 How Deloitte is addressing this issue Creating standardised process: R package: RODBC
16
© 2011 Deloitte Touche Tohmatsu 15 How Deloitte is addressing this issue Creating standardised functions: # Density Plot for subject variable DensityPlot <- function(dataset, col) { ds <- data.frame(dataset);ds$c <- ds[,c(col)];a <- ggplot(data=ds, aes(x=c) ) a <- a + geom_density(kernel="biweight");a } DensityPlot (dataset, column number) Retrieving data from the database (RODBC): conn <- odbcDriverConnect("driver=SQL Server; database=DataBaseName; server=servername;") query <- “Select * from TableName” df <- sqlQuery(conn,query ) R package: RODBC
17
© 2011 Deloitte Touche Tohmatsu 16 R at home
18
© 2011 Deloitte Touche Tohmatsu 17 Some of the work I do using R, at home In Deloitte Statistical Analysis and Predictive modelling Time series analysis Social Network Analysis Data visualisation Text analytics (NEW!) (we don’t just use R) At home (data mining competitions) Statistical analysis and Predictive modelling Time series analysis Social Network Analysis Data visualisation Text analytics Image analysis (I mainly use R)
19
© 2011 Deloitte Touche Tohmatsu 18 Flexibility and convenience 1.Is one of the easier programming languages to pick up 2.Dive into the analysis quickly

Examples: Image analysis
R package: EBImage

Prototyping and experimenting
1. Access to the latest, most innovative techniques
2. Great for prototyping new algorithms

Examples: Text analytics
1. The latest proof that Google can do no wrong | http://t.co/dSUhwVoO (via @Techland)
2. Teen girls look to YouTube for self-image validation | http://t.co/PSfROdi4 (via @TIMEHealthland)
3. Why libraries need us now more than ever #sxsw | http://t.co/OTbutfup (via @Techland)
4. PHOTOS: Amazing Photos of the Sun http://t.co/bmYAtNab via @TIME
5. Why libraries need us now more than ever #sxsw | http://t.co/OTbutfup (via @Techland)
R package: twitteR

Examples: Word cloud of Twitter feeds
R package: wordcloud
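
A word cloud is driven by word frequencies. The sketch below shows one way those counts might be prepared in base R, using four of the tweets quoted in the talk. The cleaning rules and the tiny stop-word list are assumptions for illustration, not the author's pipeline.

```r
tweets <- c(
  "The latest proof that Google can do no wrong | http://t.co/dSUhwVoO (via @Techland)",
  "Teen girls look to YouTube for self-image validation | http://t.co/PSfROdi4 (via @TIMEHealthland)",
  "Why libraries need us now more than ever #sxsw | http://t.co/OTbutfup (via @Techland)",
  "PHOTOS: Amazing Photos of the Sun http://t.co/bmYAtNab via @TIME"
)

# Lower-case, then strip URLs, @mentions and anything that is not a letter.
clean <- tolower(tweets)
clean <- gsub("http[^[:space:]]+", " ", clean)
clean <- gsub("@[[:alnum:]_]+", " ", clean)
clean <- gsub("[^a-z]+", " ", clean)

# Split into words; drop very short words and a small hand-picked stop list.
words <- unlist(strsplit(trimws(clean), " +"))
stop_words <- c("the", "that", "can", "for", "than", "now", "via", "more")
words <- words[nchar(words) > 2 & !(words %in% stop_words)]

# Word frequencies, most frequent first.
word_freq <- sort(table(words), decreasing = TRUE)
head(word_freq, 5)
```

With the wordcloud package installed, `wordcloud(names(word_freq), as.vector(word_freq), min.freq = 1)` should draw the cloud from these counts.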

Examples: Text analytics
What are the common themes being tweeted by Time magazine?

Top words associated with each classification
[Figure: top words per class, panels A, B, C, D]
R package: ggplot2

Classification results
Tweets, with topic shares (columns: Topic 1, Topic 2, Topic 3, Topic 4)
1. The latest proof that Google can do no wrong | http://t.co/dSUhwVoO (via @Techland)
   Shares: 40%, 0%, 60%
2. Teen girls look to YouTube for self-image validation | http://t.co/PSfROdi4 (via @TIMEHealthland)
   Shares: 0%, 100%
3. Why libraries need us now more than ever #sxsw | http://t.co/OTbutfup (via @Techland)
   Shares: 100%, 0%
4. PHOTOS: Amazing Photos of the Sun http://t.co/bmYAtNab via @TIME
   Shares: 0%, 100%, 0%
5. Why libraries need us now more than ever #sxsw | http://t.co/OTbutfup (via @Techland)
   Shares: 100%, 0%
6. Why libraries need us now more than ever #sxsw | http://t.co/OTbutfup (via @Techland)
   Shares: 100%, 0%
7. Astrophysicist @neiltyson responds to @TIME's q: "What is the most astounding fact about the Universe?" http://t.co/91565khw | beautiful vid
   Shares: 0%, 67%, 33%
8. Why libraries need us now more than ever #sxsw | http://t.co/OTbutfup (via @Techland)
   Shares: 100%, 0%
9. Living Alone Is The New Norm http://t.co/25BzVSLN (via @TIME) #teamhermit
   Shares: 0%, 100%
10. PHOTOS: Seven days of strange landscapes | http://t.co/oLtxFcp8
   Shares: 0%, 100%, 0%
11. Subject for Debate: Are Women People? http://t.co/IRVthFc8 via @TIME
   Shares: 0%, 100%
12. PHOTOS: Seven days of strange landscapes | http://t.co/oLtxFcp8
   Shares: 0%, 100%, 0%
13. @Time Israel's bogus case for bombing Gaza obscures political motives | Al Akhbar English http://t.co/Fud0mDNN via @AlakhbarEnglish
   Shares: 0%, 100%, 0%
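
The talk does not say which model produced these shares. One plausible reading is that each tweet's words are assigned to topics and the counts are then normalised per tweet. The sketch below shows only that final normalisation step, using made-up counts; none of the numbers are the author's data.

```r
# Hypothetical counts: for each tweet, how many of its words were assigned
# to each of four topics (made-up numbers, not the author's data).
word_topic_counts <- matrix(
  c(2, 0, 0, 3,
    0, 0, 0, 4,
    5, 0, 0, 0,
    0, 0, 2, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste("Tweet", 1:4), paste("Topic", 1:4))
)

# Row-wise proportions, expressed as whole percentages per tweet.
topic_share <- round(100 * prop.table(word_topic_counts, margin = 1))

# Dominant topic for each tweet (first topic wins on ties).
dominant <- colnames(topic_share)[max.col(topic_share, ties.method = "first")]
```

A share such as 67% / 33% then simply means that two of a tweet's three topic-bearing words fell under one topic and one under another.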

Questions?

Essential R packages for everyday use
Essential: ggplot2, reshape, RODBC, randomForest, rpart
Nice to have: caret, forecast, tm
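
As a small convenience, the snippet below checks which of the listed packages are missing from the local library. It only reports; the install line is left commented out, and the variable names are illustrative.

```r
essential <- c("ggplot2", "reshape", "RODBC", "randomForest", "rpart")
nice_to_have <- c("caret", "forecast", "tm")

# Names of packages already installed in the local libraries.
installed <- rownames(installed.packages())

# Packages from the two lists that are not installed yet.
missing_pkgs <- setdiff(c(essential, nice_to_have), installed)
missing_pkgs

# install.packages(missing_pkgs)   # uncomment to install them from CRAN
```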