R For The SQL Developer Kevin Feasel Manager, Predictive Analytics ChannelAdvisor
Who Am I? What Am I Doing Here? Curated SQL https://curatedsql.com Tribal SQL http://tribalsql.com @feaselkl
What Is R? R is a language focused on statistical analysis, predictive modeling, and data cleansing. R is an offshoot of the S language and is built on top of C.
What Is R? There are two major branches of R of interest to us: base R and Microsoft R. "Base" R is maintained by the R Core Team, with backing from the R Foundation and the R Consortium, and is entirely open-source. Microsoft takes base R and adds its own libraries and support.
Why Use R? R provides several advantages as a data analysis Domain Specific Language (DSL): R has a large number of built-in functions for aggregation, statistical analysis, and graphing and plotting. The R ecosystem is vast. With CRAN (think CPAN or NuGet for R), thousands of open-source packages are available to help you solve common data cleansing, data analysis, and plotting problems. R language constructs make set-based analysis and operations easy, which improves performance and shortens the transition period for SQL Server developers. R helps you go well beyond simple Excel analysis and pivot tables.
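To make the set-based point concrete, here is a minimal sketch in base R. The data frame and its column names (`orders`, `region`, `amount`) and the 7% tax rate are invented for illustration and do not come from the talk's demos.

```r
# Hypothetical sample data; names and values are made up for illustration.
orders <- data.frame(
  region = c("East", "West", "East", "South", "West"),
  amount = c(120, 80, 200, 50, 150),
  stringsAsFactors = FALSE
)

# Vectorized arithmetic: operates on the whole column at once, no loop needed.
orders$amount_with_tax <- orders$amount * 1.07

# Logical indexing plays the role of a WHERE clause.
large_orders <- orders[orders$amount > 100, ]

# aggregate() plays the role of GROUP BY: total amount per region.
totals <- aggregate(amount ~ region, data = orders, FUN = sum)

# Built-in statistical functions apply directly to a column.
avg_amount <- mean(orders$amount)
```

Each statement works on an entire column or data frame rather than row by row, which is roughly how a SQL developer already thinks about tables.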
Notebooks R integrates well with the idea of notebooks. Notebooks are a way of mixing Markdown-enabled text and language snippets to make your thoughts clear to others. You can create and share notebooks, allowing others to easily test your process and follow along. Notebooks are also an excellent teaching mechanism. Today's talk will look at Jupyter Notebooks. Jupyter (whose name derives from a combination of the languages Julia, Python, and R) is a great framework because it has support for dozens of languages. Microsoft uses Jupyter Notebooks for its Azure Machine Learning product.
Motivation My goals in this talk: Introduce you to the R ecosystem, including programs, libraries, and places to learn more. Introduce you to the R language and show how to connect to SQL Server, as well as a few things you can do with R. Introduce you to notebooks and show how they can serve pedagogical or scientific purposes. Get you thinking about ways you could use R in your environment today. Note that R is not the only data analysis language you could learn. Julia and Python are also great languages, and there are very good closed-source, commercial tools like SAS.
Motivation Call logging plot:
Motivation CPU usage plot:
Motivation Columnstore index updates in SQL Server 2016:
Introducing R Installing R and Tools Learning the Basics Connecting to SQL Server Getting a Taste of R
Getting the Right Version of R There are two core versions of R: open-source base R and Microsoft R (née Revolution R). Selected features:
Version              Parallelism           Data Size                          Deployment
Base R               Parallel library      Memory                             Shiny
Microsoft R Open     MKL, without ScaleR
Microsoft R Client   ScaleR, 2 threads     Memory; can connect to R Server
Microsoft R Server   Full parallelism      Memory or disk                     DeployR/Shiny
Choosing Your IDE There is one big IDE available: RStudio. RStudio is a standalone installation and provides a nice development interface for R. Microsoft has also made available R Tools for Visual Studio (RTVS), a Visual Studio plug-in. It offers some interesting features like making SQL Server R Services integration easier, and it integrates with other Visual Studio projects.
Jupyter We will also install Jupyter Notebooks and use them during this talk. Installing Jupyter takes a few steps, but the links for this talk include a step-by-step walkthrough. The easiest way to install Jupyter is to use Anaconda, a data science suite for Python. Jupyter also comes with Visual Studio 2017 if you install the Data Science tools.
Learning About Notebooks Instead of spending a lot of time talking theory, let's investigate R using notebooks. Notebooks allow us to combine code and explanatory text (using Markdown to help with formatting). The most important thing about notebooks is that they are repeatable, meaning that I should be able to hand you a notebook and have you run it all the way through, getting the same results I do. Notebooks help scientists defend their hypotheses and allow others to replicate their experiments.
Demo Time
Connecting to SQL Server Connecting to a SQL Server database (or any other relational database) is easy with R. The first step is to install the RODBC package to give your R code ODBC support. From there, you can connect to a system data source that you've defined in your ODBC Data Sources. You could also pass in a connection string if you don't want to set up a DSN.
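The connection-string route might look something like the sketch below. The helper names (`build_connection_string`, `run_query`), the server `localhost`, the database `AdventureWorks`, the DSN `MyDsn`, and the `SQL Server` driver name are all assumptions for illustration; substitute whatever exists in your environment. The RODBC calls used are `odbcDriverConnect`, `odbcConnect`, `sqlQuery`, and `odbcClose`.

```r
# Build an ODBC connection string for SQL Server using Windows authentication.
# The driver name is an assumption; use whichever driver is installed locally.
build_connection_string <- function(server, database, driver = "SQL Server") {
  sprintf("Driver={%s};Server=%s;Database=%s;Trusted_Connection=yes;",
          driver, server, database)
}

# Run a query and return the result as a data frame.
# Requires the RODBC package: install.packages("RODBC")
run_query <- function(connection_string, sql) {
  conn <- RODBC::odbcDriverConnect(connection_string)
  # odbcDriverConnect returns -1 (not an RODBC handle) when it fails.
  if (!inherits(conn, "RODBC")) stop("Could not connect to SQL Server")
  on.exit(RODBC::odbcClose(conn))  # always release the connection
  RODBC::sqlQuery(conn, sql, stringsAsFactors = FALSE)
}

# Hypothetical server and database names.
conn_str <- build_connection_string("localhost", "AdventureWorks")

# Uncomment once a SQL Server instance is reachable:
# tables <- run_query(conn_str, "SELECT TOP (10) name FROM sys.tables;")

# Alternatively, connect through a system DSN (hypothetical name "MyDsn"):
# conn <- RODBC::odbcConnect("MyDsn")
```

Pushing filters and aggregations into the SQL text, rather than pulling whole tables into R, keeps the result set small enough to fit comfortably in memory.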
Demo Time
Getting a Taste of R No single talk will expose the full gamut of what you can do with R, but this next section will try to hit a few of the highlights. If this feels a bit overwhelming, don't fret: you can grab the notebook and try it out yourself. This notebook will cover the analysis of restaurant data for Wake County, North Carolina over a multi-year period.
Demo Time
Wrapping Up R is a powerful language for performing analysis. We've seen just a few of the many valuable uses of R. To learn more, go here: https://CSmore.info/on/r And for help, contact me: feasel@catallaxyservices.com | @feaselkl