Statistical computing tools: What are the hot skills out there?


Statistical computing tools: What are the hot skills out there? Chong ho Alex Yu 2019

What is the most crucial technology? https://www.insideprivacy.com/data-privacy/president-trump-signs-executive-order-on-artificial-intelligence/

US might ban exporting AI to China https://www.pymnts.com/news/regulation/2019/artificial-intelligence-china-digital-technology/

These are the buzzwords
Machine learning (artificial intelligence)
Deep learning
Big data analytics
Data mining
Data science
Data visualization
Business intelligence
All are new and emerging technologies. We will cover them in detail in STAT 553. In this course (STAT 551) we focus on data visualization.

What is the best job in 2019? How can I get there? What skill set do I need?

General-purpose vs. specialized
General-purpose beyond data analysis/analytics (DAA): e.g. Python, for developing desktop GUI applications, websites, and web applications, as well as for data analysis.
General-purpose in DAA: SAS, SPSS, Stata
Semi-general-purpose in data analysis: Excel
Specialized in specific DAA, e.g.
Tableau: Data visualization
Mplus, EQS: Structural equation modeling
Bilog, Winsteps, RUMM: Item response theory and Rasch modeling

Open source vs. commercial
Example
  Open source: R, Python
  Commercial: SAS, SPSS, Stata, Tableau
Cost
  Open source: Free, but you might need to hire programmers to create and maintain the system.
  Commercial: $$$$$$$
Architecture
  Open source: Open and can be modified.
  Commercial: Proprietary.
User-friendliness
  Open source: Usually the user interface is not tested; R and Python require coding.
  Commercial: Usability is tested by experts; most include a graphical user interface.
Security
  Open source: Less secure because it is not developed in a controlled environment; no peer review.
  Commercial: More secure because it is developed in a controlled environment and is validated before release.
Service & Support
  Open source: By the open source community, but no one is ultimately responsible.
  Commercial: By the tech support; guaranteed response.
Compatibility
  Open source: Components may not be fully compatible with each other when there are many cooks in the kitchen.
  Commercial: Like Apple, all components that are made by the same vendor can work seamlessly. And many types of hardware require specific drivers.
Longevity
  Open source: Overall, open source packages tend to stick around, but some are outdated, too (e.g. Tophat).
  Commercial: Some companies ceased to exist (e.g. BMDP) and some products were discontinued by the company.

Myth 1
Open source software is becoming more and more popular. Eventually it will replace commercial software packages.
Fact: It is true that usage of R has surpassed that of SAS (Statistical Analysis System) in small and medium companies, but big corporations still count on the tech support of commercial systems. If you were the chief information officer (CIO) of a big corporation, would you hand over your crucial data to open source? SAS is still used by most Fortune 500 companies and highly regulated industries (e.g. banking, healthcare/pharma).

Analogy
Wikipedia is free. It is contributed to and supported by the open community. But your professors probably do not allow you to cite Wikipedia as a reference in academic papers. You still need references from commercial publishers. If you dare to submit a wiki-based paper to me…

Myth 2
Open source is completely free.
Fact: You need to hire programmers to develop applications. Software modules built on open source cost $$$$.

Myth 3
SAS is a programming language. It is very difficult to learn and use.
Fact: SAS consists of a suite of products. Many SAS packages have a graphical user interface (GUI).
Base SAS: Traditional programming environment
Enterprise Guide
SAS Studio
Enterprise Miner
Visual Statistics/Visual Analytics
SAS Viya

SAS programming environment

SAS Enterprise Guide
If you don’t like programming…
Drag-n-drop, point-n-click
Flow chart interface: a diagram of the sequence of actions in a complex system (e.g. a computer program).
Auto-documentation
You can go back easily.

SAS Enterprise Guide If you want to see the SAS syntax and the result, double-click the icon.

SAS Studio
Uses a Web browser
Cross-platform (Windows and Mac)
A good learning tool
Drag-n-drop, point-n-click on the left
Studio generates the syntax on the right (on the fly).

SAS Enterprise Miner
Built for data mining and predictive modeling
Uses a flow-chart interface
Each step is depicted by a node (icon).

SAS Visual Statistics/Visual Analytics
Multi-panel
Dynamic graph: All panels are inter-linked. Changing one updates the others.
Interactive exploration by asking what-if questions.

SAS Viya (via: from here to there)
Coexists with R, Python, etc.
In-memory analytics platform for cloud computing
In-memory analysis: Traditionally, data analysis is done with data on a hard drive. When you have extremely big data, transferring the data from one server to another is time-consuming. In-memory analysis is done in the server's random access memory (RAM).
Cloud computing: In the past we stored data on a local hard drive. When the data analyst is on vacation, oops! In cloud computing, storage, analytics, and more are done over the Internet.
https://www.zdnet.com/article/sas-is-on-the-brink-of-generation-change/
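The in-memory idea can be illustrated with a toy Python sketch. This is not how SAS Viya is implemented; the "file" and its numbers are made up for illustration, and io.StringIO merely stands in for a data file on disk. The point is that the data are read once, kept in RAM, and then reused by every later analysis.

```python
import csv
import io
from statistics import mean, median

# Stand-in for a data file on disk (made-up numbers, illustration only).
disk_file = io.StringIO("score\n70\n85\n90\n65\n")

# Read the data once and keep it in RAM as a plain list of numbers.
in_memory = [float(row["score"]) for row in csv.DictReader(disk_file)]

# Every later analysis reuses the in-memory copy; nothing is re-read.
summary = {
    "mean": mean(in_memory),
    "median": median(in_memory),
    "max": max(in_memory),
}
print(summary)
```

With truly big data the same principle applies on a server with a large amount of RAM, which is where the time saving comes from.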

JMP
A product of SAS; fully compatible with SAS
Different versions:
JMP: Offers basic predictive modeling tools
JMP Pro: Includes advanced predictive analysis

JMP includes traditional statistical procedures and tools for exploratory data analysis (EDA), data visualization, data mining, and predictive modeling.

SAS Certification Exams
SAS offers 23 credentials across seven categories:
Foundation Tools, e.g. SAS Certified Specialist: Base Programming Using SAS 9.4
Advanced Analytics, e.g. SAS Certified Data Scientist Using SAS 9
Business Intelligence and Analytics, e.g. SAS Certified BI Content Developer for SAS 9.4
Data Management, e.g. SAS Certified Big Data Professional Using SAS 9
Administration, e.g. SAS Certified Platform Administrator for SAS 9
JMP, e.g. JMP Certified Specialist: JMP Scripting Using JMP 14
Partners, e.g. SAS Certified Deployment Specialist for Visual Analytics 7.3
https://www.sas.com/en_us/certification.html
https://www.businessnewsdaily.com/10716-sas-certification-guide.html

SPSS Statistics
SPSS Statistics Base
SPSS Statistics Standard: Includes traditional statistical procedures, such as t-test, ANOVA, correlation, regression, Chi-square, etc. APU has the standard version.
SPSS Statistics Premium: Advanced Statistics, Custom Tables, Data Preparation, Missing Values, Forecasting, Decision Trees, Direct Marketing, Complex Samples, Conjoint, Neural Networks, Bootstrapping, Categories, Exact Tests, Visualization Designer, SamplePower, and AMOS for Structural Equation Modeling. The Mac version does not have Visualization Designer, SamplePower, and AMOS.

Price difference between SPSS Standard and Premium: $99.99 – $86.99 = $13

You need to go beyond SPSS Statistics! You need modeling! Not this kind of modeling.

IBM SPSS Modeler For data mining and predictive analysis Use a flow-chart interface.

Tableau
Fairly new, founded in 2003.
Powerful software for data visualization
Includes advanced dynamic graphing tools, such as Geographical Information System (GIS), time-series, and dashboard.

R
An interpreted programming language: It runs each instruction directly without first compiling the program into an executable program, e.g. type “mean(2, 3, 4, 5)” at the prompt and it returns “2” right away. (Only the first argument is treated as data here; to average all four numbers, type “mean(c(2, 3, 4, 5))”, which returns 3.5.)
PowerPoint, Photoshop, etc. are compiled (e.g. POWERPNT.EXE, Photoshop.exe). You cannot see the original source code.
RStudio offers a nicer interface, but still no drag and drop.

Graphical User Interface for R Some developers created GUI-based statistical programs e.g. JASP

GUI for R
R has a very steep learning curve. Good news: Behind the scenes, many statistical computations in JASP are performed by calling R packages.

Python
Also an interpreted language. Besides data analytics, it is a train for all tracks; it is used in almost everything: webpages, multimedia, databases, networking, automation, and image processing.
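As a small illustration of what "interpreted" means, the lines below can be typed one at a time at a Python prompt, and each one runs immediately with no separate compile step. This is only a sketch using Python's standard statistics module, with the same four numbers as the R example.

```python
# Each statement is executed as soon as it is entered; no .exe is built first.
from statistics import mean

values = [2, 3, 4, 5]     # the data go into one list
average = mean(values)    # arithmetic mean of the four numbers
print(average)            # 3.5
```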

Be careful of some information from the Internet
When you compare tools, don’t assume that everything on the Web is accurate. According to Intellipaat, “SAS is not great at graphical capabilities. Though Base SAS has some graphical capabilities improvisation, these capabilities are not widely known, and so R gets a clear lead in this aspect.” Source: https://intellipaat.com/blog/sas-versus-r/

Be careful of information from the Internet And JMP is very advanced in data visualization!

21 most valuable job skills in 2016
A study by MONEY Magazine and PayScale.com in 2016
Top skills related to data analysis and their average pay boost:
1. SAS (Statistical Analysis System): +6.1%
2. Data Mining/Data Warehousing: +5.1%
4. Data Modeling: +5%
Source: http://time.com/money/4328180/most-valuable-career-skills/

2017-18 most popular analytical tools Source: https://analyticstraining.com/10-most-popular-analytic-tools-in-business/

Most used data science tools for 2019 Source: https://data-flair.training/blogs/data-science-tools/

Market share of business intelligence (BI)
Source: https://www.appsruntheworld.com/top-10-analytics-and-bi-software-vendors-and-market-forecast/
What is BI? Technological applications for the collection, integration, analysis, and presentation of business information.
Common tools: data mining and data visualization

2018 Burtch Works survey
Which do you prefer to use – The Trinity: SAS, R, or Python? A tie!
https://www.burtchworks.com/2018/07/16/2018-sas-r-or-python-survey-results-which-do-data-scientists-analytics-pros-prefer/

2017 Burtch Works Survey
One year earlier (2017), R had the upper hand (40%). SAS's share remained constant (34%), but more users shifted from R to Python in 2018.
https://www.burtchworks.com/2017/06/19/2017-sas-r-python-flash-survey-results/

2018 Burtch Works survey
In marketing, finance, and healthcare/pharma, SAS is the winner. In telecom and consulting, Python is the winner. In retail and all others, R is the winner.

2018 Burtch Works Survey People who have more experience (16+ years) prefer SAS. People who have less experience (5 years or less) prefer Python.

2018 Burtch Works Survey People who hold bachelor’s or master’s degrees tend to use SAS. People who hold doctoral degrees tend to use Python.

Top statistical software (n.d.)
Scored and ranked by Pat Research. The list is more about classical statistics.
Source: https://www.predictiveanalyticstoday.com/top-statistical-software/
1. IBM SPSS Modeler
2. Minitab
3. TIBCO Spotfire
4. Statistica
5. Analyse-it
6. AcaStat
7. Stata
8. SAS Visual Statistics
9. Forecast Pro
10. Regression Analysis of Time Series

Glassdoor in summer 2019

Glassdoor in summer 2019
Python is a general-purpose tool. Without the keyword “data science” or “data analytics”, the search will return a much larger number of jobs. The letter “R” is too vague. You need to use “R language” or “R data science”; otherwise the website will return many jobs irrelevant to R programming.
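The keyword problem can be seen in a small Python sketch. The job titles below are invented for illustration, and the search function is a hypothetical stand-in; real job sites use their own search engines, so this shows only why a bare “R” matches too much.

```python
import re

# Made-up job titles, for illustration only.
titles = [
    "R&D Engineer",
    "Data Analyst (R language, SQL)",
    "HR Manager, Region R",
    "Data Scientist: R data science and Python",
]

def search(pattern, items):
    """Return the items whose text matches the pattern, ignoring case."""
    return [t for t in items if re.search(pattern, t, flags=re.IGNORECASE)]

# A bare letter "R" matches every title above, relevant or not.
vague = search(r"\bR\b", titles)

# A more specific phrase keeps only the R programming jobs.
specific = search(r"\bR (language|data science)\b", titles)

print(len(vague), len(specific))
```

In this toy list the vague search returns all four titles, while the specific search returns only the two that actually involve R programming.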

Indeed.com in summer 2019

Pay Scale of SPSS in Summer 2019

Pay Scale of SAS in Summer 2019

Pay Scale of Tableau in Summer 2019

Pay Scale of R in Summer 2019

Pay Scale of Python in Summer 2019
Python is a general-purpose tool. Please look at “Data scientists” only.

Conclusion
You don’t have to choose either this or that. To build a skill set that meets what the job market needs, it is better to learn all of them. Commercial software packages (e.g. SAS and SPSS) can work side by side with open source (e.g. Python and R). For example, you can run a SAS program inside the Python environment. You can also run a Python program inside IBM SPSS Modeler.