Technology of Data Analytics
INTRODUCTION
Objective
- Data analytics mindset: shallow and wide, deep when you need it
- Quick overview, useful tidbits, a jumping-off point
Agenda/Topics
- Excel
- VBA
- Access
- SQL
- Tableau
- Hadoop
- Analytical packages: R/SAS/SPSS/Minitab
SQUARE 1
- Business and technology
- Entity
- Attributes
- Schema
- Relational database
- ETL: Extract, Transform, Load
- Data mining
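To make the Square 1 terms concrete, here is a minimal ETL sketch in Python. It assumes a small hypothetical CSV export and uses the standard library's `sqlite3` as a stand-in relational database. Each row of the `sales` table is one entity (a sale), each column is an attribute, and the `CREATE TABLE` statement is the schema. The data, column names, and cleaning rules are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with messy formatting and one incomplete record.
RAW_CSV = """customer,region,amount
 Alice ,east,100.50
Bob,WEST,75
 Carol,East,
"""


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim names, lowercase regions, drop rows with no amount."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue  # skip incomplete records
        clean.append((row["customer"].strip(),
                      row["region"].strip().lower(),
                      float(row["amount"])))
    return clean


def load(rows, conn):
    """Load: write cleaned rows into a table with a fixed schema."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region").fetchall())
# [('east', 100.5), ('west', 75.0)]
```

Keeping the three steps as separate functions mirrors how ETL jobs are usually organized: each stage can be tested or swapped out on its own.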
START WITH EXCEL
- It is the easiest and most widely available platform
- Easy to teach others to maintain
- Collect: data validation, drop-downs
- Store: VLOOKUPs
- Analyze: formulas (IF, AND), pivot tables
- Report/Visualize: charts, conditional formatting, OFFSET
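The Excel features above can be mimicked in a few lines of Python, which helps show what they actually compute. This sketch assumes hypothetical product and order data; `vlookup`, `big_order_flag`, and `pivot_sum` are illustrative helpers, not Excel APIs. `vlookup` models an exact-match `VLOOKUP` (range lookup set to `FALSE`), `big_order_flag` mirrors an `=IF(AND(...))` formula, and `pivot_sum` builds the totals a pivot table with Region as rows, Product as columns, and Sum of Revenue as values would show.

```python
from collections import defaultdict

# Hypothetical lookup range: code, name, unit price (like Excel cells A2:C4).
PRODUCTS = [
    ("P1", "Widget", 2.50),
    ("P2", "Gadget", 10.00),
    ("P3", "Gizmo", 4.00),
]

# Hypothetical order log: region, product code, quantity.
ORDERS = [("East", "P1", 4), ("West", "P2", 1), ("East", "P2", 2), ("West", "P3", 5)]


def vlookup(value, table, col_index):
    """Exact-match VLOOKUP: find `value` in the first column and return
    column `col_index` (1-based) of that row, or "#N/A" if absent."""
    for row in table:
        if row[0] == value:
            return row[col_index - 1]
    return "#N/A"


def big_order_flag(qty, price):
    """Same logic as =IF(AND(B2>=2, C2>=5), "Big", "Small"),
    assuming quantity in B2 and price in C2."""
    return "Big" if qty >= 2 and price >= 5 else "Small"


def pivot_sum(orders):
    """Pivot table: revenue summed by region (rows) and product name (columns)."""
    pivot = defaultdict(lambda: defaultdict(float))
    for region, code, qty in orders:
        name = vlookup(code, PRODUCTS, 2)
        price = vlookup(code, PRODUCTS, 3)
        pivot[region][name] += qty * price
    return {region: dict(cols) for region, cols in pivot.items()}


print(vlookup("P2", PRODUCTS, 2))  # Gadget
print(big_order_flag(2, 10.00))    # Big
print(pivot_sum(ORDERS))
# {'East': {'Widget': 10.0, 'Gadget': 20.0}, 'West': {'Gadget': 10.0, 'Gizmo': 20.0}}
```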
VISUAL BASIC FOR APPLICATIONS (VBA)
- Microsoft's programming language for Office applications
- Object-oriented syntax: object.Method (e.g. Range("A1").Select) and object.Property = value (e.g. Range("A1").Value = "Total")
- Record a macro, then read and tweak the generated code
- Modules and UserForms
- Cell referencing: Cells(row, column).Select
- For loop: For i = startNumber To endNumber ... Next i
- If statement: If condition Then ... End If
- Use it for:
  - Moving data
  - Changing charts
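The For and If patterns above are the core of most data-moving macros. As a hedged illustration in Python rather than VBA, the sketch below models a worksheet as a dictionary keyed by 1-based (row, column) pairs, like `Cells(row, column)`. The sheet contents, the `move_passing` helper, and the 70-point threshold are hypothetical. Comments show the roughly equivalent VBA line.

```python
# Hypothetical worksheet: a dict keyed by 1-based (row, column), like Cells(row, column).
# Column 1 holds names, column 2 holds scores; row 1 is the header.
sheet = {
    (1, 1): "Name", (1, 2): "Score",
    (2, 1): "Ann", (2, 2): 91,
    (3, 1): "Ben", (3, 2): 67,
    (4, 1): "Cy", (4, 2): 78,
}


def move_passing(sheet, first_row, last_row, threshold):
    """Copy names whose score is >= threshold into column 4, starting at row 2."""
    sheet[(1, 4)] = "Passed"                     # VBA: Cells(1, 4).Value = "Passed"
    out_row = 2
    for i in range(first_row, last_row + 1):     # VBA: For i = first_row To last_row
        if sheet.get((i, 2), 0) >= threshold:    # VBA: If Cells(i, 2).Value >= threshold Then
            sheet[(out_row, 4)] = sheet[(i, 1)]  # VBA: Cells(out_row, 4).Value = Cells(i, 1).Value
            out_row += 1                         # VBA: out_row = out_row + 1 / End If / Next i
    return sheet


move_passing(sheet, 2, 4, 70)
print([sheet[(r, 4)] for r in range(1, 4)])  # ['Passed', 'Ann', 'Cy']
```

Note that VBA's `For ... To` includes its end value, which is why the Python loop uses `last_row + 1`.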
GOOGLE DOCS: COLLECTION
- The collection tools are already built; no need to create your own
- Google's engineers have already solved the hard problems for you
- Data lives on the web instead of on a local drive
ACCESS
- An entry point to databases
- Table: like an Excel spreadsheet, but only tightly defined values are allowed
- Query (a saved query works like a view): pulls info from tables using logic; a lasting query used to populate reports
- Form: data input
- Report: generates reports
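The point that a table allows only tightly defined values can be shown with Python's built-in `sqlite3` module, used here as a stand-in for an Access table (Access itself enforces this with field data types and validation rules set up in its designer). The `employees` table and the allowed departments are hypothetical. The `CHECK` constraints reject values a plain spreadsheet column would happily accept, and `try_insert` loosely plays the role of a data-entry form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each column restricts what it will accept, unlike a free-form spreadsheet column.
conn.execute("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department TEXT NOT NULL CHECK (department IN ('Sales', 'IT', 'HR')),
        salary REAL CHECK (salary > 0)
    )
""")
conn.execute("INSERT INTO employees (name, department, salary) VALUES ('Dana', 'IT', 72000)")


def try_insert(name, department, salary):
    """Attempt an insert; return True if accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO employees (name, department, salary) VALUES (?, ?, ?)",
            (name, department, salary),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("Eli", "Marketing", 50000))  # False: department not in the allowed list
print(try_insert("Fay", "HR", -10))           # False: salary must be positive
print(try_insert("Gus", "Sales", 48000))      # True
```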
SQL
- "Big boy" Access: the same ideas as Access without the bumpers and hand-holding
- The real deal in the software world
- Used to maintain and diagnose software back ends
- Table: like an Excel spreadsheet, but only tightly defined values are allowed
- View: pulls info from tables using logic; a lasting query used to populate reports
- Query: viewing data
- Stored procedures: saved SQL routines, used for loading and moving data
- SRS (SQL Server Reporting Services): web-based reports
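A view is simply a saved query that reports can reuse. The sketch below uses SQLite through Python's `sqlite3` module as a lightweight stand-in for a full SQL server; the `orders` table and `shipped_totals` view are hypothetical. SQLite has no stored procedures, so those are not shown. The last query is the kind of quick check used when diagnosing a back end, here counting orders still marked open.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL, status TEXT);
    INSERT INTO orders (customer, amount, status) VALUES
        ('Acme', 120.0, 'shipped'),
        ('Acme', 80.0, 'open'),
        ('Beta', 200.0, 'shipped'),
        ('Beta', 50.0, 'shipped');
    -- View: a saved query that reports can reuse like a table.
    CREATE VIEW shipped_totals AS
        SELECT customer, SUM(amount) AS total
        FROM orders
        WHERE status = 'shipped'
        GROUP BY customer;
""")

# Query through the view exactly as if it were a table.
report = conn.execute("SELECT customer, total FROM shipped_totals ORDER BY customer").fetchall()
print(report)  # [('Acme', 120.0), ('Beta', 250.0)]

# A diagnostic query: how many orders are still open?
open_count = conn.execute("SELECT COUNT(*) FROM orders WHERE status = 'open'").fetchone()[0]
print(open_count)  # 1
```

Because the view stores the query rather than its results, the report reflects whatever is in `orders` at the moment it is read.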
TABLEAU
- Connections
- Worksheets
- Views
- Dashboards
- Stories
HADOOP
- Pools multiple computers/servers into a single cluster that stores and processes data together
- Hadoop Common: libraries and utilities needed by the other Hadoop modules
- Hadoop Distributed File System (HDFS): a distributed file system that stores data on commodity machines, providing very high aggregate bandwidth across the cluster
- Hadoop YARN: a resource-management platform that manages compute resources in the cluster and schedules users' applications on them
- Hadoop MapReduce: a programming model for large-scale data processing
- Get started at:
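MapReduce splits work into a map step that turns each input block into key/value pairs and a reduce step that combines all values sharing a key, with the framework grouping (shuffling) pairs in between. The sketch below runs the classic word count in a single Python process to show the programming model only; it does not use Hadoop's API, and the input blocks are hypothetical. On a real cluster, each map task would ideally run on a node that already holds its block in HDFS.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input, pre-split into blocks the way HDFS splits a large file.
BLOCKS = ["big data big ideas", "data in a cluster", "big cluster"]


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block."""
    return [(word, 1) for word in block.split()]


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


# On a cluster each map call could run on a different machine; here they run in sequence.
mapped = chain.from_iterable(map_phase(block) for block in BLOCKS)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts["big"], counts["data"], counts["cluster"])  # 3 2 2
```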
Analyze: SAS/R/SPSS/Minitab
- SAS: common, including in academia
- R: open source
- SPSS: IBM's statistics package
- Minitab: an "analytical Excel" (spreadsheet-style statistics tool)
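Whichever package you choose, a first pass at a dataset usually means descriptive statistics and perhaps a simple regression. As a stand-in for what SAS, R, SPSS, or Minitab would produce, this sketch uses Python's standard `statistics` module plus a hand-written ordinary least squares fit (`least_squares` is a hypothetical helper, not a package API). The spend and sales figures are made up and chosen to lie exactly on sales = 2 × spend + 1.

```python
import statistics

# Hypothetical data chosen to lie exactly on sales = 2 * spend + 1.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]


def least_squares(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


print(statistics.mean(sales))       # 7.0
print(statistics.stdev(sales))      # sample standard deviation, about 3.162
print(least_squares(spend, sales))  # (2.0, 1.0)
```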
Other
- iTunes U: Data Visualization
- Coursera: Introduction to Data Science
- Codecademy: other programming languages
EDUCATION PROJECTS
- Open Source Education: BDAA Book of Knowledge
- Stats cheat sheet
- Excel guide
- SQL guide
- How-to guides in general