Peter Fox
Data Analytics – ITWS-4963/ITWS-6965
Week 1b, January 24, 2014
Relevant software and getting it installed
Admin info (keep/print this slide)
Class: ITWS-4963/ITWS-6965
Hours: 12:00pm-1:50pm Tuesday/Friday
Location: SAGE 3101
Instructor: Peter Fox
Instructor contact: (do not leave a
Contact hours: Monday** 3:00-4:00pm (or by appt)
Contact location: Winslow 2120 (sometimes Lally 207A announced by )
TA: Lakshmi Chenicheri
Web site:
–Schedule, lectures, syllabus, reading, assignments, etc.
Today
Install application software
Get some data and read, explore, etc.
Install data technology and related software
GNU R
RStudio
–See R-intro.html in manuals
–Manuals
–Libraries: at the command line use library(), or select the Packages tab and check/uncheck as needed
Scipy/numpy/iPython (NB)
Windows/Linux –
If you have a Mac
–Anaconda – (preferred) Use Launcher to install Spyder (and iP Qt)
–Do you have macports installed? '$ which port'
–No? (sorry – ask me for details…) Install Xcode (from - you will need to register - academic)
Also see individual packages on the install page.
Matlab
Student version
License works within the RPI network, so you may have to use VPN if outside
R for Matlab users: http://mathesaurus.sourceforge.net/octave-r.html
Files
This is where the files for assignments and exercises will be placed
Exercises – getting data in RStudio
–Read in a csv file (two ways to do this): GPW3_GRUMP_SummaryInformation_2010.csv
–Read in an Excel file (directly or by csv convert): EPI_data.xls (2010EPI_data tab)
–See if you can plot some variables
–Anything in common between them?
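The exercise above is meant for RStudio. For comparison only, here is a minimal Python sketch of reading a csv file two ways with the standard library. The column names and values in SAMPLE are made up for illustration and are not taken from GPW3_GRUMP_SummaryInformation_2010.csv.

```python
import csv
import io

# Hypothetical stand-in for a downloaded csv file; the columns are made up.
SAMPLE = "Country,Population,Area\nA,100,10.5\nB,250,20.0\n"


def read_rows(text):
    """Way 1: csv.reader returns each row as a list of strings."""
    return list(csv.reader(io.StringIO(text)))


def read_records(text):
    """Way 2: csv.DictReader keys each row by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE)
records = read_records(SAMPLE)
print(rows[0])                    # the header row
print(records[1]["Population"])   # values stay strings until converted
```

With a real file, replace io.StringIO(text) with open(path, newline="") and convert numeric columns explicitly before plotting.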
Exercises
Scipy
–In Spyder read in a matlab file:
    import scipy.io as sio
    mat_contents = sio.loadmat('Williams40.mat')
    mat_contents
 Explore – plot, etc.
–Read in a csv file (your choice)
–Write out as a matlab file, i.e. sio.savemat (see File I/O help o.html)
–http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html - start looking
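A minimal sketch of the write-then-read round trip from this exercise, assuming numpy and scipy are installed. The array contents and the file name example.mat are hypothetical; Williams40.mat is not used here, so the sketch runs without the course files.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

# Made-up array standing in for data read from a csv file.
data = np.array([[1.0, 2.0], [3.0, 4.0]])

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "example.mat")
sio.savemat(path, {"data": data})   # dict keys become Matlab variable names

mat_contents = sio.loadmat(path)    # returns a dict of variable name -> array
# Skip the metadata keys that loadmat adds (names starting with "__").
variables = [key for key in mat_contents if not key.startswith("__")]
print(variables)
print(mat_contents["data"].shape)
```

Listing the non-metadata keys first is a quick way to see what a .mat file you did not create actually contains.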
Exercises
Matlab
–Read in two different datasets: sw40_30s.mat or sw29adcp.mat; UChicago30.mat or Williams40.mat
–Explore them…
–Read in the csv files
If time or for fun…
se_eqs.xls
–Plot it
–Fit it
PRESSURE.xls
–Plot it
–Smooth it
–Fit it
…
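One possible way to approach "smooth it" and "fit it" in Python, assuming numpy is installed. The data below is synthetic (a straight line plus noise), not the contents of se_eqs.xls or PRESSURE.xls, and the moving average and straight-line fit are illustrative choices rather than the methods the exercise prescribes.

```python
import numpy as np

# Synthetic data, NOT taken from the course spreadsheets: a line plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 101)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)


def moving_average(values, window):
    """Smooth with a boxcar average; the output is shorter by window - 1."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")


smoothed = moving_average(y, window=5)

# Least-squares straight-line fit; polyfit returns the highest degree first.
slope, intercept = np.polyfit(x, y, deg=1)
print(smoothed.size, slope, intercept)
```

The fitted slope and intercept should land near the 2.0 and 1.0 used to generate the data; a wider window gives a smoother but shorter series.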
Install-fest… continues
http://projects.apache.org/indexes/category.html#database
–Hadoop (MapReduce)
–Pig
–HIVE
 https://cwiki.apache.org/confluence/display/Hive/GettingStarted
 https://cwiki.apache.org/confluence/display/Hive/Tutorial
 https://cwiki.apache.org/confluence/display/Hive/LanguageManual
–Cassandra (binaries from DataStax)
And MongoDB
Objective
Get a good feel for the complexity and maturity of the data and tools environments
See some real data and start to consider what it will take to work with it
Big and complex means time and memory, and laptops can only do so much
We’ll soon look at the intersections like RHadoop: op/wiki
No more reading this week
Complete the installs as best you can
Pick your preferred application and data software and read up on them, try some examples