Big data toolbox
R for statistical computing
Free software for statistical computing, data manipulation and graphics
Comprehensive R Archive Network (CRAN) – binary distributions of the base system and contributed packages
GitHub: open source access to R packages that are not available on CRAN
Python
Free computing software
SciPy is a collection of packages for mathematics, science and engineering
pandas is a data analysis and modeling library
IPython is used for sharing, visualization and parallel computing
Educational resources
Statistical Learning and Data Mining Workshop (Hastie and Tibshirani of Stanford)
Coursera: Python; Machine learning; R
Google Developers R videos on YouTube
Master of Information and Data Science (MIDS), UC Berkeley
Cloud resources
Microsoft Azure
  Storage
  Distributed cloud computing (Hadoop)
  Server R
  Machine learning
Amazon Elastic MapReduce (EMR)
Google Cloud Platform
Dimension reduction
Singular Value Decomposition (SVD)
  Used to represent data efficiently
  Reduces the number of attributes that are used in data analysis/mining
  Removes unnecessary data that are linearly dependent
Principal Components Analysis (PCA)
  Maps correlated variables into uncorrelated variables called principal components (PC)
  First PC accounts for the largest amount of variability in the data
  Second PC accounts for the largest amount of remaining variability, and so on
  These components can then replace the individual variables that create the components for analysis
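The two techniques above can be sketched together in a few lines of NumPy. This is a minimal illustration, not code from the presentation: the function name `pca_via_svd`, the synthetic data and the choice of two components are all assumptions made for the example. It computes PCA by taking the SVD of the column-centered data matrix, and uses a data set whose third column is an exact linear combination of the first two, so the third component carries essentially no variability.

```python
import numpy as np


def pca_via_svd(X, n_components):
    """Hypothetical helper: PCA computed from the SVD of the centered data.

    Returns the component scores and the fraction of total variability
    explained by every component (largest first).
    """
    # Center each column so the SVD describes variability around the mean.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions; s holds the singular values,
    # returned by NumPy in descending order.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Project the centered data onto the leading directions.
    scores = Xc @ Vt[:n_components].T
    return scores, explained


# Synthetic data (an assumption for illustration): the third column is
# linearly dependent on the first two, so it adds no new information.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + b])

scores, explained = pca_via_svd(X, n_components=2)
print("fraction of variability per component:", explained)
print("correlation between the two retained PCs:",
      np.corrcoef(scores.T)[0, 1])
```

In this sketch the first entry of `explained` is the largest, the last is numerically zero because of the dependent column, and the two retained score columns are uncorrelated, so they can stand in for the three original variables in later analysis.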