Collecting, Analyzing, and Visualizing Data with Python Part I

Slides:



Advertisements
Similar presentations
Guy Griffiths. General purpose interpreted programming language Widely used by scientists and programmers of all stripes Supported by many 3 rd -party.
Advertisements

Sarah Reonomy OSCON 2014 ANALYZING DATA WITH PYTHON.
Jonathan Huelman CSC 415 – Programming Languages
Microsoft Visual Basic 2012 CHAPTER ONE Introduction to Visual Basic 2012 Programming.
CSC 110 A 1 CSC 110 Introduction to Python [Reading: chapter 1]
OVERVIEW- What is GIS? A geographic information system (GIS) integrates hardware, software, and data for capturing, managing, analyzing, and displaying.
A Powerful Python Library for Data Analysis BY BADRI PRUDHVI BADRI PRUDHVI.
This material is based upon work supported by the National Science Foundation under Grant No. ANT Any opinions, findings, and conclusions or recommendations.
Google Map API The Google Maps API lets you embed Google Maps in your own web pages with JavaScript The API provides a number of utilities for manipulating.
Lecture 11 Introduction to R and Accessing USGS Data from Web Services Jeffery S. Horsburgh Hydroinformatics Fall 2013 This work was funded by National.
Microsoft Visual Basic 2015 CHAPTER ONE Introduction to Visual Basic 2015 Programming.
Modular Abstraction of Complex Real Time Analysis Benjamin P. Campbell Faculty Advisor: Dr Joel Henry, Ph.D. Department of Computer Science University.
Using Python to Retrieve Data from the CUAHSI HIS Web Services Jeffery S. Horsburgh Hydroinformatics Fall 2015 This work was funded by National Science.
1 Support for Scientific Computing People: Bruce Beckles Nick Maclaren Services: Courses Advice.
Extending ArcGIS with Python Clinton Dow – Geoprocessing Product Esri.
Big data toolbox.
Python for data analysis Prakhar Amlathe Utah State University
The language focusses on ease of use
Andrew White, Brian Freitag, Udaysankar Nair, and Arastoo Pour Biazar
PyTimber & CO M. Betz, R. De Maria, M. Fitterer, C. Hernalsteens, T. Levens Install: $ pip install pytimber Sources:
Metis Data Science Meetup:
INTRODUCTION TO PIG, HIVE, HBASE and ZOOKEEPER
Why API?.
Introduction to Visual Basic 2008 Programming
An Introduction to the IVC Software Framework
Leveraging BI in SharePoint with PowerPivot and Power View
IPYTHON AND MATPLOTLIB Python for computational science
Lawrence Livermore National Laboratory
DATA MINING Python.
2) Platform independent 3) Predefined functions
External libraries A very complete list can be found at PyPi the Python Package Index: To install, use pip, which comes with.
Crawling the Web for Job Knowledge
Prepared by Kimberly Sayre and Jinbo Bi
7 Best Programming Languages Based as per Earnings & Opportunities
Top Reasons to Choose Angular. Angular is well known for developing robust and adaptable Single Page Applications (SPA). The Application structure is.
Network Visualization
Torch 02/27/2018 Hyeri Kim Good afternoon, everyone. I’m Hyeri. Today, I’m gonna talk about Torch.
Introduction to MATLAB
Python Visualization Tools: Pandas, Seaborn, ggplot
Introduction to pandas
Homework 4 Responses There was no expectation of completeness assignment – this was meant to give you the opportunities to a) start and manage a github.
Python for Scientific Computing
Chapter 6 System and Application Software
Twitter Movie Sentiment Using Python, SQL Server, Azure SQL DB, Azure ML, & Power BI Bradley Ball
Use of Mathematics using Technology (Maltlab)
Brief Intro to Python for Statistics
Twitter Movie Sentiment Using Python, SQL Server, Azure SQL DB, Azure ML, & Power BI Bradley Ball
Data Science with Python
D2 Indices and matrices Matrix representation in computer systems:
Communication and Coding Theory Lab(CS491)
The Web Wizard’s Guide To JavaScript
Lecture 6: Data Quality and Pandas
More data. More information. Conclusion and Future Work
Database Connectivity and Web Development
BUILDING A DIGITAL REPOSITORY FOR LEARNING RESOURCES
THE REAL WORLD APPLICATIONS OF PYTHON. INTRODUCTION Python is an interpreted, high-level, general-purpose programming language. Created by Guido van Rossum.
商業智慧實務 Practices of Business Intelligence
Recitation #1 Tel Aviv University 2017/2018 Slava Novgorodov
Chapter 6 System and Application Software
Recitation #1 Tel Aviv University 2016/2017 Slava Novgorodov
Chapter 6 System and Application Software
Python for Data Analysis
Analyzing Massive Graphs - ParT I
Chapter 6 System and Application Software
Working with GEOLocation Data
Collecting, Analyzing, and Visualizing Data with Python Part II
Web Application Development Using PHP
Igor Stančin, Alan Jović to: {igor.stancin,
Beyond orchestration with Azure Data Factory
An Introduction to Data Science using Python
Presentation transcript:

Collecting, Analyzing, and Visualizing Data with Python Part I Dr. Michael Fire

There several ways to collect data: Collecting Data There several ways to collect data: Using existing datasets Create/Simulate your own dataset Using Web scraping Using API

Web Scraping We can collect data using web scraping using one of the following methods: Using simple tools like wget Using Selenium for dynamic loaded pages Using web scraping frameworks like Scrapy Writing your own code

Using Application Programming Interfaces We can use various websites’ Application Programming Interfaces (APIs) to collect data from various platforms, such as: Twitter Reddit Google Maps Kaggle Github

Recommended Read Python Data Science Handbook, Chapter 1 IPython: Beyond Normal Python by Jake VanderPlas The Unix Shell by Software Carpentry Foundation Practical Introduction to Web Scraping in Python by Colin OKeefe

Image source: https://pixabay ManIPULATING DATA

Numerical Python (NumPY)

Source: Python Data Science Handbook, Chapter 1 IPython: Beyond Normal Python by Jake VanderPlas

NumPy - The Basics Supports large multi-dimensional arrays and matrices Contains large collection of high-level mathematical functions to operate on these arrays Tools for reading / writing array data to disk Useful Reading: Chapter 4. NumPy Basics: Arrays and Vectorized Computation, Python for Data Analysis by Wes McKinney Chapter 2. Introduction to Numpy, Python Data Science Handbook, by Jake VanderPlas

Working With PANDAS & DataFrames

Pandas Pros: Provides flexible and expressive data structures Easy to handle missing data Columns can easily be added and deleted Cons: Good for several gigabytes of data Mostly single threaded Complex Group By operations

“My rule of thumb for pandas is that you should have 5 to 10 times as much RAM as the size of your dataset” Wes McKinney, 2017

Pandas Objects NumPy Series DataFrame Values Column

Let’s move to the Notebook