Published by Martin Bell. Modified over 6 years ago.
1
Pandas: Python for data analysis
Prakhar Amlathe, Utah State University
2
Introduction to Pandas
Created by Wes McKinney in 2008, now maintained by Jeff Reback and many others. A powerful and productive Python library for data analysis and data management. The name derives from "panel data". It is an open source product.
3
Overview
A Python library that provides data analysis features similar to R, MATLAB, and SAS. Rich data structures and functions make working with structured data fast, easy, and expressive. It is built on top of NumPy, which supplies its fast array operations. Key components provided by Pandas: two data structures new to Python, Series and DataFrame.
4
Installation Steps
Anaconda is a distribution of Python containing Python, the conda package and environment manager, and many software packages for data analytics, data science, and scientific computing. Anaconda is free to use and redistribute under the terms of the Anaconda End User License Agreement.
5
What problem does pandas solve?
Python has long been great for data munging and preparation, but less so for data analysis and modeling. pandas helps fill this gap, enabling you to carry out your entire data analysis workflow in Python without having to switch to a more domain specific language like R. Combined with the excellent IPython toolkit and other libraries, the environment for doing data analysis in Python excels in performance, productivity, and the ability to collaborate.
6
The ideal tool for Data Scientists
Munging data, cleaning data, analyzing data, modelling data, and presenting organized results as plots or tabular displays.
7
Speed Comparison of Python Vs NumPy
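One way to run such a comparison yourself, as a minimal sketch: time the same sum-of-squares computation over a plain Python list and over a NumPy array. The array size, the repeat count, and the function names are assumptions chosen for illustration, and the timings depend on the machine, so no expected numbers are given.

```python
import timeit

import numpy as np

n = 100_000  # illustrative size, not taken from the slides
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)


def python_sum_of_squares():
    # Pure Python: the interpreter loops over every element.
    return sum(x * x for x in py_list)


def numpy_sum_of_squares():
    # NumPy: the multiplication and the sum run in compiled, vectorized code.
    return int(np.sum(np_array * np_array))


python_time = timeit.timeit(python_sum_of_squares, number=5)
numpy_time = timeit.timeit(numpy_sum_of_squares, number=5)
print(f"pure Python: {python_time:.4f} s, NumPy: {numpy_time:.4f} s")
```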
8
Series: Pandas Data Structure
A one-dimensional array-like object, similar to an array or list. It contains an array of data (of any NumPy data type) with an associated index of labels. By default, a Series is indexed from 0 to N, where N = size - 1.
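A minimal sketch of the default and explicit indexing of a Series. The values and labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# A Series built from a plain list gets the default integer index 0..size-1.
prices = pd.Series([10.5, 20.0, 7.25])
default_index = list(prices.index)

# An explicit index attaches a label to each value.
labeled = pd.Series([10.5, 20.0, 7.25], index=["apple", "pear", "plum"])
pear_price = labeled["pear"]  # label-based lookup

# The underlying data carries a NumPy dtype.
is_float = labeled.dtype == np.float64

print(default_index, pear_price, is_float)
```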
9
DataFrame: Pandas Data Structure
A DataFrame is a tabular data structure composed of rows and columns, akin to a spreadsheet or database table. It can be treated as a collection of Series objects sharing a common index.
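A short sketch of the "columns are Series sharing one index" view. The column names, row labels, and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: each dict key becomes a column, and all columns share one row index.
df = pd.DataFrame(
    {"name": ["Ann", "Bob", "Cy"], "score": [90, 75, 82]},
    index=["a", "b", "c"],
)

# Selecting one column returns a Series that carries the DataFrame's index.
scores = df["score"]
same_index = scores.index.equals(df.index)

# Selecting one row by label also returns a Series, indexed by the column names.
row_b = df.loc["b"]

print(df)
print(row_b["name"], same_index)
```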
10
Operations that can be performed on data structures
Filtering; summarizing; group by (split, apply, combine); merge, join, and aggregate; time series / date functionality; plotting with Matplotlib; and many more…
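A few of these operations can be sketched on a toy table. The `sales` and `prices` tables and their column names are invented for this example.

```python
import pandas as pd

# Invented example data.
sales = pd.DataFrame({
    "region": ["east", "west", "east", "west"],
    "product": ["A", "A", "B", "B"],
    "units": [10, 4, 7, 12],
})

# Filtering: keep rows that satisfy a boolean condition.
large = sales[sales["units"] > 5]

# Summarizing: a single statistic for one column.
mean_units = sales["units"].mean()

# Group by (split, apply, combine): total units per region.
per_region = sales.groupby("region")["units"].sum()

# Merge: join two tables on a shared key column, like a SQL join.
prices = pd.DataFrame({"product": ["A", "B"], "price": [2.0, 3.0]})
merged = sales.merge(prices, on="product", how="left")
merged["revenue"] = merged["units"] * merged["price"]

print(per_region)
print(merged)
```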
11
Links to important websites
python-data-manipulation/
12
Any questions?
13
Thanks!