Python Visualization Tools: Pandas, Seaborn, ggplot 2015-01-29 郝蕊
Pandas
  The fundamental high-level building block for practical, real-world data analysis in Python
  Reads data from CSV, Excel, HDF, SQL, JSON, HTML, and Stata
  Has a basic plot function; customizing plots may require learning matplotlib
  Can be combined with other visualization libraries
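A minimal sketch of the "load data, then call the basic plot function" workflow. The CSV content and the column names (`day`, `sales`, `returns`) are made up for illustration, and the `Agg` backend is chosen only so the sketch runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so this runs without a display
import pandas as pd

# Hypothetical data; in practice this would be a file path passed to read_csv.
csv_text = io.StringIO("day,sales,returns\n1,10,1\n2,12,0\n3,9,2\n4,15,1\n")
df = pd.read_csv(csv_text, index_col="day")

# DataFrame.plot draws one line per column and returns a matplotlib Axes.
ax = df.plot(title="Sales and returns per day")

# Further customization goes through the matplotlib Axes API.
ax.set_ylabel("count")
```

Because `plot` returns a matplotlib Axes, anything beyond the defaults (labels, ticks, annotations) is done with matplotlib calls, which is why some matplotlib knowledge helps.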
Pandas - Data Structures
  Series
    One-dimensional labeled array
    s = Series(data, index=index)
    data can be a Python dict, an ndarray, or a scalar value
    ndarray-like, dict-like
    Vectorized operations

    Series(randn(5), index=['a', 'b', 'c', 'd', 'e'])
    a   -2.783
    b    0.426
    c   -0.650
    d    1.146
    e   -0.663

    d = {'a' : 0., 'b' : 1., 'c' : 2.}
    Series(d, index=['b', 'c', 'd', 'a'])
    b      1
    c      2
    d    NaN
    a      0
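The Series behaviors listed above (dict-like construction, NaN for missing labels, vectorized operations) can be sketched as follows. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

d = {"a": 0.0, "b": 1.0, "c": 2.0}

# Building from a dict with an explicit index: labels missing from the
# dict ('d' here) are filled with NaN.
s = pd.Series(d, index=["b", "c", "d", "a"])

# Vectorized operation: applied element-wise, no explicit loop.
doubled = s * 2

# Arithmetic between Series aligns on labels, not on position.
# 'b' is absent from the right-hand operand, so the result there is NaN.
aligned = s + s.iloc[1:]

# A scalar value is repeated for every index label.
from_scalar = pd.Series(5.0, index=["a", "b", "c"])
```

Label alignment is the main difference from a plain ndarray: two Series with different indexes can still be added, and non-matching labels yield NaN.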
Pandas – Data Structures
  DataFrame
    2-dimensional, labeled
    columns, index
    df = DataFrame(data, index=index)
    data can be a dict of Series or dicts, a dict of ndarrays / lists, a list of dicts, …

    d = {'one' : Series([1., 2.], index=['a', 'b']),
         'two' : Series([1., 2., 3.], index=['a', 'b', 'c'])}
    DataFrame(d, index=['c', 'a'], columns=['two', 'three'])
       two  three
    c    3    NaN
    a    1    NaN

    d = {'one' : [1., 2., 3., 4.],
         'two' : [4., 3., 2., 1.]}
    DataFrame(d, index=['a', 'b', 'c', 'd'])
       one  two
    a    1    4
    b    2    3
    c    3    2
    d    4    1
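A short sketch of the DataFrame constructions above, written with explicit imports so it runs as-is. It shows how the index of the result is the union of the Series indexes, and how `index` and `columns` select and reorder.

```python
import pandas as pd

d = {
    "one": pd.Series([1.0, 2.0], index=["a", "b"]),
    "two": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
}

# Dict of Series: the row index is the union of the Series indexes,
# so 'one' has no value for row 'c' and gets NaN there.
df = pd.DataFrame(d)

# index= and columns= select and reorder; a column that does not exist
# in the data ('three') is created and filled with NaN.
sub = pd.DataFrame(d, index=["c", "a"], columns=["two", "three"])

# Dict of equal-length lists: each key becomes a column.
df2 = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0, 4.0], "two": [4.0, 3.0, 2.0, 1.0]},
    index=["a", "b", "c", "d"],
)
```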
Pandas – Data Structures
  Panel
    3-dimensional data
    wp = Panel(data, items, major_axis, minor_axis)
    data can be a 3D ndarray or a dict of DataFrames

    wp = Panel(randn(2, 5, 4), items=['Item1', 'Item2'],
               major_axis=date_range('1/1/2000', periods=5),
               minor_axis=['A', 'B', 'C', 'D'])

                       A         B         C         D
    2000-01-01  1.026683  1.078142  1.052085 -0.887711
    2000-01-02 -0.767984  1.050011  1.081298 -0.179630
    2000-01-03 -1.287704 -0.886675 -0.391356 -0.256049
    2000-01-04  0.905988 -0.894942 -0.093016  1.720936
    2000-01-05 -1.362452  0.888813  0.065038 -2.012759
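Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so the call above no longer runs on current versions. The sketch below is not the Panel API; it shows one common substitute, a dict of DataFrames concatenated into a single DataFrame with a MultiIndex. The level names `item` and `date` are illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.date_range("2000-01-01", periods=5)
columns = ["A", "B", "C", "D"]

# One 5x4 DataFrame per item, mirroring the (2, 5, 4) shape of the Panel example.
frames = {
    item: pd.DataFrame(rng.standard_normal((5, 4)), index=dates, columns=columns)
    for item in ["Item1", "Item2"]
}

# Concatenating a dict uses its keys as the outer index level.
stacked = pd.concat(frames, names=["item", "date"])

# Selecting on the outer level returns the 2-D slice for one item,
# comparable to wp['Item1'] on a Panel.
item1 = stacked.loc["Item1"]
```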
Seaborn
  Python visualization library based on matplotlib
  Makes more complicated plots simpler to create; adds little for simple charts
  Built-in styles for quickly changing the color theme
  Supports NumPy and pandas data structures
  Supports statistical routines from SciPy and statsmodels
Seaborn – Plot Gallery
Seaborn – Plot Types
  Linear model plots
    quantitative data
    categorical data
    regression: simple or multiple
    faceted linear model
    nonlinear, logistic regression
    outliers
    marginal distributions
    examining model residuals
    pairwise relationships
  Residuals: the differences between the observed values and the values fitted by the model
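As a rough illustration of two items from this slide, a simple regression and its residuals, the sketch below uses `seaborn.regplot` and `seaborn.residplot` on synthetic data. The data, the column names `x` and `y`, and the headless `Agg` backend are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so this runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: a noisy straight line, y = 2x + 1 + noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
df = pd.DataFrame({"x": x, "y": 2.0 * x + 1.0 + rng.normal(0, 1.5, 100)})

fig, (ax_fit, ax_resid) = plt.subplots(1, 2, figsize=(10, 4))

# Scatter plot with a fitted regression line and a confidence band.
sns.regplot(x="x", y="y", data=df, ax=ax_fit)
ax_fit.set_title("Linear fit")

# Residuals of the same linear fit; visible structure here would
# suggest that a straight line is a poor model for the data.
sns.residplot(x="x", y="y", data=df, ax=ax_resid)
ax_resid.set_title("Residuals")
```

In an interactive session, `plt.show()` or `fig.savefig(...)` would display or save the figure.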
Seaborn – Plot Types
  Matrix plots
    cluster map
    heat map
  Timeseries plots
  Miscellaneous plots
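A minimal sketch of a matrix plot: `seaborn.heatmap` applied to the correlation matrix of a pandas DataFrame. The random data and column names are placeholders, and the `Agg` backend is again only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so this runs without a display
import numpy as np
import pandas as pd
import seaborn as sns

# Placeholder data: 50 rows of four random variables.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((50, 4)), columns=list("ABCD"))

# Pairwise correlations form a 4x4 matrix indexed by column name.
corr = df.corr()

# Each cell is colored by its value; annot=True also prints the number.
ax = sns.heatmap(corr, vmin=-1, vmax=1, annot=True)
ax.set_title("Correlation matrix")
```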
Seaborn - Example
ggplot
  Improves the visual appeal of matplotlib visualizations in a simple way
  Port of R's ggplot2; parts of the API are non-Pythonic but very powerful
  Supports pandas
ggplot – Plot Gallery
  bar, density, facetgrid, histogram, line, scatter, smooth
ggplot - Example