External libraries A very complete list can be found at PyPi the Python Package Index: https://pypi.python.org/pypi To install, use pip, which comes with.


1 External libraries
A very complete list can be found at PyPI, the Python Package Index: https://pypi.python.org/pypi
To install a package, use pip, which comes with Python:

    pip install package

or download and unzip the package, then run its installer from inside that directory:

    python setup.py install

If you have both Python 2 and Python 3 installed, use pip3 (though not with Anaconda), or make sure the right Python version comes first in your PATH.
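You can ask Python itself which interpreter is running, and therefore which pip belongs to it. The sketch below is an illustration and was not part of the original slides. `interpreter_info` is a hypothetical helper name, and `importlib.metadata` needs Python 3.8 or later.

```python
import sys
import importlib.metadata


def interpreter_info():
    """Hypothetical helper: report the running interpreter and its pip version."""
    try:
        pip_version = importlib.metadata.version("pip")
    except importlib.metadata.PackageNotFoundError:
        pip_version = None  # pip is not installed for this interpreter
    return {
        "executable": sys.executable,  # full path of the python binary in use
        "version": "%d.%d.%d" % sys.version_info[:3],
        "pip": pip_version,
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```

If the executable shown is not the Python you expected, running `python -m pip install package` with the intended interpreter installs into that interpreter.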

2 NumPy
Mathematics and statistics, especially multi-dimensional array manipulation for data processing. Good introductory tutorials by Software Carpentry:

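3 NumPy data
Perhaps the nicest thing about NumPy is its handling of complicated 2D datasets. It has its own array types, which overload the indexing operators. Note the difference from the standard nested-list notation data[row][column]: a NumPy array takes all the indices inside one pair of brackets, data[row, column].

    import numpy
    data = numpy.int_([
        [1, 2, 3, 4, 5],
        [10, 20, 30, 40, 50],
        [100, 200, 300, 400, 500]
    ])
    print(data[0, 0])       # 1
    print(data[1:3, 1:3])   # [[ 20  30] [200 300]]

On a standard nested list, data[1:3][1:3] wouldn't work. The second slice is applied to the list of rows again, giving [[100, 200, 300, 400, 500]]. At best, data[1:3][0][1:3] would give you [20, 30], which is only one row of the block.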
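Here is a side-by-side sketch of that slicing comparison. It uses `numpy.array` rather than `numpy.int_`, and both produce an integer array here. The variable names are illustrative only.

```python
import numpy

rows = [
    [1, 2, 3, 4, 5],
    [10, 20, 30, 40, 50],
    [100, 200, 300, 400, 500],
]
data = numpy.array(rows)

# NumPy: one pair of brackets, comma-separated slices -> a 2x2 block.
block = data[1:3, 1:3]

# Nested list: the second [1:3] slices the list of rows again.
list_double_slice = rows[1:3][1:3]

# Nested list: picking one row first gives only part of the block.
list_one_row = rows[1:3][0][1:3]

# Building the same 2x2 block from a nested list needs a comprehension.
list_block = [row[1:3] for row in rows[1:3]]

print(block.tolist())      # [[20, 30], [200, 300]]
print(list_double_slice)   # [[100, 200, 300, 400, 500]]
print(list_one_row)        # [20, 30]
print(list_block)          # [[20, 30], [200, 300]]
```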

4 NumPy operations
You can also do maths on the arrays, including matrix manipulation.

    import numpy
    data = numpy.int_([
        [1, 2, 3, 4, 5],
        [10, 20, 30, 40, 50],
        [100, 200, 300, 400, 500]
    ])
    print(data[1:3, 1:3] - 10)              # [[ 10  20] [190 290]]
    print(numpy.transpose(data[1:3, 1:3]))  # [[ 20 200] [ 30 300]]

There's a nice NumPy cheat sheet from DataCamp at:
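The sketch below shows a few more operations on the same data:

- element-wise arithmetic;
- matrix multiplication with the `@` operator;
- statistics along an axis, where `axis=0` works down the columns and `axis=1` works across the rows.

Names such as `block` and `scaled` are illustrative only.

```python
import numpy

data = numpy.array([
    [1, 2, 3, 4, 5],
    [10, 20, 30, 40, 50],
    [100, 200, 300, 400, 500],
])
block = data[1:3, 1:3]          # [[20, 30], [200, 300]]

scaled = block * 2              # element-wise: every value doubled
product = block @ block.T       # matrix multiplication with the transpose
col_sums = data.sum(axis=0)     # add down each column
row_means = data.mean(axis=1)   # average across each row

print(scaled.tolist())          # [[40, 60], [400, 600]]
print(product.tolist())         # [[1300, 13000], [13000, 130000]]
print(col_sums.tolist())        # [111, 222, 333, 444, 555]
print(row_means.tolist())       # [3.0, 30.0, 300.0]
```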

5 Pandas
Data analysis. Built on NumPy, but adds more sophistication: labelled rows and columns and tools for analysing tabular data.

6 Pandas data
Pandas data is centred on DataFrames: 2D arrays with the additional ability to name rows and columns and to use those names for access.

    import pandas
    df = pandas.DataFrame(
        data,   # NumPy array from before
        index=['i', 'ii', 'iii'],
        columns=['A', 'B', 'C', 'D', 'E']
    )
    print(df['A'])           # column 'A', labelled by row name
    print(df.mean(0)['A'])   # mean of column 'A' (axis 0: down the rows)
    print(df.mean(1)['i'])   # mean of row 'i' (axis 1: across the columns)

Prints:

    i        1
    ii      10
    iii    100
    Name: A, dtype: int32
    37.0
    3.0
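This is a slightly longer sketch of label-based access, and it rebuilds `data` so that it runs on its own. `df.loc[row, column]` selects by label. Unlike ordinary Python slices, label slices such as `'ii':'iii'` include both ends. The variable names are illustrative.

```python
import numpy
import pandas

data = numpy.array([
    [1, 2, 3, 4, 5],
    [10, 20, 30, 40, 50],
    [100, 200, 300, 400, 500],
])
df = pandas.DataFrame(data, index=["i", "ii", "iii"], columns=["A", "B", "C", "D", "E"])

column_a = df["A"]                     # a Series labelled by the row names
cell = df.loc["ii", "C"]               # row 'ii', column 'C'
subset = df.loc["ii":"iii", "B":"C"]   # label slices include both ends
col_means = df.mean(axis=0)            # same as df.mean(0): one mean per column
row_means = df.mean(axis=1)            # same as df.mean(1): one mean per row

print(column_a.tolist())               # [1, 10, 100]
print(cell)                            # 30
print(subset.values.tolist())          # [[20, 30], [200, 300]]
print(col_means["A"])                  # 37.0
print(row_means["i"])                  # 3.0
```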

7 scikit-learn
Scientific analysis and machine learning (e.g. classification, regression, clustering). Built on NumPy data formats.
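This is a minimal sketch of the usual scikit-learn pattern: create a model, `fit` it to NumPy arrays, then `predict`. It uses `LinearRegression` on a made-up toy dataset in which y = 2x + 1 exactly. The data is an assumption for illustration only.

```python
import numpy
from sklearn.linear_model import LinearRegression

# Made-up toy data: y = 2x + 1 exactly.
X = numpy.array([[0.0], [1.0], [2.0], [3.0]])  # 2D: one row per sample, one column per feature
y = numpy.array([1.0, 3.0, 5.0, 7.0])

model = LinearRegression()
model.fit(X, y)

print(model.coef_)              # slope, close to [2.]
print(model.intercept_)         # intercept, close to 1.0
print(model.predict([[4.0]]))   # close to [9.]
```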

8 Beautiful Soup
Web page analysis (parsing HTML). Beautiful Soup doesn't download pages itself; you need another package for that, such as the requests library. Beautiful Soup navigates the Document Object Model (DOM):
Not a library, but a nice intro to web programming with Python.
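Here is a sketch of parsing with Beautiful Soup. To keep it runnable offline, the HTML is a made-up literal string. In practice it would usually come from a download such as `requests.get(url).text`. `"html.parser"` is Python's built-in parser, so no extra parser package is needed.

```python
from bs4 import BeautifulSoup

# Made-up page; normally this text would come from e.g. requests.get(url).text
html = """
<html>
  <head><title>Example page</title></head>
  <body>
    <p class="intro">Some links:</p>
    <a href="https://example.com/one">One</a>
    <a href="https://example.com/two">Two</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")   # Python's built-in HTML parser

title = soup.title.string                   # text inside <title>
intro = soup.find("p", class_="intro").get_text()
links = [a["href"] for a in soup.find_all("a")]

print(title)   # Example page
print(intro)   # Some links:
print(links)   # ['https://example.com/one', 'https://example.com/two']
```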

9 Tweepy
Downloading tweets for analysis. You'll also need a developer key: access-key-for-twitter-oauth/994/
Most social media sites have equivalent APIs (sets of functions for accessing them) and Python modules that use those APIs.

10 NLTK
Natural Language Toolkit. Parses text and analyses everything from parts of speech to the positivity or negativity of statements (sentiment analysis).
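This is a small sketch of text parsing with NLTK. It tokenises a made-up sentence and counts word frequencies. It uses `wordpunct_tokenize`, which is based on a regular expression and so needs no extra data. Part-of-speech tagging (`nltk.pos_tag`) and sentiment analysis (e.g. the VADER analyser) first need their data fetched with `nltk.download`, so they are not shown here.

```python
from nltk import FreqDist
from nltk.tokenize import wordpunct_tokenize

text = "Good code is good. Bad code is bad, but good tests help."

# Regex-based tokeniser: splits words from punctuation, no downloaded data needed.
tokens = wordpunct_tokenize(text)

# Keep alphabetic tokens only, lower-cased so "Good" and "good" count together.
words = [t.lower() for t in tokens if t.isalpha()]

freq = FreqDist(words)   # a Counter-like table of word frequencies
print(freq.most_common(3))
```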

11 Celery
Concurrent computing / parallelisation. For splitting programs into tasks and running them on multiple computers, e.g. to get around the memory limits of a single machine. See also:
