
Scipy 'Ecosystem'
Contains a variety of scientific packages including IPython, numpy, matplotlib, and pandas. numpy is both a system for constructing multi-dimensional data structures and a scientific library. http://www.numpy.org/

The overarching collection for the numpy and pandas packages is the SciPy ecosystem, a grouping of packages that also includes matplotlib and the IPython project. At the root of many of these packages is compatibility with numpy, which is a data analysis library but, perhaps more importantly, provides a structure for the construction of multi-dimensional data arrays.

ndarray
The ndarray is the basic data format; numpy.array() is the usual function for making one.
    a = numpy.array([2,3,4])
Make the array from a list or tuple (NOT separate numbers).
    a = numpy.fromfile(file, dtype=float, count=-1, sep='')
dtype: data type of the values read (a structured dtype allows multi-type arrays)
count: number of values to read (-1 == all)
sep: separator between values ('' means the file is read as binary)
To generate from a function of each element's position within a shape:
    numpy.fromfunction(function, shape, **kwargs)

The core data type is the ndarray, usually created with numpy.array. As we'll see, the key advantage of this data format is the ability to do multi-dimensional slices. Note the potential confusion between numpy.array and the standard library's array.array if you import both. The slide above shows three ways of constructing an ndarray; the usual way is from lists or a file. Note that fromfunction calls the function once, passing one array of coordinates per dimension, rather than calling it separately for each element. For a dtype example, see: https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.fromfile.html#numpy.fromfile For a function example, see: https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.fromfunction.html#numpy.fromfunction
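
A minimal sketch of the three construction routes, assuming numpy is imported under the conventional alias np. The temporary text file and its contents are made up purely so fromfile has something to read; with a non-empty sep, fromfile parses text rather than binary.

```python
import os
import tempfile

import numpy as np

# From a list (a tuple works the same way).
a = np.array([2, 3, 4])

# From a function of the coordinates: fromfunction calls the lambda once,
# passing one array of row indices (i) and one of column indices (j).
grid = np.fromfunction(lambda i, j: 10 * i + j, (2, 3), dtype=int)
# grid holds [[0, 1, 2], [10, 11, 12]]

# From a file: write a small hypothetical text file, then read it back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("1.5 2.5 3.5 4.5")
    path = f.name
from_file = np.fromfile(path, dtype=float, count=-1, sep=" ")  # text mode
os.remove(path)
```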

Built in functions
    a = numpy.zeros( (3,4) )
    array([[ 0., 0., 0., 0.],
           [ 0., 0., 0., 0.],
           [ 0., 0., 0., 0.]])
Also numpy.ones and numpy.empty (contents left uninitialised, so the values are arbitrary)
    numpy.random.random((2,3))
More generic:
    ndarray.fill(value)
    numpy.putmask(a, mask, values)  Put values based on a Boolean mask array (True == replace)

There are a variety of functions that produce standardised arrays. For example, numpy.zeros takes the shape of an array as a tuple (or a single integer for a 1D array), and makes an array of zeros of that shape. numpy.empty does the same thing but doesn't set the contents, so the array holds whatever values were already in that memory. These often look like very small floats, but they are arbitrary and should be overwritten before use. If you have an array and you want to fill it with a specific number, use .fill().
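
A short sketch of the standard array makers and the two "more generic" fillers, assuming the np alias; the shapes and values are arbitrary illustrations. numpy.empty is filled straight away because its initial contents are not defined.

```python
import numpy as np

zeros = np.zeros((3, 4))          # 3 rows, 4 columns of 0.0
ones = np.ones((2, 2))            # 2x2 of 1.0
rand = np.random.random((2, 3))   # uniform floats in [0.0, 1.0)

# empty() does not initialise memory, so fill it before relying on values.
filled = np.empty((2, 3))
filled.fill(7.0)

# putmask replaces values in place wherever the mask is True.
a = np.arange(6)                  # [0, 1, 2, 3, 4, 5]
np.putmask(a, a > 3, 0)           # a becomes [0, 1, 2, 3, 0, 0]
```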

arange
Like range, but generates arrays:
    a = np.arange( 1, 10, 2 )
    array([1, 3, 5, 7, 9])
Can be used with floating point numbers, but precision issues mean it is better to use:
    a = np.linspace(start, end, num)
where num is the total number of values generated, including both ends. Note that with linspace "end" is included in the output.

For a range of numbers, use arange() (np here is the conventional alias from import numpy as np). This generates a sequence like the standard "range", but in a 1D ndarray. We'll see in a bit how to then convert that to a multi-dimensional array. While arange can be used with floating point numbers, rounding errors make the number of values it generates hard to predict, so it is better to use linspace to construct a set number of floats falling within a definite interval.
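
A small comparison of the two, assuming the np alias. The step and interval values are arbitrary; 0.25 is chosen because it is exactly representable in binary, so the float arange here is well behaved.

```python
import numpy as np

odd = np.arange(1, 10, 2)          # stop (10) is excluded, as with range()
pts = np.linspace(0.0, 1.0, 5)     # 5 values in total, both ends included

# arange with a float step also excludes the end value; its length is
# derived from (stop - start) / step, which is exposed to rounding error
# for steps that are not exactly representable.
steps = np.arange(0.0, 1.0, 0.25)
```

The design choice is about what you fix: arange fixes the step and lets the count follow, linspace fixes the count and derives the step.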

ndarray
    ndarray.ndim   Number of axes (dimensions)
    ndarray.shape  Length of each dimension
    ndarray.size   Total number of elements
    ndarray.dtype  Data type of the elements (standard or numpy)
    print(array)   Will print the array nicely, but if it is too large to print nicely will print with "..." across the central values.
Options are set with numpy.set_printoptions, including
    numpy.set_printoptions(threshold=sys.maxsize)

Each ndarray has a set of attributes automatically set up, as above, which can be accessed to determine information about it. Printing an array will 'pretty print' it; more specifically, if it has more elements than the print threshold (1000 by default), the middle values will be replaced by "...". This can be turned off by raising the threshold with the set_printoptions function, as shown (this needs import sys; passing threshold=None leaves the current setting unchanged rather than turning summarisation off). https://docs.scipy.org/doc/numpy/reference/generated/numpy.set_printoptions.html
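
A sketch of the attributes and the print threshold, assuming the np alias. It uses numpy.array2string, which applies the same print options as print(), so the summarised and full text can be compared directly; the threshold is set back to its default of 1000 at the end.

```python
import sys

import numpy as np

a = np.arange(12).reshape(3, 4)
print(a.ndim, a.shape, a.size, a.dtype)  # 2 (3, 4) 12, plus an integer dtype (platform dependent)

big = np.arange(2000)                     # more than 1000 elements
summary = np.array2string(big)            # middle values replaced by "..."

# Raise the threshold so the whole array is printed, then restore the default.
np.set_printoptions(threshold=sys.maxsize)
full = np.array2string(big)
np.set_printoptions(threshold=1000)
```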

Platform independent save
Save/Load data in numpy .npy / .npz format
    numpy.save(file, arr, allow_pickle=True, fix_imports=True)
    arr = numpy.load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII')

Although one can write out such arrays with standard text methods, there is also a platform-independent binary format for quick and efficient data storage, specifically for use with numpy and related packages. Note that in the above "arr" is the array being saved, or the array returned by load. numpy.save writes a single array to a .npy file; numpy.savez (or numpy.savez_compressed) writes several arrays into one .npz archive. In recent numpy versions load's allow_pickle defaults to False, so arrays holding general Python objects need allow_pickle=True to be read back. https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.save.html#numpy.save https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.load.html#numpy.load
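
A round-trip sketch, assuming the np alias. The file names and the scratch directory are hypothetical, created only for the demonstration and removed afterwards; the .npz archive is read back by the keyword names given to savez.

```python
import os
import shutil
import tempfile

import numpy as np

data = np.arange(6).reshape(2, 3)
tmpdir = tempfile.mkdtemp()                 # hypothetical scratch directory

# One array per .npy file.
npy_path = os.path.join(tmpdir, "data.npy")
np.save(npy_path, data)
loaded = np.load(npy_path)

# Several named arrays in one .npz archive.
npz_path = os.path.join(tmpdir, "bundle.npz")
np.savez(npz_path, grid=data, squares=data ** 2)
with np.load(npz_path) as archive:          # closes the archive afterwards
    squares = archive["squares"]

shutil.rmtree(tmpdir)
```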

Indexing
Data locations are referenced using [row, col] (for 2D):
    arrayA[1,2]  rather than  arrayA[1][2]
This means we can slice across multiple dimensions, one of the most useful aspects of numpy arrays:
    a = arrayA[1:3,:]  array of the 2nd and 3rd rows, all columns.
    b = arrayA[:, 1]   array of all values in the second column.
You can also use ... to represent "the rest"; for a 5D array, a[4,...,5,:] == a[4,:,:,5,:].

The most obvious difference between ndarrays and standard Python lists and tuples is that the index is a single list of positions across multiple dimensions, rather than a chain of single-dimension indexes (arrayA[1][2] still works, but it first extracts the whole row and then indexes into that). The second is that, using this format, slices can be done across multiple dimensions. Numpy also gives a meaning to the Ellipsis symbol "...", which is valid Python but unused by the standard sequence types, namely "all the rest". For example, for a 2D array a, a[4,...] means all of the row at index 4 (the fifth row, as indexing starts at zero). Note that you have to avoid ambiguities with this, so, instead of:
    a[4,:,:,5,:]
we can say:
    a[4,...,5,:]
but not:
    a[4,...,5,...]
as it wouldn't be clear which dimension the "5" referred to, whereas in a[4,...,5,:] it is clearly the second to last. Numpy only allows one "..." per index.
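
A sketch of multi-dimensional indexing and the Ellipsis rule, assuming the np alias. arrayA follows the slide's name, but its contents (0 to 19 in 4 rows) and the 5D shape are arbitrary choices for illustration.

```python
import numpy as np

arrayA = np.arange(20).reshape(4, 5)    # rows 0..3, columns 0..4

single = arrayA[1, 2]                   # row 1, column 2 -> 7
rows = arrayA[1:3, :]                   # the 2nd and 3rd rows, all columns
col = arrayA[:, 1]                      # every value in the 2nd column

# "..." stands for as many full slices as needed.
five_d = np.zeros((6, 2, 3, 7, 4))
same = five_d[4, ..., 5, :].shape == five_d[4, :, :, 5, :].shape

# More than one "..." in a single index is rejected.
try:
    five_d[4, ..., 5, ...]
    two_ellipses_ok = True
except IndexError:
    two_ellipses_ok = False
```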

Indexing
Can use numpy arrays of indices to pull out values:
    j = np.array( [ [ 3, 4], [ 9, 7 ] ] )
    a[j]
Can also use Boolean arrays, with "True" values indicating values we want:
    mask = numpy.array([False,True,False])
    a = numpy.array([1,2,3])
    a[mask]  gives  array([2])
Numpy has something called 'Structured arrays' which allow named columns, but these are better handled through pandas.

We can also pull multiple values out of specific locations within arrays. For a 1D array a, a[j] returns an array with the same shape as j, holding a[3], a[4], a[9] and a[7]. For more on structured arrays, see: https://docs.scipy.org/doc/numpy-dev/user/basics.rec.html#structured-arrays
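
A sketch of index-array and Boolean-mask selection, assuming the np alias; the data array (10 to 19) is an arbitrary stand-in so that the picked values are easy to check against their indices.

```python
import numpy as np

a = np.arange(10, 20)                   # [10, 11, ..., 19]
j = np.array([[3, 4], [9, 7]])
picked = a[j]                           # same shape as j: [[13, 14], [19, 17]]

b = np.array([1, 2, 3])
mask = np.array([False, True, False])
chosen = b[mask]                        # array([2])

# In practice a Boolean mask is often built from a comparison.
big = a[a > 16]                         # [17, 18, 19]
```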

Shape changing
To take the current values and force them into a different shape (with the same total number of elements), use reshape, for example:
    a = numpy.arange(12).reshape(3,4)
arrayA.resize() changes the shape of the underlying array in place.
numpy.squeeze() removes dimensions of length one
arrayA.flat gives you all elements as an iterator
arrayA.ravel() gives you the array flattened
arrayA.T gives you the array with rows and columns transposed (note: an attribute, not a function)

There are a variety of functions for altering the shape of arrays. Note that reshape is how we would generate multi-dimensional arrays using arange.
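
A sketch of the shape-changing tools, assuming the np alias and arbitrary example shapes. The in-place resize keeps the element count the same (6 values into 2x3), which avoids numpy's restrictions on reallocating arrays that other names might reference.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)         # 3 rows x 4 columns, values 0..11

t = a.T                                 # shape (4, 3); .T is an attribute
flat_values = [int(x) for x in a.flat]  # iterate over all elements, row by row
flattened = a.ravel()                   # 1D array of the 12 values

s = np.squeeze(np.zeros((1, 3, 1)))     # length-one axes removed -> shape (3,)

b = np.arange(6)
b.resize((2, 3))                        # in place: b itself is now 2x3
```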

Concatenating
    a = numpy.vstack((arrayA,arrayB))  Stack arrays vertically
    a = numpy.hstack((arrayA,arrayB))  Stack arrays horizontally
column_stack stacks 1D arrays as columns.
More generic is
    numpy.concatenate((a1, a2, ...), axis=0)
which allows you to say which axis.

To join arrays together, use the above functions (the + operator adds arrays elementwise rather than joining them).
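
A sketch of the joining functions, assuming the np alias; arrayA and arrayB follow the slide's names, with small made-up 2x2 contents so the results are easy to read.

```python
import numpy as np

arrayA = np.array([[1, 2], [3, 4]])
arrayB = np.array([[5, 6], [7, 8]])

v = np.vstack((arrayA, arrayB))                # 4x2: B's rows below A's
h = np.hstack((arrayA, arrayB))                # 2x4: B's columns right of A's
c0 = np.concatenate((arrayA, arrayB), axis=0)  # same as vstack here
c1 = np.concatenate((arrayA, arrayB), axis=1)  # same as hstack here

# column_stack turns 1D arrays into the columns of a 2D array.
cols = np.column_stack((np.array([1, 2, 3]), np.array([4, 5, 6])))  # 3x2
```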

Broadcasting
The way data is filled or reused if arrays being used together are different shapes. For example, small arrays will usually be "stretched", with the data in them repeated. See: https://docs.scipy.org/doc/numpy-dev/user/basics.broadcasting.html http://scipy.github.io/old-wiki/pages/EricsBroadcastingDoc

One of the most Pythonic aspects of ndarrays is that when arrays of different shapes are combined in an operation that expects arrays of the same size (for example, elementwise arithmetic on two arrays), the smaller array behaves as if it had been expanded to match, without its data actually being copied. This filling process is known as 'broadcasting'. Shapes are compared from the last dimension backwards, and each pair of dimension lengths must either be equal or include a 1; otherwise numpy raises an error. By and large, it is better not to rely on broadcasting: understand your array sizes and make sure they are right. However, occasionally (for example when multiplying a number of different arrays by a single value) it allows useful shorthands.
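
A sketch of the broadcasting rule, assuming the np alias and arbitrary example values: a single number, a row and a column are each stretched to match a 2x3 array, while an array whose last dimension neither matches nor is 1 is rejected.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)    # [[0, 1, 2], [3, 4, 5]], shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,): reused for every row
col = np.array([[100], [200]])       # shape (2, 1): reused for every column

scaled = grid * 2                    # the single value is stretched to (2, 3)
plus_row = grid + row                # [[10, 21, 32], [13, 24, 35]]
plus_col = grid + col                # [[100, 101, 102], [203, 204, 205]]

# Trailing dimensions 3 and 2 are incompatible, so numpy raises ValueError.
try:
    grid + np.array([1, 2])
    broadcast_failed = False
except ValueError:
    broadcast_failed = True
```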

Data copies
    b = a.view()  # New array, but referencing the old data. This is what a slice returns.
    c = a.copy()  # New array and data

Arrays are mutable and generally follow the pass-by-reference style rules we looked at in the core course. However, it isn't always obvious what you have been given, so check closely what functions return: are they returning a copy of the original data, or the original data itself? Quite often, functions will return a "view", which is a new array object containing references to the original data (this allows, for example, a view to have a different shape from the original without any data being copied, but changing values through the view changes the original). If you want a "deep" copy, that is, a copy where not only the array, but the data in it is copied, use array_name.copy().
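
A sketch of views versus copies, assuming the np alias and a small arbitrary array. numpy.shares_memory is used only to confirm which arrays share data; reshape of a contiguous array is used here as an example of another call that returns a view.

```python
import numpy as np

a = np.arange(6)                 # [0, 1, 2, 3, 4, 5]

s = a[1:4]                       # a slice is a view onto a's data
s[0] = 99                        # ...so this also changes a[1]

v = a.reshape(2, 3)              # also a view here: new shape, same data
v[1, 2] = 50                     # changes a[5]; a itself stays 1D

c = a.copy()                     # independent data
c[0] = -1                        # a is unaffected

shares = np.shares_memory(a, s)
independent = not np.shares_memory(a, c)
```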

Maths on arrays
Maths is done elementwise and generates a new array.
    a = arrayb + arrayc
    a = arrayb * arrayc
    a = arrayb.dot(arrayc)  Matrix dot product (for matrix maths see also numpy.identity and numpy.eye)
*= and += can be used to manipulate arrays in place:
    a += 3
    arraya += arrayb

You'll remember from the core course that operators like "+" actually call special methods (such as __add__) on one or other of the objects either side of them. These methods can be overridden to change the functionality of core operators. Here we see an example of how this, apparently confusing, idea comes to fruition. In numpy, the standard operators are overridden to work on ndarrays elementwise; that is, they run through the arrays and operate on each pair of values separately. This means * is elementwise multiplication, not matrix multiplication; for the latter use dot (or, in Python 3.5 and later, the @ operator). There are also standard functions for matrix mathematics. We're not going to go into matrix mathematics here, but there's a good introduction for those who need a refresher, here: https://www.mathsisfun.com/algebra/matrix-introduction.html If you want to do matrix maths, you may also want to know how to generate some of the standard matrices that are used in matrix maths: https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.identity.html#numpy.identity https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.eye.html#numpy.eye
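
A sketch contrasting elementwise and matrix operations, assuming the np alias; arrayb and arrayc follow the slide's names with made-up 2x2 contents. The in-place += comes last because it changes arrayb.

```python
import numpy as np

arrayb = np.array([[1, 2], [3, 4]])
arrayc = np.array([[5, 6], [7, 8]])

added = arrayb + arrayc          # [[6, 8], [10, 12]]
elementwise = arrayb * arrayc    # [[5, 12], [21, 32]], not a matrix product
matrix = arrayb.dot(arrayc)      # [[19, 22], [43, 50]]
same = np.array_equal(matrix, arrayb @ arrayc)   # dot and @ agree for 2D

ident = np.identity(2)           # 2x2 identity matrix (floats)
unchanged = np.array_equal(arrayb.dot(ident), arrayb)

arrayb += 3                      # in place: arrayb is now [[4, 5], [6, 7]]
```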

Built in maths functions
    a.sum(); a.min(); a.max()
    a.sum(axis=0)     # Array containing the sum of each column
    a.sum(axis=1)     # Array containing the sum of each row
    b.cumsum(axis=1)  # Array of the same shape containing cumulative sums along each row
There are also elementwise functions like
    numpy.sqrt(arraya)
    numpy.sin(arraya)

There are a large number of functions for data analysis in numpy (and even more in the associated scipy ecosystem). Here are some simple examples (we'll come to where you can find more shortly). Note that many will run over a whole ndarray, but can also be set to generate arrays of values per-row or per-column, depending on the axis set. Here we've assumed the first dimension is taken as rows and the second as columns. In some cases, functions work elementwise to generate new arrays of the same size.
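
A sketch of whole-array, per-axis and elementwise functions, assuming the np alias and an arbitrary 2x3 array (first dimension as rows, as on the slide).

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

total = a.sum()                  # 21, over the whole array
smallest, largest = a.min(), a.max()
col_sums = a.sum(axis=0)         # [5, 7, 9]: one sum per column
row_sums = a.sum(axis=1)         # [6, 15]: one sum per row
running = a.cumsum(axis=1)       # [[1, 3, 6], [4, 9, 15]], same shape as a
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))   # elementwise: [1.0, 2.0, 3.0]
```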

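Maths
Basic Statistics: cov, mean, std, var
Basic Linear Algebra: cross, dot, outer, linalg.svd, vdot
Histogram: generates 1D arrays of counts and bin edges from an array
    (counts, bins) = np.histogram(arrayIn, bins=50, density=True)
Full list of maths functions: https://docs.scipy.org/doc/numpy/reference/routines.html

Here are some of the more useful basic functions. Note that histogram returns one more bin edge than there are counts, and that density=True rescales the counts so that the histogram's total area is 1. Older code may use normed=True for this, but that argument has been removed from recent numpy versions.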
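
A sketch of a few of these routines, assuming the np alias; arrayIn follows the slide's name, but its values and the choice of 3 bins are made up so the counts can be checked by hand.

```python
import numpy as np

arrayIn = np.array([1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0])

m = np.mean(arrayIn)
s = np.std(arrayIn)       # population standard deviation (ddof=0 by default)
v = np.var(arrayIn)

# Edges are [1, 2, 3, 4]; the last bin includes its right edge, so 4.0 lands there.
counts, edges = np.histogram(arrayIn, bins=3)      # counts [1, 2, 4]
dens, _ = np.histogram(arrayIn, bins=3, density=True)
area = np.sum(dens * np.diff(edges))               # 1.0, up to rounding

outer = np.outer([1, 2], [3, 4])                   # [[3, 4], [6, 8]]
cross = np.cross([1, 0, 0], [0, 1, 0])             # [0, 0, 1]
```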

Scipy functions
Special functions (scipy.special)
Integration (scipy.integrate)
Optimization (scipy.optimize)
Interpolation (scipy.interpolate)
Fourier Transforms (scipy.fftpack; newer code should use scipy.fft)
Signal Processing (scipy.signal)
Linear Algebra (scipy.linalg)
Sparse Eigenvalue Problems with ARPACK (scipy.sparse.linalg)
Compressed Sparse Graph Routines (scipy.sparse.csgraph)
Spatial data structures and algorithms (scipy.spatial)
Statistics (scipy.stats)
Multidimensional image processing (scipy.ndimage) <-- Useful for kernel operations
File IO (scipy.io)

However, as mentioned, across the scipy ecosystem there are some very powerful libraries. The scipy library, which is one library in the scipy ecosystem, contains a vast number of sub-packages for scientific analysis.