Scipy 'Ecosystem': a variety of scientific packages including IPython, numpy, matplotlib, and pandas.

Scipy 'Ecosystem'
A collection of scientific packages including IPython, numpy, matplotlib, and pandas.
numpy is both a system for constructing multi-dimensional data structures and a scientific library. http://www.numpy.org/
The overarching collection for the numpy and pandas packages is scipy, a grouping of packages that also includes matplotlib and the IPython project. Many of these packages are built to be compatible with numpy, which is a data analysis library but, perhaps more importantly, provides a structure for constructing multi-dimensional data arrays.

ndarray
numpy.ndarray is the basic data format; numpy.array() is the usual function for creating one.
a = numpy.array([2,3,4])    Make an array from a list or tuple (NOT separate numbers: numpy.array(2,3,4) is wrong)
a = numpy.fromfile(file, dtype=float, count=-1, sep='')
    dtype    Data type of the values; structured dtypes allow the construction of multi-type arrays
    count    Number of values to read (-1 == all)
    sep      Separator between values in a text file (the default '' means the file is binary)
To generate an array using a function that is called with the coordinates of each element of a shape:
numpy.fromfunction(function, shape, **kwargs)
The core data type is the ndarray, normally created with numpy.array(). As we'll see, the key advantage of this data format is the ability to do multi-dimensional slices. Note the potential confusion between numpy.array and array.array in Python if you import both. This slide shows three ways of constructing an ndarray; the usual way is from lists or from a file.
For a dtype example, see: https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.fromfile.html#numpy.fromfile
For a function example, see: https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.fromfunction.html#numpy.fromfunction
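
A minimal sketch of two of these construction routes (a list and a function). The alias np for numpy and the lambda passed to fromfunction are illustrative choices, not taken from the slides.

```python
import numpy as np

# From a list (or tuple): the values are passed as ONE sequence.
a = np.array([2, 3, 4])

# From a nested list: each inner list becomes a row.
b = np.array([[1.0, 2.0], [3.0, 4.0]])

# From a function: it receives the row and column indices of the shape.
# The lambda is a hypothetical example function.
c = np.fromfunction(lambda i, j: i + j, (2, 3), dtype=int)

print(a.shape)  # (3,)
print(b.dtype)  # float64
print(c)        # rows: [0 1 2] and [1 2 3]
```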

Built in functions
a = numpy.zeros( (3,4) )
array([[ 0., 0., 0., 0.], [ 0., 0., 0., 0.], [ 0., 0., 0., 0.]])
Also numpy.ones and numpy.empty (contents are uninitialised, so the values are arbitrary)
numpy.random.random((2,3))    Random floats in the range [0, 1)
More generic: ndarray.fill(value)
numpy.putmask()    Put values based on a Boolean mask array (True == replace)
There are a variety of functions that produce standardised arrays. For example, numpy.zeros takes the size of an array as a tuple of dimension sizes and makes an array of that size containing zeros. numpy.empty does the same thing but doesn't set the contents, so the array holds whatever happened to be in memory; in practice this often looks like very small floats. If you have an array and you want to fill it with a specific number, use .fill().
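
As a rough illustration of these helpers, the sketch below builds a few standard arrays and then overwrites values in place. The variable names and the values 7.5 and 0 are arbitrary.

```python
import numpy as np

z = np.zeros((3, 4))   # 3 rows, 4 columns, all 0.0
o = np.ones((2, 2))    # 2x2, all 1.0
e = np.empty((2, 3))   # contents are arbitrary until assigned
e.fill(7.5)            # set every element in place

# putmask replaces values wherever the Boolean mask is True.
v = np.array([1, 2, 3, 4])
np.putmask(v, v > 2, 0)

print(z.shape)  # (3, 4)
print(e)
print(v)        # [1 2 0 0]
```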

arange
Like range but generates arrays:
a = np.arange( 1, 10, 2 )
array([1, 3, 5, 7, 9])
Can be used with floating point numbers, but precision issues mean it is better to use:
a = np.linspace(start, end, num)
where num is the total number of values generated. Note that with linspace "end" is included in the output by default.
For a range of numbers, use arange(). This generates a sequence like the standard "range", but as a 1D ndarray. We'll see in a bit how to convert that to a multi-dimensional array. While arange can be used with floating point numbers, precision issues can make the number of elements unpredictable, so it is better to use linspace to construct a set number of floats falling within a definite interval.
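
A short sketch contrasting the two calls; the interval 0.0 to 1.0 and the count of 5 are arbitrary example values.

```python
import numpy as np

# Integer steps: start, stop (excluded), step.
a = np.arange(1, 10, 2)

# Floats: ask for a fixed number of values instead of a step size.
# Both endpoints are included by default.
f = np.linspace(0.0, 1.0, 5)

print(a)  # [1 3 5 7 9]
print(f)  # 0.0, 0.25, 0.5, 0.75, 1.0
```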

ndarray attributes
ndarray.ndim     Number of axes (dimensions)
ndarray.shape    Length of each dimension
ndarray.size     Total number of elements
ndarray.dtype    Data type in the array (standard or numpy)
print(array)     Will print the array nicely, but if it is too large to print nicely it will print with "...," in place of the central values.
Options are set with numpy.set_printoptions, including numpy.set_printoptions(threshold=sys.maxsize)
Each ndarray has a set of attributes automatically set up, as above, which can be accessed to find out information about it. Printing an array will 'pretty print' it; more specifically, if it is too large, the middle numbers will be replaced by "...". This can be turned off with the set_printoptions function by raising the threshold. In current numpy versions threshold=None leaves the setting unchanged, so sys.maxsize (after import sys) is the value that prints every element. https://docs.scipy.org/doc/numpy/reference/generated/numpy.set_printoptions.html
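
The attributes can be inspected directly, as in this small sketch; the 3 by 4 array is an arbitrary example.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

print(a.ndim)   # 2
print(a.shape)  # (3, 4)
print(a.size)   # 12
print(a.dtype)  # an integer type; the exact width depends on the platform
```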

Platform independent save
Save/Load data in numpy .npy / .npz format
numpy.save(file, arr, allow_pickle=True, fix_imports=True)
arr = numpy.load(file, mmap_mode=None, allow_pickle=False, fix_imports=True, encoding='ASCII')
Although one can write out such arrays with standard text methods, there is also a platform-independent binary format for quick and effective data storage, intended specifically for use with numpy and related packages. In the calls above, "arr" is the array to save, or the variable the loaded array is assigned to. Note that since numpy 1.16.3, load defaults to allow_pickle=False for security reasons; plain numeric arrays load without it.
https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.save.html#numpy.save
https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.load.html#numpy.load
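
A minimal round-trip sketch, writing to a temporary directory so nothing is left behind. The file name example.npy is hypothetical.

```python
import os
import tempfile

import numpy as np

arr = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.npy")  # hypothetical file name
    np.save(path, arr)      # write in the binary .npy format
    loaded = np.load(path)  # read back: shape and dtype are preserved

print(np.array_equal(arr, loaded))  # True
```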

Indexing
Data locations are referenced using [row, col] (for 2D):
arrayA[1,2] rather than arrayA[1][2] (the second form works, but builds an intermediate array first)
This means we can slice across multiple dimensions, one of the most useful aspects of numpy arrays:
a = arrayA[1:3,:]    array of the 2nd and 3rd rows, all columns.
b = arrayA[:, 1]     array of all values in the second column.
You can also use ... to represent "the rest": a[4,...,5,:] == a[4,:,:,5,:].
The most obvious difference between ndarrays and standard Python lists and tuples is that the indexing is with a single list of multiple dimensions, not multiple lists of single dimensions. The second is that, using this format, slices can be taken across multiple dimensions. Numpy also uses the optional (and otherwise rarely used in standard Python) symbol "..." to mean "all the rest". For example, a[4,...] means "everything in the row with index 4 (the fifth row) of the 2D array a". Note that you have to avoid ambiguities with this, so instead of a[4,:,:,5,:] we can say a[4,...,5,:] but not a[4,...,5,...], as it wouldn't be clear which dimension the "5" referred to, whereas in a[4,...,5,:] it is clearly the second to last.
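
The sketch below works through these indexing forms on a small example. The array contents and the five-dimensional shape used for the "..." case are arbitrary.

```python
import numpy as np

arrayA = np.arange(12).reshape(3, 4)
# arrayA holds the rows [0 1 2 3], [4 5 6 7], [8 9 10 11]

single = arrayA[1, 2]   # row index 1, column index 2
rows = arrayA[1:3, :]   # 2nd and 3rd rows, all columns
col = arrayA[:, 1]      # all values in the second column

# "..." stands for as many full slices as are needed.
big = np.zeros((5, 2, 3, 6, 4))
same_shape = big[4, ..., 5, :].shape == big[4, :, :, 5, :].shape

print(single)      # 6
print(col)         # [1 5 9]
print(same_shape)  # True
```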

Indexing
Can use numpy arrays of indices to pull out values:
j = np.array( [ [ 3, 4], [ 9, 7 ] ] )
a[j]
Can also use Boolean arrays, with "True" values indicating the values we want:
mask = numpy.array([False,True,False])
a = numpy.array([1,2,3])
a[mask] == [2]
Numpy has something called 'Structured arrays' which allow named columns, but these are better done through their wrappers in Pandas.
We can also pull multiple values out of specific locations within arrays. For more on structured arrays, see: https://docs.scipy.org/doc/numpy-dev/user/basics.rec.html#structured-arrays
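
A small sketch of both forms of indexing. The source array of squares is an assumption made so that the selected values are easy to check.

```python
import numpy as np

# Index arrays: the result takes the shape of the index array.
a = np.arange(12) ** 2           # 0, 1, 4, 9, 16, ...
j = np.array([[3, 4], [9, 7]])
picked = a[j]

# Boolean mask: keeps the positions where the mask is True.
values = np.array([1, 2, 3])
mask = np.array([False, True, False])
kept = values[mask]

print(picked)  # rows: [9 16] and [81 49]
print(kept)    # [2]
```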

Shape changing
To take the current values and force them into a different shape, use reshape, for example:
a = numpy.arange(12).reshape(3,4)
resize changes the underlying array.
numpy.squeeze() removes dimensions of length 1
arrayA.flat gives you all elements as an iterator
arrayA.ravel() gives you the array flattened to 1D
arrayA.T gives you the array with rows and columns transposed (note: an attribute, not a function)
There are a variety of functions for altering the shape of arrays. Note that reshape is how we would generate multi-dimensional arrays using arange.
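
These operations can be tried on a small array, as in this sketch; the shapes are arbitrary examples.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

flat = a.ravel()                          # 1D array of all 12 values
transposed = a.T                          # attribute, so no parentheses
first_three = [int(x) for x in a.flat][:3]  # .flat is an iterator

# squeeze drops the axes whose length is 1.
squeezed = np.squeeze(np.zeros((1, 3, 1)))

print(flat.shape)        # (12,)
print(transposed.shape)  # (4, 3)
print(first_three)       # [0, 1, 2]
print(squeezed.shape)    # (3,)
```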

Concatenating
a = numpy.vstack((arrayA,arrayB))    Stack arrays vertically
a = numpy.hstack((arrayA,arrayB))    Stack arrays horizontally
column_stack stacks 1D arrays as columns.
More generic is numpy.concatenate((a1, a2, ...), axis=0) which allows you to say which axis.
To join arrays together, use the above functions.
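
A brief sketch of the stacking functions on two 2 by 2 arrays; the values are arbitrary.

```python
import numpy as np

arrayA = np.array([[1, 2], [3, 4]])
arrayB = np.array([[5, 6], [7, 8]])

v = np.vstack((arrayA, arrayB))               # rows of B below rows of A
h = np.hstack((arrayA, arrayB))               # columns of B beside columns of A
c = np.concatenate((arrayA, arrayB), axis=1)  # same result as hstack here

# column_stack turns 1D arrays into the columns of a 2D array.
cols = np.column_stack((np.array([1, 2, 3]), np.array([4, 5, 6])))

print(v.shape)     # (4, 2)
print(h.shape)     # (2, 4)
print(cols.shape)  # (3, 2)
```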

Broadcasting
The way data is filled or reused if arrays being used together are different shapes. For example, small arrays will usually be "stretched": the data in them is repeated.
See: https://docs.scipy.org/doc/numpy-dev/user/basics.broadcasting.html http://scipy.github.io/old-wiki/pages/EricsBroadcastingDoc
One of the most Pythonic aspects of ndarrays is that if you pass them into operations that expect arrays of matching sizes (for example because they combine two arrays element by element), a smaller array will be stretched temporarily so the operation works in an intuitive manner. This filling process is known as 'broadcasting'. By and large, it is better not to rely on broadcasting: understand your array sizes and make sure they are right. However, occasionally (for example when multiplying a number of different arrays by a single-value array) it allows useful shorthands.
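
As an illustration of the stretching described here, this sketch adds a 1D row to every row of a 2D array and multiplies by a single value. The numbers are arbitrary.

```python
import numpy as np

m = np.arange(6).reshape(2, 3)   # rows: [0 1 2] and [3 4 5]
row = np.array([10, 20, 30])

# The 1D row is treated as if it were repeated for each row of m.
added = m + row

# A single value is stretched to the whole array.
doubled = m * 2

print(added)    # rows: [10 21 32] and [13 24 35]
print(doubled)  # rows: [0 2 4] and [6 8 10]
```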

Data copies
b = a.view()    # New array, but referencing the old data
This is what a slice returns.
c = a.copy()    # New array and data
Arrays are generally filled with mutable values and follow the pass-by-reference style rules we looked at in the core course. However, this isn't always the case, and you should check closely what functions return. Are they returning a copy of the original data, or the original data itself? Quite often, functions will return a "view", which is a new array object that references the original data (this allows, for example, an array to be reshaped or sliced without copying the data). Changing values through a view changes the original array. If you want a "deep" copy, that is, a copy where not only the array but also the data in it is copied, use array_name.copy().
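
The difference shows up as soon as a value is changed, as in this sketch; the value 99 is an arbitrary marker.

```python
import numpy as np

a = np.arange(5)   # [0 1 2 3 4]
b = a.view()       # new array object, same underlying data
c = a.copy()       # independent data
s = a[1:3]         # a slice is also a view

s[0] = 99          # write through the slice

print(a[1])  # 99: the original changed
print(b[1])  # 99: the view sees the same data
print(c[1])  # 1: the copy is unaffected
```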

Maths on arrays
Maths is done elementwise and generates a new array.
a = arrayb + arrayc
a = arrayb * arrayc
a = arrayb.dot(arrayc)    Matrix dot product (for matrix maths see also numpy.identity and numpy.eye)
*= and += can be used to manipulate arrays in place:
a += 3
arraya += arrayb
You'll remember from the core course that operators like "+" actually call functions in one or other of the variables either side of them. These functions can be overridden to change the functionality of core operators. Here we see an example of how this apparently confusing idea comes to fruition. In numpy, the standard operators are overridden to work on ndarrays elementwise; that is, they run through the arrays and operate on each value separately. There are also standard functions for matrix mathematics. We're not going to go into matrix mathematics here, but there's a good introduction for those who need a refresher here: https://www.mathsisfun.com/algebra/matrix-introduction.html
If you want to do matrix maths, you may also want to know how to generate some of the standard matrices that are used: https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.identity.html#numpy.identity https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.eye.html#numpy.eye
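
A small sketch of elementwise versus matrix multiplication; the two 2 by 2 arrays are arbitrary examples.

```python
import numpy as np

arrayb = np.array([[1, 2], [3, 4]])
arrayc = np.array([[0, 1], [1, 0]])

elementwise = arrayb * arrayc   # multiplies matching positions
matrix = arrayb.dot(arrayc)     # matrix product

a = arrayb.copy()
a += 3                          # in place: adds 3 to every element

# Multiplying by the identity matrix leaves a matrix unchanged.
unchanged = arrayb.dot(np.eye(2))

print(elementwise)  # rows: [0 2] and [3 0]
print(matrix)       # rows: [2 1] and [4 3]
print(a)            # rows: [4 5] and [6 7]
```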

Built in maths functions
a.sum(); a.min(); a.max()
a.sum(axis=0)       # Array containing the sum of each column
a.sum(axis=1)       # Array containing the sum of each row
b.cumsum(axis=1)    # Array of the same size as b containing cumulative sums along each row
There are also elementwise functions like
numpy.sqrt(arraya)
numpy.sin(arraya)
There are a large number of functions for data analysis in numpy (and even more in the associated scipy ecosystem). Here are some simple examples (we'll come to where you can find more shortly). Note that many will run over a whole ndarray, but can also be set to generate arrays of values per-row or per-column, depending on the axis set. Here we've assumed the first dimension is taken as rows and the second as columns. In some cases, functions work elementwise to generate new arrays of the same size.
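
The axis behaviour is easiest to see on a small array, as in this sketch with arbitrary values.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # rows: [0 1 2] and [3 4 5]

total = a.sum()               # over the whole array
col_sums = a.sum(axis=0)      # collapses the rows: one sum per column
row_sums = a.sum(axis=1)      # collapses the columns: one sum per row
running = a.cumsum(axis=1)    # same shape as a, cumulative along each row

roots = np.sqrt(np.array([1.0, 4.0, 9.0]))   # elementwise

print(total)     # 15
print(col_sums)  # [3 5 7]
print(row_sums)  # [ 3 12]
print(running)   # rows: [0 1 3] and [3 7 12]
```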

Maths
Basic Statistics: cov, mean, std, var
Basic Linear Algebra: cross, dot, outer, linalg.svd, vdot
Histogram: generates 1D arrays of counts and bin edges from an array
(counts, bins) = np.histogram(arrayIn, bins=50, density=True)
(older numpy versions used normed=True, which has since been removed in favour of density=True)
Full list of maths functions: https://docs.scipy.org/doc/numpy/reference/routines.html
Here are some of the more useful basic functions.
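
A minimal sketch of the histogram call alongside a basic statistic. The data values and the choice of 4 bins are arbitrary.

```python
import numpy as np

data = np.array([1, 2, 2, 3, 3, 3, 4, 4, 4, 4], dtype=float)

# Raw counts: there is one more bin edge than there are bins.
counts, edges = np.histogram(data, bins=4)

# density=True rescales so the area under the histogram is 1.
density, _ = np.histogram(data, bins=4, density=True)
area = (density * np.diff(edges)).sum()

print(counts)       # [1 2 3 4]
print(len(edges))   # 5
print(data.mean())  # 3.0
```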

Scipy functions
Special functions (scipy.special)
Integration (scipy.integrate)
Optimization (scipy.optimize)
Interpolation (scipy.interpolate)
Fourier Transforms (scipy.fftpack)
Signal Processing (scipy.signal)
Linear Algebra (scipy.linalg)
Sparse Eigenvalue Problems with ARPACK
Compressed Sparse Graph Routines (scipy.sparse.csgraph)
Spatial data structures and algorithms (scipy.spatial)
Statistics (scipy.stats)
Multidimensional image processing (scipy.ndimage) <-- Useful for kernel operations
File IO (scipy.io)
As mentioned, there are some very powerful libraries across the scipy ecosystem. The scipy library itself, which is one library within that ecosystem, contains a vast number of sub-packages for scientific analysis.