1
Transforming Data (Python®)
Computer Science and Software Engineering © 2014 Project Lead The Way, Inc.
2
Transforming Data: Why?
Examples:
  Change representation: ['$1', '$2', '$3', …] → [1, 2, 3, …]
  Arithmetic to convert units or calculate a result: [1, 2, 3, …] → [1.05, 2.10, 3.15, …]

What if you want to increase all your data by 5% to account for inflation? How would you apply some arithmetic or other operation to each data value? In Python, the for loop comes to mind. A column of formulas in Excel® sounds appropriate too. This presentation will show three other ways to transform a list of data using Python.
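The "change representation" example can be sketched with the for loop mentioned above. This is a minimal illustration, not code from the original slides: the variable names and the three sample strings are assumptions chosen to match the example.

```python
# Hypothetical sample data: prices stored as strings with a leading "$".
price_strings = ["$1", "$2", "$3"]

# Build a new list of integers, one converted value per original string.
prices = []
for text in price_strings:
    # Drop the leading currency symbol, then convert the digits to an int.
    prices.append(int(text.lstrip("$")))

print(prices)  # [1, 2, 3]
```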
3
Solution 1: for Loop

    old_data = [1, 2, 3, 4, 5]
    new_data = []
    for element in old_data:
        new_data.append(element * 1.05)

    In [ ]: new_data
    Out[ ]: [1.05, 2.10, 3.15, 4.20, 5.25]

To do this in Python with a for loop, either change the elements in place or use an aggregator to create a new list for the results, as shown here. (The output is shown rounded to two decimal places; Python prints the full floating-point values, such as 3.1500000000000004 for the third element.)
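The other option mentioned, changing the elements in place, might look like the sketch below. The names `data` and `rounded` are assumptions for illustration; the rounding step is only there to make the printed values easy to read.

```python
data = [1, 2, 3, 4, 5]

# Overwrite each element using its index, so no second list is needed.
for i in range(len(data)):
    data[i] = data[i] * 1.05

# Round a copy for display; floating-point results are not exact decimals.
rounded = []
for value in data:
    rounded.append(round(value, 2))

print(rounded)  # [1.05, 2.1, 3.15, 4.2, 5.25]
```

Note that the in-place version discards the original values, so it only fits when the old data is no longer needed.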
4
Solution 2: Array Operations
    import numpy

    old_data = [1, 2, 3, 4, 5]
    data = numpy.array(old_data)
    new_data = 1.05 * data

    In [ ]: new_data
    Out[ ]: array([1.05, 2.10, 3.15, 4.20, 5.25])

The numpy library provides the "array" data type and defines addition and multiplication for arrays, as used here. We'll say more about arrays in numpy, and arrays in general, in a few slides.
5
Solution 3: map(function, list)
map() applies a function to each element in a list.

    old_data = [1, 2, 3, 4, 5]

    def inflate(x):
        return x * 1.05

    new_data = list(map(inflate, old_data))

    In [ ]: new_data
    Out[ ]: [1.05, 2.10, 3.15, 4.20, 5.25]

The Python built-in function map() will do the trick here, too. Here we've defined a new function, inflate(). We use map() to apply inflate() to each element. In Python 3, map() returns an iterator rather than a list, so list() is used to collect the results.
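A short sketch of the same idea using an anonymous function (lambda) in place of a named one. It assumes Python 3, where the object returned by map() is an iterator that can be consumed only once; the names `inflated`, `first_pass`, and `second_pass` are illustrative.

```python
old_data = [1, 2, 3, 4, 5]

# A lambda stands in for the named function; map() returns an iterator.
inflated = map(lambda x: x * 1.05, old_data)

# Collecting the iterator into a list runs the function on every element.
first_pass = list(inflated)

# The iterator is now exhausted, so collecting it again yields nothing.
second_pass = list(inflated)

print(len(first_pass))  # 5
print(second_pass)      # []
```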
6
Solution 4: List Comprehension

    [ <expression> for <element> in <list> ]
    [f(x) for x in list]

    old_data = [1, 2, 3, 4, 5]
    new_data = [x * 1.05 for x in old_data]

    In [ ]: new_data
    Out[ ]: [1.05, 2.10, 3.15, 4.20, 5.25]

For loops and arrays are common across nearly all programming languages. The solution shown here, however, uses a syntax that many other languages lack. The "list comprehension" looks like a for loop inside of a list and builds the whole new list at once. Writing the same expression inside parentheses instead of square brackets gives a "generator expression", which is lazy: it produces values one at a time, on demand, and so conserves memory. Those details are not relevant to us here, but they are one of the more powerful aspects of Python.
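To make the bracket syntax concrete, here is a hedged side-by-side sketch. Square brackets build a complete list immediately; the same expression in parentheses creates a generator that computes values only as they are requested. The names `as_list`, `as_generator`, and `total` are assumptions for illustration.

```python
old_data = [1, 2, 3, 4, 5]

# Square brackets: every result is computed and stored in a list right away.
as_list = [x * 1.05 for x in old_data]

# Parentheses: nothing is computed yet; values are produced on demand.
as_generator = (x * 1.05 for x in old_data)

# sum() pulls the values out of the generator one at a time.
total = sum(as_generator)

print(len(as_list))     # 5
print(round(total, 2))  # 15.75
```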
7
Transforming Data Examples
[Figure: three histograms, labeled x, int(x), and max(0, int(x))]

How do these calculations connect to the picture that tells a thousand words, the histogram? A normal distribution of sample data is shown in the left plot. These thousand values are floats. Applying the int() function to each value in the data set transforms the data: int() truncates, sending every value toward 0. In the third plot, all negative values have been changed to 0, which is what max(0, int(x)) does.
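The two transformations can be tried on a handful of values. This is a small sketch with made-up sample numbers, not the thousand-value data set from the plots; replacing negatives with 0 is written with max(), since max(0, v) returns 0 whenever v is negative.

```python
# Hypothetical sample values on both sides of zero.
samples = [-2.7, -0.4, 0.4, 1.9, 3.2]

# int() drops the fractional part, moving every value toward 0.
truncated = [int(x) for x in samples]

# max(0, ...) then replaces any remaining negative value with 0.
clamped = [max(0, int(x)) for x in samples]

print(truncated)  # [-2, 0, 0, 1, 3]
print(clamped)    # [0, 0, 0, 1, 3]
```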
8
A Bit More About Arrays in Python
list + list concatenates instead of transforming:

    In [ ]: a = [1, 2, 3]
    In [ ]: b = [4, 5, 6]
    In [ ]: a + b
    Out[ ]: [1, 2, 3, 4, 5, 6]

But numpy arrays allow + and *.

array + array:

    In [ ]: import numpy as np
    In [ ]: np.array(a) + np.array(b)
    Out[ ]: array([5, 7, 9])

scalar + array and scalar * array:

    In [ ]: 2*np.array(a) + 100
    Out[ ]: array([102, 104, 106])

The earlier example with numpy arrays multiplied all elements by 1.05. A number like 1.05 is called a scalar. If you multiply an array by a scalar, you get another array. Add a scalar to an array, and you still get an array. So the elements [1, 2, 3] of array a have been doubled and then increased by 100. You can also add two arrays together, item by item. The 5 is the sum of the first elements of a and b, 1 and 4. You can't add lists that way; Python will just concatenate them.
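For comparison, here is a sketch of how the same item-by-item results could be obtained with plain lists, without numpy. It uses the built-in zip() to pair up elements, which is a substitute technique and not what the numpy array does internally; the result names are illustrative.

```python
a = [1, 2, 3]
b = [4, 5, 6]

# + on lists joins them end to end.
concatenated = a + b

# zip() pairs the first elements, the second elements, and so on.
elementwise = [x + y for x, y in zip(a, b)]

# The scalar operations must be applied to each element explicitly.
scaled = [2 * x + 100 for x in a]

print(concatenated)  # [1, 2, 3, 4, 5, 6]
print(elementwise)   # [5, 7, 9]
print(scaled)        # [102, 104, 106]
```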
9
A Lot More About Arrays – All Languages
Arrays are faster than lists
An array has elements of one data type

Binding table
    Name   Starting Address   Address Increment   Class   Element Type
    foo    0x32F2             0x0008              array   int

Additional memory
    Address   Contents
    0x32F2    foo[0]
    0x32FA    foo[1]
    0x3302    foo[2]

The last two slides here are more advanced, but they might help you understand what is going on in the computer at a lower level. Arrays are different from lists. Arrays are faster than lists because they are stored differently. In an array, each element takes up the same number of bits in memory, so the address of the 10th element can easily be calculated. The computer can access any element quickly, using simple arithmetic to calculate the element's address.
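The address arithmetic described here can be sketched as starting address plus index times address increment. The constants come from the binding table on this slide; the function name `element_address` is an assumption for illustration, and real hardware and languages handle this below the level of Python code.

```python
# Values taken from the slide's binding table for the array foo.
START_ADDRESS = 0x32F2
ADDRESS_INCREMENT = 0x0008


def element_address(index):
    """Return the address of foo[index] using one multiply and one add."""
    return START_ADDRESS + index * ADDRESS_INCREMENT


# Reproduce the three addresses listed on the slide.
for i in range(3):
    print(f"foo[{i}] is at 0x{element_address(i):04X}")

# The 10th element has index 9.
print(f"foo[9] is at 0x{element_address(9):04X}")  # 0x333A
```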
10
Lists – All Languages
Lists store an address and type for each element

Binding table
    Name   Starting Address   Class
    foo    0x32F2             list

Additional memory
    Address   Contents: Starting Address   Contents: Element Type
    0x32F2    0x8C32                       int
    0x3332    0xE333                       float
    0x3372    0x9A12                       tuple

    Address   Contents
    0x8C32    foo[0]
    0xE333    foo[1]
    0x9A12    foo[2]

The references are each the same size, e.g., 64 bits (0x0040).

Lists are more flexible since any data type can be stored in any element, but they're slower. Good data skills include thinking about how to write code that will run in reasonable time even when scaled up to terabytes of data.
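The flexibility difference can be seen in a small sketch. It uses the `array` module from Python's standard library as a stand-in for a single-type array (an assumption for illustration; the earlier slides use numpy arrays), and the variable names are hypothetical.

```python
from array import array

# A list can hold a different type in every element, as in the slide's table.
mixed = [7, 2.5, (1, 2)]
type_names = [type(item).__name__ for item in mixed]
print(type_names)  # ['int', 'float', 'tuple']

# An array declared with type code "i" stores only integers.
ints = array("i", [1, 2, 3])

# Trying to store a float in the integer array is rejected.
try:
    ints.append(2.5)
    accepted = True
except TypeError:
    accepted = False

print(accepted)  # False
```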