Lecture 10 Data Collections


1 Lecture 10 Data Collections
Xiaojuan Cai(蔡小娟) Fall, 2015

2 Objective
To be familiar with the functions for manipulating lists.
To be able to write programs that use lists to manage a collection of information.
To be able to write programs that use lists and classes to structure complex data.
To understand the use of Python dictionaries for storing nonsequential collections.

3 Roadmap List basics List sort method Dictionary
Example: word frequency

4 Large data
Many programs deal with large collections of similar information:
Words in a document
Students in a course
Data from an experiment
Customers of a business
Graphics objects drawn on the screen
Cards in a deck

5 Sample: median
Let's review average4.py from lecture 8. It inputs a sequence of numbers and outputs their average. What if we want to know the median of the numbers instead? The median is the data value that splits the data into two equal-sized parts.
[2, 9, 4, 6, 11]: the median is 6, the middle number after sorting.
[2, 9, 6, 4]: the median is 5, the average of the two middle numbers after sorting.

6 Sample: median One way to determine the median is to store all the numbers, sort them, and identify the middle value. We need some way to remember these values as they are entered. Lists and arrays!
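A minimal Python 3 sketch of the store, sort, and pick-the-middle approach. It assumes the numbers have already been collected into a list (the input loop of average4.py is not reproduced), and the function name median is our own choice.

```python
def median(nums):
    """Return the median of a non-empty list of numbers."""
    values = sorted(nums)   # sorted copy; the caller's list is left unchanged
    size = len(values)
    mid = size // 2
    if size % 2 == 1:
        # Odd length: the single middle value
        return values[mid]
    # Even length: the average of the two middle values
    return (values[mid - 1] + values[mid]) / 2


print(median([2, 9, 4, 6, 11]))  # 6
print(median([2, 9, 6, 4]))      # 5.0
```

Sorting a copy with sorted() rather than calling nums.sort() keeps the function from reordering the caller's data as a side effect.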

7 List
Python lists are sequences of items.
Almost all computer languages have a sequence structure like this, sometimes called an array. Arrays are usually homogeneous (all items have the same type) and fixed in size, whereas Python lists are heterogeneous (items can have different types) and dynamic (they can grow or shrink).
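A small illustration of both properties; the list contents are arbitrary example values.

```python
# A single list can hold items of different types (heterogeneous) ...
mixed = [42, "spam", 3.14, [1, 2]]

# ... and can grow or shrink after it is created (dynamic).
mixed.append(True)   # grows to 5 items
del mixed[0]         # shrinks back to 4 items

print(mixed)         # ['spam', 3.14, [1, 2], True]
print(len(mixed))    # 4
```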

8 Basic list principles A list is a sequence of items stored as a single object. Items in a list can be accessed by indexing, and sublists can be accessed by slicing. Lists are mutable; individual items or entire slices can be replaced through assignment statements.
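A short sketch of indexing, slicing, and replacement through assignment, using made-up values.

```python
items = [10, 20, 30, 40, 50]

print(items[0])     # 10  (indexing starts at 0)
print(items[-1])    # 50  (negative indexes count from the end)
print(items[1:3])   # [20, 30]  (from index 1 up to, but not including, 3)

# Lists are mutable: replace a single item ...
items[0] = 11
# ... or an entire slice; the replacement may have a different length.
items[1:3] = [21, 22, 23]

print(items)        # [11, 21, 22, 23, 40, 50]
```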

9 List operators (built-in)
Operator             Meaning
<seq> + <seq>        Concatenation
<seq> * <int-expr>   Repetition
<seq>[]              Indexing
len(<seq>)           Length
<seq>[:]             Slicing
for <var> in <seq>:  Iteration
<expr> in <seq>      Membership (Boolean)
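Each operator in the table applied to two small example lists:

```python
a = [1, 2]
b = [3]

print(a + b)          # [1, 2, 3]           concatenation
print(a * 3)          # [1, 2, 1, 2, 1, 2]  repetition
print(a[1])           # 2                   indexing
print(len(a + b))     # 3                   length
print((a + b)[0:2])   # [1, 2]              slicing
print(2 in a)         # True                membership
for x in b:           # iteration
    print(x)          # 3
```

The same operators work on strings and tuples, since they are sequences too.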

10 List operations
Method               Meaning
<list>.append(x)     Add element x to the end of the list.
<list>.sort()        Sort (order) the list. In Python 2 a comparison function may be passed as a parameter; Python 3 accepts only the key and reverse parameters.
<list>.reverse()     Reverse the list.
<list>.index(x)      Return the index of the first occurrence of x.
<list>.insert(i, x)  Insert x into the list at index i.
<list>.count(x)      Return the number of occurrences of x in the list.
<list>.remove(x)     Delete the first occurrence of x from the list.
<list>.pop(i)        Delete the ith element of the list and return its value (pop() with no argument removes and returns the last element).
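The methods from the table in sequence on one example list, with the list's contents shown after each step:

```python
nums = [3, 1, 4, 1, 5]

nums.append(9)            # [3, 1, 4, 1, 5, 9]
nums.insert(0, 2)         # [2, 3, 1, 4, 1, 5, 9]
print(nums.index(1))      # 2  (position of the first 1)
print(nums.count(1))      # 2
nums.remove(1)            # removes only the first 1: [2, 3, 4, 1, 5, 9]
last = nums.pop()         # no argument: removes and returns the last item (9)
first = nums.pop(0)       # removes and returns the item at index 0 (2)
nums.reverse()            # [5, 1, 4, 3]
nums.sort()               # [1, 3, 4, 5]
print(nums, last, first)  # [1, 3, 4, 5] 9 2
```

Note that append, insert, remove, reverse, and sort change the list in place and return None; only index, count, and pop return a useful value.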

11 Roadmap List basics List sort method Dictionary
Example: word frequency

12 Sort the list
mylist.sort(): sorts mylist in place and returns None
sorted(mylist): returns a new sorted list, leaving mylist unchanged
mylist.sort(reverse = True): sorts in descending order
Lists and classes can express very complex data types. What if we want to sort a list of objects, such as students?
Sorting HOW TO:
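The difference between sort() and sorted(), shown on some example scores:

```python
scores = [88, 72, 95, 60]

result = scores.sort()   # sorts in place ...
print(result)            # None  (... and returns nothing useful)
print(scores)            # [60, 72, 88, 95]

data = [88, 72, 95, 60]
ordered = sorted(data)   # returns a new sorted list
print(ordered)           # [60, 72, 88, 95]
print(data)              # [88, 72, 95, 60]  (original unchanged)

data.sort(reverse=True)  # descending order, in place
print(data)              # [95, 88, 72, 60]
```

A common mistake is writing scores = scores.sort(), which replaces the list with None.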

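13 Sort the list
sort(key = func, reverse = True): the newer style
sort(cmp = func, reverse = True): the older style (Python 2 only; Python 3 removed the cmp parameter)
key is a function that, given an element, returns the value to compare in place of that element.
cmp is a function that, given two elements, returns a negative number, zero, or a positive number (typically -1, 0, or 1).
Both are often written as lambda expressions:
cmp = lambda s, t: cmp(s.getScore(), t.getScore())
key = lambda s: s.getScore()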
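A Python 3 sketch of both styles on a list of objects. The Student class here is a hypothetical minimal stand-in with only a name and a getScore method. Because Python 3's sort() no longer accepts cmp=, the cmp-style function is wrapped with functools.cmp_to_key.

```python
from functools import cmp_to_key


class Student:
    """Hypothetical minimal student record used only for this example."""

    def __init__(self, name, score):
        self.name = name
        self.score = score

    def getScore(self):
        return self.score


students = [Student("Ann", 82), Student("Bob", 95), Student("Cy", 70)]

# key style: students are ordered by the value the key function returns.
students.sort(key=lambda s: s.getScore(), reverse=True)
print([s.name for s in students])   # ['Bob', 'Ann', 'Cy']


# cmp style: a two-argument function returning negative, zero, or positive.
def compare_scores(s, t):
    return s.getScore() - t.getScore()


# cmp_to_key adapts the two-argument function for Python 3's key parameter.
students.sort(key=cmp_to_key(compare_scores))
print([s.name for s in students])   # ['Cy', 'Ann', 'Bob']
```

The key style is usually simpler and faster, because the key function is called once per element instead of once per comparison.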

14 Tuples
Tuples are immutable sequences (like strings), e.g., (1, 'a', 3).
They support indexing and slicing.
Empty tuple: (); one-element tuple: (1,). Note that (1) is not a one-element tuple; it is just the integer 1 in parentheses, so the trailing comma is required.
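A quick check of the tuple rules above; the variable names are arbitrary.

```python
point = (1, "a", 3)

print(point[0])      # 1         indexing works as for lists
print(point[1:])     # ('a', 3)  slicing returns a new tuple

empty = ()
single = (1,)        # the trailing comma makes this a tuple
not_a_tuple = (1)    # just the integer 1 in parentheses

print(type(single).__name__)       # tuple
print(type(not_a_tuple).__name__)  # int

try:
    point[0] = 99    # tuples are immutable, so this raises TypeError
except TypeError as err:
    print("TypeError:", err)
```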

15 Roadmap List basics List sort method Dictionary
Example: word frequency

16 Dictionary
In sequences, we look up data by position, or index.
Sometimes we need a more flexible way, for example, retrieving students by ID number. The data used for lookup (the ID number) is called the key, and the associated data (the student information) is called the value. A collection that associates values with arbitrary keys is called a mapping. A Python dictionary is a mapping. Other languages call mappings hashes or associative arrays.

17 Dictionary basics
Syntax:
pwd = {"guido": "supercomputer", "turing": "genius", "bill": "monopoly"}
Keys must be immutable (more precisely, hashable) values, such as strings, numbers, and tuples of immutable values. Values can be of any data type.
Indexing: mydic[key], e.g., pwd["turing"]
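A sketch of creating and indexing the pwd dictionary from the slide, plus two error cases worth knowing: an unhashable key and a missing key. The grid dictionary is an extra illustrative example, not from the slides.

```python
pwd = {"guido": "supercomputer", "turing": "genius", "bill": "monopoly"}

print(pwd["turing"])   # genius

# Tuples of immutable values can be keys, e.g. grid coordinates.
grid = {(0, 0): "origin", (1, 2): "point"}
print(grid[(1, 2)])    # point

# A list is mutable (unhashable), so using it as a key raises TypeError.
try:
    grid[[0, 0]] = "bad"
except TypeError as err:
    print("TypeError:", err)

# Looking up a key that is not present raises KeyError.
try:
    pwd["nobody"]
except KeyError:
    print("no such key")
```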

18 Dictionary is mutable
Change a value: pwd["bill"] = "bluescreen"
Insert a key-value pair: pwd["zelle"] = "goodwriter"
Delete a key-value pair: del pwd["turing"]
Delete all entries, leaving an empty dictionary: pwd.clear()
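The four updates from the slide, run in order. The printed order of the keys assumes Python 3.7 or later, where dictionaries remember insertion order.

```python
pwd = {"guido": "supercomputer", "turing": "genius", "bill": "monopoly"}

pwd["bill"] = "bluescreen"    # change the value for an existing key
pwd["zelle"] = "goodwriter"   # insert a new key-value pair
del pwd["turing"]             # delete a key-value pair

print(pwd)
# {'guido': 'supercomputer', 'bill': 'bluescreen', 'zelle': 'goodwriter'}

result = pwd.clear()          # empties the dictionary in place
print(result, pwd)            # None {}
```

Assigning to a key that already exists overwrites its value; the same syntax inserts a new pair when the key is absent.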

19 Dictionary operations
Method                        Meaning
<dict>.has_key(<key>)         Returns True if the dictionary contains the key, False otherwise (Python 2 only; removed in Python 3).
<key> in <dict>               Same as has_key; works in both Python 2 and Python 3.
<dict>.keys()                 Returns a list of keys (a view object in Python 3).
<dict>.values()               Returns a list of values (a view object in Python 3).
<dict>.items()                Returns a list of (key, value) tuples representing the key-value pairs (a view object in Python 3).
<dict>.get(<key>, <default>)  If the dictionary has the key, returns its value; otherwise, returns default.
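The same operations in Python 3, where `in` replaces has_key and keys(), values(), and items() return views (wrapped in list() here so they print like Python 2 lists). The listed order again assumes Python 3.7 or later.

```python
pwd = {"guido": "supercomputer", "turing": "genius", "bill": "monopoly"}

print("turing" in pwd)          # True  (Python 3 replacement for has_key)
print(list(pwd.keys()))         # ['guido', 'turing', 'bill']
print(list(pwd.values()))       # ['supercomputer', 'genius', 'monopoly']
print(list(pwd.items())[0])     # ('guido', 'supercomputer')
print(pwd.get("bill", "?"))     # monopoly
print(pwd.get("zelle", "?"))    # ?  (missing key: the default is returned)

for user, password in pwd.items():   # iterate over key-value pairs
    print(user, "->", password)
```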

20 Example: word frequency
Input: a text document and a number n
Output: the n most frequent words in the document
Algorithm:
Use a dictionary to accumulate the frequency of each word.
Get the items of the dictionary.
Sort the items by frequency.
Print the first n pairs.

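21 wordfreq.py
Counting each word w:
if w is in the dictionary: add one to the count of w
else: set the count for w to 1
Using get, this becomes a single statement: counts[w] = counts.get(w, 0) + 1
Sorting by frequency (Python 2): define a comparison function compareItems((w1,c1),(w2,c2)), then sort a saved list of the items:
items = counts.items()
items.sort(compareItems)
Writing counts.items().sort(compareItems) would sort a temporary list and immediately discard it, because sort() returns None.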
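A Python 3 sketch of the whole algorithm. It is not the original wordfreq.py: the function names word_frequencies and top_words, the lowercasing, and the punctuation stripping are our own assumptions. Instead of compareItems, it uses two stable sorts: first alphabetically, then by count, so words with equal counts stay in alphabetical order.

```python
import string


def word_frequencies(text):
    """Count how many times each word occurs in text (case-insensitive)."""
    text = text.lower()
    # Replace punctuation with spaces so "mat." and "mat" count as the same word.
    for ch in string.punctuation:
        text = text.replace(ch, " ")
    counts = {}
    for w in text.split():
        counts[w] = counts.get(w, 0) + 1   # start from 0 the first time w is seen
    return counts


def top_words(text, n):
    """Return the n most frequent (word, count) pairs, most frequent first."""
    items = list(word_frequencies(text).items())
    items.sort()                                        # alphabetical by word
    items.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep that order
    return items[:n]


sample = "the cat sat on the mat. The mat was flat."
for word, count in top_words(sample, 3):
    print(f"{word:<10}{count:>5}")
```

For the sample text this prints the with 3, mat with 2, and cat with 1 (cat wins the tie among the words that occur once because it comes first alphabetically).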

22 Conclusion
Python lists are very useful data structures.
We can use them to implement other important data structures, such as stacks and queues.
Python lists are heterogeneous and dynamic (they can grow and shrink arbitrarily).
Lists and classes can represent complex data.
The dictionary is an important non-sequential data structure. It is also called a mapping, hash, or associative array.
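A minimal sketch of a list used as a stack and as a queue. The deque part goes slightly beyond the slides: collections.deque is the standard-library alternative we would suggest when a queue grows long, because list.pop(0) has to shift every remaining item.

```python
from collections import deque

# Stack (last in, first out): push with append, pop from the end.
stack = []
stack.append(1)
stack.append(2)
stack.append(3)
print(stack.pop())   # 3
print(stack.pop())   # 2

# Queue (first in, first out) with a plain list: add at the end, remove from the front.
queue = []
queue.append("a")
queue.append("b")
print(queue.pop(0))  # a

# deque supports fast appends and pops at both ends.
dq = deque(["a", "b"])
dq.append("c")
print(dq.popleft())  # a
```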

