Download presentation
Presentation is loading. Please wait.
Published byMagdalen Kennedy Modified over 9 years ago
1
Storage Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See http://software-carpentry.org/license.html for more information. Sets and Dictionaries
2
Introduction Let's try an experiment
3
Sets and DictionariesIntroduction >>> things = set() >>> things.add('a string') >>> print things set(['a string']) Let's try an experiment
4
Sets and DictionariesIntroduction >>> things = set() >>> things.add('a string') >>> print things set(['a string']) >>> things.add([1, 2, 3]) TypeError: unhashable type: 'list' Let's try an experiment
5
Sets and DictionariesIntroduction >>> things = set() >>> things.add('a string') >>> print things set(['a string']) >>> things.add([1, 2, 3]) TypeError: unhashable type: 'list' Let's try an experiment What's wrong?
6
Sets and DictionariesIntroduction >>> things = set() >>> things.add('a string') >>> print things set(['a string']) >>> things.add([1, 2, 3]) TypeError: unhashable type: 'list' Let's try an experiment What's wrong? And what does the error message mean?
7
Sets and DictionariesIntroduction How are sets stored in a computer's memory?
8
Sets and DictionariesIntroduction How are sets stored in a computer's memory? Could use a list
9
Sets and DictionariesIntroduction def set_create(): return [] How are sets stored in a computer's memory? Could use a list
10
Sets and DictionariesIntroduction def set_create(): return [] def set_in(set_list, item): for thing in set_list: if thing == item: return True return False How are sets stored in a computer's memory? Could use a list
11
Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item)
12
Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item) How efficient is this?
13
Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item) How efficient is this? With N items in the set, in and add take 1 to N steps
14
Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item) How efficient is this? With N items in the set, in and add take 1 to N steps "Average" is N/2
15
Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item) How efficient is this? With N items in the set, in and add take 1 to N steps "Average" is N/2 It's possible to do much better
16
Sets and DictionariesIntroduction def set_add(set_list, item): for thing in set_list: if thing == item: return set.append(item) How efficient is this? With N items in the set, in and add take 1 to N steps "Average" is N/2 It's possible to do much better But the solution puts some constraints on programs
17
Sets and DictionariesIntroduction Start simple: how do we store a set of integers?
18
Sets and DictionariesIntroduction Start simple: how do we store a set of integers? If the range of possible values is small and fixed, use a list of Boolean flags ("present" or "absent")
19
Sets and DictionariesIntroduction Start simple: how do we store a set of integers? If the range of possible values is small and fixed, use a list of Boolean flags ("present" or "absent") True Fals e True 0 1 2 == {0, 2}
20
Sets and DictionariesIntroduction Start simple: how do we store a set of integers? If the range of possible values is small and fixed, use a list of Boolean flags ("present" or "absent") But what if the range of values is large, or can change over time? True Fals e True 0 1 2 == {0, 2}
21
Sets and DictionariesIntroduction Use a fixed-size hash table of length L
22
Sets and DictionariesIntroduction Use a fixed-size hash table of length L Store the integer I at location I % L
23
Sets and DictionariesIntroduction Use a fixed-size hash table of length L Store the integer I at location I % L '%' is the remainder operator
24
Sets and DictionariesIntroduction Use a fixed-size hash table of length L Store the integer I at location I % L '%' is the remainder operator 0 1 2 3 4 1625 101 3378 == {3378, 1625, 101}
25
Sets and DictionariesIntroduction Time to insert or look up is constant (!)
26
Sets and DictionariesIntroduction Time to insert or look up is constant (!) But what do we do when there's a collision?
27
Sets and DictionariesIntroduction Time to insert or look up is constant(!) But what do we do when there's a collision? 0 1 2 3 4 1625 101 3378 + 206
28
Sets and DictionariesIntroduction Option #1: store it in the next empty slot 0 1 2 3 4 1625 101 3378 206
29
Sets and DictionariesIntroduction Option #2: chain values together 0 1 2 3 4 1625 101 3378 206
30
Sets and DictionariesIntroduction Either works well until the table is about 3/4 full
31
Sets and DictionariesIntroduction Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly
32
Sets and DictionariesIntroduction Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly So enlarge the table
33
Sets and DictionariesIntroduction 0 1 2 3 4 1625 101 3378 Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly So enlarge the table
34
Sets and DictionariesIntroduction 0 1 2 3 4 1625 101 3378 + 206 Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly So enlarge the table
35
Sets and DictionariesIntroduction 0 1 2 3 4 1625 101 3378 + 206 0 1 2 3 4 1625 101 3378 5 6 7 8 => Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly So enlarge the table
36
Sets and DictionariesIntroduction 0 1 2 3 4 1625 101 3378 + 206 0 1 2 3 4 1625 101 3378 5 6 7 8 => 206 Either works well until the table is about 3/4 full Then average time to look up/insert rises rapidly So enlarge the table
37
Sets and DictionariesIntroduction How do we store strings?
38
Sets and DictionariesIntroduction How do we store strings? Use a hash function to generate an integer index based on the characters in the string
39
Sets and DictionariesIntroduction "zebra "
40
Sets and DictionariesIntroduction "zebra " z e b r a ==
41
Sets and DictionariesIntroduction "zebra " z e b r a == 122 101 98 114 97 ==
42
Sets and DictionariesIntroduction "zebra " z e b r a == 122 101 98 114 97 == 532
43
Sets and DictionariesIntroduction 0 1 2 3 4 "zebra " z e b r a == 122 101 98 114 97 == 532% 5
44
Sets and DictionariesIntroduction 0 1 2 3 4 "zebra " z e b r a == 122 101 98 114 97 == 532% 5 If we can define a hash function for something, we can store it in a set
45
Sets and DictionariesIntroduction 0 1 2 3 4 "zebra " z e b r a == 122 101 98 114 97 == 532% 5 If we can define a hash function for something, we can store it in a set So long as nothing changes behind our back
46
Sets and DictionariesIntroduction 0 1 2 3 4 z e b r a This is what the previous example really looks like in memory
47
Sets and DictionariesIntroduction 0 1 2 3 4 z e b r a This is what the previous example really looks like in memory Let's take a look at what happens if we use a list
48
Sets and DictionariesIntroduction ['z','e','b','r','a ']
49
Sets and DictionariesIntroduction ['z','e','b','r','a '] 0 1 2 3 4 'z' 'e' 'r' == 'b' 'a'
50
Sets and DictionariesIntroduction 0 1 2 3 4 ['z','e','b','r','a '] 0 1 2 3 4 'z' 'e' 'r' == 'b' 'a' 523 % 5
51
Sets and DictionariesIntroduction 0 1 2 3 4 ['z','e','b','r','a '] 0 1 2 3 4 'z' 'e' 'r' == 'b' 'a' 523 % 5
52
Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 'z' 'e' 'r' 'b' 'a' This is what's actually in memory
53
Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 'z' 'e' 'r' 'b' 'a' This is what's actually in memory What happens if we change the values in the list?
54
Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 'z' 'e' 'r' 'b' 'a'
55
Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 's' 'e' 'r' 'b' 'a'
56
Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 's' 'e' 'r' 'b' 'a' 523 % 5
57
Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place!
58
Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place! ['s','e','b','r','a'] in S
59
Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place! ['s','e','b','r','a'] in S looks at index 0 and says False
60
Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place! ['s','e','b','r','a'] in S looks at index 0 and says False ['z','e','b','r','a'] in S
61
Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place! ['s','e','b','r','a'] in S looks at index 0 and says False ['z','e','b','r','a'] in S looks at index 2 and says True
62
Sets and DictionariesIntroduction 0 1 2 3 4 0 1 2 3 4 's' 'e' 'r' 'b' 'a' 523 % 5 The list is stored in the wrong place! ['s','e','b','r','a'] in S looks at index 0 and says False ['z','e','b','r','a'] in S looks at index 2 and says True (or blows up)
63
Sets and DictionariesIntroduction This problem arises with any mutable structure
64
Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes
65
Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive
66
Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive Option #2: allow it, and blame the programmer
67
Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive Option #2: allow it, and blame the programmer Very expensive when it goes wrong
68
Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive Option #2: allow it, and blame the programmer Very expensive when it goes wrong Option #3: only permit immutable objects in sets
69
Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive Option #2: allow it, and blame the programmer Very expensive when it goes wrong Option #3: only permit immutable objects in sets (If an object can't change, neither can its hash value)
70
Sets and DictionariesIntroduction This problem arises with any mutable structure Option #1: keep track of the sets an object is in, and update pointers every time the object changes Very expensive Option #2: allow it, and blame the programmer Very expensive when it goes wrong Option #3: only permit immutable objects in sets (If an object can't change, neither can its hash value) Slightly restrictive, but never disastrous
71
Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name?
72
Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name? Option #1: concatenate them
73
Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name? Option #1: concatenate them 'Charles' and 'Darwin' stored as 'Charles|Darwin'
74
Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name? Option #1: concatenate them 'Charles' and 'Darwin' stored as 'Charles|Darwin' (Can't use space to join 'Paul Antoine' and 'St. Cyr')
75
Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name? Option #1: concatenate them 'Charles' and 'Darwin' stored as 'Charles|Darwin' (Can't use space to join 'Paul Antoine' and 'St. Cyr') But data always changes...
76
Sets and DictionariesIntroduction So how do we store values that naturally have several parts, like first name and last name? Option #1: concatenate them 'Charles' and 'Darwin' stored as 'Charles|Darwin' (Can't use space to join 'Paul Antoine' and 'St. Cyr') But data always changes... Code has to be littered with joins and splits
77
Sets and DictionariesIntroduction Option #2 (in Python): use a tuple
78
Sets and DictionariesIntroduction Option #2 (in Python): use a tuple An immutable list
79
Sets and DictionariesIntroduction Option #2 (in Python): use a tuple An immutable list Contents cannot be changed after tuple is created
80
Sets and DictionariesIntroduction >>> full_name = ('Charles', 'Darwin')
81
Sets and DictionariesIntroduction >>> full_name = ('Charles', 'Darwin') Use '()' instead of '[]'
82
Sets and DictionariesIntroduction >>> full_name = ('Charles', 'Darwin') >>> full_name[0] Charles
83
Sets and DictionariesIntroduction >>> full_name = ('Charles', 'Darwin') >>> full_name[0] Charles >>> full_name[0] = 'Erasmus' TypeError: 'tuple' object does not support item assignment
84
Sets and DictionariesIntroduction >>> full_name = ('Charles', 'Darwin) >>> full_name[0] Charles >>> full_name[0] = 'Erasmus' TypeError: 'tuple' object does not support item assignment >>> names = set() >>> names.add(full_name) >>> names set([('Charles', 'Darwin')])
85
Sets and DictionariesIntroduction This episode has been about the science of computer science
86
Sets and DictionariesIntroduction This episode has been about the science of computer science - Designs for hash tables
87
Sets and DictionariesIntroduction This episode has been about the science of computer science - Designs for hash tables - Mutability, usability, and performance
88
Sets and DictionariesIntroduction This episode has been about the science of computer science - Designs for hash tables - Mutability, usability, and performance It's a lot to digest in one go...
89
Sets and DictionariesIntroduction This episode has been about the science of computer science - Designs for hash tables - Mutability, usability, and performance It's a lot to digest in one go......but sometimes you need a little theory to make sense of practice
90
July 2010 created by Greg Wilson Copyright © Software Carpentry 2010 This work is licensed under the Creative Commons Attribution License See http://software-carpentry.org/license.html for more information.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.