Topic 5: Hashes
CSE2395/CSE3395 Perl Programming
Learning Perl 3rd edition, chapter 5
Programming Perl 3rd edition, pages 76-78
perldata manpage

Original Slides by Debbie Pickett, Modified by David Abramson, 2006, Copyright Monash University

In this topic
Hashes
► aka associative arrays
Hash variables
Functions which use hashes
Uses of hashes
Accessing Perl’s environment

Arrays
Arrays are
► ordered
► indexed by a number (integer)
► dense
    – if element n exists, so do elements 0 to n-1
[Figure: an array of scalars such as 42, "dog", -0.2 and undef, labelled with their indices]

Arrays
Arrays aren’t always the best data structure
Imagine an array of students’ marks
► indexed by 8-digit student ID number
[Figure: the marks array, almost entirely undef. Ten million empty elements in here!]

Arrays
Student ID numbers aren’t really numbers anyway
► can’t do arithmetic on them
► order of two student IDs not really important
► really just strings that happen to contain digits
Want some data structure where indices are strings
► usually called associative arrays
    – or dictionary
    – or (lookup) table
    – or hash table

Associative arrays
Associative array is an array where
► can locate an array element’s value given index
► indices are strings
► indices are unique
► indices are unordered
For example, to look up capital cities of countries
    Index:  Peru   Japan   UK       Russia   Canada   Egypt
    Value:  Lima   Tokyo   London   Moscow   Ottawa   Cairo
In Perl, associative arrays are called “hashes” (because they’re implemented using hash tables)

Hashes in Perl
Indices called keys
► strings
► must be unique
► e.g., country names
Contents called values
► any scalar
► may be duplicated
► e.g., capital city names
Can look up value given key, but not vice versa
► What’s the capital of Egypt? (easy)
► What country is Monrovia the capital of? (hard)
Unordered
► You can’t sort a hash!
► Perl stores elements in an order optimized for fast lookup
Llama3 pages 73-74; Camel3 pages 51, 76-77; perldata manpage

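The "easy" and "hard" directions can be sketched as follows. This is a minimal illustration, assuming a small %capital table with a Liberia entry added for the example; the helper name country_of is hypothetical. The forward lookup is a single hash access, while the reverse lookup has to scan every key.

```perl
use strict;
use warnings;

my %capital = (
    Peru    => "Lima",
    Egypt   => "Cairo",
    Liberia => "Monrovia",   # entry added for this illustration
);

# Forward lookup (key to value): one hash access.
my $city = $capital{"Egypt"};

# Reverse lookup (value to key): no direct access, so scan all keys.
# country_of is a hypothetical helper, not a Perl built-in.
sub country_of {
    my ($wanted, %table) = @_;
    foreach my $nation (sort keys %table) {
        return $nation if $table{$nation} eq $wanted;
    }
    return undef;   # no country has that capital
}

my $nation = country_of("Monrovia", %capital);
print "Capital of Egypt is $city\n";
print "Monrovia is the capital of $nation\n";
```

Because values may be duplicated, a reverse lookup can match more than one key; this sketch simply returns the first match in sorted key order.
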
Hash elements
Hash key written inside { curly braces }
► contrast with normal arrays using [ square brackets ]
► $capital{"Egypt"} # Equal to "Cairo"
► $capital{$nation} # Depends on $nation
Can assign to a hash element
► overwrites the old value, if there was one
    – or creates a new element, if there wasn’t
► doesn’t change any other element
► $capital{"Australia"} = "Canberra";
Using nonexistent key returns undef
► $capital{"Atlantis"} # No such country
Llama3 pages 76-78; Camel3 page 67

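The element operations above can be tried out in a few lines. This is a small sketch, assuming a %capital hash that starts with a single entry; the variable names are chosen for the example.

```perl
use strict;
use warnings;

my %capital = (Egypt => "Cairo");

# Read an element with a literal key and with a key held in a variable.
my $nation  = "Egypt";
my $literal = $capital{"Egypt"};
my $lookup  = $capital{$nation};

# Assigning to a missing key creates it; assigning again overwrites it.
$capital{"Australia"} = "Sydney";
$capital{"Australia"} = "Canberra";

# Reading a missing key yields undef and does not create the key.
my $missing = $capital{"Atlantis"};

print "Egypt: $literal, Australia: $capital{'Australia'}\n";
print "Atlantis is ", (defined $missing ? $missing : "undef"), "\n";
```
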
Testing hash elements
Can determine if hash key exists using exists function
► exists $capital{"Canada"} # True
► exists $capital{"Atlantis"} # False
Not same as using defined
► key can exist, but value can be undefined
► exists $capital{"Vatican City"} # True
► defined $capital{"Vatican City"} # False
Llama3 page 83; Camel3; perlfunc manpage

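The difference between exists and defined is easiest to see with all three cases side by side. The sketch below assumes a %capital hash in which "Vatican City" is present but has an undefined value, as on the slide; the %status hash is only there to hold the result of each test.

```perl
use strict;
use warnings;

my %capital = (Canada => "Ottawa", "Vatican City" => undef);

my %status;
foreach my $nation ("Canada", "Vatican City", "Atlantis") {
    if (!exists $capital{$nation}) {
        $status{$nation} = "no such key";
    }
    elsif (!defined $capital{$nation}) {
        $status{$nation} = "key exists, value undefined";
    }
    else {
        $status{$nation} = "capital is $capital{$nation}";
    }
    print "$nation: $status{$nation}\n";
}
```
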
Deleting hash elements
To remove an entry from a hash, use delete function
► delete $capital{"Czechoslovakia"};
► exists will now return false for that key
To clear a hash, assign empty list to entire hash
► %capital = (); # World anarchy
Llama3 pages 76-77, 83-84; Camel3; perlfunc manpage

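A short sketch of delete next to its near-relatives, assuming a three-entry %capital hash made up for the example. It contrasts deleting a key with assigning undef to it, which leaves the key in place.

```perl
use strict;
use warnings;

my %capital = (Czechoslovakia => "Prague", Canada => "Ottawa", Egypt => "Cairo");

# delete removes the key and returns the value it held.
my $removed     = delete $capital{"Czechoslovakia"};
my $still_there = exists $capital{"Czechoslovakia"} ? 1 : 0;

# Assigning undef is different: the key remains, only the value is gone.
$capital{"Egypt"} = undef;
my $egypt_exists = exists $capital{"Egypt"} ? 1 : 0;

my $remaining = keys %capital;   # scalar context: number of keys left

%capital = ();                   # clear the whole hash
my $after_clear = keys %capital;

print "removed $removed; $remaining keys left, then $after_clear\n";
```
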
Entire hashes
To refer to an entire hash, use %hash
► % instead of $
► no curly braces
Can copy hashes
► %clone = %hash;
Can initialize hash with many elements by assigning list to it
► for each element, write key followed by value
► order of key/value pairs not important
► %capital = ("Peru", "Lima", "Japan", "Tokyo", "UK", "London", "Russia", "Moscow", "Canada", "Ottawa", "Egypt", "Cairo");
Hashes flatten back into lists when used in list context
► e.g., when passed to a subroutine
Llama3 pages 78-79; Camel3 pages 76-78

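Copying and flattening can be checked with a two-entry hash. This is a sketch only; count_args is a hypothetical helper that reports how many arguments it received, and the value "Cusco" is just a marker to show that the copy is independent.

```perl
use strict;
use warnings;

my %capital = ("Peru", "Lima", "Japan", "Tokyo");

# Copying makes an independent hash.
my %clone = %capital;
$clone{"Peru"} = "Cusco";

# In list context a hash flattens to key, value, key, value, ...
my @flat = %capital;

# A subroutine sees the same flattened list in @_.
sub count_args { return scalar @_ }
my $passed = count_args(%capital);

# The flattened list can be turned back into a hash.
my %rebuilt = @flat;

print "original Peru: $capital{Peru}, clone Peru: $clone{Peru}\n";
print "flattened length: ", scalar(@flat), ", arguments passed: $passed\n";
```

The pairs in @flat stay together, but which pair comes first is not specified.
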
Hash elements
Hashes, subroutines, arrays and scalars occupy different namespaces
► %x, $x{... } refer to hash %x
► @x, $x[... ] refer to array @x
► &x, x(... ) refer to subroutine &x
► $x refers to scalar $x
Hash elements interpolate into double-quoted strings
► print "The capital of $nation is $capital{$nation}\n";
Entire hashes don’t interpolate at all.
► print "%capital"; # Prints "%capital"

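The separate namespaces can be demonstrated by giving a scalar, an array, a hash and a subroutine the same name. The sketch below uses the name item (an arbitrary choice for the example) and also shows that a whole hash is not interpolated.

```perl
use strict;
use warnings;

# Four unrelated things that share one name.
my $item = "scalar";
my @item = ("array element");
my %item = (key => "hash element");
sub item { return "subroutine" }

# The punctuation after the name decides which one is meant.
my $line = "$item / $item[0] / $item{key} / " . item();

# An entire hash does not interpolate: the text stays as written.
my $literal = "%item";

print "$line\n$literal\n";
```
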
Functions that use hashes
How do you print out the contents of a hash?
► need to know what keys a hash has
    – from each key, can get value with $hash{key}
keys function returns a list of all keys in a hash
► order is indeterminate, but stays the same as long as the hash is not modified
    – newer versions of Perl may use a different order on each run of the program
► every key is unique
    – by definition of hash
► keys %capital # Returns list ("Canada", "UK", "Egypt", "Japan", "Peru", "Russia") (maybe)
values function returns a list of all values in a hash
► order is same as from keys function
► values may be duplicated
    – values may be any scalar
► values %capital # Returns list ("Ottawa", "London", "Cairo", "Tokyo", "Lima", "Moscow")
Llama3 pages 80-81; Camel3 page 824; perlfunc manpage

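The claim that values come back in the same order as keys can be checked directly. This sketch assumes a three-entry %capital hash; $pairs_match is a flag introduced for the example.

```perl
use strict;
use warnings;

my %capital = (Peru => "Lima", Japan => "Tokyo", UK => "London");

my @nations = keys %capital;
my @cities  = values %capital;

# Position i of @cities should be the value for position i of @nations.
my $pairs_match = 1;
for my $i (0 .. $#nations) {
    $pairs_match = 0 unless $capital{ $nations[$i] } eq $cities[$i];
}

# In scalar context, keys gives the number of keys.
my $count = keys %capital;

print "$count keys; orders ", ($pairs_match ? "agree" : "differ"), "\n";
```
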
Timeout
    # Printing an entire hash using keys function.
    # Initialize the hash.
    # The => notation is just a pretty-looking
    # synonym for the , (comma) operator that also quotes
    # the word on the left side. Great for hashes.
    %capital = (Peru => "Lima", Japan => "Tokyo", UK => "London",
                Russia => "Moscow", Canada => "Ottawa", Egypt => "Cairo");
    # Iterate over the hash, once per nation.
    # Order is indeterminate.
    foreach $nation (keys %capital)
    {
        print "Capital of $nation is $capital{$nation}\n";
    }

Timeout
    # Printing an entire hash, sorted by country.
    # Initialize the hash.
    %capital = (Peru => "Lima", Japan => "Tokyo", UK => "London",
                Russia => "Moscow", Canada => "Ottawa", Egypt => "Cairo");
    # Iterate over the hash, once per nation.
    # Note that this isn't sorting the hash,
    # nor even iterating over the hash, but
    # iterating over a sorted list of the hash's keys.
    foreach $nation (sort keys %capital)
    {
        print "Capital of $nation is $capital{$nation}\n";
    }

Functions that use hashes
keys may return a very large list
► perhaps inefficient if you need only one hash element at a time
each function iterates over a hash
► one element at a time
► on first call, returns a two-element list containing one key/value pair
► subsequent calls return other key/value pairs
    – order indeterminate, but guaranteed not to repeat any pairs
► when all key/value pairs have been returned once, returns empty list
► state is kept by Perl with hidden attribute on hash variable
► much more space-efficient than using keys
► typical use
    – while (($key, $value) = each %hash) {... }
Llama3 pages 81-82; Camel3; perlfunc manpage

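Because each keeps its position inside the hash itself, a half-finished iteration affects the next loop over the same hash. The sketch below assumes the six-entry %capital hash from this topic. It shows one way to start afresh: calling keys on the hash resets the iterator.

```perl
use strict;
use warnings;

my %capital = (Peru => "Lima", Japan => "Tokyo", UK => "London",
               Russia => "Moscow", Canada => "Ottawa", Egypt => "Cairo");

# Consume one pair, leaving the hash's iterator part-way through.
my ($first_nation, $first_city) = each %capital;

# Calling keys resets the iterator, so the next loop starts from the beginning.
my $count = keys %capital;

my $visited = 0;
while (my ($nation, $city) = each %capital) {
    $visited++;
}
print "visited $visited of $count pairs\n";
```

Without the call to keys, the loop would pick up where the earlier each left off and visit one pair fewer.
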
Timeout
    # Printing an entire hash, using each function.
    # Initialize the hash.
    %capital = (Peru => "Lima", Japan => "Tokyo", UK => "London",
                Russia => "Moscow", Canada => "Ottawa", Egypt => "Cairo");
    # Iterate over the hash, once per nation.
    # No provision for sorting the output here,
    # because order returned by each function
    # is indeterminate.
    while (($nation, $city) = each %capital)
    {
        print "Capital of $nation is $city\n";
    }

Uses of hashes
Hashes useful for
► implementing sparse arrays
► implementing lookup tables/databases
► counting strings
► removing duplicates from a list
► passing named parameters to subroutines
Llama3 pages 75-76

Hashes: sparse arrays
Normal arrays are dense
► creating $a[10000] creates elements 0 to 9999 too
Hash keys are independent
► creating $h{"10000"} creates no other elements
    – only elements that exist need to take up memory
► just have to pretend that keys (really strings) are integers
    – like student ID numbers
► may have to write some code to fake “order” of elements
    – foreach $element (sort {$a <=> $b} keys %h)

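The need for a numeric comparison when faking order can be seen by sorting the same keys both ways. This is a sketch with a made-up %mark hash holding three widely spaced indices.

```perl
use strict;
use warnings;

# A "sparse array": only three elements exist, however large the indices.
my %mark;
$mark{"10000"} = 70;
$mark{"9"}     = 55;
$mark{"250"}   = 82;

my $stored = keys %mark;

# Default sort compares keys as strings, which is not index order.
my @as_strings = sort keys %mark;

# A numeric comparison restores array-like order.
my @as_numbers = sort { $a <=> $b } keys %mark;

print "$stored elements stored\n";
print "string order:  @as_strings\n";
print "numeric order: @as_numbers\n";
```
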
Hashes: lookup table
Using hash, can look up string (value) given string (key)
► look up the capital of a country
    – capital of Malaysia is Kuala Lumpur
► look up a word in a dictionary
    – definition of dog is “domestic canine”
► look up the IP address of a machine
    – e.g., the address of slashdot.org
► look up the value of a variable in an interpreter
    – value of variable x is 5
► look up the title of a book
    – an ISBN maps to “Programming Perl”
► look up the real name of a student
    – a student ID maps to Bart Simpson
Any relationship where each key maps to exactly one value is perfect for a hash

Timeout
    # Using the program's environment
    # All processes have a set of names and values which
    # they inherit from their parents. These can be
    # set in the shell by typing NAME=VALUE.
    print "Your home directory is $ENV{'HOME'}\n";
    if ($ENV{'SHELL'} eq "/bin/csh")
    {
        # Commiserate with user.
        print "Your shell is csh. Yuck!\n";
    }
    print "Commands are looked for in these dirs:\n";
    print " $_\n" foreach (split /:/, $ENV{'PATH'}); # split: Topic 7

Hashes: counting strings
Use hash to count frequency of strings
► key is the string (“dog”)
► value (integer) is the count (has been seen 3 times so far)
► increment the value every time a key is read
Can be used to find intersection (common elements) between two arrays
► iterate over first array: count elements found
► iterate over second array: include element in result only if it was seen in the first array
► can compute union and difference similarly

Timeout
    # Counting strings.
    %seen = (); # Nothing has been seen so far.
    while (<>) # Read lines from input.
    {
        chomp;
        # Increment the counter with line's text as key.
        $seen{$_}++;
        print "$_ has been seen $seen{$_} times so far\n";
    }
    # Final report.
    while (($line, $count) = each %seen)
    {
        print "$line was seen $count times overall\n";
    }

Timeout
    # Intersection of two arrays.
    # (@first, @second and @result are placeholder names.)
    %seen = ();
    @result = ();
    foreach (@first) # Iterate through first array.
    {
        # Remember which elements have been seen.
        $seen{$_} = 1; # Any true value will do.
    }
    foreach (@second) # Now iterate through second array.
    {
        # Only add to result if was seen in first array.
        push @result, $_ if $seen{$_};
    }

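Union and difference follow the same pattern as intersection. The sketch below is one possible way to do it, assuming two small word lists; all the variable names are made up for the example.

```perl
use strict;
use warnings;

my @first  = qw(dog cat emu);
my @second = qw(cat emu fox);

# Record membership of each array in its own hash.
my %in_first  = map { ($_ => 1) } @first;
my %in_second = map { ($_ => 1) } @second;

# Intersection: elements of the second array also seen in the first.
my @intersection = grep { $in_first{$_} } @second;

# Difference: elements of the first array not seen in the second.
my @difference = grep { !$in_second{$_} } @first;

# Union: merge both sets of keys; duplicates collapse because keys are unique.
my %either = (%in_first, %in_second);
my @union  = sort keys %either;

print "intersection: @intersection\n";
print "difference:   @difference\n";
print "union:        @union\n";
```
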
Hashes: removing duplicates
An extension of counting elements in a list
► if this is the first time element seen, include in result
► otherwise, skip this element

Timeout
    # Simple implementation of Unix sort and sort -u
    # (@lines is a placeholder name.)
    # Was -u (unique) switch given?
    if (@ARGV and $ARGV[0] eq "-u")
    {
        $unique = 1;
        shift; # Remove -u argument.
    }
    # Read all input lines and sort them.
    @lines = sort <>;
    if ($unique)
    {
        # Filter out anything already seen.
        @lines = grep { !$seen{$_}++ } @lines;
    }
    # Output remaining lines.
    print @lines;

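The grep filter also works on an unsorted list, where it keeps the first occurrence of each element in its original position. A minimal sketch, assuming a made-up word list:

```perl
use strict;
use warnings;

my @words = qw(dog cat dog emu cat dog);

# $seen{$_}++ is false (0) only the first time a word appears,
# so the negation lets each word through exactly once.
my %seen;
my @unique = grep { !$seen{$_}++ } @words;

print "@unique\n";
```

Afterwards %seen also holds the count of each word, which ties this back to the counting technique.
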
Hashes: named parameters
Calling subroutines with many parameters is messy
► printformatted(56, '$', 8, 2, "decimal");
    – what did the 8 mean again?
► especially when some parameters are optional and have a reasonable default anyway
Can use hash to identify optional parameters and give them values
► printformatted(56, prefix => '$', format => "decimal", precision => 8, places => 2);
    – self-documenting code
    – order of parameters no longer matters
► printformatted(56, format => "hex");
    – only need to name the parameters with non-default values
► subroutines require a little code to handle this

Timeout
    # Map formats to printf percent-things.
    %format = (decimal => "d", hex => "x", octal => "o");
    # Print a number with a certain format.
    sub printformatted
    {
        my $number = shift; # Value to print.
        my %param = (
            format => "decimal", # Defaults.
            precision => 8,      # (precision, places and prefix
            places => 2,         #  defaults here are illustrative)
            prefix => "",
            @_                   # Rest of sub params.
        );
        printf( # Build up printf format string.
            ($param{"prefix"} . "%" . $param{"precision"} . "." .
             $param{"places"} . $format{$param{"format"}}),
            $number);
    }

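The defaults-then-overrides idea can be sketched on its own in a form that returns a string, which makes it easy to check. This is an illustration, not the slide's subroutine: format_number, %conversion, the width option and all default values are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical table mapping format names to sprintf conversion letters.
my %conversion = (decimal => "d", hex => "x", octal => "o");

# format_number(NUMBER, name => value, ...): hypothetical example sub.
sub format_number {
    my ($number, %option) = @_;

    # Defaults come first; anything the caller named overrides them,
    # because a later pair with the same key replaces the earlier one.
    my %param = (prefix => "", format => "decimal", width => 1, %option);

    die "unknown format '$param{format}'\n"
        unless exists $conversion{ $param{format} };

    my $spec = "%" . $param{width} . $conversion{ $param{format} };
    return $param{prefix} . sprintf($spec, $number);
}

my $hex    = format_number(255, format => "hex");
my $octal  = format_number(8, format => "octal");
my $padded = format_number(56, width => 6, prefix => '$');

print "$hex $octal $padded\n";
```

Only the options that differ from the defaults need to be named, and they can be given in any order.
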
Covered in this topic
Hashes
Hash variables
► $hash{key}, %hash
Functions which use hashes
► keys, values
► each
Uses of hashes
► data lookup
► sparse arrays
► counting elements in a list
► removing duplicates from a list
► accessing a process’ environment
► subroutines with optional parameters

Going further
Tying
► treat an external file (or any other object) like an internal hash (or any other type)
► Camel3
Databases
► talking to databases with Perl
► Programming the Perl DBI by Alligator Descartes and Tim Bunce, O’Reilly 2000
Shells
► the Unix command-line interface
► man sh

Next topic
Regular expressions
► pattern matching
Llama3 chapters 7-9; Camel3; perlre manpage