COMP234-Perl: Variables, Literals, Context, Operators, Command Line Input, Regex, Program Template

2 Variables

3 Variable Names
The name of a variable consists of the character $ (or @ or %), followed by at least one letter or underscore, followed by any number of letters, digits, or underscore characters.
The following are examples of legal variable names:
– $x
– $_x
– $var
– $my_variable
– $var2
– $a_new_variable

4 Variable Names
These, however, are not legal variable names:
– variable     # the $ character is missing
– $            # there must be at least one letter or underscore
– $47x         # the character after $ must be a letter or underscore
– $variable!   # you can't have a ! in a variable name
– $new.var     # you can't have a . in a variable name

5 Variable Names
Variable names should be meaningful
– $LineCount is better than $Count
Meaningful names make programs more understandable, and therefore easier to debug

6 Good Variable Names
Variables answer a question about a thing
Names should tell you what the thing is and what the question is
– $count: bad, what is being counted?
– $RecordCount: better, but what records?
– $LogMessageCount: better still, relates to the actual contents of the record

7 Variable Names
Perl variables are case-sensitive. This means that the following variables are different:
– $VAR
– $var
– $Var
Initial caps is a useful convention for long names
– $MultiWordName
Some prefer underscores
– $Multi_word_name

8 Declaring Variables
Perl variables don't have to be pre-declared
But if you do declare them, perl can warn you when you spell a name wrong
    my $variablename;
Declares $variablename and makes it a private variable
This line, put at the start of your program, turns on warnings, and is a good idea:
    use warnings;
You should use this in future programs
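
A minimal sketch of declaring a variable with my and seeing a misspelling caught. It assumes use strict is switched on as well, which this slide does not name; the variable names are illustrative only. The misspelled snippet is compiled inside a string eval so the failure can be displayed without stopping the program.

```perl
use strict;     # assumption: not named on this slide, added to catch misspellings
use warnings;

my $LogMessageCount = 0;    # declared with my, so it is private to this file
$LogMessageCount++;
print "Messages: $LogMessageCount\n";

# Compile a snippet that misspells $LogMessageTotal as $LogMesageTotal.
# The string eval inherits 'use strict', so compilation of the snippet fails.
my $compiled = eval q{ my $LogMessageTotal = 1; $LogMesageTotal = 2; 1 };
my $error    = $@;
print "Misspelling caught: $error" unless $compiled;
```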

9 Literals

10 Types
Literals come in types, like variables, but there are more of them
Scalars
– Numbers
– Strings
– Boolean and special values
Lists (= arrays)
Hashes

11 Numbers
Stored internally as double precision floating point or signed integers
    12345            # integer
    12345.67         # floating point
    6.02E23          # scientific notation
    0xffff           # hexadecimal
    0377             # octal
    4_294_967_296    # underline for legibility
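
As a quick check of the literal forms above, the following sketch assigns each one and prints the integer values in decimal. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $hex = 0xffff;           # hexadecimal literal
my $oct = 0377;             # octal literal (leading zero)
my $big = 4_294_967_296;    # underscores are ignored by perl
my $sci = 6.02E23;          # scientific notation, stored as floating point

print "$hex $oct $big\n";   # prints: 65535 255 4294967296
```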

12 Strings, Single Quote
String literals are normally, but not always, quoted
Single quotes are almost absolute
– Except for \' which is interpreted as '
– And \\ which is interpreted as \
    'hello'      = hello
    'don\'t'     = don't
    ''           = the null string (no characters)
    'silly\\me'  = silly\me

13 Strings, Double Quote
Variables and escape sequences are interpolated
    $Text = 'TEXT';
    "Some $Text"   = Some TEXT
    "Some\$Text"   = Some$Text
    "Some\n"       = Some, plus newline
    "Some\""       = Some"
    "Some\\"       = Some\
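
The difference between the two quoting styles can be sketched as follows. The variable names other than $Text are illustrative only.

```perl
use strict;
use warnings;

my $Text = 'TEXT';

my $single    = 'Some $Text\n';   # single quotes: $Text and \n stay literal
my $double    = "Some $Text\n";   # double quotes: variable and newline interpolated
my $escaped   = "Some\$Text";     # \$ keeps the dollar sign literal
my $quote     = 'don\'t';         # \' is one of the two single-quote escapes
my $backslash = 'silly\\me';      # \\ is the other, giving one backslash

print "$single\n";
print $double;
print "$escaped $quote $backslash\n";
```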

14 Strings, No Quotes
Where it isn't ambiguous, quotes can be omitted
– when the string can't be interpreted as something else
    @days = (Mon,Tue,Wed,Thu,Fri);
    print STDOUT hello, ' ', world, "\n";
These unquoted strings are called barewords, and are considered a bad thing
– use warnings; warns about their use
– use strict 'subs'; prohibits barewords
– qw( ) makes a list out of barewords
    qw(foo bar baz) = ("foo","bar","baz")
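
A small sketch of qw( ) as the safe replacement for a bareword list. It runs under use strict, which would reject the unquoted @days assignment shown above.

```perl
use strict;
use warnings;

my @days   = qw(Mon Tue Wed Thu Fri);              # no quotes or commas needed
my @quoted = ('Mon', 'Tue', 'Wed', 'Thu', 'Fri');  # the equivalent quoted list

print "Same list\n" if "@days" eq "@quoted";
print scalar(@days), " days\n";
```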

15 Context
Behavior of perl operators and values varies according to context
Contexts are something like value types:
– List context
– Scalar context
    – Numeric
    – String
    – Boolean
– Void
– Interpolative

16 Determining Context
Often determined by the left side of an assignment
    $Var = qw(a b c);
– The scalar variable establishes scalar context, so this is the same as $Var = 'c'
Evaluating an array in scalar context provides its length
    $length = @array;   # gets length of @array
Some operations, like print, establish a list context
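
The effect of the left side of an assignment can be sketched like this. The names $first and $joined are illustrative only.

```perl
use strict;
use warnings;

my @array = ('a', 'b', 'c');

my $length  = @array;     # scalar context: the number of elements
my ($first) = @array;     # list context (note the parentheses): first element
my $joined  = "@array";   # interpolative context: elements joined by spaces

print "length=$length first=$first joined=$joined\n";
```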

17 Numeric, String, Boolean
These scalar sub-contexts are sometimes determined by operators or functions
Numeric operators
– == != <= >=
Numeric functions like abs() establish numeric context
String operators
– eq ne le ge
Boolean operators
– and && or || not ! xor

18 Precedence of Boolean Operators
"and", "or" and "not" have very low precedence compared to &&, || and !
So this works:
    open HANDLE, "file" or die "msg";
which means
    open (HANDLE, "file") or die "msg";
But this doesn't:
    open HANDLE, "file" || die "msg";
which means
    open HANDLE, ("file" || die "msg");

19 Precedence of Boolean Operators
On the other hand, this doesn't work:
    $x = $y or $z;
means
    ($x = $y) or $z;
But this does:
    $x = $y || $z;
means
    $x = ($y || $z);
Better just to insert your own parentheses than to trust to precedence
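
A short sketch of the difference, using sample values 0 and 5. The names $low and $high are illustrative only. The void warning is switched off locally because perl would otherwise point out that the $z after "or" is never used, which is exactly the problem being shown.

```perl
use strict;
use warnings;

my ($y, $z) = (0, 5);

my $low;
{
    no warnings 'void';   # the unused $z below is deliberate
    $low = $y or $z;      # parsed as ($low = $y) or $z
}

my $high = $y || $z;      # parsed as $high = ($y || $z)

print "low=$low high=$high\n";   # prints: low=0 high=5
```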

20 Command Line Input

21 Many programs run from the command line take command line parameters
– perldoc -f print
– "print" is a parameter passed to the perldoc program on the command line
Perl programs can also be passed parameters on the command line
    perl 1 2
    Total is 3

22 Using Command Line Input
Command line parameters are passed to the perl program as an array called @ARGV
    $sum = $ARGV[0] + $ARGV[1];
    print "Total is $sum\n";
    cmblap:~/samples # perl 100 5
    Total is 105
    cmblap:~/samples #

23 How Many Parameters
Sometimes a program wants to know how many parameters are passed
This is equivalent to knowing the size of the @ARGV array
    $size = @ARGV;
gets the size by evaluating @ARGV in a scalar context
    scalar(@ARGV)
does the same
$#ARGV is similar
– Returns the index of the last item
– scalar(@ARGV) == $#ARGV + 1
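
The counting rules above can be sketched as a small script that totals however many parameters it receives. As an assumption for demonstration only, the sample values 100 and 5 stand in when nothing is passed on the command line.

```perl
use strict;
use warnings;

# Assumption: sample values are used when no parameters are given.
@ARGV = (100, 5) unless @ARGV;

my $size = @ARGV;     # scalar context: number of parameters
my $last = $#ARGV;    # index of the last parameter

my $sum = 0;
$sum += $_ for @ARGV; # add up every parameter

print "Got $size parameters, last index is $last\n";
print "Total is $sum\n";
```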

24 Regular Expressions

25 Widely used in the unix world
Very high powered "wild cards"
Used to describe complex text patterns
And then do things with text patterns
– Is this pattern in that text?
– Extract the text that matches this pattern
– Change the text that matches this pattern

26 Very Basic Regex
Any string of characters can be a regular expression that "matches" itself
To indicate a string is a regex, enclose it in slashes
– /abc/
– the string "abc" matches the regex /abc/
– So does the string "abcdefg"

27 Regex operators
The eq operator tests a whole string for equality with another (== for numeric context)
The =~ operator tests a string against a regex
    $a =~ /regex/
– tests to see if the string matches the regex
Can leave out the operator
– if ( /abc/ ) is a short form for
– if ( $_ =~ /abc/ )
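
A brief sketch contrasting eq with =~, and showing the short form that matches against $_. The sample strings and variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = 'abcdefg';

my $is_equal = ($text eq 'abc') ? 1 : 0;   # whole-string equality: false
my $matches  = ($text =~ /abc/) ? 1 : 0;   # pattern found inside: true

# Inside a for loop each item is placed in $_, so /abc/ alone tests $_.
my @hits;
for ('xyz', 'abc', 'cabcab') {
    push @hits, $_ if /abc/;
}

print "eq: $is_equal, =~: $matches, hits: @hits\n";
```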

28 Multipliers
/a*/ means 0 or more a's
/a+/ means 1 or more
/a?/ means 0 or 1
Notice that /a+/ is the same as /aa*/

29 General Multipliers
/a{5,9}/ matches five to nine a's
/a{5,}/ matches five or more a's
/a{5}/ matches five a's
Note
– a* same as a{0,}
– a+ same as a{1,}
– a? same as a{0,1}
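
The multipliers can be tried out with a few sample strings, as in this sketch. The variable names are illustrative only. Note that without anchors a pattern only has to match somewhere inside the string, so /a*/ succeeds even on a string with no a's at all.

```perl
use strict;
use warnings;

my $star_no_a = 'bbb'    =~ /a*/     ? 1 : 0;   # zero a's is enough: true
my $plus_no_a = 'bbb'    =~ /a+/     ? 1 : 0;   # needs at least one a: false
my $four_as   = 'aaaa'   =~ /a{5}/   ? 1 : 0;   # only four a's: false
my $six_as    = 'aaaaaa' =~ /a{5,9}/ ? 1 : 0;   # six is within 5 to 9: true
my $same      = 'caat'   =~ /aa*/    ? 1 : 0;   # /aa*/ behaves like /a+/: true

print "$star_no_a $plus_no_a $four_as $six_as $same\n";   # prints: 1 0 0 1 1
```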

30 Wildcards
Dot matches any single character (except a newline, by default)
/.../ matches any three characters
/.*/ matches any string of any length
/.*a/ matches any string that contains an a
– So does /a/
/a.*a/ matches any string with at least two a's

31 Anchors
^ means the start of a string
$ means the end of a string
/^a.*/ matches any string that starts with "a"
/^a..b$/
– matches four character strings starting with "a" and ending with "b"
The ^ and $ don't match a character
– they just "anchor" the match to the ends of the string

32 Examples
    $a = "hello world";
    $a =~ /^he/
    $a =~ /l{2}/
    $a =~ /l.$/
    $a =~ /^.e.* w.{4}$/
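
All four example patterns match "hello world", which the following sketch checks. It uses $greeting instead of $a, since $a has a special role in sort; the other variable names and the "axyb" strings are illustrative only.

```perl
use strict;
use warnings;

my $greeting = "hello world";

my $starts_he  = $greeting =~ /^he/          ? 1 : 0;   # starts with "he"
my $double_l   = $greeting =~ /l{2}/         ? 1 : 0;   # contains "ll"
my $l_near_end = $greeting =~ /l.$/          ? 1 : 0;   # "l" is second to last
my $whole      = $greeting =~ /^.e.* w.{4}$/ ? 1 : 0;   # describes the whole string

# /^a..b$/ accepts exactly four characters between the anchors.
my $four_chars = 'axyb'  =~ /^a..b$/ ? 1 : 0;   # true
my $five_chars = 'axyzb' =~ /^a..b$/ ? 1 : 0;   # false: one character too many

print "$starts_he $double_l $l_near_end $whole $four_chars $five_chars\n";
```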

33 What matches What?
    "hello world" =~ /.*l/
– the "l" matches the last l
– because .* is "greedy"
    "hello world" =~ /.*l.*o/
– now it matches the second "l"
    "hello world" =~ /.*l.{2} /
– now it matches the first

34 Non-greedy Multipliers
.*? matches the shortest possible string
    "hello world" =~ /.*l/
– "l" matches the last "l"
    "hello world" =~ /.*?l/
– now it matches the first
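
One way to see the difference is to look at the text each pattern actually consumed. This sketch uses perl's $& variable (the text of the last successful match), which the slides do not cover; the other variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = "hello world";

$text =~ /.*l/;            # greedy: .* runs as far as the last "l"
my $greedy = $&;

$text =~ /.*?l/;           # non-greedy: .*? stops at the first "l"
my $lazy = $&;

print "greedy matched '$greedy'\n";   # prints: greedy matched 'hello worl'
print "lazy matched '$lazy'\n";       # prints: lazy matched 'hel'
```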

35 Regular Expressions
Used to describe complex text patterns
And then do things with text patterns
– Is this pattern in that text?
– Extract the text that matches this pattern
– Change the text that matches this pattern

36 Extracting Strings
Parentheses, "(" and ")"
– can be used to group parts of a pattern
– as long as what they contain is a sensible pattern
Parentheses do not affect the match
– if it matches without them it will still match with them
But they let you find out what did match

37 $1 $2 $3
The variables $1, $2, etc. contain the strings that matched the parts in parentheses
    "hello world" =~ /(.*)l/
– $1 contains: hello wor
    "hello world" =~ /(.*?)l/
– $1 contains: he
Can also use this syntax:
    ($string) = "hello world" =~ /(.*?)l/;
– $string contains he
Doesn't work without the ( ) around $string
– Has to be list context
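
The two ways of collecting a captured string, and what happens when the parentheses on the left are left out, can be sketched as follows. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = "hello world";

# Style 1: test the match, then read $1.
my $greedy = '';
if ($text =~ /(.*)l/) {
    $greedy = $1;                         # 'hello wor'
}

# Style 2: list context on the left returns the captured strings directly.
my ($lazy) = $text =~ /(.*?)l/;           # 'he'

# Without the parentheses the assignment is in scalar context,
# so the variable receives the true/false result of the match.
my $scalar = $text =~ /(.*?)l/;           # 1

print "greedy='$greedy' lazy='$lazy' scalar='$scalar'\n";
```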

38 \1 \2 \3
\1 has the same value as $1 but can be used in the regex
    "hello world" =~ /(.) /
– $1 contains an "o"
    "hello world" =~ /(.) (.*)\1/
– is true
– \1 matches "o" in "world"
– $2 eq "w"
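
The backreference example above can be checked with a sketch like this one. The /(.)\1/ pattern for finding a doubled character is an additional illustration that is not on the slide, and the variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = "hello world";

my ($before_space, $between) = ('', '');
if ($text =~ /(.) (.*)\1/) {
    # $1 is the character before the space; \1 then demands it again later.
    ($before_space, $between) = ($1, $2);
}

# A character immediately followed by itself: the "ll" in "hello".
my ($doubled) = $text =~ /(.)\1/;

print "before space: $before_space, between: $between, doubled: $doubled\n";
```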

39 Character Classes
Fussier than .
[0-9] matches one character from 0123456789
[a-z] matches lower case letters
[A-Z] matches upper case letters
[aeiou] matches a vowel
[0-9] same as [0123456789]

40 Abbreviations
\d – [0-9]
\s – whitespace
\w – [0-9a-zA-Z_]
\D – [^0-9]
\b matches a boundary between a word character and a non-word character
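
A small sketch combining character classes, abbreviations and \b with the capturing syntax from earlier. The sample line and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $line = 'Order 66 shipped';   # made-up sample text

my ($number)     = $line =~ /(\d+)/;            # first run of digits
my ($lower_word) = $line =~ /\b([a-z]+)\b/;     # first all-lower-case word
my ($vowel)      = $line =~ /([aeiou])/;        # first lower-case vowel

print "number=$number word=$lower_word vowel=$vowel\n";
# prints: number=66 word=shipped vowel=e
```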

41 Reading
Read about regular expressions in your text
Or visit: 5.10.0/pod/perlrequick.pod

42 Program Template

43 Programs should start with a comment block like this:
    # Author:
    # Name:
    # Version:
    # Parameters:
    # Description:
    # C H A N G E L O G
    # Date Author Description
    #
Change log could be omitted on ephemeral programs

44 Program Template
Create a program template with your name filled in
– Copy it to create new programs
    copy
Fill in the rest of the comments after copying

45 This week's lab
This week's lab uses regular expressions
Read:
– Or chapter 12 in Perl for Dummies
– Or chapter 7 in Learning Perl
– Or Pattern Matching in chapter 2 of Programming Perl
Free perl books in pdf format:

