1.0.1.8 – Introduction to Perl · 12/13/2015
1.0.1.8.2 – Introduction to Perl – Strings, Truth and Regex


Introduction to Perl Session 2
· manipulating strings
· basic regular expressions

administrative
· workshop slides are available at mkweb.bcgsc.ca/perlworkshop

Recap
· scalar variables are prefixed by the $ sigil and can contain characters or numbers
· Perl interpolates variables in double quotes " " but not in single quotes ' '
· == and eq are the equality test operators for numbers and strings, respectively
· undef is a special keyword used to undefine a variable

    $var = 1

    double quote operator qq()      single quote operator q()
    qq(var)       →  var            q(var)       →  var
    qq("var")     →  "var"          q("var")     →  "var"
    qq($var)      →  1              q($var)      →  $var
    qq(${var}2)   →  12             q(${var}2)   →  ${var}2
    qq{\$var}     →  $var           q{\$var}     →  \$var
    qq/'$var'/    →  '1'            q/'$var'/    →  '$var'
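The recap table can be checked directly. The following is a minimal sketch, assuming strict and warnings are switched on (the slides do not use them); the variable names $double, $single and $escaped are illustrative only.

```perl
use strict;
use warnings;

my $var = 1;

my $double  = qq(${var}2);   # qq() interpolates: $var becomes 1
my $single  = q(${var}2);    # q() does not interpolate: text is kept literally
my $escaped = qq{\$var};     # an escaped $ is not interpolated, even in qq{}

print "$double\n";    # 12
print "$single\n";    # ${var}2
print "$escaped\n";   # $var
```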

String Manipulation
· manipulating strings in Perl is very easy
· a large number of functions help you massage, cut, and glue strings together
· today we will explore how to
    · concatenate strings
    · replace parts of a string
    · determine the length of a string
    · change the case of a string
· I will also introduce regular expressions, which can be used to
    · split a string on a boundary
    · search a string for patterns

Concatenating Strings
· we've already seen one way to concatenate values of scalar variables
    · the concatenation operator .
· create a new variable and use interpolation to place strings in the appropriate spot

    $x = "baby"; $y = "is"; $z = "crying"; $s = " ";
    $phrase = $x . " " . $y . " " . $z;
    $phrase = $x . $s . $y . $s . $z;
    $phrase = qq($x $y $z);
    $phrase = qq($x$s$y$s$z);

Concatenating with join
· use perldoc -f FUNCTION to learn about a built-in Perl function
· given a list of strings, you can glue them together with a given string using join

    > perldoc -f join
    join EXPR,LIST
        Joins the separate strings of LIST into a single string with
        fields separated by the value of EXPR, and returns that new
        string. Example:

            $rec = join(':', $login,$passwd,$uid,$gid,$gcos,$home,$shell);

        See split.

    ($x,$y,$z,$s) = ("baby","is","crying"," ");
    $phrase = join(" ",$x,$y,$z);
    $phrase = join($s,$x,$y,$z);

Concatenating with join
· join takes a list as an argument
    · the first element is the glue
    · all other elements are the things to be glued
· we're drowning in double quotes here
    · we're creating a list of strings and need to delimit each string with " " or qq( )

    ($x,$y,$z,$s) = ("baby","is","crying"," ");
    $phrase = join(" ",$x,$y,$z,"-","make","it","stop");    → baby is crying - make it stop
    $phrase = join(" ",1,"+",1,"=",2);                      → 1 + 1 = 2

    ("babies","cry","a","lot");    # noisy syntax
    (babies cry a lot);            # ERROR - barewords
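A runnable version of the join examples, as a small sketch. It assumes strict and warnings, and the name $sum is made up for illustration.

```perl
use strict;
use warnings;

# The first argument is the glue; every later argument is a piece to be glued.
my ($x, $y, $z) = ("baby", "is", "crying");

my $phrase = join(" ", $x, $y, $z, "-", "make", "it", "stop");
my $sum    = join(" ", 1, "+", 1, "=", 2);   # numbers are turned into strings

print "$phrase\n";   # baby is crying - make it stop
print "$sum\n";      # 1 + 1 = 2
```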

Word List Operator qw( )
· qw( STRING ) splits the STRING into words along whitespace characters and evaluates to a list of these words
· no quotes are necessary
· qw( ) does not interpolate

    $x = "camels"; $y = "spit"; $z = "far";
    ... or
    ($x,$y,$z) = qw(camels spit far);

    $num = 3;
    ($w,$x,$y,$z) = qw($num camels spit far);
    print "$w $x $y $z";    → $num camels spit far
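The lack of interpolation inside qw( ) is easy to confirm. A short sketch, assuming strict and warnings; the array @words is an illustrative addition (arrays are covered in the next session).

```perl
use strict;
use warnings;

my $num = 3;

# qw() splits on whitespace and never interpolates, so $w receives
# the four literal characters $num rather than the value 3.
my ($w, $x, $y, $z) = qw($num camels spit far);
my @words = qw(camels spit far);

print "$w $x $y $z\n";        # $num camels spit far
print scalar(@words), "\n";   # 3
```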

Use qw( ) for Concise Assignment
· assigning values to multiple variables on one line is a good idea
    · terse
    · easy to read
    · even better if the variables are semantically related
· we haven't seen lists formally yet, but we are using them here
    · a list is an ordered set of things (e.g. the soup)
    · an array is a variable which holds a list (e.g. the can)
· the distinction is important because we can use lists without creating array variables

    ($w,$x,$y,$z) = qw(blue 1 10$10 5.5);    $y → 10$10

    qw(blue 1 10$10 5.5);    → evaluates to a list
    ($w,$x,$y,$z)            → expects a list

Extracting Parts of a String
· substr is both an interesting and useful function
· it demonstrates Perl's flexibility because it can be used as an l-value
    · l-value → you can assign values to the output of substr
· substr takes 2-4 arguments and behaves differently in each case
· strings are 0-indexed – the first character in the string is indexed by 0 (zero)
· substr(STRING,OFFSET) returns the part of the STRING starting at OFFSET

    $string = "soggy vegetables in the crisper";
    +'ve index: 0 (first character) ... 30 (last character)
    -'ve index: -31 (first character) ... -1 (last character)

    $substring = substr($string,6);     → vegetables in the crisper
    $substring = substr($string,0);     → soggy vegetables in the crisper
    $substring = substr($string,-1);    → r
    $substring = substr($string,-7);    → crisper

Extracting Parts of a String
· substr(STRING,OFFSET,LEN) extracts LEN characters from the string, starting at OFFSET
· a negative OFFSET counts back from the end of the string
· a negative LEN leaves that many characters off the end of the string

    $string = "soggy vegetables in the crisper";

    $substring = substr($string,6,10);     → vegetables
    $substring = substr($string,6,100);    → vegetables in the crisper
    $substring = substr($string,-3);       → per
    $substring = substr($string,-3,1);     → p
    $substring = substr($string,-3,2);     → pe
    $substring = substr($string,-3,3);     → per
    $substring = substr($string,6,5);      → veget
    $substring = substr($string,6,-5);     → vegetables in the cr
    $substring = substr($string,1,-1);     → oggy vegetables in the crispe
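The offset and length rules can be tried out with the same string. A minimal sketch, assuming strict and warnings; the variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "soggy vegetables in the crisper";

my $tail   = substr($string, 6);       # from offset 6 to the end
my $word   = substr($string, 6, 10);   # 10 characters starting at offset 6
my $last3  = substr($string, -3);      # a negative offset counts from the end
my $middle = substr($string, 6, -5);   # a negative length leaves 5 characters off the end

print "$tail\n";     # vegetables in the crisper
print "$word\n";     # vegetables
print "$last3\n";    # per
print "$middle\n";   # vegetables in the cr
```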

Determining the Length of a String
· length( STRING ) returns the number of characters in the string
· this includes any special characters like newline
· escaped characters like \$ count as one character

    $string = "soggy vegetables in the crisper";
    $len = length($string);    → 31
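A quick sketch of how length counts special and escaped characters, assuming strict and warnings. The strings "camel\n" and "\$10" are made-up examples.

```perl
use strict;
use warnings;

my $plain   = "soggy vegetables in the crisper";
my $newline = "camel\n";   # the trailing newline is one character
my $escaped = "\$10";      # the escaped $ is one character; the string is $10

print length($plain), "\n";     # 31
print length($newline), "\n";   # 6
print length($escaped), "\n";   # 3
```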

Replacing Parts of a String
· substr( ) returns a part of a string
· substr( ) is also used to replace parts of a string
· substr(STRING,OFFSET,LEN) = VALUE replaces the characters that would normally be returned by substr(STRING,OFFSET,LEN) with VALUE
· VALUE can be shorter or longer than LEN – the string shrinks or grows as required
· each example below starts from the original $string, "soggy vegetables in the crisper"

    $substring = substr($string,0,5);       → soggy
    substr($string,0,5) = "very tasty";     → very tasty vegetables in the crisper
    substr($string,0,5) = "no";             → no vegetables in the crisper
    substr($string,0,5) = "tasty";          → tasty vegetables in the crisper
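Assigning to substr can be sketched as follows. The example assumes strict and warnings and works on copies ($longer, $shorter, names chosen for illustration) so that each replacement starts from the same original text.

```perl
use strict;
use warnings;

my $original = "soggy vegetables in the crisper";

my $longer = $original;
substr($longer, 0, 5) = "very tasty";   # replacement longer than LEN: the string grows

my $shorter = $original;
substr($shorter, 0, 5) = "no";          # replacement shorter than LEN: the string shrinks

print "$longer\n";    # very tasty vegetables in the crisper
print "$shorter\n";   # no vegetables in the crisper
```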

More on substr( )
· instead of assigning a value to substr( ), use the replacement string as the 4th argument
· the 4-argument version of substr( ) returns the string that was replaced

    substr($string,0,5) = "no";               → no vegetables in the crisper
    substr($string,0,5,"no");                 → no vegetables in the crisper
    $prev = substr($string,0,5,"no");         → no vegetables in the crisper
                                                $prev = "soggy"

    $x = "i have no food in my fridge";
    $y = substr($x,0,length($x),"take out!");
    $x → ?
    $y → ?
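The 4-argument form both edits the string and hands back the old text. A minimal sketch, assuming strict and warnings:

```perl
use strict;
use warnings;

my $string = "soggy vegetables in the crisper";

# Replace the first 5 characters in place; the replaced text is returned.
my $prev = substr($string, 0, 5, "no");

print "$prev\n";     # soggy
print "$string\n";   # no vegetables in the crisper
```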

Changing Case
· there are four basic case operators in Perl
    · lc – convert all characters to lower case
    · uc – convert all characters to upper case
    · lcfirst – convert first character to lower case
    · ucfirst – convert first character to upper case

    $x = "federal case";
    $y = uc $x;            → FEDERAL CASE
    $y = ucfirst $x;       → Federal case
    $y = lcfirst uc $x;    → fEDERAL CASE
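The case operators can be chained, as this short sketch shows. It assumes strict and warnings; the variable names are illustrative.

```perl
use strict;
use warnings;

my $x = "federal case";

my $upper   = uc $x;           # every character upper-cased
my $capital = ucfirst $x;      # only the first character upper-cased
my $mixed   = lcfirst uc $x;   # uc is applied first, then lcfirst to its result

print "$upper\n";     # FEDERAL CASE
print "$capital\n";   # Federal case
print "$mixed\n";     # fEDERAL CASE
```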

Converting Case Inline
· convert case inline with \U \L \u \l
    · \L ~ lc        \U ~ uc
    · \l ~ lcfirst   \u ~ ucfirst
· \E terminates the effect of \U and \L (\u and \l affect only the next character)

    $x = "\Ufederal case";       → FEDERAL CASE
    $x = "\Ufederal\E case";     → FEDERAL case
    $x = "\ufederal \ucase";     → Federal Case
    $y = qq(\U$error\E $message);
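The inline escapes work on interpolated variables too. A sketch assuming strict and warnings; the values given to $error and $message are invented for illustration, since the slide does not define them.

```perl
use strict;
use warnings;

my $error   = "warning";            # hypothetical value
my $message = "disk almost full";   # hypothetical value

my $shout = "\Ufederal\E case";          # \U ... \E upper-cases only the enclosed part
my $title = "\ufederal \ucase";          # \u upper-cases just the next character
my $log   = qq(\U$error\E $message);     # the escapes apply to interpolated text as well

print "$shout\n";   # FEDERAL case
print "$title\n";   # Federal Case
print "$log\n";     # WARNING disk almost full
```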

Regular Expressions
· a regular expression is a string that describes or matches a set of strings according to syntax rules
· Perl's match operator is m/ / (c.f. qq/ / or q/ /)
· the m is frequently dropped, and / / is used
· to bind a regular expression to a string, =~ is used
· we will later see that m/ / may be used without an accompanying =~
· you must think of =~ as a binary operator, like + or -, which returns a value

    $string =~ m/REGEX/
    $string =~ /REGEX/

    see this:      $string =~ m/REGEX/
    think this:    =~($string,REGEX)

Regular Expressions
· regular expressions are made up of
    · characters – literals like the letter "a" or the number "1"
    · metacharacters – special characters that have complex meaning
    · character classes – a single character that can match a variety of characters
    · modifiers – determine how many characters can be matched (e.g. one, more than one); these are often called quantifiers
    · and others
· we'll start slow and build up a basic vocabulary of regular expressions
· the following paradigm is commonly seen with regular expressions
· remember that =~ is a binary operator – it will return true if a match is successful

    if ($string =~ /REGEX/) {
        # do this if REGEX matches the $string
    } else {
        # do this, otherwise
    }

Regular Expressions
· the most basic regular expression is one which contains the string you want to match, as literals
· regular expressions are case sensitive, unless / /i is used
    · i is one of many flags that control how the REGEX is applied

    $string = "Hello world";
    if ($string =~ /Hello/) {
        print "string matched";    # a match is made in this case
    } else {
        print "no match";
    }

    $string = "Hello world";
    if ($string =~ /hello/i) {
        print "string matched";    # a match is made in this case
    } else {
        print "no match";
    }
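To see what the i flag changes, the same string can be tested three ways. A minimal sketch, assuming strict and warnings; the variable names are illustrative.

```perl
use strict;
use warnings;

my $string = "Hello world";

# =~ returns true or false, so it can drive the ?: operator directly.
my $exact       = ($string =~ /Hello/)  ? "match" : "no match";
my $wrong_case  = ($string =~ /hello/)  ? "match" : "no match";   # case sensitive by default
my $ignore_case = ($string =~ /hello/i) ? "match" : "no match";   # i flag ignores case

print "$exact\n";         # match
print "$wrong_case\n";    # no match
print "$ignore_case\n";   # match
```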

Regular Expressions – Character Classes
· two commonly used character classes are . and [ ]
    · . means "any character"
    · [ ] means "any of these characters", e.g. [abc] will match either a or b or c, not ab or abc
· when used in isolation these classes match a single character in your string
· [ ] works with a range
    · [a-z], [c-e], [0-9]

    $string = "hello world";

    regex                        match?    matched by class
    $string =~ /hello/           YES
    $string =~ /HeLLo/i          YES
    $string =~ /hell./           YES       o
    $string =~ /hell[abc]/       NO
    $string =~ /hell[aeiou]/     YES       o
    $string =~ /hel/             YES
    $string =~ /hel[lo]/         YES       l
    $string =~ /hel[lo]o/        YES       l
    $string =~ /he[ll]o/         NO
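The match table can be reproduced with a small loop. This is only a sketch: it assumes strict and warnings, and it stores each pattern in a variable and interpolates it into the match operator, a technique the slides have not introduced. The names @patterns and %result are illustrative.

```perl
use strict;
use warnings;

my $string = "hello world";

# Each pattern is kept as plain text and interpolated into / / below.
my @patterns = ('hell.', 'hell[abc]', 'hell[aeiou]', 'hel[lo]o', 'he[ll]o');

my %result;
for my $pattern (@patterns) {
    $result{$pattern} = ($string =~ /$pattern/) ? "YES" : "NO";
    print "$pattern\t$result{$pattern}\n";
}
```

The last pattern fails because [ll] matches exactly one character, so /he[ll]o/ looks for "helo", which "hello world" does not contain.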

Three Ubiquitous Character Classes
· \d – any digit
    · equivalent to [0123456789] or [0-9]
· \w – any alphanumeric character or _
    · equivalent to [a-zA-Z0-9_]
· \s – any whitespace

    regex                   matches if string contains...
    /\d\d\d/                three digits in succession
    /1\d2/                  1 followed by any digit followed by 2
    /\d\s\d/                a digit followed by a whitespace followed by a digit
    /[aeiou].[aeiou]/       a lowercase vowel followed by any character followed by a lowercase vowel
    /[aeiou][1-5].B/i       a vowel followed by any digit in the range 1-5 followed by any character followed by B or b (case insensitive match)

    $string = "hello";
    $string =~ /[hello]/    → ?
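A few of the patterns from the table, applied to made-up strings. This is a sketch assuming strict and warnings; the test strings ("room 101", "abc 152", and so on) are hypothetical and not from the slides.

```perl
use strict;
use warnings;

my $three_digits = ("room 101" =~ /\d\d\d/)          ? 1 : 0;   # 101 is three digits in a row
my $one_x_two    = ("abc 152"  =~ /1\d2/)            ? 1 : 0;   # 1, any digit (5), then 2
my $digit_gap    = ("4 2"      =~ /\d\s\d/)          ? 1 : 0;   # digit, whitespace, digit
my $no_digit     = ("camel"    =~ /\d/)              ? 1 : 0;   # no digits anywhere
my $vowel_rule   = ("A3xb"     =~ /[aeiou][1-5].B/i) ? 1 : 0;   # i flag lets A and b match

print "$three_digits $one_x_two $digit_gap $no_digit $vowel_rule\n";   # 1 1 1 0 1
```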

Splitting a String
· split is used to create a list from a string, by splitting it along a boundary
· it is the reverse of join, which takes a list and glues elements together using a delimiter
· split takes a regular expression to act as the boundary
    · split(/REGEX/,$string)

    join     qw(a b c) → "a b c"
    split    "a b c"   → qw(a b c)

    $string = "once upon a camel";
    ($a,$b,$c,$d) = split(/\s/,$string);    # split along a single white space

    $string = "1-2-3-4";
    ($a,$b,$c,$d) = split(/-/,$string);     # split along hyphen → (1,2,3,4)

Splitting Along Spaces
· because whitespace (tab, space) is such a common delimiter, split can be used with " " as a boundary to mean any (positive) amount of whitespace
· note that split(/ /,$string) splits at every single space, so runs of spaces produce empty strings

    $string = "a b  c   d";    # one, two, then three spaces
    split(" ",$string)    → qw(a b c d)

    $string = "a b  c   d";
    split(/ /,$string)    → "a","b","","c","","","d"

    think this: a_b_[]_c_[]_[]_d where [] is the empty string
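The difference between the two boundaries shows up in the number of fields. A minimal sketch, assuming strict and warnings; the arrays and the map that marks empty fields are illustrative additions (arrays are covered in the next session).

```perl
use strict;
use warnings;

my $string = "a b  c   d";   # one, two, then three spaces

my @by_whitespace   = split(" ", $string);   # " " means any run of whitespace
my @by_single_space = split(/ /, $string);   # / / means exactly one space

# Show empty fields as [] so they are visible in the output.
my $picture = join("_", map { $_ eq "" ? "[]" : $_ } @by_single_space);

print scalar(@by_whitespace), "\n";     # 4
print scalar(@by_single_space), "\n";   # 7
print "$picture\n";                     # a_b_[]_c_[]_[]_d
```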

Splitting a String
· split is perfect for separating content from delimiters
· split creates output (a list) suitable for input to join

    $string = "user:password:flag";
    ($user,$password,$flag) = split(":",$string);    → user password flag

    $string = "2_5_100";
    ($x,$y,$z) = split("_",$string);

    $string = "a1b2c";
    ($x,$y,$z) = split(/\d/,$string);                → a b c

    $string = "a b c d e f g";
    join(" ", split(" ",$string) );                  → a b c d e f g
    join("-", split(" ",$string) );                  → a-b-c-d-e-f-g
    join(" and ", split(" ",$string) );              → a and b and c and d and e and f and g
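Because split produces a list and join consumes one, the two compose naturally. A short sketch, assuming strict and warnings; $record, $rejoined and $dashed are illustrative names.

```perl
use strict;
use warnings;

my $record = "user:password:flag";

# Take the record apart on ":" and put it back together again.
my ($user, $password, $flag) = split(":", $record);
my $rejoined = join(":", $user, $password, $flag);

# Feed the output of split straight into join to swap delimiters.
my $dashed = join("-", split(" ", "a b c d e f g"));

print "$user $password $flag\n";   # user password flag
print "$rejoined\n";               # user:password:flag
print "$dashed\n";                 # a-b-c-d-e-f-g
```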

Chop and Chomp
· chomp is a boon and used everywhere
    · it removes a trailing newline (actually the current record separator) from a string
    · it's safe to use because it doesn't touch other characters
    · it returns the total number of characters chomped
· chop removes the last character (whatever it may be) and returns it

    # $string may have a newline at the end
    chomp $string;
    # now string has no newline at the end

    $string = "camels";
    $x = chop $string;
    $string → camel
    $x → s
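The return values make the difference between the two functions concrete. A minimal sketch, assuming strict and warnings and the default record separator (a newline); the variable names are illustrative.

```perl
use strict;
use warnings;

my $line = "camels\n";
my $removed = chomp $line;   # removes the trailing newline, returns 1
my $again   = chomp $line;   # nothing left to remove, returns 0

my $word = "camels";
my $last = chop $word;       # removes and returns the last character, whatever it is

print "$line $removed $again\n";   # camels 1 0
print "$word $last\n";             # camel s
```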

Short Script

    $sequence = undef;
    for (1..100) {
        $x = rand();
        if ( $x < 0.25 ) {
            $sequence = $sequence . q(a);
        } elsif ( $x < 0.5 ) {
            $sequence = $sequence . q(c);
        } elsif ( $x < 0.75 ) {
            $sequence = $sequence . q(g);
        } else {
            $sequence = $sequence . q(t);
        }
    }
    print $sequence, "\n";
    print "saw poly-A\n" if $sequence =~ /aaaa/;
    print "saw aantt\n" if $sequence =~ /aa.tt/;
    print join(" + ", split("ata",$sequence)), "\n";

    ### output
    atcgccaagttggtgtagatatgaggcccgtccattgttcgtacttaacatgtctgtatagggatctgcttatacttgtcggagataatacggtggcgcg
    saw aantt
    atcgccaagttggtgtag + tgaggcccgtccattgttcgtacttaacatgtctgt + gggatctgctt + cttgtcggag +  + cggtggcgcg
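One possible tidier variant of the same script is sketched below. It is not the version from the slides: it assumes strict and warnings, and it uses an array of bases, array indexing and the .= operator, none of which have been covered yet. The output changes on every run because rand is not seeded here.

```perl
use strict;
use warnings;

my @bases    = qw(a c g t);
my $sequence = "";            # start from the empty string rather than undef

for (1 .. 100) {
    # Pick one of the four bases with equal probability and append it.
    $sequence .= $bases[ int(rand(@bases)) ];
}

print "$sequence\n";
print "saw poly-A\n" if $sequence =~ /aaaa/;
print "saw aantt\n"  if $sequence =~ /aa.tt/;

my @pieces = split("ata", $sequence);
print join(" + ", @pieces), "\n";
```

Starting from "" instead of undef avoids an "uninitialized value" warning when warnings are enabled.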

Introduction to Perl Session 2
· you now know
    · all about string manipulation
    · a little about regular expressions
    · use of split, join, and chomp
· next time
    · lists and arrays