Input Validation with Regular Expressions COEN 351.

Input Validation Security Strategies
- Black list: list everything that is NOT allowed
  - The list is difficult to create.
  - Adding insecure constructs on a continuous basis means that the previous version was unsafe.
  - Testing is based on known attacks.
  - A list obtained from others might not be trustworthy.
- White list: list everything that is allowed
  - The list might be incomplete and disallow good content.
  - Adding exceptions on a continuous basis does not imply security holes in previous versions.
  - Testing can be based on known attacks.
  - A list obtained from others can be trusted if the source can be trusted.
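
The difference between the two strategies can be sketched in a few lines of Perl. This is only an illustration: the sub names, the list of forbidden characters, and the allowed-character rule are assumptions made for the example, not part of the slides.

```perl
use strict;
use warnings;

# Black list (hypothetical): reject input containing characters known to be risky.
# Anything that is not on the list slips through.
sub passes_blacklist {
    my ($input) = @_;
    return $input !~ /[;|&<>`]/ ? 1 : 0;
}

# White list (hypothetical): accept only input made entirely of allowed characters.
sub passes_whitelist {
    my ($input) = @_;
    return $input =~ /\A[A-Za-z0-9_]{1,32}\z/ ? 1 : 0;
}

for my $input ('alice_01', 'alice; rm -rf /', '$(reboot)') {
    printf "%-18s blacklist=%d whitelist=%d\n",
        $input, passes_blacklist($input), passes_whitelist($input);
}
```

The third input contains none of the black-listed characters and is therefore accepted by the black list, while the white list rejects it.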

Perl Regular Expressions
Regular expression = pattern
- A template that either matches or does not match a string

Excursus: Getting Input in Perl
- Use <STDIN> to read a line from standard input.
- Use the 'defined' construct to tell whether the read was successful:
    while (defined($line = <STDIN>)) {
        print "I saw $line";
    }

Excursus: Getting Input in Perl
- A shortcut that looks nonsensical: it uses the standard loop variable $_.
    while (<STDIN>) {
        print "I saw $_";
    }
    foreach (<STDIN>) {
        print "I saw $_";
    }
- while: gets a line, then executes the body of the loop.
- foreach: gets all the lines, then executes the body of the loop.
- $_ is the default loop variable.

Excursus: Getting Input in Perl
- STDIN is the default input for <> (when no file names are given on the command line).
- chomp acts on the default variable $_.
    while (<>) {
        chomp;
        print "I saw $_\n";
    }

Perl Regular Expressions
- Matching and substitution are fundamental tasks in Perl.
- They are implemented using one-letter operators:
  - m/PATTERN/ (m//): pattern matching
  - s/PATTERN/REPLACEMENT/ (s///): substitution
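
A minimal sketch of the two operators in use; the sample sentence and variable names are made up for the example.

```perl
use strict;
use warnings;

my $line = "The quick brown fox";

# m// tests whether the pattern occurs somewhere in the string.
my $has_fox = ($line =~ m/fox/) ? 1 : 0;

# s/// replaces the first match; here it works on a copy of $line.
(my $changed = $line) =~ s/quick/slow/;

print "has fox: $has_fox\n";
print "changed: $changed\n";
```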

Perl Regular Expressions
Meta-characters in a pattern must be escaped with a backslash to be matched literally:
    \  |  ( )  [ ]  { }  ^  $  *  +  ?  .
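
The effect of escaping can be checked with a short sketch. The sample strings are invented; \Q...\E is Perl's built-in way of escaping every meta-character in an interpolated string and is shown here as one possible convenience, not as something the slides prescribe.

```perl
use strict;
use warnings;

my $price = 'cost: $5.00 (approx.)';

# Escaped by hand: \$ and \. match a literal dollar sign and a literal dot.
my $manual = ($price =~ /\$5\.00/) ? 1 : 0;

# \Q...\E escapes all meta-characters in the interpolated string.
my $literal = '$5.00 (approx.)';
my $quoted  = ($price =~ /\Q$literal\E/) ? 1 : 0;

# Unescaped, the dot matches any character, so "5x00" is accepted too.
my $loose = ('5x00' =~ /5.00/) ? 1 : 0;

print "manual=$manual quoted=$quoted loose=$loose\n";
```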

Perl Regular Expressions
Interpolation
- Perl substitutes the values of variables into a pattern, as it does in strings:
    $foo = "bar";
    /$foo$/;
- This is equivalent to:
    /bar$/;
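
A small sketch of interpolation at work; the word list is invented for the example. The trailing $ is not interpolated and keeps its meaning as the end-of-string anchor.

```perl
use strict;
use warnings;

my $foo = "bar";

# /$foo$/ becomes /bar$/: keep the words that end in "bar".
my @hits = grep { /$foo$/ } ("foobar", "barfly", "crowbar");

print "@hits\n";
```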

Perl Regular Expression: Binding Operator
- Pattern matching is so frequent in Perl that there is a special operator for it.
- Normally, pattern matching is done on the default operand $_.
- =~ binds a string expression to a pattern match (or to a substitution or transliteration).

Perl Regular Expression: Binding Operator
- The left operand of =~ is a string.
- The right operand of =~ is a pattern.
  - It can also be an expression that is interpreted as a pattern at run time.
- Returns true or false depending on the success of the match.
- !~ works the same way, but the result is negated.

Perl Regular Expression: Binding Operator
    $_ =~ $pat;
is equivalent to
    $_ =~ /$pat/;
but is less efficient than giving the pattern directly, since the regular expression has to be compiled at run time.
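
The forms of the binding operator can be compared side by side. The sample string and pattern are invented; qr// (Perl's operator for compiling a pattern once and storing it in a variable) is added here as one common way to reuse a pattern and is not mentioned on the slides.

```perl
use strict;
use warnings;

my $string = 'key=value';
my $pat    = '\w+=\w*';               # pattern held in a plain string

my $as_expr    = ($string =~ $pat)    ? 1 : 0;   # expression used as a pattern
my $as_pattern = ($string =~ /$pat/)  ? 1 : 0;   # same pattern, interpolated

my $compiled    = qr/\w+=\w*/;                   # compiled pattern object
my $precompiled = ($string =~ $compiled) ? 1 : 0;

# !~ negates the result: true because the string is not all digits.
my $negated = ($string !~ /^\d+$/) ? 1 : 0;

print "$as_expr $as_pattern $precompiled $negated\n";
```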

Perl Regular Expression: Binding Operator
Example
    if ( ($k, $v) = $string =~ m/(\w+)=(\w*)/ ) {
        print "Key $k Value $v\n";
    }
- Since =~ has higher precedence than =, it is evaluated first.
- The binding operator binds the variable $string to a pattern looking for expressions like "key=word".
- The match is evaluated in list context, hence the captured matches are returned as a list.
- The list is then assigned to ($k, $v).
- The result of the assignment is the number of things assigned, i.e. typically 2 (and 0 if the match fails).
- Since 2 is not 0, this is equivalent to true and the if-block is entered.

Perl Regular Expressions
Quantifiers:
- * matches the preceding character zero or more times.
  - The pattern "abc*d" is matched by
    - rabd
    - zabccccd
- Use parentheses to group letters.
Without grouping:
    #!/perl/bin/perl
    while (<>) {
        chomp;
        last if $_ eq 'stop';
        if (/abc*d/) {
            print "Matched: |$`<$&>$'|\n";
        } else {
            print "No match.\n";
        }
    }
With grouping:
    #!/perl/bin/perl
    while (<>) {
        chomp;
        last if $_ eq 'stop';
        if (/a(bc)*d/) {
            print "Matched: |$`<$&>$'|\n";
        } else {
            print "No match.\n";
        }
    }

Perl Regular Expressions
Quantifiers:
- '*' matches zero or more instances
- '+' matches one or more instances
  - Example: "ab(cde)+fg"
- '?' matches zero or one instance
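
The three quantifiers can be tried out with a small helper. The helper name full_match and the "colou?r" pattern are assumptions made for the example; the other two patterns come from the slides.

```perl
use strict;
use warnings;

# Returns 1 if the whole string matches the pattern, 0 otherwise.
sub full_match {
    my ($pattern, $string) = @_;
    return $string =~ /\A(?:$pattern)\z/ ? 1 : 0;
}

my @cases = (
    [ 'abc*d',      'abd'        ],   # zero 'c'
    [ 'abc*d',      'abccccd'    ],   # several 'c'
    [ 'ab(cde)+fg', 'abfg'       ],   # '+' needs at least one "cde"
    [ 'ab(cde)+fg', 'abcdecdefg' ],   # two copies of "cde"
    [ 'colou?r',    'color'      ],   # '?' allows the 'u' to be absent
    [ 'colou?r',    'colouur'    ],   # but not to appear twice
);

for my $case (@cases) {
    my ($pattern, $string) = @$case;
    printf "%-12s %-12s %s\n", $pattern, $string,
        full_match($pattern, $string) ? 'match' : 'no match';
}
```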

Perl Regular Expressions
Alternatives
- '|' means "or": the pattern matches if either the left or the right side matches.

Perl Regular Expressions
Character classes
- A list of possible characters inside square brackets
- Examples: [a-cw-z]+   [a-zA-Z0-9]
- Negation is provided by a leading caret: [^n\-z] matches any character but 'n', '-', 'z'
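
A brief sketch of both kinds of class, using the patterns from the slide on invented sample words.

```perl
use strict;
use warnings;

# [a-cw-z]+ anchored to the whole word: only the letters a-c and w-z are allowed.
my @accepted = grep { /\A[a-cw-z]+\z/ } qw(abc wxyz cab dog);

# [^n\-z] matches every character except 'n', '-' and 'z'.
my $others = join '', 'n-zebra' =~ /[^n\-z]/g;

print "accepted: @accepted\n";
print "others:   $others\n";
```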

Perl Regular Expressions
Character class shortcuts
- \w (word) is a shortcut for [A-Za-z0-9_]
- \s (space) is a shortcut for [\f\t\n\r ]
- \d (digit) is a shortcut for [0-9]
- [^\d] anything but a digit
- [^\s] anything but a space character
- [^\w] anything but a word character
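
The shortcuts can be checked directly; note that in Perl \w also matches the underscore. The sample strings are invented for the example.

```perl
use strict;
use warnings;

my $word_ok   = ('user_42' =~ /\A\w+\z/) ? 1 : 0;   # letters, digits, underscore
my $dash_bad  = ('user-42' =~ /\A\w+\z/) ? 1 : 0;   # '-' is not a word character
my $digits    = ('2024'    =~ /\A\d+\z/) ? 1 : 0;   # digits only
my $has_space = ("a\tb"    =~ /\s/)      ? 1 : 0;   # a tab counts as white space
my $non_digit = ('12a4'    =~ /[^\d]/)   ? 1 : 0;   # [^\d] finds the 'a'

print "$word_ok $dash_bad $digits $has_space $non_digit\n";
```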

Perl Regular Expressions
Perl regex semantics are based on:
- Greed
  - Perl tries to match as much of the string as possible.
- Eagerness
  - Perl gives the first possible match.
  - The left-most match wins.
- Backtracking
  - The entire expression needs to match.
  - Perl regex evaluation backtracks if a match is impossible.
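
A short sketch of the three rules on invented sample strings. The non-greedy form .*? is a standard Perl quantifier added here only for contrast; it does not appear on the slide.

```perl
use strict;
use warnings;

my $text = 'say "one" and "two"';

# Greed plus backtracking: .* first grabs everything up to the end of the
# string, then gives characters back until the closing quote can match.
my ($greedy) = $text =~ /"(.*)"/;

# Non-greedy variant: .*? stops at the first closing quote.
my ($lazy) = $text =~ /"(.*?)"/;

# Eagerness: the left-most of the two "bc" occurrences is reported.
my $start = ('abcabc' =~ /bc/) ? $-[0] : -1;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
print "start:  $start\n";
```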

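Perl Regular Expressions
Eagerness example:
- What is the result of this snippet?
    $string = "boo hoo";
    $string =~ s/o*/e/;   # left side of =~ needs to be an l-value
- Candidate answers:
  - boo hoo
  - be hoo
  - bee hoo
  - boo heo
  - boo hee
  - eboo hoo
- Answer: eboo hoo. The pattern o* matches the empty string at the very start of the string, and the left-most match wins.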
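
Running the snippet shows how eagerness interacts with a pattern that can match the empty string. The s/o+/e/ variant is added for contrast and is not from the slide.

```perl
use strict;
use warnings;

# o* can match zero characters, so it succeeds at position 0,
# before the 'b', and the empty match is replaced by 'e'.
my $star = "boo hoo";
$star =~ s/o*/e/;

# o+ needs at least one 'o', so the first match is the "oo" in "boo".
my $plus = "boo hoo";
$plus =~ s/o+/e/;

print "o*: $star\n";
print "o+: $plus\n";
```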

Perl Regular Expressions
Quantifiers
- *, +, ? are not always enough.
- Specify the number of occurrences by placing a comma-separated range in curly brackets:
  - /a{2,12}/   2 to 12 'a'
  - /a{5,}/     5 or more 'a'
  - /a{5}/      exactly 5 'a'
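
Counted quantifiers are a natural fit for fixed-format input. The sketch below assumes a hypothetical five-digit postal code field; the sub names are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical field: exactly five digits.
sub is_five_digits {
    my ($s) = @_;
    return $s =~ /\A\d{5}\z/ ? 1 : 0;
}

# The slide's range: between 2 and 12 copies of 'a', nothing else.
sub is_a_run {
    my ($s) = @_;
    return $s =~ /\Aa{2,12}\z/ ? 1 : 0;
}

print is_five_digits('95053'), is_five_digits('9505'), "\n";
print is_a_run('a'), is_a_run('aa'), is_a_run('a' x 13), "\n";
```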

Perl Regular Expressions
Anchors
- A pattern can match anywhere in the string unless you use anchors.
- ^   beginning of string
- $   end of string
- \b  word boundary: start or end of a group of \w characters
- \B  non-word-boundary anchor
Examples:
- /^hello/ matches only at the beginning of the string
- /world$/ matches only at the end of the string
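
A few checks make the anchors concrete. The sample strings are invented. The last two lines illustrate a detail that matters for input validation and is not on the slide: $ also matches just before a trailing newline, whereas \z matches only at the very end of the string.

```perl
use strict;
use warnings;

my $starts    = ('hello world' =~ /^hello/)   ? 1 : 0;   # at the beginning
my $not_start = ('say hello'   =~ /^hello/)   ? 1 : 0;   # not at the beginning
my $ends      = ('hello world' =~ /world$/)   ? 1 : 0;   # at the end
my $word      = ('the cat sat' =~ /\bcat\b/)  ? 1 : 0;   # "cat" as a whole word
my $inside    = ('concatenate' =~ /\bcat\b/)  ? 1 : 0;   # "cat" inside a word
my $non_bound = ('concatenate' =~ /\Bcat\B/)  ? 1 : 0;   # \B requires no boundary

my $dollar_nl = ("world\n" =~ /world$/)  ? 1 : 0;   # $ tolerates a final newline
my $z_nl      = ("world\n" =~ /world\z/) ? 1 : 0;   # \z does not

print "$starts $not_start $ends $word $inside $non_bound $dollar_nl $z_nl\n";
```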

Perl Regular Expressions
Parentheses and memory
- ( ) group together part of a pattern.
- They also remember the part of the string matched by the group.
- The remembered text is available through a backreference:
  - Inside the pattern: a backslash followed by a number (\1, \2, ...)
  - After matching: as $1, $2, ...
Examples
- /(.)\1/ matches any character followed by itself
- /../ matches any two characters
- /(['"]).*\1/ matches any string starting with a single or double quote, followed by zero or more arbitrary characters, followed by the same type of quote:
  - "does not match'
  - "does match"
  - 'does match'
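
Backreferences can be exercised with a short sketch. The helper name has_quoted and the sample strings are assumptions made for the example. Because the quote pattern is unanchored, a stray apostrophe inside the text can start a match of its own, so the non-matching sample avoids one.

```perl
use strict;
use warnings;

# (.)\1 finds the first character that is immediately repeated.
my ($doubled) = 'bookkeeper' =~ /(.)\1/;

# Opening quote is captured; \1 demands the same quote character to close.
sub has_quoted {
    my ($s) = @_;
    return $s =~ /(['"]).*\1/ ? 1 : 0;
}

print "doubled: $doubled\n";
for my $s (q{"does match"}, q{'does match'}, q{"does not match'}) {
    printf "%-18s %d\n", $s, has_quoted($s);
}
```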

Perl Regular Expressions
Validating email addresses
- Out-of-channel verification:
  - Ask for addresses twice to weed out typos.
  - Send a message to the address given.
  - Still need to prevent command-line insertion.
- Lookup of DNS MX records
  - Assumes site connectivity
- Regular expressions
  - Typically have subtle errors: an address can be valid but fail a simple regex, or be valid and deliverable but probably fake.

Perl Regular Expressions
Validating email addresses
- if ( $ =~ ) { … }
  - checks for an at sign (@)
- if ( $ =~ )
  - checks for non-white space characters divided by an at sign (@)
  - matches
- if ( $ =~ )
- if ( $ =~
  - matches most valid s, but allows multiple s
- if ( $ =~
  - anchored at beginning and end of word
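
As an illustration of the kind of check discussed here, the sketch below tests for non-white space characters on both sides of exactly one @, anchored at both ends. The pattern and the sub name looks_like_address are assumptions made for this example and are not a reconstruction of the slide's patterns. Such a check only filters obvious typos and says nothing about whether the address is deliverable.

```perl
use strict;
use warnings;

# Hypothetical check: one '@' with non-space, non-'@' characters on each side.
# The '@' is written as \@ so Perl does not try to interpolate an array.
sub looks_like_address {
    my ($addr) = @_;
    return $addr =~ /\A[^\s\@]+\@[^\s\@]+\z/ ? 1 : 0;
}

for my $addr ('user@example.com', 'user@@example.com', 'no-at-sign', 'a b@example.com') {
    printf "%-20s %d\n", $addr, looks_like_address($addr);
}
```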

Perl Regular Expressions
Checking for strings that contain only alphabetic characters
- An ASCII-based regex is insufficient:
    if ($var =~ /^[a-zA-Z]+$/)
  - Does not work for characters with diacritic marks
- Best solution is to use Unicode properties:
    if ($var =~ /^[^\W\d_]+$/)
- Explanation:
  - \w matches alphabetic, numeric, underscore (alphanumunder)
  - \W is a non-alphanumunder
  - [^\W\d_] is a character that is not a non-alphanumunder, not a digit, and not an underscore, hence an alphabetic character
- Could also use POSIX character classes, but those depend on locale
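
The difference between the two patterns shows up as soon as a name contains letters outside ASCII. In this sketch the sample name is written with \x{...} escapes (it spells the Polish city name Łódź) so that the script does not depend on the encoding of the source file; the sample values are assumptions made for the example.

```perl
use strict;
use warnings;

# U+0141 (L with stroke), U+00F3 (o acute), 'd', U+017A (z acute).
my $name = "\x{141}\x{f3}d\x{17a}";

my $ascii_only = ($name =~ /^[a-zA-Z]+$/)    ? 1 : 0;   # rejects the accented letters
my $unicode    = ($name =~ /^[^\W\d_]+$/)    ? 1 : 0;   # accepts any alphabetic character
my $mixed      = ('abc_1' =~ /^[^\W\d_]+$/)  ? 1 : 0;   # digit and underscore are refused

print "ascii_only=$ascii_only unicode=$unicode mixed=$mixed\n";
```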

Perl Regular Expressions
Making regex readable
- Place semantic units into variables with appropriate names:
    $optional_sign    = '[-+]?';
    $mandatory_digits = '\d+';
    $decimal_point    = '\.?';
    $optional_digits  = '\d*';
    $number = $optional_sign . $mandatory_digits . $decimal_point . $optional_digits;
    if ( /($number)/ ) { … }
