CIT 383: Administrative Scripting: Regular Expressions

Topics
1. Creating Regexp objects
2. Regular expression syntax
3. Pattern matching
4. Substitution

Regular Expressions
Regular expressions are used to match patterns against strings.
- UNIX commands such as egrep, awk, and sed use them.
- Ruby provides an expanded regexp syntax.
Applications of regular expressions:
- Find every login failure in a log file.
- Find every address you received mail from.
- Find every IP address in a file.
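A minimal sketch of two of the applications above. The log lines are hypothetical, made up for illustration, and the IP pattern is deliberately loose (it would also accept an invalid address such as 999.1.1.1). It uses a non-capturing group `(?:...)` so that `scan` returns whole matches; the squiggly heredoc assumes Ruby 2.3 or later.

```ruby
# Hypothetical sshd-style log text, invented for this example.
log = <<~LOG
  Jan 10 10:01:02 sshd: Failed password for root from 203.0.113.5
  Jan 10 10:01:09 sshd: Accepted password for alice from 198.51.100.7
  Jan 10 10:02:44 sshd: Failed password for admin from 203.0.113.9
LOG

# Every login failure: keep only the lines that contain "Failed password".
failures = log.lines.grep(/Failed password/)

# Every IP address: four groups of 1-3 digits separated by dots.
# (?:...) groups without capturing, so scan returns the full matches.
ip_re = /\b\d{1,3}(?:\.\d{1,3}){3}\b/
ips = log.scan(ip_re)

puts failures.size   # 2
p ips                # ["203.0.113.5", "198.51.100.7", "203.0.113.9"]
```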

Creating a Regexp object
Three ways to create one:
    re = Regexp.new('^\s*[a-z]')
    re = /^\s*[a-z]/
    re = %r|^\s*[a-z]|
Modifiers (written after the closing delimiter, e.g. /ruby/i):
- i: ignore case when matching text
- m: multiline match; allow . to match \n
- x: extended syntax; whitespace and # comments in the pattern are ignored
- o: perform #{} interpolation only once
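A short sketch checking that the three constructors produce the same pattern source, and showing the effect of the i, m, and x modifiers. The sample strings and the date pattern are illustrative only.

```ruby
# The three constructors build the same pattern (identical source text).
a = Regexp.new('^\s*[a-z]')
b = /^\s*[a-z]/
c = %r|^\s*[a-z]|
same_source = [a, b, c].map(&:source).uniq.size == 1

# i: ignore case; =~ returns the index where the match starts.
ci = (/ruby/i =~ "I like RUBY")   # 7

# m: without it, . does not match a newline.
without_m = (/a.b/ =~ "a\nb")     # nil
with_m    = (/a.b/m =~ "a\nb")    # 0

# x: whitespace and comments inside the pattern are ignored.
date = /
  (\d{4})   # year
  -
  (\d{2})   # month
/x
year = date.match("2024-05")[1]   # "2024"

p same_source, ci, without_m, with_m, year
```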

Pattern Syntax
Characters match themselves except for the metacharacters . | ( ) [ ] { } + \ ^ $ * ?
- Use \ to escape a metacharacter; e.g. \| matches a literal |.
- The . metacharacter matches any character except a newline (unless the m modifier is used).
Anchors require the match to occur at the start or end of a line or string:
- ^ matches the beginning of a line
- $ matches the end of a line
- \A matches the beginning of the string
- \Z matches the end of the string (or just before a final newline); \z matches only the very end
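A small sketch contrasting the line anchors with the string anchors, and showing how to escape metacharacters. `Regexp.escape` is a standard Ruby method that was not on the slide; it is included here as a convenient way to turn literal text into a safe pattern.

```ruby
text = "one\ntwo"
line_start   = (text =~ /^two/)    # 4: ^ matches right after any newline
string_start = (text =~ /\Atwo/)   # nil: \A only matches at position 0

big_z   = ("end\n" =~ /end\Z/)     # 0: \Z tolerates one trailing newline
small_z = ("end\n" =~ /end\z/)     # nil: \z requires the absolute end

pipe = ("a|b" =~ /\|/)             # 1: the escaped | is a literal character

# Regexp.escape backslash-escapes every metacharacter in literal text.
escaped = Regexp.escape("1.5+2")   # "1\\.5\\+2"
found   = (Regexp.new(escaped) =~ "x = 1.5+2")   # 4

p line_start, string_start, big_z, small_z, pipe, escaped, found
```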

Regexp Escape Sequences
Escape sequences are similar to those in double-quoted strings: \t is tab, \n is newline, etc.
Word boundaries:
- /red/ matches "red", "bred", and "reddened"
- /\bred\b/ matches only the whole word "red"
- \B matches a nonword boundary: /\brub\B/ matches "ruby" but not "rub"
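The word-boundary examples above can be checked directly with `Array#grep`, which keeps the elements a pattern matches. The word list is just the slide's examples gathered into one array.

```ruby
words = %w[red bred reddened ruby rub]

anywhere   = words.grep(/red/)       # ["red", "bred", "reddened"]
whole_word = words.grep(/\bred\b/)   # ["red"]
prefix     = words.grep(/\brub\B/)   # ["ruby"]: \B needs a word char after "rub"

# \t in a pattern means a tab, just as in a double-quoted string.
tab_split = "a\tb".split(/\t/)       # ["a", "b"]

p anywhere, whole_word, prefix, tab_split
```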

Character Classes
A character class is a set of characters between brackets; it matches any one character in the set.
- [aeiou] matches any vowel
- [0123456789] matches any digit
- Most special characters lose their special meaning inside [] (exceptions include \, ], a leading ^, and - between two characters)
Additional syntax:
- [A-Z] is a range including all capital letters
- [A-Za-z0-9] matches any alphanumeric character
- [^A-Z] matches any character except a capital letter
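A brief sketch of character classes, ranges, and negation; the sample strings are arbitrary.

```ruby
vowels = "hello world".scan(/[aeiou]/)   # ["e", "o", "o"]
digits = "abc123".gsub(/[^0-9]/, "")     # "123": delete everything but digits
dot    = ("a.b" =~ /[.]/)                # 1: . is a literal inside []
caps   = "Ruby On Rails".scan(/[A-Z]/)   # ["R", "O", "R"]

p vowels, digits, dot, caps
```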

Special Character Classes
Abbreviations:
- \d is [0-9]; \D is [^0-9]
- \s is [ \t\r\n\f]; \S is [^ \t\r\n\f]
- \w is [A-Za-z0-9_]; \W is [^A-Za-z0-9_]
POSIX classes (written inside a bracket expression, e.g. /[[:digit:]]/; equivalents shown for ASCII text):
- [:alnum:] is [A-Za-z0-9]
- [:alpha:] is [A-Za-z]
- [:digit:] is [0-9]
- [:xdigit:] is [0-9A-Fa-f]
- [:lower:] is [a-z]
- [:upper:] is [A-Z]
- [:space:] is [ \t\r\n\f]
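A quick sketch of the abbreviations and of a POSIX class used inside its required outer brackets. The sample string is made up.

```ruby
sample = "id_7 x2, y9"

digits  = sample.scan(/\d/)            # ["7", "2", "9"]
words   = sample.scan(/\w+/)           # ["id_7", "x2", "y9"]: _ counts as a word char
letters = sample.scan(/[[:alpha:]]/)   # ["i", "d", "x", "y"]
no_ws   = sample.gsub(/\s/, "")        # "id_7x2,y9"

p digits, words, letters, no_ws
```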

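Alternation
The vertical bar matches the pattern before it or the pattern after it: pattern1|pattern2
Precedence: | binds more loosely than anything else, so each alternative extends all the way to the | (or to the edge of the enclosing group).
- red|blue matches either "red" or "blue"
- red ball|blue sky means "red ball" or "blue sky", not "red ball sky" or "red blue sky"
Use parentheses to group part of an expression: red (ball|blue) sky matches "red ball sky" or "red blue sky"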
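A sketch of the precedence point above. Note that an unanchored pattern can still find one alternative inside a longer string, so the last part wraps the alternation in a non-capturing group `(?:...)` between `\A` and `\z` to require a whole-string match. `match?` assumes Ruby 2.4 or later.

```ruby
loose   = /red ball|blue sky/
grouped = /red (ball|blue) sky/

# Unanchored, the loose pattern still finds "blue sky" inside "red blue sky".
found = loose.match("red blue sky")[0]   # "blue sky"

g1 = grouped.match?("red blue sky")      # true
g2 = grouped.match?("red ball")          # false: " sky" is required

# Anchor the whole alternation by grouping it first.
exact = /\A(?:red ball|blue sky)\z/
e1 = exact.match?("red ball")            # true
e2 = exact.match?("red ball sky")        # false

p found, g1, g2, e1, e2
```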

Repetition
Repetition operators are greedy: they match as many occurrences as possible.
- re* matches zero or more occurrences of re
- re+ matches one or more occurrences of re
- re? matches zero or one occurrence of re
- re{n} matches exactly n occurrences of re
- re{n,} matches n or more occurrences of re
- re{n,m} matches at least n and at most m occurrences of re
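Each repetition operator in a one-line example, using `String#[]` with a regexp to return the matched text. The inputs are arbitrary.

```ruby
greedy   = "aaaa"[/a+/]                    # "aaaa": + takes as many as it can
bounded  = "aaaa"[/a{2,3}/]                # "aaa": at most 3
optional = "color colour".scan(/colou?r/)  # ["color", "colour"]
zero     = "abc"[/x*/]                     # "": * is satisfied by zero x's
exact    = "2024-05-17"[/\d{4}/]           # "2024"

p greedy, bounded, optional, zero, exact
```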

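Additional Features
Backreferences: a regular expression remembers what each () group matched, and \1, \2, ... refer back to it.
- /([Rr])uby & \1ails/ matches "Ruby & Rails" and "ruby & rails", but not "Ruby & rails"
- /(\w+) \1/ matches a repeated word such as "the the" (add \b, as in /\b(\w+) \1\b/, to avoid matching inside words such as "the theory")
Greedy and non-greedy matching:
- /<.*>/ is greedy; in "<ruby>perl>" it matches "<ruby>perl>"
- /<.*?>/ is non-greedy; it matches only "<ruby>"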
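A sketch of backreferences and greedy versus non-greedy repetition, using the examples from the slide plus one made-up sentence with a doubled word.

```ruby
rails = /([Rr])uby & \1ails/
r1 = rails.match?("Ruby & Rails")   # true
r2 = rails.match?("ruby & rails")   # true
r3 = rails.match?("Ruby & rails")   # false: \1 must repeat the same letter

# Doubled-word check; \b keeps it from matching inside longer words.
dup = "this is is a test"[/\b(\w+) \1\b/]   # "is is"

html   = "<ruby>perl>"
greedy = html[/<.*>/]    # "<ruby>perl>": .* runs to the last >
lazy   = html[/<.*?>/]   # "<ruby>": .*? stops at the first >

p r1, r2, r3, dup, greedy, lazy
```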

Pattern Matching
Pattern matching uses the =~ operator, which returns the index where the first match starts, or nil if there is no match:
    re = /[Rr]uby|[Pp]ython/
    re =~ "Ruby is better than PHP."   # => 0
After a successful match, you can retrieve details:
    data = Regexp.last_match
- data.string: the string that was compared
- data.to_s: the part of the string that matched
- data.pre_match: portion of the string before the match
- data.post_match: portion of the string after the match
- data[1]: what the first set of () matched
- data[2]: what the second set of () matched
- data.captures: what all sets of parentheses matched
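A worked example of the MatchData accessors listed above. The pattern adds two capture groups to the slide's regexp so that `data[1]`, `data[2]`, and `captures` have something to return; the sentence is illustrative.

```ruby
re   = /([Rr]uby|[Pp]ython) is (\w+)/
text = "I think Ruby is better than PHP."

pos  = re =~ text          # 8: index where the match starts
data = Regexp.last_match   # MatchData for the most recent match

p data.string       # "I think Ruby is better than PHP."
p data.to_s         # "Ruby is better"
p data.pre_match    # "I think "
p data.post_match   # " than PHP."
p data[1]           # "Ruby"
p data[2]           # "better"
p data.captures     # ["Ruby", "better"]

miss = re =~ "PHP only"    # nil: no match
```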

Pattern Matching Methods
Slicing:
    "ruby123"[/\d+/]                # "123"
    "ruby123"[/([a-z]+)(\d+)/, 1]   # "ruby"
    "ruby123"[/([a-z]+)(\d+)/, 2]   # "123"
    r = "ruby123"
    r.slice(/\d+/)                  # "123"
    r.slice!(/\d+/)                 # "123"; r is now "ruby"
Splitting:
    s = "one, two, three"
    s.split                         # ["one,", "two,", "three"]
    s.split(", ")                   # ["one", "two", "three"]
    s.split(/\s*,\s*/)              # ["one", "two", "three"]
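An administrative-style sketch combining splitting and slicing. The input line is hypothetical, formatted loosely like one row of `df -h` output with invented values; splitting on `/\s+/` keeps runs of spaces from producing empty fields.

```ruby
# Hypothetical df-style line; the values are made up.
line = "/dev/sda1   50G   31G   17G  65% /"

# Split on runs of whitespace.
fields = line.split(/\s+/)

# Slice with a capture group to pull out just the number before "%".
usage = line[/(\d+)%/, 1].to_i

# A character class lets one split handle several delimiters.
parts = "a,b;;c".split(/[,;]+/)

p fields   # ["/dev/sda1", "50G", "31G", "17G", "65%", "/"]
p usage    # 65
p parts    # ["a", "b", "c"]
```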

Substitutions
The String class provides regular expression substitutions:
- sub(re, str): returns a copy of the string in which the first substring matching re is replaced by str
- sub!(re, str): replaces the first substring matching re with str, modifying the string in place
- gsub(re, str): returns a copy of the string in which all substrings matching re are replaced by str
- gsub!(re, str): replaces all substrings matching re with str, modifying the string in place
The bang forms return nil when no substitution was made.
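A sketch of the difference between the copying and in-place forms, including the nil return from a bang method that finds nothing to replace (which is why their results should not be chained blindly). `.dup` gives a mutable copy of the literal.

```ruby
s = "a-b-c"
first = s.sub(/-/, "+")    # "a+b-c"
all   = s.gsub(/-/, "+")   # "a+b+c"
# s itself is still "a-b-c".

t = "a-b-c".dup
ret  = t.gsub!(/-/, "+")   # returns t itself, now "a+b+c"
none = t.gsub!(/x/, "+")   # nil: nothing matched, t unchanged

p first, all, s, t, none
```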

Substitution Examples
Remove Ruby-style comments (naively; a # inside a string literal is removed too):
    line.sub!(/#.*$/, "")
Remove all non-digits:
    line.gsub!(/\D/, "")
Capitalize specified words:
    line.gsub!(/\brails\b/, 'Rails')
Change "John Smith" to "Smith, John":
    name.sub!(/(\w+)\s+(\w+)/, '\2, \1')
Flip UNIX slashes to Windows slashes:
    path.gsub!(%r|/|, '\\')
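A runnable sketch of these substitutions on sample strings. It also shows that backreferences in a double-quoted replacement need doubled backslashes, and uses the block form of `gsub` (not shown on the slide) for the slash flip, because a block's return value is inserted literally with no backslash processing.

```ruby
name = "John Smith"
# Single quotes keep \2 and \1 as backreferences...
single = name.sub(/(\w+)\s+(\w+)/, '\2, \1')     # "Smith, John"
# ...in double quotes the backslashes must themselves be escaped.
double = name.sub(/(\w+)\s+(\w+)/, "\\2, \\1")   # "Smith, John"

stripped = "x = 1  # set x".sub(/#.*$/, "").rstrip   # "x = 1"
digits   = "Call 555-0142".gsub(/\D/, "")            # "5550142"
cased    = "rails and railsconf".gsub(/\brails\b/, 'Rails')   # "Rails and railsconf"

# Block form: the returned "\\" (one backslash) is used as-is.
win = "/usr/local/bin".gsub("/") { "\\" }   # \usr\local\bin

p single, double, stripped, digits, cased, win
```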
