Regular Expression Beihang Open Source Club.



A Practical Problem
Doubled words: report and highlight lines that contain doubled words.
Cases to handle:
  A word at the end of one line is repeated at the beginning of the next.
  The two words differ in capitalization.
  The two words are separated by HTML tags: '...it's <B>very</B> very important...'.

    #!/usr/bin/perl
    $/ = ".\n";    # Read chunks that end in a period and newline, not single lines.
    while (<>) {
        # Highlight doubled words in reverse video; skip chunks that have none.
        next if !s/\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)/\e[7m$1\e[m$2\e[7m$3\e[m/ig;
        s/^(?:[^\e]*\n)+//mg;    # Remove any unmarked lines.
        s/^/$ARGV: /mg;          # Ensure lines begin with filename.
        print;
    }

Regular expressions are the key to powerful, flexible, and efficient text processing.

Regular Expression As A Language
In the shell: *.txt
  Filename patterns: only a limited set of metacharacters
More powerful, more general ==> a generalized pattern language
Regular expression: a complete language
Two types of characters:
  Metacharacters => grammar
  Literals => words
Example:
    s!<emphasis>([0-9]+(\.[0-9]+){3})</emphasis>!<inet>$1</inet>!

Egrep
Extended grep
Grep: Global Regular Expression Print
    % egrep '^(From|Subject): ' mailbox-file
Egrep metacharacters
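A minimal sketch of the egrep command above, written with `grep -E` (the current spelling of egrep). The mailbox contents are hypothetical sample data, not from the slides.

```shell
# Hypothetical sample data standing in for mailbox-file.
mailbox=$(printf '%s\n' \
  'From: alice@example.com' \
  'To: bob@example.com' \
  'Subject: Lunch' \
  'See you at noon.')

# Keep only lines that start with "From: " or "Subject: ".
headers=$(printf '%s\n' "$mailbox" | grep -E '^(From|Subject): ')
printf '%s\n' "$headers"
```

Only the From and Subject lines are printed; the To line and the body line are filtered out.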

Start/End Of Line
^cat matches a line with cat at the beginning.
Good habit: interpret a regex in a rather literal way.
  ^cat matches if you have the beginning of a line, followed immediately by c, followed immediately by a, followed immediately by t.
More examples:
  cat$    cat at the end of a line
  ^$      an empty line
  ^cat$   a line that consists of only cat
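The anchors can be tried out with a short sketch like the following. The sample lines and variable names are made up for illustration.

```shell
# Hypothetical sample lines.
lines=$(printf '%s\n' 'cat' 'catalog' 'bobcat' 'the cat sat')

starts=$(printf '%s\n' "$lines" | grep -E '^cat')   # lines that begin with cat
ends=$(printf '%s\n' "$lines" | grep -E 'cat$')     # lines that end with cat
whole=$(printf '%s\n' "$lines" | grep -E '^cat$')   # lines that are exactly cat

# ^$ matches a line with nothing between its start and its end.
empty_count=$(printf '%s\n' 'cat' '' 'dog' | grep -E -c '^$')

printf 'begin with cat:\n%s\n' "$starts"
printf 'end with cat:\n%s\n' "$ends"
printf 'exactly cat:\n%s\n' "$whole"
printf 'empty lines: %s\n' "$empty_count"
```

Note that 'the cat sat' is selected by none of the anchored patterns, even though it contains cat.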

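Character Class
Matches any one of several characters.
  gr[ea]y
  sep[ae]r[ae]te
"And then" versus "or": outside a class, items follow one another; inside a class, they are alternatives.
  <H[123456]>
  <H[1-6]>
  [0-9a-zA-Z]
A class has its own mini language: the minus (dash) indicates a range only inside a class.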
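A small sketch of the two class examples, using `grep -E` on made-up input:

```shell
# Hypothetical sample words and tags.
words=$(printf '%s\n' 'grey' 'gray' 'groy' 'greay')
tags=$(printf '%s\n' '<H1>' '<H6>' '<H7>' '<HR>')

# g, then r, then e OR a, then y.
spellings=$(printf '%s\n' "$words" | grep -E 'gr[ea]y')

# The dash inside the class expresses the range 1 through 6.
headings=$(printf '%s\n' "$tags" | grep -E '<H[1-6]>')

printf '%s\n' "$spellings" "$headings"
```

The class matches exactly one character, which is why 'greay' is rejected.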

Negated Character Class
Matches any character not listed: a leading ^ (caret) negates the class.
    % egrep 'q[^u]' word.list
A negated class means "match a character that's not listed", not "don't match what's listed".
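The difference shows up on a word that ends in q. The following sketch uses a hypothetical four-word list in place of word.list:

```shell
# Hypothetical stand-in for word.list.
word_list=$(printf '%s\n' 'Iraqi' 'Iraq' 'quit' 'Qantas')

# A q followed by some character that is not u.
hits=$(printf '%s\n' "$word_list" | grep -E 'q[^u]')
printf '%s\n' "$hits"
```

'Iraq' is not selected because [^u] still has to match a character and nothing follows the q. 'Qantas' is not selected because the search is case sensitive.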

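Match Any Character
The dot matches any single character.
Dates to match: 03/19/76, 03-19-76, 03.19.76
  03.19.76            each dot matches any character
  03[-./]19[-./]76    inside a class the dot is literal; the dash is literal when listed first
  03[.-/]19[.-/]76    here .-/ is read as a range, so a dash is no longer matched
03.19.76 also matches unintended text such as: 19 203319 7639
Know your data!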
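A sketch that counts how many sample lines each of the three patterns selects. The sample lines are assumptions built from the strings on the slide, and `LC_ALL=C` is set on the range pattern so that the range `.-/` is interpreted in plain ASCII order.

```shell
# Hypothetical sample lines built from the slide's strings.
samples=$(printf '%s\n' \
  '03/19/76' \
  '03-19-76' \
  '03.19.76' \
  'lottery numbers: 19 203319 7639')

# The dot accepts any character, so the lottery line is selected too.
dot_hits=$(printf '%s\n' "$samples" | grep -E -c '03.19.76')

# A class with the dash listed first accepts only -, . and /.
class_hits=$(printf '%s\n' "$samples" | grep -E -c '03[-./]19[-./]76')

# With the dash in the middle, .-/ is a range covering only . and / in ASCII.
range_hits=$(printf '%s\n' "$samples" | LC_ALL=C grep -E -c '03[.-/]19[.-/]76')

printf 'dot: %s  class: %s  range: %s\n' "$dot_hits" "$class_hits" "$range_hits"
```

The counts drop from four lines to three to two, which is the point of "Know your data!".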

Alternation
Matches any one of several subexpressions.
  grey|gray
  (Geoff|Jeff)(rey|ery)
Alternation is constrained by parentheses.
  '^(From|Subject|Date): '    one of the three words at the start of the line, then ": "
  '^From|Subject|Date: '      ^From, or Subject, or "Date: ", the last two anywhere in the line
Alternation: each alternative can be a full-fledged regex.
Character class: matches only a single character.
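The effect of the parentheses can be seen by running both patterns over the same input. A sketch with made-up lines:

```shell
# Hypothetical sample lines.
lines=$(printf '%s\n' \
  'From: alice' \
  'Re: Subject lines' \
  'Release Date: soon' \
  'Date: today')

# Parentheses confine the alternation; ^ and ": " apply to every alternative.
grouped=$(printf '%s\n' "$lines" | grep -E '^(From|Subject|Date): ')

# Without parentheses the alternatives are ^From, Subject, and "Date: ".
ungrouped_count=$(printf '%s\n' "$lines" | grep -E -c '^From|Subject|Date: ')

printf '%s\n' "$grouped"
printf 'ungrouped pattern selects %s lines\n' "$ungrouped_count"
```

The grouped pattern selects only the two real header lines, while the ungrouped one selects all four.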

Word Boundary
\< and \> anchor a position in a regular expression.
They don't actually consume any characters during a match.
\<cat\> means: match if we can find a start-of-word position, followed immediately by c-a-t, followed immediately by an end-of-word position.
In short: find the word cat.
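A sketch of the word anchors on made-up input. It assumes a grep that supports \< and \> (GNU grep does; not every egrep is guaranteed to).

```shell
# Hypothetical sample lines.
lines=$(printf '%s\n' 'the cat sat' 'concatenate' 'cats' 'cat')

# cat must start at a start-of-word position and end at an end-of-word position.
word_hits=$(printf '%s\n' "$lines" | grep -E '\<cat\>')
printf '%s\n' "$word_hits"
```

'concatenate' and 'cats' contain c-a-t but not as a whole word, so they are not selected.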

Optional Items
  colou?r
Question mark: the preceding item is optional.
Applies only to the immediately preceding item.
Always successful: the item either matches or is skipped.
Example, shortened step by step:
  (July|Jul) (fourth|4th|4)
  July? (fourth|4th|4)
  July? (fourth|4(th)?)
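Both optional-item patterns can be checked with a sketch like this one. The sample strings are assumptions chosen for illustration.

```shell
# Hypothetical sample data.
spellings=$(printf '%s\n' 'color' 'colour' 'colouur')
dates=$(printf '%s\n' 'July 4th' 'Jul fourth' 'July 4' 'June 4')

# The ? applies only to the u that immediately precedes it.
colour_hits=$(printf '%s\n' "$spellings" | grep -E 'colou?r')

# July? makes the y optional; (th)? makes the whole group optional.
date_hits=$(printf '%s\n' "$dates" | grep -E 'July? (fourth|4(th)?)')

printf '%s\n' "$colour_hits" "$date_hits"
```

'colouur' fails because ? allows at most one u, and 'June 4' fails because only the y of July is optional.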

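Other Quantifiers: Repetition
+ (plus): one or more of the immediately preceding item
* (star): any number, including none, of the immediately preceding item
The question mark, plus, and star are called quantifiers.
Example:
  <H3 *>
  <HR +SIZE *= *14 *>
  <HR +SIZE *= *[0-9]+ *>
  <HR( +SIZE *= *[0-9]+)? *>
Spaces matter: each space in these patterns is a literal character that the quantifier after it applies to.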
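A sketch that runs the last pattern of the slide over a few made-up tags:

```shell
# Hypothetical sample tags.
tags=$(printf '%s\n' \
  '<HR>' \
  '<HR SIZE=14>' \
  '<HR  SIZE = 3 >' \
  '<HR SIZE=>' \
  '<HRSIZE=14>')

# The whole SIZE attribute is optional; when present it needs at least
# one space before SIZE and at least one digit after the equals sign.
hr_hits=$(printf '%s\n' "$tags" | grep -E '<HR( +SIZE *= *[0-9]+)? *>')
printf '%s\n' "$hr_hits"
```

The first three tags are selected. '<HR SIZE=>' has no digit for [0-9]+ and '<HRSIZE=14>' has no space for the plus, so both are rejected.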

Other Quantifiers: Repetition
Intervals: a defined range of matches
  ...{min,max}
Example:
  ...{3,12}
  [a-zA-Z]{1,5}
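A sketch of the interval quantifier on made-up words. The ^ and $ anchors are an addition for the demonstration: without them the pattern would also succeed on the first five letters of a longer word.

```shell
# Hypothetical sample words.
words=$(printf '%s\n' 'a' 'hello' 'helloo' 'abc123')

# Whole line must be one to five letters.
short_words=$(printf '%s\n' "$words" | grep -E '^[a-zA-Z]{1,5}$')

# Unanchored, every line that contains at least one letter is selected.
unanchored_count=$(printf '%s\n' "$words" | grep -E -c '[a-zA-Z]{1,5}')

printf '%s\n' "$short_words"
printf 'unanchored pattern selects %s lines\n' "$unanchored_count"
```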

Parentheses and Backreferences
Parentheses can “remember” text matched by the subexpression they enclose.
Doubled-word problem:
  \<([a-zA-Z]+) +\1\>
\1 is a metasequence that refers back to the remembered text.
Groups are numbered by their opening parentheses:
  ([a-z])([0-9])\1\2
    % egrep -in '\<([a-z]+) +\1\>' files
Egrep considers each line in isolation, so a word doubled across a line break is not found.
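A sketch of the doubled-word search with `grep -E -i -n`, on hypothetical input. It assumes a grep that supports backreferences and \< \> in extended patterns, as GNU grep does.

```shell
# Hypothetical sample text; the last two lines double "line" across a line break.
text=$(printf '%s\n' \
  'this is is a test' \
  'this thistle' \
  'ends with line' \
  'line starts here')

# -n prefixes each selected line with its line number; -i ignores case.
doubled=$(printf '%s\n' "$text" | grep -E -i -n '\<([a-z]+) +\1\>')
printf '%s\n' "$doubled"
```

Only line 1 is reported. 'this thistle' is rejected by the \> anchor, and the doubled "line" is missed because each line is examined in isolation.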

Escape
A backslash makes the following metacharacter match literally.
  \.    escaped period / escaped dot
This does not apply inside a character class.
  \([a-zA-Z]+\)    matches a word within parentheses
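A sketch comparing an escaped and an unescaped dot, plus the escaped-parentheses pattern, on made-up input:

```shell
# Hypothetical sample data.
numbers=$(printf '%s\n' '3.14' '3514')
phrases=$(printf '%s\n' 'call (now)' 'call now' 'call ()')

# Unescaped, the dot matches any character; escaped, only a literal period.
any_char_count=$(printf '%s\n' "$numbers" | grep -E -c '3.14')
literal_dot=$(printf '%s\n' "$numbers" | grep -E '3\.14')

# Escaped parentheses match literal ( and ) instead of grouping.
in_parens=$(printf '%s\n' "$phrases" | grep -E '\([a-zA-Z]+\)')

printf '%s\n' "$any_char_count" "$literal_dot" "$in_parens"
```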

More Flavors
Different tools: egrep, Perl, Java, awk...
Different versions
Goal of a regular expression:
  Line
  Character sequence
Terminology:
  Regex
  Matching
  Metacharacter/metasequence

Even More
Terminology:
  Subexpression
  Character
Understanding how the regex engine really works is the key to really understanding regular expressions.

At Last
Not all egrep programs are the same.
Three reasons for using parentheses:
  Constraining alternation: (fourth|4(th)?)
  Grouping
  Capturing: \<([a-zA-Z]+) +\1\>
Character classes are special: they have a totally distinct set of metacharacters.

More
Alternation | and character classes [] are fundamentally different.
Negated character class: still a positive assertion; it must match a character.
Three types of escaped items:
  \ and a metacharacter ==> the literal character
  \ and a selected non-metacharacter ==> metasequence
  \ and others: backslash ignored
Question mark and star: don't need to actually match any character to “match successfully”.
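The last point can be seen by comparing star with plus. A sketch on made-up lines:

```shell
# Hypothetical sample lines.
lines=$(printf '%s\n' 'abc' 'xyz' 'box')

# x* succeeds even when it matches zero characters, so every line is selected.
star_count=$(printf '%s\n' "$lines" | grep -E -c 'x*')

# x+ needs at least one x.
plus_count=$(printf '%s\n' "$lines" | grep -E -c 'x+')

printf 'x* selects %s lines, x+ selects %s lines\n' "$star_count" "$plus_count"
```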

THE END