Quiz
30 minutes
10 questions
No talking, texting, collaboration, etc.


Review
Please turn in your homework and practicals
Regular Expressions

To generate #1 albums, 'jay --help' recommends the -z flag

You've Already Used Them
grep -i 'documentroot' httpd.conf
Our command is grep
The flag/option is -i, for case-insensitive
httpd.conf is our file (the Apache config file)
'documentroot' is our regex
Called a 'static string', meaning it doesn't change

Regular Expression
A regular expression (or regex) is "a sequence of characters that forms a search pattern"
Awesome, thanks again Wikipedia, you're always so descriptive
It's a set of characters with special meaning, used to capture another set of characters and either print them out or modify them (substitute, etc.)
Today we're only looking at printing them out: matching

Static Strings
The simplest regex
grep -i 'documentroot' httpd.conf
grep 'error' /var/log/messages
What you're matching on is one exact thing
Also, I'm only using grep; we'll get into other utilities later
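Here is a small sketch of static-string matching you can paste into a shell. The sample text is made up for illustration (it stands in for a file like teams.txt) and is piped into grep so nothing has to exist on disk.

```shell
# Made-up sample lines (an assumption, not data from the slides).
teams='Seattle Seahawks
seattle sounders
Chicago Bears
Green Bay Packers'

# Case-sensitive static string: only the capitalized line matches.
printf '%s\n' "$teams" | grep 'Seattle'

# -i ignores case, so both Seattle lines match.
printf '%s\n' "$teams" | grep -i 'seattle'
```

Adding -c to either command prints the number of matching lines instead of the lines themselves, which is handy for a quick sanity check.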

Non-Static Strings
More complex
Uses 'metacharacters' to perform 'abstraction'
Metacharacters – a 'known' set of characters like * or [ or + or .
Abstraction – a way of referencing a more general group than what is explicitly stated
grep -E '[st]+' httpd.conf
[ ] and + are metacharacters
Quote the pattern so the shell leaves it alone, and use -E (or egrep): plain grep treats + as a literal plus sign
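A quick way to see why + is worth testing before you trust it: in plain grep (basic regex) it is a literal plus sign, and it only becomes a metacharacter with grep -E. The sample lines below are invented for the demonstration.

```shell
# Made-up sample lines (an assumption, not data from the slides).
lines='settings
t+1
plain'

# Basic regex: + is literal, so this wants an s or t followed by a real "+".
# Only "t+1" matches.
printf '%s\n' "$lines" | grep '[st]+'

# Extended regex (-E): + means "one or more of the previous thing".
# "settings" and "t+1" both contain at least one s or t.
printf '%s\n' "$lines" | grep -E '[st]+'
```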

Metacharacters
[ ] indicates a single-character range
[a-z] would be any lowercase letter
– grep '[a-z]' teams.txt
[aeiou] would be any vowel
[0-9] is any digit
Single-character range
– grep '[Ss]eattle' teams.txt
– [hl][io] would match hi, ho, li, lo
Not hl or io
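To check the [hl][io] claim yourself, here is a minimal sketch with one candidate word per line. The word list is made up to cover both the matching and the non-matching cases.

```shell
# One candidate per line; invented test input.
words='hi
ho
li
lo
hl
io'

# Each bracket pair stands for exactly one character:
# first an h or l, then an i or o.
# Prints hi, ho, li, lo and skips hl and io.
printf '%s\n' "$words" | grep '[hl][io]'
```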

Metacharacter Placement
^ is the beginning of the line
– grep '^Seattle' teams.txt
– Case-sensitive (S not s)
$ is the end of the line
– grep 'Bears$' teams.txt
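The anchors are easiest to see next to lines that contain the word in the "wrong" place. A small sketch, again with made-up lines standing in for teams.txt:

```shell
# Made-up sample lines (an assumption, not data from the slides).
teams='Seattle Seahawks
Visit Seattle today
Chicago Bears
Bears fans'

# ^ pins the match to the start of the line: "Visit Seattle today" is skipped.
printf '%s\n' "$teams" | grep '^Seattle'

# $ pins the match to the end of the line: "Bears fans" is skipped.
printf '%s\n' "$teams" | grep 'Bears$'
```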

Your Turn
grep '[Ss]ea' teams.txt
grep '^Rodgers' anyfile.txt
grep 'horrible$' anyfile.txt
grep '[JFMASOND][aepuco][nbrynlgptvc]' dates.txt

What The…?
'[JFMASOND][aepuco][nbrynlgptvc]'
Regexes get wonky quickly
Keep It Simple
This is too complex, but we still have to read it
So break it down left to right
[] matches what?

A Little More Sense
'[JFMASOND][aepuco][nbrynlgptvc]'
So it will match J or F or M or A or S, etc.
Followed by a or e or p or…
Ja or Je, Fa or Fe
Oh, and n or b or r or y
Jan or Jen or Fan or Feb
Names? Or something else?
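Running the pattern against a few invented lines shows both what it is probably aiming for and what it lets through by accident. The contents below are assumptions made up for this sketch, not the real dates.txt.

```shell
# Made-up sample lines (an assumption, not the real dates.txt).
dates='Jan 3
Feb 14
Oct 31
Jen called
Fan mail
Tue 5'

# Three one-character sets in a row. Jan, Feb and Oct match,
# but so do "Jen" and "Fan". "Tue" does not, because T is not in the first set.
printf '%s\n' "$dates" | grep '[JFMASOND][aepuco][nbrynlgptvc]'
```

Each set is checked on its own, so the pattern accepts any combination of the three, not only the combinations the author had in mind.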

Confusion Reigns!
. is any single character (not just a letter)
– grep '^.b' would match any line that has b as its second character
+ and * are "multiples"
– In the shell * is a wildcard, NOT in regexes!
– + means "at least one of" whatever came before it
– * means "0 or more of" whatever came before it
– grep -E 't+' httpd.conf (+ needs -E; plain grep reads it as a literal +)
– grep 'Bears*' teams.txt
.* is a common way of saying "keep going"
– grep '^Rodgers.*horrible$' anyfile.txt
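The dot and the two "multiples" are easier to keep straight side by side. This is only a sketch on invented input; the lines are assumptions chosen so each pattern has something to hit and something to miss.

```shell
# Made-up sample lines (an assumption, not data from the slides).
lines='abc
xbz
bcd
Bear
Bears
Bearsss
Rodgers is horrible
Rodgers was great'

# . is any one character, so ^.b means "b is the second character".
# Matches abc and xbz, not bcd.
printf '%s\n' "$lines" | grep '^.b'

# * is "zero or more" of the s: Bear, Bears and Bearsss all match.
printf '%s\n' "$lines" | grep 'Bears*'

# + is "at least one" s (needs -E): Bear no longer matches.
printf '%s\n' "$lines" | grep -E 'Bears+'

# .* bridges whatever sits between the two anchored ends.
printf '%s\n' "$lines" | grep '^Rodgers.*horrible$'
```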

One last bit of confusion
Inside the [] the ^ means "not"
So [^ab] means any character that's not a or b
^[ab] means a or b at the beginning of the line
grep '^[^A-Z]' teams.txt
– Any line whose first character is not a capital letter (empty lines don't match)
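The two meanings of ^ can be compared directly. In this sketch the sample lines are invented, and LC_ALL=C is an added precaution (not something from the slides) so that [A-Z] means only the plain ASCII capitals regardless of locale.

```shell
# Made-up sample lines (an assumption, not the real teams.txt).
teams='Seattle
seattle
49ers
Bears
apple
banana'

# ^ outside the brackets anchors; ^ inside the brackets negates.
# First character is anything except A-Z: seattle, 49ers, apple, banana.
printf '%s\n' "$teams" | LC_ALL=C grep '^[^A-Z]'

# No negation here: the line must start with a or b (apple, banana).
printf '%s\n' "$teams" | grep '^[ab]'
```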

Escape the Regex (But Not)
What if we wanted to search for a [ or . character?
We would have to 'escape' it with \
grep '\[' pslist
– root :38 ? 00:00:01 [flush-253:0]
So how would I search for a dollar amount?
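Here is a sketch of escaping in practice, plus one possible answer to the dollar-amount question. The sample lines are made up, and the dollar pattern assumes amounts written like $5.00 (a dollar sign, digits, a dot, two digits); other formats would need a different pattern.

```shell
# Made-up sample lines (an assumption, not the real pslist).
lines='root [flush-253:0]
root /sbin/init
cost: $5.00
cost: 5 dollars
version 1.2
version 1x2'

# \[ is a literal bracket: only the flush line matches.
printf '%s\n' "$lines" | grep '\['

# Unescaped, the dot is "any character", so 1.2 and 1x2 both match.
printf '%s\n' "$lines" | grep '1.2'

# Escaped, the dot must be a real dot: only "version 1.2" matches.
printf '%s\n' "$lines" | grep '1\.2'

# One possible dollar-amount pattern: a literal $, digits, a literal dot, two digits.
printf '%s\n' "$lines" | grep -E '\$[0-9]+\.[0-9][0-9]'
```

Keep the pattern in single quotes so the shell does not try to expand $ or eat the backslashes before grep sees them.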

Case Study
We have a text file full of addresses
First, most obvious
What if we added the .com domain?
We can have .'s in the first part, but let's say none in the second.

Follow-Up
With every regex think
– Can I do this easier another way?
Just because you can use regexes doesn't mean you should
– Is my regex as simple as possible?
Know the limitations; no regex is perfect, but a lot of them are over-complicated
– How is my data formatted?
If it's "regular" data in exact columns with values in each spot, use awk (next Weds)

Own Study Regular Expressions