Quiz
30 minutes, 10 questions
No talking, texting, collaboration, etc.
Review
Please turn in your homework and practicals
Regular Expressions
To generate #1 albums, 'jay --help' recommends the -z flag
You’ve Already Used Them
grep -i 'documentroot' httpd.conf
– Our command is grep
– The flag/option is -i, for case-insensitive matching
– httpd.conf is our file (the Apache config file)
– 'documentroot' is our regex
– Called a ‘static string’: it doesn’t change
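A small way to see what -i changes. The lines below are a hypothetical stand-in for httpd.conf (not the real file), piped into grep so the example runs anywhere:

```shell
#!/bin/sh
# Hypothetical stand-in for httpd.conf; the real file is much longer.
conf='DocumentRoot "/var/www/html"
<Directory "/var/www/html">
# the documentroot is set above'

# Case-sensitive (no -i): only the all-lowercase comment line matches.
exact=$(printf '%s\n' "$conf" | grep 'documentroot')

# Case-insensitive (-i): "DocumentRoot" now matches as well.
any_case=$(printf '%s\n' "$conf" | grep -i 'documentroot')

printf 'without -i:\n%s\n' "$exact"
printf 'with -i:\n%s\n' "$any_case"
```

The <Directory> line never matches: it doesn't contain the string at all, in any case.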
Regular Expression
A regular expression (or regex) is “a sequence of characters that forms a search pattern”
– Awesome, thanks again Wikipedia, you’re always so descriptive
It’s a string of characters, some with special meaning, that describes another set of characters so we can either print them out or modify them (substitute, etc.)
Today we’re only looking at printing them out: matching
Static Strings
The simplest regex
– grep -i 'documentroot' httpd.conf
– grep 'error' /var/log/messages
What you’re matching on is one exact string of characters
Also, I’m only using grep for now; we’ll get into other utilities later
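A static string matches anywhere those exact characters appear in a line, even inside a longer word. A quick check using made-up log lines (hypothetical, not real /var/log/messages content):

```shell
#!/bin/sh
# Hypothetical log lines standing in for /var/log/messages.
log='kernel: disk error on sda
sshd: accepted password for alice
app: 3 errors reported
app: terror.example.com resolved'

# The static string "error" matches wherever those five characters appear,
# including inside "errors" and "terror".
hits=$(printf '%s\n' "$log" | grep 'error')

printf '%s\n' "$hits"
```

Three of the four lines match: a static string is exact about the characters, not about where a word starts or ends.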
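Non-Static Strings
More complex
Uses ‘metacharacters’ to perform ‘abstraction’
– Metacharacters: a ‘known’ set of characters with special meaning, like * or [ or + or .
– Abstraction: a way of referencing a more general group than what is explicitly stated
grep -E '[st]+' httpd.conf
– [ ] and + are metacharacters
– Plain grep uses basic regexes, where + is an ordinary character; -E (extended regexes) makes it a metacharacter
– Quote the pattern so the shell doesn’t treat [st] as a filename wildcard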
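Plain grep reads its pattern as a basic regular expression, where + has no special meaning; grep -E switches to extended regular expressions. A minimal comparison on a few hypothetical config lines:

```shell
#!/bin/sh
# Hypothetical stand-in for httpd.conf.
conf='Listen 80
ServerName example.com
ssl_engine on'

# Basic regex (plain grep): + is an ordinary character, so this looks for
# "s+" or "t+" literally. No line contains a plus sign, so the count is 0.
# (grep -c exits non-zero on zero matches; || true keeps that from mattering.)
basic=$(printf '%s\n' "$conf" | grep -c '[st]+' || true)

# Extended regex (grep -E): + means "one or more s or t characters".
extended=$(printf '%s\n' "$conf" | grep -cE '[st]+' || true)

printf 'basic: %s, extended: %s\n' "$basic" "$extended"
```

GNU grep also accepts \+ as "one or more" in basic mode, but that is a GNU extension; -E is the portable choice.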
Metacharacters
[ ] indicates a single-character range
– [a-z] would be any lowercase letter: grep '[a-z]' teams.txt
– [aeiou] would be any vowel
– [0-9] is any digit
Each [ ] matches exactly one character
– grep '[Ss]eattle' teams.txt
– [hl][io] would match hi, ho, li, lo
– Not hl or io
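The slide's examples run against made-up lines (a hypothetical stand-in for teams.txt). LC_ALL=C is set as a precaution so that ranges such as [0-9] and [a-z] are read in plain ASCII order regardless of your locale:

```shell
#!/bin/sh
# C locale, so ranges such as [0-9] and [a-z] mean plain ASCII characters.
LC_ALL=C
export LC_ALL

# Hypothetical stand-in for teams.txt.
teams='Seattle Seahawks
seattle sounders
Chicago Bears
Team 12'

# [Ss] matches exactly one character: S or s.
seattle=$(printf '%s\n' "$teams" | grep '[Ss]eattle')

# [0-9] matches any single digit.
digits=$(printf '%s\n' "$teams" | grep '[0-9]')

# [hl][io]: one of h/l, immediately followed by one of i/o.
pairs=$(printf '%s\n' hi ho li lo hl io | grep '[hl][io]')

printf '%s\n---\n' "$seattle" "$digits" "$pairs"
```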
Metacharacter Placement
^ is the beginning of the line
– grep '^Seattle' teams.txt
– Case-sensitive (S, not s)
$ is the end of the line
– grep 'Bears$' teams.txt
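Anchors in action, using hypothetical lines in place of teams.txt. Each line contains "Seattle" or "Bears" somewhere, but only the ones in the anchored position match:

```shell
#!/bin/sh
# Hypothetical stand-in for teams.txt.
teams='Seattle Seahawks
Go Seattle
Chicago Bears
Bears fans in seattle'

# ^ anchors to the start of the line (and the match is case-sensitive).
starts=$(printf '%s\n' "$teams" | grep '^Seattle')

# $ anchors to the end of the line.
ends=$(printf '%s\n' "$teams" | grep 'Bears$')

printf 'starts with Seattle: %s\n' "$starts"
printf 'ends with Bears: %s\n' "$ends"
```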
Your Turn
grep '[Ss]ea' teams.txt
grep '^Rodgers' anyfile.txt
grep 'horrible$' anyfile.txt
grep '[JFMASOND][aepuco][nbrynlgptvc]' dates.txt
What The…?
'[JFMASOND][aepuco][nbrynlgptvc]'
Regexes get wonky quickly
– Keep It Simple
– This is too complex, but we still have to read it
So break it down left to right
– What does each [ ] match?
A Little More Sense
'[JFMASOND][aepuco][nbrynlgptvc]'
So it will match J or F or M or A or S, etc…
Followed by a or e or p or…
– Ja or Je, Fa or Fe
Then n or b or r or y or…
– Jan or Jen or Fan or Feb
Names? Or something else?
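Running the pattern against a few hypothetical lines shows both answers at once: all twelve three-letter month abbreviations match, but so do names and ordinary words that happen to start with matching letters. (Listing n twice in the last set doesn't change what it matches.)

```shell
#!/bin/sh
pattern='[JFMASOND][aepuco][nbrynlgptvc]'

# Every three-letter month abbreviation matches the pattern...
months=$(printf '%s\n' Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec | grep -c "$pattern")

# ...but so do other words whose first three letters fit (hypothetical lines).
lines='Jan 5
Feb 12
Fancy dress party
Jenny called
something else'
hits=$(printf '%s\n' "$lines" | grep "$pattern")

printf 'months matched: %s of 12\n' "$months"
printf '%s\n' "$hits"
```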
Confusion Reigns!
. is any single character (not just a letter)
– grep '^.b' would match any line that has b as the second character
+ and * are “multiples”
– In the shell * is a wildcard; NOT in regexes!
– + means “at least one of” whatever came before it (use grep -E; in plain grep + is literal)
– * means “0 or more of” whatever came before it
– grep -E 't+' httpd.conf
– grep 'Bears*' teams.txt (also matches plain “Bear”, since * allows zero s’s)
.* is a common way of saying “keep going”
– grep '^Rodgers.*horrible$' anyfile.txt
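Each of the slide's patterns run once against the same hypothetical sample lines, so you can compare what ., *, + and .* pick out:

```shell
#!/bin/sh
# Hypothetical sample lines.
lines='abacus
Chicago Bear
Chicago Bears
Seattle Seahawks
Rodgers was horrible
Rodgers was great'

# . is any one character, so ^.b means "b is the second character".
second_b=$(printf '%s\n' "$lines" | grep '^.b')

# * allows zero copies of the s, so plain "Bear" matches too.
bear=$(printf '%s\n' "$lines" | grep 'Bears*')

# + needs -E: one or more t's anywhere in the line.
has_t=$(printf '%s\n' "$lines" | grep -E 't+')

# .* fills the gap: starts with Rodgers, ends with horrible, anything between.
rant=$(printf '%s\n' "$lines" | grep '^Rodgers.*horrible$')

printf '%s\n---\n' "$second_b" "$bear" "$has_t" "$rant"
```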
One last bit of confusion
Inside the [ ] the ^ means “not”
– So [^ab] means any character that’s not a or b
– ^[ab] means a or b at the beginning of the line
grep '^[^A-Z]' teams.txt
– Any line that does not start with a capital letter (empty lines don’t match, since [^A-Z] still needs one character)
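The two meanings of ^ side by side, on hypothetical lines standing in for teams.txt (LC_ALL=C is set so A-Z is read as plain ASCII capitals):

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Hypothetical stand-in for teams.txt.
teams='Seattle Seahawks
abba
bengals
chicago bears
49ers'

# ^[^A-Z]: the first character exists and is not a capital letter.
not_capital=$(printf '%s\n' "$teams" | grep '^[^A-Z]')

# ^[ab]: the line starts with a or b.
starts_ab=$(printf '%s\n' "$teams" | grep '^[ab]')

# [^ab]: the line contains at least one character other than a or b.
other=$(printf '%s\n' "$teams" | grep '[^ab]')

printf '%s\n---\n' "$not_capital" "$starts_ab" "$other"
```

Note that "abba" is the only line [^ab] rejects: every one of its characters is an a or a b.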
Escape the Regex (But Not)
What if we wanted to search for a [ or . character?
We would have to ‘escape’ it with \
– grep '\[' pslist
– root :38 ? 00:00:01 [flush-253:0]
So how would I search for a dollar amount?
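Escaping in action on made-up lines. For the dollar-amount question, this is one possible answer, not the only one: it assumes amounts are written like $42.50 (a dollar sign, one or more digits, a dot, exactly two digits) and uses -E so that + and {2} work:

```shell
#!/bin/sh
# Hypothetical lines mixing ps-style output and prices.
text='[flush-253:0]
flush-253:0
Total: $42.50
Cost 42.50
Tip: $5'

# \[ is a literal bracket; an unescaped [ would start a character set.
brackets=$(printf '%s\n' "$text" | grep '\[')

# \$ is a literal dollar sign (not end-of-line) and \. a literal dot (not "any character").
dollars=$(printf '%s\n' "$text" | grep -E '\$[0-9]+\.[0-9]{2}')

printf '%s\n' "$brackets" "$dollars"
```

"Tip: $5" is skipped because it has no cents; loosen the pattern if your data writes whole-dollar amounts that way.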
Case Study
We have a text file full of addresses
First, the most obvious regex
What if we added the .com domain?
– We can have .’s in the first part, but let’s say none in the second
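The slide's own regexes aren't in this text, so this is only a hypothetical sketch of the idea. It assumes the addresses are email addresses, that the first part may contain letters, digits and dots, and that the second part is a single dot-free name followed by .com:

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Hypothetical address list.
addresses='john.smith@example.com
jsmith@example.com
john@mail.example.com
jane@example.org
not-an-address'

# First part: letters, digits and dots. Second part: no dots, then a literal ".com".
matches=$(printf '%s\n' "$addresses" | grep -E '^[A-Za-z0-9.]+@[A-Za-z0-9]+\.com$')

printf '%s\n' "$matches"
```

Real addresses can also contain characters such as - and +, so this pattern is deliberately simple rather than complete.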
Follow-Up
With every regex, think:
– Can I do this more easily another way? Just because you can use regexes doesn’t mean you should
– Is my regex as simple as possible? Know the limitations: no regex is perfect, but a lot of them are over-complicated
– How is my data formatted? If it’s “regular” data in exact columns with a value in each spot, use awk (next Weds)
Own Study
Regular Expressions