-Joseph Beberman *Some slides are inspired by a PowerPoint presentation used by professor Seikyung Jung, which was derived from Charlie Wiseman.


1 -Joseph Beberman *Some slides are inspired by a PowerPoint presentation used by professor Seikyung Jung, which was derived from Charlie Wiseman.

2  Regular expressions are all over the place.  Most syntaxes are nearly identical; I will be using the syntax tied to Unix systems.

3  In computer science, regular expressions are used to locate strings based on a pattern.  Search for every email address in a file? Regular expressions make it easy.  Often referred to as regexp, regex, etc.  For instance: a phone number is three digits, followed by a dash, three digits, a dash, and then four digits (“555-555-5555”). You can make a regular expression that matches all phone numbers by describing that pattern.  For reference: [0-9]{3}-[0-9]{3}-[0-9]{4}
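A minimal sketch of the phone-number pattern in action, using grep -E (the tool the later slides rely on). The helper name find_phones and the sample lines are made up for illustration.

```shell
# find_phones is a hypothetical helper: it prints every input line that
# contains something shaped like 555-555-5555.
find_phones() {
  grep -E '[0-9]{3}-[0-9]{3}-[0-9]{4}'
}

# Made-up input; only the first line contains a phone number.
printf '%s\n' 'Call 555-555-5555 today' 'Order 12-34 shipped' | find_phones
```

Only the first line is printed, because the second has no run of three digits, a dash, three digits, a dash, and four digits.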

4  Consider values of currency. How could you describe in English any/all monetary values (including cents) with a single pattern?  e.g. $50.25

5  Consider values of currency. How could you describe in English any/all monetary values (including cents) with a single pattern?  e.g. $50.25  Dollar sign, one or more digits, period, two digits.

6  Consider values of currency. How could you describe in English any/all monetary values (including cents) with a single pattern?  e.g. $50.25  Dollar sign, one or more digits, period, two digits.  For reference: \$[0-9]+\.[0-9]{2}
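The currency pattern can be tried the same way. This is only a sketch: find_prices and the sample lines are assumptions, and the pattern is kept in single quotes so the shell leaves the $ and the backslashes alone.

```shell
# find_prices is a hypothetical helper: dollar sign, one or more digits,
# a literal period, then exactly two digits.
find_prices() {
  grep -E '\$[0-9]+\.[0-9]{2}'
}

# Made-up input: the second line lacks the dollar sign, the third lacks cents.
printf '%s\n' 'Lunch was $50.25' 'Only 50.25 here' 'Roughly $5' | find_prices
```

Only the first line is printed.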

7  grep: using it to showcase regular expressions. ◦ The name comes from the ed editor command g/re/p: globally search for a regular expression and print the matching lines.  A command available on most if not all Unix-like systems.  Seems to be an incredibly popular command among system administrators.

8  grep is used to do text-based searching, generally on the Linux command line or in scripting  Takes two arguments  Generic format: grep STRING FILE  It prints every line of FILE that has STRING in it.  Example: grep root /etc/passwd ◦ »Prints out all lines in the /etc/passwd file that contain the string "root"
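To try the generic format without depending on what a real /etc/passwd holds, here is a sketch against a made-up file in the same style. The account lines are invented, not real data.

```shell
# A made-up file in the style of /etc/passwd (not real account data).
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'operator:x:11:0:operator:/root:/sbin/nologin' > "$sample"

# Generic format: grep STRING FILE
grep root "$sample"
```

The first and third lines are printed, since both contain the string "root" somewhere. Remove the temporary file with rm -f "$sample" when done.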

9  The contents of /etc/passwd

10  grep 'root' /etc/passwd  What does this tell you? ◦ 2 lines contain the string “root” ◦ Highlights exactly where the string was matched

11  grep has a number of options, and even though it’s off topic, knowing some may help you understand the power of grep/regexps.  -i » ignore case  -v » negation (invert the match) ◦ grep -v hello filename.txt  Would return every line of filename.txt that does not contain the string hello.
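A quick sketch of both options side by side. The helper sample_lines and its three lines of text are made up for the demonstration.

```shell
# sample_lines is a hypothetical helper that prints some made-up input.
sample_lines() { printf '%s\n' 'Hello there' 'hello again' 'goodbye'; }

sample_lines | grep hello       # case-sensitive: only "hello again"
sample_lines | grep -i hello    # -i ignores case: "Hello there" and "hello again"
sample_lines | grep -v hello    # -v inverts the match: "Hello there" and "goodbye"
```

Note that -v on its own is still case-sensitive, which is why "Hello there" survives the last command.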

12  How does grep use regular expressions? ◦ Again: the name means “globally search for a regular expression and print”  Recall the format: grep STRING FILE  The STRING is actually interpreted as a regular expression.  Note: I will be using the -E option for grep ◦ Don’t worry about it; it enables extended regular expressions, so characters such as +, ?, |, ( ) and { } work as operators without backslashes.
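Here is a small sketch of what the -E option changes, assuming a grep that follows the usual basic/extended split (GNU and BSD grep both do, as far as I know). The input lines are made up.

```shell
# Without -E, grep reads the pattern as a basic regular expression, where
# ( | ) are ordinary characters, so neither line matches and the count is 0.
printf '%s\n' 'joe' 'Joe' | grep -c '(joe|Joe)' || true

# With -E the same pattern is an "or" group, so both lines match: count 2.
printf '%s\n' 'joe' 'Joe' | grep -Ec '(joe|Joe)'
```

The `|| true` is only there because grep exits with a non-zero status when nothing matches.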

13  First things first… we need a text file to search!  I’ve taken the time to make a simple text file which will help me show some simpler regular expressions.

14

15  How do you up your game from literal strings like “root” to creating patterns? Regexes have their own syntax.  To start: parentheses are used for grouping “or statements”.  To match one thing or something else, you group them in parentheses and separate them with pipes. ◦ (joe|Joe) will match the string “joe” or “Joe” ◦ (hello|goodbye|sup) matches “hello”, “goodbye” or “sup”
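A minimal sketch of the three-way "or" group. The helper name match_greeting and the sample lines are assumptions made for illustration.

```shell
# match_greeting is a hypothetical helper: it keeps lines containing
# "hello", "goodbye" or "sup".
match_greeting() {
  grep -E '(hello|goodbye|sup)'
}

# Made-up input; "what is up" contains none of the three strings.
printf '%s\n' 'hello world' 'say goodbye' 'what is up' 'sup?' | match_greeting
```

Three of the four lines are printed; "what is up" is skipped.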

16

17  You can specify a range of characters within brackets. ◦ For example [a-z] will match any lower case letter. ◦ [A-Z] any upper case letter ◦ [0-9] any digit

18  Now the pattern is any digit.

19  Now the pattern is digits 0 to 5.

20  You can match one thing after another. ◦ For example: [a-z][0-9] will match any lower case letter followed by a number. Now we are starting to see patterns!

21  When specifying one range or another, you don’t need a pipe. ◦ For example [a-zA-Z] will match any lower or upper case letter. ◦ [0-9a-zA-Z] will match any alphanumeric character
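The ranges and sequences above can be sketched as follows. The helpers match and words are hypothetical, and LC_ALL=C is my own addition so that ranges such as [a-z] mean plain ASCII letters regardless of locale.

```shell
# match: hypothetical helper that runs grep -E with the given pattern.
# LC_ALL=C pins ranges such as [a-z] to plain ASCII ordering.
match() { LC_ALL=C grep -E "$1"; }

# words: made-up input, one short string per line.
words() { printf '%s\n' 'a1' 'A1' 'ab' '12'; }

words | match '[a-z][0-9]'       # lower-case letter, then digit: a1
words | match '[a-zA-Z][0-9]'    # any letter, then digit: a1 and A1
words | match '[0-9a-zA-Z]'      # any alphanumeric character: all four lines
```

Putting two bracket groups next to each other means "this, then that", while putting two ranges inside one pair of brackets means "either of these".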

22  Now it’s time to get more specific. What if you want to find something that occurs multiple times in a row?  The +, *, ?, and {} special characters specify how many times you want the pattern directly in front of them to occur. ◦ Ex. [a-zA-Z]+ ◦ The + modifies the grouping in front of it

23  + » one or more instances ◦ [a-zA-Z]+ would match any string of lower/upper case letters at least 1 letter long.  * » zero or more instances ◦ [0-9]* would match any number of digits, or none at all.  ? » zero or one instance (aka optional) ◦ [a-zA-Z]? would match a single letter or none at all.  [a-z]+[0-9]*[A-Z]? ◦ ade7E (matches) ◦ cpB (matches) ◦ F12CP (no match: it has no lower-case letter) ◦ Please ask questions here if you’re confused!
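A sketch that runs the combined pattern over the three example strings. The helper name try_pattern is made up, and LC_ALL=C is my own addition to keep the letter ranges ASCII-only.

```shell
# try_pattern: hypothetical helper. One or more lower-case letters, then
# zero or more digits, then an optional upper-case letter.
try_pattern() { LC_ALL=C grep -E '[a-z]+[0-9]*[A-Z]?'; }

# F12CP has no lower-case letter for [a-z]+ to match, so it is not printed.
printf '%s\n' 'ade7E' 'cpB' 'F12CP' | try_pattern

# -x makes the pattern cover the whole line, which shows ? on its own:
# "a" is a single letter and matches, "ab" is too long.
printf '%s\n' 'a' 'ab' | LC_ALL=C grep -Ex '[a-zA-Z]?'
```

Without -x, grep only needs the pattern to match somewhere inside the line, which is why the first command accepts strings with extra characters around the match.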

24  {} » a specific count or a range ◦ {3} or {4,7} ◦ '[0-9]{3}-[0-9]{3}-[0-9]{4}' for a phone number
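A short sketch of both forms of {}. The -x option (match the whole line) is my addition so the counts are easy to see, and the input is made up.

```shell
# {3}: exactly three digits on the line. Prints only 123.
printf '%s\n' '12' '123' '1234' | grep -Ex '[0-9]{3}'

# {4,7}: between four and seven digits. Prints 1234 and 1234567.
printf '%s\n' '123' '1234' '1234567' '12345678' | grep -Ex '[0-9]{4,7}'
```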

25  Now we can make a regular expression that matches emails!  Let’s try now…

26  Now we can make a regular expression that matches emails!  Let’s try now…  Any alphanumeric sequence, @, any alphabetical sequence, a period, any lower case sequence

27  Now we can make a regular expression that matches emails!  Let’s try now…  Any alphanumeric sequence, @, any alphabetical sequence, a period, any lower case sequence  '[a-zA-Z0-9]+'

28  Now we can make a regular expression that matches emails!  Let’s try now…  Any alphanumeric sequence, @, any alphabetical sequence, a period, any lower case sequence  '[a-zA-Z0-9]+@'

29  Now we can make a regular expression that matches emails!  Let’s try now…  Any alphanumeric sequence, @, any alphabetical sequence, a period, any lower case sequence  '[a-zA-Z0-9]+@[a-zA-Z]+'

30  Now we can make a regular expression that matches emails!  Let’s try now…  Any alphanumeric sequence, @, any alphabetical sequence, a period, any lower case sequence  '[a-zA-Z0-9]+@[a-zA-Z]+.'

31  Now we can make a regular expression that matches emails!  Let’s try now…  Any alphanumeric sequence, @, any alphabetical sequence, a period, any lower case sequence  '[a-zA-Z0-9]+@[a-zA-Z]+.[a-z]+'

32

33  Weird… why did we match that third line?

34  . is a special character that matches any single character.  That means 't.o' would match two, too, t2o, or many other things.  That’s how it matched below. The . matched 0!
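A tiny sketch of the wildcard behaviour, with made-up input lines.

```shell
# . stands for exactly one character, whatever it is.
# "to" has nothing between t and o, and "toe" has no o after the wildcard,
# so only two, too and t2o are printed.
printf '%s\n' 'two' 'too' 't2o' 'to' 'toe' | grep -E 't.o'
```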

35  So how do we avoid matching weird things like j03b130@h0tma? ◦ '[a-zA-Z0-9]+@[a-zA-Z]+\.[a-z]+'  You can escape special characters by putting \ in front of them. ◦ So \. means a literal period. ◦ Note: escape \ by putting \ in front of it: \\  So \\ means a literal backslash. ◦ Double note: \s works the other way around. The backslash gives the ordinary character s a special meaning, so \s matches a whitespace character such as a space (this is a GNU grep extension; [[:space:]] is the portable form).
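To see the difference the backslash makes, here is a sketch comparing the two email patterns. The helper names loose, strict and addresses are made up, and LC_ALL=C is my own addition to keep the letter ranges ASCII-only.

```shell
# loose: the first attempt, where . matches any character.
loose()  { LC_ALL=C grep -E '[a-zA-Z0-9]+@[a-zA-Z]+.[a-z]+'; }

# strict: the same pattern with the period escaped, so it must be a real dot.
strict() { LC_ALL=C grep -E '[a-zA-Z0-9]+@[a-zA-Z]+\.[a-z]+'; }

# addresses: made-up input, one plausible email and the weird string.
addresses() { printf '%s\n' 'joe@example.com' 'j03b130@h0tma'; }

addresses | loose     # prints both lines: in the second, . matches the 0
addresses | strict    # prints only joe@example.com
```

This is still a teaching pattern rather than a full email validator; real addresses allow more characters than it accepts.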

36  ^ » Indicates the start of a line

37  Notice how it didn’t match every line with “I” in it, only the ones which start with I.

38 Vs.

39  $ » indicates end of a line
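A sketch of both anchors against the same made-up lines (the helper name sample_text is an assumption).

```shell
# sample_text: hypothetical helper that prints four made-up lines.
sample_text() {
  printf '%s\n' 'I like regex' 'So do I' 'I think, therefore I' 'Indeed'
}

sample_text | grep -E 'I'     # "I" anywhere: all four lines
sample_text | grep -E '^I'    # starts with I: every line except "So do I"
sample_text | grep -E 'I$'    # ends with I: "So do I" and "I think, therefore I"
```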

40  Syntax: ◦ ^ start of line ◦ $ end of line ◦ + one or more ◦ * zero or more ◦ ? zero or one ◦ . any single character ◦ {n} exactly n times ◦ {n,m} n to m times ◦ (string1|string2) matches string1 or string2

41  What does this match?  [0-9]{3}-[0-9]{3}-[0-9]{4}

42  What does this match?  [0-9]{3}-[0-9]{3}-[0-9]{4}  Phone numbers!

43  What does this match?  \$[0-9]+\.[0-9]{2}  Money values

44  Example: What does this match?  '((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

45  Example: What does this match?  '((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'  That actually matches valid IPv4 addresses: four numbers from 0 to 255 separated by periods.  (I found it online though. Credit to SASIKALA of thegeekstuff.com)
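A sketch of the octet-by-octet idea, with the escaped period placed outside the octet group so that it follows every alternative and not just the last one. The variable octet, the helper only_ipv4, the -x option (whole-line match) and the sample addresses are all my own additions.

```shell
# octet: one number from 0 to 255 (250-255, 200-249, or 0-199).
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# only_ipv4: hypothetical helper. Three "octet plus period" groups, then a
# final octet; -x requires the whole line to match.
only_ipv4() { grep -Ex "(${octet}\.){3}${octet}"; }

# Made-up input: 256 is out of range and 10.0.0 has only three parts.
printf '%s\n' '192.168.1.1' '256.1.1.1' '10.0.0' '8.8.8.8' | only_ipv4
```

Only 192.168.1.1 and 8.8.8.8 are printed. Without -x the pattern could also match part of a longer string, such as the tail of 999.1.1.1.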

46  Regular expressions simply indicate a pattern. What is important is that the pattern can be searched for as opposed to a literal string.  That means instead of searching for a specific phone number string input, you can search for any existing phone number with ease by matching the pattern that all phone numbers follow.

47  Common tasks that regular expressions are used for:  Searching: find strings that match a given pattern. ◦ Ctrl-F, anyone? There are tools that add regular expression functionality to Ctrl-F, at least in Chrome. ◦ Tool: Regular Expression Searcher  Once you find said strings based on the pattern, there are limitless possibilities as to what you can do with those matches.  Substitution: replace all matching strings. ◦ Ctrl-H (in Word), anyone?  Splitting: split strings based upon matches.
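Substitution and splitting go beyond what grep does, so this sketch swaps in sed -E and awk, two other standard Unix tools that accept the same kind of pattern. The helper names and the sample text are made up.

```shell
# mask_phones: hypothetical helper. Substitution: every phone-number-shaped
# match is replaced with a placeholder.
mask_phones() { sed -E 's/[0-9]{3}-[0-9]{3}-[0-9]{4}/XXX-XXX-XXXX/g'; }

# split_fields: hypothetical helper. Splitting: the field separator is itself
# a regular expression (a comma or semicolon plus any following spaces).
split_fields() { awk -F '[,;] *' '{ for (i = 1; i <= NF; i++) print $i }'; }

printf '%s\n' 'Home 555-123-4567, work 555-987-6543' | mask_phones
printf '%s\n' 'red, green;blue' | split_fields
```

The first command prints the line with both numbers masked; the second prints red, green and blue on separate lines.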

