An Intro to Regex in R Alan Wu.


What is regex?
“Regex” is short for “regular expression”: a sequence of characters that defines a search pattern.
Example: ^\(\d{3}\)\s\d{3}-\d{4} matches strings of the form (###) ###-####
“Regular” refers to a language that can be defined by a certain set of regular rules.
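A minimal sketch of the phone-number pattern above in base R. The sample numbers are made up for illustration, and `perl = TRUE` is a choice made here so that `\d` and `\s` are handled by the PCRE engine. Each backslash is doubled because it sits inside an R string literal.

```r
# Phone pattern from the slide; backslashes are doubled in R string literals.
phone_pattern <- "^\\(\\d{3}\\)\\s\\d{3}-\\d{4}"

# Made-up sample strings (not from the original slides).
phones <- c("(503) 555-0199", "503-555-0199", "(503)555-0199")

# TRUE where the string starts with the (###) ###-#### form.
is_phone <- grepl(phone_pattern, phones, perl = TRUE)
print(is_phone)  # TRUE FALSE FALSE
```

Only the first string has the parentheses, the single whitespace character, and the hyphen in the positions the pattern requires.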

What should I regex?
Strings. The whole point of regex is to let you characterize strings more easily. Anything composed of letters, digits, or punctuation is a target for regex.

When should I use regex?
Use it for:
  Multiple related conditions, or nested if/else
  stringr and base R search functions:
    str_detect(), grepl(): identify a match to a pattern
    str_extract(): extract the match to a pattern
    str_locate(): locate the position of a pattern
    str_replace(), gsub(): replace a pattern (str_replace() replaces only the first match; gsub() replaces all of them)
    str_split(): split a string using a pattern
Don’t use it when something simpler will suffice.

You should use regex when you have multiple conditions that are related or nested. “Related” means the conditions are close variations of each other, such as misspellings in a text, or predictable variations such as study IDs within a certain range. If a straightforward if/then or a plain contains check with grepl() does the job, use that.
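The five tasks listed above can be sketched with base R alone, so the example runs without stringr installed. The sample IDs are made up, and the stringr function named in each comment is the rough counterpart, not an exact drop-in.

```r
# Made-up sample data (not from the original slides).
ids <- c("study_id5", "visit  12", "study_id27")

# Identify a match (stringr counterpart: str_detect).
detected <- grepl("^study_id\\d+$", ids)

# Extract the first match (stringr counterpart: str_extract).
first_match <- regexpr("\\d+", ids)
numbers <- regmatches(ids, first_match)

# Locate where the first match starts (stringr counterpart: str_locate).
starts <- as.integer(first_match)

# Replace every match (stringr counterpart: str_replace_all).
cleaned <- gsub("\\s+", "_", ids)

# Split on a pattern (stringr counterpart: str_split).
pieces <- strsplit("a1b22c", "\\d+")[[1]]

print(detected)  # TRUE FALSE TRUE
print(numbers)   # "5" "12" "27"
print(cleaned)   # "study_id5" "visit_12" "study_id27"
print(pieces)    # "a" "b" "c"
```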

Why should I regex?
Flexible and structured:
  “Rename visit IDs with between one and three spaces in their names”
  “Instances of ‘Pred’, ‘Prad’, ‘Pren’...”
  “Find all study IDs that start with a 9 and end with a 0”
Clearer and easier to understand (when done right)
Cleaner code
Easier to change/adapt
Reproducibility

If you’ve ever had to search for obscure yet predictable patterns, regex is what you want. The requests above are examples from just the last year: very specific, but predictable. If any of them ever needed editing, it would be easy to edit the regex without having to write a lot of new code.

Regex is easier for YOU to understand, because you can read it (and you should comment it), and it is also easier to explain to someone else what you are doing instead of leaving them to decipher your code. It shortens your code, which makes your script quicker to type and to replicate, and that in turn makes it easier to change and adapt.

Like everything else, there are times when regex is appropriate: when it makes things easier for you and your audience. You are your own audience 95% of the time, so you are the best judge of that.
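One possible reading of the three requests above, sketched in base R. The sample data and the exact interpretation of each request (for example, treating the spaces as one consecutive run) are assumptions, not the presenter's code.

```r
# Made-up sample data for all three requests.
visits <- c("visit 1", "visit   2", "visit    3", "visit4")
drugs  <- c("Prednisone", "Pradnisone", "Prenisone", "Prodnisone")
study  <- c("9120", "9", "8120", "90", "9121")

# 1. Visit IDs containing a run of one to three spaces, renamed with "_".
has_1_to_3_spaces <- grepl("^\\S+ {1,3}\\S+$", visits)
renamed <- sub(" {1,3}", "_", visits[has_1_to_3_spaces])

# 2. Instances of "Pred", "Prad", or "Pren".
is_pred_variant <- grepl("Pr(ed|ad|en)", drugs)

# 3. Study IDs that start with a 9 and end with a 0.
starts9_ends0 <- grepl("^9.*0$", study)

print(renamed)          # "visit_1" "visit_2"
print(is_pred_variant)  # TRUE TRUE TRUE FALSE
print(starts9_ends0)    # TRUE FALSE FALSE TRUE FALSE
```

Changing a requirement, such as allowing up to four spaces, means editing one number in the pattern and nothing else in the script.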

I’m sold. How do I regex?

Regex    Meaning                     Example         Strings detected
^        Start of a string           ^abc            abc, abcdefg, abc123
$        End of a string             abc$            abc, defgabc, 123abc
.        Any character               a.c             abc, acc, a1c
|        Alternation                 apple|orange    apple, orange
{...}    Explicit quantifier         ab{2}c          abbc
[...]    Explicit character match    a[bB]c          abc, aBc
(...)    Expression grouping         (abc){2}        abcabc
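The metacharacters above can be checked in R with a small loop. The patterns and the strings in `hits` come from the slide; the `miss` strings are made-up counterexamples added here to show what each pattern rejects.

```r
# Each row: a pattern, strings it should detect, and one made-up non-match.
rows <- list(
  list(pattern = "^abc",         hits = c("abc", "abcdefg", "abc123"), miss = "xabc"),
  list(pattern = "abc$",         hits = c("abc", "defgabc", "123abc"), miss = "abcx"),
  list(pattern = "a.c",          hits = c("abc", "acc", "a1c"),        miss = "ac"),
  list(pattern = "apple|orange", hits = c("apple", "orange"),          miss = "pear"),
  list(pattern = "ab{2}c",       hits = "abbc",                        miss = "abc"),
  list(pattern = "a[bB]c",       hits = c("abc", "aBc"),               miss = "adc"),
  list(pattern = "(abc){2}",     hits = "abcabc",                      miss = "abc")
)

# TRUE for a row when every listed string is detected.
all_hits <- vapply(rows, function(r) all(grepl(r$pattern, r$hits)), logical(1))

# TRUE for a row when the counterexample is (wrongly) detected.
any_miss <- vapply(rows, function(r) grepl(r$pattern, r$miss), logical(1))

print(all(all_hits))   # TRUE
print(any(any_miss))   # FALSE
```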

I’m sold. How do I regex?

Regex         Meaning                                               Example          Strings detected
\             Escape character for special characters               \$               $32.50, pa$$word
\s            Any whitespace character                              [bB]\s[cd]       B c, b d
\w            Any word character (letter, digit, or underscore)     ([aeiou])\w+     alien, elephant, igloo
\d            Any digit                                             (study_id)\d+    study_id5, study_id1
\S, \W, \D    The negation of each of the previous three            (study_id)\D+    study_idA, study_idf
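A short sketch of the escapes above in base R. The one R-specific wrinkle is that a backslash must itself be escaped inside a string literal, so the regex `\$` is written `"\\$"`. The non-matching strings are made up for contrast.

```r
# Escaping a special character: the regex \$ is typed as "\\$" in R.
has_dollar <- grepl("\\$", c("$32.50", "pa$$word", "no dollars"))

# \s: a whitespace character between b/B and c/d.
has_space <- grepl("[bB]\\s[cd]", c("B c", "b d", "bc"))

# \w: a vowel followed by one or more word characters.
vowel_word <- grepl("[aeiou]\\w+", c("alien", "elephant", "igloo"))

# \d versus its negation \D after the literal prefix "study_id".
ids <- c("study_id5", "study_idA")
digit_suffix     <- grepl("study_id\\d+", ids)
non_digit_suffix <- grepl("study_id\\D+", ids)

print(has_dollar)        # TRUE TRUE FALSE
print(has_space)         # TRUE TRUE FALSE
print(digit_suffix)      # TRUE FALSE
print(non_digit_suffix)  # FALSE TRUE
```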

Looking ahead
More advanced tactics include:
  Lookarounds: (?<= … ), e.g. telling “Bradley Cooper” apart from “Cooper Hewitt”
  Conditionals: (?(A)B|C). If A is true, match B; otherwise match C
  Omitting: s\Kt. Text matched to the left of \K is dropped from the match, so only the first “t” in “streets” would be captured
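A hedged sketch of these three constructs in base R. All of them are PCRE features, so `perl = TRUE` is required; R's default regex engine does not support lookbehind or `\K`. The area-code pattern used for the conditional is a made-up illustration, not taken from the slides.

```r
# Lookbehind: match "Cooper" only when preceded by "Bradley ".
people <- c("Bradley Cooper", "Cooper Hewitt")
is_bradley <- grepl("(?<=Bradley )Cooper", people, perl = TRUE)

# \K: the "s" must be present but is dropped from the reported match.
m <- regexpr("s\\Kt", "streets", perl = TRUE)
kept      <- regmatches("streets", m)
kept_pos  <- as.integer(m)
swapped   <- gsub("s\\Kt", "T", "streets", perl = TRUE)

# Conditional (hypothetical example): if group 1 matched an opening
# parenthesis, require a closing one; otherwise require nothing.
area_code <- "^(\\()?\\d{3}(?(1)\\))$"
balanced <- grepl(area_code, c("(503)", "503", "(503", "503)"), perl = TRUE)

print(is_bradley)  # TRUE FALSE
print(kept)        # "t"
print(swapped)     # "sTreets"
print(balanced)
```

Only the "t" that directly follows an "s" is replaced, which is why the second "t" in "streets" is left alone.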

I’m sold. How do I regex?

Regex                    Meaning              Example             Strings detected
[a-z] or [:lower:]       Lowercase letters    [^ABZ[:lower:]]     C, D (out of A, B, C, D, c, d)
[:upper:]                Uppercase letters    [ABZ[:upper:]]      A, B
[a-zA-Z] or [:alpha:]    Letters              [a-zA-Z]\w+         Alabama, alabama
[:digit:]                Digits               a[[:digit:]]b       a0b, a1b, a2b
[:alnum:]                Alphanumeric         [[:alnum:]]\w+      A1b2n, 2bdiC3D
[:punct:]                Punctuation          end[[:punct:]]      end., end,, end!

Named classes such as [:lower:] are only valid inside a bracket expression, so on their own they are written [[:lower:]]. A leading ^ inside the brackets negates the whole set.
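The character classes above can be tried out in base R as follows. The inputs mirror the slide where possible; the extra non-matching strings ("axb", "end", "a_b") are made up for contrast.

```r
# Negated set: anything that is not A, B, Z, or a lowercase letter.
letters_in <- c("A", "B", "C", "D", "c", "d")
negated <- letters_in[grepl("[^ABZ[:lower:]]", letters_in)]

# A digit between "a" and "b".
digit_mid <- grepl("a[[:digit:]]b", c("a0b", "a1b", "axb"))

# "end" followed by a punctuation character.
end_punct <- grepl("end[[:punct:]]", c("end.", "end,", "end!", "end"))

# Entirely alphanumeric strings; the underscore is not in [:alnum:].
only_alnum <- grepl("^[[:alnum:]]+$", c("A1b2n", "2bdiC3D", "a_b"))

print(negated)     # "C" "D"
print(digit_mid)   # TRUE TRUE FALSE
print(end_punct)   # TRUE TRUE TRUE FALSE
print(only_alnum)  # TRUE TRUE FALSE
```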

Let’s look at some code