Advanced Find and Replace with Regular Expressions

Robert Kiffe, Senior Customer Support Engineer

Agenda
- Review: Global Find and Replace
- Introduction to Regular Expressions
- Challenge #1
- Solution
- Advanced Regular Expressions
- Challenge #2
- Hands On
- Q & A

Global Find and Replace
- Location: Content > Find and Replace
- Administrators only (User Level 10)
- Searches a single site
- Adjust ‘Scope’ to limit the searchable content
- Accepts literal text or regex patterns

Global Find and Replace
- Simple search with a results list
- Preview Replace: a safe, multi-step process
  1. Perform a ‘sample’ find/replace and display a results list
  2. Select pages from the results to perform the actual find/replace operation
  3. (Optional) Publish the selected results

Regular Expressions
- Regular expression: a pattern that ‘describes’ a certain amount of text.
- The concept arose in the 1950s, when the American mathematician Stephen Cole Kleene formalized the description of a regular language. (Thanks, Wikipedia)
- Now used in almost every major programming language.

Literal Characters
- Literal text matches: most characters match exactly themselves.
- Matching is case sensitive.
Text: Robert does not like to be called robert.
Pattern: Robert
Match: only the first ‘Robert’; the lowercase ‘robert’ does not match.
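
To try the slide's example outside the CMS, here is a small sketch using Python's standard `re` module. Python's regex flavor is an assumption for illustration only; the engine behind Find and Replace may differ in details. The `re.IGNORECASE` flag is a Python feature shown purely for contrast.

```python
import re

text = "Robert does not like to be called robert."

# A pattern made only of literal characters matches that exact text.
# Matching is case sensitive by default, so "Robert" does not match "robert".
literal_matches = re.findall(r"Robert", text)
print(literal_matches)  # ['Robert']

# For contrast only: Python's re.IGNORECASE flag turns case sensitivity off.
ignore_case_matches = re.findall(r"Robert", text, flags=re.IGNORECASE)
print(ignore_case_matches)  # ['Robert', 'robert']
```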

Special Characters
- Symbol characters that have a special purpose (explained later).
- Full list: \ ^ $ . | ? * + ( ) [ {
- To match one as a literal character, you must ‘escape’ it by adding “\” in front.
Text: Rob does not like to be called Robert?
Pattern: Robert\?
Match: ‘Robert?’, including the question mark.
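
A sketch of the escaping rule in Python's `re` module (assumed here as a stand-in for the CMS engine). It contrasts the unescaped and escaped patterns and shows `re.escape()`, a Python helper that inserts the backslashes for you.

```python
import re

text = "Rob does not like to be called Robert?"

# Unescaped, "?" is a quantifier: "Robert?" means "Rober" plus an optional "t".
unescaped = re.search(r"Robert?", text).group()
# Escaped with a backslash, "\?" matches a literal question mark.
escaped = re.search(r"Robert\?", text).group()
# re.escape() escapes every special character in arbitrary literal text.
auto_escaped = re.escape("Robert?")

print(unescaped)     # Robert
print(escaped)       # Robert?
print(auto_escaped)  # Robert\?
```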

Special Character: Period
- The ‘wildcard’ character: matches any single character except a newline.
Text: Robert does not like to be called oberth, Bobert, or Goobert.
Pattern: .obert
Matches: ‘Robert’, ‘ obert’ (the space before ‘oberth’ counts as a character), ‘Bobert’, and ‘oobert’ inside ‘Goobert’.
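
The wildcard example can be checked with a short Python sketch (Python's `re` is an assumption, not the CMS engine). Note that the period happily matches a space. The `re.DOTALL` flag is Python-specific and is included only to show the "except newline" rule.

```python
import re

text = "Robert does not like to be called oberth, Bobert, or Goobert."

# "." matches any single character except a newline, so ".obert" finds
# "obert" preceded by anything, including a space.
matches = re.findall(r".obert", text)
print(matches)  # ['Robert', ' obert', 'Bobert', 'oobert']

# By default "." does not match "\n"; Python's re.DOTALL flag changes that.
no_newline = re.findall(r"a.b", "a\nb")
with_dotall = re.findall(r"a.b", "a\nb", re.DOTALL)
print(no_newline)   # []
print(with_dotall)  # ['a\nb']
```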

Special Characters: Quantifiers
- Symbol characters that define how many of the previous character(s) to match:
  ? (0 or 1)
  * (0 or more)
  + (1 or more)
- Use curly brackets to indicate an exact number or a range:
  {3} (exactly 3)
  {3,} (3 or more)
  {3,5} (3, 4, or 5)
- A quantifier modifies only the previous character (or group).

Special Characters: Quantifiers (Example)
Quantifier ?: 0 or 1
Text: Robert does not like to be called Roberta.
Pattern: Roberta?
Matches: both ‘Robert’ and ‘Roberta’, because the final ‘a’ is optional.
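
A Python sketch of the quantifiers above, assuming Python's `re` as a stand-in for the CMS engine. The strings "Bob" through "Booooob" are made-up test inputs; `re.fullmatch` requires the whole string to match, which isolates the count.

```python
import re

text = "Robert does not like to be called Roberta."

# "?" makes only the preceding character optional.
optional_a = re.findall(r"Roberta?", text)
print(optional_a)  # ['Robert', 'Roberta']

# Curly brackets give exact counts or ranges for the preceding character.
samples = ["Bob", "Boob", "Booob", "Booooob"]  # hypothetical inputs: 1, 2, 3, 5 o's
exactly_3 = [s for s in samples if re.fullmatch(r"Bo{3}b", s)]
three_to_5 = [s for s in samples if re.fullmatch(r"Bo{3,5}b", s)]
one_or_more = [s for s in samples if re.fullmatch(r"Bo+b", s)]

print(exactly_3)    # ['Booob']
print(three_to_5)   # ['Booob', 'Booooob']
print(one_or_more)  # ['Bob', 'Boob', 'Booob', 'Booooob']
```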

Special Characters: Parentheses
- Capture groups: encapsulate a character sequence in parentheses, “(…)”.
- Add a quantifier after the group to affect the whole group.
- Replace: in the ‘replace’ field, refer to a group with a dollar sign followed by the group number: $#
- To determine the correct number, count the opening parentheses, “(”, from the left.

Special Characters: Parentheses (Capture Group Example)
Text: I like https://school.edu but not https://www.school.edu.
FIND: https://www\.(school\.edu)
REPLACE: https://$1
Result: I like https://school.edu but not https://school.edu.
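
The same find/replace can be sketched with Python's `re.sub`. One flavor difference to watch: Python writes the group reference as `\1` (or `\g<1>`) where the slide's Replace field uses `$1`. The nested-group example is a made-up string that shows how groups are numbered by their opening parentheses.

```python
import re

text = "I like https://school.edu but not https://www.school.edu."

# Group 1 captures "school.edu"; the replacement rebuilds the URL without "www.".
result = re.sub(r"https://www\.(school\.edu)", r"https://\1", text)
print(result)  # I like https://school.edu but not https://school.edu.

# A quantifier after a group repeats the whole group; groups are numbered
# by counting opening parentheses from the left.
m = re.fullmatch(r"((ab)+)(c)", "ababc")
print(m.group(1), m.group(2), m.group(3))  # abab ab c
```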

Challenge #1: Find All Links to a Particular Domain
The problem is that a link can take many formats:
- Root-relative (“/”): /about/contact.html
- Absolute (either protocol): http://www.gallena.com/about/contact.html or https://www.gallena.com/about/contact.html
- No subdomain: http://gallena.com/about/contact.html
Examples: <a href="/about/"> and <a href="http://www.gallena.com/about/">

Challenge #1: Tips
- Use a quantifier (i.e., ‘?’) to make part of the URL optional: a?
- Combine a quantifier with parentheses to make a substring of the URL optional: (abc)?

Challenge #1: Solution
Steps to build the regex pattern:
1. href="https?://www\.gallena\.com/ (HTTP or HTTPS protocol)
2. href="https?://(www\.)?gallena\.com/ (+ optional subdomain)
3. href="(https?://(www\.)?gallena\.com)?/ (+ root-relative links)
Example matches:
<a href="http://www.gallena.com/about/">About</a>
<a href="http://gallena.com/records/index.html">Records</a>
<a href="/academics/index.html">Academics</a>
<a href="https://www.gallena.com/portal/">Portal Login</a>
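
A quick way to sanity-check the final pattern before running it site-wide is to test it against sample links. This Python sketch assumes Python's `re` behaves like the CMS engine for this pattern; the last link (example.com) is a hypothetical non-match added for contrast.

```python
import re

# Final pattern from the slide: scheme and host are optional as one unit,
# so root-relative links ("/...") also match.
GALLENA_LINK = re.compile(r'href="(https?://(www\.)?gallena\.com)?/')

examples = [
    '<a href="http://www.gallena.com/about/">About</a>',
    '<a href="http://gallena.com/records/index.html">Records</a>',
    '<a href="/academics/index.html">Academics</a>',
    '<a href="https://www.gallena.com/portal/">Portal Login</a>',
    '<a href="https://www.example.com/">Other site</a>',  # hypothetical non-match
]

# Prints True for the four slide examples and False for the last one.
for html in examples:
    print(bool(GALLENA_LINK.search(html)), html)
```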

Special Characters: Square Brackets
- Character sets: characters enclosed in square brackets define all possible matches for a single text character: [abc]
- A quantifier placed directly after the set affects the whole character set.
- Placing a “-” between two characters indicates a ‘range’.
- Placing a “^” as the first item in the set creates a ‘negative pattern’ (any character not in the set).
- Inside a set, quantifier characters become literal matches: ? + * { }
- Inside a set, the period also becomes a literal match: .

Character Sets: Examples
Text: Robert does not like to be called robert.
Pattern: [Rr]obert
Matches: ‘Robert’ and ‘robert’

Text: Robert does not like to be called Richard.
Pattern: [A-Z][a-z]+
Matches: the capitalized words ‘Robert’ and ‘Richard’

Text: Robert does not like to be called Roberta.
Pattern: [^A-Z .]+
Matches: runs of characters that are not uppercase letters, spaces, or periods (for example ‘obert’ and ‘does’)
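
The three character-set examples, plus the "literal inside a set" rule, as a Python sketch. Python's `re` is assumed here; the string "Really? Yes." is an invented input for the last case.

```python
import re

# [Rr] matches either letter in a single position.
case_variants = re.findall(r"[Rr]obert", "Robert does not like to be called robert.")
print(case_variants)  # ['Robert', 'robert']

# Ranges: one uppercase letter followed by one or more lowercase letters.
capitalized = re.findall(r"[A-Z][a-z]+", "Robert does not like to be called Richard.")
print(capitalized)  # ['Robert', 'Richard']

# A leading "^" negates the set: runs of characters that are NOT
# uppercase letters, spaces, or periods.
negated = re.findall(r"[^A-Z .]+", "Robert does not like to be called Roberta.")
print(negated)  # ['obert', 'does', 'not', 'like', 'to', 'be', 'called', 'oberta']

# Inside a set, "." and "?" are literal characters.
literal_punct = re.findall(r"[.?]", "Really? Yes.")
print(literal_punct)  # ['?', '.']
```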

Shorthand Character Classes
- Certain letters reference a whole class of characters when ‘escaped’ by a backslash (\).
- Common examples:
  \d matches any digit character: [0-9]
  \w matches any ‘word’ character: [A-Za-z0-9_]
  \s matches any ‘space’ (whitespace) character, including line breaks
- Using the capital letter inverts the match, e.g. \S matches any non-space character: [^\s]

Character Classes: Example
Text: Jenny’s number is 867-5309.
Pattern: \d{3}-\d{4}
Match: ‘867-5309’
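
A Python sketch of the shorthand classes, assuming Python's `re`. One caveat: in Python 3, `\d` and `\w` also match non-ASCII Unicode digits and letters, so they are slightly broader than the ASCII ranges listed on the slide. The "user_id = 42" and "a  b..." strings are invented inputs.

```python
import re

text = "Jenny's number is 867-5309."

# \d is any digit, so \d{3}-\d{4} means three digits, a hyphen, four digits.
phone = re.search(r"\d{3}-\d{4}", text).group()
print(phone)  # 867-5309

# \w+ finds runs of word characters; \S+ finds runs of non-whitespace.
words = re.findall(r"\w+", "user_id = 42")
non_space = re.findall(r"\S+", "a  b\tc\nd")
print(words)      # ['user_id', '42']
print(non_space)  # ['a', 'b', 'c', 'd']
```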

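Greedy Matches
- Quantifiers are ‘greedy’: they match as much text as possible, so a careless (or purposeful) pattern can match beyond the expected result.
- Add a “?” directly after the quantifier (e.g. *?) to make it ‘lazy’, so the pattern stops at the first point where the rest of it can match.
Text: Robert likes dogs! Robert likes cats!
Greedy pattern: Robert likes .*!
Match: ‘Robert likes dogs! Robert likes cats!’ (one match spanning both sentences)
Lazy pattern: Robert likes .*?!
Matches: ‘Robert likes dogs!’ and ‘Robert likes cats!’ separately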
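
The greedy/lazy difference is easy to see with `re.findall` in Python (assumed here as a stand-in for the CMS engine).

```python
import re

text = "Robert likes dogs! Robert likes cats!"

# Greedy: ".*" grabs as much as possible, then backs off to the LAST "!".
greedy = re.findall(r"Robert likes .*!", text)
# Lazy: ".*?" takes as little as possible, stopping at the FIRST "!".
lazy = re.findall(r"Robert likes .*?!", text)

print(greedy)  # ['Robert likes dogs! Robert likes cats!']
print(lazy)    # ['Robert likes dogs!', 'Robert likes cats!']
```

A negated set such as `[^!]*` is another common way to stop at the first `!`; the Challenge #2 solution relies on the same idea with `[^"]+`.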

Challenge #2: Set External Links to Open in a New Window
- Need to add the attribute target="_blank"
- External links start with “http” or “https”
Examples:
<a href="http://www.omniupdate.com/">OmniUpdate</a>
<a href="https://petitions.whitehouse.gov/">Petitions</a>
Desired result:
<a href="http://www.omniupdate.com/" target="_blank">OmniUpdate</a>
<a href="https://petitions.whitehouse.gov/" target="_blank">Petitions</a>

Challenge #2: Tips
- Remember the lessons learned from Challenge #1: (abc)?
- Remember the syntax requirements of HTML (or XML): HTML/XML have special characters that can only be used in certain places.
- Use a ‘not’ set to match any character not in the set: [^abc]
- Use capture groups to put captured content back in place as needed: (abc) -> $1

Challenge #2: Solution
Steps to build the regex pattern:
1. FIND: <a href="http://www\.omniupdate\.com/">OmniUpdate</a> (starting pattern)
2. FIND: <a\s*href="http://www\.omniupdate\.com/"\s*> (account for whitespace)
3. FIND: <a\s*href="https?://[^"]+"\s*> (match any absolute URL)
4. FIND: (<a\s*href="https?://[^"]+"\s*)> (capture group)
5. REPLACE: $1 target="_blank"> (use the capture group, then end the anchor tag)
Example match/replace:
Text: <a href="http://www.omniupdate.com/about/">About</a>
Full match: <a href="http://www.omniupdate.com/about/">
Capture ($1): <a href="http://www.omniupdate.com/about/"
Result: <a href="http://www.omniupdate.com/about/" target="_blank">About</a>
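
A Python sketch of the final FIND/REPLACE pair, useful for previewing the result on sample HTML before running it in the CMS. The function name `open_external_in_new_window` is hypothetical, and Python's `re` is assumed; note that Python spells the slide's `$1` as `\1`.

```python
import re

# Final FIND pattern from the slide; group 1 holds everything before ">".
EXTERNAL_LINK = re.compile(r'(<a\s*href="https?://[^"]+"\s*)>')
# The slide's REPLACE value "$1 target="_blank">", in Python's syntax.
REPLACEMENT = r'\1 target="_blank">'


def open_external_in_new_window(html: str) -> str:
    """Hypothetical helper: add target="_blank" to absolute-URL <a> tags."""
    return EXTERNAL_LINK.sub(REPLACEMENT, html)


page = (
    '<a href="http://www.omniupdate.com/">OmniUpdate</a>\n'
    '<a href="https://petitions.whitehouse.gov/">Petitions</a>\n'
    '<a href="/about/">About</a>'  # root-relative: left unchanged
)
print(open_external_in_new_window(page))
```

Because the pattern requires `>` right after the `href` attribute, tags with other attributes after `href` are not matched, and running the replacement a second time does not add a duplicate `target`.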

Thank you.
Robert Kiffe, Sr. Customer Support Engineer, OmniUpdate
805-484-9400 ext 223 | rkiffe@omniupdate.com
Survey: outc18.com/surveys