Editing Tons of Text? RegEx to the Rescue! Eric Cressey Senior UX Content Writer Symantec Corporation.

Why regular expressions?

Content maintenance adds up
Properties and source code
XML, HTML, structured content
Localization
Bugs

Sometimes we solve big problems
Remove legacy HTML from 10,000+ page Flare project
– One week of work instead of more than a month
– Saved 4 weeks of work
Update KB URLs in 20,000+ s and files
– Two weeks of work instead of months
– Saved 10 weeks of work across multiple departments
– No errors or missed references

Agenda
1. Basics
2. Syntax and examples
3. Tips for massive projects

Encountering regex in the wild can be scary
^[2-9]\d{2}-\d{3}-\d{4}$
^#?([a-f0-9]{6}|[a-f0-9]{3})$
^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
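
The three patterns on this slide are less scary once you try them on sample text. Here is a minimal sketch in Python. The choice of Python's re module, the names PHONE, HEX_COLOR, IPV4, and matches, and the sample strings are all assumptions for illustration; the talk itself works in a text editor. The patterns appear to describe a US-style phone number, a lowercase hex color, and an IPv4 address.

```python
import re

# The three patterns from the slide. The names are illustrative only.
PHONE = re.compile(r"^[2-9]\d{2}-\d{3}-\d{4}$")
HEX_COLOR = re.compile(r"^#?([a-f0-9]{6}|[a-f0-9]{3})$")
IPV4 = re.compile(
    r"^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}"
    r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$"
)


def matches(pattern, text):
    """Return True when the string fits the anchored pattern."""
    return pattern.match(text) is not None


# Made-up sample strings to try against each pattern.
phone_ok = matches(PHONE, "555-867-5309")
color_ok = matches(HEX_COLOR, "#ff00aa")
ip_ok = matches(IPV4, "192.168.0.1")
ip_bad = matches(IPV4, "256.1.1.1")
```

With these samples, the first three checks succeed and the last one fails, because 256 is outside the 0 to 255 range the pattern allows.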

What are they?
Searches and regular expressions find patterns in text
Regular expressions add logic, precision, and flexibility to searches

Why use them?

Are there prerequisites?
Your text editor must support regular expressions

How do you make them?
1. Start with the text you’re looking for
2. Identify a pattern
3. Add special characters
4. Test the regular expression to see if it matches what you want

Best practices
1. Use version control
2. Use a basic text editor
3. Test before replacing
4. Test again before committing

Syntax and examples
1. Dealing with variance
2. Using positional context
3. Matching unknown content
4. Putting it all together: HTML patterns
Copyright © 2014 Symantec Corporation

Matching name variations

Start with what you know
Regex: Eric
Text to match: Eric, Erik

Identify a pattern
Regex: Eri
Text to match: Eric, Erik

Add special characters and syntax
Regex: Eri[ck]
Square brackets define a set of allowed characters
Text to match: Eric, Erik

This pattern also works
Regex: Eri(c|k) or (Eric|Erik)
Parentheses group content together
The pipe specifies OR logic
Text to match: Eric, Erik
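
To try both approaches outside a text editor, here is a small Python sketch. The sample sentence and the variable names are made up for illustration. It uses a non-capturing group, (?:...), in the second pattern because Python's re.findall returns only the captured group when a pattern contains one.

```python
import re

# Made-up sample text with two wanted spellings and one near miss.
text = "Eric, Erik, and Erin attended."

# Character set: "Eri" followed by either "c" or "k".
with_set = re.findall(r"Eri[ck]", text)

# Alternation inside a non-capturing group, so findall returns whole matches.
with_group = re.findall(r"Eri(?:c|k)", text)

# Alternation over the full names gives the same result.
with_names = re.findall(r"Eric|Erik", text)
```

All three lists contain Eric and Erik, and none of them picks up Erin.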

Matching URL variations

Start with what you know Regex Include optional content Text to match symantec.com

Identify a pattern in the text you want to match symantec.com

Escape special characters with a backslash Regex Text to match symantec.com

Add groups to logical sections with parentheses Regex ( Text to match symantec.com

Indicate number of times to match each group
Regex: (https?:\/\/)?(www\.)?symantec\.com
+, *, or ? specify how many times to match a group or character
+ one or more
* zero or more
? zero or one
Text to match: symantec.com
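
As a rough check of the finished pattern, this Python sketch runs it against a few URL variations. The sample URLs are assumptions (the transcript preserves only symantec.com), and the names URL, samples, and matched are illustrative.

```python
import re

# The final pattern from the slide: optional scheme, optional "www.".
URL = re.compile(r"(https?:\/\/)?(www\.)?symantec\.com")

# Assumed variations; only "symantec.com" survives in the transcript.
samples = [
    "symantec.com",
    "www.symantec.com",
    "http://symantec.com",
    "https://www.symantec.com",
    "ftp://symantec.com",
]

# fullmatch requires the whole string to fit the pattern.
matched = [s for s in samples if URL.fullmatch(s)]
```

The first four samples match in full. The ftp variant does not, because the optional group only allows http or https.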

Find first name when followed by last name

Start with what you know
Regex: Eric
Text to match: Eric, Eric Creasey, Eric Cressey, Eric C

Add special characters and syntax
Regex: Eric(?= Cressey)
(?=) is a positive lookahead
Eric is returned only if the next characters match the lookahead content
Text to match: Eric, Eric Creasey, Eric Cressey, Eric C

How do positive lookaheads work?
Regex: Eric(?= Cressey)
1. Finds “Eric” as usual
2. Evaluates the following content to see if it matches the lookahead content
3. If the content is the same, “Eric” is a match
Text to match: Eric, Eric Creasey, Eric Cressey, Eric C

Find first name not followed by last name

There are negative lookaheads
Regex: Eric(?! Cressey)
(?!) is a negative lookahead
Eric is matched if the next characters do not match the lookahead content
Text to match: Eric, Eric Creasey, Eric Cressey, Eric C
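
Both kinds of lookahead can be tried on the slide's sample names. This is a minimal Python sketch; treating each name as a separate line is an assumption about how the slide laid out its text, and the variable names are illustrative.

```python
import re

# The sample names from the slides, one per entry.
lines = ["Eric", "Eric Creasey", "Eric Cressey", "Eric C"]

# Positive lookahead: "Eric" only when " Cressey" comes next.
followed = [s for s in lines if re.search(r"Eric(?= Cressey)", s)]

# Negative lookahead: "Eric" only when " Cressey" does not come next.
not_followed = [s for s in lines if re.search(r"Eric(?! Cressey)", s)]

# The lookahead text is checked but is not part of the returned match.
returned = re.search(r"Eric(?= Cressey)", "Eric Cressey").group()
```

Only "Eric Cressey" passes the positive lookahead, the other three pass the negative one, and in both cases the returned match is just "Eric".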

Find last name when it follows first name

There are also lookbehinds
Regex: (?<=Eric )Cressey
(?<=) is a positive lookbehind
Cressey is matched if the previous characters match the lookbehind content
Text to match: Eric, Eric Creasey, Eric Cressey, Eric C, Erik Cressey

How do positive lookbehinds work?
Regex: (?<=Eric )Cressey
1. Evaluates each character to see if it follows “Eric ”
2. It gets to “C” and then evaluates the rest of the expression
3. Only the match outside the lookbehind is returned
Text to match: Eric, Eric Creasey, Eric Cressey, Eric C, Erik Cressey

Find last name when it doesn’t follow first name

There are negative lookbehinds
Regex: (?<!Eric )Cressey
(?<!) is a negative lookbehind
Cressey is matched if the previous characters do not match the lookbehind content
Text to match: Eric, Eric Creasey, Eric Cressey, Eric C, Bill Cressey, Cressey
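
The lookbehind patterns can be checked the same way. This Python sketch uses names from the slides as separate entries (an assumption about the original layout). Note that Python's re module requires lookbehind content to have a fixed length, which "Eric " does; other regex engines may differ.

```python
import re

# Names taken from the slides, one per entry.
lines = ["Eric Cressey", "Erik Cressey", "Bill Cressey", "Cressey"]

# Positive lookbehind: "Cressey" only when it directly follows "Eric ".
after_eric = [s for s in lines if re.search(r"(?<=Eric )Cressey", s)]

# Negative lookbehind: "Cressey" only when it does not follow "Eric ".
not_after_eric = [s for s in lines if re.search(r"(?<!Eric )Cressey", s)]

# Only the text outside the lookbehind is returned.
returned = re.search(r"(?<=Eric )Cressey", "Eric Cressey").group()
```

Here only "Eric Cressey" passes the positive lookbehind, and the returned match is "Cressey" without the first name.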

Get the value for a given string ID

Start with what you know
Regex: stringID=
Text to match:
stringID=Hello, world!
stringID= 안녕하세요, 세계
stringID=Hola món

Add special characters and syntax
Regex: (?<=stringID=).*
Positive lookbehind means content must follow the string ID
. (period) matches any character
* is greedy and matches the previous character as many times as possible
Text to match:
stringID=Hello, world!
stringID= 안녕하세요, 세계
stringID=Hola món
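
Here is a small Python sketch of the same idea, pulling the value out of each line. The sample lines are adapted from the slide (written here without a space after the equals sign), and the variable names are illustrative. In Python the period does not match a newline by default, so the greedy .* stops at the end of each line.

```python
import re

# Sample lines adapted from the slide.
lines = [
    "stringID=Hello, world!",
    "stringID=안녕하세요, 세계",
    "stringID=Hola món",
]

# The lookbehind anchors the match right after "stringID=";
# ".*" then takes everything up to the end of the line.
VALUE = re.compile(r"(?<=stringID=).*")

values = [VALUE.search(s).group() for s in lines]
```

Each entry in values is the text after stringID=, with the ID itself left out of the match.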

Make sure your ampersands are encoded

Start with what you know
Regex: &
Text to match: & and &amp;

Add special characters and syntax
Regex: &(?!amp;)
Only matches ampersand when not followed by amp;
Useful if you don’t want to replace all occurrences
Text to match: & and &amp;
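
A quick way to see the negative lookahead at work is to encode only the bare ampersands in a string. This is a minimal Python sketch with a made-up sentence and illustrative variable names. One caveat: the slide's pattern exempts only amp;, so other entities such as &lt; would also be rewritten.

```python
import re

# Made-up text with one bare ampersand and one already-encoded ampersand.
text = "Tom & Jerry &amp; friends"

# Replace "&" only when it is not already the start of "&amp;".
fixed = re.sub(r"&(?!amp;)", "&amp;", text)
```

The bare ampersand becomes &amp; and the existing &amp; is left alone, so nothing is double-encoded.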

Get the content in an HTML tag

Start with what you know Regex.* Text to match Hello, world This is an example

Add special characters and syntax Regex (? ).*(?= ) You can use lookaheads and lookbehinds together Text to match Hello, world This is an example
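
The tag names on these slides did not survive in the transcript, so the following Python sketch assumes a plain p element purely for illustration. It combines a lookbehind for the opening tag with a lookahead for the closing tag, so only the content between them is returned.

```python
import re

# Assumed markup: the transcript lost the original tags, so <p> is a guess.
html_lines = ["<p>Hello, world</p>", "<p>This is an example</p>"]

# Lookbehind for the opening tag, lookahead for the closing tag.
CONTENT = re.compile(r"(?<=<p>).*(?=</p>)")

contents = [CONTENT.search(s).group() for s in html_lines]
```

Under that assumption, contents holds the two sentences without their surrounding tags.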

Get a paragraph with a specific class

Start with what you know Regex Text to match Hello, world This is the second paragraph

Add syntax to match unknown content Regex.* Greedy matches return the longest match Text to match Hello, world This is the second paragraph

Temper greedy matches Regex.*? *? Lazy matches return the shortest match Text to match Hello, world This is the second paragraph
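
The difference between greedy and lazy matching shows up when two paragraphs sit on one line. In this Python sketch the markup, including the class name intro, is an assumption, since the transcript lost the original tags.

```python
import re

# Assumed markup: two paragraphs on one line, the first with a class.
html = '<p class="intro">Hello, world</p><p>This is the second paragraph</p>'

# Greedy: ".*" runs to the last "</p>" on the line.
greedy = re.search(r'<p class="intro">.*</p>', html).group()

# Lazy: ".*?" stops at the first "</p>" it can.
lazy = re.search(r'<p class="intro">.*?</p>', html).group()
```

The greedy version swallows both paragraphs, while the lazy version returns only the first one.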

Get a paragraph based on one of many attributes

Use lazy matches to fill in unknown content Regex.*? Text to match Hello, world This is the second paragraph Goodbye

Multi-line replacements

Sometimes you want to insert multiple lines of text
Find: Hello
Text to match: Hello
Replace with:
Hello
Hi
What’s up

You can use whitespace special characters in replacement text
Find: Hello
Replace with: Hello\nHi\nWhat’s up
Result:
Hello
Hi
What’s up
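
The same replacement works in code. In this Python sketch, re.sub processes the \n escapes in the replacement text and turns them into real line breaks. The variable names are illustrative.

```python
import re

text = "Hello"

# A raw string keeps the backslashes; re.sub converts each \n
# in the replacement into a newline character.
result = re.sub(r"Hello", r"Hello\nHi\nWhat's up", text)

# Split on line breaks to see the three resulting lines.
result_lines = result.splitlines()
```

The single line Hello becomes three lines: Hello, Hi, and What's up.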

Add tags around content

You can reference groups in replacement text Regex (.*?) Text to match This sentence has some legacy content we want to replace. Replacement $1 Updated text This sentence has some legacy content we want to replace.
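
The tags in this example were lost from the transcript, so here is a hedged Python sketch with assumed markup: it swaps a legacy b element for strong while keeping the content captured by the group. Many editors write the group reference as $1, as on the slide; Python's re.sub writes it as \1.

```python
import re

# Assumed legacy markup; the original tags are missing from the transcript.
text = "This sentence has some <b>legacy content</b> we want to replace."

# Group 1 captures the content between the tags (lazy match);
# \1 puts it back inside the new tags.
updated = re.sub(r"<b>(.*?)</b>", r"<strong>\1</strong>", text)
```

The sentence keeps its wording, and only the tags around the captured content change.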

Updating URLs

Groups are numbered sequentially Text to match Replacement Updated text Regex
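
The URLs on this slide are missing from the transcript, so the following Python sketch uses entirely hypothetical ones (example.com and the article IDs are placeholders). It shows the numbering rule: groups are counted by their opening parenthesis, left to right, and the replacement can reuse any of them.

```python
import re

# Hypothetical old URLs; not the ones from the talk.
old_urls = [
    "http://www.example.com/kb/TECH123",
    "https://example.com/kb/HOWTO42",
]

# Group 1: scheme, group 2: optional "www.", group 3: article ID.
KB_URL = re.compile(r"(https?://)(www\.)?example\.com/kb/(\w+)")

# Keep the scheme (\1) and the article ID (\3); drop "www." (group 2).
new_urls = [
    KB_URL.sub(r"\1support.example.com/articles/\3", url) for url in old_urls
]
```

Each new URL keeps its original scheme and article ID but points at the hypothetical new host and path.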

Let’s recap. Here’s what we’ve learned so far.
Groups
– OR logic
– Using groups in replacement text
Lookaheads and lookbehinds
Special characters
– frequency (*, +, ?)
– newlines (\n)
– any character (.)
– escape with backslash (\) if necessary

Tips for massive projects

The manual approach doesn’t scale well when…
Multiple regex operations are needed
Regex must be applied in a specific order
You need to match a pattern within a pattern
You are working with many files in many directories

Steps for manually editing files in a directory
1. Get all files in a directory
2. For each file:
– If the extension is .properties, .xml, or .txt
  1. Get the text.
  2. Use regex to find and update URLs.
  3. Save the file.
3. For each directory:
  1. Repeat directory steps above.

Pseudo code for programmatically editing files
Get all files in a directory
For each file in directory
  If the extension is .properties, .xml, or .txt
    Get the text
    Use regex to find and update URLs
    Save the file
For each directory, repeat directory steps above
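
The pseudo code above can be sketched in Python roughly as follows. This is not the speaker's GitHub program. The rule list, the example.com URLs, the function names, and the UTF-8 encoding are all assumptions for illustration. Path.rglob walks subdirectories, which stands in for the "repeat for each directory" step.

```python
import re
from pathlib import Path

# File types to edit, as listed in the pseudo code.
EXTENSIONS = {".properties", ".xml", ".txt"}

# Hypothetical rules, applied in order. The URLs are placeholders.
RULES = [
    (
        re.compile(r"https?://(www\.)?example\.com/kb/"),
        "https://support.example.com/kb/",
    ),
]


def update_text(text):
    """Apply every rule in order and return the updated text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


def update_directory(root):
    """Rewrite matching files under root, recursively; return changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in EXTENSIONS:
            continue
        original = path.read_text(encoding="utf-8")  # assumes UTF-8 files
        updated = update_text(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling update_directory on a project folder returns the list of files it rewrote, which is handy for reviewing the changes in version control before committing. Adding another regex operation means adding one more entry to RULES.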

Benefits of the programmatic approach
Write each regex once
You can perform them in a specific order
Agile! Easy to update the program when requirements evolve
Easy to test and iterate

You don’t have to start from scratch
Get my basic program on GitHub
Add regular expressions
Visit eric.cressey.org for helpful resources
Feel free to ask me if you have questions

Takeaways
If there’s a pattern, use regular expressions
You only need to know a small part of regex syntax to automate most repetitive tasks
You can save days or weeks of time on large projects

Resources
Notepad++ - free text editor with regex support
regex101.com - great for writing and testing your regex
eric.cressey.org - more regex tutorials

Thank you! Copyright © 2015 Symantec Corporation. All rights reserved. Symantec and the Symantec Logo are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners. This document is provided for informational purposes only and is not intended as advertising. All warranties relating to the information in this document, either express or implied, are disclaimed to the maximum extent allowed by law. The information in this document is subject to change without notice. Eric Cressey