Regular Expression JavaScript Web Technology Derived from:



Definitions
Term: Description
Literal: Any character used in a search or matching expression. For example, to find ind in windows, ind is a literal string.
Metacharacter: One or more special characters that have a unique meaning and are NOT used as literals in the search expression. For example, the character ^ (circumflex or caret) is a metacharacter.
Escape Sequence: A way of indicating that we want to use a metacharacter as a literal, by placing \ (backslash) in front of it. For example, to find ^ind in w^indows we use \^ind, and to find C:\file.txt we use C:\\file.txt.
Target String: The string in which we want to find our match or search pattern.
Search Expression: The expression that we use to search our target string, that is, the pattern we use to find what we want.
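The escape sequences described above can be tried out directly. The following is a minimal sketch in JavaScript; the variable names are illustrative only, and the path pattern also escapes the dot (an addition to the slide's example) so that it matches a literal period.

```javascript
// Escaping metacharacters so that they are matched literally.
const caretPattern = /\^ind/;         // \^ is a literal caret, not an anchor
const pathPattern = /C:\\file\.txt/;  // \\ is a literal backslash, \. a literal dot

// In a JavaScript string literal the backslash must itself be escaped,
// so "C:\\file.txt" holds the text C:\file.txt.
const target = "w^indows";
const path = "C:\\file.txt";

const escapedCaretFound = caretPattern.test(target); // true
const bareCaretFound = /^ind/.test(target);          // false: unescaped ^ anchors to the start
const pathFound = pathPattern.test(path);            // true

console.log(escapedCaretFound, bareCaretFound, pathFound);
```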

Our Example Target Strings
Throughout this guide we will use the following as our target strings:
STR1: Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)
STR2: Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)

Simple Matching
Search for: m
  STR1: match. Finds the m in compatible.
  STR2: no match. There is no lower case m in this string. Searches are case sensitive.
Search for: a/4
  STR1: match. Found in Mozilla/4.0. Any combination of characters can be used for the match.
  STR2: match. Found in the same place as in STR1.
Search for: 5 [
  STR1: no match. The search is looking for the pattern '5 [' and this does NOT exist in STR1. Spaces are valid in searches.
  STR2: match. Found in Mozilla/4.75 [en].
Search for: in
  STR1: match. Found in Windows.
  STR2: match. Found in Linux.
Search for: le
  STR1: match. Found in compatible.
  STR2: no match. There is an l and an e in this string but they are not adjacent (or contiguous).
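The simple-matching table can be reproduced with a short script. This is only a sketch: STR2 is assumed to carry the kernel version (Linux2.2.16-22) that later examples refer to, and the `results` array is an illustrative name. Note that in JavaScript the [ in '5 [' has to be escaped, because an unescaped [ opens a bracket expression.

```javascript
const STR1 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)";
// Assumption: STR2 includes the kernel version that the later examples mention.
const STR2 = "Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)";

// Plain literal searches; only "/" and "[" need a backslash in a regex literal.
const patterns = [/m/, /a\/4/, /5 \[/, /in/, /le/];

// Test every pattern against both target strings.
const results = patterns.map((re) => ({
  pattern: re.source,
  str1: re.test(STR1),
  str2: re.test(STR2),
}));

console.table(results);
```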

Brackets, Ranges and Negation
Metacharacter: Meaning
[ ]: Match any one of the characters inside the square brackets, for one character position, once and only once. For example, [12] matches either 1 or 2, while [0123456789] matches any digit from 0 to 9.
-: The - (dash) inside square brackets is the 'range separator' and allows us to define a range, so [0123456789] above can be rewritten as [0-9]. We can define more than one range inside a list, e.g. [0-9A-C] matches 0 to 9 and A to C (but not a to c). NOTE: To test for - as a literal inside brackets it must come first or last, that is, [-0-9] will test for - and 0 to 9.
^: The ^ (circumflex or caret) as the first character inside square brackets negates the expression, e.g. [^Ff] means anything except upper or lower case F and [^a-z] means everything except lower case a to z.

Brackets, Ranges and Negation
Search for: in[du]
  STR1: match. Finds ind in Windows.
  STR2: match. Finds inu in Linux.
Search for: x[0-9A-Z]
  STR1: no match. The tests are case sensitive. To find the xt in DigExt we would need to use [0-9a-z] or [0-9A-Zt]. We can also use this format for testing upper and lower case, e.g. [Ff] will check for lower and upper case F.
  STR2: match. Finds x2 in Linux2.
Search for: [^A-M]in
  STR1: match. Finds Win in Windows.
  STR2: no match. We have excluded the range A to M in our search, so Linux is not found, but linux (if present) would be found.
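A possible way to check the bracket examples in JavaScript is sketched below. It uses `String.prototype.match`, which returns null when nothing matches; the helper `firstMatch` is a hypothetical name, and STR2 is assumed to include the kernel version that the x2 example relies on.

```javascript
const STR1 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)";
// Assumption: STR2 includes the kernel version, so that "x2" exists in it.
const STR2 = "Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)";

// Hypothetical helper: the matched text, or null when there is no match.
function firstMatch(re, text) {
  const m = text.match(re);
  return m === null ? null : m[0];
}

console.log(firstMatch(/in[du]/, STR1));    // "ind"
console.log(firstMatch(/in[du]/, STR2));    // "inu"
console.log(firstMatch(/x[0-9A-Z]/, STR1)); // null: the t after x is lower case
console.log(firstMatch(/x[0-9A-Z]/, STR2)); // "x2"
console.log(firstMatch(/[^A-M]in/, STR1));  // "Win"
console.log(firstMatch(/[^A-M]in/, STR2));  // null: L falls inside A to M
```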

Positioning (or Anchors)
Metacharacter: Meaning
^: The ^ outside square brackets means look only at the beginning of the target string, e.g. ^Win will not find Windows in STR1 but ^Moz will find Mozilla.
$: The $ (dollar) means look only at the end of the target string, e.g. fox$ will find a match in 'silver fox' but not in 'the fox jumped over the moon'.
.: The . (period) means any single character in this position, e.g. ton. will find tons and tonneau but not wanton, because wanton has no character following ton.

Positioning (or Anchors)
Search for: [a-z]\)$
  STR1: match. Finds t) in DigExt). Note: The \ is an escape character and is required to treat the ) as a literal.
  STR2: no match. We have a numeric value at the end of this string, so we would need [0-9a-z]\)$ to find it.
Search for: .in
  STR1: match. Finds Win in Windows.
  STR2: match. Finds Lin in Linux.
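The anchor and period examples from the two slides above can be checked with a sketch like the following. The `checks` object is an illustrative name, and STR2 is assumed to end in i586) as shown in the target strings.

```javascript
const STR1 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)";
// Assumption: STR2 includes the kernel version that other examples mention.
const STR2 = "Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)";

const checks = {
  mozAtStart: /^Moz/.test(STR1),                              // true
  winAtStart: /^Win/.test(STR1),                              // false
  foxAtEnd: /fox$/.test("silver fox"),                        // true
  foxInMiddle: /fox$/.test("the fox jumped over the moon"),   // false
  tons: /ton./.test("tons"),                                  // true
  wanton: /ton./.test("wanton"),                              // false: nothing follows ton
  letterBeforeParen1: /[a-z]\)$/.test(STR1),                  // true: t)
  letterBeforeParen2: /[a-z]\)$/.test(STR2),                  // false: 6)
  alnumBeforeParen2: /[0-9a-z]\)$/.test(STR2),                // true
  anyCharThenIn1: STR1.match(/.in/)[0],                       // "Win"
  anyCharThenIn2: STR2.match(/.in/)[0],                       // "Lin"
};

console.log(checks);
```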

Iteration 'metacharacters'
Metacharacter: Meaning
?: The ? (question mark) matches the preceding character 0 or 1 times only, e.g. colou?r will find both color and colour.
*: The * (asterisk or star) matches the preceding character 0 or more times, e.g. tre* will find tree and tread and trough.
+: The + (plus) matches the preceding character 1 or more times, e.g. tre+ will find tree and tread but not trough.
{n}: Matches the preceding character (or bracket expression) exactly n times, e.g. to find a local phone number we could use [0-9]{3}-[0-9]{4}, which would find any number of the form 123-4567. Note: The - (dash) in this case, because it is outside the square brackets, is a literal. The value is enclosed in braces (curly brackets).
{n,m}: Matches the preceding character at least n times but not more than m times, e.g. 'ba{2,3}b' will find 'baab' and 'baaab' but NOT 'bab' or 'baaaab'. Values are enclosed in braces (curly brackets).
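As a rough illustration, the quantifier examples above behave as follows in JavaScript. The helper name `matchesAll` and the sample phone number are assumptions made for this sketch.

```javascript
// Hypothetical helper: true only if the pattern is found in every word.
function matchesAll(re, words) {
  return words.every((w) => re.test(w));
}

const optionalU = matchesAll(/colou?r/, ["color", "colour"]);   // true
const zeroOrMoreE = matchesAll(/tre*/, ["tree", "tread", "trough"]); // true
const oneOrMoreE = matchesAll(/tre+/, ["tree", "tread"]);       // true
const troughNeedsE = /tre+/.test("trough");                     // false: no e after tr

// {n}: exactly three digits, a literal dash, exactly four digits.
const phone = /[0-9]{3}-[0-9]{4}/.test("555-0123");             // true

// {n,m}: between two and three a's between the b's.
const twoOrThree = matchesAll(/ba{2,3}b/, ["baab", "baaab"]);   // true
const tooFew = /ba{2,3}b/.test("bab");                          // false
const tooMany = /ba{2,3}b/.test("baaaab");                      // false

console.log({ optionalU, zeroOrMoreE, oneOrMoreE, troughNeedsE, phone, twoOrThree, tooFew, tooMany });
```

Because the phone pattern is not anchored, it would also find a match inside a longer run of digits; adding ^ and $ restricts it to the whole string.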

Iteration 'metacharacters'
Search for: \(.*l
  STR1: match. Finds the ( and everything up to the l in compatible. (Note: The opening \ is an escape sequence used to indicate that the ( it precedes is a literal, not a metacharacter.)
  STR2: no match. Mozilla contains ll, but it is not preceded by an open parenthesis, and Linux has an upper case L.
Search for: W*in
  STR1: match. Finds the Win in Windows.
  STR2: match. Finds in in Linux, preceded by W zero times, so a match.
Search for: [xX][0-9a-z]{2}
  STR1: no match. Finds the x in DigExt, but it is followed by only one matching character (the t).
  STR2: match. Finds X and 11 in X11.
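The iteration examples on the target strings can be sketched as below. The helper `firstMatch` is a hypothetical name, and STR2 is assumed to include the kernel version used elsewhere in the guide. The first result shows that .* is greedy: it runs from the open parenthesis up to the last l that follows it.

```javascript
const STR1 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)";
// Assumption: STR2 includes the kernel version that other examples mention.
const STR2 = "Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)";

// Hypothetical helper: the matched text, or null when there is no match.
function firstMatch(re, text) {
  const m = text.match(re);
  return m === null ? null : m[0];
}

console.log(firstMatch(/\(.*l/, STR1));            // "(compatibl"
console.log(firstMatch(/\(.*l/, STR2));            // null
console.log(firstMatch(/W*in/, STR1));             // "Win"
console.log(firstMatch(/W*in/, STR2));             // "in"
console.log(firstMatch(/[xX][0-9a-z]{2}/, STR1));  // null
console.log(firstMatch(/[xX][0-9a-z]{2}/, STR2));  // "X11"
```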

More 'metacharacters'
Metacharacter: Meaning
( ): The ( (open parenthesis) and ) (close parenthesis) may be used to group (or bind) parts of our search expression together.
|: The | (vertical bar or pipe) is called alternation and means find either the left-hand OR the right-hand value, for example, gr(a|e)y will find 'gray' or 'grey'.

Example: More 'metacharacters'
Search for: ^([L-Z]in)
  STR1: no match. The '^' is an anchor indicating first position. Win does not start the string, so no match.
  STR2: no match. The '^' is an anchor indicating first position. Linux does not start the string, so no match.
Search for: ((4\.[0-3])|(2\.[0-3]))
  STR1: match. Finds the 4.0 in Mozilla/4.0.
  STR2: match. Finds the 2.2 that follows Linux.
Search for: (W|L)in
  STR1: match. Finds Win in Windows.
  STR2: match. Finds Lin in Linux.
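Grouping and alternation can be tried in JavaScript as in the sketch below. STR2 is assumed to include the kernel version (Linux2.2.16-22), which is what makes the 2.2 match possible; the `grouped` object is an illustrative name.

```javascript
const STR1 = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)";
// Assumption: STR2 includes the kernel version, so that "2.2" exists in it.
const STR2 = "Mozilla/4.75 [en](X11;U;Linux2.2.16-22 i586)";

const version = /((4\.[0-3])|(2\.[0-3]))/;

const grouped = {
  gray: /gr(a|e)y/.test("gray"),             // true
  grey: /gr(a|e)y/.test("grey"),             // true
  anchored1: /^([L-Z]in)/.test(STR1),        // false: the string starts with Moz
  anchored2: /^([L-Z]in)/.test(STR2),        // false
  version1: STR1.match(version)[0],          // "4.0"
  version2: STR2.match(version)[0],          // "2.2" (4.7 is outside [0-3])
  winOrLin1: STR1.match(/(W|L)in/)[0],       // "Win"
  winOrLin2: STR2.match(/(W|L)in/)[0],       // "Lin"
};

console.log(grouped);
```

Parentheses also capture: in the `winOrLin` matches, element [1] of the result array holds the letter that the group matched (W or L).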

POSIX Character Class Definitions
Value: Meaning
[:digit:]: Only the digits 0 to 9.
[:alnum:]: Any alphanumeric character: 0 to 9, A to Z or a to z.
[:alpha:]: Any alpha character: A to Z or a to z.
[:blank:]: Space and TAB characters only.
[:xdigit:]: Hexadecimal notation: 0-9, A-F, a-f.
[:punct:]: Punctuation symbols such as . , " ' ? ! ; : # $ % & ( ) * + - / [ ] \ ^ _ { } | ~
[:print:]: Any printable character.
[:space:]: Any whitespace character (space, tab, NL, FF, VT, CR). Many systems abbreviate this as \s.
[:graph:]: Any visible character, that is, any printable character excluding whitespace (SPACE, TAB). The closest common shorthand is \S (any non-whitespace character), not \W, which matches non-word characters.
[:upper:]: Any alpha character A to Z.
[:lower:]: Any alpha character a to z.
[:cntrl:]: Control characters: NL CR LF TAB VT FF NUL SOH STX ETX EOT ENQ ACK SO SI DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC IS1 IS2 IS3 IS4 DEL.
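JavaScript's RegExp does not support POSIX class names such as [:digit:] inside bracket expressions, so in JavaScript these classes have to be spelled out. The sketch below is one possible mapping, assuming ASCII-only text; the table `posixToJs` and the function `posixClass` are hypothetical names, not part of any library.

```javascript
// Hypothetical lookup table: POSIX class name -> equivalent ASCII bracket expression.
const posixToJs = {
  digit: "[0-9]",
  alnum: "[0-9A-Za-z]",
  alpha: "[A-Za-z]",
  blank: "[ \\t]",
  xdigit: "[0-9A-Fa-f]",
  punct: "[!-/:-@\\[-`{-~]",   // the four ASCII punctuation ranges
  print: "[\\x20-\\x7E]",      // printable characters, including space
  space: "[ \\t\\n\\f\\v\\r]",
  graph: "[\\x21-\\x7E]",      // printable characters, excluding space
  upper: "[A-Z]",
  lower: "[a-z]",
  cntrl: "[\\x00-\\x1F\\x7F]",
};

// Build a RegExp that matches a single character of the named class.
function posixClass(name) {
  const source = posixToJs[name];
  if (source === undefined) {
    throw new Error("unknown POSIX class: " + name);
  }
  return new RegExp(source);
}

// Usage: a two-digit hexadecimal byte, built from the xdigit entry.
const hexByte = new RegExp("^" + posixToJs.xdigit + "{2}$");
console.log(hexByte.test("4F")); // true
console.log(hexByte.test("4G")); // false
```

For text beyond ASCII, Unicode property escapes (for example /\p{L}/u for letters) are usually a better fit than these fixed ranges.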