Applications of Regular Expressions BY— NIKHIL KUMAR KATTE 1.

Slides:



Advertisements
Similar presentations
Liang, Introduction to Java Programming, Ninth Edition, (c) 2013 Pearson Education, Inc. All rights reserved. 1 Chapter 9 Strings.
Advertisements

Formal Language, chapter 4, slide 1Copyright © 2007 by Adam Webber Chapter Four: DFA Applications.
COMP-421 Compiler Design Presented by Dr Ioanna Dionysiou.
1 Chapter 2 Introduction to Java Applications Introduction Java application programming Display ____________________ Obtain information from the.
ISBN Regular expressions Mastering Regular Expressions by Jeffrey E. F. Friedl –(on reserve.
176 Formal Languages and Applications: We know that Pascal programming language is defined in terms of a CFG. All the other programming languages are context-free.
1 Regular Expressions & Automata Nelson Padua-Perez Bill Pugh Department of Computer Science University of Maryland, College Park.
Regular Expressions in Java. Namespace in XML Transparency No. 2 Regular Expressions Regular expressions are an extremely useful tool for manipulating.
Regular Expressions in Java. Regular Expressions A regular expression is a kind of pattern that can be applied to text ( String s, in Java) A regular.
1 A Quick Introduction to Regular Expressions in Java.
Regular expressions Mastering Regular Expressions by Jeffrey E. F. Friedl Linux editors and commands (e.g.
Regular Expressions & Automata Fawzi Emad Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
Regular expression. Validation need a hard and very complex programming. Sometimes it looks easy but actually it is not. So there is a lot of time and.
1 Overview Regular expressions Notation Patterns Java support.
Scanning with Jflex.
Lecture 2: Lexical Analysis CS 540 George Mason University.
1 Scanning Aaron Bloomfield CS 415 Fall Parsing & Scanning In real compilers the recognizer is split into two phases –Scanner: translate input.
Lesson 3 – Regular Expressions Sandeepa Harshanganie Kannangara MBCS | B.Sc. (special) in MIT.
CS 536 Spring Learning the Tools: JLex Lecture 6.
1 Form Validation. Validation  Validation of form data can be cumbersome using the basic techniques  StringTokenizer  If-else statements  Most of.
Introduction to Programming Prof. Rommel Anthony Palomino Department of Computer Science and Information Technology Spring 2011.
ADSA: RegExprs/ Advanced Data Structures and Algorithms Objective –look at programming with regular expressions (REs) in Java Semester 2,
Computer Programming for Biologists Class 5 Nov 20 st, 2014 Karsten Hokamp
Chapter 1 Introduction Dr. Frank Lee. 1.1 Why Study Compiler? To write more efficient code in a high-level language To provide solid foundation in parsing.
Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Machine-independent code improvement Target code generation Machine-specific.
REGULAR EXPRESSIONS. Lexical Analysis Lexical analysers can be constructed by programs such as LEX These programs employ as input a description of the.
1 Regular Expressions. 2 Regular expressions describe regular languages Example: describes the language.
Regular Expressions Regular expressions are a language for string patterns. RegEx is integral to many programming languages:  Perl  Python  Javascript.
Lexical Analysis I Specifying Tokens Lecture 2 CS 4318/5531 Spring 2010 Apan Qasem Texas State University *some slides adopted from Cooper and Torczon.
Agenda Regular Expressions (Appendix A in Text) –Definition / Purpose –Commands that Use Regular Expressions –Using Regular Expressions –Using the Replacement.
COMP313A Programming Languages Lexical Analysis. Lecture Outline Lexical Analysis The language of Lexical Analysis Regular Expressions.
Regular Expressions CSC207 – Software Design. Motivation Handling white space –A program ought to be able to treat any number of white space characters.
Regular Expression in Java 101 COMP204 Source: Sun tutorial, …
Regular Expressions.
Flex: A fast Lexical Analyzer Generator CSE470: Spring 2000 Updated by Prasad.
Group 4 Java Compiler Group Members: Atul Singh(Y6127) Manish Agrawal(Y6241) Mayank Sachan(Y6253) Sudeept Sinha(Y6483)
Overview A regular expression defines a search pattern for strings. Regular expressions can be used to search, edit and manipulate text. The pattern defined.
Review: Compiler Phases: Source program Lexical analyzer Syntax analyzer Semantic analyzer Intermediate code generator Code optimizer Code generator Symbol.
When you read a sentence, your mind breaks it into tokens—individual words and punctuation marks that convey meaning. Compilers also perform tokenization.
Regular Expressions. Overview Regular expressions allow you to do complex searches within text documents. Examples: Search 8-K filings for restatements.
Module 6 – Generics Module 7 – Regular Expressions.
CPS 506 Comparative Programming Languages Syntax Specification.
Practical 1-LEX Implementation
1 Lex & Yacc. 2 Compilation Process Lexical Analyzer Source Code Syntax Analyzer Symbol Table Intermed. Code Gen. Code Generator Machine Code.
ICS312 LEX Set 25. LEX Lex is a program that generates lexical analyzers Converting the source code into the symbols (tokens) is the work of the C program.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
Compiler Construction By: Muhammad Nadeem Edited By: M. Bilal Qureshi.
Strings and Related Classes String and character processing Class java.lang.String Class java.lang.StringBuffer Class java.lang.Character Class java.util.StringTokenizer.
CSE 374 Programming Concepts & Tools Hal Perkins Fall 2015 Lecture 5 – Regular Expressions, grep, Other Utilities.
1 Compiler Construction (CS-636) Muhammad Bilal Bashir UIIT, Rawalpindi.
By Mr. Muhammad Pervez Akhtar
Computer Programming 2 Lab (1) I.Fatimah Alzahrani.
CSC 110 – Intro to Computing - Programming
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
Pattern Matching: Simple Patterns. Introduction Programmers often need to scan a file, directory, etc. for a specific substring. –Find all files that.
OOP Tirgul 11. What We’ll Be Seeing Today  Regular Expressions Basics  Doing it in Java  Advanced Regular Expressions  Summary 2.
ICS611 Lex Set 3. Lex and Yacc Lex is a program that generates lexical analyzers Converting the source code into the symbols (tokens) is the work of the.
Lecture 2 Compiler Design Lexical Analysis By lecturer Noor Dhia
CS314 – Section 5 Recitation 2
REGULAR EXPRESSION Java provides the java.util.regex package for pattern matching with regular expressions. Java regular expressions are very similar.
Regular Expressions.
Looking for Patterns - Finding them with Regular Expressions
Chapter 3 Lexical Analysis.
Regular Languages.
Review: Compiler Phases:
CS 3304 Comparative Languages
Regular Expressions in Java
Regular Expressions in Java
Regular Expressions in Java
Presentation transcript:

Applications of Regular Expressions BY— NIKHIL KUMAR KATTE 1

Index What is a Regular Expression? Applications of Regular Expressions  Regular expressions in Operating systems(Unix)  Regular expressions in Compilation(Lexical analysis)  Regular expressions in Programming languages(Java) 2

Regular Expression Regular expression is defined as a sequence of characters. A regular expression, often called a pattern, it is an expression used to specify a set of strings required for a particular purpose Regular Expression operators X Y concatinationX followed by Y X | Y alterationX or Y X* Kleen closureZero or more occurrences of X X + One or more occurrences of X (X) used for grouping X 3

Examples  L(001) = {001}  L(0+10*) = { 0, 1, 10, 100, 1000, 10000, … }  L(0*10*) = {1, 01, 10, 010, 0010, …} i.e. {w | w has exactly a single 1}  L(  )* = {w | w is a string of even length}  L((0(0+1))*) = { ε, 00, 01, 0000, 0001, 0100, 0101, …}  L((0+ε)(1+ ε)) = {ε, 0, 1, 01}  L(1Ø) = Ø ; concatenating the empty set to any set yields the empty set.  Rε = R  R+Ø = R 4

Applications of Regular Expression in Unix Basically regular expressions are used in search tools Text file search in Unix (tool: egrep) This command searches for a text pattern in the file and list the file names containing that pattern. 5

Example:  Standard egrep command looks like: egrep ’ ’ Some common flags are: -c for counting the number of successful matches and not printing the actual matches -i to make the search case insensitive -n to print the line number before each match printout -v to take the complement of the regular expression (i.e. return the lines which don't match) -l to print the filenames of files with lines which match the expression. 6

Example: File create Sam Dexter John Raman By implementing egrep command : % egrep ‘n’ create John Raman % File create Dexter John Alice Raman By implementing egrep command: % egrep ‘o.*h’ create john % 7

Few more related examples egrep -c '^1|01$' lots_o_bits count the number of lines in lots_o_bits which either start with 1 or end with 01 egrep -c '10*10*10*10*10*10*10*10*10*10*1' lots_o_bits count the number of lines with at least eleven 1's egrep -i '\ ' myletter.txt list all the lines in myletter.txt containing the word insensitive of case. 8

Regular Expressions in Lexical analysis  Interaction of Lexical analyzer with parser  Source Program Lexical analyzer symbol table parser token Nexttoken() 9

Lexical Analysis The Lexical Analyzer (lexer) reads source code and generates a stream of tokens What is a token?  Identifier  Keyword  Number  Operator  Punctuation Tokens can be described using regular expressions! 10

pattern: The rule describing how a token can be formed. Ex: identifier: ([a-z]|[A-Z]) ([a-z]|[A-Z]|[0-9])* How to specify tokens: all the basic elements in a language must be tokens so that they can be recognized. Token types: constant, identifier, reserved word, operator and misc. symbol. main() { int i, j; for (I=0; I<50; I++) { printf(“I = %d”, I); } 11

Regular expressions in Java: Regular expressions are a language of string patterns built in to most modern programming languages, including Java 1.4 onwards; they can be used for: searching, extracting, and modifying text. The Java package java.util.regex contains classes for working with regexps in Java Character Classes Character classes are used to define the content of the pattern. Example, what should the pattern look for?. Dot, any character (may or may not match line terminators, read on) \d A digit: [0-9] \D A non-digit: [^0-9] \s A whitespace character: [ \t\n\x0B\f\r] \S A non-whitespace character: [^\s] \w A word character: [a-zA-Z_0-9] \W A non-word character: [^\w] However; notice that in Java, you will need to “double escape” these backslashes. String pattern = "\\d \\D \\W \\w \\S \\s"; 12

Quantifiers Quantifiers can be used to specify the number or length that part of a pattern should match or repeat. A quantifier will bind to the expression group to its immediate left. * Match 0 or more times + Match 1 or more times ? Match 1 or 0 times {n} Match exactly n times {n,} Match at least n times 13

Matching/Validating Regular expressions make it possible to find all instances of text that match a certain pattern, and return a Boolean value if the pattern is found/not found. Sample code p ublic class ValidateDemo { public static void main(String[] args) { List input = new ArrayList (); input.add(" "); input.add(" "); input.add(" (attack)"); input.add(" "); input.add(" "); for (String ssn : input) { if (ssn.matches("^(\\d{3}-?\\d{2}-?\\d{4})$")) { System.out.println("Found good SSN: " + ssn); OUTPUT: } This produces the following output: Found good SSN: Found good SSN: p ublic class ValidateDemo { public static void main(String[] args) { List input = new ArrayList (); input.add(" "); input.add(" "); input.add(" (attack)"); input.add(" "); input.add(" "); for (String ssn : input) { if (ssn.matches("^(\\d{3}-?\\d{2}-?\\d{4})$")) { System.out.println("Found good SSN: " + ssn); }

Extracting/Capturing Specific values can be selected out of a large complex body of text. These values can be used in the application. Sample Code This produces the followi public stapublic class ExtractDemo { public static void main(String[] args) { String input = "I have a cat, but I like my dog better."; Pattern p = Pattern.compile("(mouse|cat|dog|wolf|bear|human)"); Matcher m = p.matcher(input); List animals = new ArrayList (); while (m.find()) { System.out.println("Found a " + m.group() + "."); animals.add(m.group()); tic void main(String[] args) { String input = "I have a cat, but I like my dog better."; Pattern p = Pattern.compile("(mouse|cat|dog|wolf|bear|human)"); Matcher m = p.matcher(input); List animals = new ArrayList (); while (m.find()) { System.out.println("Found a " + m.group() + "."); animals.add(m.group()); This produces the following output: Found a cat Found a dog 15 public class ExtractDemo { public static void main(String[] args) { String input = "I have a cat, but I like my dog better."; Pattern p = Pattern.compile("(mouse|cat|dog|wolf|bear|human)"); Matcher m = p.matcher(input); List animals = new ArrayList (); while (m.find()) { System.out.println("Found a " + m.group() + "."); animals.add(m.group());

References  tutorial.htm tutorial.htm   &rep=rep1&type=pdf &rep=rep1&type=pdf  in-java-part-1/ in-java-part-1/ 16