
1 A pair of sometimes useful functions
Function ord returns a character's ordinal / character code (Unicode).
Function chr returns the character with the given character code.
>>> ord('ff')
Traceback (most recent call last):
  File " ", line 1, in ?
TypeError: ord() expected a character, but string of length 2 found
>>> ord('f')
102
>>> ord('.')
46
>>> chr(46)
'.'
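A small sketch of how ord and chr undo each other: a string is turned into its list of character codes and back again. The helper names to_codes and from_codes are hypothetical, chosen only for this illustration.

```python
def to_codes(text):
    """Return the Unicode character code of every character in text."""
    return [ord(ch) for ch in text]


def from_codes(codes):
    """Rebuild a string from character codes; chr is the inverse of ord."""
    return "".join(chr(code) for code in codes)


codes = to_codes("f.")
print(codes)              # [102, 46]
print(from_codes(codes))  # f.
```

Because chr(ord(c)) == c for any single character, the round trip also works for non-ASCII letters such as the Danish æ, ø and å.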

2 Danish Intelligence Agency Memo
Concerning: incident where activists threw red paint at Prime Minister Anders Fogh Rasmussen
Task: improve electronic surveillance to avoid such incidents in the future
String searching using find

3 surveillance.py
Magic: red text
Find the index of the first occurrence of word, starting at startindex
Print the substring around the suspicious word without running past either end of the string
Strings are immutable: manipulation methods return new strings
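The surveillance code itself is not reproduced in this transcript, so here is a minimal sketch of the same idea, assuming a text counts as suspicious only when every word in the list occurs (as the "Not all words found" output suggests). The function names, the 25-character radius, and the omission of the red-text coloring are all assumptions for illustration.

```python
def find_all(text, word):
    """Return the start index of every occurrence of word in text.

    str.find(word, start) returns the index of the first occurrence at or
    after start, or -1 when there is none.
    """
    positions = []
    index = text.find(word)
    while index != -1:
        positions.append(index)
        index = text.find(word, index + 1)  # resume just after the last hit
    return positions


def context(text, index, radius=25):
    """Return the text around index, clamped to the bounds of the string."""
    start = max(0, index - radius)  # a negative start would wrap around
    end = min(len(text), index + radius)
    return text[start:end]


def check_text(text, words):
    """Hypothetical check: the text is suspicious only if every word occurs."""
    hits = {word: find_all(text, word) for word in words}
    if not all(hits.values()):
        print("Not all words found, text okay")
        return False
    print("All words found, text is suspicious!")
    for word, positions in hits.items():
        for index in positions:
            print(context(text, index))
    return True
```

Since find matches substrings, find_all("keep attacking", "attack") reports a hit at index 5: words that merely contain a suspicious word are found too.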

4 surveillancetest.py

5 surveillancetest.py, output
Not all words found, text okay
All words found, text is suspicious!
ting by Douglas Coupland was sold for a slight change of plans, the prime minister atten
All words found, text is suspicious!
fice. He hides the paint behind a plant. Tuesday m him and throw the paint. They keep attacking him, George and Ringo attack him and throw the paint. e paint. They keep attacking him until they're arr y're arrested. The attack should take place at 10a Here's the plan: Paul breaks into Christia the paint behind a plant. Tuesday morning before t
We find words containing a suspicious word: may be important

6 Parents Music Resource Center
Concerning: crude language in much of today's music
Task: implement censorship to remove bad words
More string methods: splitlines, join, replace

7 censorship.py
Split the text into a list of lines
In each line, replace each bad word with BEEP
If any words were BEEPed, print the line and play one beep per word
Join the censored lines with newlines and return the full text
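A minimal sketch of the censorship steps, using only the string methods named on the previous slide (splitlines, replace, join). The word list passed in is hypothetical, and the beep sound is left out; only the count is printed.

```python
def censor(text, bad_words):
    """Replace each bad word with BEEP, line by line, and report the count.

    The slide's version also plays one beep per replaced word; that sound
    effect is left out of this sketch.
    """
    censored_lines = []
    total = 0
    for line in text.splitlines():             # split text into a list of lines
        beeped = 0
        for word in bad_words:
            beeped += line.count(word)         # occurrences about to be replaced
            line = line.replace(word, "BEEP")  # replace returns a new string
        if beeped:
            print(line)                        # show lines that were changed
        total += beeped
        censored_lines.append(line)
    print("Beeped words:", total)
    return "\n".join(censored_lines)           # rejoin the censored lines


print(censor("darn it\nall fine", ["darn"]))
```

Plain substring replacement also hits longer words (with the hypothetical list ["darn"], "darned" becomes "BEEPed"), which is the same effect as "pBEEPing" in the test output.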

8 Celine Dion: With each moment, moment pBEEPing by Beeped words: 1 Crime Mob: Ol' stankin BEEP (Hoe) Jank BEEP (Hoe) Suck my BEEP you (Hoe) Ol' fat BEEP (Hoe) But aiight! We finna get these lame BEEP niggaz You see a hoe BEEP nigga, call his BEEP out. Aye! Aye! Stomp his BEEP like (Hoe) Ol' lame BEEP (Hoe) I'ma tell you how it is nigga you betta get the BEEP back cause a nigga like me don't give a BEEP A nigga suppose to gon leave yo BEEP choked You sound like a BEEP yo BEEP I'ma hit we don't give a BEEP cause you is a lame One hitter quitter yo BEEP get popped Back the BEEP up 'fore I show you who reala Whats up wit ya BEEP nigga Ol' sucka BEEP, busta BEEP, cryin to yo momma BEEP I'ma keep up drama I'm a muthaBEEPin plum BEEP See you just a dumb BEEP go on wit yo young BEEP Try me like a sucka but I know you just a lame BEEP In my section they glad to see a nigga that don't give a BEEP Stomp you to the floor and tell you get yo pussy BEEP up Pick that nigga BEEP up, tear his lame BEEP up Niggaz representin Ellenwood time to mBEEP up Throwin blows like Johnny Cage, you think you wanna BEEP wit me Do this BEEP like Pastor Troy Uuh Huh I'm outside hoe Take my BEEPin word I ain't got no reason to lie hoe Beeped words: 34 Program tested on two songs by Celine Dion and Crime Mob : We find words containing a suspicious word: not desirable here. See exercise.

9 Regular Expressions – Motivation
Problem: search suspicious text for any Danish
text1 = "No Danish here fj3a"
text2 = "But here: what a el ds"
text3 = "And here -
Cumbersome using ordinary string methods.

10 Text2 contains this Danish address: RegExp solution (to be explained later)

11 Regular Expressions
Regular expressions provide a more efficient and powerful alternative to string search methods.
Instead of searching for a specific string, we can search for a text pattern.
– We don't have to search explicitly for 'Monday', 'Tuesday', 'Wednesday', ...: there is a pattern in these search strings.
– A regular expression is a text pattern.
In Python, regular expression processing capabilities are provided by the module re.
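To make the weekday idea concrete, here is a sketch of one pattern that covers all seven names. It uses alternation (|) and grouping, which these slides have not introduced yet; the pattern and the helper name find_weekdays are illustrative assumptions.

```python
import re

# A day-specific prefix followed by "day" covers all seven weekday names.
WEEKDAY = re.compile(r"(Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day")


def find_weekdays(text):
    """Return every weekday name found in text, in order of appearance."""
    return [m.group() for m in WEEKDAY.finditer(text)]


print(find_weekdays("Meet on Tuesday; the backup date is Friday."))
# ['Tuesday', 'Friday']
```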

12 Example
Simple regular expression: regExp = "football" matches only the string "football".
To search a text for regExp, we can use re.search(regExp, text).
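A minimal sketch of calling re.search directly, without compiling first. It returns a match object when the pattern occurs somewhere in the text and None otherwise; the second sample sentence is made up for illustration.

```python
import re

text = "Here are the football results: Bosnia - Denmark 0-7"

match = re.search("football", text)  # a match object, or None if not found
if match:
    print("Found at index", match.start())  # Found at index 13

print(re.search("football", "No sports news today"))  # None
```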

13 Compiling Regular Expressions
re.search(regExp, text) does three things:
1. Compile regExp to a special format (an SRE_Pattern object)
2. Search for this SRE_Pattern in text
3. Return the result, an SRE_Match object (or None if there is no match)
If we need to search for regExp several times, it is more efficient to compile it once and for all:
compiledRE = re.compile(regExp)
1. Now compiledRE is an SRE_Pattern object
compiledRE.search(text)
2. Use the search method of this SRE_Pattern to search text
3. The result is the same kind of SRE_Match object
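A sketch of the compile-once pattern: one compiled object is reused to search several texts. In recent Python 3 versions the SRE_Pattern and SRE_Match types are exposed as re.Pattern and re.Match. The third sample text is made up for illustration.

```python
import re

compiledRE = re.compile("football")  # compile once...

texts = [
    "Here are the football results: Bosnia - Denmark 0-7",
    "We will now give a complete list of python keywords.",
    "football, football and more football",
]

# ...then reuse the same pattern object for every search.
results = [compiledRE.search(t) is not None for t in texts]
print(results)                             # [True, False, True]
print(isinstance(compiledRE, re.Pattern))  # True
```

The re documentation notes that the module also caches recently used patterns, so for a handful of expressions the speed gain from an explicit compile is often small. Compiling still keeps the pattern in one named place.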

14 Searching for ‘football’

import re

text1 = "Here are the football results: Bosnia - Denmark 0-7"
text2 = "We will now give a complete list of python keywords."

regularExpression = "football"
# Compile the regular expression and get the SRE_Pattern object
compiledRE = re.compile(regularExpression)

# Use the same SRE_Pattern object to search both texts and get two
# SRE_Match objects (or None if the search was unsuccessful)
SRE_Match1 = compiledRE.search(text1)
SRE_Match2 = compiledRE.search(text2)

if SRE_Match1:
    print("Text1 contains the substring 'football'")
if SRE_Match2:
    print("Text2 contains the substring 'football'")

Output:
Text1 contains the substring 'football'

15 Building more sophisticated patterns
Metacharacters:
? : matches zero or one occurrence of the expression it follows
+ : matches one or more occurrences of the expression it follows
* : matches zero or more occurrences of the expression it follows

# search for zero or one t, followed by two a's:
regExp1 = "t?aa"
# search for g followed by one or more c's followed by one a:
regExp2 = "gc+a"
# search for ct followed by zero or more g's followed by one a:
regExp3 = "ctg*a"
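A sketch that checks what each metacharacter accepts. It uses re.fullmatch, so the whole sample string must match the pattern. The sample strings and the helper name matches are chosen for illustration.

```python
import re


def matches(pattern, s):
    """True if the whole string s matches pattern (not just a substring)."""
    return re.fullmatch(pattern, s) is not None


regExp1 = "t?aa"   # zero or one t, then two a's
regExp2 = "gc+a"   # g, one or more c's, then one a
regExp3 = "ctg*a"  # ct, zero or more g's, then one a

print(matches(regExp1, "aa"), matches(regExp1, "taa"), matches(regExp1, "ttaa"))
# True True False
print(matches(regExp2, "gca"), matches(regExp2, "gccca"), matches(regExp2, "ga"))
# True True False
print(matches(regExp3, "cta"), matches(regExp3, "ctggga"), matches(regExp3, "ctgxa"))
# True True False

# search, unlike fullmatch, looks for the pattern anywhere in the text:
print(re.search(regExp1, "ttaa") is not None)  # True: "taa" occurs inside "ttaa"
```

The last line shows why the distinction matters. "ttaa" does not match t?aa as a whole, but a search still succeeds because the substring "taa" does.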

16
Text contains the regular expression t?aa
Text contains the regular expression gc+a
Text contains the regular expression ctg*a
Use the SRE_Pattern objects to search the text and get SRE_Match objects
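The code behind slide 16 is not in the transcript. This sketch produces the same kind of output, assuming it loops over the three expressions, compiles each one and searches a single text. The sample DNA-like string and the function name report are hypothetical, not taken from the original slide.

```python
import re


def report(text, patterns):
    """Print and return every pattern that occurs somewhere in text."""
    found = []
    for regExp in patterns:
        compiledRE = re.compile(regExp)      # one SRE_Pattern per expression
        SRE_Match = compiledRE.search(text)  # match object, or None
        if SRE_Match:
            print("Text contains the regular expression", regExp)
            found.append(regExp)
    return found


# Hypothetical sample text, not the one used on the original slide.
sample = "ttgccatactgggatcaa"
report(sample, ["t?aa", "gc+a", "ctg*a"])
```

With this sample all three expressions are found, through "aa", "gcca" and "ctggga" respectively.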