CSC1018F: Regular Expressions


CSC1018F: Regular Expressions
Diving into Python Ch. 7
Number Systems

Lecture Outline
  Recap of OO Python [week 3]
  Regular Expressions
    Standard
    Verbose
  Number Systems
    Binary, decimal, hexadecimal

Recap of OO Python
  Object Orientation:
    Module importing
    Defining, initializing and instantiating classes
    Class attributes
    Class methods
  Exceptions
  File Handling:
    Opening, reading, writing and closing

Intro to Regular Expressions
  Regular expressions are a powerful means of parsing text to identify complex patterns of characters
  Standard string methods (find, replace, split) can be insufficient in complex cases
  But regular expressions can be complicated and difficult to read, so avoid them if string methods will do the job
  Read regular expressions from left to right
  Usage:
    import re                            # regular expression functionality is in the re module
    re.sub(regexpr, repstr, inputstr)    # typical search & replace
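A minimal sketch of why a string method can fall short where re.sub copes. The address value is an illustrative assumption, not taken from the slides:

```python
import re

addr = "100 NORTH BROAD ROAD"  # illustrative input

# str.replace rewrites every occurrence, including the "ROAD" inside "BROAD".
naive = addr.replace("ROAD", "RD.")

# re.sub can insist that ROAD is a whole word at the end of the string.
fixed = re.sub(r"\bROAD$", "RD.", addr)

print(naive)  # 100 NORTH BRD. RD.
print(fixed)  # 100 NORTH BROAD RD.
```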

Format of Regular Expressions
  Syntax:
    $ - end of string marker
    ^ - start of string marker
    \b - word boundary marker (to avoid backslash escapes, use a raw string: r"stringcontents")
    ? - makes the preceding character optional (it matches zero or one time)
    (A|B|C) - indicates mutually exclusive options A, B and C
  Examples:
    re.sub(r"\bROAD$", "RD.", addr)
      addr: 60 BROAD ROAD -> 60 BROAD RD.
    re.search(r"^(a|b|c) -", question)
      question: a - how are you? -> <SRE_Match object …>
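The markers above can be tried out one at a time. This is a small sketch; the sample strings other than the question are made up for illustration:

```python
import re

# ^ and $ anchor the pattern to the start and end of the string.
anchored_miss = re.search(r"^ROAD$", "BROAD") is None

# \b matches only at a word boundary, so the ROAD inside BROAD is left alone.
boundary = re.sub(r"\bROAD\b", "RD.", "BROAD ROAD")

# ? makes the preceding character optional (zero or one occurrence).
optional = [bool(re.search(r"^colou?r$", w)) for w in ("color", "colour", "colouur")]

# (a|b|c) accepts exactly one of the listed alternatives.
choice = re.search(r"^(a|b|c) -", "a - how are you?").group(1)
miss = re.search(r"^(a|b|c) -", "d - fine, thanks")

print(anchored_miss)  # True
print(boundary)       # BROAD RD.
print(optional)       # [True, True, False]
print(choice, miss)   # a None
```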

Further Syntax
  P{n,m} syntax:
    Deals with repeating patterns
    Read as: pattern P appears at least n times but no more than m times
  More syntax:
    \d - any numeric digit
    \D - any character except a numeric digit
    + - 1 or more
    * - 0 or more
    ( ) - to indicate groups
  Examples:
    >>> phPat = re.compile(r"^(\d{3})\D*(\d{7})$")
    >>> phPat.search("021 6504058").groups()
    ('021', '6504058')
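A short sketch exercising the repetition and grouping syntax. The inputs other than "021 6504058" are illustrative assumptions:

```python
import re

# {n,m}: between n and m repetitions of the preceding pattern.
repeat = [bool(re.search(r"^a{2,3}$", s)) for s in ("a", "aa", "aaa", "aaaa")]

# Three digits, any run of non-digits, then seven digits; groups capture both numbers.
ph_pat = re.compile(r"^(\d{3})\D*(\d{7})$")

results = {}
for text in ("021 6504058", "021-6504058", "0216504058", "21 6504058"):
    m = ph_pat.search(text)
    results[text] = m.groups() if m else None

print(repeat)  # [False, True, True, False]
for text, groups in results.items():
    print(text, "->", groups)
```

The last input fails because it starts with only two digits, so `\d{3}` cannot match at the beginning of the string.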

Verbose Regular Expressions
  So far we have only seen compact regular expressions
  To aid readability we would like to include comments and spaces
  Use re.VERBOSE as the last argument to re functions
    Whitespace is ignored
    Comments (# commentstr) are ignored
  Example:
    pattern = """
        ^    # beginning of string
        $    # end of string
        """
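As an illustration, here is a phone-number pattern (three digits, an optional separator, seven digits) written in verbose form. The name phone_pattern and the choice of pattern are assumptions made for this sketch:

```python
import re

phone_pattern = re.compile(r"""
    ^          # beginning of string
    (\d{3})    # area code: exactly three digits
    \D*        # optional separator: any run of non-digits
    (\d{7})    # subscriber number: exactly seven digits
    $          # end of string
    """, re.VERBOSE)

groups = phone_pattern.search("021 6504058").groups()
print(groups)  # ('021', '6504058')
```

Because whitespace is ignored in verbose mode, a literal space in the pattern has to be written explicitly, for example as `\ ` or `[ ]`.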

Case Study
  Counting 1-10 in Roman numerals
  Additive and subtractive combination of I (=1), V (=5), X (=10)
  Can have at most 3 of a particular numeral in a row
    >>> roman = r"^(I?X|IV|V?I{0,3})$"
    >>> re.search(roman, "X")
    <_sre.SRE_Match object at 0x1e55be0>
    >>> re.search(roman, "VIII")
    <_sre.SRE_Match object at 0x1e55ba0>
    >>> re.search(roman, "")
    <_sre.SRE_Match object at 0x1e55ce0>
    >>> re.search(roman, "IIII") == None
    True
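A quick check of the case-study pattern against all ten numerals and a few invalid strings. The stricter variant at the end is one possible fix for the empty-string match and is an assumption of this sketch, not something from the slides:

```python
import re

roman = r"^(I?X|IV|V?I{0,3})$"

numerals = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX", "X"]
invalid = ["IIII", "VV", "IIX", "XI", "VX"]

all_valid_match = all(re.search(roman, n) for n in numerals)
none_invalid_match = not any(re.search(roman, n) for n in invalid)

# Every part of the last alternative is optional, so "" also matches.
empty_matches = re.search(roman, "") is not None

# Hypothetical stricter variant: a negative lookahead rejects the empty string.
roman_strict = r"^(?!$)(I?X|IV|V?I{0,3})$"
strict_rejects_empty = re.search(roman_strict, "") is None

print(all_valid_match, none_invalid_match, empty_matches, strict_rejects_empty)
# True True True True
```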

Number Systems
  Decimal (base 10)
    Digits (0-9)
    Each place represents a power of ten
    172 = 2*10^0 + 7*10^1 + 1*10^2 = 172
  Binary (base 2)
    Digits (0,1)
    Each place represents a power of two
    10011 = 1*2^0 + 1*2^1 + 0*2^2 + 0*2^3 + 1*2^4 = 19
  Hexadecimal (base 16)
    Digits (0-9, A-F)
    A-F represent 10-15
    Each place represents a power of sixteen
    E.g., F7A = 10*16^0 + 7*16^1 + 15*16^2 = 3962
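The place-value sums above can be computed directly. The helper positional_value is a hypothetical name for this sketch; Python's built-in int(text, base) does the same job:

```python
DIGITS = "0123456789ABCDEF"

def positional_value(numeral, base):
    """Sum digit * base**place, with place 0 at the rightmost digit."""
    total = 0
    for place, digit in enumerate(reversed(numeral.upper())):
        value = DIGITS.index(digit)  # raises ValueError for unknown characters
        if value >= base:
            raise ValueError(f"digit {digit!r} is not valid in base {base}")
        total += value * base ** place
    return total

print(positional_value("172", 10))    # 172
print(positional_value("10011", 2))   # 19
print(positional_value("F7A", 16))    # 3962
print(int("F7A", 16))                 # 3962, using the built-in conversion
```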

Conversion
  Decimal to others
    Repeatedly divide the number by the base and populate places from right to left with the remainders
    E.g. Dec2Bin:
      50 / 2 = 25  [% = 0]
      25 / 2 = 12  [% = 1]
      12 / 2 = 6   [% = 0]
      6 / 2 = 3    [% = 0]
      3 / 2 = 1    [% = 1]
      1 / 2 = 0    [% = 1]
      Result: 110010
  Bin2Hex:
    Collect binary digits into groups of four and convert
    E.g., 111000011111 = 1110 0001 1111 = E1F
  Hex2Bin:
    Hexadecimal digits convert into groups of four binary digits
    E.g., A7C = 1010 0111 1100 = 101001111100
  Hex is used because:
    It is easy to convert to and from binary
    It offers a more compact representation
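The three conversions can be sketched as follows. The function names dec_to_base, bin_to_hex and hex_to_bin are hypothetical and chosen only for this example:

```python
DIGITS = "0123456789ABCDEF"

def dec_to_base(number, base):
    """Repeated division: each remainder fills the next place, right to left."""
    if number == 0:
        return "0"
    places = []
    while number > 0:
        number, remainder = divmod(number, base)
        places.append(DIGITS[remainder])
    return "".join(reversed(places))

def bin_to_hex(bits):
    """Pad to a multiple of four bits, then convert each group of four."""
    width = -(-len(bits) // 4) * 4  # round the length up to a multiple of 4
    padded = bits.zfill(width)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(DIGITS[int(group, 2)] for group in groups)

def hex_to_bin(hex_digits):
    """Each hexadecimal digit expands to exactly four binary digits."""
    return "".join(format(int(digit, 16), "04b") for digit in hex_digits)

print(dec_to_base(50, 2))          # 110010
print(bin_to_hex("111000011111"))  # E1F
print(hex_to_bin("A7C"))           # 101001111100
```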

Revision Exercise
  Create a function which will take a date string in any one of the following formats:
    dd/mm/yyyy or dd/mm/yy
      Other separators (e.g., '\', ' ', '-') are also allowed
      Single-figure entries may have the form x or 0x, e.g. 3/4/5 or 03/04/05
    dd month yy or yyyy
      where month may be written in full (December) or abbreviated (Dec. or Dec)
  And return it in the format:
    dd month(in full) yyyy, e.g. 13 March 2006
  Implement this using regular expressions and also implement range checking on dates