Introducing Oracle Regular Expressions

Session id: 40105
Introducing Oracle Regular Expressions
Jonathan Gennick, O'Reilly & Associates
Peter Linsley, Oracle Corporation

What are Regular Expressions?
A language, or syntax, you can use to describe patterns in text
  Example: [0-9]{3}-[0-9]{4}
That which you can describe, you can find and manipulate
Unix ed, grep, perl, and now everywhere!
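As a rough illustration, the example pattern can be tried from SQL with REGEXP_SUBSTR, one of the Oracle Database 10g functions covered later in this session. The input string is made up for this sketch and does not come from the slides.

```sql
-- Sketch only: the literal input string is an assumption.
-- [0-9]{3} matches three digits, then a hyphen, then [0-9]{4} matches four digits.
SELECT REGEXP_SUBSTR('Call 555-1234 after noon', '[0-9]{3}-[0-9]{4}') AS phone
FROM dual;
-- Returns: 555-1234
```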

Why Describe Patterns?
Humans have long worked with patterns:
  Postal and email addresses
  URLs
  Phone numbers
Often it's not the data that's important, but the pattern:
  Bioinformatics
  Validate format of URLs and email addresses
  Correct formatting of phone numbers

Pre-Oracle Database 10g
Find parks with acreage in their descriptions:
    SELECT *
    FROM park
    WHERE description LIKE '%acre%';
Finds '217-acre' and '27 acres', but also 'few acres', 'more acres than all other parks', 'the location of a massacre', etc.

Pre-Oracle Database 10g cont.
Pattern matching with LIKE
  Limited to only two operators: % and _
OWA_PATTERN
  No support for alternation, ASCII only, relatively poor performance
Non-native solutions
  External procedures
    Difficult to deploy, maintain, and support
  Client-based solutions
    Pull all that data down across the network

Oracle Database 10g
Four regular expression functions:
  REGEXP_LIKE      does pattern match?
  REGEXP_INSTR     where does it match?
  REGEXP_SUBSTR    what does it match?
  REGEXP_REPLACE   replace what matched
POSIX Extended Regular Expressions
UNIX Regular Expressions
Backreference support added
Longest match not supported

REGEXP_LIKE
Determine whether a pattern exists in a string
Revisiting the acreage problem:
    SELECT *
    FROM park
    WHERE REGEXP_LIKE(description, '[0-9]+(-| )acre');
Finds '217-acre' and '27 acres'
Rejects 'few acres', 'more acres than all other parks', 'the location of a massacre', etc.

Useful for Constraints
Filter allowable data with a check constraint
Only allow alphabetical characters:
    CREATE TABLE t1 (c1 VARCHAR2(20),
      CHECK (REGEXP_LIKE(c1, '^[[:alpha:]]+$')));
    INSERT INTO t1 VALUES ('newuser');
    → 1 row created.
    INSERT INTO t1 VALUES ('newuser1');
    → ORA-02290: check constraint violated

Metacharacters
Operator   Description
.          match any character
a?         match 'a' zero or one time
a*         match 'a' zero or more times
a+         match 'a' one or more times
a|b        match either 'a' or 'b'
a{m,n}     match 'a' between m and n times
[abc]      match either 'a' or 'b' or 'c'
(abc)      match group 'abc'
\n         match nth group
[:cc:]     match character class
[.ce.]     match collation element
[=ec=]     match equivalence class
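A small sketch of how grouping and the \n backreference combine, here used to find a doubled word. The input string is invented for illustration; the pattern uses only operators from the list above.

```sql
-- Sketch only: the literal input string is an assumption.
-- ([[:alpha:]]+) captures a run of letters as group 1;
-- \1 then requires the same text to appear again after a single space.
SELECT REGEXP_SUBSTR('It is is a test', '([[:alpha:]]+) \1') AS doubled_word
FROM dual;
-- Returns: is is
```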

REGEXP_INSTR
Find out where a match occurs:
    SELECT REGEXP_INSTR(description, '[0-9]+(-| )acre')
    FROM park;
    REGEXP_INSTR(DESCRIPTION,'[0-9]+…
    ---------------------------------
    6
    20
    …

REGEXP_SUBSTR
Determine what text matched:
    SELECT REGEXP_SUBSTR(description, '[0-9]+(-| )acre')
    FROM park;
    REGEXP_SUBSTR(DESCRIPT
    ----------------------
    217-acre
    27 acre
    …

REGEXP_SUBSTR Cont.
To extract just the acreage value:
    SELECT REGEXP_SUBSTR(
             REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),
             '[0-9]+')
    FROM park;
    REGEXP_SUBSTR(REGEXP
    --------------------
    217
    27

REGEXP_REPLACE
Convert acres to hectares:
    UPDATE park
    SET description = REGEXP_REPLACE(
          description, '([0-9]+)(-| )acre',
          TO_CHAR(0.4047 * TO_NUMBER(
            REGEXP_SUBSTR(
              REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),
              '[0-9]+')))
          || '\2' || 'hectare');

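REGEXP_REPLACE Cont.
How the new description is built for one row:
  This 217-acre park is wonderful.          original description
  217-acre                                  inner REGEXP_SUBSTR: the text that matched
  217                                       outer REGEXP_SUBSTR: just the digits
  217 * 0.4047 = 87.8199                    acres converted to hectares
  87.8199\2hectare                          replacement string passed to REGEXP_REPLACE
  87.8199-hectare                           \2 becomes group 2, here the hyphen
  This 87.8199-hectare park is wonderful.   updated description
    UPDATE park
    SET description = REGEXP_REPLACE(
          description, '([0-9]+)(-| )acre',
          TO_CHAR(0.4047 * TO_NUMBER(
            REGEXP_SUBSTR(
              REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),
              '[0-9]+')))
          || '\2' || 'hectare');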

Demonstration
Oracle Regular Expressions

Performance
Pattern matching can be complex
  Need to compile to state machine
    Lex and parse
  Examine all possible branches until match found
Compiled once per statement
Can be faster than LIKE for complex scenarios
Usually faster than PL/SQL equivalent
  ZIP code checking 5 times faster
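The slide does not show the expression used for the ZIP code comparison. As a hedged sketch of what such a check might look like in a single REGEXP_LIKE call, assuming US ZIP and ZIP+4 formats and made-up sample values:

```sql
-- Sketch only: the pattern and the sample values are assumptions,
-- not the test behind the "5 times faster" figure.
-- ^[0-9]{5}       exactly five digits at the start
-- (-[0-9]{4})?$   optionally a hyphen and four more digits, then end of string
SELECT zip,
       CASE WHEN REGEXP_LIKE(zip, '^[0-9]{5}(-[0-9]{4})?$')
            THEN 'valid' ELSE 'invalid' END AS status
FROM (SELECT '30303' AS zip FROM dual
      UNION ALL SELECT '30303-1234' FROM dual
      UNION ALL SELECT '3030A' FROM dual);
-- '30303' and '30303-1234' are reported as valid, '3030A' as invalid.
```

The equivalent check written with SUBSTR, LENGTH, and character-by-character tests in PL/SQL needs several branches, which is the kind of code the slide is comparing against.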

Performance Cont.
Some poorly-performing expressions:
  'a{2}' will be slower than 'aa'
  '.*b' on input that doesn't contain a 'b' can also be quite time-consuming
Further reading: Mastering Regular Expressions by Jeffrey Friedl, Chapter 6, Crafting an Efficient Expression

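Using with Indexes
Use function-based indexes:
    CREATE INDEX acre_ind ON park (
      REGEXP_SUBSTR(
        REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),
        '[0-9]+'));
To support regular expression queries:
    SELECT *
    FROM park
    WHERE REGEXP_SUBSTR(
            REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),
            '[0-9]+') = '217';
The indexed expression returns a string, so compare it with the string '217'. Comparing with the number 217 would force an implicit TO_NUMBER around the expression, which no longer matches the index.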

Using with Views
Hide the complexity from users:
    CREATE VIEW park_acreage AS
    SELECT park_name,
           REGEXP_SUBSTR(
             REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),
             '[0-9]+') acreage
    FROM park;

Using with PL/SQL
REGEXP_LIKE acts as a Boolean function in PL/SQL:
    IF REGEXP_LIKE(description, '[0-9]+(-| )acre') THEN
      acres := REGEXP_SUBSTR(
                 REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),
                 '[0-9]+');
    ...
All other functions act identically in PL/SQL and SQL.
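To try the fragment above on its own, it can be wrapped in an anonymous block. This is a minimal sketch: the variable names, the sample description, and the DBMS_OUTPUT call are assumptions added for illustration.

```sql
-- Sketch only: v_description, v_acres, and the sample text are hypothetical.
-- Run with SET SERVEROUTPUT ON to see the output line.
DECLARE
  v_description VARCHAR2(200) := 'This 217-acre park is wonderful.';
  v_acres       NUMBER;
BEGIN
  -- REGEXP_LIKE returns a Boolean here, so it can drive the IF directly.
  IF REGEXP_LIKE(v_description, '[0-9]+(-| )acre') THEN
    -- Inner call isolates '217-acre'; outer call keeps only the digits.
    v_acres := TO_NUMBER(
                 REGEXP_SUBSTR(
                   REGEXP_SUBSTR(v_description, '[0-9]+(-| )acre'),
                   '[0-9]+'));
    DBMS_OUTPUT.PUT_LINE('Acreage: ' || v_acres);
  END IF;
END;
/
-- Prints: Acreage: 217
```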

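Longest Match vs Greediness
Greediness = each element matches as much as possible. For example:
    SELECT REGEXP_SUBSTR('In the beginning', '.+[[:space:]]')
    FROM dual;
    → 'In the ' (including the trailing space)
The .+ takes as many characters as it can while still leaving one space for [[:space:]] to match, so the match runs to the last space rather than the first.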

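Longest Match vs Greediness Cont.
Longest match = find the variations resulting in the greatest number of matching characters.
Oracle does not apply the longest-match rule; alternatives are tried from left to right and the first one that succeeds wins:
    SELECT REGEXP_SUBSTR('bbb', 'b|bb') FROM dual;
    → b
    SELECT REGEXP_SUBSTR('bbb', 'bb|b') FROM dual;
    → bb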

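Optional Parameters
All but REGEXP_LIKE take optional parameters for starting position and occurrence:
    REGEXP_INSTR  (source, pattern, start, occurrence, return_option, match)
    REGEXP_SUBSTR (source, pattern, start, occurrence, match)
    REGEXP_REPLACE(source, pattern, replace, start, occurrence, match)
REGEXP_INSTR also takes return_option: 0 returns the position where the match begins, 1 the position just after it ends.
For example, to get the tenth word of a description:
    REGEXP_SUBSTR(description, '[^[:space:]]+', 1, 10)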
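A short sketch of the start and occurrence parameters across the three functions, using a made-up three-word string so the results are easy to check by eye:

```sql
-- Sketch only: the literal input string is an assumption.
-- '[^[:space:]]+' matches one run of non-space characters, i.e. one word.
SELECT REGEXP_SUBSTR ('one two three', '[^[:space:]]+', 1, 2)      AS second_word,
       REGEXP_INSTR  ('one two three', '[^[:space:]]+', 1, 3)      AS third_word_pos,
       REGEXP_REPLACE('one two three', '[^[:space:]]+', 'X', 1, 2) AS second_replaced
FROM dual;
-- second_word     : two
-- third_word_pos  : 9
-- second_replaced : one X three
```

In each call the search starts at position 1. The occurrence argument picks the second or third match, and for REGEXP_REPLACE it limits the replacement to that one occurrence.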

Match Parameter
All functions take an optional match parameter:
  Is matching case sensitive?
  Does period (.) match newlines?
  Is the source string one line or many?
The match parameter comes last

Case-sensitivity
Case-insensitive search:
    SELECT *
    FROM park
    WHERE REGEXP_LIKE(description, '[0-9]+(-| )acre', 'i');

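Newline matching
    INSERT INTO park VALUES ('Park 6', '640' || CHR(10) || 'ACRE');
    SELECT *
    FROM park
    WHERE REGEXP_LIKE(description, '[0-9]+.acre', 'in');
The 'n' lets the period match the newline between '640' and 'ACRE'; the 'i' makes 'acre' match 'ACRE'.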

String anchors
    INSERT INTO employee (surname)
    VALUES ('Ellison' || CHR(10) || 'Gennick');
Does the row match?
    SELECT * FROM employee WHERE REGEXP_LIKE(surname, '^Ellison');
    → Yes!
    SELECT * FROM employee WHERE REGEXP_LIKE(surname, '^Gennick');
    → No! By default, ^ anchors only at the start of the whole string.
    SELECT * FROM employee WHERE REGEXP_LIKE(surname, '^Gennick', 'm');
    → Yes! With 'm', the source is treated as multiple lines, so ^ also matches just after the newline.

Locale Support
Full locale support
  All character sets
  All languages
Case and accent insensitive searching
Linguistic range
Character classes
Collation elements
Equivalence classes

Character Sets and Languages
For example, you can search for Ukrainian names beginning with Ґ and ending with к:
    SELECT *
    FROM employee
    WHERE REGEXP_LIKE(surname, '^Ґ[[:alpha:]]*к$', 'n');

Case- and Accent-Insensitive Searching
Respect for NLS settings:
    ALTER SESSION SET NLS_SORT = GENERIC_BASELETTER;
With this sort, case won't matter and an expression such as:
    REGEXP_INSTR(x, 'resume')
will find "resume", "résumé", "Résume", etc.

Linguistic Range
Ranges respect NLS_SORT settings:
  NLS_SORT=GERMAN      [a-z] matches a,b,c…z
  NLS_SORT=GERMAN_CI   [a-z] matches a,A,b,B,c,C…z,Z

Character Classes
Character classes such as [:alpha:] and [:digit:] encompass more than just Latin characters.
For example, [:digit:] matches:
  Latin 0 through 9
  Arabic-Indic ٠ through ٩
  And more

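Collation Elements
    ALTER SESSION SET NLS_SORT=XSPANISH;
    SELECT REGEXP_SUBSTR(
             'El caballo, Chico come la tortilla.',
             '[[:alpha:]]*[ch][[:alpha:]]*', 1, 1, 'i')
    FROM dual;
    → caballo
The bracket expression [ch] matches a single 'c' or 'h', so the first word containing either letter is returned.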

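Collation Elements Cont.
    ALTER SESSION SET NLS_SORT=XSPANISH;
    SELECT REGEXP_SUBSTR(
             'El caballo, Chico come la tortilla.',
             '[[:alpha:]]*[[.ch.]][[:alpha:]]*', 1, 1, 'i')
    FROM dual;
    → Chico
Under XSPANISH, 'ch' is a single collation element; [[.ch.]] matches only that two-letter unit, so 'caballo' is skipped.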

Equivalence Classes
Ignore case and accents without changing NLS_SORT:
    REGEXP_INSTR(x, 'r[[=e=]]sum[[=e=]]')
Finds 'resume', 'résumé', and 'rEsumE'

Conclusion
String searching and manipulation is at the heart of a great many applications
Oracle Regular Expressions provide versatile string manipulation in the database instead of externalized in middle-tier logic
They are locale sensitive and support character large objects
Available in both SQL and PL/SQL

Next Steps
Recommended sessions
  Session #40088 New SQL Capabilities
  Session #40202 Oracle HTML DB
Recommended demos and/or hands-on labs
  Database Globalization Pod R
See Your Business in Our Software
  Visit the DEMOgrounds for a customized architectural review, see a customized demo with Solutions Factory, or receive a personalized proposal.
Relevant web sites to visit for more information
  http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap09.html

Shameless Plug
Oracle Regular Expressions Pocket Reference
Jonathan Gennick & Peter Linsley
Free! At the O'Reilly & Associates booth