Download presentation
Presentation is loading. Please wait.
2
Introducing Oracle Regular Expressions
Session id: 40105 Introducing Oracle Regular Expressions Jonathan Gennick, O'Reilly & Associates Peter Linsley, Oracle Corporation
3
What are Regular Expressions?
A language, or syntax, you can use to describe patterns in text Example: [0-9]{3}-[0-9]{4} That which you can describe, you can find and manipulate Unix ed, grep, perl, and now everywhere!
4
Why Describe Patterns? Humans have long worked with patterns:
Postal and addresses URLs Phone numbers Often it’s not the data that’s important, but the pattern: Bioinformatics Validate format of URLs and addresses Correct formatting of phone numbers Would be nice to have a more specific statement regarding bioinformatics.
5
Pre-Oracle Database 10g Find parks with acreage in their descriptions:
SELECT * FROM park WHERE description LIKE '%acre%'; Finds '217-acre' and '27 acres', but also ‘few acres’, ‘more acres than all other parks’, 'the location of a massacre', etc.
6
Pre-Oracle Database 10g cont.
Pattern matching with LIKE Limited to only two operators: % and _ OWA_PATTERN No support for alternation, ASCII only, relatively poor performance Non-native solutions External Procedures Difficult to deploy, maintain, and support Client based solutions Pull all that data down across the network
7
Oracle Database 10g Four regular expression functions
REGEXP_LIKE does pattern match? REGEXP_INSTR where does it match? REGEXP_SUBSTR what does it match? REGEXP_REPLACE replace what matched. POSIX Extended Regular Expressions UNIX Regular Expressions Backreference support added Longest match not supported
8
REGEXP_LIKE Determine whether a pattern exists in a string
Revisiting the acreage problem: SELECT * FROM park WHERE REGEXP_LIKE(description, '[0-9]+(-| )acre'); Finds '217-acre' and '27 acres' REJECTS ‘few acres’, ‘more acres than all other parks’, 'the location of a massacre', etc.
9
Useful for Constraints
Filter allowable data with check constraint Only allow alphabetical characters: CREATE TABLE t1 (c1 VARCHAR2(20), CHECK (REGEXP_LIKE(c1, '^[[:alpha:]]+$'))); INSERT INTO t1 VALUES ('newuser'); 1 row created. INSERT INTO t1 VALUES ('newuser1'); ORA-02290: check constraint violated
10
Metacharacters Operator Description . match any character
a? match 'a' zero or one time a* match 'a' zero or more times a+ match 'a' one or more times a|b match either 'a' or 'b' a{m,n} match 'a' between m and n times [abc] match either 'a' or 'b' or 'c' (abc) match group 'abc' \n match nth group [:cc:] match character class [.ce.] match collation element [=ec=] match equivalence class
11
REGEXP_INSTR Find out where a match occurs:
SELECT REGEXP_INSTR(description, '[0-9]+(-| )acre') FROM park; REGEXP_INSTR(DESCRIPTION,'[0-9]+… 6 20 …
12
REGEXP_SUBSTR Determine what text matched:
SELECT REGEXP_SUBSTR(description, '[0-9]+(-| )acre') FROM park; REGEXP_SUBSTR(DESCRIPT 217-acre 27 acre …
13
REGEXP_SUBSTR Cont To extract just the acreage value:
SELECT REGEXP_SUBSTR( REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),'[0-9]+') FROM park; REGEXP_SUBSTR(REGEXP 217 27
14
REGEXP_REPLACE Convert acres to hectares: UPDATE park
SET description = REGEXP_REPLACE( description,'([0-9]+)(-| )acre', TO_CHAR( * TO_NUMBER( REGEXP_SUBSTR( REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),'[0-9]+'))) || '\2' || 'hectare');
15
REGEXP_REPLACE Cont. Convert acres to hectares: UPDATE park
SET description = REGEXP_REPLACE( description,'([0-9]+)(-| )acre', TO_CHAR( * TO_NUMBER( REGEXP_SUBSTR( REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),'[0-9]+'))) || '\2' || 'hectare');
16
REGEXP_REPLACE Cont. This 217-acre park is wonderful. UPDATE park
SET description = REGEXP_REPLACE( description,'([0-9]+)(-| )acre', TO_CHAR( * TO_NUMBER( REGEXP_SUBSTR( REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),'[0-9]+'))) || '\2' || 'hectare');
17
REGEXP_REPLACE Cont. This 217-acre park is wonderful. 217-acre
UPDATE park SET description = REGEXP_REPLACE( description,'([0-9]+)(-| )acre', TO_CHAR( * TO_NUMBER( REGEXP_SUBSTR( REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),'[0-9]+'))) || '\2' || 'hectare');
18
REGEXP_REPLACE Cont. This 217-acre park is wonderful. 217-acre 217
UPDATE park SET description = REGEXP_REPLACE( description,'([0-9]+)(-| )acre', TO_CHAR( * TO_NUMBER( REGEXP_SUBSTR( REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),'[0-9]+'))) || '\2' || 'hectare');
19
REGEXP_REPLACE Cont. This 217-acre park is wonderful. 217-acre 217
217 * = UPDATE park SET description = REGEXP_REPLACE( description,'([0-9]+)(-| )acre', TO_CHAR( * TO_NUMBER( REGEXP_SUBSTR( REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),'[0-9]+'))) || '\2' || 'hectare');
20
REGEXP_REPLACE Cont. This 217-acre park is wonderful. 217-acre 217
217 * = \2hectare UPDATE park SET description = REGEXP_REPLACE( description,'([0-9]+)(-| )acre', TO_CHAR( * TO_NUMBER( REGEXP_SUBSTR( REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),'[0-9]+'))) || '\2' || 'hectare');
21
REGEXP_REPLACE Cont. 1 2 This 217-acre park is wonderful. 217-acre 217
217 * = \2hectare hectare 1 2 UPDATE park SET description = REGEXP_REPLACE( description,'([0-9]+)(-| )acre', TO_CHAR( * TO_NUMBER( REGEXP_SUBSTR( REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),'[0-9]+'))) || '\2' || 'hectare');
22
REGEXP_REPLACE Cont. This 217-acre park is wonderful. 217-acre 217
217 * = \2hectare hectare This hectare park is wonderful. UPDATE park SET description = REGEXP_REPLACE( description,'([0-9]+)(-| )acre', TO_CHAR( * TO_NUMBER( REGEXP_SUBSTR( REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),'[0-9]+'))) || '\2' || 'hectare');
23
Oracle Regular Expressions
D E M O N S T R A T I O N Oracle Regular Expressions
24
Performance Pattern matching can be complex
Need to compile to state machine Lex and parse Examine all possible branches until match found Compiled once per statement Can be faster than LIKE for complex scenarios Usually faster than PL/SQL equivalent ZIP code checking 5 times faster
25
Performance Cont. Some poorly-performing expressions:
'a{2}' will be slower than 'aa' '.*b' on input that doesn't contain a 'b' can also be quite time-consuming Mastering Regular Expressions By Jeffrey Friedl Chapter 6, Crafting an Efficient Expression
26
Using with Indexes Use function-based indexes:
CREATE INDEX acre_ind ON park (REGEXP_SUBSTR( REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),'[0-9]+')); To support regular expression queries: SELECT * FROM park WHERE REGEXP_SUBSTR(REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),'[0-9]+') = 217;
27
Using with Views Hide the complexity from users:
CREATE VIEW park_acreage as SELECT park_name, REGEXP_SUBSTR( description, '[0-9]+(-| )acre'), '[0-9]+') acreage FROM park;
28
Using with PL/SQL REGEXP_LIKE acts as a Boolean function in PL/SQL:
IF REGEXP_LIKE(description, '[0-9]+(-| )acre') THEN acres := REGEXP_SUBSTR( REGEXP_SUBSTR(description, '[0-9]+(-| )acre'),'[0-9]+'); ... All other functions act identically in PL/SQL and SQL.
29
Longest Match vs Greediness
Greediness = each element matches as much as possible. For example: SELECT REGEXP_SUBSTR( 'In the beginning','.+[[:space:]]') FROM dual; In the
30
Longest Match vs Greediness
Longest match = find the variations resulting in the greatest number of matching characters: SELECT REGEXP_SUBSTR('bbb','b|bb') FROM dual; b SELECT REGEXP_SUBSTR('bbb','bb|b') FROM dual; bb
31
Optional Parameters All but REGEXP_LIKE take optional parameters for starting position and occurrence: REGEXP_INSTR (source, pattern, start, occurrence, match) REGEXP_SUBSTR (source, pattern, start, occurrence, match) REGEXP_REPLACE(source, pattern, replace, start, occurrence, match) For example: REGEXP_SUBSTR('description','[^[:space:]]+',1,10)
32
Match Parameter All functions take an optional match parameter:
Is matching case sensitive? Does period (.) match newlines? Is the source string one line or many? The match parameter comes last
33
Case-sensitivity Case-insensitive search: SELECT * FROM park
WHERE REGEXP_LIKE( description, '[0-9]+(-| )acre', 'i');
34
Newline matching INSERT INTO park VALUES ('Park 6',
'640' || CHR(10) || 'ACRE'); SELECT * FROM park WHERE REGEXP_LIKE( description, '[0-9]+.acre', 'in');
35
Yes! String anchors INSERT INTO employee (surname)
VALUES ('Ellison' || CHR(10) || 'Gennick'); SELECT * FROM EMPLOYEE WHERE REGEXP_LIKE( surname,'^Ellison'); Yes!
36
No! String anchors INSERT INTO employee (surname)
VALUES ('Ellison' || CHR(10) || 'Gennick') SELECT * FROM EMPLOYEE WHERE REGEXP_LIKE( surname,'^Gennick'); No!
37
Yes! String anchors INSERT INTO employee (surname)
VALUES ('Ellison' || CHR(10) || 'Gennick') SELECT * FROM EMPLOYEE WHERE REGEXP_LIKE( surname,'^Gennick','m'); Yes!
38
Locale Support Full Locale Support
All character sets All languages Case and accent insensitive searching Linguistic range Character classes Collation elements Equivalence classes
39
Character Sets and Languages
For example, you can search for Ukrainian names beginning with Ґ and ending with к: SELECT * FROM employee WHERE REGEXP_LIKE( surname, '^Ґ[[:alpha:]]*к$','n');
40
Case- and Accent-Insensitive Searching
Respect for NLS settings: ALTER SESSION SET NLS_SORT = GENERIC_BASELETTER; With this sort, case won't matter and an expression such as: REGEXP_INSTR(x,'resume') will find "resume", "résumé", "Résume", etc.
41
Linguistic Range Ranges respect NLS_SORT settings: a,b,c…z [a-z]
NLS_SORT=GERMAN [a-z] a,A,b,B,c,C…z,Z NLS_SORT=GERMAN_CI
42
Character Classes Character classes such as [:alpha:] and [:digit:] encompass more than just Latin characters. For example, [:digit:] matches: Latin 0 through 9 Arabic-Indic٠through ٩ And more
43
Collation Elements ALTER SESSION SET NLS_SORT=XSPANISH;
SELECT REGEXP_SUBSTR( 'El caballo, Chico come la tortilla.', '[[:alpha:]]*[ch][[:alpha:]]*', 1,1,'i') FROM dual; caballo
44
Collation Elements ALTER SESSION SET NLS_SORT=XSPANISH;
SELECT REGEXP_SUBSTR( 'El caballo, Chico come la tortilla.', '[[:alpha:]]*[[.ch.]][[:alpha:]]*', 1,1,'i') FROM dual; Chico
45
Equivalence Classes Ignore case and accents without changing NLS_SORT:
REGEXP_INSTR(x,'r[[=e=]]sum[[=e=]]') Finds 'resume', 'résumé', and 'rEsumE'
46
Conclusion String searching and manipulation is at the heart of a great many applications Oracle Regular Expressions provide versatile string manipulation in the database instead of externalized in middle tier logic They are Locale sensitive and support character large objects Available in both SQL and PL/SQL
47
Next Steps…. Recommended sessions
Session #40088 New SQL Capabilities Session #40202 Oracle HTML DB Recommended demos and/or hands-on labs Database Globalization Pod R See Your Business in Our Software Visit the DEMOgrounds for a customized architectural review, see a customized demo with Solutions Factory, or receive a personalized proposal. Visit the DEMOgrounds for more information. Relevant web sites to visit for more information
48
Shameless Plug Oracle Regular Expressions Pocket Reference
Jonathan Gennick & Peter Linsley Free! At the O'Reilly & Associaties Booth
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.