1
Manipulating Text Data in PolyAnalyst: Text Extraction and Regular Expressions
PolyAnalyst Web Report Training
© 2014 Megaputer Intelligence Inc.
2
Agenda
Extract Terms node
Basics of Regular Expression
Example of Regex with PolyAnalyst
4
Extract Terms Node
Extract text segments from a column using regular expressions.
6
Extract Terms Node
Select Text or String columns
7
Extract Terms Node
Add a new rule
8
Extract Terms Node
Simplest regex rule
Case insensitive
9
Extract Terms Node
10
Agenda
Extract Terms node
Basics of Regular Expression
Example of Regex with PolyAnalyst
11
Basics of Regular Expression
The simplest regex is simply a string of characters, matched literally.
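As a rough illustration outside PolyAnalyst, the same idea can be sketched with Python's `re` module. The sample text and the word `parking` are assumptions chosen for illustration, and `re.IGNORECASE` stands in for the node's case-insensitive option.

```python
import re

# Hypothetical sample text; the slide's own example is not reproduced here.
text = "The suspect fled through the Parking garage."

# A plain string is a valid regex: it matches those characters literally.
# re.IGNORECASE plays the role of a case-insensitive rule.
literal_match = re.search(r"parking", text, flags=re.IGNORECASE)
matched_text = literal_match.group(0) if literal_match else None
print(matched_text)
```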
12
Basics of Regular Expression
If we expand it to:
Then it fails!
13
Basics of Regular Expression
\s matches a single whitespace character (a space, tab, or line break)
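A small Python sketch of `\s`, using hypothetical sample strings. Note that in Python's `re` module a literal space in a pattern also matches a space, so how PolyAnalyst itself treats a literal space is not something this sketch shows.

```python
import re

# Hypothetical inputs: a space, a tab, and no separator at all.
notes = ["parking lot", "parking\tlot", "parkinglot"]

# \s requires exactly one whitespace character between the two words.
pattern = re.compile(r"parking\slot", flags=re.IGNORECASE)
whitespace_hits = [bool(pattern.search(n)) for n in notes]
print(whitespace_hits)
```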
14
Basics of Regular Expression
PDL: Phrase(parking, lot)
15
Basics of Regular Expression
Vertical bar | represents “or”
Parentheses () represent grouping
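The interplay of `|` and `()` can be sketched in Python as follows. The sample phrases are assumptions; the point is that without parentheses the "or" splits the whole pattern, not just the last word.

```python
import re

# Grouped: "parking" followed by whitespace, then either "lot" or "garage".
grouped = re.compile(r"parking\s(lot|garage)")
# Ungrouped: either "parking lot" as a whole, or "garage" on its own.
ungrouped = re.compile(r"parking\slot|garage")

samples = ["parking lot", "parking garage", "garage door"]
grouped_hits = [bool(grouped.search(s)) for s in samples]
ungrouped_hits = [bool(ungrouped.search(s)) for s in samples]
print(grouped_hits, ungrouped_hits)
```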
16
Basics of Regular Expression
\d matches any digit (0 to 9)
Plus sign + denotes one or more matches of the preceding element
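A minimal Python sketch of the difference between `\d` and `\d+`, on a made-up sentence:

```python
import re

# Hypothetical sample text.
text = "Unit 12 responded at 0415 to 7 calls."

# \d alone matches one digit at a time.
single_digits = re.findall(r"\d", text)
# \d+ matches each whole run of one or more digits.
digit_runs = re.findall(r"\d+", text)
print(single_digits, digit_runs)
```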
17
Basics of Regular Expression
18
Basics of Regular Expression
Question mark ? denotes zero or one match
Asterisk * denotes zero or more matches
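The three quantifiers can be compared side by side in a short Python sketch. The test words are assumptions picked to show where each quantifier accepts or rejects.

```python
import re

# ? makes the preceding "u" optional (zero or one).
optional_u = re.compile(r"colou?r")
optional_hits = [bool(optional_u.fullmatch(w)) for w in ("color", "colour", "colouur")]

# * allows zero or more "o"; + (for contrast) requires at least one.
words = ("ggle", "gogle", "gooogle")
star_hits = [bool(re.fullmatch(r"go*gle", w)) for w in words]
plus_hits = [bool(re.fullmatch(r"go+gle", w)) for w in words]
print(optional_hits, star_hits, plus_hits)
```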
19
Basics of Regular Expression
20
Other Useful Syntax
Dot . matches any character except newline
Caret ^ denotes the beginning of the string
Dollar sign $ denotes the end of the string
Curly brackets {} denote an exact number, or a range, of repetitions. For example:
w{3} matches www
p{1,5} matches the run of p's in happy or happpppy
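These pieces can be checked with a short Python sketch. The test words are built programmatically so the number of p's is unambiguous.

```python
import re

# {3} means exactly three repetitions.
exactly_three = bool(re.fullmatch(r"w{3}", "www"))

# {1,5} means between one and five repetitions; ^ and $ pin the pattern
# to the start and end of the string.
happy = re.compile(r"^hap{1,5}y$")
words = ["ha" + "p" * n + "y" for n in (2, 5, 6)]  # 2, 5 and 6 p's
brace_hits = [bool(happy.search(w)) for w in words]

# The dot matches any single character except a newline.
dot_hits = [bool(re.fullmatch(r"c.t", s)) for s in ("cat", "c\nt")]
print(exactly_three, brace_hits, dot_hits)
```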
21
Metacharacters
Some characters are reserved for use in regex notation.
The metacharacters are: { } [ ] ( ) ^ $ . | * + ? \
To match a metacharacter literally, put a backslash in front of it.
For example: \$\d+\.\d+ matches $19.99
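A Python sketch of why the backslashes matter, using a made-up sentence around the slide's $19.99 example. `re.escape` is a Python convenience for escaping a literal string; nothing here implies PolyAnalyst offers an equivalent.

```python
import re

text = "Total: $19.99 plus tax"

# Escaped: \$ is a literal dollar sign and \. is a literal dot.
escaped = re.search(r"\$\d+\.\d+", text)
price = escaped.group(0) if escaped else None

# Unescaped: $ means "end of string", so digits can never follow it.
unescaped = re.search(r"$\d+.\d+", text)

# re.escape builds the escaped pattern from a literal string.
auto = re.search(re.escape("$19.99"), text)
print(price, unescaped, auto is not None)
```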
22
More?
PolyAnalyst Help Manual
Online resources
Test and see the highlights
23
Agenda
Extract Terms node
Basics of Regular Expression
Example of Regex with PolyAnalyst
24
Extract [Age] of Suspect
Besides grouping, parentheses () also store (capture) the text they match, so it can be pulled out or reused.
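A minimal Python sketch of capturing an age with parentheses. The narratives and the pattern are hypothetical; the rule actually used in the presentation is not shown in this transcript. Converting the captured text to integers lets the ages sort numerically.

```python
import re

# Hypothetical narratives.
narratives = [
    "Suspect is a 34-year-old male.",
    "Witness, 9 years old, saw a 51 year old man.",
    "No age given.",
]

# (\d+) captures only the digits; the rest of the pattern gives context.
age_pattern = re.compile(r"(\d+)[-\s]years?[-\s]old")

# With one group, findall returns just the captured text.
ages_per_record = [[int(a) for a in age_pattern.findall(n)] for n in narratives]
sorted_ages = sorted(a for ages in ages_per_record for a in ages)
print(ages_per_record, sorted_ages)
```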
25
Extract and Sort [Age]
26
Clean Up Text / String Columns
27
Clean Up Text / String Columns
.* matches any number of characters (including none), except newlines
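One way `.*` helps with clean-up is cutting everything from a marker to the end of the line. The sketch below assumes a hypothetical "-- Sent from" trailer; the clean-up rules used in the presentation are not reproduced in this transcript.

```python
import re

# Hypothetical trailer pattern: the marker and everything after it on the line.
trailer = re.compile(r"\s*--\sSent from.*")

raw = "Vehicle stolen from lot B. -- Sent from mobile terminal 7"
cleaned = trailer.sub("", raw)

# Because . does not match a newline, only the first line is trimmed here.
two_lines = "line1 -- Sent from x\nline2"
cleaned_two_lines = trailer.sub("", two_lines)
print(cleaned, repr(cleaned_two_lines))
```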
28
Clean Up Text / String Columns
29
Clean Up Text / String Columns
30
Delimiter and Extraction
31
Delimiter and Extraction
\w matches any alphanumeric character or the underscore: [A-Za-z0-9_]
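A short Python sketch of `\w` on a made-up string. In Python 3, `\w` on text also matches non-English letters by default, so `re.ASCII` is used here to get the narrower definition given on the slide.

```python
import re

# Hypothetical sample text.
text = "case_id=A17; officer: J.Smith"

# \w+ grabs runs of letters, digits and underscores; punctuation splits them.
word_runs = re.findall(r"\w+", text, flags=re.ASCII)

# The explicit character class gives the same result.
class_runs = re.findall(r"[A-Za-z0-9_]+", text)
print(word_runs)
```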
32
Delimiter and Extraction
33
Delimiter and Extraction
34
Delimiter and Extraction
35
Delimiter and Extraction
Besides grouping, parentheses () also store (capture) the text they match, so it can be pulled out or reused.
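Combining a delimiter with two stored groups might look like the following Python sketch. The record layout (fields separated by semicolons, names and values separated by a colon) is an assumption for illustration, not the data set used in the presentation.

```python
import re

# Hypothetical record with ";" between fields and ":" between name and value.
record = "Name: John Doe; Age: 42; City: Bloomington"

# Group 1 stores the field name, group 2 stores everything up to the next ";".
field_pattern = re.compile(r"(\w+):\s*([^;]+)", flags=re.ASCII)

# With two groups, findall returns (name, value) pairs.
pairs = field_pattern.findall(record)
fields = dict(pairs)
print(fields)
```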
36
Delimiter and Extraction
37
Replace Terms Node
Find and replace patterns of characters in one or more string or text columns.
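Find-and-replace with a pattern can be sketched in Python with `re.sub`. Both examples are hypothetical: one reorders a date using stored groups, the other masks a number pattern. The `\1`-style reference syntax is Python's; the syntax the Replace Terms node expects may differ.

```python
import re

# Reorder MM/DD/YYYY into YYYY-MM-DD using the three stored groups.
dated = "Reported 03/15/2014, closed 04/02/2014."
iso_dated = re.sub(r"(\d{2})/(\d{2})/(\d{4})", r"\3-\1-\2", dated)

# Mask anything shaped like 3 digits, 2 digits, 4 digits.
sensitive = "SSN 123-45-6789 on file"
masked = re.sub(r"\d{3}-\d{2}-\d{4}", "XXX-XX-XXXX", sensitive)
print(iso_dated, masked)
```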
38
Data Redaction
39
Regex in Replace Terms Node
40
Data Redaction
41
Regex in Replace Terms Node
42
Regex in Replace Terms Node
43
Contacting Megaputer
Questions?
44
Appendix: An Example of Regular Expressions in a Web Scraping Project with Glassdoor Data
45
Polish the Information
46
Remove Unnecessary Info
(?s) turns on single-line mode: the dot . also matches newlines, so the whole text is treated as if it were on one line
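The effect of `(?s)` can be seen in a small Python sketch. The page fragment is made up; it only shows that a block spanning several lines is removed once the dot is allowed to cross line breaks.

```python
import re

# Hypothetical page fragment with a block spread over several lines.
page = "Header\n<script>\nvar x = 1;\n</script>\nBody"

# Without (?s) the dot stops at the first newline, so nothing is removed.
without_flag = re.sub(r"<script>.*?</script>", "", page)

# With (?s) the dot crosses newlines and the whole block is removed.
with_flag = re.sub(r"(?s)<script>.*?</script>", "", page)
print(repr(without_flag), repr(with_flag))
```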
47
Find a Delimiter
For forums or blogs with multiple posts on one web page, look for a common pattern that marks where each post begins.
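Splitting a page into records at a recurring pattern can be sketched in Python as below. The "Posted by ... on ..." line is a hypothetical delimiter, not the one found in the Glassdoor pages. The lookahead `(?=...)` splits in front of the delimiter so each record keeps its own header line (splitting on an empty match needs Python 3.7 or later).

```python
import re

# Hypothetical page text with two posts.
page = (
    "Posted by alice on 2014-01-05\nGreat place.\n"
    "Posted by bob on 2014-02-11\nLong hours.\n"
)

# Split just before each delimiter line, then drop empty pieces.
pieces = re.split(r"(?=Posted by \w+ on )", page)
posts = [p.strip() for p in pieces if p.strip()]
print(len(posts))
```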
48
Separate Records of Info
49
Find a Delimiter
50
Find a Delimiter
51
Records Separated!
52
Different Ways to Extract Data
Directly from the parsed text
Or from the raw HTML code
53
Data Extraction – Parsed Text
Title of Review, Location, Job Title, Date & Time
54
Data Extraction – Parsed Text
55
Data Extraction – Raw HTML Codes
Title of Review, Job Title, Location
56
Resulting Dataset
57
Making Good Use of the Info
58
Contacting Megaputer
Questions?