PolyAnalyst Web Report Training


Manipulating Text Data in PolyAnalyst: Text Extraction and Regular Expressions
PolyAnalyst Web Report Training
Megaputer Intelligence, www.megaputer.com
© 2014 Megaputer Intelligence Inc.

Agenda: Extract Terms node; Basics of Regular Expressions; Example of Regex with PolyAnalyst


Extract Terms Node: extract text segments from a column using regular expressions


Extract Terms Node: select text or string columns

Extract Terms Node: add a new rule

Extract Terms Node: Simplest Regex Rule (Case Insensitive)

Extract Terms Node


Basics of Regular Expressions: The simplest regex is just a string of characters, matched literally, as in the Simplest Regex Rule.

Basics of Regular Expressions: If we expand it, the match fails!

Basics of Regular Expressions: \s represents a whitespace character, such as a space, tab, or newline.
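You can try the literal-string and `\s` ideas outside PolyAnalyst with Python's `re` module. This is a sketch. It assumes Python's regex flavor behaves like the Extract Terms node for these simple constructs, and the sample sentence is made up.

```python
import re

# Hypothetical sample value from a text column.
text = "Customer complained about the parking lot lighting."

# A plain string of characters is itself a regex that matches literally.
literal = re.search(r"parking", text)

# \s matches one whitespace character (space, tab, newline, ...),
# so the same pattern also finds "parking\tlot" or "parking\nlot".
spaced = re.search(r"parking\slot", text)
across_newline = re.search(r"parking\slot", "parking\nlot")

# re.IGNORECASE plays the role of a "case insensitive" setting.
ignore_case = re.search(r"PARKING\sLOT", text, flags=re.IGNORECASE)

print(literal.group())      # parking
print(spaced.group())       # parking lot
print(ignore_case.group())  # parking lot
```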

Basics of Regular Expressions: PDL Phrase(parking, lot)

Basics of Regular Expressions: The vertical bar | represents “or”; parentheses () represent grouping.
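The sketch below, again in Python's `re` module, shows how alternation and grouping interact. The sample values are hypothetical. Without parentheses, `|` applies to the whole pattern rather than only to the last word.

```python
import re

# Hypothetical values; the parentheses limit the alternation to "lot" or "garage".
samples = ["parking lot", "parking garage", "parking meter"]
grouped = re.compile(r"parking\s(lot|garage)")
grouped_hits = [s for s in samples if grouped.search(s)]
print(grouped_hits)  # ['parking lot', 'parking garage']

# Without parentheses, | splits the whole pattern: "parking\slot" OR "garage".
ungrouped = re.compile(r"parking\slot|garage")
print(bool(ungrouped.search("garage")))  # True: "garage" alone now matches
```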

Basics of Regular Expressions: \d matches any digit (0 to 9); the plus sign + denotes one or more matches.
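Here is a small Python sketch of `\d` with and without `+`. The input text is invented for illustration.

```python
import re

# Hypothetical text containing numbers of different lengths.
text = "Order 42 shipped with 3 items, tracking 9981"

# \d+ grabs each run of one or more consecutive digits.
runs = re.findall(r"\d+", text)
print(runs)  # ['42', '3', '9981']

# Without +, \d matches exactly one digit, so digits come back one at a time.
singles = re.findall(r"\d", "42")
print(singles)  # ['4', '2']
```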

Basics of Regular Expressions

Basics of Regular Expressions: The question mark ? denotes zero or one match; the asterisk * denotes zero or more matches.
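The two quantifiers can be compared side by side in Python. The word lists are hypothetical. `re.fullmatch` is used so that the whole word has to fit the pattern.

```python
import re

# u? : the "u" may appear zero or one time.
words_q = ["color", "colour", "colouur"]
optional_u = [w for w in words_q if re.fullmatch(r"colou?r", w)]
print(optional_u)  # ['color', 'colour']

# b* : the "b" may appear zero or more times.
words_star = ["ac", "abc", "abbbc", "adc"]
any_bs = [w for w in words_star if re.fullmatch(r"ab*c", w)]
print(any_bs)  # ['ac', 'abc', 'abbbc']
```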

Basics of Regular Expressions

Other Useful Syntax: The dot . matches any character except newline. The caret ^ denotes the beginning of the string. The dollar sign $ denotes the end of the string. Curly brackets {} set the number of repetitions: {n} means exactly n, and {m,n} means between m and n. For example, w{3} matches www, and hap{1,5}y matches happy or happpppy.
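The following Python sketch exercises each piece of syntax above. A bare `p{1,5}` matches anywhere from "p" to "ppppp", so the example wraps it as `hap{1,5}y`. All sample strings are hypothetical.

```python
import re

# Dot: any single character except newline.
dot_cat = bool(re.fullmatch(r"c.t", "cat"))       # True
dot_newline = bool(re.fullmatch(r"c.t", "c\nt"))  # False

# ^ and $ anchor the pattern to the start and the end of the string.
url = "www.megaputer.com"
starts_www = bool(re.search(r"^www", url))  # True
ends_com = bool(re.search(r"com$", url))    # True
starts_com = bool(re.search(r"^com", url))  # False

# {n}: exactly n repetitions; {m,n}: between m and n repetitions.
three_w = bool(re.fullmatch(r"w{3}", "www"))  # True
candidates = ["hapy", "happy", "happpppy", "happppppy"]
ranged = [w for w in candidates if re.fullmatch(r"hap{1,5}y", w)]
print(ranged)  # ['hapy', 'happy', 'happpppy']  (six p's is too many)
```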

Metacharacters: Some characters are reserved for use in regex notation. The metacharacters are:
{ } [ ] ( ) ^ $ . | * + ? \
To match one of them literally, escape it with a backslash. For example, \$\d+\.\d+ matches $19.99.
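This Python sketch shows why escaping matters for the $19.99 example. Note that `re.escape` escapes only special characters in Python 3.7 and later, and the expected output below assumes such a version.

```python
import re

# Escaped $ and . are matched literally; \d+ matches the digit runs.
price = re.compile(r"\$\d+\.\d+")
prices = price.findall("Now $19.99, was $24.50")
print(prices)  # ['$19.99', '$24.50']

# Unescaped, $ means "end of string", so this pattern cannot match here.
unescaped = re.findall(r"$\d+.\d+", "Now $19.99")
print(unescaped)  # []

# re.escape adds the backslashes for you when searching for a fixed string.
print(re.escape("$19.99"))  # \$19\.99
```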

More? See the PolyAnalyst Help Manual and these online resources:
http://en.wikipedia.org/wiki/Regular_expression
http://www.regular-expressions.info/
Test a regex and see the matches highlighted: http://www.regexr.com/


Extract [Age] of Suspect: Besides grouping, parentheses () are also used for capturing (storing) the matched text.
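Below is a Python sketch of capturing an age with parentheses. The transcript does not show the actual column or wording. The notes are therefore invented, and the pattern assumes ages are written like "25 year old" or "31-year-old".

```python
import re

# Hypothetical incident notes.
notes = [
    "Suspect is a 25 year old male wearing a red jacket.",
    "Witness described a 31-year-old suspect.",
    "No description of the suspect was given.",
]

# The parentheses capture only the digits; the rest of the pattern is context.
age_pattern = re.compile(r"(\d+)[\s-]years?[\s-]old", re.IGNORECASE)

ages = []
for note in notes:
    m = age_pattern.search(note)
    ages.append(m.group(1) if m else None)  # group(1) = first captured group

print(ages)  # ['25', '31', None]
```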

Extract and Sort [Age]
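A captured group is text, so sorting extracted ages as-is gives character order rather than numeric order. The sketch below assumes the goal is numeric order. The values are hypothetical, and how PolyAnalyst converts the column type is not shown here.

```python
# Hypothetical ages as captured by a regex; captured groups are always text.
extracted = ["25", "9", "31", "100"]

# Sorting the raw strings compares them character by character.
as_text = sorted(extracted)
print(as_text)  # ['100', '25', '31', '9']

# Converting to integers first gives the expected numeric order.
as_numbers = sorted(extracted, key=int)
print(as_numbers)  # ['9', '25', '31', '100']
```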

Clean Up Text / String Columns Outline

Clean Up Text / String Columns: .* matches any number of characters (including none) except newline.
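Here is a minimal Python sketch of using `.*` to strip a trailing fragment from each value. The "Sent from my" marker is a hypothetical example of clutter. It also shows that `.*` stops at a newline.

```python
import re

# Hypothetical review text with a trailing signature to strip.
raw = "Great service, fast delivery. Sent from my phone, excuse typos"

# \s*Sent from my.* removes the marker and everything after it on that line.
cleaned = re.sub(r"\s*Sent from my.*", "", raw)
print(cleaned)  # Great service, fast delivery.

# Because . does not match a newline, text on the next line survives.
multi = "Line one. Sent from my phone\nLine two."
cleaned_multi = re.sub(r"\s*Sent from my.*", "", multi)
print(repr(cleaned_multi))  # 'Line one.\nLine two.'
```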

Clean Up Text / String Columns

Clean Up Text / String Columns

Delimiter and Extraction

Delimiter and Extraction: \w matches any alphanumeric character or the underscore, i.e. [A-Za-z0-9_].
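The Python sketch below splits text into `\w+` tokens. Anything outside the `\w` class, such as "-", ".", ":" and spaces, acts as a separator. One caveat: in Python 3, `\w` is Unicode-aware for str patterns by default, and the `re.ASCII` flag restricts it to exactly [A-Za-z0-9_]. The text is hypothetical.

```python
import re

text = "user_id: ab_12, e-mail: jo.doe"
tokens = re.findall(r"\w+", text)
print(tokens)  # ['user_id', 'ab_12', 'e', 'mail', 'jo', 'doe']

# By default \w also accepts letters such as "é"; re.ASCII limits it to [A-Za-z0-9_].
unicode_ok = bool(re.fullmatch(r"\w+", "café"))                  # True
ascii_only = bool(re.fullmatch(r"\w+", "café", flags=re.ASCII))  # False
```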

Delimiter and Extraction

Delimiter and Extraction

Delimiter and Extraction

Delimiter and Extraction: Besides grouping, parentheses () are also used for capturing (storing) the matched text.
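Delimiters and capturing groups can be combined to pull several fields from one value. This is a Python sketch. The record layout, labels, and " | " delimiter are assumptions for illustration, not the layout used on the slides.

```python
import re

# Hypothetical record in which " | " acts as the field delimiter.
record = "Title: Great place to work | Location: Austin, TX | Job: Data Analyst"

# Each pair of parentheses captures one field; [^|]+? stops before the next "|".
pattern = re.compile(
    r"Title:\s*([^|]+?)\s*\|\s*Location:\s*([^|]+?)\s*\|\s*Job:\s*(.+)"
)
match = pattern.search(record)
title, location, job = match.groups()
print(title)     # Great place to work
print(location)  # Austin, TX
print(job)       # Data Analyst
```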

Delimiter and Extraction

Replace Terms Node: find and replace patterns of characters in one or more string or text columns.

Data Redaction
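A find-and-replace redaction can be sketched in Python with `re.sub`. The SSN and phone formats below are hypothetical US-style patterns. The `\1` in the replacement shows how a captured group can be reused to keep part of the original value.

```python
import re

# Hypothetical formats: US-style SSNs (ddd-dd-dddd) and phone numbers.
SSN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def redact(text: str) -> str:
    """Mask SSNs (keeping the last four digits) and phone numbers."""
    # \1 in the replacement refers back to the captured last four digits.
    text = SSN.sub(r"XXX-XX-\1", text)
    text = PHONE.sub("[PHONE]", text)
    return text


print(redact("Call 812-555-0199, SSN 123-45-6789."))
# Call [PHONE], SSN XXX-XX-6789.
```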

Regex in Replace Terms Node

Data Redaction

Regex in Replace Terms Node

Regex in Replace Terms Node

Contacting Megaputer Questions?

Appendix: An Example of Regular Expressions with a Web Scraping Project of Glassdoor Data

Polish the Information

Remove Unnecessary Info: (?s) turns on single-line (“dot matches all”) mode, so . also matches newline characters and the whole text is treated as if it were on one line.
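The effect of `(?s)` can be seen in a short Python sketch that removes a block spanning two lines. The header markup is hypothetical. The lazy `.*?` is an added choice so the match stops at the first closing tag.

```python
import re

# Hypothetical scraped text with a two-line navigation header to drop.
page = "<header>Menu\nLogin</header>\nReview text here"

# Without (?s), "." stops at the newline inside the header, so nothing is removed.
without_s = re.sub(r"<header>.*?</header>\n?", "", page)

# With (?s), "." also matches "\n"; the lazy .*? stops at the first closing tag.
with_s = re.sub(r"(?s)<header>.*?</header>\n?", "", page)

print(without_s == page)  # True
print(with_s)             # Review text here
```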

Find a Delimiter: For forums or blogs with multiple posts on one webpage, find ways to identify common patterns.
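Once a common pattern is found, it can serve as the delimiter between records. In this Python sketch, the assumed delimiter is a date such as "Jan 5, 2014" at the start of a line, and the page text is invented. Splitting on a zero-width lookahead requires Python 3.7 or later.

```python
import re

# Hypothetical parsed page: a heading followed by reviews that each start with a date.
page = (
    "Helpful Reviews\n"
    "Jan 5, 2014 - Great place to work\n"
    "Pros: flexible hours\n"
    "Feb 12, 2014 - Too many meetings\n"
    "Pros: good pay"
)

# Assumed delimiter: a date like "Jan 5, 2014".
DATE = r"[A-Z][a-z]{2} \d{1,2}, \d{4}"

# (?m) lets ^ match at every line start; the lookahead (?=...) splits *before*
# each date without consuming it, so the date stays with its record.
pieces = re.split(r"(?m)^(?=" + DATE + r")", page)

# Keep only the pieces that begin with a date (drops the page heading).
records = [p.strip() for p in pieces if re.match(DATE, p)]
for r in records:
    print(repr(r))
```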

Separate Records of Info

Find a Delimiter

Find a Delimiter

Records Separated!

Different Ways to Extract Data: directly from the parsed text, or, optionally, from the raw HTML code

Data Extraction – Parsed Text: Title of Review, Location, Job Title, Date & Time

Data Extraction – Parsed Text

Data Extraction – Raw HTML Code: Title of Review, Job Title, Location
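When working on raw HTML, the tags around each field can serve as anchors for capturing groups. In this Python sketch, the `<span>` markup and class names are hypothetical, and real review pages will differ. For deeply nested or irregular markup, an HTML parser is usually more robust than a regex.

```python
import re

# Hypothetical markup; real review pages use different tags and class names.
html = (
    '<span class="reviewTitle">Great culture</span>'
    '<span class="jobTitle">Data Analyst</span>'
    '<span class="location">Denver, CO</span>'
)

# Group 1 captures the class name, group 2 the text between the tags.
fields = dict(re.findall(r'<span class="(\w+)">([^<]*)</span>', html))
print(fields)
# {'reviewTitle': 'Great culture', 'jobTitle': 'Data Analyst', 'location': 'Denver, CO'}
```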

Resulting Dataset

Making Good Use of the Info
