PolyAnalyst Web Report Training

Presentation transcript:

PolyAnalyst PDL. Megaputer Intelligence, megaputer.com. © 2014 Megaputer Intelligence Inc.

Agenda: an overview of PDL; PDL bits and pieces.

PDL Overview. What is PDL? Pattern Definition Language. What does PDL do? It defines text patterns: expressions that match the text you are looking for.

PDL Overview. What does PDL do? (An example) Data: PDL expression: Result:

PDL Overview. Why do we need PDL? To match the right texts (functionality), and only the right texts (accuracy), with a concise and intuitive syntax (simplicity), at high speed (efficiency).

PDL Overview. Why do we need PDL? PDL gets the job done accurately, easily, and efficiently.

PDL Overview. How does PDL do it? 1: Indexing. Splits texts into paragraphs, sentences, and words. Obtains frequency and location information. Assigns part-of-speech (POS) tags.
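The indexing step can be pictured with a small Python sketch. This is not PolyAnalyst's indexer: the function name build_index, the regular expressions used for splitting, and the sample text are all assumptions made for illustration, and POS tagging is left out.

```python
import re
from collections import defaultdict

def build_index(text):
    """Map each lowercased word to its (paragraph, sentence, word) positions."""
    index = defaultdict(list)
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    for p_no, paragraph in enumerate(paragraphs):
        # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
        for s_no, sentence in enumerate(sentences):
            for w_no, word in enumerate(re.findall(r"[A-Za-z0-9']+", sentence)):
                index[word.lower()].append((p_no, s_no, w_no))
    return index

text = "The school has an art program. Art is popular.\n\nThe high school plans a trip."
index = build_index(text)

# Frequency falls out of the location lists.
frequency = {word: len(positions) for word, positions in index.items()}
```

With location lists like these, scoping questions such as "do two words occur in the same sentence?" become comparisons of position tuples rather than scans of the raw text.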

PDL Overview. How does PDL do it? 1: Indexing. The notion of tokens: a token is a sequence of indexed characters. It is the base unit on which the search engine works.

PDL Overview. How does PDL do it? 2: Dictionaries. Containers of lists of words, relations between words, and properties of the words and the relations. Language-specific. Dictionaries can be used to alter the results of text analysis nodes.

PDL Overview. How does PDL do it? 2: Dictionaries. Data: PDL expression: Regular expression: Wildcard expression:

PDL Overview. Where is PDL used? - Search Query - Taxonomy - Dim. Matrix - Link Terms

PDL Overview. Two main types of PDL functions. Semantic functions: use dictionaries to generate sets of word forms; language dependent. Scoping functions: search for tokens within a given scope.

PDL Overview. Semantic functions: antonym(), associate(), entity(), generalize(), hold(), negate(), part(), possible(), thesaurus(), term(), related(), singleroot(), stem().

PDL Overview. Scoping functions: except(), follow(), header(), near(), paragraph(), pattern(), phrase(), position(), sentence().

PDL Overview. General forms of PDL functions. Without a numeric argument: fn_name(term[,…]), that is, fn_name(term,term2,term3,…); for example, negate(allow,available). With an optional leading number N: fn_name([N,]term[,…]), that is, fn_name(N,term,term2,term3,…); for example, sentence(2,school,art).

PDL Overview. General forms of PDL functions: fn_name(term,term2,term3,…) and fn_name(N,term,term2,term3,…). A term is a function, a token, or a sequence of functions or tokens, with or without operators. Operators: and, xor, or, not, &, /, |.

PDL Overview. General forms of PDL functions: fn_name(term,term2,term3,…) and fn_name(N,term,term2,term3,…). Examples: sentence(high,school,art); sentence(2,phrase(high,school),art); sentence(high,school,art or sport).

PDL Overview. PDL macros and variables. PDL macros: custom PDL functions, used to simplify functional forms. E.g.: macro(snear3,term,term2) ≡ sentence(near(3,term,term2))
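One way to think about a macro is as a named template whose parameters are substituted into a longer expression. The Python sketch below models only that idea; the MACROS table, the expand_macro helper, and the argument values are hypothetical and say nothing about how PolyAnalyst stores or expands macros internally.

```python
# Hypothetical macro table: name -> (parameter names, body template).
# The snear3 entry mirrors the example macro from the slide.
MACROS = {
    "snear3": (("term", "term2"), "sentence(near(3,{term},{term2}))"),
}

def expand_macro(name, *args):
    """Return the PDL text obtained by substituting args into the macro body."""
    params, body = MACROS[name]
    if len(args) != len(params):
        raise ValueError(f"{name} expects {len(params)} arguments, got {len(args)}")
    return body.format(**dict(zip(params, args)))

expanded = expand_macro("snear3", "airbag", "deploy")
print(expanded)
```

The short call snear3(airbag,deploy) stands in for the nested form, which is the simplification the macro is meant to provide.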

PDL Overview. PDL macros and variables. PDL variables: specific, long PDL expressions, used to simplify argument values. E.g.: var(airbag) ≡ airbag or case(SIR) or phrase(air or side,bag)

Agenda: an overview of PDL; PDL bits and pieces.

PDL Bits 'n Pieces. So how do you feel about PDL?

PDL Bits 'n Pieces. Is it really that bad? Let’s polyanalyze it and see what others have to say…

PDL Bits 'n Pieces: !. So the query It is difficult! = stem(it) and stem(is) and thesaurus(difficult). Three things to learn here: (1) The search engine automatically stems everything unless it is enclosed in [ ]. (2) The search engine automatically inserts and between adjacent bare words. (3) ! is shorthand for thesaurus().
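The three rules can be mimicked with a toy query rewriter in Python. It only handles whitespace-separated bare words, and the function name rewrite_query is an assumption; the real search engine parses full PDL, which this sketch does not attempt.

```python
def rewrite_query(query):
    """Spell out the implicit functions and operators for a query of bare words."""
    parts = []
    for word in query.split():
        if word.startswith("[") and word.endswith("]"):
            # Rule 1: bracketed words are left alone (no stemming).
            parts.append(word)
        elif word.endswith("!"):
            # Rule 3: a trailing ! is shorthand for thesaurus().
            parts.append(f"thesaurus({word[:-1].lower()})")
        else:
            # Rule 1: every other bare word is stemmed.
            parts.append(f"stem({word.lower()})")
    # Rule 2: adjacent bare words are joined with "and".
    return " and ".join(parts)

print(rewrite_query("It is difficult!"))
```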

PDL Bits 'n Pieces: /. We often say things like him/her. What if we polyanalyze plan/planning?

PDL Bits 'n Pieces: /. So plan is different from plan/planning? Is this a bug to report at http://www.polyanalyst.com/mantis? Not this time, because / is a PDL operator that returns the difference between its arguments.

PDL Bits 'n Pieces: /. That is, plan/planning looks for the complement of planning in plan. Would that just be plan then? Why are there zero matches? The answer is stemming: both arguments are stemmed automatically, so nothing is left after the difference is taken. So we really need plan/[planning].
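The zero-match result can be reproduced with plain set arithmetic. The Python sketch below assumes that the difference is taken over sets of word forms and uses a tiny hand-written stemming table; both are illustrative assumptions, not a description of PolyAnalyst's dictionaries or of how the engine evaluates / internally.

```python
# Toy stemming dictionary (an assumption for illustration only).
WORD_FORMS = {"plan": {"plan", "plans", "planned", "planning"}}
STEM_OF = {form: stem for stem, forms_ in WORD_FORMS.items() for form in forms_}

def forms(term):
    """Word forms a term matches: bracketed terms are literal, others are stemmed."""
    if term.startswith("[") and term.endswith("]"):
        return {term[1:-1]}
    stem = STEM_OF.get(term, term)
    return set(WORD_FORMS.get(stem, {term}))

def difference(left, right):
    """Model of the / operator as a set difference over word forms."""
    return forms(left) - forms(right)

stemmed = difference("plan", "planning")    # both sides expand to the same forms
literal = difference("plan", "[planning]")  # only the literal form is removed
```

Here stemmed is empty, which corresponds to the zero matches, while literal keeps every form of plan except planning.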

PDL Bits 'n Pieces: /. A total of 11 records with both stem(plan) and school in a sentence:

PDL Bits 'n Pieces: /. What if the original text contains things like him/her and we are indeed looking for those? [A/B] is interpreted by the search engine as [A B].

PDL Bits 'n Pieces: phrase(). Love-hate relationship with phrase(). Any text in double quotes is always interpreted as a phrase: "A B" = phrase(A,B).

PDL Bits 'n Pieces: phrase(). A B = A and B. But phrase(A B,C) = phrase(phrase(A,B),C) ≠ phrase(A and B,C); that is, phrase(A B,C) = phrase(A,B,C).

PDL Bits 'n Pieces: phrase(). The search engine generally ignores punctuation, but phrase(0,…) and pattern(0,…) allow you to exclude it.

PDL Bits 'n Pieces: phrase(). phrase() vs. pattern(). Base forms: phrase(A,B) vs. pattern(A,B). pattern() is almost the same as phrase(), except that pattern() allows stop words between arguments.
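The difference between the two base forms can be sketched in Python. The sketch assumes that the base form of phrase() requires its arguments to be directly adjacent, compares exact tokens without stemming, and uses a short made-up stop-word list; the helper matches_in_order is hypothetical.

```python
STOP_WORDS = {"a", "an", "the", "of", "in", "to"}  # illustrative list only

def matches_in_order(tokens, terms, skip_stop_words):
    """True if the terms occur consecutively, optionally ignoring stop words between them."""
    for start, token in enumerate(tokens):
        if token != terms[0]:
            continue
        pos, ok = start, True
        for term in terms[1:]:
            pos += 1
            # Step over stop words that sit between two arguments.
            while (skip_stop_words and pos < len(tokens)
                   and tokens[pos] in STOP_WORDS and tokens[pos] != term):
                pos += 1
            if pos >= len(tokens) or tokens[pos] != term:
                ok = False
                break
        if ok:
            return True
    return False

def phrase(tokens, *terms):
    return matches_in_order(tokens, terms, skip_stop_words=False)

def pattern(tokens, *terms):
    return matches_in_order(tokens, terms, skip_stop_words=True)

tokens = "director of the school".split()
```

On these tokens, pattern(tokens, "director", "school") matches because "of" and "the" are skipped, while phrase(tokens, "director", "school") does not.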

PDL Bits 'n Pieces: phrase(). The extended form of phrase(): phrase(N,term1,term2,term3,…). Matches text fragments that contain all the argument terms in the specified order in the same sentence, and where the difference between the positions of any adjacent pair of terms is no more than N.

PDL Bits 'n Pieces: phrase(). The extended form of phrase(). To specify that the maximum position difference between any terms be N1, while the maximum position difference between neighboring terms be N2, one can use the following expression: near(N1,phrase(N2,term1,term2,term3,…))
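The two distance constraints can be made concrete with a small Python matcher over one tokenized sentence. It assumes that a position is the zero-based word index within the sentence and that tokens are compared exactly; find_ordered and its parameter names max_gap (the N or N2 limit) and max_span (the N1 limit) are hypothetical.

```python
def find_ordered(tokens, terms, max_gap, max_span=None):
    """Return the first tuple of positions matching the terms in order, or None.

    max_gap  limits the distance between neighboring terms, as in phrase(N,...).
    max_span limits the distance between any two terms, as in near(N1,...).
    """
    def search(term_no, chosen):
        if term_no == len(terms):
            return tuple(chosen)
        start = chosen[-1] + 1 if chosen else 0
        for pos in range(start, len(tokens)):
            if chosen and pos - chosen[-1] > max_gap:
                break  # too far from the previous term
            if max_span is not None and chosen and pos - chosen[0] > max_span:
                break  # too far from the first term
            if tokens[pos] == terms[term_no]:
                found = search(term_no + 1, chosen + [pos])
                if found is not None:
                    return found
        return None

    return search(0, [])

sentence = "the high school offers a new art class".split()
terms = ["high", "school", "art"]

loose = find_ordered(sentence, terms, max_gap=4)               # like phrase(4,...)
tight = find_ordered(sentence, terms, max_gap=3)               # like phrase(3,...)
capped = find_ordered(sentence, terms, max_gap=4, max_span=4)  # like near(4,phrase(4,...))
```

In this sentence "school" and "art" are four positions apart, so the match survives a neighbor limit of 4 but not 3, and it fails once the overall span from "high" to "art" (5) exceeds the span limit.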

PDL Bits 'n Pieces: phrase(). In phrase(), sentence(), near(), etc., "not" is only allowed at the beginning of an argument to mean "absence".

PDL Bits 'n Pieces: phrase(). except() embedded in phrase(): phrase(school, not except()) means phrase(school, <absence of all words>), i.e., the match shouldn't contain a second argument.

PDL Bits 'n Pieces: phrase(). except() embedded in phrase(): phrase(school, except(.)) means phrase(school, <any word, except all words>), i.e., the second argument must be in the match, but at the same time it cannot be anything.

PDL Bits 'n Pieces: thesaurus(). thesaurus(POS,term,term2,term3,…) matches synonyms of any argument term. It can be restricted to certain part(s) of speech.

PDL Bits 'n Pieces: term(). term(list,list2,list3,…) matches all the words from the argument word list(s).

PDL Bits 'n Pieces: term(). term() matches the stemmed forms of any given word from the list(s).

PDL Bits 'n Pieces: stem(). singleroot() vs. stem(): singleroot() matches word forms with the same root as the term; stem() matches word forms with the same stem as the term.

PDL Bits 'n Pieces: stem(). A POS can be specified in stem() as well.

PDL Bits 'n Pieces: vs. SRL. SRL: Symbolic Rule Language. Used for data manipulation and calculation in column and row operations. For example: date([Release Time Raw],"DT;24;YYYYMMDD")

PDL Bits 'n Pieces: vs. LinguaMark. PolyAnalyst LinguaMark® is used to define language constructions associated with entities, evaluations, and sentiments. For example, ‘Director’ <,GF(OF)> <$Company>:@ ‘is’ <$Person> matches “Director of Microsoft Corp. is Bill Gates”.

PDL Bits 'n Pieces: vs. LinguaMark. PolyAnalyst LinguaMark®: “Custom Entity Extraction with PolyAnalyst’s LinguaMark Language”. Date: Thursday, May 15. Time: 8:45 – 9:25 am.

Contacting Megaputer. Questions? Megaputer Intelligence, megaputer.com