PolyAnalyst Web Report Training

Presentation transcript:

PolyAnalyst Web Report Training: Multilingual Analysis. Megaputer Intelligence, www.megaputer.com. © 2014 Megaputer Intelligence Inc.

Multilingual Data

Internet Usage by Language: The proportion of English texts has decreased significantly.

Growth in Language Usage: Growth of English text data is much slower than other languages.

Lost in Translation: Garfield comic translated to Japanese and back (source: http://garfieldlostintranslation.blogspot.com/)

Lost in Translation: Original Garfield comic in English

PolyAnalyst Languages. European languages: English, Spanish, French, German, Russian, Italian, Dutch, Polish, Portuguese, Turkish, Greek. Asian languages: Chinese (Simplified & Traditional), Japanese, Korean.

All-Language Functionalities. PDL functions: case(), count(), empty(), except(), follow(), hcolor(), header(), macro(), near(), number(), paragraph(), pattern(), phrase(), regex(), soundex(), stem(), term(), wildcard(). Nodes: Bayes & SVM Classification, Distinct Texts, Keyword Extraction, Language Detection, Link Terms, Search Query, Spell Check.

Language-Specific Functionalities. Text Classification Node: Dutch, French, German, Portuguese, Russian. negate() & possible() PDL functions: Chinese (Simplified).

English-Only Functionalities. Advanced text analysis nodes: Sentiment Analysis Node, Entity Extraction Node. Semantic PDL functions: antonym(), associate(), generalize(), hold(), part(), related(), singleroot(), thesaurus().

Case Study: Online Feedback for Mobile Chat Apps

Losing Information: Example 1. Turkish feedback: “Surekli kullaniyorum.” Machine translation: “And a Scrambler.” Actual meaning: “I use it all the time.”

Losing Information: Example 2. Turkish feedback: “Insanlarla arani aciyor okunmadigi halde okundu demesi ilginç.” Machine translation: “People say, interesting read, even though it hurt okunmadigi arani” Actual meaning: “It creates rifts between people it’s interesting that it says read even though it hasn’t been.”

Losing Information: Example 3. Turkish feedback: “4 veriyorum çünku ses kalitesi iyi degil ugultulu ve gidiyor internet full oldugu halde duzeltme yapinn” Machine translation: “I'm not as good sound quality 4 because the buzzing and goes well with the internet full duzeltme yapinn” Actual meaning: “I give it a 4 because the sound quality isn’t good there’s buzzing and it cuts out even though the internet is full fix it”

Methodology: End-to-end data analysis. Data Loading, Data Cleansing, Data-Driven Analysis, Analyst-Driven Analysis, Visualizations.

Dictionaries & Indexing. Dictionaries of each language are stored and accessed separately. Each text analysis node accesses one set of dictionaries at a time. That language is either determined during implicit indexing or can be assigned explicitly using the Index node.

Dictionary Manager

Text Analysis Node Properties

Index Node

Best Practices. Run Language Detection. Filter data by language. Run separate analyses on each separate dataset in the original language for that dataset.

Language Detection

Feedback Languages: Focus on English, Russian, Turkish, and Chinese

Separate Analyses per Language
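
The three best-practice steps (detect the language, filter by language, analyze each subset in its original language) can be sketched outside PolyAnalyst as follows. This is a minimal Python illustration, not PolyAnalyst's implementation: the stopword lookup in detect_language is a toy stand-in for the Language Detection node, the hint words and all function names are assumptions, and the two sample texts are taken from the feedback examples earlier in the deck.

```python
from collections import Counter, defaultdict

# Hypothetical hint words; a real language detector is far more robust.
STOPWORD_HINTS = {
    "English": {"the", "it", "is", "and", "i"},
    "Turkish": {"ve", "bir", "bu", "çok", "için"},
}


def detect_language(text):
    """Step 1: return the language whose hint words overlap the text most."""
    tokens = set(text.lower().split())
    scores = {lang: len(tokens & hints) for lang, hints in STOPWORD_HINTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Unknown"


def split_by_language(records):
    """Step 2: filter the data into one dataset per detected language."""
    groups = defaultdict(list)
    for text in records:
        groups[detect_language(text)].append(text)
    return dict(groups)


def top_tokens(texts, n=3):
    """Step 3 (placeholder analysis): most frequent tokens in one dataset."""
    counts = Counter(tok for text in texts for tok in text.lower().split())
    return [tok for tok, _ in counts.most_common(n)]


feedback = [
    "I use it all the time.",
    "4 veriyorum çünku ses kalitesi iyi degil ugultulu ve gidiyor "
    "internet full oldugu halde duzeltme yapinn",
]

by_language = split_by_language(feedback)
# Each language-specific dataset is analyzed on its own, in its original language.
per_language_keywords = {lang: top_tokens(texts) for lang, texts in by_language.items()}
```

The point of the structure is that no text is translated before analysis: each subset is routed to language-appropriate processing, which is what avoids the translation losses shown in the Turkish examples.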

English Keywords

Top 5 English Keywords: update, message, excellent, phone, love

Turkish Keywords

Top 5 Turkish Keywords: message, great, super, error, recommend

Keywords by Language: Common keywords across languages

Keywords by Language. Keywords distinct to English: phone, version, crash, fix, voice, friend, chat

Keywords by Language. Keywords distinct to Turkish: error, notification, time, storage, recommendation, single
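
The common-versus-distinct comparison amounts to set intersection and set difference over the per-language keyword lists. The sketch below is an illustration only, with a hypothetical helper name (compare_keywords); it uses the top-5 lists quoted above, with Turkish keywords given as the English glosses shown on the slides. Because it sees only the top 5 per language, its "distinct" sets will not match the longer lists on the slides, which were presumably drawn from the full keyword output.

```python
def compare_keywords(keywords_by_language):
    """Return (common, distinct): keywords shared by every language,
    and keywords that appear in exactly one language."""
    all_sets = list(keywords_by_language.values())
    common = set.intersection(*all_sets) if all_sets else set()
    distinct = {}
    for lang, words in keywords_by_language.items():
        # Union of every other language's keywords.
        others = set().union(
            *(w for other, w in keywords_by_language.items() if other != lang)
        )
        distinct[lang] = words - others
    return common, distinct


# Top-5 keywords per language, as listed on the slides.
top5 = {
    "English": {"update", "message", "excellent", "phone", "love"},
    "Turkish": {"message", "great", "super", "error", "recommend"},
}

common, distinct = compare_keywords(top5)
```

With more than two languages, the same helper reports a keyword as common only if it occurs in every language, and as distinct only if no other language contains it.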

English Link Terms

Turkish Link Terms: package, storage, internet, enough, invalid, push, deliver, notification, send, memory, message, late, card

Analyst-Driven Taxonomy. For simultaneous highlighting in all languages: run the taxonomy separately on each language-specific dataset (using either one multilingual taxonomy built with <or>, or separate language-specific taxonomies), then merge the scored results.

Multilingual Taxonomy: Can run analyses in English, Chinese, and Russian

Merge Scored Results
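
The score-then-merge idea can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the taxonomy is a plain dictionary with one hypothetical category, matching is simple substring search rather than PDL, and the function names are invented. The sample texts reuse the Turkish feedback and its translations from the "Losing Information" examples.

```python
# One multilingual taxonomy: each category lists its terms per language.
TAXONOMY = {
    "Sound quality": {
        "English": ["sound quality", "buzzing"],
        "Turkish": ["ses kalitesi", "ugultulu"],
    },
}


def score_dataset(language, texts, taxonomy=TAXONOMY):
    """Score one language-specific dataset using only that language's terms."""
    rows = []
    for text in texts:
        lowered = text.lower()
        for category, terms_by_language in taxonomy.items():
            hits = [t for t in terms_by_language.get(language, []) if t in lowered]
            if hits:
                rows.append({
                    "language": language,
                    "category": category,
                    "matched": hits,
                    "text": text,
                })
    return rows


def merge_scored(datasets):
    """Run the taxonomy on each language's dataset, then merge the results."""
    merged = []
    for language, texts in datasets.items():
        merged.extend(score_dataset(language, texts))
    return merged


datasets = {
    "English": [
        "I give it a 4 because the sound quality isn't good there's buzzing and it cuts out",
        "I use it all the time.",
    ],
    "Turkish": [
        "4 veriyorum çünku ses kalitesi iyi degil ugultulu ve gidiyor "
        "internet full oldugu halde duzeltme yapinn",
        "Surekli kullaniyorum.",
    ],
}

merged = merge_scored(datasets)
```

Because every merged row keeps its language and original text, a drill-down on a category can show matches from all languages side by side.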

Multilingual Drill-Down: The drill-down contains matches in all 3 languages. English example

Multilingual Drill-Down: The drill-down contains matches in all 3 languages. Chinese example

Multilingual Drill-Down: The drill-down contains matches in all 3 languages. Russian example

OLAP: Topics by Language

Link Analysis: Topics by Language

Conclusion: PolyAnalyst allows you to run multilingual analyses in the original languages of the data. Work with multilingual datasets. Work in 14 different languages. Identify language-specific characteristics. Get the most information out of the data. Less subjective; avoid errors in translation.

Alternatives. Machine Translation API: Microsoft (current), SDL (upcoming)

Contacting Megaputer: Questions?