

SEQUENCE PACKAGE ANALYSIS: A New Natural Language Understanding Method for Performing Data Mining of Help-Line Calls and Doctor-Patient Interviews
AMY NEUSTEIN, Ph.D., LINGUISTIC TECHNOLOGY SYSTEMS
PRESENTATION TO NLUCS Workshop at ICEIS, University of Portugal, April 13, 2004

WHY DO WE NEED A NEW NATURAL LANGUAGE METHOD?
1) In the real world, speakers do not always use the “key” words that appear in the application vocabulary, which can lead to a poor word match between the user’s input and the application vocabulary.
2) Building a Statistical Language Model (SLM) that accommodates the various ways users speak requires a large data corpus that is costly to assemble, and still there is no guarantee that an accurate word match will be found.

APPLICATIONS OF SEQUENCE PACKAGE ANALYSIS: 1) An “add on” layer of intelligence to audio data mining programs used for recorded help-line calls to extract business intelligence data and to detect early warning signs of caller frustration. 2) An “add on” layer of intelligence for mining doctor-patient interviews to uncover important medical history data, often buried in the ambiguity of patient dialog.

How Does Sequence Package Analysis (SPA) Work?
SPA provides a “filter” for the front end of a speech recognizer, using generic templates that can be deployed in many different applications and languages; SPA can be used with vector-based models that hold spaces and determine “global weighting” of lexical items.
SPA parses NL dialog to locate a series of related turns that are discretely packaged as a sequence of conversational interaction.
SPA locates entire sequence packages rather than isolated key words, operating on the principle that it is easier to find a generic sequence package in a dialog than specific keywords. That is, speakers are more likely to vary in their choice of keywords than in their conversational sequence patterns, making it more difficult for a speech application to represent a speaker’s wide range of word choices than to represent actual conversational sequence patterns.

METHODOLOGICAL BASIS OF SPA SPA draws mainly from the field of conversation analysis: the study of the orderly properties of interactive dialog that revolve around the turn-taking process and other sequentially based features that are part of that process, such as the production of recycled turn beginnings when there is an overlap with a prior turn. SPA focuses on social action and how human-machine and human-human dialog is accomplished as a situated, interactive event. The discourse structures are therefore analyzed for their social interactive value rather than solely for their grammatical discourse structure.

ALGORITHMIC DESIGN OF SPA SPA algorithms, which are currently under development, consist of sequences that are either small segments of dialog or large sequences that can potentially span the entire dialog. But regardless of the size of the sequence package, the purpose of SPA is to locate the indigenous patterns in the dialog that evolve as the dialog unfolds. By using SPA to parse Natural Language dialog, those features which are evolving and dynamic (e.g., early warning signs of caller frustration; or a patient’s concerns about an illness) can be detected by grammars that are flexible enough to recognize dynamic patterns.

THE HEURISTIC VALUE OF SPA 1. Building Application Vocabularies: The SPA method of parsing dialog allows the discovery of new words, to be added to the application vocabulary, by locating the generic sequence packages in which such words appear. 2. Gathering Business Intelligence and Medical History Data: By tracking the nature and frequency of sequence packages, the system can identify important business intelligence data and medical history data that would have ordinarily eluded the system.

VALIDATION OF SEQUENCE PACKAGE ANALYSIS Does the addition of SPA improve speech recognition capabilities? Hypothesis “A”: By adding an SPA filter to a speech recognizer to improve analysis of speech input, one can significantly streamline the corpus of data required to build a Statistical Language Model. Hypothesis “B”: By adding an SPA filter to a Statistical Language Model that contains the full spectrum of possible utterances (as opposed to a streamlined corpus of data), the SLM can better differentiate among multiple utterances accepted by the recognizer.

USING SPA IN THE CALL CENTER: MINING HELP-LINE CALLS FOR BUSINESS INTELLIGENCE DATA
A caller needs a service call, but rather than use words in the application vocabulary such as “service call” or “technician,” this is what the frustrated caller says, either to the IVR-driven auto attendant at the help-line desk or to the human agent at the call center.
Caller: “I really can’t do this myself. I can’t get this to work without someone coming here. I really don’t know what to do with this.”

Finding the Sequence Package in the Dialog Example
The sequence package consists of a repeated use of pronouns (and similar unnamed referents), standing in place of nouns, in very close proximity:
1) a short, condensed complaint, referenced by pronouns (“I really can’t do this myself”);
2) the amplification of the source of the trouble (and the request for assistance), but with the frequent use of pronouns that have no stated subject/object referents (“I can’t get this to work without someone coming here”);
3) a recycling of the first part of the complaint with the same patterned use of pronouns in place of nouns (“I really don’t know what to do with this”).

FILTERING THE INPUT First, the SPA “filter” would direct the speech engine to the second part of the complaint utterance-- the amplification of the source of the trouble (and request for assistance): “I can’t get this to work without someone coming here” Second, rather than run the whole utterance through the SLM, only the second part of the complaint would be run through the SLM to find its closest statistical approximation. Third, once the closest word match is made to this second part of the complaint, the SLM would then add this “new” phrase to the application vocabulary.
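The filtering step can be illustrated with a rough sketch. The slides do not give an implementation (the SPA algorithms are described as under development), so everything below is an assumption made for illustration: the word list UNNAMED_REFERENTS, the function names, and the simple rule that a sequence package is three adjacent sentences that each contain at least one unnamed referent. The sketch returns the middle sentence, the amplification, which is the part the slides say would be passed on to the SLM.

```python
import re

# Hypothetical list of "unnamed referents" (pronouns and similar words that
# stand in for nouns). This list is an assumption, not taken from the slides.
UNNAMED_REFERENTS = {"this", "that", "it", "someone", "something", "here", "there"}


def split_sentences(utterance):
    """Split an utterance into sentences on '.', '?' or '!'."""
    return [s.strip() for s in re.split(r"[.?!]+", utterance) if s.strip()]


def referent_count(sentence):
    """Count the unnamed referents in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(1 for w in words if w in UNNAMED_REFERENTS)


def find_amplification(utterance):
    """Return the middle sentence of the first run of three adjacent
    sentences that each contain an unnamed referent, or None."""
    sentences = split_sentences(utterance)
    for i in range(len(sentences) - 2):
        window = sentences[i:i + 3]
        if all(referent_count(s) >= 1 for s in window):
            return window[1]
    return None


caller = (
    "I really can't do this myself. "
    "I can't get this to work without someone coming here. "
    "I really don't know what to do with this."
)
print(find_amplification(caller))
```

A real system would need a far richer notion of "referent with no stated antecedent" than a fixed word list; the point of the sketch is only that the template is defined by the pattern of turns, not by domain keywords such as “technician.”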

MINING HELP-LINE CALLS FOR SIGNS OF CALLER FRUSTRATION An SPA-driven mining program would look for conversational sequence patterns [instead of key words or changes in prosody] to detect signs of caller frustration. While speakers vary widely in their choice of words and in their stress patterns [some speakers may increase their pitch when upset while others may not], their conversational sequence patterns -- which are derived from the highly systematic properties that guide the production of talk-- nevertheless remain consistent across a wide spectrum of callers.

Early Signs of Caller Frustration: Australian Help-Line Desk
Caller: “I’ve installed Office 97 and…I was a bit stupid. I went into uninstall and um pulled off a whole stack of items off the uninstall and it was a very silly thing to do so now when I start up my computer I get a screen um which say um a black- a black and white screen which says never delete this item. It’s a message screen and every time I start up it comes up……[deleted text]……
Caller: “I’m wondering if I reinstall will I wipe out [my documents]”
Agent: “Okay, well look I could certainly have a technician look at the problem for you; we do charge for are you aware of that?”
Caller: “I’m just asking a question - I’m just wondering whether or not I should uninstall Microsoft Word?”

USING SPA TO LOCATE THE RELEVANT CONVERSATIONAL SEQUENCE PATTERNS
Step One: Locate the pre-question phrases that precede reports of troubles and requests for assistance: “I’m wondering if,” “I’m just asking a question,” “I’m just wondering whether or not.”
Step Two: Quantify the number of times such pre-question phrases occur and their proximity to one another.
Step Three: Determine whether they escalate or diminish over the course of the call.
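The three steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's algorithm: the regular expressions cover only the three phrases quoted on the slide, proximity is approximated by counting phrases within the same caller turn, and the names PRE_QUESTION_RE, count_pre_questions, and trend are hypothetical.

```python
import re

# Step One: patterns for the pre-question phrases quoted on the slide.
# A real template would need a much broader, generic set (assumption).
PRE_QUESTION_RE = re.compile(
    r"i'm (?:just )?wondering (?:if|whether)|i'm just asking a question",
    re.IGNORECASE,
)


def count_pre_questions(turn):
    """Step Two: count pre-question phrases in one caller turn.
    Phrases in the same turn are treated as being in close proximity."""
    normalized = turn.replace("\u2019", "'")  # accept curly apostrophes
    return len(PRE_QUESTION_RE.findall(normalized))


def trend(caller_turns):
    """Step Three: compare the first and last turns that contain
    pre-question phrases to see whether usage escalates or diminishes."""
    flagged = [c for c in map(count_pre_questions, caller_turns) if c > 0]
    if len(flagged) < 2:
        return "insufficient data"
    if flagged[-1] > flagged[0]:
        return "escalating"
    if flagged[-1] < flagged[0]:
        return "diminishing"
    return "steady"


caller_turns = [
    "I've installed Office 97 and I was a bit stupid.",
    "I'm wondering if I reinstall will I wipe out my documents",
    "I'm just asking a question - I'm just wondering whether or not "
    "I should uninstall Microsoft Word?",
]
print([count_pre_questions(t) for t in caller_turns])
print(trend(caller_turns))
```

Counting per turn is the simplest stand-in for proximity; a time-aligned transcript would allow measuring the gap between phrases in seconds instead.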

ANALYSIS
The caller to the Australian help-line began her report of the trouble as a long-winded narrative, but with the noticeable absence of a request for help. The caller later produced pre-question phrases when she made her request for help; however, these phrases began to escalate (by being combined with one another) just at the point where she began to show signs of frustration: “I’m just asking a question - I’m just wondering whether or not I should uninstall Microsoft Word?”
As one can see, such conversational sequence patterns evolve within the dynamic flow of dialog. By applying an SPA approach, one can pinpoint these indigenous features of talk that evade standard speech recognizers.

MINING MEDICAL INTERVIEWS THE PROBLEM: Patients often give very important medical history data about themselves and other family members at the wrong place in the medical encounter (such as at the very end of the medical interview or during a routine physical exam) when the doctor is less likely to be paying attention in that he has already gone over those areas with the patient. When patients give medical information at the wrong place in the interview, the data can be lost because the doctor’s attention is now focused on other things.

MEDICAL INTERVIEWS The Solution: SPA locates specific conversational sequence patterns in which crucial medical history data is embedded. By locating those sequence package templates, important medical history data can be extracted--similar to the way business intelligence data can be extracted from help-line calls.

ILLUSTRATION Patient withholds vital family history data about osteosarcoma (bone cancer). Patient discloses this information at the point in the medical encounter (viz., during a brief medical exam) when discussions of family history data were no longer the main topic. Patient embeds this history data about bone cancer in the form of a narrative -- as if she were casually telling a “story” to a neighbor or friend-- presumably hoping that by downplaying its significance the doctor would give it much less attention than had she come out with it directly when queried about family illnesses.

DIALOG SAMPLE Patient: “I become terribly worried about my pain, which reminds me of the arthritic pain that my sister had, which turned out to be bone cancer, so I worry whenever I have pain because I don’t know if it is what she had.”

THE SEQUENCE PACKAGE TEMPLATE: A HIGH USAGE OF NARRATIVE PHRASES IN CLOSE PROXIMITY
SEQUENCE PACKAGE DIVIDED INTO 4 PARTS:
1) a short, condensed, and somewhat nonspecific concern preceded by a narrative phrase: “I become terribly worried about my pain”
2) an expansion of the concern, citing the troublesome datum (“bone cancer”), which is embedded within two narrative predicates: “which reminds me of the arthritic pain that my sister had, which turned out to be bone cancer”

SEQUENCE PACKAGE, CONT.
3) a recycling of the nonspecific concern preceded by a narrative phrase: “so I worry whenever I have pain”
4) a reference back to the expanded concern, but only with the use of pronouns that serve as anaphors: “because I don’t know if it is what she had”

EXTRACTING MEDICAL HISTORY DATA BY USING SPA
The SPA “filter” would direct the speech engine to search for specific content material embedded within the two narrative predicates that appear in the second part of the four-part sequence package (“which reminds me of…which turned out to be...”).
By searching the sequence package templates, the mining program uncovers important family history data (arthritic pain, ultimately diagnosed as bone cancer) that the patient buried in the interview by using an informal narrative style, replete with anaphors and nonspecific referents, and by offering this family history data AFTER the physician had already completed his review of family medical history.
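As a rough illustration of searching within the two narrative predicates, the sketch below uses a regular expression keyed to the exact phrases in the dialog sample. This is an assumption made for the example: the slides describe generic templates, whereas NARRATIVE_RE and extract_history are hypothetical names and match only this one wording.

```python
import re

# Template for part two of the four-part sequence package: two narrative
# predicates with the troublesome datum embedded in them. The literal
# phrases come from the dialog sample; a generic template would need to
# cover many other narrative predicates (assumption).
NARRATIVE_RE = re.compile(
    r"which reminds me of (?P<reminder>.+?),? "
    r"which turned out to be (?P<outcome>[^,.;]+)",
    re.IGNORECASE,
)


def extract_history(utterance):
    """Return the content embedded in the two narrative predicates,
    or None if the template is not found."""
    match = NARRATIVE_RE.search(utterance)
    if match is None:
        return None
    return {
        "reminder": match.group("reminder").strip(),
        "outcome": match.group("outcome").strip(),
    }


patient = (
    "I become terribly worried about my pain, which reminds me of the "
    "arthritic pain that my sister had, which turned out to be bone cancer, "
    "so I worry whenever I have pain because I don't know if it is what "
    "she had."
)
print(extract_history(patient))
```

The extracted fields would still need to be interpreted downstream (for example, linking “my sister” to family history); the sketch only shows how the template narrows the search to the span where the datum sits.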

Mining Wiretapped Communications
The following example shows how, by applying an SPA approach to wiretapped dialog, one can flag important security information that is cleverly disguised by the suspects:

ILLUSTRATION Speaker “A” is trying to educate Speaker “B” about a new meeting place whose location is very important. Any confusion or misunderstanding about this meeting place could spoil the plans. But Speaker “A” is very clever: First, he stays away from buzz words (such as naming a bridge, a tunnel or a street). Second, he refrains from making any comments about how vital it is to get these instructions right.

Dialog Example
Speaker “A”: Come to the intersection near Juniors? (the question mark shows an upward intonation; the speaker then pauses briefly)
Speaker “B”: 1.2 second pause
Speaker “A”: You know the thoroughfare with the big traffic light?
Speaker “B”: Juniors, yeah.

THE SEQUENCE PACKAGE
Speaker “A”: Come to the intersection near Juniors?
Speaker “B”: 1.2 seconds of silence
A noun referent (“Juniors”) with an upward intonation.
A brief pause, giving the listener the chance to show recognition or ask for clarification.
Silence by the listener, which indicates lack of understanding or confusion.

SEQUENCE PACKAGE CONT.
Speaker “A”: You know the thoroughfare with the big traffic light?
Speaker “B”: Juniors, yeah.
Speaker “A” produces a clarification of the noun referent (“Juniors”): “You know the thoroughfare with...”
Speaker “B” produces a repeat of the noun referent (“Juniors”), the source of the recognition trouble, followed by a recognitional marker (“yeah”), which demonstrates to Speaker “A” that he has “corrected” the misunderstanding.
But had he simply produced a recognitional marker (“yeah”) without mentioning the source of the trouble (“Juniors”), there would be no indication to the other speaker that he now recognizes the importance of this meeting place.
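The four-turn pattern described above can be sketched as a check over a turn-structured transcript. All names here (Turn, find_repaired_referent, RECOGNITION_MARKERS, SILENCE_THRESHOLD) are hypothetical, and several simplifications are assumed: a trailing question mark stands for upward intonation, as in the transcript convention on the slide; the final word of the first turn is taken as the noun referent; and a one-second threshold separates a meaningful silence from a normal gap.

```python
import re
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    text: str = ""
    silence: float = 0.0  # length in seconds when the turn is a silent gap


# Assumed values, chosen for illustration only.
RECOGNITION_MARKERS = {"yeah", "yes", "right", "okay"}
SILENCE_THRESHOLD = 1.0


def words(text):
    return re.findall(r"[a-z']+", text.lower())


def find_repaired_referent(turns):
    """Look for: referent with upward intonation, listener silence,
    clarification by the first speaker, then a repeat of the referent plus
    a recognitional marker. Return the referent, or None."""
    for i in range(len(turns) - 3):
        first, gap, clarification, uptake = turns[i:i + 4]
        first_words = words(first.text)
        if not first.text.rstrip().endswith("?") or not first_words:
            continue
        if gap.speaker == first.speaker or gap.silence < SILENCE_THRESHOLD:
            continue
        if clarification.speaker != first.speaker:
            continue
        if uptake.speaker != gap.speaker:
            continue
        uptake_words = words(uptake.text)
        referent = first_words[-1]
        if referent in uptake_words and RECOGNITION_MARKERS & set(uptake_words):
            return referent
    return None


dialog = [
    Turn("A", "Come to the intersection near Juniors?"),
    Turn("B", silence=1.2),
    Turn("A", "You know the thoroughfare with the big traffic light?"),
    Turn("B", "Juniors, yeah."),
]
print(find_repaired_referent(dialog))
```

Note that a bare “yeah” from Speaker “B” would not be flagged, which mirrors the distinction drawn on the slide between a recognitional marker alone and a marker that follows a repeat of the trouble source.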

CODA
SPA provides a new NLU method for designing intelligent software packages that can serve as a “filter” for the front end of a speech recognizer. Since the SPA templates are generic, they can be deployed in many different applications and across many languages to do the following:
1) extract business intelligence data from call center recordings;
2) detect early warning signs of caller frustration in a help-line call;
3) uncover important medical history data buried in the medical interview; and
4) learn the plans and operations of suspected terrorists.