L.A.S.I.: Linguistic Analysis for Subject Identification
Feasibility Presentation
Presented by: CS410 Red Group
November 12, 2012
Outline
- Team Red Staff Chart
- Introduction
- Societal Problem
- Case Study
- Proposed Solution
- Major Component Diagram
- Algorithm
- The Competition
- Risk
- Conclusion
Team Red Staff Chart
- Scott Minter: Project Co-Leader, Software Specialist
- Brittany Johnson: Project Co-Leader, Documentation Specialist
- Dustin Patrick: Algorithm Specialist, Expert Liaison
- Richard Owens: Documentation Specialist, Communication Specialist
- Aluan Haddad: Algorithm Specialist, Software Specialist
- Erik Rogers: Marketing Specialist, GUI Developer
What is a theme?
A specific and distinctive quality, characteristic, or concern. [1]
[1] “Theme,” Merriam-Webster
What are you looking for when you are identifying a theme?
5 W’s & 1 H
- Who
- What
- When
- Where
- Why
- How
Bill’s stove was broken. He had been saying for months that he would go to the appliance store to buy a new one. He had some free time yesterday, so he drove to the store to buy a new stove.
Who: Bill
What: He travelled to some place
When: Yesterday
Where: The store
Why: To buy a stove because his broke
How: By driving
The Theme from the 5 W’s & 1 H
Bill drove to the store yesterday to buy a new stove because his broke.
Why are themes important?
- Comprehension
- Summarization
- Assists in communication between people
Societal Problem
It is difficult for people to identify a common theme over a large set of documents in a timely, consistent, and objective manner.
How long does it take?
- Finding a theme over multiple documents is a time-consuming process.
- The average reading speed of an adult is 250 words per minute. [2]
[2] Thomas, "What Is the Average Reading Speed and the Best Rate of Reading?"
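To make the time cost concrete, the following minimal sketch estimates how long one pass through a document set takes at the cited 250 words per minute. The word counts in `corpus` are hypothetical values chosen for illustration, and `reading_minutes` is a helper name introduced here, not something from the presentation.

```python
WORDS_PER_MINUTE = 250  # average adult reading speed cited on the slide


def reading_minutes(word_counts, wpm=WORDS_PER_MINUTE):
    """Return the minutes needed to read every document once."""
    return sum(word_counts) / wpm


# Hypothetical corpus: word counts for three documents (illustrative only).
corpus = [5000, 12500, 7500]
minutes = reading_minutes(corpus)
print(f"{sum(corpus)} words take about {minutes:.0f} minutes ({minutes / 60:.2f} hours)")
# prints "25000 words take about 100 minutes (1.67 hours)"
```

This covers reading time only; assessing and interrelating the documents would add to it.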
Consistency and Objectivity
- The criteria for evaluation may vary from person to person.
- Large quantities of documents must be mentally digested, assessed, and interrelated.
Dr. Patrick Hester
“My research interests include multi-objective decision making under uncertainty, probabilistic and non-probabilistic uncertainty analysis, critical infrastructure protection, and decision making using modeling and simulation.” [3] - Dr. Hester
- Ph.D. from Vanderbilt University, 2007
- Major: Risk and Reliability Engineering and Management
[3] Patrick Hester Website
Dr. Hester is a systems analyst and researcher
▫ He must:
- Conduct extensive research
- Quickly become familiar with client systems
- Formulate concise, objective assessments
LASI will help with all of this.
Assessment Improvement Design (A.I.D.)
- A preliminary problem statement is identified from the documents
- The problem statement is then used to find Critical Operational Issues (COIs)
- COIs are used to find Measures of Effectiveness (MOEs)
- MOEs are used to find Measures of Performance (MOPs)
Current Method
Customer Contact -> Situational Awareness Meeting -> Will NCSOSE be needed? (no: Client Goes Elsewhere; yes: continue) -> Document Gathering Process -> Document Analysis -> Problem Statement Presentation -> Is Customer satisfied? (no: loop back; yes: Continue on to the rest of the A.I.D. Process)
LASI: Linguistic Analysis for Subject Identification
[Diagram labels: THEMES, LASI]
Our Proposed Solution
LASI is a linguistic analysis decision support tool used to help determine a common theme across multiple documents.
It is our goal with LASI to:
▫ accurately find themes
▫ be system efficient
▫ provide consistent results
What do we mean by “linguistic analysis”?
The contextual study of written works and how the words combine to form an overall meaning.
Linguistic analysis involves
Syntactic
- Logical grammar
- Statistical Data
- Alphabetical Frequencies
- Word Counts
- Parts of Speech
- Word Dependencies
Semantic
- Relating syntactic structures to language-independent meanings
- Extracting meaning and conceptual arguments
- Summarization
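The statistical items on this slide (word counts and alphabetical frequencies) can be sketched in a few lines. This is only an illustration under simple assumptions: words are runs of ASCII letters and apostrophes, and the function names `tokenize`, `word_counts`, and `letter_frequencies` are hypothetical, not part of LASI.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Count how many times each word occurs."""
    return Counter(tokenize(text))


def letter_frequencies(text):
    """Return the relative frequency of each letter in the text."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    return {ch: n / total for ch, n in Counter(letters).items()}


sample = "Bill drove to the store to buy a new stove."
print(word_counts(sample).most_common(1))  # [('to', 2)]
print(sorted(letter_frequencies(sample).items())[:3])
```

Raw counts like these are dominated by function words such as "to", which is one reason the syntactic side also lists parts of speech and word dependencies.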
The Wills and Will Nots of LASI
What LASI Will Do
- Analyze multiple documents to find common themes
- Provide statistical data to help a user make a decision
What LASI Will Not Do
- Provide a concise synopsis
- Provide a single theme
Who Would This Appeal To?
- Researchers
- Consultants
- Academics
- Students
Benefits To The Customer
- Time saving
- Objective output
- Consistent output
- Cost saving solution
How does LASI fit into our Case Study?
Before LASI
Customer Contact -> Situational Awareness Meeting -> Will NCSOSE be needed? (no: Client Goes Elsewhere; yes: continue) -> Document Gathering Process -> Document Analysis -> Problem Statement Presentation -> Is the Customer satisfied? (no: loop back; yes: Continue on to the rest of the A.I.D. Process)
After LASI
Customer Contact -> Situational Awareness Meeting -> Will NCSOSE be needed? (no: Client Goes Elsewhere; yes: continue) -> Document Gathering Process -> LASI Aided Document Analysis -> Problem Statement Presentation -> Is the Customer satisfied? (no: loop back; yes: Continue on to the rest of the A.I.D. Process)
Major Functional Components
Software
User Interface:
- Multi-Level Views
- Weighted Phrase List
- Detailed Breakdown
- Step by Step Justification
Algorithm: Extrapolates the most likely congruence of themes and ideas across all documents in the input domain
Hardware
High End Notebook PC (~$1500 USD)
- Computation: Quad-Core CPU
- Primary Memory: 8.0 GB DDR3 RAM
- Document Storage: Solid State Storage
Linguistic Analysis Algorithm
Primary Analysis: Word Count and Syntactic Assessment
- Traverse Document in Word-Wise Manner
- Identify Corresponding Parts of Speech
- Determine Frequency by Grammatical Role
Secondary Analysis: Associative Identification
- Bind Pronouns to Nouns, Updating Frequency
- Identify Potential Noun Phrases
- Bind Adjectives to Nouns
Tertiary Analysis: Semantic Relationship Assessment
- Identify Potential Synonyms
- Assess Potential Subject-Object-Verb Relationships
Output: List of Weighted Themes
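The first two phases can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration: `TOY_POS` is a hand-written lexicon standing in for a real part-of-speech tagger, pronouns are bound to the nearest preceding noun (a crude heuristic), weights are simple shares of noun mentions, and the tertiary (synonym and relationship) phase is omitted. It is not the team's algorithm.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_POS = {
    "bill": "NOUN", "stove": "NOUN", "store": "NOUN",
    "he": "PRON",
    "drove": "VERB", "buy": "VERB", "was": "VERB",
    "new": "ADJ", "tired": "ADJ",
}


def primary_analysis(text):
    """Traverse the text word by word, tag each word, and count by role."""
    tagged = []
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        tag = TOY_POS.get(word, "OTHER")
        tagged.append((word, tag))
        counts[(word, tag)] += 1
    return tagged, counts


def secondary_analysis(tagged, counts):
    """Bind each pronoun to the nearest preceding noun and credit that noun."""
    last_noun = None
    for word, tag in tagged:
        if tag == "NOUN":
            last_noun = word
        elif tag == "PRON" and last_noun is not None:
            counts[(last_noun, "NOUN")] += 1
    return counts


def weighted_themes(counts):
    """Rank nouns by their share of all noun mentions, highest first."""
    nouns = {word: n for (word, tag), n in counts.items() if tag == "NOUN"}
    total = sum(nouns.values())
    return sorted(((word, n / total) for word, n in nouns.items()),
                  key=lambda pair: pair[1], reverse=True)


text = "Bill was tired. He drove to the store to buy a new stove."
tagged, counts = primary_analysis(text)
themes = weighted_themes(secondary_analysis(tagged, counts))
print(themes)  # [('bill', 0.5), ('store', 0.25), ('stove', 0.25)]
```

Binding "He" to "Bill" raises that noun's weight from one mention to two, which is the effect the "Updating Frequency" step describes.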
The Competition
WordStat
Stanford CoreNLP
ReadMe
AutoMap
Risk Matrix
Customer Risks
- C1 -- Product Interest
- C2 -- Maintenance
- C3 -- Trust
Technical Risks
- T1 -- System Limitations
- T2 -- Scanned Text Recognition
- T3 -- Jargon Recognition
- T4 -- Illegal Character Handling
Customer Risks
C1. Product Interest (Probability 2, Impact 4)
Mitigation: LASI offers unique functionality and user friendliness.
C2. Maintenance (Probability 3, Impact 2)
Mitigation: LASI will be a free, open source application, allowing the community to maintain and extend it over time.
C3. Trust (Probability 3, Impact 3)
Mitigation: LASI will provide a step by step breakdown of output analysis and algorithm reasoning.
Technical Risks
T1. System Limitations (Probability 4, Impact 2)
Mitigation: LASI will be designed from the ground up in native C++ for memory and CPU efficient code.
T2. Scanned Text Recognition (Probability 4, Impact 3)
Mitigation: LASI will implement an optical character recognition algorithm to handle scanned text.
Technical Risks (continued)
T3. Jargon Recognition (Probability 3, Impact 2)
Mitigation: LASI will have domain specific dictionaries and feature intuitive contextual inference.
T4. Illegal Character Handling (Probability 4, Impact 2)
Mitigation: LASI will provide contextual inference, synonym recognition, and statistical methods.
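One common way to compare entries in a risk matrix is to multiply probability by impact. The slides do not state how the two ratings are combined, so the scoring rule below is an assumption; the probability and impact values are the ones given on the customer and technical risk slides.

```python
# (probability, impact) pairs copied from the risk slides.
RISKS = {
    "C1 Product Interest": (2, 4),
    "C2 Maintenance": (3, 2),
    "C3 Trust": (3, 3),
    "T1 System Limitations": (4, 2),
    "T2 Scanned Text Recognition": (4, 3),
    "T3 Jargon Recognition": (3, 2),
    "T4 Illegal Character Handling": (4, 2),
}


def risk_score(probability, impact):
    """Assumed scoring rule: probability multiplied by impact."""
    return probability * impact


# Rank risks from highest to lowest score under the assumed rule.
ranked = sorted(RISKS.items(), key=lambda item: risk_score(*item[1]), reverse=True)
for name, (probability, impact) in ranked:
    print(f"{name}: {risk_score(probability, impact)}")
```

Under this assumed rule, T2 (Scanned Text Recognition) scores highest at 12, followed by C3 (Trust) at 9.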
Conclusion
- LASI is feasible.
- LASI is a decision support tool, not a decision making tool.
- The implications of success affect a wide range of fields of study and professions.
- In order for LASI to succeed, the output needs to be immediately usable and the interface user-friendly.
References
1. "Theme." Def. 1b. Merriam-Webster. N.p., n.d. Web. 19 Oct
2. Thomas, Mark. "What Is the Average Reading Speed and the Best Rate of Reading?" What Is the Average Reading Speed and the Best Rate of Reading? Web. 19 Oct
3. "Patrick Hester." Old Dominion University. N.p., n.d. Web. 24 Sept
4. Stanislaw Osinski, Dawid Weiss. 13 August, Carrot 2. 9/25/2012.
5. "WordStat." Provalis Research. Web. 24 Sept
6. "ReadMe: Software for Automated Content Analysis." Web. 24 Sept
7. "AlchemyAPI Overview." AlchemyAPI. N.p., n.d. Web. 19 Oct
8. "AutoMap:." Project. N.p., n.d. Web. 19 Oct
9. "CL Research Home Page." CL Research Home Page. N.p., n.d. Web. 19 Oct