LASI: Linguistic Analysis for Subject Identification
Milestone Presentation
Presented by: CS410 Red Group
September 20, 2018
Outline
- Team Red Staff Chart
- Introduction
- Problem Statement
- LASI in our Case Study
- Functional Components
- Algorithms
- Milestones
- Document Parsing
- Weighter
- GUI Flow
- GUI Screenshots
- Risk Matrix
- Competition Matrix
- Conclusion
Team Red Staff Chart
- Scott Minter: Project Co Leader, Software Specialist
- Brittany Johnson: Project Co Leader, Documentation Specialist
- Dustin Patrick: Algorithm Specialist, Expert Liaison
- Richard Owens: Documentation Specialist, Communication Specialist
- Aluan Haddad: Algorithm Specialist, Software Specialist
- Erik Rogers: Marketing Specialist, GUI Developer
What is LASI?
LASI: Linguistic Analysis for Subject Identification
[Diagram labels: LASI, THEMES]
LASI Identifies Themes (5 W’s & 1 H)
- Who
- What
- When
- Where
- Why
- How
Why are themes important?
- Comprehension
- Summarization
- Assists in communication between people
Societal Problem
It is difficult for people to identify a common theme over a large set of documents in a timely, consistent, and objective manner.
Our Proposed Solution
LASI is a linguistic analysis decision support tool used to help determine a common theme across multiple documents. Our goals for LASI are to:
- find themes accurately
- use system resources efficiently
- provide consistent results
What do we mean by “linguistic analysis”?
The contextual study of written works and how the words combine to form an overall meaning.
Dr. Patrick Hester & Dr. Tom Meyers: The AID Process (Assessment, Improvement, Design)
Dr. Hester & Dr. Meyers are systems analysts and researchers for NCSOSE. They:
- Conduct extensive research
- Quickly become familiar with client systems
- Formulate concise, objective assessments
Before LASI
[Flowchart nodes: Customer Contact; Will NCSOSE be needed? (yes/no); Client Goes Elsewhere; Situational Awareness Meeting; Document Gathering Process; Document Analysis; Problem Statement Presentation; Is the Customer satisfied? (yes/no); Continue on to the rest of the A.I.D. Process]
After LASI
[Flowchart nodes: Customer Contact; Will NCSOSE be needed? (yes/no); Client Goes Elsewhere; Situational Awareness Meeting; Document Gathering Process; Document Analysis; Problem Statement Presentation; Is the Customer satisfied? (yes/no); Continue on to the rest of the A.I.D. Process]
Major Functional Components
Hardware: High End Notebook PC (~$1500 USD)
- Computation: Quad-Core CPU
- Primary Memory: 8.0 GB DDR3 RAM
- Document Storage: Solid State Storage
Software
- Algorithm: Extrapolates the most likely congruence of themes and ideas across all documents in the input domain
- User Interface: Multi-Level Views, Weighted Phrase List, Detailed Breakdown, Step by Step Justification
Linguistic Analysis Algorithm
Primary Analysis: Word Count and Syntactic Assessment
- Traverse Document in Word-Wise Manner
- Identify Corresponding Parts of Speech
- Determine Frequency by Grammatical Role
Secondary Analysis: Associative Identification
- Bind Pronouns to Nouns, Updating Frequency
- Bind Adjectives to Nouns
- Identify Potential Noun Phrases
Tertiary Analysis: Semantic Relationship Assessment
- Identify Potential Synonyms
- Assess Potential Subject-Object-Verb Relationships
- Output List of Weighted Themes
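As a rough illustration of the primary analysis phase (word-wise traversal, part-of-speech lookup, frequency by grammatical role) feeding a weighted theme list, a minimal Python sketch might look like the following. This is not LASI's implementation: the tiny lexicon `TOY_LEXICON`, the weights in `ROLE_WEIGHTS`, and the function names are all hypothetical stand-ins, and a real system would use a proper part-of-speech tagger and the secondary and tertiary phases as well.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_LEXICON = {
    "the": "DET", "a": "DET",
    "ship": "NOUN", "crew": "NOUN", "port": "NOUN",
    "leaves": "VERB", "repairs": "VERB",
    "old": "ADJ",
}

# Hypothetical weights per grammatical role; these values are assumptions,
# not figures taken from the presentation.
ROLE_WEIGHTS = {"NOUN": 3.0, "VERB": 2.0, "ADJ": 1.0}


def tokenize(text):
    """Split text into lowercase word tokens (word-wise traversal)."""
    return re.findall(r"[a-z']+", text.lower())


def weighted_themes(documents):
    """Count words by grammatical role across all documents and weight them."""
    counts = Counter()
    for doc in documents:
        for word in tokenize(doc):
            role = TOY_LEXICON.get(word)
            if role in ROLE_WEIGHTS:  # skip unknown words and unweighted roles
                counts[(word, role)] += 1
    scored = [(word, role, n * ROLE_WEIGHTS[role])
              for (word, role), n in counts.items()]
    # Highest weight first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda item: (-item[2], item[0]))


if __name__ == "__main__":
    docs = ["The crew repairs the old ship.", "The ship leaves the port."]
    for word, role, weight in weighted_themes(docs):
        print(f"{word:8s} {role:5s} {weight}")
```

With the two sample sentences, "ship" appears twice as a noun and so ranks first. The choice to weight nouns above verbs and adjectives is only one plausible way to turn role frequencies into a ranking.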
LASI Milestones
Document Parsing
Weighter
GUI Flow
Splash Screen
New Project Screen
Results Page
Risk Matrix
Customer Risks
- C1 -- Product Interest
- C2 -- Maintenance
- C3 -- Trust
Technical Risks
- T1 -- System Limitations
- T2 -- Scanned Text Recognition
- T3 -- Jargon Recognition
- T4 -- Illegal Character Handling
Customer Risks
C1. Product Interest (Probability 2, Impact 4)
Mitigation: LASI offers unique functionality and user-friendliness.
C2. Maintenance (Probability 3, Impact 2)
Mitigation: LASI will be a free, open source application, allowing the community to maintain and extend it over time.
C3. Trust (Probability 3, Impact 3)
Mitigation: LASI will provide a step by step breakdown of output analysis and algorithm reasoning.
Technical Risks
T1. System Limitations (Probability 4, Impact 2)
Mitigation: LASI will be designed from the ground up in native C++ for memory- and CPU-efficient code.
T2. Scanned Text Recognition (Probability 4, Impact 3)
Mitigation: LASI will implement an optical character recognition algorithm to handle scanned text.
Technical Risks
T3. Jargon Recognition (Probability 3, Impact 2)
Mitigation: LASI will have domain specific dictionaries and feature intuitive contextual inference.
T4. Illegal Character Handling (Probability 4, Impact 2)
Mitigation: LASI will provide contextual inference, synonym recognition, and statistical methods.
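One common way to compare risks like these is to multiply probability by impact and rank the results. The slides do not state how the two ratings are combined, so the product used below is an assumption, as are the names `RISKS` and `rank_risks`. The probability and impact values are the ones given on the customer and technical risk slides.

```python
# Probability and impact ratings as listed on the risk slides.
RISKS = {
    "C1 Product Interest": (2, 4),
    "C2 Maintenance": (3, 2),
    "C3 Trust": (3, 3),
    "T1 System Limitations": (4, 2),
    "T2 Scanned Text Recognition": (4, 3),
    "T3 Jargon Recognition": (3, 2),
    "T4 Illegal Character Handling": (4, 2),
}


def rank_risks(risks):
    """Score each risk as probability * impact (an assumed formula) and sort."""
    scored = [(name, prob * impact) for name, (prob, impact) in risks.items()]
    # Highest score first; ties broken by name for stable output.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for name, score in rank_risks(RISKS):
        print(f"{score:2d}  {name}")
```

Under this assumed scoring, T2 (Scanned Text Recognition) has the highest score at 12, followed by C3 (Trust) at 9.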
The Competition
Conclusion
- There is a need for LASI
- LASI is an algorithm-heavy program
- Success is beneficial to anyone needing to analyze large sets of documents in a timely, consistent, and objective manner
References
"Patrick Hester." Old Dominion University. N.p., n.d. Web. 24 Sept. 2012. <http://www.odu.edu/directory/people/p/pthester>.
"Tom Meyers." NCSOSE. N.p., n.d. Web. 22 Nov. 2012. <http://www.ncsose.org/index.php?option=com_jresearch>.
Stanislaw Osinski, Dawid Weiss. Carrot 2. 13 August 2012. Web. 25 Sept. 2012. <http://project.carrot2.org>.
"WordStat." Provalis Research. Web. 24 Sept. 2012. <http://provalisresearch.com/products/content-analysis-software/>.
"ReadMe: Software for Automated Content Analysis." Web. 24 Sept. 2012. <http://gking.harvard.edu/node/4520/rbuild_documentation/readme.pdf>.
"AlchemyAPI Overview." AlchemyAPI. N.p., n.d. Web. 19 Oct. 2012. <http://www.alchemyapi.com/api/>.
"AutoMap." Project. N.p., n.d. Web. 19 Oct. 2012. <http://www.casos.cs.cmu.edu/projects/automap/>.
"CL Research Home Page." CL Research Home Page. N.p., n.d. Web. 19 Oct. 2012. <http://www.clres.com/>.