ACITA 12 demo outline v0
Dr David Mott (IBM UK)
International Technology Alliance in Network & Information Sciences

2 Supporting the "Analyst"
[Diagram labels: doc27, other data, reference data, structured data, linked data web, NLP, CE tools, CE facts, Analysts' Conceptual Model, assumptions, uncertainty, inference, rationale, argumentation, query, requirements, product]
The analyst does not have time to read all the reports.

3 Purpose
–To demonstrate our current capability in Fact Extraction using CE
–To demonstrate the cycle of creating new analysts' concepts and inferring high-value information
–To explore how a man-machine interface may be built based on CE, rationale, and a mixture of analysts' reasoning and linguistic reasoning
–[To explore the potential benefits and costs of a phrase-based approach to NLP (as opposed to textual patterns).]

4 Approach - technology
[Diagram labels: Analysts' Conceptual Model (including rules), CE, product, other data, reference data, linked data web]

5 Approach
Select a small set of sentences and see what could be inferred from them:
–Conceptual model
–Analyst rules to perform forensic analysis
–Linguistic rules to support extraction of relevant facts from reports
List what type of information would be needed to support this in the general case
–VerbNet, …
Design an approach to generalise this example
–Drives next year's research
Run this general approach on all sentences and see what happens
–Do we need to change the set of sample sentences?
Not just: write some linguistic rules and see what happens

6 Scenario - input
Use SYNCOIN reports
–These include call monitoring and the reported texts of the conversations.
–A SYNCOIN report may contain several individual sentences, e.g.:
'02/24/10 - RT: 2345hrs -- (Delayed report) - Cell phone call monitored on 02/23/10; ET: 0957hrs from an unidentified male (7001408055) in Rashid to an unidentified male (7678112233) in Amin-Habib. The call came immediately following an IED attack of U.S. convoy on Airport Road. The two parties were arguing, the caller stated: "Your team is a failure, we cannot operate this way." The recipient replied: "The materials must have been defective, the design was perfect."
–Reports mix grammatical and informal text, including specific abbreviations (e.g. RT, ET).
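Because one SYNCOIN report may hold several sentences, including quoted speech, a first step is to split the report without breaking the quotations. The Python sketch below is an illustration under stated assumptions, not the demo's implementation: the function `split_report` and the `ABBREVIATIONS` set are hypothetical, and it assumes quotations use straight double quotes and that abbreviations such as "U.S." are known in advance.

```python
# Hypothetical list of abbreviations whose full stop does not end a sentence.
ABBREVIATIONS = {"U.S.", "Dr.", "Mr."}


def split_report(report: str) -> list:
    """Split a SYNCOIN report into sentences, keeping quoted speech intact.

    Assumes straight double quotes and whitespace-separated tokens; runs of
    whitespace are normalised to single spaces in the output.
    """
    sentences, current = [], []
    in_quote = False
    for token in report.split():
        current.append(token)
        # A token with an odd number of '"' opens or closes a quotation.
        if token.count('"') % 2 == 1:
            in_quote = not in_quote
        bare = token.rstrip('"')
        # End a sentence only outside quotes and not at a known abbreviation.
        if not in_quote and bare.endswith((".", "!", "?")) and bare not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without final punctuation
        sentences.append(" ".join(current))
    return sentences
```

Applied to the example report, this sketch yields four sentences; the full stop in "U.S." does not end a sentence because it is listed as an abbreviation.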

7 Scenario – Analyst’s task
Initially we analyse text for information on conversations
–Should be useful to find who talks to whom
Then the analyst has an idea for a new concept:
–“replay conversation”, where A talks to B, then B talks to C
–Might be useful to track conversations and the flow of information?
The analyst creates the concept and the system analyses the SYNCOIN data
–Some replay messages are detected and shown to the analyst for manual inspection
–But this also leads to an interesting “Rosetta Stone” effect of decoding some informal message codes (see below)
–Earlier reports now have increased forensic value: the system helps the analyst locate earlier reports that were previously unnoticed and decodes their meaning in the light of this knowledge
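The "replay conversation" concept (A talks to B, then B talks to C) can be sketched as a search over pairs of monitored calls. This is a minimal sketch, assuming parties are identified by phone number and that the relayed call must follow within a time window; the `Call` class, `find_replays` and the 7-day `max_gap` default are all hypothetical choices, not part of the demo.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Call:
    caller: str        # phone number of the party making the call
    recipient: str     # phone number of the party receiving the call
    time: datetime     # (estimated) time of the call
    text: str = ""


def find_replays(calls, max_gap=timedelta(days=7)):
    """Return (call1, call2) pairs matching 'A talks to B, then B talks to C'.

    max_gap is a hypothetical limit on how long after call1 the relayed
    call2 may occur.
    """
    replays = []
    for call1 in calls:
        for call2 in calls:
            if (call2.caller == call1.recipient             # B is the middle person
                    and call2.recipient != call1.caller     # C is not A
                    and timedelta(0) < call2.time - call1.time <= max_gap):
                replays.append((call1, call2))
    return replays
```

For example, a call from 7001408055 to 7678112233 on 02/23/10 followed by a call from 7678112233 to 7115452376 on 02/24/10 is returned as one replay pair, with 7678112233 as the middle person; an earlier call from 7678112233 does not match because it precedes the first call.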

8 The “carpet code”
Conversation 1:
–'02/24/10 - RT: 2345hrs -- (Delayed report) - Cell phone call monitored on 02/23/10; ET: 0957hrs from an unidentified male (7001408055) in Rashid to an unidentified male (7678112233) in Amin-Habib. The call came immediately following an IED attack of U.S. convoy on Airport Road. The two parties were arguing, the caller stated: "Your team is a failure, we cannot operate this way." The recipient replied: "The materials must have been defective, the design was perfect."
Conversation 2:
–'02/24/10 - Cell call is monitored between unknown caller (7678112233) in Amin to Amir Mahallati (7115452376) in Bayaa. The unidentified caller stated: "The team is a failure! The carpet doesn't match! The carpet maker needs to be replaced." The recipient said: "The measurements were perfect, the installers must have failed."
Common person/phone number: (7678112233)
Carpet = IED
Measurements = design
The importance of an earlier message was missed; now it increases:
–'02/01/10 - ET: 0345hrs -- Cell phone call monitored between an unidentified male (7678112233) in Amin-Hibib, Iraq //MGRSCOORD: 38S ND 13 05// and an unidentified male (7115452376) in Bayaa //MGRSCOORD: 38S MB 38 81//. The caller stated: "Start buying carpets for the house like we discussed." The call lasted 10 seconds.'
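The "Rosetta Stone" step can be sketched as a lookup table built from the two code links on this slide (carpet = IED, measurements = design), used to flag and annotate earlier reports. The names `CODE_LINKS`, `code_key`, `annotate` and `reports_using_codes` are hypothetical, and the plural handling (strip a trailing "s") is a deliberately crude assumption.

```python
import re
from typing import Dict, Optional

# Code links inferred from the replay conversation on this slide.
CODE_LINKS = {"carpet": "IED", "measurements": "design"}


def code_key(word: str) -> Optional[str]:
    """Map a word to a CODE_LINKS key, allowing a naive plural 's' (an assumption)."""
    w = word.lower()
    if w in CODE_LINKS:
        return w
    if w.endswith("s") and w[:-1] in CODE_LINKS:
        return w[:-1]
    return None


def annotate(text: str) -> str:
    """Append the decoded meaning after each code word, e.g. 'carpets [IED]'."""
    def replace(match):
        key = code_key(match.group(0))
        return f"{match.group(0)} [{CODE_LINKS[key]}]" if key else match.group(0)
    return re.sub(r"[A-Za-z]+", replace, text)


def reports_using_codes(reports: Dict[str, str]) -> Dict[str, str]:
    """Return annotated text for every report that mentions a code word."""
    return {
        report_id: annotate(text)
        for report_id, text in reports.items()
        if any(code_key(w) for w in re.findall(r"[A-Za-z]+", text))
    }
```

With this table, annotate("Start buying carpets for the house like we discussed.") returns "Start buying carpets [IED] for the house like we discussed.", which is the kind of reinterpretation of the 02/01/10 report described above.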

9 Conceptual Model
Call monitoring:
–date …
Call:
–type: cell phone
–sender, recipient, text, date, length
–sequencing of utterances and assignment to speakers
–importance of call
Phone numbers:
–location
–association with people
Replay conversation:
–call1, call2, middle person
–similarity relationship between texts
Code links:
–links between two words that express a code?
Suitable “expresses” are needed as well
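To make the concepts above concrete, here is one possible rendering as Python dataclasses. It is an illustration only: in the demo these concepts are expressed in CE, and every class and field name here (e.g. `call_date`, `length_seconds`, `text_similarity`) is an assumption. The check in `ReplayConversation` encodes the "A talks to B, then B talks to C" definition from the scenario.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class PhoneNumber:
    number: str
    location: Optional[str] = None                        # e.g. "Rashid"
    associated_people: List[str] = field(default_factory=list)


@dataclass
class Utterance:
    speaker: PhoneNumber                                  # assignment of utterance to speaker
    text: str


@dataclass
class Call:
    call_type: str                                        # e.g. "cell phone"
    sender: PhoneNumber
    recipient: PhoneNumber
    call_date: date
    length_seconds: Optional[int] = None
    utterances: List[Utterance] = field(default_factory=list)  # in spoken order
    importance: str = "normal"


@dataclass
class CallMonitoring:
    call: Call
    monitoring_date: date                                 # further attributes elided on the slide


@dataclass
class ReplayConversation:
    call1: Call
    call2: Call
    text_similarity: Optional[float] = None               # similarity between the call texts

    def __post_init__(self) -> None:
        # The recipient of call1 must be the sender of call2.
        if self.call1.recipient.number != self.call2.sender.number:
            raise ValueError("call2 must be made by the recipient of call1")

    @property
    def middle_person(self) -> PhoneNumber:
        return self.call1.recipient


@dataclass
class CodeLink:
    code_word: str                                        # e.g. "carpet"
    meaning: str                                          # e.g. "IED"
```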

10 Preprocessing
Principle: turn ungrammatical conventions into an equivalent, correct grammatical phrase that the parser can handle
–e.g. remove “;”
Specific patterns for “identity tagging”:
–“( XXX )” -> “tagged as XXX”
Dates, e.g.:
–“02/23/10; ET: 0957hrs” -> “, estimated time 02/23/10 t 0957hrs,” (the best form is still to be determined)
MGRS
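The preprocessing rules above can be sketched as an ordered list of regular-expression rewrites. Assumptions: identity tags are restricted to digit strings (so "(Delayed report)" is left alone), the date rewrite follows the provisional form on the slide, a final rule tidies the stray space this leaves before a comma, and MGRS coordinates are not handled because the slide gives no rule for them. `RULES` and `preprocess` are hypothetical names.

```python
import re

# Ordered (pattern, replacement) rules; the date rewrite must run before ';' is removed.
RULES = [
    # "02/23/10; ET: 0957hrs" -> ", estimated time 02/23/10 t 0957hrs,"
    (re.compile(r"(\d{2}/\d{2}/\d{2}); ET: (\d{4}hrs)"), r", estimated time \1 t \2,"),
    # "( 7001408055 )" -> "tagged as 7001408055" (digits only, an assumption)
    (re.compile(r"\(\s*(\d+)\s*\)"), r"tagged as \1"),
    # remove any remaining semicolons
    (re.compile(r";"), ""),
    # tidy whitespace left before commas by the date rewrite
    (re.compile(r"\s+,"), ","),
]


def preprocess(text: str) -> str:
    """Apply each rewrite rule in order."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

On "Cell phone call monitored on 02/23/10; ET: 0957hrs from an unidentified male (7001408055) in Rashid" this produces "Cell phone call monitored on, estimated time 02/23/10 t 0957hrs, from an unidentified male tagged as 7001408055 in Rashid".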

11 New general linguistic processing
Handle passive sentences
–is/was/were + past participle = passive
Identity tagging
–Use the “tagged as” patterns constructed in preprocessing to infer “same as” links between things with the same tag
–Are these valid inferences of identity from other information (e.g. cell number)?
Dialog context
–Includes a set of entities already encountered
–Assume all sentences in a SYNCOIN report are in the same dialog context, and that there is no dialog context across reports
–All “stands for” things are added to the report dialog context
–Anaphoric references are dereferenced by searching the dialog context: “the call” via the last thing of that type mentioned; “it” via the last thing mentioned
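The dialog context and identity-tagging steps can be sketched as follows. This is a simplified illustration, not the demo's linguistic processing: "it" resolves to the last entity mentioned with no agreement checks, "the X" matches an entity whose kind is exactly X (an assumption), and `same_as_links` does not apply the domain-specific filtering mentioned on the next slide. `Entity`, `DialogContext` and `same_as_links` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass
class Entity:
    name: str
    kind: str                  # e.g. "call", "person"
    tag: Optional[str] = None  # identity tag from "tagged as XXX"


@dataclass
class DialogContext:
    """Entities mentioned so far in one SYNCOIN report (no sharing across reports)."""
    entities: List[Entity] = field(default_factory=list)

    def add(self, entity: Entity) -> None:
        self.entities.append(entity)

    def resolve(self, phrase: str) -> Optional[Entity]:
        """Dereference 'it' or 'the <kind>' against the most recent mentions."""
        phrase = phrase.strip().lower()
        if phrase == "it":
            return self.entities[-1] if self.entities else None
        if phrase.startswith("the "):
            kind = phrase[len("the "):]
            for entity in reversed(self.entities):
                if entity.kind == kind:
                    return entity
        return None


def same_as_links(entities: List[Entity]) -> List[Tuple[str, str]]:
    """Infer 'same as' links between entities that carry the same identity tag."""
    by_tag = defaultdict(list)
    for entity in entities:
        if entity.tag:
            by_tag[entity.tag].append(entity)
    return [(a.name, b.name) for group in by_tag.values() for a, b in combinations(group, 2)]
```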

12 Domain specific steps - system
Generate call entities and call monitor situations
(Optional) Use domain-specific information to filter out unreasonable “same as” links from identity tags
Use additional sentences in the report, together with anaphoric reference via the dialog context, to add the text to the calls
Rules to detect replay conversations
(Optional) Detect increased relevance of a replay conversation, e.g. “immediately following …”
(Optional) Check the texts of replay conversations for obvious similarities that increase the likelihood of a replay conversation
(Optional) Perform further inference on how information is passed across a network of people
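The two optional checks (text similarity and increased relevance) might look like the sketch below. The slide does not specify a similarity measure; Jaccard overlap of word sets is a swapped-in choice, the 0.15 `threshold` is hypothetical, and the cue list contains only the "immediately following" example from the slide. `text_similarity` and `assess_replay` are hypothetical names.

```python
import re

RELEVANCE_CUES = ("immediately following",)   # cue taken from the slide; list is illustrative


def words(text: str) -> set:
    """Lower-cased alphabetic word set of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def text_similarity(text1: str, text2: str) -> float:
    """Jaccard overlap of the two texts' word sets (0.0 = disjoint, 1.0 = identical)."""
    w1, w2 = words(text1), words(text2)
    union = w1 | w2
    return len(w1 & w2) / len(union) if union else 0.0


def assess_replay(text1: str, text2: str, report: str, threshold: float = 0.15) -> dict:
    """Score a candidate replay: text similarity plus a relevance cue in the report."""
    sim = text_similarity(text1, text2)
    return {
        "similarity": sim,
        "similar_text": sim >= threshold,     # would raise the replay's likelihood
        "high_relevance": any(cue in report.lower() for cue in RELEVANCE_CUES),
    }
```

For the two callers' texts in the carpet-code example, the word sets share "team", "is", "a" and "failure" out of 20 distinct words, giving a similarity of 0.2.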

13 Domain specific steps - GUI
Display replay conversations to the user with the texts aligned where possible
Allow the user to review the rationale for treating the conversation as a replay (linguistic and analyst reasoning):
–Why were the calls, monitors and their participants constructed?
–Why the “same as” inferences?
–Why the text similarity?
–Why is it a replay conversation?
–Why is it high value?
Allow the user to accept/deny replay conversations
(Optional) Show links between people established by replay conversations
Allow the user to add new code word links and establish new search criteria on reports
Allow the user to review the significance of previous reports on the basis of the new code keywords
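Aligning the texts of a replay conversation for display could start from a word-level diff. This sketch uses `difflib.SequenceMatcher` from the Python standard library; comparing words case-insensitively without surrounding punctuation, and pairing caller with caller, are assumptions, and `align` is a hypothetical name.

```python
import difflib
from typing import List, Tuple


def _norm(word: str) -> str:
    # Compare words without case or surrounding punctuation.
    return word.strip('.,!?;:"').lower()


def align(text1: str, text2: str) -> List[Tuple[str, str, str]]:
    """Align two utterances word by word; returns (tag, left span, right span) rows."""
    w1, w2 = text1.split(), text2.split()
    matcher = difflib.SequenceMatcher(
        a=[_norm(w) for w in w1], b=[_norm(w) for w in w2], autojunk=False
    )
    return [(tag, " ".join(w1[i1:i2]), " ".join(w2[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


if __name__ == "__main__":
    rows = align("Your team is a failure, we cannot operate this way.",
                 "The team is a failure! The carpet doesn't match!")
    for tag, left, right in rows:
        print(f"{tag:<8} {left:<30} | {right}")
```

For the two callers' texts, the middle row pairs "team is a failure," with "team is a failure!" as an equal span, which a GUI could highlight as the shared content of the replay.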

