1
Data Mining of Blood Handling Incident Databases
Costas Tsatsoulis
Information and Telecommunication Technology Center
Dept. of Electrical Engineering and Computer Science
University of Kansas
tsatsoul@ittc.ku.edu
2
Background
Incident reports collected for handling of blood products
An initial database was collected to allow experimentation
Goals:
–Allow the generation of intelligence from data
  Unique events
  Event clusters
  Event trends
  Frequencies
–Simplify the job of the QA
  Similar reports
  Less need for in-depth causal analysis
–Allow cross-institutional analysis
3
Annual Accidental Deaths in U.S.A.
4
Institute of Medicine Recommendation (November 1999)
Establish a national focus of research to enhance the knowledge base about patient safety
Identify and learn from errors through both mandatory and voluntary reporting systems
Raise standards and expectations through oversight organizations
Create safety systems through implementation of safe practices at the delivery level
5
Near-Miss Event Reporting
Useful database to study a system's failure points
Many more near misses than actual bad events
Source of data to study human recovery
Dynamic means of understanding system operations
6
The Iceberg Model of Near-Miss Events
1/2,000,000 fatalities
1/38,000 ABO-incompatible transfusions
1/14,000 incorrect units transfused
Near-miss events (the base of the iceberg)
7
Intelligent Systems
Developed two separate systems:
–Case-Based Reasoning (CBR)
–Information Retrieval (IR)
Goal was to address most of the needs of the users:
–Allow the generation of intelligence from data
  Unique events
  Event clusters
  Event trends
  Frequencies
–Simplify the job of the QA
  Similar reports
  Less need for in-depth causal analysis
–Allow cross-institutional analysis
8
Case-Based Reasoning
A technique from Artificial Intelligence that solves problems based on previous experiences
Of significance to us:
–CBR must identify a similar situation/problem to know what to do and how to solve the problem
Use CBR's concept of "similarity" to identify:
–similar reports
–report clusters
–frequencies
9
What Is a Case and How Do We Represent It?
An incident report is a "case"
Cases are represented by:
–indexes: descriptive features of a situation (surface, in-depth, or both)
–their values:
  symbolic, e.g. "Technician"
  numerical, e.g. "103 rpm"
  sets, e.g. "{Monday, Tuesday, Wednesday}"
  other (text, images, …)
–weights: indicate the descriptive significance of the index
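To make the representation concrete, here is a minimal Python sketch of an incident-report case as a bundle of indexes, typed values, and weights. This is an illustration only; the field names and example values are assumptions, not the project's actual schema.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Index:
    """One descriptive feature (index) of an incident report."""
    name: str      # e.g. "reporter_role"
    value: Any     # symbolic ("Technician"), numerical (103), a set ({"Monday", "Tuesday"}), or text
    weight: float  # descriptive significance of this index

@dataclass
class Case:
    """An incident report represented as a CBR case: a collection of weighted indexes."""
    report_id: str
    indexes: list[Index]

# Hypothetical case; the index names, values, and weights below are invented for illustration.
example = Case(
    report_id="R-0001",
    indexes=[
        Index("reporter_role", "MLT", weight=0.8),
        Index("location", "OR", weight=0.6),
        Index("shift", "12-4am", weight=0.4),
        Index("days_observed", {"Monday", "Tuesday"}, weight=0.2),
    ],
)
```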
10
Finding Similarity
Define degrees of matching between attributes of an event report. For example:
–"Resident" and "MD" are similar
–"MLT," "MT," and "QA/QC" are similar
A value may match perfectly or partially:
–"MLT" to "MLT" (perfect)
–"MLT" to "MT" (partial)
Different attributes of the event report are weighted
Similarity is the weighted sum of the attributes' degrees of match
Cases matching above a predefined degree of similarity are retrieved and considered similar
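A minimal sketch of this weighted-sum similarity, using plain dicts for the attribute values and a hand-built table of partial-match degrees. The specific degrees, weights, and attribute names are illustrative assumptions, not the project's actual knowledge base.

```python
# Degree of match between pairs of symbolic values; unlisted pairs match only if equal.
PARTIAL_MATCH = {
    frozenset({"Resident", "MD"}): 0.8,
    frozenset({"MLT", "MT"}): 0.8,
    frozenset({"MT", "QA/QC"}): 0.7,
}

def value_match(a, b) -> float:
    """Return the degree of match between two attribute values (1.0 = perfect)."""
    if a == b:
        return 1.0
    return PARTIAL_MATCH.get(frozenset({a, b}), 0.0)

def case_similarity(query: dict, case: dict, weights: dict) -> float:
    """Weighted sum of attribute match degrees, normalized by the total weight."""
    total = sum(weights.values())
    score = sum(w * value_match(query.get(attr), case.get(attr))
                for attr, w in weights.items())
    return score / total if total else 0.0

# Cases whose similarity exceeds a predefined threshold would be retrieved as similar.
weights = {"reporter_role": 0.8, "location": 0.6}
sim = case_similarity({"reporter_role": "MLT", "location": "OR"},
                      {"reporter_role": "MT", "location": "OR"}, weights)
print(round(sim, 2))  # 0.89 with these illustrative numbers
```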
11
Information Retrieval
Index, search, and recall text without any domain information
Preprocess documents:
–remove stop words
–stemming
Use some representation for documents:
–vector-space model: a vector of terms, each with weight = tf * idf
  tf (term frequency) = (frequency of word) / (frequency of most frequent word)
  idf (inverse document frequency) = log10((total documents) / (documents containing the term))
Use some similarity metric between documents:
–vector algebra to find the cosine of the angle between vectors
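The tf-idf weighting and cosine measure above, written out as a small self-contained sketch of the standard vector-space model (a simplified illustration, not the project's code; the example documents are invented).

```python
import math
from collections import Counter

def tfidf_vector(doc_terms, all_docs):
    """tf = freq of term / freq of most frequent term; idf = log10(N / docs containing term)."""
    counts = Counter(doc_terms)
    max_freq = max(counts.values())
    n_docs = len(all_docs)
    vec = {}
    for term, freq in counts.items():
        tf = freq / max_freq
        df = sum(1 for d in all_docs if term in d)
        idf = math.log10(n_docs / df)
        vec[term] = tf * idf
    return vec

def cosine(v1, v2):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(w * v2.get(t, 0.0) for t, w in v1.items())
    norm1 = math.sqrt(sum(w * w for w in v1.values()))
    norm2 = math.sqrt(sum(w * w for w in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

# Invented, already-tokenized documents; preprocessing is omitted for brevity.
docs = [["unit", "mislabeled", "at", "or"], ["unit", "delayed"], ["sample", "mislabeled"]]
v0, v2 = tfidf_vector(docs[0], docs), tfidf_vector(docs[2], docs)
print(round(cosine(v0, v2), 3))  # ~0.085 for these toy documents
```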
12
CBR for
From the incident report features, a subset was selected as indexes
Semantic similarity defined:
–(OR, ER, ICU, L&D)
–(12-4am, 4-8am), (8am-12pm, 12-4pm), (4-8pm, 8pm-12am)
Domain-specific details defined
Weights assigned:
–fixed
–conditional: the weight of some causal codes depends on whether they were established using a rough or an in-depth analysis
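A sketch of how the semantic-similarity groups and conditional weights on this slide might be encoded. The group contents come from the slide; the match degree (0.8), the causal-code weighting factor (0.5), and the function names are illustrative assumptions.

```python
# Values in the same group are treated as partially similar even though they are not identical.
SIMILARITY_GROUPS = [
    {"OR", "ER", "ICU", "L&D"},                   # patient-care locations
    {"12-4am", "4-8am"}, {"8am-12pm", "12-4pm"},  # time-of-day bins
    {"4-8pm", "8pm-12am"},
]

def semantic_match(a: str, b: str, group_degree: float = 0.8) -> float:
    """Exact values match perfectly; values in the same semantic group match partially."""
    if a == b:
        return 1.0
    return group_degree if any(a in g and b in g for g in SIMILARITY_GROUPS) else 0.0

def causal_code_weight(base_weight: float, in_depth_analysis: bool) -> float:
    """Conditional weight: trust a causal code more when it came from an in-depth analysis."""
    return base_weight if in_depth_analysis else base_weight * 0.5

print(semantic_match("OR", "ICU"))                        # 0.8 (same location group)
print(causal_code_weight(1.0, in_depth_analysis=False))   # 0.5
```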
13
IR for
No deletion of stop words
–"or" vs. "OR" (the stop word "or" would be confused with the operating-room abbreviation "OR")
No stemming
Use the vector-space model and the cosine comparison measure
15
Experiments
Database of approx. 600 cases
Selected 24 reports to match against the case base
Experiment 1: CBR retrieval - CBR_match_value
Experiment 2: IR retrieval - IR_match_value
Experiments 3-11: combined retrieval
–W_CBR * CBR_match_value + W_IR * IR_match_value
–weights range from 0.9 to 0.1 in increments of 0.1
–(0.9, 0.1), (0.8, 0.2), (0.7, 0.3), …, (0.2, 0.8), (0.1, 0.9)
Experiment 12: CBR retrieval with all weights set to 1
No retrieval threshold set
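The combined score used in Experiments 3-11 is a weighted sum of the two match values; below is a minimal sketch of the weight sweep, with made-up match values standing in for the actual CBR and IR outputs.

```python
def combined_score(cbr_match: float, ir_match: float, w_cbr: float, w_ir: float) -> float:
    """Experiments 3-11: W_CBR * CBR_match_value + W_IR * IR_match_value."""
    return w_cbr * cbr_match + w_ir * ir_match

# Sweep the weights from (0.9, 0.1) down to (0.1, 0.9) in steps of 0.1, as in the experiments.
for i in range(9, 0, -1):
    w_cbr, w_ir = i / 10, 1 - i / 10
    score = combined_score(cbr_match=0.72, ir_match=0.40, w_cbr=w_cbr, w_ir=w_ir)  # example values
    print(f"W_CBR={w_cbr:.1f}, W_IR={w_ir:.1f} -> combined={score:.3f}")
```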
16
Evaluation
Collected the top 5 cases for each report in each experiment
Because of duplication, each report had 10-20 distinct cases retrieved across all 12 experiments
A random case was added to the set
Results were sent to experts to evaluate:
–Almost Identical
–Similar
–Not Very Similar
–Not Similar At All
17
Preliminary Analysis
Determine agreement/disagreement with the expert's analysis:
–is a case similar?
–is a case dissimilar?
Establish accuracy (recall is more difficult to measure)
False positives vs. false negatives
What is the influence of the IR component?
Are the weights appropriate?
What is the influence of varying selection thresholds?
18
Results with 0.66 threshold

                   CBR              IR               CBR+IR (90:10)    CBR equal weights
                   Retr.  Non-retr. Retr.  Non-retr. Retr.  Non-retr.  Retr.  Non-retr.
Retrievable        0.90   0.10      0.44   0.56      0.45   0.55       0.17   0.83
Non-retrievable    0.75   0.25      0.07   0.93      0.22   0.78       0.12   0.88

(rows: whether the expert judged a similar case to exist in the case base; entries: fraction of reports retrieved vs. not retrieved by each method)
19
Results with 0.70 threshold

                   CBR              IR               CBR+IR (90:10)    CBR equal weights
                   Retr.  Non-retr. Retr.  Non-retr. Retr.  Non-retr.  Retr.  Non-retr.
Retrievable        0.77   0.23      0.38   0.62      0.32   0.68       0.03   0.97
Non-retrievable    0.45   0.55      0.07   0.93      0.07   0.93       0.03   0.97
20
Combined Results

                             CBR              IR               CBR+IR (90:10)    CBR equal weights
                  Threshold  Retr.  Non-retr. Retr.  Non-retr. Retr.  Non-retr.  Retr.  Non-retr.
Retrievable       0.66       0.90   0.10      0.44   0.56      0.45   0.55       0.17   0.83
                  0.70       0.77   0.23      0.38   0.62      0.32   0.68       0.03   0.97
Non-retrievable   0.66       0.75   0.25      0.07   0.93      0.22   0.78       0.12   0.88
                  0.70       0.45   0.55      0.07   0.93      0.07   0.93       0.03   0.97

(each pair of rows shows the effect of increasing the selection threshold from 0.66 to 0.70)
21
Some Preliminary Conclusions
The weights used in CBR seem to be appropriate and definitely improve retrieval
In CBR, increasing the acceptance threshold improves selection of retrievable cases but also increases the false positives
IR does an excellent job in identifying non-retrievable cases
Even a 10% inclusion of IR with CBR greatly helps in identifying non-retrievable cases
22
Future Work
Plot performance versus acceptance threshold
–identify the best case-selection threshold
–integrate the analysis of the second expert
Examine how CBR and IR can be combined to exploit each one's strengths:
–CBR performs the initial retrieval
–IR eliminates bad cases that were retrieved
Look into the temporal distribution of retrieved reports and adjust their matching accordingly
Examine an NLU system for incident reports that have longer textual descriptions
Re-run on different datasets
Get our hands on large datasets and perform other types of data mining (rule induction, predictive models, probability networks, supervised and unsupervised clustering, etc.)