1 The Automatic SNOMED Coding System
Data Overview
System Design
Experiments
Future Work
By Weihang ZHANG
Supervisor: Prof. Jon PATRICK

2 Data Overview – The Pathology Text
400K pathology texts from the SWAPS Anatomical Pathology Database
Every pathology text is indexed by “RequestID”
Each report (pathology text) has a set of diagnoses, presented as SNOMED RT codes

3 Data Overview – The Pathology Text
Insight into a text (RequestID=“1”)
CLINICAL HISTORY
Biopsy of discoid erythematosus like lesion from right cheek ? DLE.
MACROSCOPIC
LABELLED `RIGHT CHEEK LESION'. An ellipse 12 x 3mm with subcutis to 3mm. A poorly defined pale nodular lesion 3 x 3mm. It appears to abut the surgical margin. Representative sections embeded, A tips face on, B lesion and surgical margin. (MR 17/4) TA
MICROSCOPIC
Section shows hyperkeratosis with occasional follicular plugging, epidermal atrophy and severe sundamage to dermal collagen. A dense chronic inflammatory cell infiltrate, both superficial and deep is present, mainly in a perivascular and periadnexal distribution. No liquefaction degeneration of the basal layer, no dermal oedema and no interface dermatitis are seen. PAS stain reveals no thickening of the epidermal basement membrane and only an occasional fungal spore on the skin surface. Immunofluorescence for immunoglobulins and complement fractions are negative. The differential diagnosis rests between chronic discoid erythematosus, lymphocytic infiltration of skin of Jessner and the plaque type of polymorphous light eruption. The presence of marked solar damage to collagen, the absence of basal liquefaction degeneration and the negative immunofluorescence favours polymorphous light eruption. A reaction to drugs or an insect bite is also a possibility. No evidence of malignancy.
Reported 24/4/98

4 Data Overview – The SNOMED Code
The SNOMED Codes Assigned
The SNOMED Codes Explained

5 Data Overview – Sequence Number and Multi-label
One pathology text vs. many SNOMED codes
Example (RequestID ‘1’): 4 sequences, 3 codes
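The multi-label structure above can be sketched as a grouping step. This is only an illustration: the row layout (RequestID, sequence number, code) and the placeholder codes CODE_A, CODE_B and CODE_C are assumptions, and the slide does not say why 4 sequences yield 3 codes (here one code is simply repeated).

```python
from collections import defaultdict

def group_codes(rows):
    """Collapse (request_id, sequence_number, code) rows into one label set per report."""
    labels = defaultdict(set)
    for request_id, _sequence_number, code in rows:
        labels[request_id].add(code)
    return dict(labels)

# Hypothetical rows for RequestID '1': 4 sequence numbers, 3 distinct codes.
rows = [
    ("1", 1, "CODE_A"),
    ("1", 2, "CODE_B"),
    ("1", 3, "CODE_A"),  # assumed repeat; the real cause is not given in the slides
    ("1", 4, "CODE_C"),
]
labels = group_codes(rows)
```

Each report then carries a set of codes rather than a single class, which is what makes the task multi-label.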

6 Data Overview – DiagnosisCodes Distribution
The first 9 codes are selected for experiments (excluding ‘None’)
All remaining codes are treated as “others”
10K pathology texts are selected uniformly at random from the 400K-text database
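A minimal sketch of the selection steps above, assuming that "the first 9 codes" means the nine most frequent ones and that reports coded ‘None’ keep that label unchanged. Neither assumption is confirmed by the slides, and all function names are hypothetical.

```python
import random
from collections import Counter

def select_top_codes(label_sets, k=9, excluded=("None",)):
    """Return the k most frequent codes, ignoring the excluded ones."""
    counts = Counter(
        code for codes in label_sets for code in codes if code not in excluded
    )
    return [code for code, _ in counts.most_common(k)]

def collapse_labels(codes, kept, excluded=("None",)):
    """Map codes outside `kept` to 'others'; excluded codes pass through (assumption)."""
    return {c if (c in kept or c in excluded) else "others" for c in codes}

def sample_request_ids(request_ids, n, seed=0):
    """Draw n distinct RequestIDs uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(request_ids), n)

# Toy data: A occurs 3 times, B twice, C once.
label_sets = [{"A", "B"}, {"A"}, {"A", "C"}, {"B"}, {"None"}]
kept = select_top_codes(label_sets, k=2)
collapsed = [collapse_labels(s, kept) for s in label_sets]
sample = sample_request_ids(range(1, 401), 10)
```

On the real data the call would be `sample_request_ids(all_ids, 10_000)` with `k=9`; the toy sizes here only keep the example quick to check.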

7 Data Overview – Attribute Inspection on the Texts
Attribute inspection: Self-Separated Distribution (SVM…?)

8 System Design – Text-to-Vector
Extracting Configuration (xml)
  - Properly de-coupled classes
  - Flexibly interchangeable methods
  - Freely switchable output file type
  - Prepared for unsupervised experiments
Uniformly Random Selection
  - 10K texts are selected from the 400K dataset
Stratified Resample
  - 10-fold Cross Validation
Output (ARFF, DAT)
  - ARFF: for Weka
  - DAT: for SVM-light and MaxEnt
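Two of the steps above can be sketched briefly: writing one example in the sparse `target index:value` layout that SVM-light reads, and assigning examples to 10 stratified folds. The function names are hypothetical, and the fold assignment is a plain round-robin per class, not necessarily the resampling the original system used.

```python
import random
from collections import defaultdict

def to_svmlight_line(label, features):
    """Format one example as '<target> <index>:<value> ...' with ascending indices.

    `features` maps 1-based integer feature indices to numeric values;
    zero-valued features are omitted, as is usual for sparse files.
    """
    parts = [f"{idx}:{val:g}" for idx, val in sorted(features.items()) if val != 0]
    return " ".join([str(label)] + parts)

def stratified_folds(labels, k=10, seed=0):
    """Assign each example a fold id in [0, k) so every class is spread evenly."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    fold_of = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, i in enumerate(indices):
            fold_of[i] = position % k
    return fold_of

line = to_svmlight_line(1, {3: 1.0, 1: 0.5})
toy_labels = [1] * 20 + [-1] * 80
folds = stratified_folds(toy_labels, k=10)
```

With 20 positive and 80 negative toy examples, every fold receives 2 positives and 8 negatives, so each fold mirrors the overall class ratio.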

9 Text to Vector

10 System Design – Classifiers Generation
Machine Learner Selection
  - SVM-Light
  - MaxEnt
  - J48 (Weka)
  - …
Classifiers Generation
  - Depending on the selected features and MLs
Diagram labels: LM Selection, Classifiers Generation
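One plausible reading of classifier generation is a one-vs-rest setup, with one binary training target per code. The slides do not state this explicitly, so the sketch below is an assumption, and the function name is hypothetical.

```python
def one_vs_rest_targets(label_sets, codes):
    """For each code, build a +1/-1 target list over all reports (one-vs-rest)."""
    return {
        code: [1 if code in labels else -1 for labels in label_sets]
        for code in codes
    }

# Toy label sets for three reports and two placeholder codes.
label_sets = [{"CODE_A", "CODE_B"}, {"CODE_B"}, {"others"}]
targets = one_vs_rest_targets(label_sets, ["CODE_A", "CODE_B"])
```

Each target list can then be paired with the same feature vectors and handed to whichever learner is selected, giving one classifier per code and learner combination.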

11 System Design – Code Assembling
Assembly line
  - Text-to-Vector conversion
  - Vector delivery
Classifiers – the workers
  - All classifiers work together, each assigning its classification result to the vector
Diagram label: Code Assembling
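The assembly line above can be sketched as follows, assuming each classifier is a scoring function over a feature vector and a code is assigned when its score exceeds a threshold. The linear stand-in classifiers, their weights and the placeholder codes are hypothetical, not the trained SVM or MaxEnt models.

```python
def linear_classifier(weights, bias=0.0):
    """Return a scoring function over sparse vectors (dict feature -> value)."""
    def score(vector):
        return sum(w * vector.get(f, 0.0) for f, w in weights.items()) + bias
    return score

def assemble_codes(vector, classifiers, threshold=0.0):
    """Deliver the vector to every classifier and collect the codes that fire."""
    return sorted(
        code for code, classify in classifiers.items() if classify(vector) > threshold
    )

# Hypothetical workers, one per code.
classifiers = {
    "CODE_A": linear_classifier({"lesion": 1.0}, bias=-0.5),
    "CODE_B": linear_classifier({"malignancy": 1.0}, bias=-0.5),
}
vector = {"lesion": 1.0, "skin": 1.0}
assigned = assemble_codes(vector, classifiers)
```

Because every classifier decides independently, a report can receive zero, one or several codes, which matches the multi-label data.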

12 Experiments
3 sets of comparisons (winners)
  - Preprocessing: Stem-All-Words vs. Stem-None
  - Indexing: Weight-Entropy vs. Word-Frequency vs. Boolean-Weight
  - Learning (Machine Learner Performances): SVM vs. Maximum-Entropy (MaxEnt) vs. Weka J48
Baseline 1:
  - Stem-None + Weight-Entropy + SVM ?
  - The trade-off between TIME and ACCURACY
Baseline 1~:
  - Stem-None + Boolean-Weight + MaxEnt
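The three indexing schemes compared above can be illustrated as below. The slides do not define "Weight-Entropy", so the sketch assumes the common log-entropy scheme (local weight log(1 + tf), global weight 1 + Σ p·log p / log N). Treat that formula, and all function names, as assumptions.

```python
import math
from collections import Counter

def boolean_weights(tokens):
    """Boolean-Weight: 1 if the term occurs in the text."""
    return {t: 1.0 for t in set(tokens)}

def frequency_weights(tokens):
    """Word-Frequency: raw count of the term in the text."""
    return {t: float(c) for t, c in Counter(tokens).items()}

def entropy_global_weights(docs):
    """Global weight per term: 1 + sum_d p*log(p) / log(N), needs N > 1 documents."""
    n = len(docs)
    term_freqs = [Counter(d) for d in docs]
    totals = Counter()
    for tf in term_freqs:
        totals.update(tf)
    weights = {}
    for term, total in totals.items():
        h = sum(
            (tf[term] / total) * math.log(tf[term] / total)
            for tf in term_freqs
            if term in tf
        )
        weights[term] = 1.0 + h / math.log(n)
    return weights

def entropy_weights(tokens, global_weights):
    """Assumed Weight-Entropy: log(1 + tf) scaled by the term's global weight."""
    return {
        t: math.log(1 + c) * global_weights.get(t, 1.0)
        for t, c in Counter(tokens).items()
    }

docs = [["a", "b"], ["a", "c"]]
g = entropy_global_weights(docs)
doc_vector = entropy_weights(docs[0], g)
```

Under this scheme a term spread evenly over all texts (such as "a" here) gets a global weight near 0, while a term confined to one text gets weight 1, so frequent but uninformative words are damped.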

13 Future Work
SNOMED Concepts participation
  - Add SNOMED Concept IDs as extra features
  - Replace words with Concept IDs
  - Use SNOMED Concepts instead of text
Negation (in company with SNOMED-Concept extraction)
Try N-grams (Bigram, Trigram)
Insight into the sections of text – section hiding / focusing …
  - Full structured texts???
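Two of the proposed directions, n-gram features and negation handling, can be sketched in a few lines. The cue list and the fixed three-token negation window are illustrative assumptions, not the method planned in the slides.

```python
def ngrams(tokens, n):
    """Return contiguous n-grams joined with '_' (n=2 bigrams, n=3 trigrams)."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

NEGATION_CUES = {"no", "not", "without"}  # assumed cue list, for illustration only

def mark_negation(tokens, window=3):
    """Prefix up to `window` tokens after a negation cue with 'NOT_'."""
    marked = []
    remaining = 0
    for token in tokens:
        if token.lower() in NEGATION_CUES:
            remaining = window
            marked.append(token)
        elif remaining > 0:
            marked.append("NOT_" + token)
            remaining -= 1
        else:
            marked.append(token)
    return marked

tokens = "No evidence of malignancy".split()
bigrams = ngrams(tokens, 2)
negated = mark_negation(tokens)
```

Marking negated tokens keeps a phrase like "No evidence of malignancy" from contributing the same feature as a positive finding of malignancy.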

