Published by Valentine Paul. Modified over 9 years ago.
1
Incremental Context Mining for Adaptive Document Classification
Advisor: Dr. Hsu
Graduate: Chien-Shing Chen
Authors: Rey-Long Liu, Yun-Ling Lu
2
Outline: Motivation; Objective; Introduction; Overview of the approach; Incremental context mining for ACclassifier; Experiments; Conclusions; Personal Opinion; Review
3
Motivation: Adaptive document classification (ADC) adapts a DC system to the evolving contextual requirement of each document category, so that input documents can be classified based on their contexts of discussion.
4
Objective
1. CR terms should be mined by analyzing multiple documents from multiple categories.
2. Inappropriate features may introduce problems of inefficiency and error.
3. ADC may serve as the basis for supporting efficient and high-precision DC.
5
1. Introduction: ACclassifier (Adaptive Context-based Classifier) has two components:
1. An incremental context miner
2. A document classifier
Both components work on a given text hierarchy in which each node corresponds to a document category.
6
2. Overview of the approach: CR of College of Management; CR of Information Management; CR of MIS; CR of DSS; CR of Financial Management; CR of Management
7
3-1. An incremental context miner: College of Management; Information Management; MIS; DSS; Financial Management; Management
8
3-2. An incremental context miner (Information Management; entries are term count / document length)
MIS: Computer 5/20, Dos 10/20, EC 2/20, Manage 5/30, BtoB 3/20, Computer 10/40, Notebook 3/40
DSS: Computer 3/15
9
3-3. CR (Contextual Requirement of the category)
MIS: Computer 15/90, Dos 10/90, Manage 5/90, EC 3/90, BtoB 3/90, Notebook 3/90
DSS: Computer 3/15, EC 10/15
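Slide 3-3 aggregates the per-document counts of slide 3-2 into one CR per category (e.g. Computer 5 + 10 = 15 out of 20 + 30 + 40 = 90 words for MIS). Because only running counts are needed, a new document can be folded in without reprocessing earlier ones. The sketch below illustrates that idea; the class `ContextRequirement` and the split of terms into three documents are hypothetical, not taken from the paper.

```python
from __future__ import annotations

from collections import Counter


class ContextRequirement:
    """Hypothetical running term statistics for one category.

    Only counts are stored, so each new document is added once and
    earlier documents never need to be reprocessed.
    """

    def __init__(self) -> None:
        self.term_counts: Counter[str] = Counter()
        self.total_length = 0  # total number of words seen in this category

    def add_document(self, term_counts: dict[str, int], length: int) -> None:
        # Incremental update: add this document's counts to the running totals.
        self.term_counts.update(term_counts)
        self.total_length += length

    def probability(self, term: str) -> float:
        # Relative frequency of the term among all words seen so far, e.g. 15/90.
        if self.total_length == 0:
            return 0.0
        return self.term_counts[term] / self.total_length


# Hypothetical documents loosely following the MIS example (lengths 20, 30, 40).
mis = ContextRequirement()
mis.add_document({"computer": 5, "dos": 10}, length=20)
mis.add_document({"manage": 5}, length=30)
mis.add_document({"computer": 10, "notebook": 3}, length=40)
print(mis.term_counts["computer"], mis.total_length)  # 15 90
```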
10
3-4. TFIDF
Strength(w, c): the strength of w serving as a context word for the documents under category c (cf. TFIDF, Term Frequency × Inverse Document Frequency).
11
3-5. TFIDF: Strength(w_computer, c_MIS) = 0.909; Strength(w_dos, c_MIS) = 2
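The slides do not show the Strength formula, only its values for MIS (S(computer) = 0.909 and S(dos) = 2 on slide 3-6). One formula that reproduces both values from the slide 3-3 CRs is Strength(w, c) = n · P(w|c) / Σ P(w|c'), where the sum runs over c and its siblings and n is the number of those categories. This is an assumption reverse-engineered from the numbers, not the paper's confirmed definition, and it does not account for every value on later slides, so treat it as illustrative only.

```python
def strength(term, category, peers):
    """Hypothetical Strength(w, c), reverse-engineered from the slide values.

    peers maps every category under the same parent (including `category`)
    to a dict of term -> P(term | category).
    """
    total = sum(probs.get(term, 0.0) for probs in peers.values())
    if total == 0.0:
        return 0.0  # term never seen under this parent
    share = peers[category].get(term, 0.0) / total
    return len(peers) * share  # scaled so an "average" share of 1/n gives 1.0


# A subset of the slide 3-3 CRs (MIS and DSS under Information Management).
children = {
    "MIS": {"computer": 15 / 90, "dos": 10 / 90},
    "DSS": {"computer": 3 / 15, "ec": 10 / 15},
}
print(round(strength("computer", "MIS", children), 3))  # 0.909
print(strength("dos", "MIS", children))  # 2.0
```

Under this assumption "dos" gets the maximum value n = 2 because it never appears under DSS, which matches the intuition that it is a distinctive context word for MIS.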
12
3-6. The incremental context miner: Information Management
MIS: S(computer) = 0.909, S(dos) = 2
DSS: S(EC) = 0.476, S(computer) = 0.022
S(computer) > 0.909; Electrical Engineering
13
3-7. An incremental context miner
14
4-1. DOA: Given a document d to be classified, the basic idea is to compute its degree of acceptance (DOA) for each category c. The DOA is computed from the strengths of d's distinct words in c.
15
4-2. Two phases of the classifier
(1) Estimation of the DOA for each category.
(2) Identification of the winner category.
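The two phases can be sketched as follows. Phase 1 sums Strength(w, c) × tf(w, d) / |d| over the document's words, matching the arithmetic on slide 4-8. Phase 2 picks the category with the highest DOA; that argmax rule, the function names, and the assignment of S(EC) = 0.476 and S(computer) = 0.022 to DSS are assumptions made for illustration.

```python
def degree_of_acceptance(doc_counts, doc_length, strengths):
    """Phase 1: DOA of one category, summing Strength(w, c) * tf(w, d) / |d|."""
    return sum(
        strengths.get(term, 0.0) * count / doc_length
        for term, count in doc_counts.items()
    )


def classify(doc_counts, doc_length, category_strengths):
    """Phase 2 (assumed rule): the winner is the category with the highest DOA."""
    doas = {
        category: degree_of_acceptance(doc_counts, doc_length, strengths)
        for category, strengths in category_strengths.items()
    }
    winner = max(doas, key=doas.get)
    return winner, doas


# Strength values and the new document follow slide 4-7 (40-word document).
category_strengths = {
    "MIS": {"computer": 0.909, "dos": 2.0},
    "DSS": {"ec": 0.476, "computer": 0.022},
}
new_doc = {"computer": 20, "dos": 10}
winner, doas = classify(new_doc, 40, category_strengths)
print(winner)  # MIS
```

Words without a known strength in a category simply contribute 0 to that category's DOA.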
16
4-3. Estimation of the DOA for each category: DOA of College of Management; DOA of Information Management; DOA of MIS; DOA of DSS; DOA of Financial Management; DOA of Management
17
4-4. DOA: Frequency: 5; D1: 5000; minSupport: 0.001 (5/5000 = 0.001)
If w is a strong context word in c and occurs many times in d, c is more likely to "accept" d.
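Slide 4-4 pairs a frequency of 5 in D1 (read here as a 5,000-word document) with minSupport = 0.001, and 5/5000 = 0.001. One plausible reading is that a word only takes part in the DOA when its relative frequency in the document reaches minSupport. The helper below is a hypothetical sketch of that reading, not the paper's stated rule.

```python
def supported_terms(doc_counts, doc_length, min_support=0.001):
    """Hypothetical filter: keep terms whose relative frequency reaches min_support."""
    return {
        term: count
        for term, count in doc_counts.items()
        if count / doc_length >= min_support
    }


# A word seen 5 times in a 5,000-word document sits exactly at the threshold;
# 4 occurrences (0.0008) fall below it.
print(supported_terms({"computer": 5, "mouse": 4}, 5000))  # {'computer': 5}
```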
18
4-5. Constraint I
New document Di: Computer 20/40, DOS 10/40, Java 2/40, Mouse 3/40, Delphi 1/40
19
4-6. Constraint II
Information Management courses: Operating Systems S(DOS) = 2; Algorithms S(DOS) = 0.9982; Information Networks S(DOS) = 0.6; E-Commerce S(DOS) = 0.003; Data Structures S(DOS) = 1.112
20
4-7. Given a document to be classified
New document Di: Computer 20/40, DOS 10/40
MIS: S(computer) = 0.909, S(dos) = 2
DSS: S(EC) = 0.476, S(computer) = 0.022
If w is a strong context word in c and occurs many times in d, c is more likely to "accept" d.
21
4-8. DOA of MIS for D_new
computer: 0.909 × 20/40 = 0.4545
dos: 2 × 10/40 = 0.5
DOA_MIS(D_new) = 0.4545 + 0.5 = 0.9545
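Written as a formula, the computation on slide 4-8 suggests the general form below: the DOA sums, over the distinct words w of d, the strength of w in c weighted by w's relative frequency in d. The notation tf(w, d) for the count of w in d and |d| for the document length is introduced here for illustration.

```latex
\mathrm{DOA}(d, c) = \sum_{w \in d} \mathrm{Strength}(w, c)\,\frac{\mathrm{tf}(w, d)}{|d|}

\mathrm{DOA}(D_{\text{new}}, \mathrm{MIS})
  = 0.909 \cdot \frac{20}{40} + 2 \cdot \frac{10}{40}
  = 0.4545 + 0.5
  = 0.9545
```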
22
4-9. Completing the DOA of all categories
23
4-10. The document classifier
24
5-1. Correct classification: built from the 1,100 documents used for initial training.
25
5-2. Correct classification: the baselines were allowed to use 5,000 features in their feature sets.
26
5-3. Correct classification: the baselines used all training documents to build their feature sets and classifiers.
27
5-4. Consider the test document entitled "Setting up Email in DOS with today's ISP using a dialup PPP TCP/IP connection".
Baseline systems: "Software", "Windows", and "Operating Systems"
ACclassifier: "TCP/IP", "connection", "computer networking", "userID"
28
5-5. Cumulative training and testing time (sec.): the time spent by ACclassifier grew more slowly after about 1,400 training documents had been entered.
29
5-6. Cumulative training and testing time (sec.): the time spent by ACclassifier grew more slowly after about 1,400 training documents had been entered.
30
6. Conclusions
1. Efficient mining of the contextual requirements for high-precision DC.
2. Incremental mining without reprocessing previous documents.
3. Evolutionary maintenance of the feature set.
4. Efficient and fault-tolerant hierarchical DC.
31
7. Personal Opinion: The approach is acceptable in terms of category purity within the hierarchy.
32
8. Review