Incremental Context Mining for Adaptive Document Classification. Advisor: Dr. Hsu. Graduate: Chien-Shing Chen. Authors: Rey-Long Liu, Yun-Ling Lu.

Outline: Motivation; Objective; Introduction; Overview of the approach; Incremental context mining for ACclassifier; Experiments; Conclusions; Personal Opinion; Review

Motivation: Adaptive document classification (ADC) adapts a DC system to the evolving contextual requirement of each document category, so that input documents can be classified based on their contexts of discussion.

Objective
1. CR terms should be mined by analyzing multiple documents from multiple categories.
2. Inappropriate features may introduce inefficiency and errors.
3. ADC may serve as the basis for supporting efficient and high-precision DC.

1. Introduction: ACclassifier (Adaptive Context-based Classifier) has two components:
1. An incremental context miner
2. A document classifier
Both components work on a given text hierarchy in which each node corresponds to a document category.

2. Overview of the approach. CR of College of Management; CR of Information Management; CR of MIS; CR of DSS; CR of Financial Management; CR of Management

3-1. An incremental context miner. Hierarchy nodes: College of Management; Information Management; MIS; DSS; Financial Management; Management

3-2. An incremental context miner. Information Management. MIS: Computer 5/20, Dos 10/20, EC 2/20, Manage 5/30, BtoB 3/20, Computer 10/40, Notebook 3/40. DSS: Computer 3/15

3-3. CR (CR: Contextual Requirement of the category). MIS: Computer 15/90, Dos 10/90, Manage 5/90, EC 3/90, BtoB 3/90, Notebook 3/90. DSS: Computer 3/15, EC 10/15
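The merge from per-document counts (slide 3-2) into category-level counts (slide 3-3) can be sketched as a running tally that is updated one document at a time. This is a minimal illustration only, not the authors' implementation: the class name `CategoryProfile` and its methods are hypothetical, and the grouping of terms into three MIS documents (of 20, 30 and 40 words) is an assumption inferred from the shared denominators. The transcript shows EC as 2/20 per document but 3/90 in the merged CR; the sketch uses the per-document figure as printed.

```python
from collections import Counter


class CategoryProfile:
    """Running word counts for one category (hypothetical helper)."""

    def __init__(self):
        self.word_counts = Counter()  # total occurrences of each word
        self.total_words = 0          # combined length of all documents seen

    def add_document(self, counts, length):
        """Fold one new document in; earlier documents are never re-read."""
        self.word_counts.update(counts)
        self.total_words += length

    def frequency(self, word):
        """Relative frequency of `word` among all words seen so far."""
        if self.total_words == 0:
            return 0.0
        return self.word_counts[word] / self.total_words


# Assumed grouping of the slide 3-2 figures into three MIS documents.
mis = CategoryProfile()
mis.add_document({"computer": 5, "dos": 10, "ec": 2, "btob": 3}, length=20)
mis.add_document({"manage": 5}, length=30)
mis.add_document({"computer": 10, "notebook": 3}, length=40)

print(mis.word_counts["computer"], mis.total_words)
```

Because each update only adds to the stored totals, a newly arrived document costs work proportional to its own size, which is the property the conclusions describe as incremental mining without reprocessing previous documents.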

3-4. TFIDF. Strength: how strongly a word w serves as a context word for the documents under category c. TFIDF (Term Frequency * Inverse Document Frequency)

3-5. TFIDF. Strength(w_computer, c_MIS) = 0.909; Strength(w_dos, c_MIS) = 2
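The formula behind these two strengths did not survive in the transcript. The sketch below uses one sibling-normalised formulation as an assumption: the word's relative frequency in the category, multiplied by the number of sibling categories, divided by the sum of its relative frequencies across those siblings. It is shown because it reproduces the MIS values for "computer" and "dos" from the CR counts on slide 3-3; it has not been confirmed as the authors' definition, and the function name `strength` and its parameters are hypothetical.

```python
def strength(word, category, sibling_freqs):
    """Assumed sibling-normalised strength of `word` in `category`.

    sibling_freqs maps each sibling category (including `category`
    itself) to a dict of word -> relative frequency in that category.
    """
    own = sibling_freqs[category].get(word, 0.0)
    total = sum(freqs.get(word, 0.0) for freqs in sibling_freqs.values())
    if total == 0.0:
        return 0.0  # word unseen in every sibling
    return len(sibling_freqs) * own / total


# Relative frequencies taken from the CR counts on slide 3-3.
freqs = {
    "MIS": {"computer": 15 / 90, "dos": 10 / 90},
    "DSS": {"computer": 3 / 15},
}
s_computer = strength("computer", "MIS", freqs)
s_dos = strength("dos", "MIS", freqs)
print(round(s_computer, 3), round(s_dos, 3))
```

Under this assumption a word that appears in only one of the siblings, such as "dos" here, receives the maximum strength (equal to the number of siblings), while a word spread evenly across siblings receives a strength of 1.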

3-6. The incremental context miner. Information Management; MIS; DSS. S(computer)=0.909, S(dos)=2, S(EC)=0.476, S(computer)=0.022, S(computer)>0.909; Electrical Engineering

3-7. An incremental context miner

4-1. DOA. Given a document d to be classified, the basic idea is to compute the degree of acceptance (DOA) of d by each category c. The DOA is computed from the strengths of d's distinct words on c.

4-2. Two phases of the classifier
(1) The estimation of DOA for each category.
(2) The identification of the winner category.

4-3. Estimation of DOA for each category. DOA of College of Management; DOA of Information Management; DOA of MIS; DOA of DSS; DOA of Financial Management; DOA of Management

4-4. DOA. Frequency: 5; D1: 5000; minSupport: 0.001. If w is a strong context word in c and occurs many times in d, c is more likely to "accept" d.

4-5. Constraint I. New Di: Computer 20/40, DOS 10/40, Java 2/40, Mouse 3/40, Delphi 1/40

4-6. Constraint II. Information Management courses: Operating Systems, S(DOS) = 2; Algorithms, S(DOS) = 0.9982; Information Networks, S(DOS) = 0.6; E-Commerce, S(DOS) = 0.003; Data Structures, S(DOS) = 1.112

4-7. Given a document to be classified. MIS; DSS. New Di: Computer 20/40, DOS 10/40. S(computer)=0.909, S(dos)=2, S(EC)=0.476, S(computer)=0.022. If w is a strong context word in c and occurs many times in d, c is more likely to "accept" d.

4-8. DOA of D_new for MIS. Contribution of computer: 0.909 * 20/40 = 0.4545. Contribution of dos: 2 * 10/40 = 0.5. DOA_MIS(D_new) = 0.4545 + 0.5 = 0.9545
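The arithmetic on this slide can be checked with a short sketch that sums strength times relative frequency over the document's words. The function name `degree_of_acceptance` and the rule of skipping words that have no strength in the category are assumptions made for illustration; the skipping rule is consistent with the slide, where only "computer" and "dos" contribute, but the transcript does not state it.

```python
def degree_of_acceptance(doc_counts, doc_length, strengths):
    """Sum strength * relative frequency over the document's distinct words."""
    doa = 0.0
    for word, count in doc_counts.items():
        s = strengths.get(word)
        if s is None:
            continue  # assumed: words with no strength in the category add nothing
        doa += s * count / doc_length
    return doa


# MIS strengths from slide 3-6 and the new document from slide 4-5.
mis_strengths = {"computer": 0.909, "dos": 2.0}
d_new = {"computer": 20, "dos": 10, "java": 2, "mouse": 3, "delphi": 1}
doa_mis = degree_of_acceptance(d_new, 40, mis_strengths)
print(round(doa_mis, 4))
```

Repeating the same call with each category's strengths gives the per-category DOA values that the second phase of the classifier compares.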

4-9. Complete the DOA of all categories

4-10. The document classifier

5-1. Correct classification. Built from the 1100 documents used for initial training.

5-2. Correct classification. Baseline systems: allowed to use 5000 features in their feature set.

5-3. Correct classification. Baseline systems use all training documents to build their feature sets and classifiers.

5-4. Consider the test document entitled "Setting up Email in DOS with today's ISP using a dialup PPP TCP/IP connection". Baseline systems: "Software", "Windows", and "Operating Systems". ACclassifier: "TCP/IP", "connection", "computer networking", "userID"

5-5. Cumulative training & testing time (sec.). The time spent by ACclassifier grew more slowly once about 1400 training documents had been entered.

5-6. Cumulative training & testing time (sec.). The time spent by ACclassifier grew more slowly once about 1400 training documents had been entered.

6. Conclusions
1. Efficient mining of the contextual requirements for high-precision DC.
2. Incremental mining without reprocessing previous documents.
3. Evolutionary maintenance of the feature set.
4. Efficient and fault-tolerant hierarchical DC.

7. Personal Opinion: It's acceptable in terms of purity in the hierarchy.

8.Review
