Feng Zhang, Guang Qiu, Jiajun Bu*, Mingcheng Qu, Chun Chen College of Computer Science, Zhejiang University Hangzhou, China Reporter: 洪紹祥 Adviser: 鄭淑真.


1
Feng Zhang, Guang Qiu, Jiajun Bu*, Mingcheng Qu, Chun Chen
College of Computer Science, Zhejiang University, Hangzhou, China
Reporter: 洪紹祥
Adviser: 鄭淑真
Date: 2010/10/26

2
• The textual advertising market is becoming a substantial source of Web revenue.
• Contextual advertising plays an important role in it.
• Relevance between page content and ads leads users to click and browse the ads, which brings advertisers a potential increase in revenue.

3
• The key step of contextual advertising
  • Keyword extraction directly affects the accuracy of the advertising system.
• Research has been done on English keyword extraction.
• Little work exists on Chinese keyword extraction:
  1. The unique characteristics of the Chinese language
  2. The Internet and Web advertising market have only just started in China

4
• News and email query extraction
  • TFIDF
  • The closed captioning of TV news
  • Mail subject
• Information extraction
  • Extract phrases
  • The extraction techniques adopted are different from keyword extraction.
• Keyword extraction for English
  • Keyphrase Extraction Algorithm (KEA)
  • Three features
    • TFIDF
    • Distance (number of words before the first occurrence / number of all words)
    • Term frequency
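The distance feature listed for KEA can be illustrated with a minimal sketch. It assumes the document is already split into words; the function name `first_occurrence_distance` and the sample sentence are hypothetical and not taken from the slides.

```python
def first_occurrence_distance(words, phrase):
    """Return (words before the first occurrence of phrase) / (all words).

    Returns None when the phrase does not occur in the document.
    """
    n = len(phrase)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == phrase:
            return i / len(words)
    return None


# Hypothetical sample document, tokenized on whitespace.
doc = "contextual advertising relies on keyword extraction from page content".split()
print(first_occurrence_distance(doc, ["keyword", "extraction"]))
```

A small value means the phrase appears early in the document.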

5
• Data processing

6
• Candidate selection criteria
  1. A candidate is at least two words long.
  2. A candidate that occurs in different places in the same document
     • is considered the identical candidate
     • has its feature values combined
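The two criteria above could be sketched as follows. This is an illustration only: it assumes the document has already been segmented into phrases (lists of words), and it stands in for "combining feature values" by collecting every position at which a candidate occurs. The name `collect_candidates` and the sample phrases are hypothetical.

```python
from collections import defaultdict


def collect_candidates(phrases, min_words=2):
    """Keep phrases of at least min_words words and merge repeated occurrences.

    Returns a dict mapping each candidate (as a tuple of words) to the list
    of phrase indices where it occurs.
    """
    positions = defaultdict(list)
    for index, phrase in enumerate(phrases):
        if len(phrase) >= min_words:
            positions[tuple(phrase)].append(index)
    return dict(positions)


# Hypothetical segmented document.
phrases = [
    ["keyword", "extraction"],
    ["ads"],
    ["decision", "tree"],
    ["keyword", "extraction"],
]
candidates = collect_candidates(phrases)
print(candidates)
```

Position-based features such as first or last occurrence can then be derived from the merged position list of each candidate.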

7
• Building the classifier (using the C4.5 decision tree algorithm)
  • Feature selection
    • Binary value
      • Linguistic features: noun, verb, ...
      • Named entity: name, place, ...
    • Numeric value
      • Length
        • Length of the candidate
        • Length of the document
        • Number of sentences in the document
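C4.5 chooses splits by gain ratio (information gain divided by split information). The sketch below computes that criterion for one discrete feature, such as a binary "is a noun" flag. It is not a full C4.5 implementation (no continuous thresholds, no pruning), and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Gain ratio of a discrete feature with respect to the class labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the feature.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(feature_values)
    return gain / split_info if split_info > 0 else 0.0


# Toy data: the binary feature separates keywords from non-keywords perfectly.
is_noun = [1, 1, 0, 0]
is_keyword = [True, True, False, False]
print(gain_ratio(is_noun, is_keyword))
```

A feature that takes the same value for every candidate has zero split information, so the sketch returns 0.0 for it rather than dividing by zero.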

8
• Building the classifier (using the C4.5 decision tree algorithm)
  • Feature selection
    • Location
      • First: (nth phrase / all phrases), (nth sentence / all sentences)
      • Last: (nth phrase / all phrases), (nth sentence / all sentences)
    • TFIDF
      • Traditional
      • log_2(TF + 1)
      • log_2(IDF + 1)
    • Information entropy
      • H(x) = -(T/N) * log_2(T/N)
    • Diameter
      • Last(nth phrase) - First(nth phrase)
      • Last(nth sentence) - First(nth sentence)
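The numeric features on this slide can be written out as small functions. This is a sketch under assumptions: positions are 1-based indices of the phrases (or sentences) in which a candidate occurs, and since the slide does not define T and N, the example simply treats them as a count and a total. All function names are hypothetical.

```python
import math


def location_features(positions, total):
    """First and last occurrence, each as (nth unit / all units)."""
    return min(positions) / total, max(positions) / total


def diameter(positions):
    """Distance between the last and the first occurrence."""
    return max(positions) - min(positions)


def log_scaled(value):
    """log_2(value + 1), the scaling shown for TF and IDF."""
    return math.log2(value + 1)


def entropy_term(t, n):
    """H(x) = -(T/N) * log_2(T/N), with T and N as given on the slide."""
    p = t / n
    return -p * math.log2(p)


# Hypothetical candidate that occurs in phrases 2, 5 and 9 of 10.
positions = [2, 5, 9]
print(location_features(positions, 10))
print(diameter(positions))
print(log_scaled(3))
print(entropy_term(1, 4))
```

The same helpers apply to sentence-level positions by passing sentence indices and the number of sentences.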

9
• Corpus construction
  • Contains 2200 documents
  • 2000 for training and 100 for testing
• Labeling
  • Submit the candidates in a document to Google
• Performance measures
  • Top-N = CorrectNum / TotalNum
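One way to read the Top-N measure is sketched below. It assumes that TotalNum is the number of candidates returned in the top N and that CorrectNum is how many of them appear among the labeled keywords; the slide does not spell this out, so both readings, the function name, and the sample data are assumptions.

```python
def top_n_score(ranked_candidates, gold_keywords, n):
    """CorrectNum / TotalNum over the N highest-ranked candidates."""
    top = ranked_candidates[:n]
    if not top:
        return 0.0
    correct = sum(1 for candidate in top if candidate in gold_keywords)
    return correct / len(top)


# Hypothetical ranking produced by a classifier, and hypothetical labels.
ranked = ["keyword extraction", "web page", "decision tree", "ad revenue"]
gold = {"keyword extraction", "decision tree"}
print(top_n_score(ranked, gold, 3))
```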

10
• Algorithm comparison experiment

11
• Feature contribution experiment

12
• Feature contribution experiment
  • To analyze the influence of the other features

13
• The experimental results show that our approach is promising and improves substantially on KEA and Yih's work, setting aside the difference in language.
• We attribute the superior performance to the appropriate features we select and the classification algorithm we adopt.

