Published by Madeline Norton. Modified over 9 years ago.
1
Final Presentation Tong Wang
2
1. Automatic Article Screening in Systematic Review
2. Compression Algorithm on Document Classification
3
Automatic Article Screening
Review question: Vitamin C for preventing and treating the common cold?
Data set: 17 reference articles and 664 non-reference articles.
4
Problem Definition
Input: a document d and the classes (c1 = reference, c2 = not a reference)
Output: the predicted class of d
Goal: find all articles that belong to c1 (reference)
5
Build Features
"Bag of words" assumption: the order of words in a document can be neglected.
Preprocessing: tokenization, lemmatization, stop-word removal, and removal of some parts of speech.
One more step is needed: a Named Entity Recognizer (NER), which labels sequences of words that are names of things. It is implemented with a linear-chain Conditional Random Field (CRF).
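The tokenization and stop-word steps above can be sketched as follows. This is a minimal illustration, not the pipeline used in the presentation: the stop-word list and the `preprocess` function are hypothetical, and lemmatization, part-of-speech filtering, and NER are left out because they need an NLP toolkit.

```python
import re

# Illustrative stop-word list (an assumption); a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "in", "is", "to"}

def preprocess(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

tokens = preprocess("Vitamin C for preventing and treating the common cold")
print(tokens)
```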
6
Build Features
Vector space model
Extract the vocabulary over all articles.
Each document can be represented by a vector; the value in each dimension is the frequency of that word in the article.
N = size of the vocabulary
      w1  w2  w3  w4  …  wN
d1     1   0   2   0  …   0
d2     0   1   0   0  …   0
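A small sketch of the vector space model described above, assuming the documents are already tokenized. The function names `build_vocabulary` and `to_count_vector` and the two toy documents are made up for illustration.

```python
from collections import Counter

def build_vocabulary(docs):
    """Return the sorted list of distinct tokens across all documents."""
    return sorted({tok for doc in docs for tok in doc})

def to_count_vector(doc, vocabulary):
    """Map one tokenized document to its word-frequency vector."""
    counts = Counter(doc)
    return [counts[w] for w in vocabulary]

# Toy tokenized documents (illustrative only).
docs = [
    ["vitamin", "c", "cold", "cold"],
    ["placebo", "trial"],
]
vocab = build_vocabulary(docs)
vectors = [to_count_vector(d, vocab) for d in docs]
```

Each row of `vectors` has length N = `len(vocab)`, matching the d1, d2 rows in the table.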
7
Naïve Bayes
8
Logistic Regression
9
Discussion
Define a loss matrix that gives a high penalty to false negatives.
Another approach is to use cosine distance to compute the similarity between articles.
Wiki def:
Use other probabilistic NLP models, such as LSA or LDA.
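For the cosine option, a minimal sketch over two word-frequency vectors is given here. It uses the standard definition (dot product divided by the product of the vector norms); returning 0.0 for an all-zero vector is a convention chosen for this example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for empty documents
    return dot / (norm_u * norm_v)

def cosine_distance(u, v):
    """Distance form: 0 for identical directions, 1 for no shared words."""
    return 1.0 - cosine_similarity(u, v)
```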
10
Compression
The basic idea is that data containing patterns that occur with a certain regularity will be compressed more efficiently.
It is generally inexpensive.
11
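d(x, y) = c(xy) / (c(x) + c(y))
x: a document
c(x): size of the compressed file x
xy: the file obtained by concatenating x and y
d(x, y) - 1/2 >= 0 (this holds whenever c(xy) >= max(c(x), c(y)))
[Figure: files x, y, and xy with their compressed sizes c(x), c(y), c(xy)]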
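The distance above can be sketched with a general-purpose compressor. The slides do not say which compressor was used; `zlib` is an assumption here, and the two byte strings are toy inputs.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """c(x): size in bytes of the zlib-compressed input."""
    return len(zlib.compress(data, 9))

def compression_distance(x: bytes, y: bytes) -> float:
    """d(x, y) = c(xy) / (c(x) + c(y)), where xy is x followed by y."""
    return compressed_size(x + y) / (compressed_size(x) + compressed_size(y))

# Toy inputs (illustrative only): a text compared with itself and with an unrelated text.
x = b"vitamin c common cold trial " * 50
y = b"support vector machine kernel margin " * 50
print(compression_distance(x, x), compression_distance(x, y))
```

When x and y share patterns, the compressor reuses them in xy, so c(xy) grows little beyond c(x) and the ratio stays close to 1/2. For unrelated inputs the ratio approaches 1.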
12
Compression Matrix
      a1          a2          a3   a4   …
b1    d(b1, a1)   d(b1, a2)
b2    d(b2, a1)   d(b2, a2)
b3
b4
…
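A sketch of how such a matrix might be filled in, with one row per document in group b and one column per document in group a. The compressor (`zlib`) and the function names are assumptions made for this example.

```python
import zlib

def compression_distance(x: bytes, y: bytes) -> float:
    """d(x, y) = c(xy) / (c(x) + c(y)) using zlib as the compressor."""
    def c(data: bytes) -> int:
        return len(zlib.compress(data))
    return c(x + y) / (c(x) + c(y))

def compression_matrix(group_b, group_a):
    """Entry [i][j] holds d(b_i, a_j)."""
    return [[compression_distance(b, a) for a in group_a] for b in group_b]

# Toy documents (illustrative only).
group_a = [b"attention deficit drug review", b"stimulant dosage study"]
group_b = [b"gradient descent", b"decision trees", b"neural networks"]
matrix = compression_matrix(group_b, group_a)
```

Averaging the entries of a matrix gives a single group-to-group distance, which is one way to compare two groups of articles.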
13
Experiments
Two groups of drug review (ADHD) articles.
Two groups of machine learning articles.
Each group has 15 articles.
Intuitively:
d(ADHD, ADHD) < d(ADHD, machine learning)
d(machine learning, machine learning) < d(ADHD, machine learning)
15
Future Work
More experiments
Compare cosine(x, y) and d(x, y)