2
Introduction Algorithm Framework Future work Demo
4
A web-based document comparator Calculate accurate similarity between 2 documents
5
Introduction Algorithm Framework Future work Demo
6
Preprocessing Vector space Similarity calculation
7
Lowercase Stop words filtering Stemming
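The first two preprocessing steps can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the `STOP_WORDS` set is a tiny hypothetical list, and the `preprocess` name and the regex-based tokenization are assumptions.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real list would be much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on"}

def preprocess(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

tokens = preprocess("The cat is on the mat, and the Cats are meeting.")
print(tokens)  # ['cat', 'mat', 'cats', 'meeting']
```

Note that "cat" and "cats" are still distinct tokens at this point; merging them is the job of the stemming step.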
8
Stemming › Porter Stemming Algorithm › E.g. word pairs reduced to a common stem: cat / cats, meet / meeting, agree / agreed, correct / correctness
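To illustrate the idea of stemming only, here is a naive suffix stripper. It is NOT the Porter Stemming Algorithm, which applies a much larger set of ordered rules; the `crude_stem` function and its suffix list are hypothetical and tuned just far enough to handle the word pairs on the slide.

```python
def crude_stem(word):
    """Naive suffix stripping; a rough stand-in, not the Porter algorithm."""
    for suffix in ("ness", "ing", "ed", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    # Drop a trailing "e" so that "agree" and "agreed" meet at the same stem.
    if word.endswith("e") and len(word) > 3:
        return word[:-1]
    return word

pairs = [("cat", "cats"), ("meet", "meeting"),
         ("agree", "agreed"), ("correct", "correctness")]
for left, right in pairs:
    # Each pair from the slide should collapse to one stem.
    print(left, right, "->", crude_stem(left), crude_stem(right))
```

The stems need not be dictionary words ("agre" is not); they only need to be identical for related forms.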
9
Build dictionary 1 › word -> frequency
Sort the keys of dictionary 1
Build dictionary 2 › key -> (index, count)
Build binary vectors › index -> occurrence
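One possible reading of these steps is sketched below. The slide does not say whether the dictionaries are built per document or over both documents together; this sketch assumes a single shared dictionary, and the function name `build_binary_vectors` is hypothetical.

```python
from collections import Counter

def build_binary_vectors(tokens1, tokens2):
    """Turn two token lists into binary vectors over a shared, sorted vocabulary."""
    # Dictionary 1: word -> frequency (assumption: counted over both documents).
    freq = Counter(tokens1) + Counter(tokens2)
    # Sort the keys so every word gets a stable position.
    keys = sorted(freq)
    # Dictionary 2: key -> (index, count).
    index = {word: (i, freq[word]) for i, word in enumerate(keys)}

    def to_vector(tokens):
        # Binary vector: 1 at a word's index if the word occurs, else 0.
        vec = [0] * len(keys)
        for word in tokens:
            vec[index[word][0]] = 1
        return vec

    return keys, to_vector(tokens1), to_vector(tokens2)

keys, v1, v2 = build_binary_vectors(["cat", "sat", "mat"], ["cat", "ate", "fish"])
print(keys)  # ['ate', 'cat', 'fish', 'mat', 'sat']
print(v1)    # [0, 1, 0, 1, 1]
print(v2)    # [1, 1, 1, 0, 0]
```

Sharing one sorted vocabulary guarantees that position i means the same word in both vectors, which the similarity calculation relies on.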
10
Vectors v1 and v2
Similarity = dot(v1, v2) / (norm(v1) * norm(v2)) › the cosine of the angle between v1 and v2
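The similarity formula can be written in plain Python as follows. This is a minimal sketch; the function name `cosine_similarity` and the choice to return 0.0 for an all-zero vector are assumptions, since the slide does not cover that edge case.

```python
import math

def cosine_similarity(v1, v2):
    """Dot product of v1 and v2 divided by the product of their Euclidean norms."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        # Assumption: an empty document is treated as having no similarity.
        return 0.0
    return dot / (norm1 * norm2)

# Two binary vectors that share exactly one word out of three each.
print(round(cosine_similarity([0, 1, 0, 1, 1], [1, 1, 1, 0, 0]), 4))  # 0.3333
```

For binary vectors the result ranges from 0 (no shared words) to 1 (identical word sets).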
11
Algorithms coded in Python › Dynamic typing › Not good at numerical operations Solution: numpy
12
A Python extension module Written mostly in C Defines numerical array and matrix types and basic operations on them
13
Python code
› a = range(10000000)
› b = range(10000000)
› c = []
› for i in range(len(a)): c.append(a[i] + b[i])
Takes up to 10 seconds on a multi-GHz processor
14
Numpy code
› import numpy as np
› a = np.arange(10000000)
› b = np.arange(10000000)
› c = a + b
Almost instant
15
Vector dot product Vector normalization Vector zero filling
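These three operations might look like this in NumPy. The sketch assumes that "zero filling" means initialising each vector with `np.zeros` before marking word occurrences; the indices used are illustrative only.

```python
import numpy as np

# Zero filling: start from all-zero vectors of vocabulary size.
v1 = np.zeros(5)
v2 = np.zeros(5)

# Mark the positions of the words that occur in each document (illustrative indices).
v1[[1, 3, 4]] = 1
v2[[0, 1, 2]] = 1

# Dot product and Euclidean norms give the cosine similarity.
dot = np.dot(v1, v2)
similarity = dot / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(similarity)
```

`np.dot` and `np.linalg.norm` run in compiled code, so they avoid the per-element Python loop that made the pure-Python version slow.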
16
Introduction Algorithm Framework Future work Demo
17
Django › The web framework for perfectionists with deadlines
18
Python › Numpy › Porter Stemming jQuery
19
Alwaysdata › Django 1.3 › Python 2.6
20
Introduction Algorithm Framework Future work Demo
21
Support file uploading and comparison Add HTML5 features
22
Introduction Algorithm Framework Future work Demo
23
http://imds.alwaysdata.net