 Introduction  Algorithm  Framework  Future work  Demo.



2  Introduction  Algorithm  Framework  Future work  Demo

3  Introduction  Algorithm  Framework  Future work  Demo

4  A web-based document comparator  Calculate accurate similarity between 2 documents

5  Introduction  Algorithm  Framework  Future work  Demo

6  Preprocessing  Vector space  Similarity calculation

7  Lowercase  Stop words filtering  Stemming

8  Stemming › Porter Stemming Algorithm › E.g.  cat – cats  meet – meeting  agree – agreed  correct – correctness
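The three preprocessing steps can be sketched in a few lines of Python. This is only an illustration: the stop-word list is a tiny hypothetical one, and `toy_stem` is a crude suffix stripper standing in for the Porter algorithm, not an implementation of it. The names `STOP_WORDS`, `toy_stem`, and `preprocess` are assumptions, not taken from the presentation.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real one is much longer.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on"}


def toy_stem(word):
    """Crude suffix stripper standing in for the Porter stemmer (NOT Porter)."""
    for suffix in ("ness", "ing", "ed", "s"):
        # Keep at least three characters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text, stem=toy_stem):
    """Lowercase the text, drop stop words, and stem what remains."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(token) for token in tokens if token not in STOP_WORDS]


print(preprocess("The cats are meeting"))
```

A real Porter stemmer can be passed in through the `stem` parameter in place of the toy function.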

9  Build dictionary 1 › word -> frequency  Sort the keys of dictionary 1  Build dictionary 2 › key -> (index, count)  Build binary vectors › index -> occurrence
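One possible reading of these four steps, as a sketch: the slide does not say whether the dictionaries cover one document or both, so this version assumes a single vocabulary built over both documents. The function name `build_binary_vectors` is hypothetical.

```python
def build_binary_vectors(tokens1, tokens2):
    """Return (dictionary 2, binary vector for doc 1, binary vector for doc 2)."""
    # Dictionary 1: word -> frequency (assumption: counted over both documents).
    freq = {}
    for word in tokens1 + tokens2:
        freq[word] = freq.get(word, 0) + 1

    # Dictionary 2: word -> (index, count), with indices in sorted key order.
    indexed = {}
    for index, word in enumerate(sorted(freq)):
        indexed[word] = (index, freq[word])

    # Binary vector: position `index` is 1 if the word occurs in the document.
    def to_vector(tokens):
        vector = [0] * len(indexed)
        for word in tokens:
            vector[indexed[word][0]] = 1
        return vector

    return indexed, to_vector(tokens1), to_vector(tokens2)


indexed, v1, v2 = build_binary_vectors(["cat", "meet"], ["cat", "agre", "cat"])
print(indexed)
print(v1, v2)
```

Sorting the keys gives every word a stable position, so both vectors use the same index for the same word.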

10  Vectors v1 and v2  Similarity = (v1 · v2) / (norm(v1) * norm(v2)), i.e. the cosine of the angle between v1 and v2
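The formula can be checked by hand with a small pure-Python sketch. The function name `cosine_similarity` is an assumption, and returning 0.0 for an all-zero vector is a choice made here; the slide does not cover that case.

```python
import math


def cosine_similarity(v1, v2):
    """Dot product of v1 and v2 divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        # Assumption: a vector with no words is treated as similarity 0.
        return 0.0
    return dot / (norm1 * norm2)


print(cosine_similarity([0, 1, 1], [1, 1, 0]))
```

For binary vectors the dot product counts the shared words, so two documents sharing one of their two words each score about 0.5.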

11  Algorithms coded in Python › Dynamic typing › Not good at numerical operations  Solution: numpy

12  A Python extension module  Written mostly in C  Defines numerical array and matrix types and basic operations on them

13  Python code › a = range(10000000) › b = range(10000000) › c = [] › for i in range(len(a)):  c.append(a[i] + b[i])  Takes up to 10 seconds on a multi-GHz processor

14  Numpy code › import numpy as np › a = np.arange(10000000) › b = np.arange(10000000) › c = a + b  Almost instant

15  Vector dot product  Vector normalization  Vector zero filling
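A rough sketch of these three operations using `np.zeros`, `np.linalg.norm`, and `np.dot`. The helper names `zero_fill` and `normalize` are hypothetical, and how the presentation's own code applies zero filling is not stated; here it pads a shorter vector with trailing zeros so both vectors have the same length.

```python
import numpy as np


def zero_fill(vector, length):
    """Pad a 1-D array with trailing zeros (assumes length >= len(vector))."""
    padded = np.zeros(length, dtype=float)
    padded[: len(vector)] = vector
    return padded


def normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = np.linalg.norm(vector)
    return vector / norm if norm > 0 else vector


v1 = zero_fill(np.array([1, 1]), 3)      # becomes [1, 1, 0]
v2 = zero_fill(np.array([0, 1, 1]), 3)   # already length 3
similarity = float(np.dot(normalize(v1), normalize(v2)))
print(similarity)
```

Normalizing first and then taking the dot product gives the same value as dividing the dot product by the two norms.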

16  Introduction  Algorithm  Framework  Future work  Demo

17  Django › The web framework for perfectionists with deadlines

18  Python › Numpy › Porter Stemming  jQuery

19  Alwaysdata › Django 1.3 › Python 2.6

20  Introduction  Algorithm  Framework  Future work  Demo

21  Support file uploading and comparison  Add HTML5 features

22  Introduction  Algorithm  Framework  Future work  Demo

23  http://imds.alwaysdata.net


