Published by Cali Hairfield. Modified over 10 years ago.
1
The Development of Sharing Publication Citation Information Website with Article Search System Using OKAPI BM25
Author: Hartono (26405055)
Supervisors: Resmana Lim, M.Eng.; Adi Wibowo, M.T.
2
Background
1. The need to obtain the necessary scientific journals.
2. Limited access to scientific journals.
3. The need to gather article information not only by harvesting but also by manual entry.
4. The need to obtain better search results.
3
Problem & Goal
Problem :
1. How to get article information by harvesting it from an external journal site?
2. How to input articles formatted as BibTeX, XML or PDF into the database?
3. How to harvest articles automatically at set intervals?
4. How to index the articles that exist in the database?
5. How to search the existing articles in the database using OKAPI BM25?
Goal : To develop an information-sharing site that offers more complete article information and lets users find the information they want.
4
Context Diagram
6
Harvesting Process
ListMetadataFormats verb example : http://citeseerx.ist.psu.edu/oai2?verb=ListMetadataFormats
ListIdentifiers verb example : http://citeseerx.ist.psu.edu/oai2?verb=ListIdentifiers&from=2010-03-17&until=2010-03-18&metadataPrefix=oai_dc
GetRecord verb example : http://citeseerx.ist.psu.edu/oai2?verb=GetRecord&identifier=oai:CiteSeerXPSU:10.1.1.1.2918&metadataPrefix=oai_dc
ListRecords verb example : http://citeseerx.ist.psu.edu/oai2?verb=ListRecords&from=2010-03-17&until=2010-03-18&metadataPrefix=oai_dc
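The request URLs above all follow the same pattern: a base endpoint plus a verb and its arguments. A minimal sketch of building them in Python is shown here; the helper name oai_request_url and the trailing-underscore convention for the "from" argument are illustrative assumptions, not part of the presented system.

```python
from urllib.parse import urlencode

BASE_URL = "http://citeseerx.ist.psu.edu/oai2"  # endpoint taken from the slide


def oai_request_url(verb, **params):
    """Build an OAI-PMH request URL for a verb and its arguments.

    'from' is a reserved word in Python, so callers pass 'from_' and the
    trailing underscore is stripped here (a convention of this sketch only).
    """
    query = {"verb": verb}
    for key, value in params.items():
        query[key.rstrip("_")] = value
    # Keep ':' unescaped so identifiers read the same as on the slide.
    return BASE_URL + "?" + urlencode(query, safe=":")


list_formats = oai_request_url("ListMetadataFormats")
list_ids = oai_request_url(
    "ListIdentifiers",
    from_="2010-03-17",
    until="2010-03-18",
    metadataPrefix="oai_dc",
)
get_record = oai_request_url(
    "GetRecord",
    identifier="oai:CiteSeerXPSU:10.1.1.1.2918",
    metadataPrefix="oai_dc",
)

for url in (list_formats, list_ids, get_record):
    print(url)
```

The sketch only builds the URLs; fetching and parsing the XML responses is left out.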
7
Article Management Process
8
Indexing Process
9
Title Process Description Process
10
Content Process Creator Process
11
Explode Process Stop Word Process
12
Stemming Process Compute f(qi,D) Process
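The slides name the indexing steps without showing their content. As a rough sketch only, the explode, stop-word and f(qi,D) steps might look like the following in Python. The stop-word list is hypothetical, splitting on whitespace is an assumption about what "explode" does, and stemming is omitted because the stemmer used is not stated.

```python
from collections import Counter

# Hypothetical stop-word list; the system's actual list is not shown.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "for"}


def explode(text):
    """Split text into lowercase tokens on whitespace."""
    return text.lower().split()


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [token for token in tokens if token not in STOP_WORDS]


def term_frequencies(tokens):
    """Count f(q, D): how often each term occurs in one document."""
    return Counter(tokens)


tokens = remove_stop_words(
    explode("The analysis of the complex model and the complex data")
)
freqs = term_frequencies(tokens)
print(tokens)
print(freqs["complex"])
```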
13
Total Articles Process Compute IDF Process
14
Avgdl Process Search Process
15
OKAPI Process User Management Process
16
Message Management
17
Entity Relationship Diagram (ERD)
18
OKAPI BM25
Okapi BM25 is a ranking function used by search engines to rank documents by their relevance to a given query.
OKAPI BM25 Formula
Inverse Document Frequency
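For reference, the commonly published form of the Okapi BM25 score and its inverse document frequency term is given here. It is an assumption that the system uses exactly this variant; the symbols f(q_i, D), avgdl and N (total number of articles) match the process names in the earlier slides, while the values of the free parameters k_1 and b are not stated in the slides.

```latex
\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left( 1 - b + b\,\frac{|D|}{\mathrm{avgdl}} \right)}

\mathrm{IDF}(q_i) = \log \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}
```

Here |D| is the length of document D in words and n(q_i) is the number of documents that contain the term q_i.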
19
Article Example
Article example :
Article | Title | Description | Content
Oai1 | complex stockhast | Numer analysi | Model complex real detail analysi build
Oai2 | Managed abstrach build | Manner detail | Join creation numer make possibl
Oai3 | Structur detail possibl | Real abstrach world | Make detail usual manner
Oai4 | Build world explor | Analysi detail | Managed stockhast replicating complex explor
20
Manual & Program IDF Calculation
Manual :
Program :
21
Manual & Program OKAPI Calculation
Keyword example : complex
Manual :
Program :
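A calculation of this kind can be sketched in Python over the four example articles. Several things here are assumptions rather than details from the slides: the title, description and content fields are concatenated into one document, the classic IDF form with a natural logarithm is used, and k1 = 1.2 and b = 0.75 are typical defaults.

```python
import math

# Example articles from the slides; the three fields are concatenated
# into a single document per article (an assumption of this sketch).
ARTICLES = {
    "Oai1": "complex stockhast numer analysi model complex real detail analysi build",
    "Oai2": "managed abstrach build manner detail join creation numer make possibl",
    "Oai3": "structur detail possibl real abstrach world make detail usual manner",
    "Oai4": "build world explor analysi detail managed stockhast replicating complex explor",
}
K1, B = 1.2, 0.75  # assumed typical values, not taken from the slides

DOCS = {name: text.split() for name, text in ARTICLES.items()}
N = len(DOCS)  # total number of articles
AVGDL = sum(len(tokens) for tokens in DOCS.values()) / N  # average length


def idf(term):
    """Classic BM25 inverse document frequency of a term."""
    n = sum(1 for tokens in DOCS.values() if term in tokens)
    return math.log((N - n + 0.5) / (n + 0.5))


def bm25(query, name):
    """BM25 score of one article for a whitespace-separated query."""
    tokens = DOCS[name]
    score = 0.0
    for term in query.split():
        f = tokens.count(term)  # f(q, D)
        norm = f + K1 * (1 - B + B * len(tokens) / AVGDL)
        score += idf(term) * f * (K1 + 1) / norm
    return score


for name in DOCS:
    print(name, round(bm25("complex", name), 4), round(bm25("explor", name), 4))
```

Under this form of IDF, a term that occurs in exactly half of the articles, as "complex" does here (Oai1 and Oai4), gets an IDF of zero, so the second keyword "explor" is included to show a non-zero score. The slide's own manual and program values are not reproduced here.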
22
Recall Precision
Articles : 500
Keyword : Network System
Search result = 198 articles
Results that may be relevant = 29 articles
Relevant articles in result = 12
Recall = 12/12 * 100% = 100%
Precision = 12/198 * 100% = 6%
Oai identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.1.3301 | no | 15
oai:CiteSeerXPSU:10.1.1.1.8714 | no | 12
oai:CiteSeerXPSU:10.1.1.11.3246 | yes | 8
oai:CiteSeerXPSU:10.1.1.131.2961 | no | 6
oai:CiteSeerXPSU:10.1.1.133.114 | yes | 3
23
Recall Precision (continued)
Oai identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.133.5166 | no | 16
oai:CiteSeerXPSU:10.1.1.134.7415 | no | 25
oai:CiteSeerXPSU:10.1.1.135.7151 | no | 13
oai:CiteSeerXPSU:10.1.1.138.8592 | yes | 5
oai:CiteSeerXPSU:10.1.1.143.7835 | yes | 24
oai:CiteSeerXPSU:10.1.1.143.9199 | no | 28
oai:CiteSeerXPSU:10.1.1.147.3140 | yes | 9
oai:CiteSeerXPSU:10.1.1.148.6013 | yes | 10
oai:CiteSeerXPSU:10.1.1.149.7229 | no | 18
oai:CiteSeerXPSU:10.1.1.2.8672 | no | 29
oai:CiteSeerXPSU:10.1.1.2.876 | yes | 4
oai:CiteSeerXPSU:10.1.1.28.2069 | no | 21
oai:CiteSeerXPSU:10.1.1.28.3751 | no | 23
oai:CiteSeerXPSU:10.1.1.31.5233 | yes | 17
24
Recall Precision (continued)
Oai identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.32.3394 | no | 19
oai:CiteSeerXPSU:10.1.1.34.422 | yes | 20
oai:CiteSeerXPSU:10.1.1.37.133 | no | 26
oai:CiteSeerXPSU:10.1.1.37.886 | no | 27
oai:CiteSeerXPSU:10.1.1.46.7941 | yes | 1
oai:CiteSeerXPSU:10.1.1.5.5436 | yes | 2
oai:CiteSeerXPSU:10.1.1.61.8860 | no | 22
oai:CiteSeerXPSU:10.1.1.62.5142 | no | 14
oai:CiteSeerXPSU:10.1.1.8.4971 | no | 11
oai:CiteSeerXPSU:10.1.1.94.3465 | yes | 7
25
Recall Precision (continued)
Keyword : music model
Search result = 150 articles
Results that may be relevant = 30 articles
Relevant articles in result = 14
Recall = 14/14 * 100% = 100%
Precision = 14/150 * 100% = 9.3%
Oai identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.10.1860 | yes | 19
oai:CiteSeerXPSU:10.1.1.10.2860 | no | 29
oai:CiteSeerXPSU:10.1.1.111.3072 | yes | 18
oai:CiteSeerXPSU:10.1.1.127.8691 | yes | 21
oai:CiteSeerXPSU:10.1.1.130.1856 | yes | 6
oai:CiteSeerXPSU:10.1.1.133.7089 | no | 27
oai:CiteSeerXPSU:10.1.1.140.3374 | no | 10
26
Recall Precision (continued)
Oai identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.140.8940 | yes | 25
oai:CiteSeerXPSU:10.1.1.142.7598 | no | 12
oai:CiteSeerXPSU:10.1.1.149.6567 | yes | 30
oai:CiteSeerXPSU:10.1.1.152.2688 | yes | 11
oai:CiteSeerXPSU:10.1.1.154.24 | no | 16
oai:CiteSeerXPSU:10.1.1.154.2529 | yes | 20
oai:CiteSeerXPSU:10.1.1.155.1750 | no | 33
oai:CiteSeerXPSU:10.1.1.16.7401 | no | 32
oai:CiteSeerXPSU:10.1.1.17.1013 | yes | 1
oai:CiteSeerXPSU:10.1.1.18.6229 | no | 13
oai:CiteSeerXPSU:10.1.1.2.6849 | no | 31
oai:CiteSeerXPSU:10.1.1.2.8672 | no | 8
oai:CiteSeerXPSU:10.1.1.20.3633 | yes | 15
oai:CiteSeerXPSU:10.1.1.31.5233 | yes | 7
27
Recall Precision (continued)
Oai identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.32.5049 | no | 24
oai:CiteSeerXPSU:10.1.1.34.7828 | yes | 4
oai:CiteSeerXPSU:10.1.1.4.677 | yes | 5
oai:CiteSeerXPSU:10.1.1.4.7323 | yes | 3
oai:CiteSeerXPSU:10.1.1.5.1181 | no | 23
oai:CiteSeerXPSU:10.1.1.5.4681 | no | 17
oai:CiteSeerXPSU:10.1.1.52.4788 | no | 28
oai:CiteSeerXPSU:10.1.1.57.3576 | no | 14
oai:CiteSeerXPSU:10.1.1.59.9118 | no | 9
28
Recall Precision (continued)
Keyword : music analysis
Search result = 116 articles
Results that may be relevant = 23 articles
Relevant articles in result = 10
Recall = 10/10 * 100% = 100%
Precision = 10/116 * 100% = 8.6%
Oai identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.10.2860 | yes | 22
oai:CiteSeerXPSU:10.1.1.10.3132 | yes | 2
oai:CiteSeerXPSU:10.1.1.140.3374 | no | 3
oai:CiteSeerXPSU:10.1.1.140.8940 | no | 9
oai:CiteSeerXPSU:10.1.1.145.8953 | yes | 5
oai:CiteSeerXPSU:10.1.1.149.6567 | no | 23
oai:CiteSeerXPSU:10.1.1.154.2529 | yes | 19
29
Recall Precision (continued)
Oai identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.155.1750 | yes | 17
oai:CiteSeerXPSU:10.1.1.155.4454 | yes | 10
oai:CiteSeerXPSU:10.1.1.156.2520 | yes | 20
oai:CiteSeerXPSU:10.1.1.18.6229 | no | 13
oai:CiteSeerXPSU:10.1.1.2.6849 | no | 21
oai:CiteSeerXPSU:10.1.1.2.8672 | yes | 1
oai:CiteSeerXPSU:10.1.1.25.747 | no | 18
oai:CiteSeerXPSU:10.1.1.29.4192 | no | 11
oai:CiteSeerXPSU:10.1.1.34.7828 | yes | 7
oai:CiteSeerXPSU:10.1.1.4.7323 | no | 4
oai:CiteSeerXPSU:10.1.1.5.1181 | no | 16
oai:CiteSeerXPSU:10.1.1.5.4681 | yes | 15
oai:CiteSeerXPSU:10.1.1.52.4788 | no | 12
30
Recall Precision (continued)
Oai identifier | Relevant | Search rank
oai:CiteSeerXPSU:10.1.1.59.9118 | no | 6
oai:CiteSeerXPSU:10.1.1.6.3984 | no | 14
oai:CiteSeerXPSU:10.1.1.6.757 | no | 8
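The recall and precision figures for the three test keywords can be recomputed with a short Python sketch. The counts are those given on the slides, where the total number of relevant articles is taken to equal the number of relevant articles found (for example 12/12); the function and variable names are illustrative.

```python
def precision(relevant_found, retrieved):
    """Share of retrieved articles that are relevant, in percent."""
    return 100.0 * relevant_found / retrieved


def recall(relevant_found, relevant_total):
    """Share of all relevant articles that were retrieved, in percent."""
    return 100.0 * relevant_found / relevant_total


# (keyword, search results, relevant found, relevant in total) from the slides
QUERIES = [
    ("Network System", 198, 12, 12),
    ("music model", 150, 14, 14),
    ("music analysis", 116, 10, 10),
]

RESULTS = {
    keyword: (recall(found, total), precision(found, retrieved))
    for keyword, retrieved, found, total in QUERIES
}

for keyword, (rec, prec) in RESULTS.items():
    print(f"{keyword}: recall {rec:.1f}%, precision {prec:.1f}%")
```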
31
Indexing Time
Articles : 500
Number of articles | Time required (seconds)
100 | 805.1392138
200 | 1646.911684
300 | 2509.824728
400 | 3514.183314
500 | 4744.517922
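To make the measurements easier to compare, the indexing time per article can be derived from the figures above. This is plain arithmetic on the reported numbers; the variable names are illustrative.

```python
# Indexing times in seconds, as reported on the slide.
INDEXING_SECONDS = {
    100: 805.1392138,
    200: 1646.911684,
    300: 2509.824728,
    400: 3514.183314,
    500: 4744.517922,
}

# Average seconds spent per article at each collection size.
PER_ARTICLE = {
    count: seconds / count for count, seconds in INDEXING_SECONDS.items()
}

for count, seconds in PER_ARTICLE.items():
    print(f"{count} articles: {seconds:.2f} seconds per article")
```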
33
Search Time
Articles : 500
Keyword : computer analysis
Search result : 140 articles
Time : 0.549877882004 seconds
34
Search Time (continued)
Keyword : user applications
Search result : 92 articles
Time : 0.547022104263 seconds
35
Search Time (continued)
Keyword : work scheme
Search result : 92 articles
Time : 0.491093873978 seconds
36
Search Time (continued)
Keyword : high image transform
Search result : 101 articles
Time : 0.498678922653 seconds
37
Search Time (continued)
Keyword : network
Search result : 76 articles
Time : 0.270733833313 seconds
38
Conclusion
1. The system can only harvest metadata in the oai_dc metadata format.
2. The system can only update automatically from approved URLs.
3. The time the system needs to return keyword-related articles varies with the number of articles returned.
4. Recall of the search results is very good, averaging 100%, while precision is poor, averaging less than 10%. The result is still good enough because all articles that may be relevant are ranked at around 30 or better.
39
Suggestion
1. The system could be developed further to act as a data provider.
2. The system could be extended to harvest other metadata formats dynamically.
40
Thank You For Your Attention