Download presentation
Presentation is loading. Please wait.
Published byAustin Blake Modified over 9 years ago
1
Kelly Boccia Abi Natarajan Konstantin Livitski Senthil Anand Subbanan Meyyappan 1
2
Agenda Business Requirements Client Overview Business Problem Business Goal Solution and Scope Technical Specification System Context Architecture Overview Components & Modules Security Model Document indexing Search Explained Implementation Plan Resource & Costs Development Environment Production Environment Success Criteria Prototype Q&A 2
3
Multi-National Manufacturing & Sales Corporation Business Growth - Multiple Applications - Multiple Repositories Business Problem 3
4
Business Goal Organize Intellectual Capital and Assets Accessibility - Connect knowledge workers securely to relevant information Productivity - Increase productivity and reduce re-work by leveraging knowledge and expertise Client Overview 4
5
Solution Enterprise Knowledge Management Platform 5
6
System Context 6
7
Components & Modules 7
8
Architectural Overview 8
9
Security Model Integrated with existing GLOCO's security infrastructure Any access requires authentication To follow a link in search results, user may need additional authorization for repository access 9
10
Document indexing Document is anything that a search result can point at Documents are external to the search engine Documents include text and metadata Lucene sees each document as a set of named fields 10
11
How search works Lucene sees each document as a set of named fields A record is created for each document to store some fields o URL is usually a stored field The main index is keyed by search term (i.e. inverted) o Typical text fields are tokenized, filtered, and stemmed into terms o Indexed fields may be discarded after processing o For each term, a list of document IDs is stored to help locate records o Also stores frequency and proximity Search involves retrieval of document IDs by term, and stored fields by the document ID 11
12
Resource / Cost Plan 21 weeks total effort 13 member team including GLOCO and Innova INNOVA supports full SDLC with phases Solution Outline, High Level Design, Detailed Design Build / Test / Deploy and Post Production Support 12
13
SLATES - Development Environment Developer workstation to host Virtual Images. Developer workstation to share development Search Servers Fully configured environment to unit test and development 13
14
SLATES - QA / Test and Production Sticky load balancer to remember the serving tomcat Each Search server to hold multiple instances. Shared / Cached Network storage to share index Similar configuration for both QA and Production environment 14
15
Success Criteria and Benchmarks Most important project success criteria are: 10% time and resource savings on certain R&D activities 75% positive feedback on user surveys 50% of the target user group are actively using the system 5% of available documents have user-defined tags 15
16
User 1 Searches for the keyword 'Blood Glucose' 16
17
User 1 gets back the results with the keyword ‘blood glucose’ 17
18
User 1 adds tag ‘diabetes’ to a result 18
19
Tag ‘diabetes’ is immediately available for searching 19
20
User 2 searches for keyword ‘diabetes’ 20
21
User 2 gets back a result for keyword ‘diabetes’ 21
22
User 2 clicks on keyword ‘bp testing’ in the tag cloud 22
23
User 2 gets more results for keyword ‘bp testing’ 23
24
Thank you! Innova would like to thank: Zoya Kinstler Jeff Parker Basem Naseim Valar Jayaprakash Classmates Harvard University Extension School 24
25
Questions? 25
26
Reference Slides 26
27
27
28
Index Growth Index size is a percentage of the document corpus size Maintenance trade-off: o Expensive segment merges - load all segments, write a new one o Fragmented index is expensive to query - must read all segments Lucene index segments are write-once - helps with concurrency Updates are done as delete - re-add. Updates should be batched o Direct tagging is inefficient 28
29
Scalability (Source: Mark Miller, "Scaling Lucene and Solr", Lucid Imagination, 2010) Query volume is scaled by replication Index size and indexing load is scaled by sharding 29
30
Phase 1 - Work Break Down Chart 21 weeks total effort 13 member team including GLOCO and Innova INNOVA supports full SDLC with phases - Solution Outline,High Level Design, Detailed Design, Build / Test / Deploy and Post Production Support 30
31
Use Case - Search and Tag 31
32
Hardware / Software - Detailed Configuration 32
33
Interface Specification 33
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.