CyberMiner
Software Architecture Group
Kimberly West, Nadia Noori, Stanislav Minkevych

Basic Goal: a Web search engine that
- Accepts a list of keywords
- Returns a list of URLs whose descriptions contain any of the given keywords
- Uses KWIC (Key Word In Context) to maintain a database of URLs and descriptions

Requirements Specification
Functional:
- After input, the descriptor part of each line is circularly shifted by repeatedly removing the first word and appending it to the end of the line
- Outputs a list of all circular shifts of the descriptor parts of all lines, in ascending alphabetical order, together with their corresponding URLs
- No output line starts with a noise word such as “a”, “the”, or “of”
- Indices grow with possible later additions

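The functional requirements above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `NOISE_WORDS`, `circular_shifts`, and `kwic_index` are hypothetical, the noise-word set holds just the three examples named on the slide, and sorting is assumed to ignore case.

```python
# Hypothetical noise-word set: only the three examples named in the requirements.
NOISE_WORDS = {"a", "the", "of"}


def circular_shifts(descriptor):
    """Return every circular shift of a descriptor, starting with the original.

    Each shift moves the first word of the previous shift to the end.
    """
    words = descriptor.split()
    return [" ".join(words[i:] + words[:i]) for i in range(len(words))]


def kwic_index(lines):
    """Build an alphabetized KWIC index from (url, descriptor) pairs.

    Shifts that begin with a noise word are dropped. Sorting ignores case
    (an assumption; the requirements only say "alphabetically ascending").
    """
    entries = []
    for url, descriptor in lines:
        for shift in circular_shifts(descriptor):
            first_word = shift.split()[0].lower()
            if first_word not in NOISE_WORDS:
                entries.append((shift, url))
    return sorted(entries, key=lambda entry: (entry[0].lower(), entry[1]))


if __name__ == "__main__":
    for shift, url in kwic_index([("http://example.com/a", "the art of search")]):
        print(shift, "->", url)
```

For the single descriptor "the art of search", two of the four shifts begin with a noise word and are dropped, leaving "art of search the" and "search the art of". Later additions can be handled by rebuilding the index or by inserting new shifts into the sorted list.
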
Requirements Specification
Non-Functional:
- Easily understood and used: clear capabilities and features, simple design
- Portability/reuse: not restricted to particular operating systems, machines, or developers; anyone can use the system and understand its architecture well enough to adapt it to their environment; few system limitations
- Traceability: object-oriented style using abstract data types; each process is linked to a specific individual module
- Good performance and responsiveness: reacts readily to changes; output-to-input ratio; time factor

Components & Connections: Indexing
Repository
- Contains the full HTML of every web page
- Documents are stored one after the other, each prefixed by ID, length, and URL
- Requires no other data structures in order to be accessed (helps with data consistency and makes development easier)
Index
- Keeps information about each document
- Is a fixed-width index, ordered by docID
- Contains the current document status, a pointer into the repository, a document checksum, and various statistics
- If the document has been crawled, also contains a pointer into a variable-width file called docinfo, which contains its URL and title
- Otherwise, the pointer points into the URL list, which contains just the URL

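To make the repository layout concrete, here is a minimal sketch of records stored back to back, each prefixed by ID, length, and URL. The byte layout (`HEADER`), the function names, and the in-memory `bytearray` are all assumptions chosen for illustration; the slide does not specify an on-disk format.

```python
import struct

# Assumed record header: docID (4 bytes), HTML length (4 bytes), URL length (2 bytes).
HEADER = struct.Struct("<IIH")


def append_document(repo, doc_id, url, html):
    """Append one record to the repository and return its starting offset.

    The returned offset plays the role of the index's pointer into the repository.
    """
    offset = len(repo)
    url_bytes = url.encode("utf-8")
    html_bytes = html.encode("utf-8")
    repo.extend(HEADER.pack(doc_id, len(html_bytes), len(url_bytes)))
    repo.extend(url_bytes)
    repo.extend(html_bytes)
    return offset


def read_document(repo, offset):
    """Read the record at the given offset using only its own header."""
    doc_id, html_len, url_len = HEADER.unpack_from(repo, offset)
    url_start = offset + HEADER.size
    html_start = url_start + url_len
    url = bytes(repo[url_start:html_start]).decode("utf-8")
    html = bytes(repo[html_start:html_start + html_len]).decode("utf-8")
    return doc_id, url, html


if __name__ == "__main__":
    repo = bytearray()
    pointer = append_document(repo, 1, "http://example.com/", "<html>hello</html>")
    print(read_document(repo, pointer))
```

Because every record carries its own lengths, the repository can be read with no other data structure, which is the property the slide highlights. A fixed-width index entry would then store the returned offset alongside the status and checksum.
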
Line Storage
- Creates, accesses, and possibly deletes characters, words, and lines
- Listens for InputEvent using the interface LSListener
- Stores the lines
- LineStorage generates an event called LSEvent

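The event flow described above (an InputEvent arrives, the line is stored, an LSEvent is announced) can be sketched as follows. The slide names `InputEvent`, `LSListener`, `LSEvent`, and `LineStorage` but gives no signatures, so every field and method here (`line`, `line_index`, `add_ls_listener`, `on_input_event`) is an assumption, and a plain callable stands in for the listener interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputEvent:
    """Assumed payload: one raw input line."""
    line: str


@dataclass(frozen=True)
class LSEvent:
    """Assumed payload: the index of the line that was just stored."""
    line_index: int


class LineStorage:
    """Stores lines and announces each addition to registered listeners."""

    def __init__(self):
        self._lines = []      # each stored line is a list of words
        self._listeners = []  # callables taking one LSEvent

    def add_ls_listener(self, listener):
        self._listeners.append(listener)

    def on_input_event(self, event):
        # Store the line, then notify listeners without knowing who they are.
        self._lines.append(event.line.split())
        ls_event = LSEvent(line_index=len(self._lines) - 1)
        for listener in self._listeners:
            listener(ls_event)

    def line(self, index):
        return " ".join(self._lines[index])


if __name__ == "__main__":
    storage = LineStorage()
    storage.add_ls_listener(lambda e: print("stored line", e.line_index))
    storage.on_input_event(InputEvent("fast web search"))
```

The point of the design is that LineStorage never calls a circular shifter directly; a component that wants to react to new lines registers itself as a listener.
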
Line Storage
- Procedure setchar(l-line, w-word, c-char, a): sets the c-th character in the w-th word of the l-th line to a
- Function char(l-line, w-word, c-char): returns the c-th character in the w-th word of the l-th line; returns a blank if out of range
- Function word(l-line): returns the number of words in line l

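A minimal sketch of this interface is shown below. It assumes 1-based indices (to match "the c-th character"), assumes that setchar grows the storage as needed and pads skipped character positions with blanks, and assumes word returns 0 for a line that does not exist; none of these details are stated on the slide.

```python
class LineStorage:
    """Character-level line storage: lines -> words -> characters."""

    def __init__(self):
        self._lines = []  # list of lines; a line is a list of words; a word is a list of chars

    def setchar(self, l, w, c, a):
        """Set the c-th character of the w-th word of the l-th line to a (1-based)."""
        if min(l, w, c) < 1:
            raise ValueError("indices are 1-based")
        while len(self._lines) < l:
            self._lines.append([])
        line = self._lines[l - 1]
        while len(line) < w:
            line.append([])
        word = line[w - 1]
        while len(word) < c:
            word.append(" ")  # pad skipped positions with blanks (assumption)
        word[c - 1] = a

    def char(self, l, w, c):
        """Return the requested character, or a blank if any index is out of range."""
        if not 1 <= l <= len(self._lines):
            return " "
        line = self._lines[l - 1]
        if not 1 <= w <= len(line):
            return " "
        word = line[w - 1]
        if not 1 <= c <= len(word):
            return " "
        return word[c - 1]

    def word(self, l):
        """Return the number of words in line l (0 if the line does not exist)."""
        if not 1 <= l <= len(self._lines):
            return 0
        return len(self._lines[l - 1])


if __name__ == "__main__":
    storage = LineStorage()
    for position, letter in enumerate("web", start=1):
        storage.setchar(1, 1, position, letter)
    print(storage.char(1, 1, 2), storage.word(1))
```

Hiding the nested lists behind setchar, char, and word keeps other modules independent of how lines are actually stored.
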
[Architecture diagram. Component labels: Master Control, Line Storage, Alphabetizing Control, Input, Input medium, Output, Output medium, Circular Shift, Searcher. Connector legend: Subprogram call, System I/O, Implicit invocation.]

CyberMiner Engine
- Searches indexed keywords
- Uses Boolean arguments
- Case-sensitivity selector
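
One way the engine's Boolean arguments and case-sensitivity selector might fit together is sketched below. The function name `search`, the operator names "OR", "AND", and "NOT", and the whole-word matching rule are assumptions made for illustration; the slide does not define them.

```python
def search(entries, keywords, operator="OR", case_sensitive=False):
    """Return URLs whose description matches the keywords.

    entries  -- iterable of (url, description) pairs
    operator -- "OR": any keyword present; "AND": all present; "NOT": none present
    Matching is on whole words; an empty keyword list matches nothing.
    """
    def normalize(text):
        return text if case_sensitive else text.lower()

    wanted = [normalize(keyword) for keyword in keywords]
    if not wanted:
        return []

    results = []
    for url, description in entries:
        words = set(normalize(description).split())
        hits = [keyword in words for keyword in wanted]
        if operator == "OR":
            matched = any(hits)
        elif operator == "AND":
            matched = all(hits)
        elif operator == "NOT":
            matched = not any(hits)
        else:
            raise ValueError(f"unknown operator: {operator}")
        if matched and url not in results:
            results.append(url)
    return results


if __name__ == "__main__":
    entries = [
        ("http://a.example/", "Fast Web Search"),
        ("http://b.example/", "web crawler design"),
    ]
    print(search(entries, ["web"]))
    print(search(entries, ["Web"], case_sensitive=True))
```

The default "OR" mode corresponds to the basic goal of returning URLs whose description contains any of the given keywords. A fuller engine would look keywords up in the KWIC index rather than scan every description.
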