How Google and Microsoft taught search to “understand” the Web
Austin Granger, Chris Hesemann

Knowledge of the Web
String searching does not always capture the true meaning of content.
Search by knowledge, not by substring matching.
Extracting and categorizing concepts allows for knowledge-based searching.

“Web of Concepts”
Extract raw data (phone numbers, addresses, prices, etc.).
Link related entities together (e.g., link actor to movie).
Categorize information about each entity (what does this store sell, what has this author written, how highly are they reviewed?).
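
The first step above, extracting raw data, can be sketched with regular expressions. This is only a minimal illustration, not how any search engine actually does it: the patterns cover a single US-style phone format and simple dollar prices, and the name `extract_raw_data` and the sample text are hypothetical.

```python
import re

# Simplified, assumed patterns; real extractors handle many more formats.
PHONE_RE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")
PRICE_RE = re.compile(r"\$\d+(?:\.\d{2})?")


def extract_raw_data(text):
    """Return the phone numbers and prices found in a page's text."""
    return {
        "phones": PHONE_RE.findall(text),
        "prices": PRICE_RE.findall(text),
    }


# Hypothetical page text, used only for illustration.
page = "Call Blowfish Sushi at 555-123-4567. Lunch special: $12.50, dinner from $30."
print(extract_raw_data(page))
```

Linking and categorizing would then attach these extracted values to the entity they describe (here, the restaurant).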

Search engines discover webpages, parse them into objects and data, process and store that data, and update existing entries as needed.
The “concept web” is stored in vast databases.
– Not traditional databases.
– Based on graph theory, not the relational model.
– The database consists of nodes and links.
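
A database of nodes and links, with entries updated as pages are re-crawled, might look roughly like the sketch below. The class `ConceptGraph`, its method names, and the sample node ids are all assumptions made for illustration; the slides do not describe the actual storage format.

```python
from collections import defaultdict


class ConceptGraph:
    """Toy graph store: nodes hold attributes, links are labeled edges."""

    def __init__(self):
        self.nodes = {}                # node id -> attribute dict
        self.links = defaultdict(set)  # node id -> {(relation, other id)}

    def upsert_node(self, node_id, **attrs):
        # Create the node if it is new, otherwise update the existing entry.
        self.nodes.setdefault(node_id, {}).update(attrs)

    def link(self, src, relation, dst):
        # Make sure both endpoints exist, then record the labeled link.
        for node_id in (src, dst):
            self.nodes.setdefault(node_id, {})
        self.links[src].add((relation, dst))

    def neighbors(self, node_id, relation=None):
        # Follow outgoing links, optionally filtered by relation label.
        return sorted(
            dst
            for rel, dst in self.links.get(node_id, ())
            if relation is None or rel == relation
        )


# Hypothetical entities, used only for illustration.
graph = ConceptGraph()
graph.upsert_node("actor:1", name="Example Actor")
graph.upsert_node("movie:1", title="Example Movie")
graph.link("actor:1", "acted_in", "movie:1")
print(graph.neighbors("actor:1", "acted_in"))
```

Unlike a relational table, a query here follows links from node to node rather than joining rows.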

Memory Cloud
To make this efficient, we must traverse the entire graph in milliseconds.
One solution: a “memory cloud.”
– Store the entire database in memory at all times.
Example: Google search for “blowfish”
– Results: show the company, the encryption algorithm, and the sushi
– New results: suggest “pufferfish”
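
The “blowfish” example can be sketched as a breadth-first traversal over a graph held entirely in memory. The toy `GRAPH`, its node names, the link from the sushi sense to “pufferfish”, and the function `related` are all assumptions for illustration; a real memory cloud would spread a far larger graph across many machines.

```python
from collections import deque

# Toy in-memory graph; node names and links are illustrative assumptions.
GRAPH = {
    "blowfish": ["Blowfish (company)", "Blowfish (cipher)", "Blowfish (sushi)"],
    "Blowfish (company)": [],
    "Blowfish (cipher)": ["encryption algorithm"],
    "Blowfish (sushi)": ["pufferfish"],
    "encryption algorithm": [],
    "pufferfish": [],
}


def related(graph, start, max_hops):
    """Breadth-first search: nodes reachable within max_hops links of start."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, hops + 1))
    return found


print(related(GRAPH, "blowfish", 1))  # the three senses of the query
print(related(GRAPH, "blowfish", 2))  # senses plus related concepts
```

One hop yields the distinct senses of the query; a second hop reaches related concepts such as “pufferfish”, which is how a suggestion could surface.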

Limitations
Currently works only in English.
Including other languages increases the complexity exponentially; there is still a long way to go.
Dissecting language to understand searches written in natural language, not just keywords.

The Future of Knowledge Searching