Who Needs All Those Indexes? One is Enough
Bruce Lindsay, IBM Almaden Research Center


1 Who Needs All Those Indexes? One is Enough
Bruce Lindsay, IBM Almaden Research Center
bgl@almaden.ibm.com

2 Stored Data is Heterogeneous
Most stored data is NOT well structured
–Text & semi-structured
–Sparse, multi-valued, & multi-occurrence attributes
Much value latent in un-structured data
Text analytic tools can extract value
–Beyond the words: names, roles, concepts, …
Text analytics: searching for meaning in the content
–Semantic & knowledge driven analysis
–Expensive: big dictionaries, byte-by-byte, big inputs and outputs
–Stateless → easy scale-out

3 Text Analytics
[Diagram labels: Data Source, Object, Dictionary, analytic1, analytic2, Attributes/Values, to Index]
Derive { } from inputs
–Language, words (stems, part-of-speech, …)
–Context (title, bold, anchor text, …)
–Concepts (person, organization, role, product, …)
–Classification (complaint, fraud, spam, xxx, …)
–Meta-data (to/from, subject, date, title, abstract, reference, …)
Domain- and customer-specific analysis offers most value
Analytics-produced attributes induce the index schema
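The slide's "Derive { }" set is empty in this transcript. A minimal sketch follows, assuming (from the "Attributes/Values" diagram labels) that each analytic emits (attribute, value) pairs for an object. The function names and the toy patterns are hypothetical, not taken from the talk.

```python
import re
from typing import Callable, Iterable, Tuple

# Hypothetical type: an analytic maps raw text to (attribute, value) pairs.
Analytic = Callable[[str], Iterable[Tuple[str, str]]]

def word_analytic(text):
    """Emit one ("word", token) pair per lowercased alphanumeric token."""
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        yield ("word", token)

def email_analytic(text):
    """Emit ("email", address) pairs; a toy pattern, not a full address parser."""
    for addr in re.findall(r"[\w.]+@[\w.]+\w", text):
        yield ("email", addr)

def derive(text, analytics):
    """Run every analytic over the text and collect the distinct pairs."""
    pairs = set()
    for analytic in analytics:
        pairs.update(analytic(text))
    return pairs

pairs = derive("Contact bgl@almaden.ibm.com about indexes",
               [word_analytic, email_analytic])

# The set of attribute names the analytics produced is the induced index schema.
schema = sorted({attr for attr, _ in pairs})
```

Adding or removing an analytic changes `schema` without touching the stored object, which is the sense in which the choice of analytics induces the index schema.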

4 Text Indexing
Logical index over MANY entries per object
–Large index – even with aggressive compression
–Non-transactional
Scale-out needed
–Capacity – single index too big for one (commodity) node
–Ingest throughput – concurrent insert to index fragments
–Query response – fan-out / fan-in for query parallelism
Query
–Predicates over matches
–Match scoring – magic weighting of predicate importance & position
–Query planning & optimization probably needed
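The fragment and fan-out / fan-in ideas on this slide can be sketched as below. This is an illustration under stated assumptions, not the system described in the talk: fragments are partitioned by integer object id, the fan-out runs sequentially in one process, and scoring is a plain sum of per-predicate weights (position is ignored). All class and method names are hypothetical.

```python
from collections import defaultdict

class IndexFragment:
    """One fragment: maps (attribute, value) -> set of object ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def insert(self, obj_id, pairs):
        for pair in pairs:
            self.postings[pair].add(obj_id)

    def match(self, predicates):
        """Return {obj_id: score}; score is the summed weight of matched predicates."""
        scores = defaultdict(float)
        for pair, weight in predicates.items():
            for obj_id in self.postings.get(pair, ()):
                scores[obj_id] += weight
        return dict(scores)

class ScaledOutIndex:
    """A logical index made of several fragments (all in one process here)."""

    def __init__(self, n_fragments):
        self.fragments = [IndexFragment() for _ in range(n_fragments)]

    def insert(self, obj_id, pairs):
        # Partition by object so all postings for an object share a fragment.
        self.fragments[obj_id % len(self.fragments)].insert(obj_id, pairs)

    def query(self, predicates, top_k=10):
        merged = {}
        for fragment in self.fragments:                 # fan-out
            merged.update(fragment.match(predicates))   # fan-in
        # Highest score first; ties broken by object id for determinism.
        return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]

index = ScaledOutIndex(n_fragments=2)
index.insert(1, {("word", "index"), ("title", "index")})
index.insert(2, {("word", "index")})
index.insert(3, {("word", "query")})

results = index.query({("word", "index"): 1.0, ("title", "index"): 2.0})
```

Partitioning by object keeps each object's score local to one fragment, so the fan-in step only has to merge and rank; partitioning by term instead would require combining partial scores across fragments.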

5 What about Data Processing?
select / project / join / aggregate
Add “value” postings to index for keys and measures
Select: { } → {obj1}
Project: { } → {val2}
Join: { } → {obj2}
Project: { } → {measVal}
Aggregation: sum({measVal})
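The posting forms inside the braces on this slide did not survive transcription. The sketch below is one plausible reading, assuming two kinds of postings: (attribute, value) → objects for selection, and (object, attribute) → value as the "value" posting used for projection. The data, attribute names, and function names are invented for illustration.

```python
from collections import defaultdict

by_value = defaultdict(set)   # (attribute, value) -> object ids
by_object = {}                # (object id, attribute) -> value, the "value" posting

def post(obj, attr, value):
    """Add both postings for one attribute of one object."""
    by_value[(attr, value)].add(obj)
    by_object[(obj, attr)] = value

def select(attr, value):
    """Objects whose attribute equals the given value."""
    return set(by_value.get((attr, value), set()))

def project(objs, attr):
    """Values of attr for the given objects, skipping objects that lack it."""
    return [by_object[(o, attr)] for o in sorted(objs) if (o, attr) in by_object]

def join(objs, key_attr, other_attr):
    """Objects whose other_attr equals the key_attr value of some object in objs."""
    out = set()
    for key in project(objs, key_attr):
        out |= select(other_attr, key)
    return out

# Toy data: one customer and three orders.
post("c1", "region", "west")
post("c1", "cust_id", 7)
post("o1", "order_cust", 7)
post("o1", "amount", 20)
post("o2", "order_cust", 7)
post("o2", "amount", 5)
post("o3", "order_cust", 9)
post("o3", "amount", 100)

west = select("region", "west")                  # Select
orders = join(west, "cust_id", "order_cust")     # Project key, then Join
total = sum(project(orders, "amount"))           # Project measure, then Aggregate
```

Here the join is just a projection of key values followed by a selection on those values, which is why a single index holding both posting kinds is enough for this query.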

6 Architecture
Obj → storeMgr
Query → queryPlanner → queryDriver → ranked results
[Diagram labels: ObjStore, Obj Queue, Analytics, Indexer, Index Fragment, file, … scale-out …]
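The diagram itself is lost, so the wiring between its boxes is not recoverable from the transcript. As a rough sketch of the ingest path, assume the store manager persists each object and enqueues it, and that a worker then runs analytics and hands the resulting pairs to an indexer. The names mirror the slide's labels, but the functions and their behavior are assumptions.

```python
import queue

obj_store = {}                # ObjStore: object id -> raw object
obj_queue = queue.Queue()     # Obj Queue between the store manager and analytics
index_fragment = {}           # one Index Fragment: (attribute, value) -> object ids

def store_mgr(obj_id, text):
    """Persist the object, then enqueue its id for analysis."""
    obj_store[obj_id] = text
    obj_queue.put(obj_id)

def analytics(text):
    """Stand-in analytic: one ("word", token) pair per whitespace-separated token."""
    return {("word", tok) for tok in text.lower().split()}

def indexer(obj_id, pairs):
    """Add one posting per derived pair to the fragment."""
    for pair in pairs:
        index_fragment.setdefault(pair, set()).add(obj_id)

def drain():
    """Run analytics and indexing for everything queued so far."""
    while not obj_queue.empty():
        obj_id = obj_queue.get()
        indexer(obj_id, analytics(obj_store[obj_id]))

store_mgr(1, "one index")
store_mgr(2, "enough index")
drain()
```

Because `analytics` keeps no state between objects, several workers could drain the queue concurrently, each feeding its own fragment.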

7 Conclusions
Derived value from un-structured objects
–Much value latent in un-structured data
–Value extracted via analytic tools
–Value captured in scalable index
–Value exploited via query and data processing
Architecture
–Index independent object store schema
–Application choice of object analytics induces index schema
–Scaled-out analytics and index

