RSCTC 2008 Rough Sets in Data Warehousing Infobright Community Edition (ICE)


1 www.infobright.org www.infobright.com slezak@infobright.com RSCTC 2008 Rough Sets in Data Warehousing Infobright Community Edition (ICE)

2 Data Warehousing

3

4

5 Technology Layout

6 Two-Level Computing: Large Data (10TB) and Mixed Workloads

7 Rough Sets [Figure labels: Sport? = Yes; classes of records with the same values on a subset of the attributes]

8 Information Systems. Data-based knowledge models, classifiers... Database indices, data partitioning, data sorting... Difficulty with fast updates of structures...

9 Rough Sets in Infobright. Query: SELECT COUNT(*) FROM Employees WHERE Salary > $ [Figure: packs storing the values of records for column Salary]. We can imagine the set of all records relevant to the given query, that is, the records satisfying its SQL filter Salary > $. Using the Knowledge Grid, we verify which packs are irrelevant (disjoint from the set), relevant (fully inside the set), and suspect (overlapping the set). We do not need irrelevant packs. We do not need to decompress relevant ones: their local COUNT(*) is stored in the corresponding Data Pack Nodes.
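The pack classification described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Infobright's implementation: the names `PackNode`, `build_node`, `classify` and `count_greater` are hypothetical, the salary values and the threshold of 50000 are invented (the slide's own threshold is not legible in the transcript), and a Data Pack Node is assumed to hold only the minimum, the maximum and the number of non-null values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PackNode:
    """Hypothetical per-pack statistics: min, max and non-null count."""
    min_value: Optional[int]
    max_value: Optional[int]
    non_null: int


def build_node(values: List[Optional[int]]) -> PackNode:
    """Compute the statistics for one pack; None stands for SQL NULL."""
    present = [v for v in values if v is not None]
    if not present:
        return PackNode(None, None, 0)
    return PackNode(min(present), max(present), len(present))


def classify(node: PackNode, threshold: int) -> str:
    """Rough answer to 'value > threshold' using statistics only."""
    if node.non_null == 0 or node.max_value <= threshold:
        return "irrelevant"   # no row in the pack can satisfy the filter
    if node.min_value > threshold:
        return "relevant"     # every non-null row satisfies the filter
    return "suspect"          # the pack must be opened to decide


def count_greater(packs: List[List[Optional[int]]], threshold: int):
    """Return (COUNT(*), number of packs that had to be opened)."""
    total = 0
    opened = 0
    for values in packs:
        node = build_node(values)  # assumed to be precomputed at load time
        kind = classify(node, threshold)
        if kind == "relevant":
            total += node.non_null  # answered from the node alone
        elif kind == "suspect":
            opened += 1             # only here are raw values touched
            total += sum(1 for v in values if v is not None and v > threshold)
    return total, opened


# Invented Salary packs: one irrelevant, one relevant, one suspect.
packs = [
    [30000, 42000, None],
    [61000, 75000, 90000],
    [45000, 52000, 58000],
]
print(count_greater(packs, 50000))
```

In this sketch only the third pack is opened; the first is skipped and the second is counted from its statistics.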

10 Information Systems in Infobright [Figure labels: Query; min; max; Nulls; sum; pattern; match; OUT; ???]

11 SELECT MAX(A) FROM T WHERE B>15; [Figure panels: DATA, STEP 1, STEP 2, STEP 3]
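The transcript keeps only the query and the step labels of this slide, so the following Python sketch is one plausible reading of the three steps, not a reconstruction of the figure. It assumes that each pack has min/max statistics, that columns A and B contain no NULLs, and that the data values are invented; the function name `max_a_where_b_gt` is hypothetical. For brevity the statistics are recomputed with `min()` and `max()` instead of being read from stored nodes.

```python
from typing import List, Optional, Tuple


def max_a_where_b_gt(
    a_packs: List[List[int]],
    b_packs: List[List[int]],
    threshold: int,
) -> Tuple[Optional[int], List[int]]:
    """Return (MAX(A) over rows with B > threshold, indices of opened packs)."""
    # Step 1 (rough): classify packs of B using min/max only.
    relevant, suspect = [], []
    for i, b in enumerate(b_packs):
        if max(b) <= threshold:
            continue                  # irrelevant: no row passes the filter
        if min(b) > threshold:
            relevant.append(i)        # relevant: every row passes the filter
        else:
            suspect.append(i)

    # Step 2 (rough): in a relevant pack every row qualifies, so the pack's
    # max(A) statistic is an exact local answer and a lower bound overall.
    best = max((max(a_packs[i]) for i in relevant), default=None)

    # Step 3 (precise): open suspect packs in descending order of max(A) and
    # stop as soon as a pack cannot improve on the current best value.
    opened = []
    for i in sorted(suspect, key=lambda k: max(a_packs[k]), reverse=True):
        if best is not None and max(a_packs[i]) <= best:
            break
        opened.append(i)
        for a, b in zip(a_packs[i], b_packs[i]):
            if b > threshold and (best is None or a > best):
                best = a
    return best, opened


# Invented data: four packs of three rows each.
a_packs = [[5, 18, 7], [22, 3, 9], [20, 4, 11], [30, 2, 1]]
b_packs = [[10, 12, 15], [16, 20, 31], [14, 17, 19], [3, 40, 9]]
print(max_a_where_b_gt(a_packs, b_packs, 15))
```

With these values the first pack is irrelevant, the second is relevant and sets the bound, the fourth is opened because its max(A) exceeds the bound, and the third is never opened.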

12 Advanced Knowledge Nodes

Order Detail Table (assume many more rows)

Order Number  Order Date  Part ID  Quantity  $Amt
005           20070214    234      500       1500.00
005           20070214    334      125       250.25
006           20070215    334      100       212.50

Supplier/Part Table (assume many more rows)

Supplier ID  Effective Date  Expiry Date  Part ID  Description
A456         20050315        Null         234      Pre-measured coffee packets – gold blend
A456         20061201        Null         235      Pre-measured coffee packets – silver blend
A456         20060501        Null         334      4-cup Cone coffee filters; quantity 50

        Pack 1  Pack 2
Pack 1  0       1
Pack 2  1       0
Pack 3  0       0
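The pack grid on this slide looks like a binary matrix relating packs of the two tables on the join column Part ID. A minimal Python sketch of that idea follows, under clearly stated assumptions: a cell is taken to be 1 when the two packs share at least one Part ID, rows are taken to be Order Detail packs and columns Supplier/Part packs (the transcript does not say which is which), and all Part IDs other than 234, 235 and 334 are invented. The function names are hypothetical.

```python
from typing import List, Tuple


def build_pack_to_pack(left: List[List[int]], right: List[List[int]]) -> List[List[int]]:
    """matrix[i][j] is 1 if left pack i and right pack j share a join value."""
    return [[int(not set(l).isdisjoint(r)) for r in right] for l in left]


def join_count(
    left: List[List[int]],
    right: List[List[int]],
    matrix: List[List[int]],
) -> Tuple[int, int]:
    """Count matching row pairs, opening only pack pairs flagged in the matrix."""
    matches = 0
    pairs_opened = 0
    for i, l in enumerate(left):
        for j, r in enumerate(right):
            if not matrix[i][j]:
                continue              # pack pair cannot produce any join row
            pairs_opened += 1
            matches += sum(1 for x in l for y in r if x == y)
    return matches, pairs_opened


# Part ID packs; 500, 501, 777, 900 and 901 are invented filler values.
order_packs = [[500, 501], [234, 334, 334], [900, 901]]
supplier_packs = [[234, 235, 334], [500, 777]]

matrix = build_pack_to_pack(order_packs, supplier_packs)
print(matrix)
print(join_count(order_packs, supplier_packs, matrix))
```

Here only two of the six pack pairs are examined. A production system would presumably maintain such a matrix from compact pack summaries at load time rather than from full value sets, which this sketch uses only for simplicity.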

13 Community Inspirations
- Count Distinct
- Count(*) on Self-Joins
- Decision Trees
- Contingencies
- New Objectives
- New Schemas
- New Volumes
- New Queries
- New KNs
- New Data Types
- SQL Extensions
- Feature Extraction
- Data Compression

14 Conclusion
- Technology based on interaction between rough and precise operations, open to adding new structures
- Full product, simple framework, ad-hoc analytics, good load speed, 10:1 "all inclusive" compression
- Core technology based on more data mining, rough sets, computing with rough values, et cetera
- Infobright Community Edition (ICE) ready for free use and study, and open to contributions

15 References
- D. Ślęzak, J. Wróblewski, V. Eastwood, P. Synak: Brighthouse: An Analytic Data Warehouse for Ad-hoc Queries. PVLDB 1(2): 1337-1345 (2008).
- M. Wojnarski, C. Apanowicz, V. Eastwood, D. Ślęzak, P. Synak, A. Wojna, J. Wróblewski: Method and System for Data Compression in a Relational Database. US Patent Application, 2008/0071818 A1.
- J. Wróblewski, C. Apanowicz, V. Eastwood, D. Ślęzak, P. Synak, A. Wojna, M. Wojnarski: Method and System for Storing, Organizing and Processing Data in a Relational Database. US Patent Application, 2008/0071748 A1.

16 THANK YOU!!! www.infobright.org www.infobright.com slezak@infobright.com RSCTC 2008

