1 Intelligent Database Systems Lab, N.Y.U.S.T. I. M.
Supporting personalized ranking over categorical attributes
Presenter: Lin, Shu-Han
Authors: Gae-won You, Seung-won Hwang, Hwanjo Yu
Information Sciences 178 (2008)

2 Outline
- Motivation
- Objective
- Methodology
- Experiments
- Conclusion
- Comments

3 Motivation
Personalized ranking in information retrieval is problematic for categorical attributes:
- Categorical attributes do not have an inherent ordering.
- How can the relevant data be ranked by a categorical attribute?
For example, how can we find older females with a preference for soda drinks?

Name   Age  Gender  Favorite Drink  Buy
Jane   30   Female  Coke            Coke, Milk
Mary   25   Female  Pepsi           Coke, Pepsi
Tom    21   Male    Water           Milk, Water
Denny  26   Male    Coke            Milk, Juice
Tina   11   Female  Pepsi           Red Wine, Pepsi

4 Objectives
- Enable uniform ranked retrieval over a combination of categorical and numerical attributes.
- Support ranking over the binary representation of categorical attributes:
  - Binary encoding
  - Sparsity

Name   Female
Jane   1
Mary   1
Tom    0
Denny  0
Tina   1

Single-valued attribute:
Name  Coke  Pepsi  Water
Jane  1     0      0
Mary  0     1      0
Tom   0     0      1

Multi-valued attribute with bounded cardinality (item set, bc=2):
Name  Coke  Pepsi  Water
Jane  1     0      1
Mary  1     1      0
Tom   1     0      1
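
The two encodings on this slide can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the helper names `one_hot` and `multi_hot` and the `DRINKS` vocabulary are assumptions, chosen so that the column order matches the slide's tables.

```python
# Hypothetical vocabulary; the column order follows the slide's tables.
DRINKS = ["Coke", "Pepsi", "Water"]

def one_hot(value, vocabulary):
    """Encode a single-valued categorical attribute as one bit per possible value."""
    return [1 if value == v else 0 for v in vocabulary]

def multi_hot(values, vocabulary):
    """Encode a multi-valued attribute (an item set) as one bit per possible value."""
    return [1 if v in values else 0 for v in vocabulary]

# Favorite drinks of the three people shown in the single-valued table.
favorite = {"Jane": "Coke", "Mary": "Pepsi", "Tom": "Water"}
single_valued = {name: one_hot(drink, DRINKS) for name, drink in favorite.items()}
# Jane -> [1, 0, 0], Mary -> [0, 1, 0], Tom -> [0, 0, 1]

# An item set with two members sets two bits (bounded cardinality bc = 2).
mary_buys = multi_hot({"Coke", "Pepsi"}, DRINKS)  # [1, 1, 0]
```

A single-valued attribute always yields exactly one set bit per row, which is the sparsity that the later bitmap slides exploit.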

5 Overview
(Figure with three numbered parts: (1), (2), (3))

6 Rank formulation
F = 0.5*age + 3*female + …
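
As a rough illustration of how such a ranking function orders rows, the sketch below evaluates only the two terms that are visible on the slide, using the ages and genders from the Motivation table. The terms elided by "…" are deliberately left out, and the names `people` and `partial_score` are hypothetical.

```python
# Ages and genders as listed in the Motivation table.
people = {
    "Jane": (30, "Female"),
    "Mary": (25, "Female"),
    "Tom": (21, "Male"),
    "Denny": (26, "Male"),
    "Tina": (11, "Female"),
}

def partial_score(age, gender):
    """Evaluate only the two visible terms; the elided terms are not modeled."""
    female = 1 if gender == "Female" else 0  # binary encoding of the categorical attribute
    return 0.5 * age + 3 * female

# Rank names from highest to lowest partial score.
ranking = sorted(people, key=lambda name: partial_score(*people[name]), reverse=True)
# ['Jane', 'Mary', 'Denny', 'Tom', 'Tina']
```

Because the categorical attribute is encoded as 0/1, it contributes to the score in the same way as a numerical attribute, which is what makes a uniform ranking function possible.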

7 Rank processing (TA)
A simple example query: find older females with a preference for soda drinks. It is transformed into F = age + female.
1. Candidate identification
   1. Perform sorted access (sa) on age and female.
   2. Find the top-k of sa(age) and sa(female), e.g., for k=1, sa(age)={o1}; sa(female)={o2}.
2. Candidate reduction
   1. o1 = 30 + 0
   2. o2 = 25 + 1
   3. o1 has the highest F score.
3. Termination
   1. The score of o1 (30) is not greater than the upper-bound score F(30,1) = 31, so processing cannot stop yet.
   2. Run another round of sorted access to consider more candidates, e.g., sa(age)={o4}; sa(female)={o3}.
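
The three steps above can be sketched as a small threshold algorithm (TA) in Python. This is a simplified illustration under stated assumptions: the values for o1 and o2 come from the slide, while the values for o3 and o4 are hypothetical and chosen only so that the second round of sorted access returns o4 and o3 as in the example. The stopping test uses the usual TA condition (k-th best score at least the threshold).

```python
import heapq

def threshold_algorithm(objects, k):
    """objects: dict oid -> tuple of attribute values; the score F is the sum of the values."""
    n_attrs = len(next(iter(objects.values())))
    # One list per attribute, sorted by descending value (the sorted-access order).
    sorted_lists = [
        sorted(objects, key=lambda oid, i=i: -objects[oid][i])
        for i in range(n_attrs)
    ]
    seen = {}  # candidates found so far, with their full scores
    for depth in range(len(objects)):
        last_values = []
        for i, lst in enumerate(sorted_lists):
            oid = lst[depth]                  # candidate identification
            last_values.append(objects[oid][i])
            seen[oid] = sum(objects[oid])     # candidate reduction: compute the full score
        threshold = sum(last_values)          # upper bound for any unseen object
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top                        # termination
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])

# (age, female); o1 and o2 follow the slide, o3 and o4 are assumed values.
objects = {"o1": (30, 0), "o2": (25, 1), "o3": (21, 1), "o4": (26, 0)}
result = threshold_algorithm(objects, k=1)  # [('o1', 30)]
```

In the first round the threshold is 30 + 1 = 31, so o1 (score 30) cannot be returned yet; in the second round the threshold drops to 26 + 1 = 27 and o1 is returned.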

8 Bitmap – binary encoding
F = v1 + v2 + v3 + v4, k = 2
1) K={}, C={1111} (Initialization)
2) OID=execute(C)
3) OID={o4}, |OID|>0, K={[o4,4]}
4) C={0111/1011/1101/1110} (Expansion)
5) K.count < k, back to 2)
6) …

     v1  v2  v3  v4
o1   1   0   1   1
o2   0   1   0   0
o3   0   1   1   1
o4   1   1   1   1
o5   1   0   1   1
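
The initialization, execution, and expansion steps above can be sketched in Python as follows. This is an interpretation of the slide rather than the paper's algorithm: it assumes that `execute` returns the objects whose row equals the candidate pattern exactly, that weights are non-negative, and that candidates are processed best-first. The function name `bitmap_topk` and the use of Python integers as bitmaps are illustrative choices.

```python
import heapq

def bitmap_topk(rows, weights, k):
    """rows: dict oid -> tuple of 0/1 values. Returns up to k (oid, score) pairs."""
    oids = list(rows)
    m = len(weights)
    # Bitmap index: column i is an int whose bit j is set when object j has v_i = 1.
    columns = [sum(1 << j for j, oid in enumerate(oids) if rows[oid][i]) for i in range(m)]
    all_objects = (1 << len(oids)) - 1

    def execute(pattern):
        # AND the column bitmap for every 1 and the complemented bitmap for every 0.
        result = all_objects
        for i, bit in enumerate(pattern):
            result &= columns[i] if bit else ~columns[i] & all_objects
        return [oid for j, oid in enumerate(oids) if result >> j & 1]

    def score(pattern):
        return sum(w * b for w, b in zip(weights, pattern))

    start = (1,) * m                      # initialization: C = {1111}
    heap = [(-score(start), start)]
    visited = {start}
    top = []                              # the result set K
    while heap and len(top) < k:
        neg_score, pattern = heapq.heappop(heap)
        top.extend((oid, -neg_score) for oid in execute(pattern))
        for i, bit in enumerate(pattern): # expansion: turn one 1 into a 0
            if bit:
                child = pattern[:i] + (0,) + pattern[i + 1:]
                if child not in visited:
                    visited.add(child)
                    heapq.heappush(heap, (-score(child), child))
    return top[:k]

rows = {
    "o1": (1, 0, 1, 1), "o2": (0, 1, 0, 0), "o3": (0, 1, 1, 1),
    "o4": (1, 1, 1, 1), "o5": (1, 0, 1, 1),
}
result = bitmap_topk(rows, weights=[1, 1, 1, 1], k=2)  # [('o4', 4), ('o3', 3)]
```

Note that o1 and o5 also score 3; which of the tied objects fills the second slot depends on the order in which equally scored patterns are executed.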

9 Bitmap – sparsity
Single-valued attribute
F = w1v1 + w2v2 + … + w6v6
Weights are ranked within each attribute: w1 ≧ w2 ≧ w3; w4 ≧ w5 ≧ w6. For simplicity, all w = 1 and k = 2.
1) K={}, C={100.100.100} (Initialization)
2) OID=execute(C)
3) OID={o4}, |OID|>0, K=OID={[o4,2]}
4) C={010.100.100/100.010.100/100.100.010} (Expansion)
5) K.count < k, back to 2)
6) …

     Attribute1    Attribute2
     v1  v2  v3    v4  v5  v6    v4  v5  v6
o1   1   0   0     0   0   1     0   1   0
o2   0   1   0     0   1   0     1   0   0
o3   0   1   0     1   0   0     0   1   0
o4   1   0   0     1   0   0     1   0   0
o5   0   0   1     0   0   1     0   1   0
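
The expansion step for single-valued attributes can be sketched as follows. It assumes that a candidate holds exactly one set bit per attribute group and that the values within each group are ordered by descending weight, so expansion moves one group's bit to the next-ranked value. The names `expand` and `to_pattern` are hypothetical helpers, not taken from the paper.

```python
def expand(candidate, sizes):
    """candidate: tuple holding, per attribute, the position of its single 1 bit
    (0 = highest-weighted value). Returns the children obtained by moving one
    attribute's bit to its next-ranked value."""
    children = []
    for a, pos in enumerate(candidate):
        if pos + 1 < sizes[a]:
            children.append(candidate[:a] + (pos + 1,) + candidate[a + 1:])
    return children

def to_pattern(candidate, sizes):
    """Render a candidate in the slide's dotted notation, e.g. 100.100.100."""
    return ".".join(
        "".join("1" if j == pos else "0" for j in range(n))
        for pos, n in zip(candidate, sizes)
    )

sizes = (3, 3, 3)    # three groups of three values, as in C = {100.100.100}
initial = (0, 0, 0)  # the top-weighted value of every group
children = [to_pattern(c, sizes) for c in expand(initial, sizes)]
# ['010.100.100', '100.010.100', '100.100.010']
```

Because every row has exactly one bit per group, only 3 × 3 × 3 = 27 patterns can ever match, compared with 2^9 = 512 patterns if the nine bits were treated independently.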

10 Bitmap – sparsity
Multi-valued attribute with bounded cardinality

     Attribute1                Attribute2
     v1-1  v1-2  v1-3  v1-4    v2-1  v2-2  v2-3
o1   1     0     0     1       1     0     1
o2   0     1     0     1       1     1     0
o3   0     1     1     0       1     1     0
o4   1     0     0     1       1     0     1
o5   0     0     1     1       0     1     1

11 Experiments
Sparsity of indicator variables in the UCI datasets:
- 22% of the datasets consist only of categorical attributes.
- 56% of the datasets combine numerical and categorical attributes.

12 Experiments – synthetic data

13 Experiments – real-life data

14 Conclusions
This paper studies:
- How to support rank formulation
- Processing over data with categorical attributes
- Instead of adopting existing numerical algorithms, it develops a bitmap-based approach to:
  - Binary encoding
  - Sparsity
    - Single-valued
    - Multi-valued with bounded cardinality

15 Comments
- Advantage: …
- Drawback: …
- Application: …

