Benefits of InterSite Pre-Processing and Clustering Methods in E-Commerce Domain Sergiu Chelcea, Alzennyr Da Silva, Yves Lechevallier, Doru Tanasa, Brigitte.

1 Benefits of InterSite Pre-Processing and Clustering Methods in E-Commerce Domain Sergiu Chelcea, Alzennyr Da Silva, Yves Lechevallier, Doru Tanasa, Brigitte Trousse AxIS Research Team INRIA Sophia Antipolis and Rocquencourt

2 Motivations
- To show, on the clickstream dataset proposed for the ECML/PKDD 2005 Discovery Challenge, the benefits of our InterSite pre-processing method, proposed by Tanasa in his PhD thesis (2005)
- To show the benefits of a new crossed clustering method, developed by Lechevallier & Verde and published in (2003, 2004), when applied to Web logs
- 2 main viewpoints: user and Web site load

3 Plan
1. InterSite Data Pre-Processing
   - introduction of the user's intersite visit: « Group of SessionIDs »
   - first statistical intersite analysis
2. Crossed Clustering Approach
   - confusion table with classes of time periods and classes of product types
   - analysis on the most used shop: shop 4
3. Conclusions

4 Data pre-processing
Initial data:

Table 1. Format of page requests
ShopID  Date        IP address      SessionID             Page        Referrer
11      1074585663  213.151.91.186  939dad92c4…84208dca   /
11      1074585670  213.151.91.186  87ee02ddcff…7655bb9e  /ct/?c=148  http://www.shop2.cz

Table 2. Number of requests per shop
ShopID  Site name (shop)  #Requests
10      www.shop1.cz        509,688
11      www.shop2.cz        400,045
12      www.shop3.cz        645,724
14      www.shop4.cz      1,290,870
15      www.shop5.cz        308,367
16      www.shop6.cz        298,030
17      www.shop7.cz        164,447
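The Date column in Table 1 looks like Unix epoch seconds. A minimal sketch of the conversion to a readable datetime follows, assuming that interpretation and assuming a fixed UTC+1 offset for the shop servers (the offset is not stated on the slides; it is chosen here because it reproduces the datetimes shown in the transformed log lines of Table 3). The names `ASSUMED_TZ` and `epoch_to_local` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: logs were written in a fixed UTC+1 zone (not stated in the slides).
ASSUMED_TZ = timezone(timedelta(hours=1))


def epoch_to_local(epoch_seconds: int) -> str:
    """Format Unix epoch seconds as 'YYYY-MM-DD HH:MM:SS' in the assumed zone."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=ASSUMED_TZ)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


# The two Date values from Table 1.
print(epoch_to_local(1074585663))  # 2004-01-20 09:01:03
print(epoch_to_local(1074585670))  # 2004-01-20 09:01:10
```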

5 Data pre-processing
Tanasa & Trousse (IEEE Intelligent Systems, 2004); Tanasa's Thesis (2005)

6 Data pre-processing
Data fusion, data cleaning

Table 3. Transformed log lines
Datetime             IP              SessionID             URL                            Referrer
2004-01-20 09:01:03  213.151.91.186  939dad92c4…84208dca   http://www.shop2.cz/           -
2004-01-20 09:01:10  213.151.91.186  87ee02ddcff…7655bb9e  http://www.shop2.cz/ct/?c=148  http://www.shop2.cz/

Data structuration
- SessionID: a single visit on each shop
- Towards the notion of a user's intersite visit: we group the SessionIDs that belong to a single user (same IP) into a « Group of SessionIDs ». To do so, we compare the Referrer with the URLs previously accessed (within a reasonable time window)
- 522,410 SessionIDs grouped into 397,629 Groups, equivalent to a 23.88% reduction
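The grouping idea above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' InterSite implementation: the 30-minute window, the exact-match comparison of URLs after stripping a trailing slash, the `Request` record, and the session names `s1`, `s2`, `s3` are all hypothetical. A new SessionID joins an existing group when its Referrer equals a URL recently requested from the same IP; otherwise it starts a new group.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Request:
    time: datetime
    ip: str
    session_id: str
    url: str
    referrer: str  # "-" when no referrer was logged


WINDOW = timedelta(minutes=30)  # assumed "reasonable time window"


def normalize(url: str) -> str:
    """Strip a trailing slash so 'http://x/' and 'http://x' compare equal."""
    return url.rstrip("/")


def group_sessions(requests):
    """Map each SessionID to a group id (one group = one intersite visit)."""
    group_of = {}   # session_id -> group id
    recent = {}     # ip -> list of (time, normalized url, group id)
    next_group = 0
    for r in sorted(requests, key=lambda req: req.time):
        history = recent.setdefault(r.ip, [])
        if r.session_id not in group_of:
            match = None
            if r.referrer != "-":
                ref = normalize(r.referrer)
                # Scan this IP's history from newest to oldest.
                for t, url, gid in reversed(history):
                    if r.time - t > WINDOW:
                        break
                    if url == ref:
                        match = gid
                        break
            if match is None:
                match = next_group
                next_group += 1
            group_of[r.session_id] = match
        history.append((r.time, normalize(r.url), group_of[r.session_id]))
    return group_of


# Hypothetical requests: s2 arrives on another shop with a referrer that
# s1 (same IP) requested 7 seconds earlier; s3 comes from a different IP.
t0 = datetime(2004, 1, 20, 9, 1, 3)
sample = [
    Request(t0, "213.151.91.186", "s1", "http://www.shop2.cz/", "-"),
    Request(t0 + timedelta(seconds=7), "213.151.91.186", "s2",
            "http://www.shop4.cz/", "http://www.shop2.cz"),
    Request(t0 + timedelta(seconds=9), "10.0.0.1", "s3",
            "http://www.shop4.cz/", "http://www.shop2.cz"),
]
groups = group_sessions(sample)
print(groups)  # {'s1': 0, 's2': 0, 's3': 1}
```

Restricting the comparison to the same IP keeps s3 out of the first group even though its referrer matches; a real pipeline would also need to decide how to treat proxies and shared IPs.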

7 Relational DB model Data summarisation

8 Data pre-processing
Fig. 1. Visits per day and hour: (a) globally, (b) multi-shop
- Low number of new visits on Saturdays and Sundays during lunch time
- High number of new visits on Tuesdays and Wednesdays
- Same results for (a) and (b)

9 Crossed Clustering Approach for Time Periods/Product Analysis
Data: selection of ls pages in shop 4 (the most used)
Method developed by Yves Lechevallier & Rosanna Verde (2003, 2004)

10 Crossed Clustering Approach for Time Periods/Product Analysis
Relational DB model: a crossed table is easily added
- Line: an individual (weekday, one hour); 7 days × 24 hours = 168 individuals
- Column: a multi-categorical variable representing the number of products requested by users in the specific time slice
Method developed by Yves Lechevallier & Rosanna Verde (2003, 2004)
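A crossed table of this shape can be sketched in a few lines. The sketch below is an assumption about the layout only (one row per weekday and hour, one counter of product types per row); it works on an in-memory list of hypothetical `(datetime, product type)` pairs rather than on the authors' relational database, and `build_crossed_table` is an illustrative name.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def build_crossed_table(requests):
    """Count requested product types per (weekday, hour) individual.

    requests: iterable of (datetime, product_type) pairs.
    Returns a dict with 168 keys such as 'Monday_0', each a Counter.
    """
    table = {f"{day}_{hour}": Counter()
             for day in WEEKDAYS for hour in range(24)}
    for when, product in requests:
        key = f"{WEEKDAYS[when.weekday()]}_{when.hour}"
        table[key][product] += 1
    return table


# Hypothetical requests; 2004-01-19 was a Monday.
sample = [
    (datetime(2004, 1, 19, 0, 5), "Built-in electric hobs"),
    (datetime(2004, 1, 19, 0, 40), "Built-in electric hobs"),
    (datetime(2004, 1, 19, 1, 10), "Built-in hoods"),
]
crossed = build_crossed_table(sample)
print(len(crossed))         # 168
print(crossed["Monday_0"])  # Counter({'Built-in electric hobs': 2})
```

Pre-creating all 168 rows keeps empty time slices in the table, so every individual is present even when no product was requested during that hour.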

11 Crossed Clustering Approach for Time Periods/Product Analysis

Table 4. Quantity of products requested by weekday x hour and registered on shop 4
Weekday x Hour  Product (number of requests)
Monday_0        Built-in electric hobs (10), Built-in dish washers 60cm (64), Corner single sinks (50), ...
Monday_1        Free standing combi refrigerators (44), Corner single sinks (50), Built-in hoods (60), ...
…               …
Sunday_22       Built-in microwave ovens (27), Built-in dish washers 45cm (38), Built-in dish washers 60cm (85), ...
Sunday_23       Built-in freezers (56), Kitchen taps with shower (45), Garbage disposers (32), ...

12 Crossed Clustering Approach for Time Period/Product Analysis

Table 5. Confusion table
          Product_1  Product_2  Product_3  Product_4  Product_5    Total
Period_1       2847       5084       3284       2265       2471    15951
Period_2      11305      31492      12951       1895       9610    67253
Period_3      33107      55652      36699       5345      20370   151173
Period_4      22682      46322      30200       5165      27659   132028
Period_5       9576      20477      19721       2339       7551    59664
Period_6       1783       3515       2549        392      11240    19479
Period_7      15019      14297       8608       1397       6014    45335
Total         96319     176839     114012      18798      84915   490883

Highlighted on the slide: 57.7% (Period_6, Product_5: 11240 of 19479)

13 Crossed Clustering Approach for Time Period/Product Analysis
Example of one surprising result: the class Product 5 is defined by one type of product, « Free standing combi refrigerators », consulted predominantly on Fridays from 17:00 to 20:00 (class Period 6). 57.7% of the requests in this period (11,240 of 19,479 in Table 5) concern this product type.
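The 57.7% figure can be checked directly from the Table 5 counts. The short sketch below uses the Period_6 row and the Product_5 column total from that table; the variable names are illustrative. It also shows the column-wise share for contrast, which makes clear that the percentage is computed within the time period, not within the product class.

```python
# Period_6 row of Table 5 (Product_1 .. Product_5) and the Product_5 column total.
period_6 = [1783, 3515, 2549, 392, 11240]
product_5_total = 84915

period_6_total = sum(period_6)
product_5_in_period_6 = period_6[4]

# Share of Period_6 requests that fall in class Product_5 (row profile).
row_share = 100 * product_5_in_period_6 / period_6_total
# Share of all Product_5 requests that fall in Period_6 (column profile).
column_share = 100 * product_5_in_period_6 / product_5_total

print(period_6_total)          # 19479
print(round(row_share, 1))     # 57.7
print(round(column_share, 1))  # 13.2
```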

14 Conclusions
1. InterSite Data Pre-Processing
   - structuring into user's intersite visits: « Group of SessionIDs »
   - first statistical intersite analysis
   - anomalies and recommendations for the dataset
2. Crossed Clustering Approach
   - first application of such a method on time periods of Web logs and in the e-commerce domain
   - promising results

15 Data pre-processing
Inconsistency problems:
- table kategorie: repeated entries, and different entries with the same ID
- for some page types (dt, df), the given parameter actually represented a specific product, not the given product description (from the products table)
- extra parameters equivalent to the given ones for some page types: e.g. for the ct page type, id is equivalent to the given c parameter
- missing values (descriptions) in tables: 3 values in the product table and 64 in the category table
- multiple-site SessionIDs: 13 cross-server visits had the same SessionID on the visited sites (up to 4 sites); the SessionID should change on each new site
- multiple-IP SessionIDs: 3690 visits (SessionIDs) were made from more than one IP (anonymization proxies?)
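The last two checks in the list above (SessionIDs seen on several sites, SessionIDs seen from several IPs) can be sketched as a single pass over the log. This is a minimal illustration assuming each request is available as a `(shop_id, ip, session_id)` triple; the function name and the sample values are hypothetical and do not come from the challenge dataset.

```python
from collections import defaultdict


def find_session_anomalies(requests):
    """Return (multi_ip, multi_shop) sets of SessionIDs.

    requests: iterable of (shop_id, ip, session_id) triples.
    multi_ip:   SessionIDs observed from more than one IP address.
    multi_shop: SessionIDs observed on more than one shop.
    """
    ips = defaultdict(set)
    shops = defaultdict(set)
    for shop_id, ip, session_id in requests:
        ips[session_id].add(ip)
        shops[session_id].add(shop_id)
    multi_ip = {s for s, seen in ips.items() if len(seen) > 1}
    multi_shop = {s for s, seen in shops.items() if len(seen) > 1}
    return multi_ip, multi_shop


# Hypothetical log: "a" changes IP, "b" appears on two shops, "c" is clean.
sample = [
    (11, "10.0.0.1", "a"),
    (11, "10.0.0.2", "a"),
    (11, "10.0.0.3", "b"),
    (14, "10.0.0.3", "b"),
    (12, "10.0.0.4", "c"),
]
multi_ip, multi_shop = find_session_anomalies(sample)
print(multi_ip)    # {'a'}
print(multi_shop)  # {'b'}
```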

