© Copyright , Blue Martini Software. San Mateo California, USA

Integrating E-Commerce and Data Mining: Architecture and Challenges
Llew Mason
Joint work with Suhail Ansari, Ron Kohavi, Zijian Zheng
Blue Martini Software
WEB-KDD Workshop, August 2000

Outline
• E-Commerce: A Killer Domain
• Integrated Architecture
• Data Collection
• Analysis
• Challenges
• Summary

E-Commerce: A Killer Domain
• Data records are plentiful
• Electronic collection provides reliable data
• Enables closed-loop analysis
• Insight can easily be turned into action
• Success can be directly measured, e.g., return on investment (ROI)

Integrated Architecture
[Diagram: the components Business Data Definition, Customer Interaction, and Analysis, connected by the steps Stage Data, Build Data Warehouse, and Deploy Results]

Integrated Architecture
[Architecture diagram repeated with a callout:]
• Business facing
• Products, content
• Attributes
• Shared meta-data

Integrated Architecture
[Architecture diagram repeated with a callout:]
• Build store
• Test before production
• Transform for efficiency
• Zero down-time

Integrated Architecture
[Architecture diagram repeated with a callout:]
• Customer facing
• Multiple touchpoints
• Integrated data collection

Integrated Architecture
[Architecture diagram repeated with a callout:]
• Build warehouse
• Automated using meta-data
• Reduces pre-processing
• Transform for analysis

Integrated Architecture
[Architecture diagram repeated with a callout:]
• Analysis
• Data transformations
• Exploration
• Modeling

Integrated Architecture
[Architecture diagram repeated with a callout:]
• Close the loop
• Transfer scores, models
• Personalize

Clickstream Logging
• Web server logs
  • Logs every HTTP request - filtering required
  • Stateless - must identify users and sessions
  • Captures URLs - must map to content
  • Can’t understand dynamic content
• Packet sniffers
  • Streaming data - must parse to understand content
  • Can’t understand encrypted data (SSL)
• Solution: Application server logging

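The "stateless - must identify users and sessions" point can be illustrated with a minimal sessionization sketch. It is not taken from the talk: the record layout `(visitor_id, timestamp, url)`, the function name `sessionize`, and the 30-minute inactivity cutoff are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed inactivity cutoff; the slides do not specify one.
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group (visitor_id, timestamp, url) records into per-visitor sessions.

    A new session starts whenever the gap between two consecutive
    requests from the same visitor exceeds `timeout`.
    """
    by_visitor = defaultdict(list)
    for visitor_id, ts, url in requests:
        by_visitor[visitor_id].append((ts, url))

    sessions = []
    for visitor_id, hits in by_visitor.items():
        hits.sort(key=lambda h: h[0])  # order each visitor's hits by time
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            if hit[0] - prev[0] > timeout:
                # Gap too long: close the current session, start a new one.
                sessions.append((visitor_id, current))
                current = []
            current.append(hit)
        sessions.append((visitor_id, current))
    return sessions
```

A web-server log would first need a reliable `visitor_id` (for example, from a cookie), which is part of what makes this reconstruction harder than logging sessions directly in the application server.
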
Beyond Clickstream Logging
• Business Event Logging: consider several requests as one logical event
  • Add or remove from shopping cart
  • Initiate or finalize checkout
  • Search
  • Register
  • Personalization rule evaluation
• Provides business insight
• Difficult to log outside of application server

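As a rough sketch of business event logging inside the application layer, the handler below records one logical "add to cart" event at the point where the cart is actually modified. The names `BusinessEvent`, `EventLog`, and `add_to_cart` are hypothetical and do not describe Blue Martini's actual API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class BusinessEvent:
    """One logical event, regardless of how many HTTP requests it took."""
    event_type: str   # e.g. "add_to_cart", "search", "register"
    session_id: str
    attributes: dict = field(default_factory=dict)
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class EventLog:
    """In-memory stand-in for whatever store the application writes to."""

    def __init__(self):
        self.events = []

    def record(self, event_type, session_id, **attributes):
        event = BusinessEvent(event_type, session_id, attributes)
        self.events.append(event)
        return event


def add_to_cart(cart, log, session_id, product_id, quantity):
    """Hypothetical handler: update the cart and log the business event."""
    cart[product_id] = cart.get(product_id, 0) + quantity
    log.record("add_to_cart", session_id,
               product_id=product_id, quantity=quantity)
```

Because the handler knows the product and quantity, the logged event already carries the business meaning that would otherwise have to be inferred from URLs.
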
Aggregation
• Data occurs at multiple granularities
  [Diagram: Customers, Sessions, Requests, in order of finer granularity]
• Many interesting attributes need to be aggregated for analysis
  [Diagram: Customers, Orders, Cities]

Aggregation
• Interesting customer attributes
  • What share of each customer’s wallet was spent on books?
  • How much is each female customer’s average order amount above the mean value for female customers?
  • What is the total amount of each customer’s five most recent purchases over $30?
  • What is the frequency of each customer’s purchases?
  • How long ago was each customer’s last purchase?

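Three of the questions above can be sketched as an aggregation from order-level rows up to customer-level attributes. The input layout `(customer_id, order_date, amount)` and the function name `customer_attributes` are assumptions, and the order count is used only as a simple proxy for purchase frequency.

```python
from collections import defaultdict
from datetime import date


def customer_attributes(orders, today, min_amount=30.0, top_n=5):
    """Aggregate (customer_id, order_date, amount) rows per customer."""
    by_customer = defaultdict(list)
    for customer_id, order_date, amount in orders:
        by_customer[customer_id].append((order_date, amount))

    result = {}
    for customer_id, rows in by_customer.items():
        rows.sort(key=lambda r: r[0], reverse=True)  # most recent first
        # Amounts of purchases over the threshold, still most recent first.
        large = [amount for _, amount in rows if amount > min_amount]
        result[customer_id] = {
            # Proxy for frequency: number of orders on record.
            "order_count": len(rows),
            # Recency: days since the most recent purchase.
            "days_since_last_purchase": (today - rows[0][0]).days,
            # Total of the `top_n` most recent purchases over `min_amount`.
            "recent_large_total": sum(large[:top_n]),
        }
    return result
```

The resulting dictionary has one row per customer, which is the granularity most modeling algorithms expect.
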
Hierarchies
[Diagram: product hierarchy with root Products; children Clothing and Books; Clothing has children Mens and Womens]
[Example table: columns Product ID, Quantity, Price, Clothing/Mens, Clothing/Womens, Books, with T/F flags in the hierarchy columns]
• E-Commerce data contains many hierarchies
• How can we use them in analysis?

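One way to make a hierarchy usable by flat modeling algorithms is to turn hierarchy membership into boolean columns, as the T/F columns on the slide suggest. The sketch below assumes products carry a slash-separated category path such as `Clothing/Mens`; the helper names `ancestors` and `hierarchy_flags` are hypothetical.

```python
def ancestors(path):
    """Return the path and all of its ancestors, from the top level down.

    For example, "Clothing/Mens" yields ["Clothing", "Clothing/Mens"].
    """
    parts = path.split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def hierarchy_flags(product_path, nodes):
    """Map each hierarchy node to True if the product falls under it."""
    member_of = set(ancestors(product_path))
    return {node: node in member_of for node in nodes}
```

Including ancestor nodes such as `Clothing` among the columns lets an analysis generalize above the leaf level, at the cost of more attributes.
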
Analytical Tools
• Reporting
  • Who are the top referrers by sales generated?
  • What are the top abandoned products?
  • What are the conversion rates for each product?
• OLAP
  • How do sales vary over time in each geographic region?
• Modeling Algorithms
  • What characterizes visitors that do not buy?
  • What characterizes customers that prefer promotions?
  • Which are the potential cross-sells and up-sells?
• Visualization

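The per-product conversion rate report can be sketched as follows. The definition used here (sessions that purchased a product divided by sessions that viewed it) is one common choice and an assumption, as are the event layout `(session_id, product_id, action)` and the function name `conversion_rates`.

```python
from collections import Counter


def conversion_rates(events):
    """Compute per-product conversion from (session_id, product_id, action).

    `action` is assumed to be either "view" or "purchase". A purchase
    counts only if the same session also viewed the product.
    """
    viewed, bought = set(), set()
    for session_id, product_id, action in events:
        if action == "view":
            viewed.add((session_id, product_id))
        elif action == "purchase":
            bought.add((session_id, product_id))

    views = Counter(product for _, product in viewed)
    purchases = Counter(product for _, product in bought & viewed)
    # Counter returns 0 for products that were viewed but never bought.
    return {product: purchases[product] / n for product, n in views.items()}
```

Computing this from business events rather than raw URLs avoids having to map each request back to a product first.
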
E-Commerce Challenges
• Make data mining comprehensible
• Support multiple granularity levels
• Utilize hierarchies
• Support date and time types effectively
• Support external events and changing data
• Identify bots and crawlers
• Handle large amounts of data

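For the "identify bots and crawlers" challenge, a simple heuristic filter might look like the sketch below. The slides do not describe a method, so everything here is an assumption: the session dictionary keys, the user-agent substrings, the `/robots.txt` check, and the request-rate threshold. Real crawlers can evade all three checks.

```python
# Assumed user-agent substrings; not an authoritative list.
BOT_MARKERS = ("bot", "crawler", "spider")


def looks_like_bot(session, max_requests_per_minute=60):
    """Heuristically flag a session as automated traffic.

    `session` is assumed to be a dict with the keys "user_agent",
    "urls" (list of requested paths), and "duration_seconds".
    """
    agent = session.get("user_agent", "").lower()
    if any(marker in agent for marker in BOT_MARKERS):
        return True

    urls = session.get("urls", ())
    if "/robots.txt" in urls:
        # Browsers driven by people rarely request robots.txt.
        return True

    # Treat very short sessions as lasting at least one minute so that
    # a single fast page load is not flagged.
    duration_min = max(session.get("duration_seconds", 0) / 60.0, 1.0)
    return len(urls) / duration_min > max_requests_per_minute
```

Sessions flagged this way would typically be excluded before computing reports such as conversion rates, since automated traffic inflates view counts.
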
Summary
• Integrating e-commerce and data mining enables effective closed-loop analysis
• Application server logging provides integrated data collection and reduces pre-processing
• Powerful data transformations and a broad suite of analysis techniques are needed
• There are many challenges ahead