1
Integrating E-Commerce and Data Mining: Architecture and Challenges
Llew Mason (lmason@bluemartini.com)
Joint work with Suhail Ansari, Ron Kohavi, Zijian Zheng
Blue Martini Software
WEB-KDD Workshop, August 2000
© Copyright 1998-2000, Blue Martini Software. San Mateo California, USA
2
Outline
- E-Commerce: A Killer Domain
- Integrated Architecture
- Data Collection
- Analysis
- Challenges
- Summary
3
Killer Domain: E-Commerce
- Data records are plentiful
- Electronic collection provides reliable data
- Enables closed-loop analysis
- Insight can easily be turned into action
- Success can be directly measured, e.g., return on investment (ROI)
4
Integrated Architecture
[Diagram: three components (Business Data Definition, Customer Interaction, Analysis) linked by three transitions (Stage Data, Build Data Warehouse, Deploy Results)]
5
Integrated Architecture (diagram as on slide 4), callout:
- Business facing
- Products, content
- Attributes
- Shared meta-data
6
Integrated Architecture (diagram as on slide 4), callout:
- Build store
- Test before production
- Transform for efficiency
- Zero down-time
7
Integrated Architecture (diagram as on slide 4), callout:
- Customer facing
- Multiple touchpoints
- Integrated data collection
8
Integrated Architecture (diagram as on slide 4), callout:
- Build warehouse
- Automated using meta-data
- Reduces pre-processing
- Transform for analysis
9
Integrated Architecture (diagram as on slide 4), callout:
- Analysis
- Data transformations
- Exploration
- Modeling
10
Integrated Architecture (diagram as on slide 4), callout:
- Close the loop
- Transfer scores, models
- Personalize
11
Clickstream Logging
- Web server logs
  - Logs every HTTP request: filtering required
  - Stateless: must identify users and sessions
  - Captures URLs: must map to content
  - Can’t understand dynamic content
- Packet sniffers
  - Streaming data: must parse to understand content
  - Can’t understand encrypted data (SSL)
- Solution: Application server logging
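The "stateless: must identify users and sessions" point can be illustrated with a minimal sketch of timeout-based sessionization over web server log entries. Everything here is an assumption for illustration and not taken from the slides: the requests are presumed already parsed into (user_key, timestamp, url) tuples, the user key is something like a cookie ID, the 30-minute inactivity timeout is a common heuristic, and the name `sessionize` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: 30 minutes of inactivity ends a session (a common heuristic,
# not a value given in the slides).
SESSION_TIMEOUT = timedelta(minutes=30)


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group (user_key, timestamp, url) requests into per-user sessions."""
    by_user = defaultdict(list)
    for user_key, ts, url in requests:
        by_user[user_key].append((ts, url))

    sessions = []
    for user_key, hits in by_user.items():
        hits.sort(key=lambda h: h[0])  # order each user's hits by time
        current = [hits[0]]
        for prev, hit in zip(hits, hits[1:]):
            # A gap longer than the timeout closes the current session.
            if hit[0] - prev[0] > timeout:
                sessions.append((user_key, current))
                current = []
            current.append(hit)
        sessions.append((user_key, current))
    return sessions


# Hypothetical log entries for two visitors.
t0 = datetime(2000, 8, 20, 10, 0)
log = [
    ("cookie-a", t0, "/home"),
    ("cookie-a", t0 + timedelta(minutes=5), "/product/1"),
    ("cookie-a", t0 + timedelta(hours=2), "/home"),
    ("cookie-b", t0 + timedelta(minutes=1), "/search"),
]
sessions = sessionize(log)
```

With application server logging, the server already knows the session, so a reconstruction step like this would presumably not be needed.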
12
Beyond Clickstream Logging
- Business Event Logging
  Consider several requests as one logical event
  - Add or remove from shopping cart
  - Initiate or finalize checkout
  - Search
  - Register
  - Personalization rule evaluation
- Provides business insight
- Difficult to log outside of application server
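As a rough sketch of business event logging, the application code that performs a logical action can write one structured record for it, however many HTTP requests the action involved. The class names, field names, and the `add_to_cart` handler below are hypothetical and are not Blue Martini's API; the in-memory list stands in for whatever store a real application server would use.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class BusinessEvent:
    """One logical event, independent of the HTTP requests behind it."""
    event_type: str
    session_id: str
    attributes: dict
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class EventLog:
    """Hypothetical in-memory event log holding JSON records."""

    def __init__(self):
        self.records = []

    def log(self, event_type, session_id, **attributes):
        event = BusinessEvent(event_type, session_id, attributes)
        self.records.append(json.dumps(asdict(event)))
        return event


def add_to_cart(log, session_id, cart, product_id, quantity):
    """Hypothetical handler: update the cart, then log one business event."""
    cart[product_id] = cart.get(product_id, 0) + quantity
    log.log("cart_add", session_id, product_id=product_id, quantity=quantity)


event_log = EventLog()
cart = {}
add_to_cart(event_log, "session-1", cart, product_id="sku-42", quantity=2)
```

Because the handler knows the business meaning of the action, the record carries the product and quantity directly instead of a URL that must later be mapped back to content.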
13
Aggregation
- Data occurs at multiple granularities
  [Diagram: Customers, Sessions, Requests, ordered toward finer granularity]
- Many interesting attributes need to be aggregated for analysis
  [Diagram: Customers, Orders, Cities]
14
Aggregation
- Interesting customer attributes
  - What wallet share did each customer spend on books?
  - How much is each female customer’s average order amount above the mean value for female customers?
  - What is the total amount of each customer’s five most recent purchases over $30?
  - What is the frequency of each customer’s purchases?
  - How long ago was each customer’s last purchase?
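A few of the customer attributes listed above can be sketched as a roll-up from order-level rows to one row per customer. The record layout, the function name `customer_attributes`, and the sample orders are made up for illustration. Two definitions are assumptions: "frequency" is taken to be the order count, and the books question is read as the fraction of a customer's own spending that went to books.

```python
from collections import defaultdict
from datetime import date


def customer_attributes(orders, as_of):
    """Aggregate order rows into per-customer attributes."""
    by_customer = defaultdict(list)
    for order in orders:
        by_customer[order["customer_id"]].append(order)

    result = {}
    for customer_id, rows in by_customer.items():
        rows.sort(key=lambda o: o["date"], reverse=True)  # newest first
        total = sum(o["amount"] for o in rows)
        books = sum(o["amount"] for o in rows if o["category"] == "Books")
        # Five most recent purchases among those over $30.
        over_30 = [o["amount"] for o in rows if o["amount"] > 30][:5]
        result[customer_id] = {
            "order_count": len(rows),
            "days_since_last": (as_of - rows[0]["date"]).days,
            "books_share": books / total if total else 0.0,
            "recent5_over_30": sum(over_30),
        }
    return result


# Hypothetical order data.
orders = [
    {"customer_id": "c1", "date": date(2000, 7, 1), "amount": 40.0, "category": "Books"},
    {"customer_id": "c1", "date": date(2000, 7, 15), "amount": 60.0, "category": "Clothing"},
    {"customer_id": "c1", "date": date(2000, 8, 1), "amount": 20.0, "category": "Books"},
    {"customer_id": "c2", "date": date(2000, 6, 1), "amount": 25.0, "category": "Clothing"},
]
attrs = customer_attributes(orders, as_of=date(2000, 8, 11))
```

The result has one row per customer, which is the granularity a customer-level model would need.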
15
Hierarchies
- E-Commerce data contains many hierarchies
- How can we use them in analysis?
[Figure: product hierarchy tree (Products → Clothing, Books; Clothing → Mens, Womens) and an example record with columns Product ID, Quantity, Price, Clothing/Mens, Clothing/Womens, Books; cell values recovered from the extraction: 2, 1, $12, T, F, F]
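One possible way to use a hierarchy in analysis, suggested by the boolean columns in the figure, is to expand each product's position in the tree into true/false flags per hierarchy node. The sketch below is an assumption about that encoding, not the authors' implementation; the `PARENT` map mirrors the tree on the slide, and the function names are hypothetical.

```python
# Child -> parent links for the product tree shown on the slide.
PARENT = {
    "Clothing": "Products",
    "Books": "Products",
    "Mens": "Clothing",
    "Womens": "Clothing",
}


def path_to(node, root="Products"):
    """Return the path from just below the root to node, e.g. 'Clothing/Mens'."""
    parts = []
    while node != root:
        parts.append(node)
        node = PARENT[node]
    return "/".join(reversed(parts))


def hierarchy_flags(leaf, columns):
    """Flag every column whose hierarchy node is the leaf or one of its ancestors."""
    path = path_to(leaf)
    return {col: path == col or path.startswith(col + "/") for col in columns}


columns = ["Clothing", "Clothing/Mens", "Clothing/Womens", "Books"]
flags = hierarchy_flags("Mens", columns)
```

Including a flag for the ancestor node (`Clothing` here) lets a model generalize across all clothing items, while the leaf-level flags keep the finer distinction.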
16
Analytical Tools
- Reporting
  - Who are the top referrers by sales generated?
  - What are the top abandoned products?
  - What are the conversion rates for each product?
- OLAP
  - How do sales vary over time in each geographic region?
- Modeling Algorithms
  - What characterizes visitors that do not buy?
  - What characterizes customers that prefer promotions?
  - Which are the potential cross-sells and up-sells?
- Visualization
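As an example of the reporting questions, a per-product conversion rate could be computed from logged events roughly as follows. The slides do not define conversion rate; this sketch assumes it is the fraction of sessions that viewed a product and also purchased it. The event tuple layout and the event type names `view` and `purchase` are hypothetical.

```python
from collections import defaultdict


def conversion_rates(events):
    """Map product_id -> share of viewing sessions that also purchased it."""
    viewed = defaultdict(set)
    bought = defaultdict(set)
    for session_id, event_type, product_id in events:
        if event_type == "view":
            viewed[product_id].add(session_id)
        elif event_type == "purchase":
            bought[product_id].add(session_id)
    return {
        product_id: len(bought[product_id] & sessions) / len(sessions)
        for product_id, sessions in viewed.items()
    }


# Hypothetical events: (session_id, event_type, product_id).
events = [
    ("s1", "view", "A"),
    ("s1", "purchase", "A"),
    ("s2", "view", "A"),
    ("s2", "view", "B"),
]
rates = conversion_rates(events)
```

Counting sessions rather than raw requests keeps repeated page views within one visit from deflating the rate.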
17
E-Commerce Challenges
- Make data mining comprehensible
- Support multiple granularity levels
- Utilize hierarchies
- Support date and time types effectively
- Support external events and changing data
- Identify bots and crawlers
- Handle large amounts of data
18
Summary
- Integrated E-Commerce and data mining enables effective closed-loop analysis
- Application server logging provides integrated data collection and reduces pre-processing
- Powerful data transformations and a broad suite of analysis techniques are needed
- There are many challenges ahead