Technology for E-commerce
Helena Ahonen-Myka
In this part...
• search tools
• metadata
• personalization
• collaborative filtering
• data mining
Search tools
• the site has to be accessible
• site architecture and navigation structure are important
• … but some users prefer search
• keep users on the site
• usage can be monitored: useful knowledge about the users’ needs
Users’ preferences
• search: 50%
• navigation: 20%
• mixed: the rest...
Search tools
• Indexer: gathers the words from documents (HTML pages, local files, database records) and puts them into an index file
• Search engine: accepts queries, locates the relevant pages in the index, and formats the results in an HTML page
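The division of labour between indexer and search engine can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the index is an in-memory dictionary rather than an index file, the page names are made up, and the function names `tokenize`, `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Indexer: map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Search engine: return the ids of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages standing in for a real site.
documents = {
    "/products.html": "Our products: books and music",
    "/contact.html": "Contact us about books",
}
index = build_index(documents)
hits = search(index, "music books")
```

A real search engine would additionally rank the hits and format them as an HTML results page.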
Remote vs local search
• the search tool can reside on a different server, even in a remote location
• indexing may take a lot of processing time, and the resulting index may need a lot of space
• local software may be faster
Indexer
• local: scans directories
• web spider: an indexing robot begins at a given page, then follows the links and stores the words of the pages
• robots.txt file: states which robots are allowed
• HTML meta elements
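How a well-behaved spider consults robots.txt before fetching a page can be sketched with Python's standard `urllib.robotparser` module. The robots.txt content, the robot names `MySpider` and `BadBot`, the example URLs and the helper `may_index` are all assumptions made up for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot is kept out of /private/,
# and the robot called BadBot is kept out of the whole site.
robots_txt = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def may_index(user_agent, url):
    """Return True if the given robot is allowed to fetch the URL."""
    return parser.can_fetch(user_agent, url)


allowed = may_index("MySpider", "http://www.example.com/index.html")
```

In practice the spider would download the file from the site itself (for instance with `set_url` and `read`) instead of parsing a string.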
Indexer
• the link structure should reach all the pages that should be indexed
• non-text links (imagemaps etc.): robots may not be able to follow them -> also provide text links
• frames: provide some navigational links to give a context if the page is retrieved by a query
Search page
• search forms are the user interface of the search engine
• simple form: just a text field and a button
• or an advanced search page: boolean search, date ranges, subscopes...
Search results
• the occurrences of the query terms are located in the index
• the results are sorted according to their (assumed) relevance to the query
• the results page should have the same look-and-feel as the other pages on the site
Why do searches fail?
• empty searches: people just press the search button without giving any words
• wrong scope: people think they are searching the entire web
• vocabulary mismatch: terms are too specific, too general, or simply not used
• spelling mistakes
• query requirements not met
Why do searches fail?
• problems with query syntax: spaces, parentheses, etc.
• capitalization and special characters: exact matches required
• stopwords: some common words are not indexed
• short words: short words are not indexed
• numbers are not indexed
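Several of these failure causes can be detected before the index is even consulted. The sketch below is only an illustration: the stopword list, the minimum word length of three characters and the function names `indexable_terms` and `explain` are assumptions, and a real indexer has its own rules.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in"}  # assumed list
MIN_LENGTH = 3  # assumed: shorter words are not indexed


def indexable_terms(query):
    """Return the query words that an indexer with the rules above would keep."""
    kept = []
    for term in re.findall(r"[a-z0-9]+", query.lower()):
        if term in STOPWORDS:
            continue  # stopwords are not indexed
        if term.isdigit():
            continue  # numbers are not indexed
        if len(term) < MIN_LENGTH:
            continue  # short words are not indexed
        kept.append(term)
    return kept


def explain(query):
    """Give a coarse reason why a query cannot match anything, or 'ok'."""
    if not query.strip():
        return "empty search"
    if not indexable_terms(query):
        return "no indexable terms"
    return "ok"


terms = indexable_terms("The history of TV in 2000")
```

The reason returned by `explain` is the kind of information a no-matches page could show to the user.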
No-matches pages
• answer pages shown to the user if the search does not return any matches
• should have the same look-and-feel as the other pages + navigation aids + a search-again field
• explanations of why the search might have failed and what to do next
Some usability issues
• web design: strong sense of structure and navigation support
• some people do not like to search
• people who search end up on some page: they should know where they are
• people need to move around in the neighborhood
• search should be available on every page
Some usability issues
• scoped search: it is difficult for users to understand what the scope is -> the scope should be stated clearly, and a search of the entire site has to be easy to reach
• boolean search is difficult: 'cats and dogs' vs 'cats or dogs' -> 'or' could be used in the query, 'and' in the ordering
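The suggestion of using 'or' for matching and 'and' for ordering can be sketched as follows: a page is returned if it contains any query term, and pages containing more of the terms are listed first. The page names and the function name `or_search_and_ranking` are hypothetical, and the whitespace tokenization is deliberately naive.

```python
def or_search_and_ranking(documents, query):
    """Match pages containing ANY query term; list pages with MORE terms first."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        matched = len(terms & words)
        if matched > 0:  # 'or': one matching term is enough
            scored.append((matched, doc_id))
    # 'and' in the ordering: more matched terms first, ties broken by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical pages.
documents = {
    "pets.html": "cats and dogs as pets",
    "cats.html": "all about cats",
    "birds.html": "parrots and canaries",
}
ranking = or_search_and_ranking(documents, "cats dogs")
```

With this ordering the user does not need to know any boolean syntax: pages about both cats and dogs come first, pages about only one of them follow.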
Metadata
• often a search results in a long list of matches; many of them may be irrelevant
• metadata can make the queries more powerful
HTML meta elements
How to complete memo cover sheets
<meta name="copyright" content="© 2000 Acme">
<meta name="keywords" content="corporate, guidelines, cataloging">
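An indexer could collect such meta elements with Python's standard `html.parser` module, as in the sketch below. The surrounding page markup (including placing the page heading in a `title` element) and the class name `MetaCollector` are assumptions for illustration.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the name/content pairs of the meta elements on a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name = attributes.get("name")
            content = attributes.get("content")
            if name and content is not None:
                self.meta[name.lower()] = content


# Assumed page skeleton around the two meta elements from the slide.
page = """<html><head>
<title>How to complete memo cover sheets</title>
<meta name="copyright" content="© 2000 Acme">
<meta name="keywords" content="corporate, guidelines, cataloging">
</head><body></body></html>"""

collector = MetaCollector()
collector.feed(page)
keywords = [k.strip() for k in collector.meta.get("keywords", "").split(",")]
```

The extracted keywords can then be stored in the index alongside the words of the page body.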
Metadata
• RDF (Resource Description Framework):
  – gives means to define metadata for XML and HTML documents
  – gives means to interchange it between different applications on the Web
• Example: Dublin Core metadata
  – contains 15 elements (title, creator, date…)
Dublin Core
• Dublin Core Metadata Elements:
  – Content: Title, Subject, Description, Language, Relation, Coverage
  – Intellectual Property: Creator, Publisher, Contributor, Rights
  – Instance: Date, Type, Format, Identifier
Dublin Core in RDF
<RDF:RDF> … isPartOf … isPartOf … </RDF:RDF>
• Dublin Core represented in RDF
Searching XML documents
• the structure of XML documents can be used to make more precise queries, e.g. find Albert Einstein in the Author element only
• problem: how does the user specify the structure?
Searching XML documents
• 1) The user specifies the hierarchy in the query: Einstein in Author
• 2) The user makes a simple query, but the search engine presents the alternative contexts: Einstein can be in Author or in Street or in School
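Both alternatives can be sketched with Python's standard `xml.etree.ElementTree` module. The sample document, its element names and the functions `find_in_element` and `contexts_of` are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which "Einstein" occurs in three different contexts.
xml_text = """
<records>
  <book><author>Albert Einstein</author><title>Relativity</title></book>
  <address><street>Einstein Street 5</street></address>
  <student><school>Einstein High School</school></student>
</records>
"""
root = ET.fromstring(xml_text)


def find_in_element(root, term, tag):
    """Alternative 1: search for the term only inside elements with the given tag."""
    return [
        element.text
        for element in root.iter(tag)
        if element.text and term.lower() in element.text.lower()
    ]


def contexts_of(root, term):
    """Alternative 2: list the element names in which the term occurs."""
    return sorted(
        {
            element.tag
            for element in root.iter()
            if element.text and term.lower() in element.text.lower()
        }
    )


authors = find_in_element(root, "Einstein", "author")
contexts = contexts_of(root, "Einstein")
```

In the second alternative the search engine would show the list of contexts and let the user pick one, so the user never has to type the structure.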
Using links
• good site: many links into the site, particularly from other good sites
• the text surrounding the link (probably) describes what the target of the link is about
• the knowledge above + the contents of the page itself are taken into account
• e.g. Google
Natural language queries
• e.g. Ask Jeeves
• questions and answers prepared by human editors
• the user’s query is mapped to the prepared queries
Personalization
• goal: the right people receive the right information at the right time
• but: people do not like to state complex queries or to initialize a service (e.g. by answering a questionnaire)
• user profiles have to be generated and stored, preferably automatically
User profiles
• may contain data like: interests, geographical area, age
• could be collected once, and shared with many services
• trust of the user: the profile should only be used to offer better service, and only if the user allows the service to use it
Recommendations
• users who bought this book also bought these books / liked these CDs etc.
• rating movies, TV programs, wines…
• recommending paths on a site
Recommendations
• based on the user’s former behavior and profile data
• based on social (collaborative) filtering: what similar users liked
User’s former behavior
• if used as the only source: the user never sees anything new
• in particular, a new user gets hardly any recommendations
Collaborative filtering
• draws on the experiences of a population or community of users
• the profile information of the target user is compared to the profiles of nearest-neighbor users
• look for correlation between users in terms of their ratings: recommend items that are included in the neighbors’ profiles but not in the target user’s profile
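A minimal sketch of this nearest-neighbor scheme is given below. It assumes that Pearson correlation over the commonly rated items is used as the similarity measure, which is one common choice and not necessarily the one meant here; the ratings, the user names and the functions `pearson` and `recommend` are invented for illustration.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation of two users over the items both have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    a = [ratings_a[item] for item in common]
    b = [ratings_b[item] for item in common]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return covariance / sqrt(var_a * var_b)


def recommend(target, profiles, k=1):
    """Recommend items rated by the k most similar users but not by the target."""
    neighbors = sorted(
        (
            (pearson(profiles[target], ratings), user)
            for user, ratings in profiles.items()
            if user != target
        ),
        reverse=True,
    )[:k]
    recommended = []
    for similarity, user in neighbors:
        if similarity <= 0:
            continue  # not similar enough to be a useful neighbor
        for item in sorted(profiles[user]):
            if item not in profiles[target] and item not in recommended:
                recommended.append(item)
    return recommended


# Hypothetical rating profiles on a scale from 1 to 5.
profiles = {
    "ann": {"book A": 5, "book B": 1, "book C": 4},
    "bob": {"book A": 5, "book B": 2, "book C": 5, "book D": 4},
    "cat": {"book A": 1, "book B": 5, "book E": 5},
}
for_ann = recommend("ann", profiles)
for_cat = recommend("cat", profiles)
```

Here ann's ratings correlate strongly with bob's, so she is recommended the book that only bob has rated. The user cat has no positively correlated neighbor and gets no recommendations at all.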
Collaborative filtering
• Problems:
  – cannot recommend new items (some users have to rate an item before it can be recommended)
  – an unusual user may not get (good) recommendations: no neighbors that are close enough
Matching engines
• apply one set of complex characteristics to another
• e.g., recruiting sites: match a job seeker and a job
Data mining for e-commerce
• users’ behavior on the web site provides a lot of information:
  – Which pages do the users view?
  – Which paths do the users navigate?
  – How long do the users spend on the site?
  – What is the ratio of purchasing a product to viewing it?
Data mining process
• Gathering the data
• Cleaning/preprocessing the data
• Transforming the data
• Analysis / finding general models
• Interpreting the results
• Using the knowledge
Data collection
• clickstream logging: web server logs or packet sniffers
• business event logging
Clickstream logging
• web log: page requested, time of request, client IP address, etc.
• many requests are for images -> these have to be filtered out
• users and user sessions are difficult to identify
• requests for a page: the same page, but different dynamic content
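The cleaning step can be sketched as follows, assuming the log is in the Common Log Format. Grouping requests by client address with a 30-minute timeout is only a crude heuristic for sessions, chosen here for illustration; the log lines, the timeout and all function names are assumptions.

```python
import re
from datetime import datetime, timedelta

# Assumed Common Log Format: client, identity, user, [time], "request", status, size.
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")
SESSION_TIMEOUT = timedelta(minutes=30)  # assumed heuristic


def parse_line(line):
    """Return (client, time, path) for a log line, or None if it does not parse."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return match.group("client"), when, match.group("path")


def page_requests(lines):
    """Keep only the requests for pages, dropping the requests for images."""
    requests = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        path = parsed[2].lower().split("?")[0]
        if path.endswith(IMAGE_SUFFIXES):
            continue
        requests.append(parsed)
    return requests


def sessions(requests):
    """Group requests by client; a long pause starts a new session."""
    result = []
    last_seen = {}  # client -> (time of last request, paths of current session)
    for client, when, path in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(client)
        if previous is None or when - previous[0] > SESSION_TIMEOUT:
            current = []
            result.append((client, current))
        else:
            current = previous[1]
        current.append(path)
        last_seen[client] = (when, current)
    return result


# Hypothetical log lines.
log_lines = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 +0300] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:13:55:37 +0300] "GET /logo.gif HTTP/1.0" 200 512',
    '192.0.2.1 - - [10/Oct/2000:13:57:00 +0300] "GET /books.html HTTP/1.0" 200 4100',
    '192.0.2.1 - - [10/Oct/2000:15:10:00 +0300] "GET /index.html HTTP/1.0" 200 2326',
]
found = sessions(page_requests(log_lines))
```

Several users behind one proxy share an address, and one user may change addresses, which is exactly why users and sessions are hard to identify from web server logs alone.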
Clickstream logging
• more efficient at the application server layer
• instead of just pages, knowledge about products
• user and session tracking possible
• can also track information that is absent from web server logs: pages that were aborted while being downloaded
Business event logging
• looking at subsets of requests as one logical event or episode:
  – add/remove item to/from shopping cart
  – initiate/finish checkout
  – search (log keywords and number of results)
  – register
From order data to customers
• the collected data is order-oriented
• the data for each customer is spread over many records
• information on customers is the real target
• the information for each customer has to be aggregated
From order data to customers
• What percentage of each customer’s orders used a VISA credit card?
• How much money does each customer spend on books?
• What is the frequency of each customer’s purchases?
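Turning order-oriented records into one record per customer can be sketched like this. The order fields, the sample data and the function name `customer_summary` are hypothetical; the number of orders is used as a stand-in for purchase frequency, since computing a real frequency would also need order dates.

```python
from collections import defaultdict

# Hypothetical order records: one record per order, not per customer.
orders = [
    {"customer": "c1", "category": "books", "amount": 20.0, "payment": "VISA"},
    {"customer": "c1", "category": "music", "amount": 15.0, "payment": "cash"},
    {"customer": "c1", "category": "books", "amount": 30.0, "payment": "VISA"},
    {"customer": "c2", "category": "books", "amount": 10.0, "payment": "cash"},
]


def customer_summary(orders):
    """Aggregate the order records into one summary record per customer."""
    grouped = defaultdict(list)
    for order in orders:
        grouped[order["customer"]].append(order)

    summary = {}
    for customer, rows in grouped.items():
        visa_orders = sum(1 for row in rows if row["payment"] == "VISA")
        summary[customer] = {
            "orders": len(rows),
            "visa_share": visa_orders / len(rows),
            "spent_on_books": sum(
                row["amount"] for row in rows if row["category"] == "books"
            ),
        }
    return summary


summary = customer_summary(orders)
```

The resulting per-customer records are the kind of input that the model generation step works on.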
Model generation
• Answer questions like:
  – What characterizes heavy spenders?
  – What characterizes customers that prefer promotion X over Y?
  – What characterizes customers that buy quickly?
  – What characterizes visitors that do not buy?
Data mining tools
• e.g., classification rules:
  IF Income > $80,000 AND Age <= 30 AND Average Session Duration is between 10 AND 20 minutes
  THEN Heavy spender
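Once found, such a rule is straightforward to apply to new visitors. The sketch below encodes the rule above as a Python predicate; the function name `is_heavy_spender` is hypothetical, and treating "between 10 and 20 minutes" as inclusive at both ends is an assumption.

```python
def is_heavy_spender(income, age, avg_session_minutes):
    """Apply the example rule; the 10-20 minute bounds are assumed inclusive."""
    return income > 80_000 and age <= 30 and 10 <= avg_session_minutes <= 20


prediction = is_heavy_spender(income=90_000, age=28, avg_session_minutes=15)
```

A data mining tool typically produces many such rules, each with a measure of how often it holds in the data.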
Understanding the results
• the result of a data mining process may be difficult for a business user to understand: e.g. thousands of rules
• visualization is important
• it should be tailored for a specific domain
Using the results
• the site structure can be updated
• procedures like registering or checking out can be simplified
• metadata can be added to make search more efficient
• personalization rules, recommender systems