Technology for E-commerce
Helena Ahonen-Myka

In this part...
- search tools
- metadata
- personalization
- collaborative filtering
- data mining

Search tools
- the site has to be accessible
- site architecture and navigation structure are important
- … but some users prefer search
- keep users on the site
- usage can be monitored: useful knowledge about the users’ needs

Users’ preferences
- search: 50%
- navigation: 20%
- mixed: the rest...

Search tools
- Indexer: gathers the words from documents (HTML pages, local files, database records) and puts them into an index file
- Search engine: accepts queries, locates the relevant pages in the index, and formats the results in an HTML page
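
The indexer and search engine split can be sketched as follows. This is a minimal illustration, assuming an in-memory dictionary stands in for the index file; the page names and the `build_index` and `search` helpers are hypothetical, and formatting the results as HTML is left out.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Indexer: map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Search engine: return the ids of documents containing every query word."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()  # empty query: nothing to look up
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages, used only for illustration.
pages = {
    "index.html": "Welcome to the Acme book shop",
    "books.html": "Books about metadata and search tools",
}
index = build_index(pages)
print(sorted(search(index, "search tools")))
```

The index is built once and queried many times, which is why indexing cost and index size matter more than query cost.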

Remote vs local search
- the search tool can reside on a different server, also in a remote location
- indexing may take a lot of processing time, and the resulting index may need a lot of space
- local software may be faster

Indexer
- local: scans directories
- web spider: an indexing robot begins at a given page, then follows the links and stores the words of the pages
- ’robots.txt’ file: states which robots are allowed
- HTML meta elements:
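
How a spider might honor a ’robots.txt’ file can be sketched with Python's standard `urllib.robotparser` module. The rules, the robot name `ExampleBot`, and the URLs below are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: every robot is kept out of /private/.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved spider asks before fetching each page.
allowed = parser.can_fetch("ExampleBot", "http://www.example.com/index.html")
blocked = parser.can_fetch("ExampleBot", "http://www.example.com/private/a.html")
print(allowed, blocked)
```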

Indexer
- the link structure should reach all the pages that should be indexed
- non-text links (imagemaps etc.): robots may not be able to follow them -> also provide text links
- frames: provide some navigational links to give context if the page is retrieved by a query

Search page
- search forms are the user interface of the search engine
- simple form: just a text field and a button
- or an advanced search page: boolean search, date ranges, subscopes...

Search results
- the occurrences of the query terms are located in the index
- the results are sorted according to their (assumed) relevance to the query
- the results page should have the same look-and-feel as the other pages on the site

Why do searches fail?
- empty searches: people just press the search button without entering any words
- wrong scope: people think they are searching the entire web
- vocabulary mismatch: terms are too specific, too general, or just not used
- spelling mistakes
- query requirements not met

Why do searches fail?
- problems with query syntax: spaces, parentheses, etc.
- capitalization and special characters: exact matches required
- stopwords: some common words are not indexed
- short words: short words are not indexed
- numbers are not indexed
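
The indexing-related failures can be made concrete with a small sketch of what a hypothetical indexer keeps. The stopword list, the minimum word length of 3, and the `index_terms` helper are all assumptions chosen for the example, not the settings of any particular search tool.

```python
import re

STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in"}  # assumed list
MIN_LENGTH = 3  # assumed: shorter words are not indexed


def index_terms(text):
    """Return the terms a hypothetical indexer would keep from the text."""
    terms = []
    # Only alphabetic tokens are extracted, so numbers never reach the index.
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOPWORDS or len(token) < MIN_LENGTH:
            continue
        terms.append(token)
    return terms


print(index_terms("The History of IT in 2000"))
```

A query for "IT" or "2000" against such an index finds nothing, even though both appear on the page.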

No-matches pages
- answer pages shown to the user if the search does not return any matches
- should have the same look-and-feel as the other pages + navigation aids + search again field
- explanations of why the search might have failed and what to do next

Some usability issues
- web design: strong sense of structure and navigation support
- some people do not like to search
- people who search end up on some page: they should know where they are
- people need to move around in the neighborhood
- search should be available on every page

Some usability issues
- scoped search: it is difficult for users to understand what the scope is -> the scope should be stated clearly, and a search of the entire site has to be easy to reach
- boolean search is difficult: ’cats and dogs’ vs ’cats or dogs’ -> ’or’ could be used in the query, ’and’ in the ordering
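
The suggestion to use ’or’ in the query and ’and’ in the ordering could look like the sketch below: any document containing at least one term is returned, and documents containing more of the terms are listed first. The `or_match_and_rank` helper and the sample documents are hypothetical.

```python
def or_match_and_rank(documents, query):
    """Match documents with any query term; rank those with more terms first."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        hits = len(terms & words)
        if hits:  # 'or': one matching term is enough to be included
            scored.append((hits, doc_id))
    # 'and' in the ordering: documents matching all terms sort to the top.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


docs = {
    "a": "cats only",
    "b": "cats and dogs together",
    "c": "dogs only",
    "d": "birds",
}
print(or_match_and_rank(docs, "cats dogs"))
```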

Metadata
- often a search results in a long list of matches; many of them may be irrelevant
- metadata can make the queries more powerful

HTML meta elements
How to complete memo cover sheets
<meta name="copyright" content="© 2000 Acme"..
<meta name="keywords" content="corporate, guidelines, cataloging">
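
An indexer could pick up such meta elements roughly as follows. This is a sketch using Python's standard `html.parser` module; the `MetaCollector` class and the sample page are assumptions made up for the example.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from the meta elements of a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            if "name" in attributes and "content" in attributes:
                self.meta[attributes["name"].lower()] = attributes["content"]


# Hypothetical page head, modeled on the meta elements shown above.
page = """<html><head>
<meta name="copyright" content="© 2000 Acme">
<meta name="keywords" content="corporate, guidelines, cataloging">
</head><body></body></html>"""

collector = MetaCollector()
collector.feed(page)
keywords = [k.strip() for k in collector.meta["keywords"].split(",")]
print(keywords)
```

The keywords can then be indexed as a separate field, so that a query can be restricted to them.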

Metadata
- RDF (Resource Description Framework):
  - gives means to define metadata for XML and HTML documents
  - gives means to interchange it between different applications on the Web
- Example: Dublin Core metadata
  - contains 15 elements (title, creator, date…)

Dublin Core
- Dublin Core Metadata Elements:
  Content: Title, Subject, Description, Language, Relation, Coverage
  Intellectual Property: Creator, Publisher, Contributor, Rights
  Instance: Date, Type, Format, Identifier

Dublin Core in RDF
<RDF:RDF> isPartOf isPartOf </RDF:RDF>
- Dublin Core represented in RDF

Searching XML documents
- the structure of XML documents can be used to make more precise queries, e.g. find Albert Einstein in the Author element only
- problem: how does the user specify the structure?

Searching XML documents
1) The user specifies the hierarchy in the query: Einstein in Author
2) The user makes a simple query, but the search engine presents the alternative contexts: Einstein can be in Author or in Street or in School
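
Both approaches can be sketched with Python's standard `xml.etree.ElementTree` module. The sample document and the `find_in_element` and `contexts_for` helpers are hypothetical; a real search engine would work from an index rather than scan the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical document in which "Einstein" occurs in three different contexts.
xml_text = """<people>
  <person><Author>Albert Einstein</Author><Street>Main Street</Street></person>
  <person><Author>Jane Doe</Author><Street>Einstein Street</Street></person>
  <person><Author>John Roe</Author><School>Einstein School</School></person>
</people>"""
root = ET.fromstring(xml_text)


def find_in_element(root, term, tag):
    """Approach 1: the user names the element, e.g. Einstein in Author."""
    return [el.text for el in root.iter(tag) if el.text and term in el.text]


def contexts_for(root, term):
    """Approach 2: list the elements in which the term occurs."""
    return sorted({el.tag for el in root.iter() if el.text and term in el.text})


print(find_in_element(root, "Einstein", "Author"))
print(contexts_for(root, "Einstein"))
```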

Using links
- good site: many links into the site, particularly from other good sites
- the text surrounding a link (probably) describes what the target of the link is about
- the knowledge above + the contents of the page itself are taken into account
- e.g. Google (
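
The two link signals mentioned above, the number of inbound links and the text used in them, can be gathered as in the sketch below. The link data and the `link_profile` helper are hypothetical, and this is a deliberately simplified illustration, not the ranking method of Google or any other search engine.

```python
from collections import defaultdict

# Hypothetical link data: (source page, target page, link text).
links = [
    ("siteA/home", "shop/books", "cheap science books"),
    ("siteB/list", "shop/books", "book shop"),
    ("siteA/home", "shop/music", "music"),
]


def link_profile(links):
    """Count inbound links per target and collect the words used in link texts."""
    inbound = defaultdict(int)
    link_words = defaultdict(set)
    for _source, target, text in links:
        inbound[target] += 1
        link_words[target].update(text.lower().split())
    return inbound, link_words


inbound, link_words = link_profile(links)
print(inbound["shop/books"], sorted(link_words["shop/books"]))
```

The link words describe the target page in other people's vocabulary, which can help against the vocabulary mismatch problem.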

Natural language queries
- e.g. Ask Jeeves
- questions and answers prepared by human editors
- the user’s query is mapped to the prepared questions

Personalization
- goal: the right people receive the right information at the right time
- but: people do not like to state complex queries or to initialize a service (like answering a questionnaire)
- user profiles have to be generated and stored, preferably automatically

User profiles
- may contain data like: interests, geographical area, age
- could be collected once and shared with many services
- trust of the user: the profile should only be used to offer better service, and only if the user allows a service to use it

Recommendations
- users who bought this book also bought these books / liked these CDs etc.
- rating movies, TV programs, wines…
- recommending paths on a site

Recommendations
- based on the user’s former behavior and profile data
- based on social (collaborative) filtering: what similar users liked

User’s former behavior
- if used as the only source: the user never sees anything new
- in particular, a new user hardly gets any recommendations

Collaborative filtering
- draws on the experiences of a population or community of users
- the profile information of the target user is compared to the profiles of nearest-neighbor users
- look for correlation between users in terms of their ratings: recommend items that are included in the neighbors’ profiles but not in the target user’s profile
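
A minimal sketch of this nearest-neighbor idea, assuming Pearson correlation over co-rated items as the similarity measure (the slides do not name a specific measure). The ratings, user names, and the `pearson` and `recommend` helpers are hypothetical.

```python
from math import sqrt

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "alice": {"book_a": 5, "book_b": 3, "book_c": 4},
    "bob": {"book_a": 5, "book_b": 2, "book_c": 4, "book_d": 5},
    "carol": {"book_a": 1, "book_b": 5, "book_c": 2, "book_e": 4},
}


def pearson(u, v):
    """Pearson correlation of two users over the items both have rated."""
    common = sorted(set(u) & set(v))
    if len(common) < 2:
        return 0.0
    mean_u = sum(u[i] for i in common) / len(common)
    mean_v = sum(v[i] for i in common) / len(common)
    num = sum((u[i] - mean_u) * (v[i] - mean_v) for i in common)
    den = sqrt(sum((u[i] - mean_u) ** 2 for i in common)) * sqrt(
        sum((v[i] - mean_v) ** 2 for i in common)
    )
    return num / den if den else 0.0


def recommend(target, ratings, k=1):
    """Recommend items rated by the k most similar users but not by the target."""
    scored = [
        (pearson(ratings[target], profile), name)
        for name, profile in ratings.items()
        if name != target
    ]
    # Keep only positively correlated neighbors among the top k.
    neighbors = [name for score, name in sorted(scored, reverse=True)[:k] if score > 0]
    seen = set(ratings[target])
    items = set()
    for name in neighbors:
        items |= set(ratings[name]) - seen
    return sorted(items)


print(recommend("alice", ratings))
```

Here bob rates the shared books much like alice does while carol rates them in the opposite way, so only bob's unseen item is recommended.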

Collaborative filtering
- Problems:
  - cannot recommend new items (some users have to rate an item before it can be recommended)
  - an unusual user may not get (good) recommendations: no neighbors that are close enough

Matching engines
- apply one set of complex characteristics to another
- e.g., recruiting sites: match a job seeker and a job

Data mining for e-commerce
- users’ behavior on the web site provides a lot of information:
  - Which pages do the users view?
  - Which paths do the users navigate?
  - How long do the users spend on the site?
  - How often does viewing a product lead to purchasing it?

Data mining process
- Gathering the data
- Cleaning/preprocessing the data
- Transforming the data
- Analysis / finding general models
- Interpreting the results
- Using the knowledge

Data collection
- clickstream logging: web server logs or packet sniffers
- business event logging

Clickstream logging
- web log: page requested, time of request, client IP address, etc.
- a lot of requests are for images -> these have to be filtered out
- users and user sessions are difficult to identify
- requests for a page: the same page, but different dynamic content
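
Filtering image requests out of a web server log can be sketched as follows, assuming the log is in the widely used Common Log Format. The log lines, the list of image suffixes, and the `page_requests` helper are assumptions made up for the example.

```python
import re

# Pattern for one Common Log Format line (assumed log format).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")  # assumed list


def page_requests(lines):
    """Yield (client, time, path) for requests that are not for images."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not follow the format
        path = match.group("path")
        if path.split("?")[0].lower().endswith(IMAGE_SUFFIXES):
            continue  # image request: not a page view
        yield match.group("host"), match.group("time"), path


# Hypothetical log lines.
log = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 +0000] "GET /books.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:13:55:37 +0000] "GET /img/logo.gif HTTP/1.0" 200 512',
]
requests = list(page_requests(log))
print(requests)
```

Note that the log identifies only the client address, which is one reason why users and sessions are hard to tell apart at this level.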

Clickstream logging
- more efficient at the application server layer
- instead of just pages, knowledge about products
- user and session tracking possible
- can also track information absent from web server logs: pages that were aborted while being downloaded

Business event logging
- looking at subsets of requests as one logical event or episode:
  - add/remove item to/from shopping cart
  - initiate/finish checkout
  - search (log keywords and number of results)
  - register

From order data to customers
- collected data is order-oriented
- data for each customer is spread over many records
- information on customers is the real target
- information for each customer has to be aggregated

From order data to customers
- What percentage of each customer’s orders used a VISA credit card?
- How much money does each customer spend on books?
- What is the frequency of each customer’s purchases?
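
Aggregating order-oriented records into one row per customer can be sketched as below. The order records, field names, and the `customer_summary` helper are hypothetical; purchase frequency is approximated by the order count, since the records in this example carry no dates.

```python
from collections import defaultdict

# Hypothetical order records: one record per order, many per customer.
orders = [
    {"customer": "c1", "card": "VISA", "category": "books", "amount": 20.0},
    {"customer": "c1", "card": "Mastercard", "category": "music", "amount": 15.0},
    {"customer": "c1", "card": "VISA", "category": "books", "amount": 10.0},
    {"customer": "c1", "card": "VISA", "category": "music", "amount": 5.0},
    {"customer": "c2", "card": "Mastercard", "category": "books", "amount": 12.5},
]


def customer_summary(orders):
    """Turn order-oriented records into one aggregated record per customer."""
    grouped = defaultdict(list)
    for order in orders:
        grouped[order["customer"]].append(order)
    summary = {}
    for customer, rows in grouped.items():
        summary[customer] = {
            "orders": len(rows),
            "visa_share": sum(r["card"] == "VISA" for r in rows) / len(rows),
            "book_spend": sum(r["amount"] for r in rows if r["category"] == "books"),
        }
    return summary


summary = customer_summary(orders)
print(summary["c1"])
```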

Model generation
- Answer questions like:
  - What characterizes heavy spenders?
  - What characterizes customers that prefer promotion X over Y?
  - What characterizes customers that buy quickly?
  - What characterizes visitors that do not buy?

Data mining tools
- e.g., classification rules
  IF Income > $80,000
  AND Age <= 30
  AND Average Session Duration is between 10 and 20 minutes
  THEN Heavy spender
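
Once discovered, such a rule is easy to apply to new visitors. The sketch below encodes the rule above as a predicate; the `is_heavy_spender` name is hypothetical, and treating "between 10 and 20 minutes" as inclusive at both ends is an assumption.

```python
def is_heavy_spender(income, age, avg_session_minutes):
    """Apply the example classification rule to one customer."""
    return (
        income > 80_000
        and age <= 30
        # Assumed: both ends of the 10-20 minute range are included.
        and 10 <= avg_session_minutes <= 20
    )


print(is_heavy_spender(income=95_000, age=28, avg_session_minutes=15))
print(is_heavy_spender(income=95_000, age=45, avg_session_minutes=15))
```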

Understanding the results
- the result of a data mining process may be difficult for a business user to understand: e.g. thousands of rules
- visualization is important
- visualization should be tailored to the specific domain

Using the results
- site structure can be updated
- procedures like registering or checking out can be simplified
- metadata can be added to make search more efficient
- personalization rules, recommender systems