Understanding the Content Index. Review: The Search Engine.

Understanding the Content Index

Review: The Search Engine

Review: Content Indices
Content indices store pages in compressed form, as inverted files. A basic content index has terms, identification numbers (IDs), and occurrences.

1 (review)   1
2 (content)  2, 4, 15
3 (indices)  3, 5, 16
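A minimal sketch of how such an inverted file might be built. It assumes that the "occurrences" are 1-based word positions in the text, which is one reading of the example that happens to reproduce its numbers when applied to the slide's own opening words. The `NORMALIZE` table and the `build_inverted_index` function are hypothetical names introduced only for illustration.

```python
import re
from collections import defaultdict

# Hypothetical normalization table (an assumption): treat "index" as a
# form of the term "indices" so both land in the same posting list.
NORMALIZE = {"index": "indices"}


def build_inverted_index(text):
    """Map each normalized term to the 1-based word positions where it occurs."""
    index = defaultdict(list)
    words = re.findall(r"[a-z]+", text.lower())
    for position, word in enumerate(words, start=1):
        index[NORMALIZE.get(word, word)].append(position)
    return dict(index)


slide_text = (
    "Review: Content Indices Content indices store pages in compressed form; "
    "inverted files. A basic content index has terms"
)
index = build_inverted_index(slide_text)

for term in ("review", "content", "indices"):
    print(term, index[term])
```

A real engine would also assign each term a numeric ID and store the posting lists in compressed form; the dictionary here only illustrates the term-to-occurrences mapping.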

Review: Content Indices
Content indices are the raw material needed to conduct a relevant search. A valuable search ranks the relevant material, ensuring you get the best of the available information. To get the best possible search results, you have to combine many page measurements.

Types of indexes
A single page can be processed to derive a number of indices:
Content indices (Semantics)
  – Text, data, metadata
Structural indices (Structures)
  – Tags, links, parent-child relationships
File indices (Supports)
  – Availability and relationship to NMP files (pdf, mov, avi, etc.)

Re-focus on the content index (CI)
A simple inverted file allows you to claim that you have “indexed the web.” In 1998, Alta Vista had the largest web index (repository of page indices). Google then surpassed AV. In mid-2001, AllTheWeb surpassed Google for about 3 months. Google retook the lead in late 2001 and has been the leader in “pages indexed” ever since.
So what? Number of pages indexed is a measure, not a business. You need the ability to process those indices in a meaningful way to create value.

Refine the CI
The simple inverted file index is insufficient; it doesn’t capture all the available information in a web page, web site, and internet domain. Consider the example below:

1 (review)   1
2 (content)  2, 4, 15
3 (indices)  3, 5, 16

What about the relationship of this page index to the other pages in the site? What about the structure of the page? Is there information that can refine our understanding of the quality of the content?

The CI in detail, continued
We can refine the quality of the page CI by replacing simple occurrences with vector space measurements. What makes sense? How should we structure the vector measure? First, look at page structure.

The CI in detail, continued
Understanding the Content Index Scores

Replace Occurrence with a Vector
The vector is developed using a heuristic, that is, a set of rules. Different people might use different heuristics to obtain a valuable measure, and those differences create value differentiation. One heuristic might be to create a vector that measures occurrences in the TITLE, META DESCRIPTION, and BODY tags.
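The TITLE / META DESCRIPTION / BODY heuristic can be sketched as follows, using Python's standard `html.parser` module. The `ZoneCounter` class, the `occurrence_vector` function, and the sample page are all hypothetical illustrations; the vector order [title, meta description, body] is an assumption, and a production indexer would need sturdier parsing and term normalization.

```python
import re
from html.parser import HTMLParser


class ZoneCounter(HTMLParser):
    """Collect text separately for the title, meta description, and body."""

    def __init__(self):
        super().__init__()
        self.zone = None  # zone currently being read: "title", "body", or None
        self.text = {"title": [], "meta": [], "body": []}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.text["meta"].append(attrs.get("content") or "")
        elif tag in ("title", "body"):
            self.zone = tag

    def handle_endtag(self, tag):
        if tag == self.zone:
            self.zone = None

    def handle_data(self, data):
        if self.zone:
            self.text[self.zone].append(data)


def occurrence_vector(html, term):
    """Return [title, meta description, body] occurrence counts for a term."""
    parser = ZoneCounter()
    parser.feed(html)
    parser.close()
    vector = []
    for zone in ("title", "meta", "body"):
        words = re.findall(r"[a-z]+", " ".join(parser.text[zone]).lower())
        vector.append(words.count(term.lower()))
    return vector


# Hypothetical sample page, invented for this sketch.
page = """<html><head><title>Content indices</title>
<meta name="description" content="A review of content indices">
</head><body><p>Content indices store pages. A content index has terms.</p></body></html>"""

print(occurrence_vector(page, "content"))
```

Adding a dimension to the vector (headings or link text, for example) would only require another zone in `ZoneCounter`, which is where different heuristics start to diverge.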

Replace Occurrence with a Vector (2)
CMS matrix

1 (review)   1
2 (content)  2, 4, 15
3 (indices)  3, 5, 16

Becomes

1 (review)   2 [1,1,0]
2 (content)  1 [1,1,0], 3 [1,1,4], 4 [1,1,2], 5 [0,1,1]
3 (indices)  1 [1,1,0], 3 [1,1,3], 4 [1,1,1], 5 [1,1,4]

What Does a Vector Do?
You can now calculate a “content score” that separates out the pages that satisfy the search parameters from the rest of the available information, and that inherently ranks those pages in relation to each other. Examining the previous index:

1 (review)   2 [1,1,0]
2 (content)  1 [1,1,0], 3 [1,1,4], 4 [1,1,2], 5 [0,1,1]
3 (indices)  1 [1,1,0], 3 [1,1,3], 4 [1,1,1], 5 [1,1,4]

Suppose you search for “content indices”:

Calculate the score for “content indices”:

2 (content)  1 [1,1,0], 3 [1,1,4], 4 [1,1,2], 5 [0,1,1]
3 (indices)  1 [1,1,0], 3 [1,1,3], 4 [1,1,1], 5 [1,1,4]

A basic content score heuristic sums each term's vector for a page and multiplies that sum by the summed vector of the partner term on the same page.

Page 1 Content Score for S = (1+1+0) × (1+1+0) = 4
Page 3 Content Score for S = (1+1+4) × (1+1+3) = 30
Page 4 Content Score for S = (1+1+2) × (1+1+1) = 12
Page 5 Content Score for S = (0+1+1) × (1+1+4) = 12
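The calculation above can be reproduced with a short sketch. The dictionary layout and the `content_scores` function are hypothetical, and ranking tied pages by page number is an assumption; the slides do not say how ties are broken.

```python
# Vector index from the slides: term -> {page number: occurrence vector}.
index = {
    "review": {2: [1, 1, 0]},
    "content": {1: [1, 1, 0], 3: [1, 1, 4], 4: [1, 1, 2], 5: [0, 1, 1]},
    "indices": {1: [1, 1, 0], 3: [1, 1, 3], 4: [1, 1, 1], 5: [1, 1, 4]},
}


def content_scores(index, query_terms):
    """Score each page that contains every query term.

    The score is the product, over the query terms, of the sum of that
    term's vector on the page (the basic heuristic described above).
    """
    postings = [index.get(term, {}) for term in query_terms]
    if not postings:
        return {}
    # Only pages listed under every query term satisfy the search.
    pages = set(postings[0])
    for posting in postings[1:]:
        pages &= set(posting)
    scores = {}
    for page in sorted(pages):
        score = 1
        for posting in postings:
            score *= sum(posting[page])
        scores[page] = score
    return scores


scores = content_scores(index, ["content", "indices"])
# Rank by descending score; ties fall back to page number (an assumption).
ranking = sorted(scores, key=lambda page: (-scores[page], page))

print(scores)   # {1: 4, 3: 30, 4: 12, 5: 12}
print(ranking)  # [3, 4, 5, 1]
```

Page 2 drops out because it appears only under "review", which is how the score both filters the pages and ranks the ones that remain.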

Caveats & Observations
– Different search terms will result in different content scores. Therefore, indices must find a balance between accuracy and economy (i.e., speed).
– The CS vector is an evolving construct. The color example illustrates this lesson.
– The relevant calculation heuristics are also evolving. This is the basis of the “arms race” between search engines (Google) and marketers (spammers).
– The dimensional limit to the vector is a fertile area for research, especially in the field of SEO.
– Content Scores are factored with other index measures to yield the final SERP for a given search term or string. In other words, this is still only a piece of the puzzle of Excellent Content Management.

Discussion: What are the appropriate vector dimensions, and what are the ramifications for content? Title, meta, body... What else? What direct measures, what derivatives?
Research Question: How should pictures be factored into a page’s content score?