Searching or the challenge of the Google Myth

1 Searching or the challenge of the Google Myth
International Output Database and Marketing Conference, The Hague, 5-9 September 2005. Jesper Ellemose Jensen, Statistics Denmark

2 The Google Challenge
"I cannot Google it, therefore the information does not exist." From finding web sites to locating specific documents. Are our own search engines still valid, then? Google is to a large extent the perfect brand. A large number of our users believe that if they cannot Google a piece of information, then the information does not exist. A senior manager at one point asked: why don't we just use Google on our site? The answer is that we have no control over the way Google works. But it is a very relevant question. In the early days of the Internet, the job of search engines and Internet indexes like Yahoo was to point potential users to our web sites. Today the major search engines are used not for finding the front page of our web sites but for locating information deep inside our information structure. The problem is that we need a lot of control if we also want to decide what the proper search result is. Just because a search phrase appears deep inside a PDF document, there is no guarantee that the document is meaningful to any significant number of users. 11/14/2018

3 User appreciation of the search engine on
Question: How do you rank the search engine on our web site? Searching is a very significant part of all web sites, so in our yearly user satisfaction surveys we have included questions about how users rank or judge the search engine on our web site. More than 30 percent of all users think that the search engine is Good or Very good. This is also an example of how you can efficiently lie with or manipulate statistics, as the evaluation of the search engine changes enormously with your understanding of the word "Average". Maybe we should treat it ... Note that we unfortunately still work with both an aggregated output database and the web site proper. This double-entry or two-string dissemination strategy generates a lot of confusion for our users.

4 User appreciation of the search engine on
Question: How do you judge the result when you search by entering data into the search field? All data in percentages. Here we have improved, especially among registered users. If you are a frequent user of the Statbank (a farmer or miner), you are more or less forced to register, as registration gives you access to large retrievals.

5 Continuous improvements needed
DST.DK is judged equally in all 3 years. STATBANK.DK has improved for registered users, but not for non-registered users (tourists). We are confident that we have improved, but user expectations have also grown => a never-ending process => number of zero searches. As can be seen from the various user satisfaction / user appreciation surveys, we have not been extremely successful in meeting users' demands and expectations. DST.DK is more or less at status quo, and for the Statbank system we have improved the search engine for registered users (farmers / miners) but not for the non-registered users, who are probably the users who need the search engine the most. So we have improved, but unfortunately users' expectations of us have grown in the same period, leaving us close to a status quo. We must therefore continuously strive to improve our search facilities. We have tried to understand the problems our users have when they use our search engine, and to figure out why they are not satisfied with it.

6 Terms / phrases / words used in texts
In your structured and unstructured texts you have series of words, terms and phrases that are more or less common to the way statistical offices express themselves. These terms come from the international nomenclatures or from our institutional cultures. The institutional cultures may be more or less formal, ranging from socialization like "this is the way we usually phrase it" to explicit communication strategies and policies defining the proper institutional writing. The problem encountered by every search engine is that it must help the user, working from his or her own vocabulary, to get a meaningful result from texts written in a sometimes totally different vocabulary.

7 Reasons for dissatisfaction
White wine but not red wine. Nomenclatures confuse the searching. The housing figure. Everyday language. Something about: furniture, turnover, employees, prices, consumption, etc. The strict use of nomenclatures can very often confuse users. Everyday language. Something about.

8 Reasons for dissatisfaction
It's the: web site structure; search engine. It's never because of: the user's vocabulary; what we, the statistical office, don't know; the formulation of the problem. => Free text search will never work if users do not understand the way we describe things. A user encountering a zero result will conclude that something is rotten in the State of Statistics Denmark, or more specifically in the information structure used by the site or in the search engine we use. It is never because we don't have the information or, even worse, because we are not supposed to have the information. It is also certainly never because the user uses a word or phrase not used by the statistical office.

9 Reasons for dissatisfaction
Efficient searches assume that: the web site has the information; the user has a clear concept of the necessary data; the user knows how the problem they are working on can be described or solved using statistics. => The challenge may be pedagogical and not technical. Even though we are more or less the sole source of statistical information, users very often search for information they think we have or should have. And sometimes this is information that we ... From our search logs and from the telephone support we provide, it is increasingly clear that a number of users, mainly students but also professionals, have no real concept of how their problem can be described using statistics. This leads to the conclusion that the challenge we are facing is more pedagogical than technical in nature.

10 Number of zero searches
In optimizing our search engine on the Statbank we have closely monitored the number of searches that return zero results. Each month we look at the list of words that have not given any results. The thinking behind this is that these words must have meaning or significance to our users. So maybe we can improve the search engine by adding these words to the list of synonyms or to the list of known spelling errors.
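The monthly review described above can be sketched in a few lines. This is only an illustration under stated assumptions: the log format (pairs of search term and hit count) and the sample entries are hypothetical, not the format actually used by the Statbank.

```python
from collections import Counter


def zero_result_terms(log_entries):
    """Count the search terms that returned no results.

    log_entries: iterable of (term, hit_count) pairs.
    This log format is an assumption made for the example.
    """
    counts = Counter()
    for term, hits in log_entries:
        if hits == 0:
            # Fold case and surrounding whitespace so variants count together.
            counts[term.strip().lower()] += 1
    # Most frequent zero-result terms first.
    return counts.most_common()


# Hypothetical sample log for one month.
sample_log = [
    ("index", 0),
    ("Index", 0),
    ("indeks", 12),
    ("constructionindex", 0),
    ("prognosis", 0),
    ("population", 40),
]

report = zero_result_terms(sample_log)
for term, count in report:
    print(f"{count:3d}  {term}")
```

The terms at the top of such a list are the candidates for the synonym list or the list of known spelling errors.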

11 Efforts to reduce the number of zero searches
Cross-platform searching. Spell checking: "Index" is spelled "Indeks" in Danish, but a lot of users spell it in English. Removal of the endings -en, -et and -ne -> import(s), car(s), the car. Synonyms: Prognosis -> Projection; Index figure -> Index; Islam -> religious community. We have done 3 major things to reduce the number of zero searches. There is no formal spell checking, but we can see that a relatively small number of words are commonly misspelled by our users. "Index" is the most common, as users tend to spell it in English. Users also tend to search for things like "constructionindex" in one word, whereas the only relevant search would be something like "Construction cost index for residential buildings".
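A minimal sketch of how the spelling list, the ending removal and the synonym list might be combined into one query normalization step. The lookup tables contain only the examples from the slide; the order of the steps, the minimum stem length and all function names are assumptions made for illustration, not a description of the Statbank implementation.

```python
# Known misspellings -> the spelling used in the index (example from the slide).
SPELLING = {"index": "indeks"}

# Whole-query synonyms (examples from the slide).
SYNONYMS = {
    "prognosis": "projection",
    "index figure": "index",
    "islam": "religious community",
}

# Danish definite and plural endings mentioned on the slide.
ENDINGS = ("ne", "en", "et")
MIN_STEM = 3  # assumed lower bound so that very short words are left alone


def strip_ending(word):
    """Remove at most one ending. Naive: it will over-strip some words."""
    for ending in ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= MIN_STEM:
            return word[: -len(ending)]
    return word


def normalize(query):
    """Lowercase, map synonyms, fix known spellings, then strip endings."""
    q = " ".join(query.lower().split())
    q = SYNONYMS.get(q, q)
    words = [SPELLING.get(w, w) for w in q.split()]
    return " ".join(strip_ending(w) for w in words)


print(normalize("Index"))         # indeks
print(normalize("bilen"))         # bil
print(normalize("Index figure"))  # indeks
```

For the ending removal to help, the same stripping would have to be applied to the indexed terms as well as to the query, so that both sides meet on the same stem.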

12 Structured search or alphabetic lists
Illustration labels: web site, structured search, Yearbook, news release, publications, search box and index, tables in Statbank, declarations of content. A simple way of overcoming the problem of different terms and phrases is to use structured navigation indexes. The search is made entirely inside structured metadata. The user can access an alphabetic list of subjects. In this case it is also possible to follow a classification used by all Danish libraries. In this example we have total control over which data are relevant when searching for the word "internet".
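The idea of searching only inside curated metadata can be illustrated with a small sketch. The subjects and resource names below are placeholders invented for the example, and the data structure is an assumption; the point is only that an editor, not a ranking algorithm, decides what a subject returns.

```python
from collections import defaultdict

# Hypothetical curated metadata: each subject maps to hand-picked resources.
SUBJECT_INDEX = {
    "internet": ["Statbank table (placeholder A)", "News release (placeholder B)"],
    "income": ["Yearbook chapter (placeholder C)"],
    "housing": ["Statbank table (placeholder D)"],
}


def build_az_index(subject_index):
    """Group the subjects by initial letter for an A-Z list."""
    az = defaultdict(list)
    for subject in sorted(subject_index):
        az[subject[0].upper()].append(subject)
    return dict(az)


def lookup(subject_index, term):
    """Return only the resources an editor has attached to the subject."""
    return subject_index.get(term.strip().lower(), [])


az = build_az_index(SUBJECT_INDEX)
hits = lookup(SUBJECT_INDEX, "Internet")
print(az)
print(hits)
```

A term that is not in the curated list returns nothing, which is the price paid for full control over relevance.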

13 Structured search or alphabetic lists
Works well, but users can't see it. The Norwegian strategy is much better. Illustration labels: free text search, A-Z search, structured search.

14 Total traffic
The illustration shows the total number of page requests sent to our web site on a monthly basis.

15 Crawler traffic
As Google opens a new session for each link it encounters on your web page, it is not possible for us to use the session-based figures we normally prefer for Internet usage statistics. Our web site is in fact a perfect example of seasonal variation, as students, journalists and business intelligence people all take time off in June, July and August and then again in December. Unfortunately, crawler traffic does not follow these seasonal fluctuations, so in the summertime a large part of the traffic is made up of crawlers and bots.

16 Google and Microsoft visit us a lot
As can be seen from the illustration, the Google bot and the MSN.COM bot each generate about 5 to 8 percent of the total number of page requests forwarded to our web site. It is not possible to generate comparable figures for the Statbank, as our statistics on the Statbank are based on the number of displayed tables and not on page impressions.
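A share of this kind can be computed from page requests rather than sessions, for example by matching the user-agent string of each request. The sketch below assumes a hypothetical log of (month, user agent) pairs and made-up sample data; the matching on the substrings "googlebot" and "msnbot" is a simplification.

```python
from collections import Counter

# Label -> lowercase substring looked for in the user-agent string.
BOT_MARKERS = {"Googlebot": "googlebot", "MSN bot": "msnbot"}


def crawler_share(requests):
    """Return {month: {bot label: percent of page requests}}.

    requests: iterable of (month, user_agent) pairs, an assumed log format.
    """
    totals = Counter()
    bots = Counter()
    for month, user_agent in requests:
        totals[month] += 1
        ua = user_agent.lower()
        for label, marker in BOT_MARKERS.items():
            if marker in ua:
                bots[(month, label)] += 1
                break
    return {
        month: {label: 100.0 * bots[(month, label)] / total for label in BOT_MARKERS}
        for month, total in totals.items()
    }


# Made-up sample: 20 page requests in one month, one from each bot.
sample = [("2005-07", "Mozilla/4.0 (compatible; MSIE 6.0)")] * 18
sample.append(("2005-07", "Mozilla/5.0 (compatible; Googlebot/2.1)"))
sample.append(("2005-07", "msnbot/1.0"))

shares = crawler_share(sample)
print(shares)
```

Because each request is counted once regardless of session, the figure is not inflated by a crawler that opens a new session for every link.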

17 Can we improve our Google ratings?
"Reserve the same scepticism for unsolicited email about search engines as you do for "burn fat at night" diet pills or requests to help transfer funds from deposed dictators." source:
Comply with WAI standards. <Title></Title>. <Meta tags>. Dublin Core. Visible links – proper names. Use the terms and concepts of your users in your Internet texts. If Google and its results are paramount to a large part of our users, the next logical question is whether we can do anything to make sure that our users actually get the proper results when using Google. The answer is yes and no. We can of course buy the proper AdWords, but for some inexplicable reason I have great ideological problems with this solution. All links should be available to the bot. This means that you should still avoid navigation based on images and Flash. If the search term is in the title tag, you will score very high in Google. Meta tags are most likely ignored or play an insignificant role. There could still be problems with Flash and Java elements, which cannot be seen by the bot. Today you should expect the bot and the search index behind it to have so much intelligence that all types of "creative" behaviour will be counterproductive in the long run. At the moment the best strategy is most likely to be as compliant with the W3.ORG Web Accessibility Initiative guidelines as you possibly can.
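Some of the points above (a meaningful title tag, Dublin Core meta tags, links with visible text, no image-only navigation) can be checked mechanically. The following is a rough sketch using Python's standard html.parser; the class name, the checks chosen and the sample page are assumptions for illustration and are far from a full accessibility audit.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collect a few crawler-relevant facts about one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.dublin_core = []        # names of meta tags starting with "DC."
        self.images_without_alt = 0
        self.links_without_text = 0
        self._in_title = False
        self._in_link = False
        self._link_text = ""

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attrs.get("name") or ""
            if name.lower().startswith("dc."):
                self.dublin_core.append(name)
        elif tag == "img":
            alt = (attrs.get("alt") or "").strip()
            if not alt:
                self.images_without_alt += 1
            elif self._in_link:
                # Alt text counts as visible text for an image link.
                self._link_text += alt
        elif tag == "a":
            self._in_link = True
            self._link_text = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._in_link:
            if not self._link_text.strip():
                self.links_without_text += 1
            self._in_link = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        if self._in_link:
            self._link_text += data


# Made-up sample page.
SAMPLE = """<html><head>
<title>Construction cost index for residential buildings</title>
<meta name="DC.title" content="Construction cost index">
</head><body>
<a href="/a"><img src="nav.gif"></a>
<a href="/b">Construction cost index</a>
</body></html>"""

audit = PageAudit()
audit.feed(SAMPLE)
print(audit.title, audit.dublin_core, audit.images_without_alt, audit.links_without_text)
```

In the sample, the first link consists only of an image without alt text, so it is reported both as an image without alt text and as a link without visible text.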

18 The Google Myths
Can find everything. Gives the best results. Best user interface. Users expect our search engines to work not like Google, but the way they think Google works. And that is the challenge.
