Published by Abraham Harvey. Modified over 9 years ago.
1
WHAT AND HOW CHILDREN SEARCH ON THE WEB Sergio Duarte Torres, Ingmar Weber
6
Motivation
8
Goals of this work
- Identify and quantify the search struggle of young users
- Retrace the stages of child development through their web searches
9
What data was used?
US Yahoo! search logs from May to August of 2010
Cleaning steps:
- User-wise: logs from users without Yahoo! accounts were removed
- Query-wise: queries issued by a single user were removed
- Query-wise: queries with personally identifiable information were removed
- Query-wise: non-alphanumeric single-token queries were removed
Why the cleaning? What could be the advantages/disadvantages?
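The cleaning steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the `(user_id, query)` record layout, the `users_with_account` set, and the reading of "non-alphanumeric single-token query" as "a one-token query containing no letter or digit" are all hypothetical. The removal of personally identifiable information is not shown, since the slide does not say how it was done.

```python
import re
from collections import defaultdict

def clean_log(records, users_with_account):
    """records: iterable of (user_id, query) pairs (hypothetical layout)."""
    # User-wise step: keep only users that have an account.
    kept = [(u, q) for u, q in records if u in users_with_account]

    # Query-wise step: find out how many distinct users issued each query.
    users_per_query = defaultdict(set)
    for u, q in kept:
        users_per_query[q].add(u)

    def is_single_non_alnum_token(q):
        # Assumption: "non-alphanumeric" means no ASCII letter or digit at all.
        tokens = q.split()
        return len(tokens) == 1 and re.search(r"[A-Za-z0-9]", tokens[0]) is None

    return [
        (u, q)
        for u, q in kept
        if len(users_per_query[q]) > 1          # drop queries from a single user
        and not is_single_non_alnum_token(q)    # drop e.g. "???"
    ]
```

Dropping queries seen from only one user trades coverage for privacy: rare (and often personal) queries disappear, but so does part of the long tail.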
10
An aside about the data
- Users under 13 years old required the consent of a responsible adult to register at Yahoo! (costs $0.50)
- Some people may lie about their age, but general trends are expected to be robust to noise
- People who lie about their age usually tend to make themselves appear older
- Where do you think millions of children lie about their age? http://www.uic.edu/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/3850/3075
11
Data segmentation
Users grouped based on their reported birth year
Age estimated as: 2010 - birth year
The following age buckets were created:
- 6-7: early elementary
- 8-9: readers
- 10-12: advanced readers
- 13-15: teenagers
- 16-18: mature teenagers
- >18: grown ups
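The segmentation rule is simple enough to write down directly. A minimal sketch, assuming the bucket boundaries listed on the slide; the function name `age_bucket` and the choice to return `None` for ages below 6 (which the slide does not cover) are assumptions made here.

```python
LOG_YEAR = 2010  # year of the search logs, as stated on the slide

# (lowest age, highest age, label) for each bucket on the slide.
BUCKETS = [
    (6, 7, "early elementary"),
    (8, 9, "readers"),
    (10, 12, "advanced readers"),
    (13, 15, "teenagers"),
    (16, 18, "mature teenagers"),
]

def age_bucket(birth_year, log_year=LOG_YEAR):
    """Map a reported birth year to one of the slide's age buckets."""
    age = log_year - birth_year
    for low, high, label in BUCKETS:
        if low <= age <= high:
            return label
    if age > 18:
        return "grown ups"
    return None  # assumption: ages under 6 are left unassigned
```

Because only the birth year is known, the estimated age can be off by up to one year for any given user.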
12
Data characteristics: data set size
                    Below 10 years old    Above 10 years old
Volume of queries   >100K                 >1M
Number of users     >10K                  >100K
13
Methodology: Micro- vs. Macro-Averages
- User A: 100x cooking, 10x science
- User B: 1x cooking, 5x science
- User C: 2x cooking, 10x science
- Micro avg.: cooking = (100+1+2)/(100+10+1+5+2+10) = 0.80
- Macro avg.: cooking = (100/110 + 1/6 + 2/12) / 3 = 0.41
People search mostly for cooking. True? False?
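The two averages from the slide can be reproduced with a few lines of code. This is a small worked sketch using the slide's three users; the function and variable names are chosen here for illustration.

```python
def micro_average(users, topic):
    """Pool all queries: every query counts equally."""
    topic_total = sum(counts.get(topic, 0) for counts in users.values())
    grand_total = sum(sum(counts.values()) for counts in users.values())
    return topic_total / grand_total

def macro_average(users, topic):
    """Average per-user shares: every user counts equally."""
    shares = [counts.get(topic, 0) / sum(counts.values())
              for counts in users.values()]
    return sum(shares) / len(shares)

# Query counts per user, taken from the slide.
users = {
    "A": {"cooking": 100, "science": 10},
    "B": {"cooking": 1, "science": 5},
    "C": {"cooking": 2, "science": 10},
}

micro = micro_average(users, "cooking")  # 103 / 128, about 0.80
macro = macro_average(users, "cooking")  # about 0.41
```

The micro average is dominated by the heavy user A, whereas the macro average shows that two of the three users search mostly for science.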
14
Methodology: Detecting Navigational Queries
Examples: facebook, yahoo mail, google, ... How would you do it?
- Editorial judgments: ask human judges to mark queries as navigational. Drawbacks?
- Click entropy: look at the diversity of the results clicked in response. Drawbacks?
- String similarity heuristics: try to find the query as a substring in the clicked domain. Drawbacks?
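Two of the three options, click entropy and the substring heuristic, lend themselves to a short sketch. The slides do not say which variant or thresholds were used, so the following is an illustration under assumptions: entropy is computed in bits over the clicked URLs of one query, and the substring test strips non-alphanumeric characters from the query before looking for it in the clicked host name.

```python
import math
from collections import Counter
from urllib.parse import urlparse

def click_entropy(clicked_urls):
    """Shannon entropy (bits) of the clicks observed for one query.
    Low entropy: almost everyone clicks the same result."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def query_in_domain(query, clicked_url):
    """Heuristic: does the query, with spaces and punctuation removed,
    appear as a substring of the clicked host name?"""
    host = urlparse(clicked_url).netloc.lower()
    compact = "".join(ch for ch in query.lower() if ch.isalnum())
    return bool(compact) and compact in host
```

The sketch also hints at one drawback of the substring heuristic: `query_in_domain("yahoo mail", "https://mail.yahoo.com/")` is false because the word order differs, even though the query is clearly navigational.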
15
Search Difficulty Outline
1. Query length
2. Natural language usage
3. Click position bias
4. Other signs of click position bias
5. Children exposed to adult content
6. Time spent on web results
7. Sessions characteristics
16
Query length
- Query length increases through the age groups
- Slightly bigger gap for non-navigational queries
- Greater ambiguity in children's queries
17
Natural language usage (I)
- Questions instead of queries: what is the only immortal animal?
- Modal queries: I don't want to go to school
- Factual queries: describe the parts of a cell
- Superlative queries: the fastest dog
- Targeted queries for kids: car photos for kids
18
Natural language usage (II)
- Greater NL usage at younger ages
- Teenagers' behavior is closer to children's than to adults' behavior
19
Click position bias Other explanations?
20
Clicks on ads Children aged 6-9 more likely to click on ads! Evidence of disorientation during the search process
21
How to evaluate search success using click data? How would you do it?
22
Time spent on web results
Click duration as a signal of search success. Hassan et al. (2010), WSDM '10
- Short click (0-10 secs): unsuccessful click
- Long click (≥ 100 secs): successful click
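The two thresholds translate into a small classifier. A minimal sketch: the 10-second and 100-second cut-offs come from the slide, while the `"undetermined"` label for clicks in between is an assumption, since the slide does not say how those are treated.

```python
SHORT_CLICK_MAX = 10   # seconds, from the slide
LONG_CLICK_MIN = 100   # seconds, from the slide

def classify_click(dwell_seconds):
    """Label a click by the time spent on the clicked result."""
    if dwell_seconds < 0:
        raise ValueError("dwell time cannot be negative")
    if dwell_seconds <= SHORT_CLICK_MAX:
        return "unsuccessful"
    if dwell_seconds >= LONG_CLICK_MIN:
        return "successful"
    return "undetermined"  # assumption: the slide leaves this range open
```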
23
Children exposed to adult content
Likelihood of an accidental click on adult content: the click on adult content is short, and the action is immediately reverted by a click on non-adult content
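The definition of an accidental click can be turned into a simple session scan. This is a sketch under assumptions: a session is represented as a time-ordered list of hypothetical `(is_adult, dwell_seconds)` pairs, and "short" is taken to mean at most 10 seconds, borrowing the short-click threshold from the click-duration slide; the slides do not state which threshold was actually used.

```python
def accidental_adult_clicks(clicks, short_threshold=10):
    """Return indices of clicks that look like accidental adult-content clicks.

    clicks: time-ordered list of (is_adult, dwell_seconds) pairs
            (hypothetical layout).
    A click qualifies when it is on adult content, is short, and is
    immediately followed by a click on non-adult content.
    """
    hits = []
    for i in range(len(clicks) - 1):
        is_adult, dwell = clicks[i]
        next_is_adult, _ = clicks[i + 1]
        if is_adult and dwell <= short_threshold and not next_is_adult:
            hits.append(i)
    return hits
```

The last click of a session is never flagged, because there is no following click that could revert it.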
24
Sessions characteristics (I) Shorter sessions in young users Jump to adulthood also occurs in the group of users from 19 to 25
25
Sessions characteristics (II) Query refinding c q q’ q What do refinding queries indicate?
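One possible reading of the q, q', q pattern is that a user returns to a query issued earlier in the same session after trying something else. A minimal sketch under that assumption; the case- and whitespace-insensitive comparison and the function name are choices made here, not taken from the slides.

```python
def refinding_positions(session_queries):
    """Return indices where a query repeats an earlier query of the same
    session with at least one different query in between (q, q', q)."""
    last_seen = {}
    hits = []
    for i, query in enumerate(session_queries):
        key = " ".join(query.lower().split())  # normalise case and spacing
        if key in last_seen and last_seen[key] < i - 1:
            hits.append(i)
        last_seen[key] = i
    return hits
```

Immediate repetitions such as `["dog", "dog"]` are not counted, since they are more likely a re-submission than a return to an earlier query.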
26
Sessions characteristics (III) Click refinding q c c’ c
27
Sessions characteristics (IV) Shorter sessions?
28
Tracing children's development on the web: Outline
1. What do children search for?
2. What entities are children interested in?
3. Does the reading level of the clicks vary across ages and education?
29
Classifying queries into topics
30
Classifying queries into topics
“sigir 2011”? → computers_and_internet/programming_and_development
31
What do children search for?
- Children and teenager groups have few dominant topics
- Adults have more diverse query topics
- Also due to smaller vocabulary
32
Gender differences (I) Which topic is most responsible for gender differences?
33
Gender differences (II)
34
What entities are children interested in?
Queries mapped to Wikipedia entities using site search on wikipedia.org/wiki
Query                                            Entity
facebook, facebook login                         en.wikipedia.org/wiki/Facebook
back to school clothes, london schol uniforms    en.wikipedia.org/wiki/School_uniform
Hummus recipe, ideal protein                     en.wikipedia.org/wiki/Hummus
How to map web queries to Wikipedia pages?
35
What entities are children interested in? (10-12)
36
What entities are adults interested in? (40+)
37
What entities are children interested in? Greater use of child-oriented entities at young ages
38
Does the reading level of the clicks vary across ages?
- Based on Google reading level classification
- 70% (kids) vs 50% (adults) of clicks classified as basic
39
Does the reading level of the clicks vary across ages? (II)
- Reading level also varies according to education level
- Education level of adults according to US census
CIKM 2011. Glasgow, 26 October
40
Getting demographics from US census
- User profile: Gender: Male; Birth year: 1978; ZIP code: 95054
- Query (Q): cheap holidays, with clicked document (D)
- US Census Data (factfinder.census.gov) for the ZIP code: Expected income: $31k; Expected education: 45% BA; Race distribution: 38% w, 47% A
- Label (Q,D) with $31k, 45% BA, ...
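The labelling step amounts to joining each query-document pair with census statistics for the user's ZIP code. A minimal sketch: the `ZIP_STATS` table and the field names are hypothetical, and the only entry uses the figures shown on the slide for ZIP code 95054; a real table would have to be built from the census data itself.

```python
# Hypothetical lookup table; the single entry copies the slide's example.
ZIP_STATS = {
    "95054": {
        "expected_income_usd": 31_000,
        "share_ba": 0.45,
        "race_distribution": {"w": 0.38, "A": 0.47},
    },
}

def label_click(query, clicked_doc, user_zip, zip_stats=ZIP_STATS):
    """Attach ZIP-level census statistics to a (Q, D) pair.
    Returns None when the ZIP code is not in the table."""
    stats = zip_stats.get(user_zip)
    if stats is None:
        return None
    return {"query": query, "doc": clicked_doc, **stats}
```

The labels describe the user's neighbourhood rather than the individual user, so they are best read as expected values.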
41
Conclusions
- Clear behavioral differences between children and adults
- Although not clear-cut between teenagers and children
- Sudden jump to adulthood from 19 to 25 years old
- Stronger click position bias for children, including ads
- Assistance of question queries
- Understanding concerns expressed in their queries
42
THANK YOU FOR YOUR ATTENTION