Data and Information Systems Laboratory, University of Illinois Urbana-Champaign
CCICADA 2012 Meeting, March 30, 2012

Web Taxonomies
Discovering the Structure of Information

Tim Weninger
Department of Computer Science
University of Illinois Urbana-Champaign, Urbana, IL

Information wants to be free
The World Wide Web is decentralized and messy.
  › (but it wants to be structured)
Taxonomies are used to describe the hierarchical structure of data
  › Almost always hand-crafted
Data is made (forced) to fit the taxonomy
Information wants to be free!

Information wants structure
Just like political science… in data science…
There is no such thing as digital anarchy
  › Government will always rise
Data democracy
  › Let the data decide its own form of government

Let’s discover a taxonomy of a Web site

Web Graph → Web Tree: a really hard problem
How do we traverse the graph?
  › BFS
  › DFS
  › MST
  › With Replacement
  › Without Replacement
  › All links
  › Some links

Web Graph → Web Tree? BFS
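
As a rough illustration of the BFS option, the sketch below turns a link graph into a tree by giving each page the parent that first reaches it in a breadth-first crawl from the home page. The function name `bfs_tree`, the `links` dictionary, and the toy URLs are all assumptions made for this example; they are not taken from the talk.

```python
from collections import deque


def bfs_tree(links, root):
    """Return {page: parent} for a breadth-first spanning tree.

    `links` maps each page to the list of pages it links to
    (a hypothetical in-memory stand-in for a crawled site).
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # The first page to reach `target` becomes its parent;
            # later links to the same target are ignored.
            if target not in parent:
                parent[target] = page
                queue.append(target)
    return parent


# Hypothetical toy site with cross links and a link back to the root.
links = {
    "/": ["/people", "/research"],
    "/people": ["/people/faculty", "/research"],
    "/research": ["/research/labs", "/"],
    "/people/faculty": ["/research/labs"],
}
tree = bfs_tree(links, "/")
for page, par in tree.items():
    print(page, "<-", par)
```

BFS places every page at its shortest click distance from the root, which is simple but ignores which of several equally short links best reflects the site's organization.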

Web Graph → Web Tree
Lists of links
  › WWW2011 work
Link paths?
Most probable user navigation
  › PageRank
We’re working on all of those; PageRank seems to work
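
For readers unfamiliar with PageRank, here is a minimal power-iteration sketch of the standard algorithm on a toy link graph. It only computes the scores; how the talk's system uses them to pick tree edges is not specified in the slides, so nothing here should be read as that method. The `pagerank` function, the damping factor of 0.85, the iteration count, and the toy URLs are assumptions for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Standard PageRank by power iteration on a small link graph.

    `links` maps each page to the list of pages it links to.
    Pages with no out-links spread their rank evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the random-jump share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy site: "/research" is linked from three pages.
links = {
    "/": ["/people", "/research"],
    "/people": ["/research"],
    "/research": ["/"],
    "/news": ["/research"],
}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{page}: {ranks[page]:.3f}")
```

One plausible use, stated here only as an assumption, is to prefer the candidate parent with the higher score when several pages link to the same child.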

Some explorations: BM25 ranks text
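
The following is a minimal sketch of Okapi BM25 scoring over whitespace-tokenized page text, included to show what "BM25 ranks text" means in practice. The parameter values `k1=1.5` and `b=0.75`, the non-negative IDF variant, the function name `bm25_scores`, and the sample documents are assumptions; the slides do not say which variant or settings were used.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25.

    Assumes `docs` is a non-empty list of non-empty strings.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant (adds 1 inside the logarithm).
            idf = math.log(
                (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0
            )
            # Term frequency saturated by k1 and normalized by length via b.
            norm = tf[term] + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


# Hypothetical page texts.
docs = [
    "faculty research in data mining",
    "campus parking and maps",
    "data mining research group publications",
]
scores = bm25_scores("data mining", docs)
for doc, score in sorted(zip(docs, scores), key=lambda pair: -pair[1]):
    print(f"{score:.3f}  {doc}")
```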

Propagate information backwards: re-rank

Map taxonomies
Assumption
  › Two taxonomies from Web sites with similar organizational missions will be similar
Let's do integration

Some early results

Brand new result: breakthrough this morning
Cue scary graphs

Questions?
Challenges?