Published by Octavia Jennifer Ross. Modified over 9 years ago.
Computing & Information Sciences, Kansas State University
First International Conference on Weblogs And Social Media (ICWSM-2007), Boulder, Colorado

Structural Link Analysis from User Profiles and Friends Networks: A Feature Construction Approach
William H. Hsu, Joseph Lancaster, Martin S. R. Paradesi, Tim Weninger
Monday, 26 March 2007
Laboratory for Knowledge Discovery in Databases, Kansas State University
http://www.kddresearch.org/KSU/CIS/ICWSM-20070326.ppt
Link Analysis in Social Networks: The K-State Corpus
Outline
  Background, Related Work and Rationale
  Technical Objective: Link Mining in Social Networks
  Methodology: Graph Feature Extraction
  Experimental Results: K-State LJMiner Corpus
  Continuing Work: Statistical Relational Models
Problem Statement: Link Mining in Social Networks
  Problem Definition
    Given: records of users of a weblog or social network service
    Discover
      Features of entities: users, communities
      Relationships: friendship, membership, moderatorship
      Explanations and predictions for relationships
  Goals
    Boost precision and recall of link existence prediction
    Find relevant features
  Significance: Recommendations (Friendship, Membership)
Related Work: Link Mining
  Getoor and Diehl (2005) - Graphical model representations of link structure
  Ketkar et al. (2005) - Data mining techniques vs graph-based representation
  Sarkar & Moore (2005) - Change in link structure across discrete time steps
  Popescul & Ungar (2003) - ER model to predict links
  Hill (2003), Bhattacharya & Getoor (2004) - Statistical Relational Learning to resolve identity uncertainty
  Resig et al. (2004) - Predicting IM online times using friends graph degree
  McCallum et al. (2005) - Inferring roles and topic categories based on link analysis
Rationale
  Limitations of Current State of the Art
    Do not take graph features into account
    Limited ability to select, extract features
  Novel Contribution: Link Mining System
    Extracts, computes features of network model
    Towards dependent types for relational link mining
  Rationale
    Desired functionality: infer new links from old
    Evaluation: precision, recall for link existence
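The evaluation criterion named above, precision and recall for link existence, can be illustrated with a minimal sketch. It assumes links are directed (source, target) pairs held in Python sets; the function name and the toy data are hypothetical and do not come from the LJMiner system.

```python
def precision_recall(predicted, actual):
    """Precision and recall of predicted directed links against actual links."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical toy data: three real friendships, four predicted ones.
actual_links = {("a", "b"), ("b", "c"), ("c", "a")}
predicted_links = {("a", "b"), ("b", "c"), ("a", "c"), ("b", "a")}
precision, recall = precision_recall(predicted_links, actual_links)
```

Because friendship here is directed, ("a", "c") does not count as a hit for the actual link ("c", "a").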
K-State Test Bed: LJMiner Corpus
  User Contact Info
  User Interest, Schools, Friends
  Community Membership Info
LiveJournal Topology [1]: Tools and Security Model
  LJMindMap.com © 2004 mcfnord, © 2007 Denga, Inc.
LiveJournal Topology [2]: Definitions
Graph Features [1]: Node, Pair, Link-Dependent
  Node-Dependent Features: specific to one node (vertex) within candidate pair
    Indegree (u): “Source popularity”
    Outdegree (u): “Source fertility”
    Outdegree (v): “Target fertility”
    Indegree (v): “Target popularity”
  Pair-Dependent Features: specific to one candidate pair of nodes (vertices)
    Common entities: interests, friends, schools, etc.
    Attributes of common entities
    Computed from relational query on entities u, v
  Link-Dependent Features: specific to one link (edge) in directed graph
    Past, predicted duration
    Diagnosed cause
    Computed and stored with relationship set
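As a rough illustration of the node-dependent and pair-dependent features listed above, the following sketch computes the four degree features and two "common entity" counts for one candidate pair (u, v). It assumes the friends network is a list of directed (source, target) edges and that interests are stored as sets per user; the function names, feature keys, and toy data are hypothetical, not the LJMiner implementation.

```python
from collections import defaultdict


def degree_tables(edges):
    """Count incoming and outgoing friendship links per user."""
    indegree, outdegree = defaultdict(int), defaultdict(int)
    for source, target in edges:
        outdegree[source] += 1
        indegree[target] += 1
    return indegree, outdegree


def candidate_features(u, v, edges, interests):
    """Node- and pair-dependent features for the candidate link u -> v."""
    indegree, outdegree = degree_tables(edges)
    friends = defaultdict(set)
    for source, target in edges:
        friends[source].add(target)
    return {
        # Node-dependent features
        "indegree_u": indegree[u],    # source popularity
        "outdegree_u": outdegree[u],  # source fertility
        "outdegree_v": outdegree[v],  # target fertility
        "indegree_v": indegree[v],    # target popularity
        # Pair-dependent features: counts of common entities
        "mutual_friends": len(friends[u] & friends[v]),
        "mutual_interests": len(interests.get(u, set()) & interests.get(v, set())),
    }


# Hypothetical toy network and interest lists.
edges = [("u", "x"), ("v", "x"), ("x", "u"), ("u", "y"), ("y", "v")]
interests = {"u": {"chess", "jazz"}, "v": {"jazz", "hiking"}}
features = candidate_features("u", "v", edges, interests)
```

Link-dependent features such as duration are omitted here because they require data stored with an existing relationship rather than anything derivable from the edge list alone.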
Graph Features [2]: Node and Pair Features in LJMiner
  Graph Features
  Interest-Related Features
LJCrawler System Design
  Data acquisition: client, injector, parser
  Ancillary issues
    Multi-threading
    Distribution
    Storage
  Analytical postprocessing: LJClipper, LJStats
  Distinguishing features of LJCrawler
  Results
    200 users/second maximum, 5 users/second allowed
    Approximately 2 million pages crawled
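The gap between the crawler's maximum rate and the allowed rate implies some form of throttling. The slides do not say how LJCrawler enforces it, so the following is only a sketch of one common approach, a minimum-interval throttle; the `Throttle` class and its parameters are hypothetical.

```python
import time


class Throttle:
    """Space out calls so that at most max_per_second happen each second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock    # injectable for testing
        self.sleep = sleep    # injectable for testing
        self.next_allowed = None

    def wait(self):
        """Block until the next request is permitted, then reserve a slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.min_interval


# Assumed usage: call throttle.wait() before each user-profile fetch.
throttle = Throttle(max_per_second=5)
```

With `max_per_second=5`, consecutive calls to `wait()` are spaced at least 0.2 seconds apart, matching the 5 users/second allowance.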
Network Statistics: Graph Distance
  [Figure panels: 1000 nodes, 4000 nodes]
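The plots themselves are not reproduced in this transcript. Assuming "graph distance" means shortest-path length in the directed friends graph, a distance distribution of this kind could be computed with breadth-first search, as in the sketch below; the adjacency-list format and function names are hypothetical.

```python
from collections import Counter, deque


def bfs_distances(adjacency, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def distance_histogram(adjacency):
    """Count ordered node pairs at each finite graph distance."""
    histogram = Counter()
    for source in adjacency:
        for target, hops in bfs_distances(adjacency, source).items():
            if target != source:
                histogram[hops] += 1
    return histogram


# Hypothetical four-user friends graph (directed adjacency lists).
adjacency = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
histogram = distance_histogram(adjacency)
```

Unreachable pairs, such as any pair starting at "d" here, simply do not appear in the histogram.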
Interpretation of Results
  941-node graph (Hsu et al., 2006): LJCrawler v1 output
  1000-4000 node graphs: LJCrawler v2 output
Results
  Establishing an Interdisciplinary Research Initiative
    K-State / KU / UNL collaboration
    Resources: Linguistic Data Consortium
    NIST evaluations
  Involving End Users of Machine Translation
    Document users
    Machine learning, data mining, info extraction researchers
  Novel Applications
    Social networks and collaborative recommendation
    Gisting and beyond
Continuing Work
  Information Extraction and Intelligent IR
    Learning models for IE: ontologies
    Latent semantic analysis
  Machine Learning
    Natural language learning
    Time series learning and understanding
    Relational and first-order models
  Automated Reasoning
    Probabilistic
    Case-based and analogical
  Data Mining and Warehousing
  Grid Computing
Acknowledgements
  K-State Lab for Knowledge Discovery in Databases
    Vikas Bahirwani
    Tejaswi Pydimarri
    Andrew King
  Social Networks, Graph Theory, Graph Algorithms
    Kirsten Hildrum (IBM T. J. Watson Labs)
    Todd Easton (K-State, Industrial and Manufacturing Systems Engineering)
  Machine Learning
    Dan Roth, Cinda Heeren, Jiawei Han (University of Illinois at Urbana-Champaign)
    AnHai Doan (University of Wisconsin – Madison)
Questions and Discussion