Multi-Modal Bayesian Embeddings for Learning Social Knowledge Graphs
Zhilin Yang (Tsinghua University, Carnegie Mellon University), Jie Tang (Tsinghua University), William W. Cohen (Carnegie Mellon University)
AMiner: an academic social network, showing each researcher's research interests.
Text-Based Approach: infer research interests from the list of publications.
Text-Based Approach: term frequency surfaces "challenging problem"; TF-IDF surfaces "line drawing". Neither is a real research interest.
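To illustrate why plain frequency statistics surface phrases like these, here is a minimal TF-IDF sketch (the toy corpus and the scoring function are invented for illustration, not from the paper): a phrase appearing in every document gets an IDF of zero, while a rarer but still uninformative phrase can score highest.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score each term in each document by term frequency times inverse document frequency."""
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(t for doc in docs for t in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]

# Toy corpus: "challenging"/"problem" appear in every document, so their IDF
# is log(1) = 0, while the rarer "line"/"drawing" score highest -- yet neither
# phrase is a genuine research interest.
docs = [
    ["challenging", "problem", "line", "drawing", "line", "drawing"],
    ["challenging", "problem", "deep", "learning"],
    ["challenging", "problem", "entity", "linking"],
]
scores = tf_idf(docs)
```

This is the failure mode the knowledge-driven approach on the next slide is meant to fix: a knowledge base tells us which phrases are actual concepts.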
Knowledge-Driven Approach: infer research interests from the list of publications, guided by a knowledge base of concepts such as Artificial Intelligence, Data Mining, Machine Learning, Clustering, and Association Rules.
Problem: Learning Social Knowledge Graphs. Users (Mike, Jane, Kevin, Jing) write text such as "Deep Learning for NLP" and "Recurrent networks for NER", which relates to concepts like Deep Learning and Natural Language Processing.
Problem: Learning Social Knowledge Graphs. Inputs: the social network structure, social text, and a knowledge base.
Problem: Learning Social Knowledge Graphs. Output: a ranked list of concepts for each user, e.g., Kevin: Deep Learning, Natural Language Processing; Jing: Recurrent Networks, Named Entity Recognition.
Challenges: there are two modalities, users and concepts. How to leverage information from both modalities? How to connect the two modalities?
Approach: learn user embeddings from the social network and concept embeddings from the knowledge base, then combine both in the social knowledge graph (Social KG) model.
User Embedding and Concept Embedding: a Gaussian distribution generates the user embeddings, another Gaussian distribution generates the concept embeddings, and the model aligns users and concepts in the shared embedding space.
Inference and Learning: collapsed Gibbs sampling. Iterate between: 1. sample latent variables; 2. update parameters; 3. update embeddings.
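The three-step loop can be sketched as a generic driver. The three callables below are hypothetical placeholders for the model-specific steps, not the authors' implementation:

```python
def fit(n_iters, sample_topics, update_params, update_embeddings, state):
    """Skeleton of the learning loop on the slides: alternate collapsed Gibbs
    sampling of latent topic assignments with parameter and embedding updates.
    The three callables each take and return the full model state."""
    for _ in range(n_iters):
        state = sample_topics(state)      # 1. sample latent variables (collapsed Gibbs)
        state = update_params(state)      # 2. update model parameters
        state = update_embeddings(state)  # 3. update user/concept embeddings
    return state
```

Separating the embedding update as a third step is what distinguishes this loop from a standard collapsed Gibbs sampler for a topic model, where step 3 would not exist.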
AMiner Research Interest Dataset: 644,985 researchers; terms drawn from these researchers' publications and filtered with Wikipedia. Evaluation: homepage matching (1,874 researchers, using homepages as ground truth) and LinkedIn matching (113 researchers, using LinkedIn skills as ground truth). Code and data available: https://github.com/kimiyoung/genvector
Homepage Matching, Precision@5 (using homepages as ground truth):
GenVector (our model): 78.1003%
GenVector-E (our model w/o embedding update): 77.8548%
Sys-Base (AMiner baseline, key term extraction): 73.8189%
Author-Topic (classic topic models): 74.4397%
NTN (neural tensor networks): 65.8911%
CountKG (rank by frequency): 54.4823%
LinkedIn Matching, Precision@5 (using LinkedIn skills as ground truth):
GenVector (our model): 50.4424%
GenVector-E (our model w/o embedding update): 49.9145%
Author-Topic (classic topic models): 47.6106%
NTN (neural tensor networks): 42.0512%
CountKG (rank by frequency): 46.8376%
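Precision@5, the metric in both tables, can be computed as follows; the example prediction and ground-truth lists are invented for illustration:

```python
def precision_at_k(predicted, relevant, k=5):
    """Fraction of the top-k predicted concepts that appear in the ground truth."""
    top_k = predicted[:k]
    return sum(1 for concept in top_k if concept in relevant) / k

# Hypothetical ranked predictions for one researcher, scored against a
# hypothetical ground-truth set (e.g., homepage keywords or LinkedIn skills).
pred = ["deep learning", "nlp", "novel approach", "entity linking", "parsing"]
truth = {"deep learning", "nlp", "entity linking", "parsing", "tagging"}
p_at_5 = precision_at_k(pred, truth)  # 4 of the top 5 are correct
```

The dataset-level numbers in the tables are this per-researcher score averaged over all evaluated researchers.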
Error Rate of Irrelevant Cases (manually labeling terms that are clearly not research interests, e.g., "challenging problem"):
GenVector (our model): 1.2%
Sys-Base (AMiner baseline, key term extraction): 18.8%
Author-Topic (classic topic models): 1.6%
NTN (neural tensor networks): 7.2%
Qualitative Study: Top Concepts within Topics (asterisks mark clearly irrelevant concepts).
GenVector: query expansion, concept mining, language modeling, information extraction, knowledge extraction, entity linking, language models, named entity recognition, document clustering, latent semantic indexing.
Author-Topic: speech recognition, natural language, *integrated circuits, document retrieval, language models, language model, *microphone array, computational linguistics, *semidefinite programming, active learning.
Qualitative Study: Top Concepts within Topics (asterisks mark clearly irrelevant concepts).
GenVector: image processing, face recognition, feature extraction, computer vision, image segmentation, image analysis, feature detection, digital image processing, machine learning algorithms, machine vision.
Author-Topic: face recognition, *food intake, face detection, image recognition, *atmospheric chemistry, feature extraction, statistical learning, discriminant analysis, object tracking, *human factors.
Qualitative Study: Research Interests (asterisks mark clearly irrelevant concepts).
GenVector: feature extraction, image segmentation, image matching, image classification, face recognition.
Sys-Base: face recognition, face image, *novel approach, *line drawing, discriminant analysis.
Qualitative Study: Research Interests (asterisks mark clearly irrelevant concepts).
GenVector: unsupervised learning, feature learning, Bayesian networks, reinforcement learning, dimensionality reduction.
Sys-Base: *challenging problem, reinforcement learning, *autonomous helicopter, *autonomous helicopter flight, near-optimal planning.
Online Test (A/B test with live users, mixing the results with Sys-Base):
GenVector error rate: 3.33%
Sys-Base error rate: 10.00%
Other Social Networks? The same setting applies: social network structure, social text, and a knowledge base.
Conclusion: we study a novel problem, learning social knowledge graphs; propose a model, multi-modal Bayesian embedding, which integrates embeddings into graphical models; release the AMiner research interest dataset (644,985 researchers, with homepage and LinkedIn matching as ground truth); and deploy the model online on AMiner.
Thanks! Code and data: https://github.com/kimiyoung/genvector
Social Networks (AMiner, Facebook, Twitter, ...): huge amounts of information about users.
Knowledge Bases (Wikipedia, Freebase, YAGO, NELL, ...): huge amounts of knowledge, organized into concepts such as Computer Science, Artificial Intelligence, System, Deep Learning, and Natural Language Processing.
Bridge the Gap: connect users in social networks with concepts in knowledge bases for better user understanding, e.g., mining research interests on AMiner.
Approach: learn user embeddings from the social network and social text, and concept embeddings from the knowledge base; both feed into the social KG model.
Model: documents (one per user), concepts for each user, and parameters for the topics.
Model: generate a topic distribution for each document from a Dirichlet prior.
Model: generate a Gaussian distribution for each embedding space from a Normal-Gamma prior.
Model: generate the topic for each concept from a multinomial distribution.
Model: generate the topic for each user from a uniform distribution.
Model: generate the embeddings for users and concepts from the corresponding Gaussian distributions.
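Putting the generative steps above together, here is a rough end-to-end sketch. The dimensions, hyperparameters, and the exact form of the uniform step (drawing the user's topic uniformly from its concepts' topics) are assumptions for illustration, not the paper's specification:

```python
import numpy as np

rng = np.random.default_rng(0)
n_topics, dim, n_concepts = 4, 8, 6

# Normal-Gamma prior -> a Gaussian (mean, precision) per topic for one
# embedding space (illustrative hyperparameters).
precisions = rng.gamma(2.0, 1.0, size=n_topics)
means = rng.normal(0.0, 1.0 / np.sqrt(precisions)[:, None], size=(n_topics, dim))

# Topic distribution for one document (one user) from a Dirichlet prior.
theta = rng.dirichlet(np.ones(n_topics))

# Topic for each of the user's concepts from a multinomial over theta.
concept_topics = rng.choice(n_topics, size=n_concepts, p=theta)

# Topic for the user drawn uniformly from its concepts' topics (assumed form).
user_topic = rng.choice(concept_topics)

# Embeddings for the concepts and the user from their topics' Gaussians.
concept_embs = np.stack([
    rng.normal(means[z], 1.0 / np.sqrt(precisions[z])) for z in concept_topics
])
user_emb = rng.normal(means[user_topic], 1.0 / np.sqrt(precisions[user_topic]))
```

The key point of the construction is the last step: user and concept embeddings are generated by the same per-topic Gaussians, which is what lets inference connect the two modalities.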
Model: the complete generative process.
Inference and Learning: collapsed Gibbs sampling for inference, with embedding updates during learning. This differs from LDA-style models, whose observed variables are all discrete. Iterate between sampling latent variables, updating parameters, and updating embeddings.
Methods for Comparison:
GenVector: our model
GenVector-E: our model w/o embedding update
Sys-Base: AMiner baseline, key term extraction
CountKG: rank by frequency
Author-Topic: classic topic models
NTN: neural tensor networks