Link Distribution in Wikipedia [0324] KwangHee Park.

Table of contents
- Introduction
- Clustering using LDA
- Experiment
  - Disease, settlement
- Demo
- Considering Application

Introduction
- Why focus on links?
  - When someone creates a new article in Wikipedia, they mostly just link to the source in another language or to similar and related articles. The article is then written out by others.
- Assumption
  - The link terms in a Wikipedia article are the key terms that represent the specific characteristics of that article.

Introduction
- The problem we want to solve:
  - Analyze the latent distribution of a set of target documents by clustering the link term set.
  - Find the tendency of the latent distribution of a specific domain by limiting the input documents to that domain.

Process
- Terminology
  - Term set = all of the terms in the input documents
  - Topic = set of terms: {W_i, …, W_n}
  - Document = set of terms: {W_k, W_l, …, W_n}
  - Document = set of parts of topics: {T_n, T_k, …, T_m}
    - {Doc : 1}: {T_n : 0.4, T_k : 0.3, …}
- Cluster the term set
- Find the latent distribution of each document
- Group by domain
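The terminology above can be written down as plain data structures. The following is a minimal sketch, not the presenter's code: the link terms, the third topic weight, and the helper names `build_term_set` and `is_valid_mixture` are all assumptions made up for illustration.

```python
from typing import Dict, Set

# Each document is represented only by the link terms found in it.
# These titles and terms are illustrative, not data from the experiment.
documents: Dict[str, Set[str]] = {
    "Doc1": {"Malnutrition", "Famine", "Vitamin"},
    "Doc2": {"Famine", "Drought"},
}


def build_term_set(docs: Dict[str, Set[str]]) -> Set[str]:
    """Term set = all link terms that occur in the input documents."""
    terms: Set[str] = set()
    for link_terms in docs.values():
        terms |= link_terms
    return terms


def is_valid_mixture(mixture: Dict[str, float], tol: float = 1e-9) -> bool:
    """A topic mixture has non-negative weights that sum to 1."""
    weights = list(mixture.values())
    return all(w >= 0 for w in weights) and abs(sum(weights) - 1.0) <= tol


# A document seen as a mixture of topics, in the spirit of
# {T_n: 0.4, T_k: 0.3, ...}; the weight for T_m is invented here.
doc1_mixture = {"T_n": 0.4, "T_k": 0.3, "T_m": 0.3}

term_set = build_term_set(documents)
```

The two views of a document sit side by side here: a set of link terms on the input side, and a weighted set of topics on the output side of the clustering.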

LDA
- The clustering technique
  - The LDA model consists of a fixed number of topics.
  - Each topic is modeled as a distribution over words.
  - A document under LDA is modeled as a distribution over topics.
[Figure: the term set, Topic 1 to Topic n, and Doc 1 to Doc 3]
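The two distributions described above can be illustrated by running the model forwards: pick a topic from the document's mixture, then pick a term from that topic. This is only a sketch with invented toy topics and weights. Full LDA also draws these distributions from Dirichlet priors, and the clustering step runs in the opposite direction, inferring the distributions from observed link terms. The names `sample` and `generate_terms` are hypothetical.

```python
import random
from typing import Dict, List

# Toy topics (assumed values): each topic is a distribution over terms.
topics: Dict[str, Dict[str, float]] = {
    "Topic1": {"Famine": 0.5, "Malnutrition": 0.4, "River": 0.1},
    "Topic2": {"River": 0.6, "County": 0.3, "Famine": 0.1},
}

# Toy documents (assumed values): each document is a distribution over topics.
doc_topic_mix: Dict[str, Dict[str, float]] = {
    "Doc1": {"Topic1": 0.9, "Topic2": 0.1},
    "Doc2": {"Topic1": 0.2, "Topic2": 0.8},
}


def sample(dist: Dict[str, float], rng: random.Random) -> str:
    """Draw one key from a discrete distribution given as {item: probability}."""
    items = list(dist)
    weights = [dist[item] for item in items]
    return rng.choices(items, weights=weights, k=1)[0]


def generate_terms(doc: str, n_terms: int, rng: random.Random) -> List[str]:
    """For each slot, pick a topic from the document, then a term from that topic."""
    terms = []
    for _ in range(n_terms):
        topic = sample(doc_topic_mix[doc], rng)
        terms.append(sample(topics[topic], rng))
    return terms


rng = random.Random(0)  # fixed seed so repeated runs give the same draw
generated = generate_terms("Doc1", 5, rng)
```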

Experiment
- Domain:
  - Disease
    - #Doc: 208
    - #Link terms: English: 46615, Spanish: 34560, French: 31747, Chinese: 9286, Korean: 3272
  - Settlement
    - #Doc: 1328
    - #Link terms: English: (value missing), Spanish: (value missing), French: 150921, Chinese: 93227, Korean: (value missing)
- Number of topics
  - 10, 20, 30, 40, 50, 75, 100, 125, 150, 175, 200, 225, 250
- Demo site

Considering Application
- Document classification
  - Classify the domain of a target document by calculating the similarity between the topic distributions of documents.
  - Usage: template recommendation, …
- Domain characteristic
[Figure: # of appearances / # of total documents, plotted by topic number, for Disease and Settlement]
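The classification idea above can be sketched as follows. The slides do not say which similarity measure is used or when a topic counts as appearing in a document, so cosine similarity and the 0.1 weight threshold are assumptions, as are the toy topic distributions and the function names `domain_profile`, `cosine`, and `classify`.

```python
import math
from typing import Dict, List

TopicDist = Dict[int, float]  # topic number -> weight


def domain_profile(doc_dists: List[TopicDist], threshold: float = 0.1) -> TopicDist:
    """Per topic: (# of documents in which the topic appears) / (# of total documents)."""
    counts: Dict[int, int] = {}
    for dist in doc_dists:
        for topic, weight in dist.items():
            if weight >= threshold:  # assumed rule for "the topic appears"
                counts[topic] = counts.get(topic, 0) + 1
    total = len(doc_dists)
    return {topic: n / total for topic, n in counts.items()}


def cosine(a: TopicDist, b: TopicDist) -> float:
    """Cosine similarity between two sparse topic vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(doc: TopicDist, profiles: Dict[str, TopicDist]) -> str:
    """Return the domain whose profile is most similar to the document."""
    return max(profiles, key=lambda domain: cosine(doc, profiles[domain]))


# Toy topic distributions (invented for illustration).
disease_docs = [{0: 0.7, 1: 0.3}, {0: 0.9, 2: 0.1}]
settlement_docs = [{2: 0.8, 3: 0.2}, {3: 0.6, 2: 0.4}]
profiles = {
    "Disease": domain_profile(disease_docs),
    "Settlement": domain_profile(settlement_docs),
}
predicted = classify({0: 0.6, 1: 0.4}, profiles)
```

For template recommendation, the predicted domain would then be used to look up the template that articles of that domain normally carry.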

Template recommendation
- Input articles: Starvation, Trenton,_New_Jersey
- Starvation → Disease
- Trenton,_New_Jersey → Settlement

Thanks
