DEVELOPING METHODS FOR SEMANTIC & NETWROK ANALYSIS OF BLOGS:

Slides:

Advertisements

Similar presentations

Chapter 4 Computer Networks

Advertisements

Complex Networks for Representation and Characterization of Images For CS790g Project Bingdong Li 9/23/2009.

Community Detection and Graph-based Clustering

O(N 1.5 ) divide-and-conquer technique for Minimum Spanning Tree problem Step 1: Divide the graph into  N sub-graph by clustering. Step 2: Solve each.

Mining di Dati Web Web Community Mining and Web log Mining : Commody Cluster based execution Romeo Zitarosa.

Finding your friends and following them to where you are by Adam Sadilek, Henry Kautz, Jeffrey P. Bigham Presented by Guang Ling 1.

Analysis and Modeling of Social Networks Foudalis Ilias.

Clustering Software Artefacts Based on Frequent common changes Presented by Haroon Malik.

Beyond random..  A stratified sample results when a population is separated into two or more subgroups, called strata, and simple random samples are.

Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.

Feb 20, Definition of subgroups Definition of sub-groups: “Cohesive subgroups are subsets of actors among whom there are relatively strong, direct,

Object Detection by Matching Longin Jan Latecki. Contour-based object detection Database shapes: …..

Measurement and Analysis of Online Social Networks By Alan Mislove, Massimiliano Marcon, Krishna P. Gummadi, Peter Druschel, Bobby Bhattacharjee Attacked.

Memoplex Browser: Searching and Browsing in Semantic Networks CPSC 533C - Project Update Yoel Lanir.

Projects ( ) Ida Mele. Rules Students have to work in teams (max 2 people). The project has to be delivered by the deadline that will be published.

1 A Graph-Theoretic Approach to Webpage Segmentation Deepayan Chakrabarti Ravi Kumar

Social Networking Algorithms related sections to read in Networked Life: 2.1,

To Blog or Not to Blog: Characterizing and Predicting Retention in Community Blogs Imrul Kayes 1, Xiang Zuo 1, Da Wang 2, Jacob Chakareski 3 1 University.

A Graph-based Friend Recommendation System Using Genetic Algorithm

CLUSTERING. Overview Definition of Clustering Existing clustering methods Clustering examples.

1/52 Overlapping Community Search Graph Data Management Lab, School of Computer Science

Chapter 3. Community Detection and Evaluation May 2013 Youn-Hee Han

Susan O’Shea The Mitchell Centre for Social Network Analysis CCSR/Social Statistics, University of Manchester

 4. Set collaborative working arrangement to enable students to share their knowledge and skills and to build on one another’s strengths.

WEB 2.0 PATTERNS Carolina Marin. Content  Introduction  The Participation-Collaboration Pattern  The Collaborative Tagging Pattern.

LOGO Identifying Opinion Leaders in the Blogosphere Xiaodan Song, Yun Chi, Koji Hino, Belle L. Tseng CIKM 2007 Advisor ： Dr. Koh Jia-Ling Speaker ： Tu.

DATA MINING WITH CLUSTERING AND CLASSIFICATION Spring 2007, SJSU Benjamin Lam.

2015/12/121 Extracting Key Terms From Noisy and Multi-theme Documents Maria Grineva, Maxim Grinev and Dmitry Lizorkin Proceeding of the 18th International.

1 Masters Thesis Presentation By Debotosh Dey AUTOMATIC CONSTRUCTION OF HASHTAGS HIERARCHIES UNIVERSITAT ROVIRA I VIRGILI Tarragona, June 2015 Supervised.

1 Use graphs and not pure logic Variables represented by nodes and dependencies by edges. Common in our language: “threads of thoughts”, “lines of reasoning”,

Community Discovery in Social Network Yunming Ye Department of Computer Science Shenzhen Graduate School Harbin Institute of Technology.

Overlapping Community Detection in Networks

LexPageRank: Prestige in Multi-Document Text Summarization Gunes Erkan, Dragomir R. Radev (EMNLP 2004)

Computational Biology Group. Class prediction of tumor samples Supervised Clustering Detection of Subgroups in a Class.

Selected Topics in Data Networking Explore Social Networks:

1 Discovering Web Communities in the Blogspace Ying Zhou, Joseph Davis (HICSS 2007)

Web Page Clustering using Heuristic Search in the Web Graph IJCAI 07.

Energy Models for Graph Clustering Bo-Young Kim Applied Algorithm Lab, KAIST.

COMPUTER NETWORKS Quizzes 5% First practical exam 5% Final practical exam 10% LANGUAGE.

GUILLOU Frederic. Outline Introduction Motivations The basic recommendation system First phase : semantic similarities Second phase : communities Application.

Topics In Social Computing (67810) Module 1 (Structure) Centrality Measures, Graph Clustering Random Walks on Graphs.

SOCIAL NETWORKS Mehmet Ali YILMAZ EEE-448 / Assoc. Prof. Dr. Turgay İBRİKÇİ.

Groups of vertices and Core-periphery structure

What Every Chamber Executive Needs to Know About Blogging, Podcasts and Wikis C. David Gammel High Context Consulting (410)

Social Networks Analysis

Searching corpora.

Data Mining K-means Algorithm

Applications of graph theory in complex systems research

A Network Science Approach to Fake News Detection on Social Media

DEMON A Local-first Discovery Method For Overlapping Communities

Empirical analysis of Chinese airport network as a complex weighted network Methodology Section Presented by Di Li.

Core Methods in Educational Data Mining

Storage Virtualization

Graph Analysis by Persistent Homology

Finding Communities by Clustering a Graph into Overlapping Subgraphs

Using Friendship Ties and Family Circles for Link Prediction

Clustering Algorithms for Noun Phrase Coreference Resolution

Apache Spark & Complex Network

DEBBIE CHENG * LISA HANKIN * JOHN MARK JOSLING

CS 594: Empirical Methods in HCC Social Network Analysis in HCI

Mining Social Networks. Contents  What are Social Networks  Why Analyse Them?  Analysis Techniques.

Blackboard Tutorial (Student)

Graph-Based Anomaly Detection

Social Network Analysis

Discovery of Blog Communities based on Mutual Awareness

Analyzing Two Participation Strategies in an Undergraduate Course Community Francisco Gutierrez Gustavo Zurita

Core Methods in Educational Data Mining

LABORATORY FOR INTERNET STUDIES

Presentation transcript:

DEVELOPING METHODS FOR SEMANTIC & NETWROK ANALYSIS OF BLOGS: DISCOURSES OF ISLAM IN THE RUSSIAN BLOGOSPHERE Olessia Koltsova Lidia Pivovarova Anastasia Kincharova Tatiana Yefimova Elisaveta Tereschenko Yulia Pavlova Neanderthals are now in your backyard Blos covering sheep slaughters for Muslim holiday Kurban-Bairam in Russian cities

PROJECT GOALS to extract a population and samples of all texts of a given topic (Islam) (with hyperlinks) to divide the corpus of texts into semantically similar clusters to detect hyperlink-based subgraphs (cohesive subgroups) in the given corpus to juxtapose semantic clusters and cohesive subgroups

DATA Two one-month collections of Russian blog texts & metadata from Yandex archive Russian blogosphere graph file from Yandex One-month collection contains: ca. 30 mln “proper”posts ca. 210 mln “proper” & microblog posts over bln comments Forming a population of topic-relevant texts is a special task involving human coding & expertise We expect dozens of thousands of relevant posts

UNIT OF SEMANTIC ANALYSIS Entire blogs are multi-topical and can not be clusterized except by fuzzy clustering Problem A: still much noise Single posts are usually uni-topical and can be divided into strict clusters with low noise Problem B: juxtaposing with SNA results Populations of topic-relevant posts from each blog can be units to be fuzzily clusterized with low noise Problem C: blogs with more posts will have lower coefficients of belonging to clusters than single-post blogs In the first case the entire blog is treated as a single text. It blurs clusters and introduces a lot of noise. If irrelevant posts are removed, there will be no noise, still clusters will be blurred In the third case it is decomposed into separate posts; irrelevant posts are deleted; relevant posts are clusterized strictly; the blog is reassembled and assigned a set of weights of belonging to clusters in accordance with proportion of posts of each cluster

PROBLEM C A B C D E A: 50%; E: 100%

UNIT OF NETWORK ANALYSIS Entire blogs: network is easily interpreted Problem 1.1: uncomparable with semantic clusters of posts Problem 1.2: structure of intext and friending links in the Russian blogosphere (fusion of blogplatforms and social network platforms; platform dependence) Posts: data comparable Problem 2.1: too few links between posts Problem 2.2: too many links to non-blog resources Posts and comments: detects real conversational networks Problem 3.1: star-like loosely connected subgraphs with unhomogeneous nodes and ties

PROBLEM 3.1. Posts are connected only with intext links which are few, while comment-links are attached only to the one post. In-text links between comments are usually within one discussion attached to the same post; othet links between comments and between comments and post are extremely rare. If dendro0like structure of comments is allowed, each star my unfold into a tree, but not all platforms allow it, so taking this into considereation will make subnetworks uncomparable.

SOLUTION & NEW PROBLEMS A C D If we take bloggers as nodes, we lose ifnormation on who comments what posts and to what clusters those posts belong. An this is a central question: if most actors are like A, hyperlink communities and semantic clusters coincide; if most are like C, they diverge. Trying to take it into consideration, we get a very complex network that is hard to interpete. It is unclear if there is a suitable software. And finally intext links also exist. E

PROBLEM OF SUBGROUP DETECTION Problem 1: choice of metrics Cliques do not apply N-cliques - ? N-clans - ? K-plexes and cores seem to apply if weighted by the size of the subgroup Problem 2: choice of software It should work with large datasets It should fulfill subgroup detection and clustering

Thank you for listening about our problems