Published by Brenda Lydia Blair. Modified over 9 years ago.
1
Concept Switching Azadeh Shakery
2
Concept Switching: Problem Definition
[Figure: concepts labeled C1, C2, …, Ck]
3
Past Work: A Programming Language for Mining Fuzzy ER Graphs
[Figure: example fuzzy ER graph; node labels: Forager, Rover, bee, fly, g1, g2, g3, Behavior Term, gene1, gene2, …]
4
Past Work: A Programming Language for Mining Fuzzy ER Graphs
Operators:
–Neighbor Finding: NBSet, WNBSet
–Path Finding: Shortestpath, Wpath
–Set Operators: Union, Intersect, Cardinality, topk
Added Features:
–Type Definition
–Function Definition
–Seq. Operators: Project, Reverse, Seq2Set, Aggregate
5
Past Work: High Level Scripts for Entity Comparison
Based on intersection and union of neighbors: |NB(e1) ∩ NB(e2)| / |NB(e1) ∪ NB(e2)|
–Tehran, Iran: 27/52
–Baghdad, Iran: 11/52
–Washington, Iran: 0
Based on the shortest path between the two entities:
–gpcr__g_protein__plc__diacylglycerol
–bush__leader__khomeini
Based on the length of the shortest path to a base entity
Connection to a center node: |NB(e) ∩ NB(c)| / |NB(e) ∪ NB(c)|
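The neighbor-overlap comparison on this slide can be sketched in a few lines of Python. This is only an illustration, not the scripting language described in the talk: the function name `neighbor_similarity`, the dictionary-of-sets graph representation, and the toy entities are all assumptions made for the example.

```python
def neighbor_similarity(graph, e1, e2):
    """Ratio of shared neighbors to all neighbors of two entities.

    `graph` maps each entity to the set of its neighbors (a hypothetical
    representation chosen for this sketch).
    """
    nb1 = graph.get(e1, set())
    nb2 = graph.get(e2, set())
    union = nb1 | nb2
    if not union:
        # Two entities without any neighbors share nothing.
        return 0.0
    return len(nb1 & nb2) / len(union)


# Toy data, invented for illustration only.
toy_graph = {
    "a": {"x", "y", "z"},
    "b": {"y", "z", "w"},
    "c": set(),
}

print(neighbor_similarity(toy_graph, "a", "b"))  # 2 shared of 4 total
print(neighbor_similarity(toy_graph, "a", "c"))  # no shared neighbors
```

The same function covers the "connection to a center node" variant by passing the center node as the second entity.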
6
Current Work: Topic/Concept Map
An alternative way of accessing information
Creates an index of information that resides outside that information
The topic map describes the information in the documents and databases
7
Multi-Resolution Topic Maps
[Figure labels: WORDS, Word Net, High Level Concepts; axis: Low resolution to High resolution]
8
Multi-Resolution Topic Map
Static
–Discrete Navigation
–Challenges:
  Define resolution
  Community finding algorithm
  Summarize communities
  Define distance between communities
  Between which communities do we allow navigation?
Dynamic
–Continuous Navigation
–Challenges:
  Define resolution
  Online community finding algorithm
  Summarize communities
9
Challenges
Resolution definition
–Each resolution level has an associated set of communities {C1, C2, …, Ck}
–One way is to define the resolution as the link strength threshold
–Threshold 0: all links; a sufficiently high threshold: no links
–Low resolution corresponds to a low threshold, high resolution to a high threshold
Community finding algorithm
Community distance:
–For communities C1, C2: Similarity(C1, C2) = ? |C1 ∩ C2| / |C1 ∪ C2|
–Works if communities are allowed to have intersection
Community summarization
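A minimal sketch of resolution as a link strength threshold, assuming links are stored as (node, node, strength) triples and that a link is kept when its strength is at least the threshold. Both the representation and the `>=` comparison are assumptions; the slides do not say how ties are handled. The function name `filter_by_threshold` and the sample weights are invented.

```python
def filter_by_threshold(weighted_edges, threshold):
    """Keep links whose strength is at least `threshold`.

    Returns the surviving node set and edge list; nodes that lose all of
    their links drop out of the graph.
    """
    kept = [(u, v, w) for u, v, w in weighted_edges if w >= threshold]
    nodes = {n for u, v, _ in kept for n in (u, v)}
    return nodes, kept


# Invented link strengths, for illustration only.
links = [
    ("a", "b", 0.06),
    ("b", "c", 0.04),
    ("c", "d", 0.02),
    ("d", "e", 0.005),
]

for thr in (0.0, 0.03, 0.05, 1.0):
    nodes, kept = filter_by_threshold(links, thr)
    print(thr, len(nodes), len(kept))
```

Raising the threshold shrinks the graph monotonically, which is the pattern reported in the thr/nodes/edges figures later in the deck.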
10
Community Summarization
Use the documents to do the summarization
Summarize based on the community nodes
–Define center nodes to do the summarization:
  Based on the average MI distance to the other nodes in the community
  –Slow on very large communities
  Based on the degree of the nodes
  –Counts all neighbors as equally important
  Based on a PageRank-like algorithm:
  –Each node has a centrality value
  –In each step, each node distributes its centrality to its neighbors in proportion to the strength of the link
  –Repeat until the centrality values converge
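The PageRank-like option above could look roughly like the following sketch. The damping factor is borrowed from standard PageRank and is an assumption, not something the slides state; without it the plain redistribution step can oscillate on some graphs. The function name `weighted_centrality` and the link strengths in the example are invented.

```python
def weighted_centrality(edges, damping=0.85, tol=1e-9, max_iter=1000):
    """Iteratively spread centrality along links, weighted by link strength.

    `edges` is a list of (u, v, strength) triples for an undirected graph
    with positive strengths. `damping` is an assumed PageRank-style
    parameter added so that the iteration converges.
    """
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    nodes = sorted(adj)
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Every node keeps a small baseline share each round.
        new = {node: (1.0 - damping) / n for node in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            for v, w in adj[u].items():
                # u hands out its centrality in proportion to link strength.
                new[v] += damping * score[u] * w / total
        delta = sum(abs(new[x] - score[x]) for x in nodes)
        score = new
        if delta < tol:
            break
    return score


# Invented strengths between terms, for illustration only.
example = [
    ("brain", "neural", 0.9),
    ("brain", "nervous", 0.8),
    ("brain", "system", 0.7),
    ("neural", "nervous", 0.2),
]
scores = weighted_centrality(example)
print(max(scores, key=scores.get))
```

The highest-scoring nodes would then serve as the center nodes used to summarize the community.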
11
Community Finding Algorithms: Newman’s Algorithm
Newman’s algorithm for detecting community structure in networks:
–Modularity: a measure of the quality of a particular division of a network
–Modularity is the fraction of the edges in the network that connect vertices of the same type (within a community) minus the expected value of the same quantity in the same network with random connections
–Consider different divisions of the graph into communities and find the division that maximizes modularity
–The number of distinct community divisions grows exponentially in the number of nodes
–A greedy algorithm is therefore used to solve the problem
–The algorithm runs in O((m + n)n) time for a network with n nodes and m edges
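To make the modularity definition concrete, here is a small sketch that scores a given division of an unweighted, undirected graph. It computes, for each community, the fraction of edges inside it minus the squared fraction of edge ends attached to it. This covers only the quality measure, not the greedy search; the function name `modularity` and the toy graph are assumptions for the example.

```python
def modularity(edges, membership):
    """Modularity of a division of an unweighted, undirected graph.

    `edges` is a list of (u, v) pairs and `membership` maps each node to a
    community label.
    """
    m = len(edges)
    inside = {}    # number of edges with both ends in the community
    degree = {}    # number of edge ends attached to the community
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(
        inside.get(c, 0) / m - (degree[c] / (2 * m)) ** 2
        for c in degree
    )


# Two triangles joined by a single edge (toy example).
toy_edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_groups = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
one_group = {node: "A" for node in range(1, 7)}

print(modularity(toy_edges, two_groups))
print(modularity(toy_edges, one_group))
```

Splitting the toy graph into its two triangles scores higher than lumping everything into one community, which is the kind of comparison the greedy search performs at each merge.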
12
Newman’s Algorithm
Communities are of very different sizes
–A few very large communities and a lot of small communities
No overlapping communities
–Definition of neighbor communities is hard
Experiments on bee data:
–1200 records about Apis mellifera (honey bee)
–Thr = 0.003
Results
13
Community Finding Algorithms: CPM
Clique Percolation Method (CPM)
–Locates the k-clique communities of unweighted, undirected networks.
–Observation: A typical member in a community is linked to many other members, but not necessarily to all other nodes.
–A community can be interpreted as a union of smaller complete subgraphs that share nodes.
–A k-clique community is defined as the union of all k-cliques that can be reached from each other through a series of adjacent k-cliques.
–Two k-cliques are said to be adjacent if they share k-1 nodes.
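The definition above translates fairly directly into code. The sketch below is a brute-force illustration suitable only for tiny graphs: it enumerates every k-node subset, keeps the complete ones, and merges cliques that share k-1 nodes. It is not the implementation used for the experiments in these slides, and the function name `k_clique_communities` and the toy graph are assumptions.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Brute-force clique percolation on a small undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    nodes = sorted(adj)

    # All k-node subsets in which every pair is linked.
    cliques = [
        frozenset(c)
        for c in combinations(nodes, k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques: adjacent cliques share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # A community is the union of the nodes of all connected cliques.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(sorted(g) for g in groups.values())


# Triangles {1,2,3} and {2,3,4} are adjacent; {4,5,6} only touches node 4.
toy = [(1, 2), (2, 3), (1, 3), (2, 4), (3, 4), (4, 5), (5, 6), (4, 6)]
print(k_clique_communities(toy, 3))
```

In the toy graph, node 4 ends up in both communities, which illustrates the overlap that CPM allows.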
14
Properties of CPM
Not too restrictive (compared to cliques)
Based on the density of links
Local
Does not yield cut-nodes or cut-links (whose removal would disjoin the community)
Allows overlaps
15
Results
thr    Nodes  Edges  CPM time          Newman time      Communities of more than one node
0.05   228    1197   0 min 0.088 sec   0 min 0.11 sec   16
0.04   312    1483   0 min 0.168 sec   0 min 0.21 sec   20
0.03   507    2924   0 min 0.511 sec   0 min 0.49 sec   29
0.01   4349   28595  5 min 25.313 sec  1 min 15.21 sec  103
16
Sample of Resolution Change
Term weights for one community:
neural 0.0889141, nervous 0.0827306, coordination 0.0785593, brain 0.0585748, proboscis 0.0552424, extension 0.0537362, conditioning 0.0487368, learning 0.0470799, system 0.0457777, mushroom 0.0420242, homeostasis 0.037191, olfactory 0.0310599, juvenile 0.0302844, hormone 0.0296593, endocrine 0.0283738, bodies 0.0270794, antennal 0.0237212, conditioned 0.0225992, chemical 0.0216736, reflex 0.021086
Term weights for four smaller communities:
–proboscis 0.207976, extension 0.20287, conditioning 0.180429, learning 0.143213, conditioned 0.0925473, olfactory 0.0872045, reflex 0.08576
–neural 0.178864, nervous 0.178613, coordination 0.167388, brain 0.129372, system 0.116024, mushroom 0.103932, bodies 0.0665873, neurons 0.0592205
–homeostasis 0.404266, chemical 0.319049, coordination 0.276685
–juvenile 0.297036, hormone 0.292105, jh 0.223579, endocrine 0.18728
17
Concept Switching
Construct a topic map for each collection separately
Construct one universal topic map
18
Discussion
Better ideas for community summarization?
Dynamic via static topic maps?
Alternative ways of defining resolution?
19
Thank you Questions?