Growing Hierarchical Self-Organizing Maps for Web Mining. Joseph P. Herbert and JingTao Yao, Department of Computer Science, University of Regina.


1 Growing Hierarchical Self-Organizing Maps for Web Mining
Joseph P. Herbert and JingTao Yao
Department of Computer Science, University of Regina, CANADA S4S 0A2
herbertj@cs.uregina.ca, jtyao@cs.uregina.ca
http://www2.cs.uregina.ca/~herbertj, http://www2.cs.uregina.ca/~jtyao

2 Introduction
Many information retrieval and machine learning techniques have not evolved to survive the Web environment. There are two major problems in applying some machine learning techniques to Web mining:
1. The dynamic and ever-changing nature of Web data.
2. The dimensionality and sheer size of Web data.

3 Introduction
Three domains of application: Web content mining, Web usage mining, and Web structure mining.
Self-Organizing Maps (SOMs) have been used for:
– Web page clustering
– Document retrieval
– Recommendation systems

4 Growing Hierarchical SOMs
Growing Hierarchical SOMs are a hybridization of Growing SOMs and Hierarchical SOMs:
– Growing SOMs have a dynamic topology of neurons to help address the dynamic nature of data on the Web.
– Hierarchical SOMs are multi-level systems designed to minimize the high-dimensionality problem of data.
Together, the hybrid system provides a logical solution to the combined problem of dynamic, high-dimensional data sources.

5 The Consistency Problem
The growing hierarchical SOM model suffers from a new problem:
– Maintaining consistency of hierarchical relationships between levels.
– Training is done locally, without consideration of how changes affect other SOMs that are connected to the local focus.
The Web Mining model for Self-Organizing Maps solves this problem through bidirectional update propagation.

6 The Web Mining Model for Self-Organizing Maps
[Figure: layered diagram of the model; surviving labels: C, w*, U]
Input Layer: Each vector is inserted into the SOM network for the first stage of competition. An iteration is complete once all input vectors have been presented.
Hierarchy Layer: A suitable level within the hierarchy of SOMs is found by traversing the tree. The SOM with the collectively maximum similarity to the input is marked and passed to the next layer.
Growth Layer: This layer determines whether neurons need to be added to or removed from the current SOM. If the error is above an upper-bound threshold, neurons are added. If the error is below a lower-bound threshold, neurons are removed.
Update Layer: This layer updates the winning neuron and the neighborhood associated with it. Bidirectional Update Propagation updates the parent neurons and children feature maps that are associated with the winning neuron.
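The Growth Layer rule above can be sketched as a small decision function. This is only an illustration: the function name `growth_action`, the string return values, and the way the error and the two thresholds are passed in are assumptions, since the slides do not say how the error is measured or how many neurons are added or removed.

```python
def growth_action(error: float, lower: float, upper: float) -> str:
    """Decide how the Growth Layer changes the current SOM (hypothetical helper).

    error : representation error of the current SOM for the input (assumed scalar)
    lower : lower-bound threshold, below which neurons are removed
    upper : upper-bound threshold, above which neurons are added
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if error > upper:
        return "add"      # error too high: add neurons
    if error < lower:
        return "remove"   # error very low: remove neurons
    return "keep"         # error within bounds: leave the topology alone


if __name__ == "__main__":
    for e in (0.9, 0.3, 0.05):
        print(e, growth_action(e, lower=0.1, upper=0.5))
```

Keeping the decision separate from the actual insertion or removal of neurons makes the two thresholds easy to tune independently.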

7 Formal Definition
$A = \{A_1, \dots, A_t\}$
– A set of hierarchy levels.
$A_i = \{W_{i,1}, \dots, W_{i,m}\}$
– A set of individual SOMs.
$W_{i,j} = \{w_1, \dots, w_n\}$
– A SOM of $n$ neurons.
– Each neuron $w_k$ contains a storage unit $s_k$ and a weight vector $v_k$.
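One way to picture these sets is as nested containers. The sketch below is an assumption about representation, not the authors' implementation: the class names `Neuron` and `SOM`, the `Hierarchy` alias, and the choice to hold document identifiers in the storage unit are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Neuron:
    weights: List[float]                               # weight vector v_k
    storage: List[str] = field(default_factory=list)   # storage unit s_k (assumed: document ids)


@dataclass
class SOM:
    neurons: List[Neuron] = field(default_factory=list)  # W_{i,j} = {w_1, ..., w_n}


# A = {A_1, ..., A_t}: a list of levels, each level A_i being a list of SOMs.
Hierarchy = List[List[SOM]]

root = SOM([Neuron([0.0, 0.0]), Neuron([1.0, 1.0])])
child = SOM([Neuron([0.9, 1.0])])
hierarchy: Hierarchy = [[root], [child]]

if __name__ == "__main__":
    print(len(hierarchy), "levels;", len(root.neurons), "neurons in the top map")
```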

8 Three Basic Functions
Three functions are introduced for actions on the system:
Lev()
– Returns the hierarchy level that a SOM currently resides on.
Chd()
– Returns the set of SOMs that have a child relationship to a particular neuron.
Par()
– Returns the parent SOM of a particular neuron.
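A minimal sketch of the three functions, assuming each SOM records its level and its parent SOM and each neuron records its own SOM and its child SOMs. The classes, the `add_child` helper, and the lowercase function names are hypothetical; only the meaning of Lev, Chd, and Par comes from the slide.

```python
class SOM:
    def __init__(self, level, parent_som=None):
        self.level = level            # hierarchy level this SOM resides on (1 = top)
        self.parent_som = parent_som  # SOM one level up that this SOM hangs from, or None
        self.neurons = []


class Neuron:
    def __init__(self, som):
        self.som = som        # SOM that contains this neuron
        self.children = []    # child SOMs linked from this neuron
        som.neurons.append(self)


def lev(som):
    """Lev(): hierarchy level of a SOM."""
    return som.level


def chd(neuron):
    """Chd(): SOMs that have a child relationship to the neuron."""
    return list(neuron.children)


def par(neuron):
    """Par(): parent SOM of the neuron (None for neurons of the top map)."""
    return neuron.som.parent_som


def add_child(neuron):
    """Hypothetical helper: attach a new child SOM beneath a neuron."""
    child = SOM(level=neuron.som.level + 1, parent_som=neuron.som)
    neuron.children.append(child)
    return child


if __name__ == "__main__":
    top = SOM(level=1)
    n = Neuron(top)
    sub = add_child(n)
    m = Neuron(sub)
    print(lev(sub), chd(n) == [sub], par(m) is top)
```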

9 Process Flow for Training
[Flowchart boxes: Input; Determine winning neuron on current level; decision "Are neuron and input similar enough?" (Y/N); Proceed to next hierarchy level with closest neuron; decision "Is map representing input enough?" (Y/N); Add / Subtract Neuron; Update Winner Neuron; Update Neighborhood; Bidirectional Propagation (Propagate Updates Downwards, Propagate Updates Upwards). The boxes are numbered 1 to 8, matching the steps below.]
1. Input is inserted into the network.
2. The neuron that is most similar is selected.
3. Descend through the hierarchy until similarity is maximal.
4. Determine whether the correct number of neurons represents the pattern.
5. Add / subtract neurons accordingly.
6. Update the neuron and its neighbourhood.
7. Update children SOMs.
8. Update the parent SOM.
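Steps 2 and 3 can be sketched as a loop that picks the winning neuron on the current map and descends only while a child map matches the input more closely. Everything concrete here is an assumption made for illustration: the dictionary layout of a map, Euclidean distance as the (inverse) similarity measure, and the stopping rule are not specified in the slides.

```python
import math


def distance(a, b):
    """Euclidean distance; a smaller distance is treated as higher similarity."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def winner(som, p):
    """Index of the neuron in `som` whose weight vector is closest to input p."""
    return min(range(len(som["weights"])),
               key=lambda i: distance(som["weights"][i], p))


def descend(som, p):
    """Return the path of (map name, winning neuron index) pairs visited for p."""
    path = []
    while True:
        i = winner(som, p)
        best = distance(som["weights"][i], p)
        path.append((som["name"], i))
        child = som["children"].get(i)      # child map under the winner, if any
        if child is None:
            return path
        j = winner(child, p)
        if distance(child["weights"][j], p) >= best:
            return path                     # child does not improve similarity
        som = child


if __name__ == "__main__":
    leaf = {"name": "child", "weights": [[0.9, 0.9], [0.6, 0.6]], "children": {}}
    root = {"name": "root", "weights": [[0.0, 0.0], [0.8, 0.8]], "children": {1: leaf}}
    print(descend(root, [0.9, 0.9]))
```

The returned path identifies the map on which steps 4 to 8 (growth, update, and propagation) would then operate.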

10 Conceptual View
At the top-most hierarchy level ($A_1$), only one feature map exists.
– This map contains the highest-level conceptual view of the entire hierarchical structure.
Additional SOMs on subsequent levels offer more precise pattern abstraction.
– SOMs are denoted by the sequence of their parents.
– $W_{3,6,4}$ denotes the fourth map on the third level, derived from the sixth map on the previous level.

11 Learning of Features
Once a winning neuron $w_i^*$ has been identified (denoted by an asterisk), its weight vector $v_i^*$ is updated according to a learning rate $\alpha$.
– The value $\alpha$ decays over time according to the current training iteration.
– $v_i^*(q) = v_i^*(q-1) + \alpha\,(p_k(q) - v_i^*(q-1))$
The neighbourhood must also be updated with a modified learning rate $\alpha'$.
– $v_{N_{i^*}(d)}(q) = v_{N_{i^*}(d)}(q-1) + \alpha'\,(p_k(q) - v_{N_{i^*}(d)}(q-1))$
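The update rule moves a weight vector a fraction of the way toward the input. The sketch below applies it to the winner with rate `alpha` and to its neighbours with the modified rate (called `alpha_prime` here). The linear decay schedule in `decayed_rate` and the explicit list of neighbour indices are assumptions; the slides only say that the rate decays with the training iteration.

```python
def update(v, p, rate):
    """Return v moved toward input p: v + rate * (p - v), element-wise."""
    return [vi + rate * (pi - vi) for vi, pi in zip(v, p)]


def decayed_rate(alpha0, q, total):
    """Assumed schedule: alpha decays linearly from alpha0 to 0 over `total` iterations."""
    return alpha0 * (1.0 - q / total)


def train_step(weights, p, winner, neighbours, alpha, alpha_prime):
    """Update the winner with alpha and each neighbour index with alpha_prime."""
    new = list(weights)
    new[winner] = update(weights[winner], p, alpha)
    for n in neighbours:
        new[n] = update(weights[n], p, alpha_prime)
    return new


if __name__ == "__main__":
    weights = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    alpha = decayed_rate(1.0, q=5, total=10)
    print(train_step(weights, [1.0, 1.0], winner=0, neighbours=[2],
                     alpha=alpha, alpha_prime=alpha / 2))
```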

12 Bidirectional Update Propagation
Let $w_i^*$ be the winning neuron in SOM $W_{j,k}$ for input $p_k$.
To propagate upwards:
– Calculate $\mathrm{Par}(w_i^*) = W_{j-1,m}$, where $\mathrm{Lev}(W_{j-1,m}) < \mathrm{Lev}(W_{j,k})$.
– Update all neurons $w_a$ contained in $W_{j-1,m}$ that are similar to $w_i^*$.
– $v_a^*(q) = v_a^*(q-1) + \beta\,(p_k(q) - v_a^*(q-1))$
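Upward propagation can be sketched as applying the same update rule, with rate `beta`, to those parent-map neurons that count as similar to the winner. The similarity test used here (Euclidean distance within a `radius`) is an assumption, as are the function and parameter names; the slides do not define how similarity is decided.

```python
import math


def distance(a, b):
    """Euclidean distance between two weight vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def propagate_up(parent_weights, winner_weight, p, beta, radius):
    """Update parent neurons within `radius` of the winner toward input p.

    parent_weights : weight vectors of the neurons in the parent SOM
    winner_weight  : weight vector of the winning neuron in the child SOM
    radius         : assumed similarity cut-off (not given in the slides)
    """
    updated = []
    for v in parent_weights:
        if distance(v, winner_weight) <= radius:
            v = [vi + beta * (pi - vi) for vi, pi in zip(v, p)]
        updated.append(v)
    return updated


if __name__ == "__main__":
    parent = [[0.0, 0.0], [1.0, 1.0]]
    print(propagate_up(parent, [1.0, 1.0], [2.0, 2.0], beta=0.25, radius=0.5))
```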

13 Bidirectional Update Propagation
To propagate downwards:
– Calculate $\mathrm{Chd}(w_i^*) = A^*_{j+1}$, where $j+1$ is the next level in the hierarchy succeeding level $j$.
– Update the corresponding weight vectors for all neurons $w_b$ in SOM $W_{j+1,t}$, where $W_{j+1,t}$ is on the lower level $A^*_{j+1}$.
– $v_b^*(q) = v_b^*(q-1) + \gamma\,(p_k(q) - v_b^*(q-1))$
The learning rates $\beta$ and $\gamma$ are derived from the value of $\alpha$. Generally, updates to a parent neuron are not as strong as updates to children neurons.
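A minimal sketch of downward propagation and of deriving the two propagation rates from alpha. The scaling factors in `derived_rates` are purely illustrative assumptions, chosen only so that the parent rate is smaller than the child rate, as the slide states; the slides do not give the actual derivation.

```python
def derived_rates(alpha, up_factor=0.25, down_factor=0.5):
    """Assumed derivation: beta and gamma as fixed fractions of alpha."""
    beta = alpha * up_factor      # rate for parent (upward) updates
    gamma = alpha * down_factor   # rate for child (downward) updates
    return beta, gamma


def propagate_down(child_maps, p, gamma):
    """Move every neuron of every child SOM toward input p with rate gamma.

    child_maps : list of child SOMs, each a list of weight vectors
    """
    return [
        [[vi + gamma * (pi - vi) for vi, pi in zip(v, p)] for v in som]
        for som in child_maps
    ]


if __name__ == "__main__":
    beta, gamma = derived_rates(0.5)
    children = [[[0.0, 0.0], [1.0, 1.0]], [[2.0, 2.0]]]
    print(beta, gamma, propagate_down(children, [1.0, 1.0], gamma))
```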

14 Web-based News Coverage Example
The top-most level of the hierarchy contains news articles pertaining to high-level concepts, arranged according to their features.
The entire collection of Web documents on the online news site is presented through feature maps that abstract their similarities.
Individual maps $W_{2,1}, \dots, W_{2,10}$ contain Web documents pertaining to Global, Local, Political, Business, Weather, Entertainment, Technology, Sports, Opinion, and Health news respectively.

15 Web-based News Coverage Example
Feature map $W_{2,10}$ has neurons linking to three children maps: $W_{3,10,1}$, $W_{3,10,2}$, $W_{3,10,3}$.
– Articles in $W_{2,10}$ relate to Health News.
– $W_{3,10,1}$ relates to Health Research Funding.
– $W_{3,10,2}$ relates to Health Outbreak Crises.
– $W_{3,10,3}$ relates to Doctor Shortages.
New health-related articles are coming in rapidly, relating to a recent international outbreak. Neurons that point to the SOM $W_{3,10,2}$ are added to $W_{2,10}$ in the Health Outbreak Crises cluster.

16 Conclusion
The Web mining model of growing hierarchical self-organizing maps minimizes the effect of the dynamic data and high-dimensionality problems.
Bidirectional Update Propagation allows changes in pattern abstractions to be reflected on multiple levels in the hierarchy.
The Web-based News Coverage example demonstrates the effectiveness of growing hierarchical self-organizing maps when used in conjunction with bidirectional update propagation.


