Web Intelligence: Complex Networks I


1 Web Intelligence: Complex Networks I
This is a lecture for week 6 of "Web Intelligence". Example networks in this lecture come from a fabulous site of Mark Newman, U of Michigan: http://www-personal.umich.edu/~mejn/

2 This part of the course: WI
(Columns: when, what, why, what else)
Sep 27: Complex Networks I
  Why: The WWW is a huge complex network. Many other networks are overlaid upon it. Networks have important and interesting properties, re: speed of information spread, robustness, speed and quality of search, etc.
  What else: Assignment 1 – worth 70% of CW
Oct 4: Complex Networks II
  What else: Assignment 2 – worth 15% of CW
Oct 11: Web Search: How Google works
  Why: Search, obviously, is central to web intelligence
  What else: Assignment 3 – worth 15% of CW
Nov 8: Text mining and knowledge discovery from the WWW
  Why: Towards inferring useful new knowledge automatically; also for better search, for non marked-up web
  What else: Reading
Nov 15: Web communities and cultural models
  Why: Understanding how the web influences the formation and behaviour of groups, and the spread of information
  What else: Reading

3 Introductory Points
Graphs and networks are of central importance to us, because:
- The web is a large and complex network.
- Major phenomena that underpin our existence, such as how information spreads, how diseases develop, and how economies evolve, are best viewed mathematically as networks.
- Networks have structural properties and behaviour.
When we analyse the structure of a network, we can reveal important clues about its behaviour. E.g.
- Predict how fast a virus or rumour will spread on the web
- Assess which are the most authoritative web sites
- Predict how long it will take to search sections of the web
- Predict how robust to damage an area of the WWW, or a cellular process, is

4 This Week’s Material
- Basic intro to graphs and networks, terminology, and so on.
- The interesting properties of real-world networks.
- Metrics and other structural properties that are currently used to analyse both the WWW and other networks.
To support the understanding of metrics and properties, this week we cover the basics of graphs and networks.

5 The very basics
A graph is a set of two things: G = {V, E}
- V = a set of vertices (also called nodes), e.g. V = {A, B, C, D}
- E = a set of edges (also called arcs, or links), e.g. E = { {A,C}, {A,D}, {B,C}, {B,D} }, in which each edge is a set of two vertices from V
This graph is:
[Figure: drawing of the graph on the nodes A, B, C, D]
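As a minimal sketch, the definition above can be written down directly in Python. The vertex and edge sets are the ones on the slide; the helper names `is_valid_graph` and `neighbours` are hypothetical and chosen only for illustration.

```python
# Vertex set and edge set exactly as given on the slide.
V = {"A", "B", "C", "D"}
# Each undirected edge is a frozenset of two vertices, so {A, C} == {C, A}.
E = {frozenset(pair) for pair in [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]}

def is_valid_graph(vertices, edges):
    """Check that every edge is a set of exactly two vertices drawn from V."""
    return all(len(e) == 2 and e <= vertices for e in edges)

def neighbours(v, edges):
    """Return the vertices that share an edge with v."""
    return {u for e in edges if v in e for u in e if u != v}

print(is_valid_graph(V, E))        # True
print(sorted(neighbours("A", E)))  # ['C', 'D']
```

Storing edges as frozensets mirrors the set notation {A,C}: the order of the two endpoints carries no meaning.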

6 The very basics II
- An undirected edge between A and B: {A, B} (or {B, A})
- A directed edge from A to B: (A, B)
- A loop at A: {A, A} or (A, A)
[Figure: drawings of a loop at A, an undirected edge A-B, and a directed edge A-B]
In an undirected graph, all edges are undirected. In a directed graph, all edges are directed.

7 The very basics III
The degree of a node, in an undirected graph, is the number of edges attached to it. In this one, the degrees are:
A: 2, B: 3, C: 3, D: 3, E: 0, F: 1, G: 2
What is the mean degree of this graph?
[Figure: undirected graph on the nodes A, B, C, D, E, F, G]
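A small sketch of how degrees and the mean degree can be computed from an edge list. The slide's drawing is not reproduced here, so the edge list below is an assumption: it is one possible graph that matches the listed degrees, not necessarily the one drawn. The mean degree depends only on the degrees, so it is the same for any such graph.

```python
# Hypothetical edge list, chosen only so that the degrees match the slide.
nodes = ["A", "B", "C", "D", "E", "F", "G"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("C", "D"), ("D", "G"), ("F", "G")]

def degrees(nodes, edges):
    """Count the edges attached to each node of an undirected, loop-free graph."""
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

deg = degrees(nodes, edges)
# Every edge contributes to two degrees, so the mean is 2 * |E| / |V|.
mean_degree = sum(deg.values()) / len(nodes)
print(deg)
print(mean_degree)  # 2.0
```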

8 The very basics IV
Nodes in directed graphs have in-degrees and out-degrees. Here, listed as node: in, out:
A: 1, 2
B: 1, 2
C: 2, 1
D: 2, 2
E: 1, 1
F: 1, 2
G: 0, 2
A directed graph without cycles is called a DAG (directed acyclic graph). Is this a DAG?
[Figure: directed graph on the nodes A, B, C, D, E, F, G]
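The following sketch shows one common way to compute in- and out-degrees and to test whether a directed graph is a DAG, using Kahn's algorithm (repeatedly removing nodes whose in-degree is 0). The four-node graph is a hypothetical example, not the graph drawn on the slide, and the function names are illustrative.

```python
from collections import deque

# Hypothetical directed graph: each pair (u, v) is an edge from u to v.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]

def in_out_degrees(nodes, edges):
    """Return two dicts: in-degree and out-degree of every node."""
    indeg = {n: 0 for n in nodes}
    outdeg = {n: 0 for n in nodes}
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    return indeg, outdeg

def is_dag(nodes, edges):
    """Kahn's algorithm: the graph is acyclic exactly when every node
    can be removed by repeatedly deleting nodes with in-degree 0."""
    indeg, _ = in_out_degrees(nodes, edges)
    successors = {n: [] for n in nodes}
    for u, v in edges:
        successors[u].append(v)
    queue = deque(n for n in nodes if indeg[n] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in successors[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return removed == len(nodes)

print(is_dag(nodes, edges))                  # True
print(is_dag(nodes, edges + [("D", "A")]))   # False: A -> B -> D -> A is a cycle
```

Each directed edge adds one to exactly one out-degree and one in-degree, so in any directed graph the in-degrees and the out-degrees sum to the same total, the number of edges.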

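9 The very basics V
This is an unlabelled graph. It is exactly the same as (isomorphic to) this one:
[Figure: two drawings of the same unlabelled graph]
This is a labelled graph:
[Figure: graph with nodes labelled homepage, teaching, research, graphs]
Since labels and links have meaning, this one is different:
[Figure: graph with the labels homepage, teaching, graphs, research attached to different nodes]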

10 Diversity of graphs: considering only loop-free graphs
- How many different 2-node, labelled undirected graphs are there?
- How many different 2-node, labelled directed graphs are there?
- How many different 3-node, labelled undirected graphs are there?
Suppose there are G(k) possible undirected labelled graphs on k nodes. Whenever we add one extra node to an undirected labelled graph on k nodes, any subset of the k existing nodes could link to it, and there are 2^k such subsets. So the number of possible undirected labelled graphs on k+1 nodes is 2^k times what it is on k nodes: G(k+1) = 2^k G(k).
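The counting argument above can be checked with a short sketch. It assumes G(1) = 1 (a single node with no edges) and uses the fact that unrolling the recurrence gives 2^(k(k-1)/2), one binary choice per possible edge. The function names are hypothetical.

```python
from itertools import combinations

def count_undirected(k):
    """Each of the k(k-1)/2 possible edges is either present or absent."""
    return 2 ** (k * (k - 1) // 2)

def count_directed(k):
    """Each of the k(k-1) ordered pairs of distinct nodes is present or absent."""
    return 2 ** (k * (k - 1))

def count_by_recurrence(k):
    """Apply G(j+1) = 2**j * G(j), starting from G(1) = 1."""
    g = 1
    for j in range(1, k):
        g *= 2 ** j
    return g

def count_by_enumeration(k):
    """Brute force: list every subset of the possible edges (small k only)."""
    possible = list(combinations(range(k), 2))
    return sum(1 for r in range(len(possible) + 1)
               for _ in combinations(possible, r))

print(count_undirected(2), count_directed(2), count_undirected(3))  # 2 4 8
```

The three approaches agree for small k, which is a useful sanity check before trusting the closed form for larger graphs.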

11 Example numbers for undirected labelled graphs
Size of graph    Number of possible graphs
5 nodes          1024
10 nodes         35,184,372,088,832
20 nodes         1.6 × 10^57
100 nodes        1.3 × 10^1490
1000 nodes       a lot.
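A hedged sketch of how figures like these can be reproduced: it assumes the count on n nodes is 2^(n(n-1)/2) and works with base-10 logarithms so that very large values never have to be formatted in full. The function name `num_graphs_sci` is hypothetical.

```python
import math

def num_graphs_sci(n):
    """Return (mantissa, exponent) such that
    2**(n(n-1)/2) is approximately mantissa * 10**exponent."""
    log10_count = (n * (n - 1) // 2) * math.log10(2)
    exponent = math.floor(log10_count)
    mantissa = 10 ** (log10_count - exponent)
    return mantissa, exponent

for n in (5, 10, 20, 100):
    m, e = num_graphs_sci(n)
    print(f"{n} nodes: about {m:.1f} x 10^{e}")
```

For small n the exact integer is easy to obtain directly, for example `2 ** 45` for 10 nodes.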

12 More basics
If there is a path in the graph from each node to every other, the graph is connected; otherwise it is unconnected. This one?
[Figure: graph on the nodes A, B, C, D, E, F, G]
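One simple way to test connectivity, sketched here for undirected graphs, is a breadth-first search from any node: the graph is connected exactly when the search reaches every node. The example uses the four-node graph from slide 5; the extra isolated node E in the second call is an assumption added for contrast.

```python
from collections import deque

def is_connected(nodes, edges):
    """Breadth-first search from one node; connected iff every node is reached."""
    nodes = list(nodes)
    if not nodes:
        return True
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for v in adj[u] - seen:
            seen.add(v)
            queue.append(v)
    return len(seen) == len(nodes)

slide5_edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]
print(is_connected(["A", "B", "C", "D"], slide5_edges))       # True
print(is_connected(["A", "B", "C", "D", "E"], slide5_edges))  # False: E is isolated
```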

13 More basics II
The complete (undirected) graph on n nodes is the graph that contains all n(n − 1)/2 possible edges. Is this one complete?
[Figure: graph on the nodes A, B, C, D]
Most graphs of interest and importance are far from complete; they tend to be called sparse.
Think about the following graphs:
1. Nodes = students in this university; edge {A,B} exists if A and B have the same birthday.
2. Nodes = web pages; edge (A,B) exists if A links to B.
3. Nodes = types of molecules in our bloodstream; edge (A,B) exists if A interacts with B.
4. Nodes = all living humans; edge {A,B} exists if A and B have ever shaken hands.
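A minimal sketch of a completeness check and of edge density (the fraction of the n(n − 1)/2 possible edges that are present), which is one common way to quantify how sparse a graph is. The example reuses the edge set from slide 5; whether that is the graph drawn on this slide is not known, and the function names are hypothetical.

```python
from itertools import combinations

def max_edges(n):
    """Number of edges in the complete undirected graph on n nodes."""
    return n * (n - 1) // 2

def is_complete(nodes, edges):
    """True if every pair of distinct nodes is joined by an edge."""
    present = {frozenset(e) for e in edges}
    return all(frozenset(pair) in present for pair in combinations(nodes, 2))

def density(nodes, edges):
    """Fraction of possible edges that are present (0.0 for fewer than 2 nodes)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    return len({frozenset(e) for e in edges}) / max_edges(n)

nodes = ["A", "B", "C", "D"]
slide5_edges = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]
print(max_edges(len(nodes)))             # 6
print(is_complete(nodes, slide5_edges))  # False: {A,B} and {C,D} are missing
print(density(nodes, slide5_edges))
```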

14 More Structural Properties
- Diameter: the length of the longest shortest path between any two nodes, i.e. the largest distance between any pair of nodes.
- Number of components: in undirected graphs, the number of connected pieces.
- Degree distribution: an interesting and important fingerprint of a graph that we will see more of.
- Modularity: a graph is highly modular if it has several clusters of nodes with many links within the clusters, but few links between the clusters.
- Hierarchical modularity: a graph seems to be hierarchically modular if it is modular, as above, but the modules are themselves modular.
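The first three properties can be computed with breadth-first search, as in the sketch below. The seven-node graph is a hypothetical example (one graph consistent with the degrees listed on slide 7), and the diameter is taken within components, which is an assumption for graphs that are not connected.

```python
from collections import Counter, deque

def adjacency(nodes, edges):
    """Build an adjacency map for an undirected graph."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def components(adj):
    """List the connected components as sets of nodes."""
    seen, comps = set(), []
    for n in adj:
        if n not in seen:
            comp = set(bfs_distances(adj, n))
            seen |= comp
            comps.append(comp)
    return comps

def diameter(adj):
    """Longest shortest-path length over pairs of nodes in the same component."""
    return max(max(bfs_distances(adj, n).values()) for n in adj)

def degree_distribution(adj):
    """Map each degree to the number of nodes having that degree."""
    return Counter(len(nbrs) for nbrs in adj.values())

# Hypothetical example graph.
nodes = ["A", "B", "C", "D", "E", "F", "G"]
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"),
         ("C", "D"), ("D", "G"), ("F", "G")]
adj = adjacency(nodes, edges)
print(len(components(adj)))      # 2: E is on its own
print(diameter(adj))             # 4: e.g. A-B-D-G-F
print(degree_distribution(adj))
```

Modularity is left out of this sketch because measuring it requires choosing a clustering of the nodes first.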

15 Some Networks One of these is a network of protein interactions in yeast. The other is a visualisation of an outbreak of TB. What do the nodes and edges represent? And … which is which?

16 Is this: spread of HIV infection (node = person / link = HIV transfer) or is it: books about politics (node = book / link = one mentions the other)

17 Notice how the book network is polarised

18 The internet

19 Assignment 1
Read: Exploring Complex Networks, by Steven Strogatz, Nature 410, 268-276
Write: A 500-word "executive summary" of most of this article. Leave out Box 1 and the section "Regular networks of coupled dynamical systems"; restart at "Complex network architectures".
AND
Write: A 100-word account of what you assess to be the three main points conveyed by this article.
Write: A 200-word essay about the relevance of those points to the topic of your BSc or MSc (e.g. relevance to AI; relevance to IT (Business), etc.)
Word limits in this assignment are important; going over the limits means losing marks.

20 Marking
- 30% of the marks: completeness and readability
- 30% of the marks: evidence of understanding the article, and generally making sense
- 30% of the marks: clarity of your arguments
- 10% of the marks: for making me say “Wow”

21 Next week
Much more advanced, about:
- Degree distributions
- Clustering coefficients
- Modularity and hierarchy
- Random networks vs real networks
- Some basic graph algorithms
Another article, much smaller, to read.

