Python Programming in Context Chapter 7. Objectives To use Python lists as a means of storing data To implement a nontrivial data mining application To.

1 Python Programming in Context Chapter 7

2 Objectives
- To use Python lists as a means of storing data
- To implement a nontrivial data mining application
- To understand and implement cluster analysis
- To use visualization as a means of displaying patterns

3 Cluster
- Data points that have something in common
- Clusters are dissimilar to each other
- Use simple Euclidean distance to measure how close one point is to another
- Centroid is a point that represents a cluster (not necessarily a real data point)
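As a rough illustration of the centroid idea on this slide, the sketch below averages each coordinate of a small cluster. The function name `centroid_of` and the sample points are hypothetical and do not come from the chapter.

```python
def centroid_of(points):
    """Return the coordinate-wise mean of a non-empty list of equal-length points."""
    dimensions = len(points[0])
    sums = [0] * dimensions
    for point in points:
        for ind in range(dimensions):
            sums[ind] += point[ind]
    # Divide each coordinate total by the number of points in the cluster.
    return [total / len(points) for total in sums]

cluster = [[1, 2], [3, 4], [5, 9]]   # made-up sample data
center = centroid_of(cluster)
print(center)                        # [3.0, 5.0]
print(center in cluster)             # False
```

Here the centroid `[3.0, 5.0]` is not one of the three data points, which matches the slide's remark that a centroid need not be a real data point.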

4 Figure 7.1

5 Figure 7.2

6 Figure 7.3

7 Figure 7.4

8 Listing 7.1
    import math

    def euclidD(point1, point2):
        sum = 0    # running total of squared differences (shadows the built-in sum)
        for index in range(len(point1)):
            diff = (point1[index]-point2[index]) ** 2
            sum = sum + diff
        euclidDistance = math.sqrt(sum)
        return euclidDistance
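A short usage sketch for the distance function in Listing 7.1, repeated here in condensed form so the block runs on its own. The sample points are made up. The comparison against `math.dist` assumes Python 3.8 or later, where that standard-library function is available.

```python
import math

def euclidD(point1, point2):
    # Same computation as Listing 7.1: square root of the summed squared differences.
    total = 0
    for index in range(len(point1)):
        total += (point1[index] - point2[index]) ** 2
    return math.sqrt(total)

d = euclidD([0, 0], [3, 4])
print(d)                                               # 5.0
# Cross-check against the standard library (Python 3.8+).
print(math.isclose(d, math.dist([0, 0], [3, 4])))      # True
```

Because the loop runs over `len(point1)`, the same function works for one-dimensional exam scores and for two-dimensional points.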

9 Figure 7.5

10 Listing 7.2
    def readFile(filename):
        datafile = open(filename, "r")
        datadict = {}
        key = 0
        for aline in datafile:
            key = key + 1
            score = int(aline)
            datadict[key] = [score]
        return datadict
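Listing 7.2 never closes the file it opens. One possible variation, shown only as a sketch, uses a `with` block so the file is closed automatically. The name `read_scores`, the temporary file, and the three scores are hypothetical and exist only to make the example runnable.

```python
import os
import tempfile

def read_scores(filename):
    """Map line numbers (starting at 1) to one-element score lists."""
    datadict = {}
    with open(filename, "r") as datafile:   # closed automatically on exit
        for key, aline in enumerate(datafile, start=1):
            datadict[key] = [int(aline)]    # int() ignores the trailing newline
    return datadict

# Write a small made-up data file, read it back, then clean up.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("90\n72\n85\n")
    sample_path = tmp.name

scores = read_scores(sample_path)
os.remove(sample_path)
print(scores)   # {1: [90], 2: [72], 3: [85]}
```

Each score is stored as a one-element list so that it can be passed to the distance function as a one-dimensional point.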

11 Indefinite Iteration
- Repeating a process an unknown number of times
- Control is based on a boolean expression
- Infinite loop is possible
- Any for loop can be written as a while loop

12 Listing 7.3
    while <condition>:
        statement1
        statement2
        ...
        statementn

13 Figure 7.6

14 Listing 7.4
    sum = 0
    for anum in range(1,11):
        sum = sum + anum
    print(sum)

15 Listing 7.5
    sum = 0
    anum = 1              # initialization
    while anum <= 10:     # condition
        sum = sum + anum
        anum = anum + 1   # change of state
    print(sum)

16 Listing 7.6
    sum = 0
    anum = 1
    while anum <= 10:     # anum never changes, so this loop never ends
        sum = sum + anum
    print(sum)            # never reached

17 Listing 7.7
    def readFile(filename):
        datafile = open(filename, "r")
        datadict = {}
        key = 0
        aline = datafile.readline()
        while aline != "":
            key = key + 1
            score = int(aline)
            datadict[key] = [score]
            aline = datafile.readline()
        return datadict

18 Creating Clusters
1. Decide on the number of clusters
2. Choose data points to be the initial centroids
3. Assign each data point to the cluster of its nearest centroid
4. Recompute the centroids
5. Repeat steps 3 and 4
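The steps on this slide can be sketched compactly for one-dimensional data, where the Euclidean distance reduces to an absolute difference. This is a minimal sketch, not the chapter's implementation: the function name `kmeans_1d`, the fixed starting centroids, and the scores are all assumptions chosen to keep the run deterministic.

```python
def kmeans_1d(values, centroids, repeats):
    """Run a fixed number of assign/recompute passes over 1-D data."""
    centroids = list(centroids)   # copy so the caller's list is not modified
    clusters = []
    for apass in range(repeats):
        # Assign each value to the cluster of its nearest centroid.
        clusters = [[] for c in centroids]
        for value in values:
            distances = [abs(value - c) for c in centroids]
            clusters[distances.index(min(distances))].append(value)
        # Recompute each centroid as the mean of its members.
        for i, members in enumerate(clusters):
            if members:           # keep the old centroid if the cluster is empty
                centroids[i] = sum(members) / len(members)
    return clusters, centroids

scores = [50, 52, 55, 88, 90, 95]   # made-up exam scores
clusters, centers = kmeans_1d(scores, [50, 95], repeats=3)
print(clusters)   # [[50, 52, 55], [88, 90, 95]]
print(centers)
```

With this data the assignment is already stable after the first pass, so the later passes leave the clusters unchanged.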

19 Listing 7.8
    import random

    def createCentroids(k, datadict):
        centroids=[]
        centroidCount = 0
        centroidKeys = []    # keys already chosen, to avoid duplicates
        while centroidCount < k:
            rkey = random.randint(1,len(datadict))
            if rkey not in centroidKeys:
                centroids.append(datadict[rkey])
                centroidKeys.append(rkey)
                centroidCount = centroidCount + 1
        return centroids
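Listing 7.8 picks k distinct keys by retrying until it finds an unused one. An alternative, offered only as a sketch and not as the chapter's approach, is the standard-library `random.sample`, which draws k distinct items in one call. The function name `create_centroids_sample` and the data are hypothetical.

```python
import random

def create_centroids_sample(k, datadict):
    """Pick k distinct data points from datadict to serve as initial centroids."""
    chosen_keys = random.sample(list(datadict), k)   # k distinct keys, no retries
    return [datadict[key] for key in chosen_keys]

random.seed(0)   # only to make this demo repeatable
data = {1: [90], 2: [72], 3: [85], 4: [60], 5: [77]}   # made-up scores
centroids = create_centroids_sample(3, data)
print(centroids)
```

Two differences from Listing 7.8 are worth noting. `random.sample` raises `ValueError` when k exceeds the number of data points, whereas the `while` loop in the listing would never terminate in that case. The sketch also does not require the keys to be the integers 1 through `len(datadict)`.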

20 Listing 7.9
    def createClusters(k, centroids, datadict, repeats):
        for apass in range(repeats):
            print("****PASS",apass,"****")
            clusters = []
            for i in range(k):
                clusters.append([])

            for akey in datadict:
                distances = []
                for clusterIndex in range(k):
                    dist = euclidD(datadict[akey],centroids[clusterIndex])
                    distances.append(dist)

                mindist = min(distances)
                index = distances.index(mindist)

                clusters[index].append(akey)

            dimensions = len(datadict[1])

21 Listing 7.9 continued
            for clusterIndex in range(k):
                sums = [0]*dimensions
                for akey in clusters[clusterIndex]:
                    datapoints = datadict[akey]
                    for ind in range(len(datapoints)):
                        sums[ind] = sums[ind] + datapoints[ind]
                for ind in range(len(sums)):
                    clusterLen = len(clusters[clusterIndex])
                    if clusterLen != 0:
                        sums[ind] = sums[ind]/clusterLen

                centroids[clusterIndex] = sums

            for c in clusters:
                print("CLUSTER")
                for key in c:
                    print(datadict[key], end=" ")
                print()

        return clusters

22 Figure 7.7

23 Listing 7.10
    def clusterAnalysis(dataFile):
        examDict = readFile(dataFile)
        examCentroids = createCentroids(5, examDict)
        examClusters = createClusters(5, examCentroids, examDict, 3)

    clusterAnalysis("cs150exams.txt")

24 Visualizing Clusters
- Earthquake data
- Show clusters on a map
- Use turtle module to plot data

25 Figure 7.8

26 Listing 7.11
    import turtle

    def visualizeQuakes(dataFile):
        datadict = readFile(dataFile)
        quakeCentroids = createCentroids(6, datadict)
        clusters = createClusters(6, quakeCentroids, datadict, 7)

        quakeT = turtle.Turtle()
        quakeWin = turtle.Screen()
        quakeWin.bgpic("worldmap.gif")
        quakeWin.screensize(448,266)
        quakeWin.setup(width=500, height=300)

        # scale factors from degrees to screen units
        wFactor = (quakeWin.screensize()[0]/2)/180
        hFactor = (quakeWin.screensize()[1]/2)/90

        quakeT.hideturtle()
        quakeT.up()

        colorlist = ["red","green","blue","orange","cyan","yellow"]

        # one color per cluster, one dot per earthquake
        for clusterIndex in range(6):
            quakeT.color(colorlist[clusterIndex])
            for akey in clusters[clusterIndex]:
                lon = datadict[akey][0]
                lat = datadict[akey][1]
                quakeT.goto(lon*wFactor,lat*hFactor)
                quakeT.dot()
        quakeWin.exitonclick()
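The scaling in Listing 7.11 maps longitude (-180 to 180) and latitude (-90 to 90) onto a canvas whose origin is at the center. The sketch below isolates that arithmetic without opening a turtle window. The function name `to_screen` and its default sizes (taken from the `screensize(448,266)` call in the listing) are assumptions for illustration.

```python
def to_screen(lon, lat, canvas_width=448, canvas_height=266):
    """Convert degrees of longitude/latitude to centered canvas coordinates."""
    w_factor = (canvas_width / 2) / 180    # canvas units per degree of longitude
    h_factor = (canvas_height / 2) / 90    # canvas units per degree of latitude
    return lon * w_factor, lat * h_factor

print(to_screen(0, 0))         # the center of the map
print(to_screen(180, 90))      # approximately the top-right corner
print(to_screen(-180, -90))    # approximately the bottom-left corner
```

The extreme coordinates land at roughly half the canvas width and height from the center, so every valid longitude/latitude pair falls inside the background image.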

27 Figure 7.9

