Main Project total points: 500


1 Main Project total points: 500
200/500 = 40% finished by March 27: Introduction, Background, Partial Results/Discussion, Acknowledgement, Author contribution, funding/conflicts, References
250/500 = 50% finished by April 5
400/500 = 80% finished by April 17
500/500 = 100% finished by April 26

2 See LABS/ directory for useful R files
But you may want to focus on Euclidean distance AFTER normalizing. From databasics3900.r:

```r
library(TDAmapper)  # provides mapper1D()

data2 <- iris[, 1:4]  # assumed: data2 holds the four numeric iris columns

# one way to normalize data
scaledata2 <- scale(data2)  # scales data so that mean = 0, sd = 1
colMeans(scaledata2)        # faster version of apply(scaledata2, 2, mean)
# shows that the mean of each column is 0 (up to rounding, on the order of 1e-17)
apply(scaledata2, 2, sd)    # shows that the standard deviation of each column is 1

P <- scaledata2[, "Petal.Length"]  # choose filter: a numeric vector, one value per row
m1 <- mapper1D(                    # apply mapper
  distance_matrix = dist(data.frame(scaledata2)),
  filter_values = P,
  num_intervals = 10,
  percent_overlap = 50,
  num_bins_when_clustering = 10)

# save data to current working directory as a text file
write.table(scaledata2, "data.txt", sep = " ", row.names = FALSE, col.names = FALSE)
```
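To see what scale() does, here is a small base-R check. It uses the four numeric columns of the built-in iris data (an assumption; the lab's own data may differ): subtracting each column mean and dividing by each column standard deviation reproduces scale(), after which Euclidean distances compare columns on the same footing.

```r
# Check that scale() = (x - column mean) / column sd, using the built-in iris data
X <- as.matrix(iris[, 1:4])                          # four numeric columns
centered <- sweep(X, 2, colMeans(X))                 # subtract each column mean
manual <- sweep(centered, 2, apply(X, 2, sd), "/")   # divide by each column sd
auto <- scale(X)

max_diff <- max(abs(manual - auto))  # agreement up to floating-point error
max_diff

# Before scaling the columns have different spreads; after scaling every sd is 1
round(apply(X, 2, sd), 2)
round(apply(auto, 2, sd), 2)

# Euclidean distances on the scaled data (the kind of matrix mapper1D receives)
d_scaled <- dist(auto)
```

Without scaling, the column with the largest spread (Petal.Length in iris) would dominate every Euclidean distance.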

3 https://writingcenter.uiowa.edu/#services

4 Mini-presentations in class:
End of March or beginning of April, on anything related to your project; 5–10 minutes per person. Visit the Speaking Center before your presentation and submit a summary of the visit: what did you learn?

5 https://speakingcenter.uiowa.edu/about-us

6 Modified from http://www.garrreynolds.com/preso-tips/design/
1. Keep it simple. Lots of white space is good: the less clutter you have on your slide, the more powerful your visual message will become.
2. Limit bullet points and text.
3. Limit transitions and builds (animation). Only use animations that illustrate a point; don’t use unnecessary animations.
4. Use high-quality graphics.
5. Have a visual theme, but avoid using PowerPoint templates.
6. Use appropriate charts.
7. Use color well.
8. Choose your fonts well. Use the same font set throughout your entire slide presentation, and use no more than two complementary sans-serif fonts (e.g., Arial and Arial Bold).
9. Use video or audio when appropriate.
10. Organize your talk: spend time in the slide sorter (or print out your slides at least 6 to a page).

7 In an assertion–evidence slide, the headline is a sentence that succinctly states the slide’s main message.
Photograph, drawing, diagram, or graph supporting the headline message (no bulleted list).
This file presents a template for making Assertion–Evidence (A–E) slides in a technical presentation. The design advocated by this template arises from pages of The Craft of Scientific Presentations (Springer, 2003) and from the first Google listing for “presentation slides”:
To follow this template, make sure that you create the slide within this PowerPoint file. Working with a New Slide (under Insert in older versions, or as a button on the Home tab in version 2007), you should first craft a sentence headline that states an assertion about your topic. Having no assertion translates to having no slide. In the body of the slide, you should then support that headline assertion visually: photographs, drawings, diagrams, equations, or words arranged visually. Use supporting text only where necessary. Do not use bulleted lists, because bulleted lists do not reveal the connections between details. This slide shows one orientation for the image and supporting text. Other orientations exist, as shown in the sample slides that follow.
Call-out(s), if needed: no more than two lines.
PowerPoint Template

8 Evaluating-Ayasdi’s-Topological-Data-Analysis-For-Big-Data_HKim2015.pdf

9 We are not (currently) covering persistent homology including barcodes

10

11 Icon Quiz 9: Reading (10 points; due 4/6 at 7:00 AM) over the first page:

12 Introduction
The purpose of this paper is to introduce a new method for the qualitative analysis, simplification and visualization of high dimensional data sets, as well as the qualitative analysis of functions on these data sets.

13 Introduction

14 Example: Point cloud data representing a hand.
A) Data set. Example: point cloud data representing a hand.
B) Function $f : \text{Data Set} \to \mathbb{R}$. Example: the x-coordinate, $f(x, y, z) = x$.
Put the data into overlapping bins. Example: $f^{-1}((a_i, b_i))$.
Cluster each bin and create a network. Vertex = a cluster of a bin. Edge = nonempty intersection between clusters.
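The steps on this slide (filter, overlapping bins, cluster each bin, connect clusters that share points) can be sketched in base R. This is a minimal illustration, not TDAmapper's implementation: the helper name mapper_sketch, the fixed-height single-linkage clustering (parameter h), and the circle test data are all assumptions chosen for clarity.

```r
# Minimal 1D Mapper sketch in base R (hypothetical helper, not TDAmapper's code).
# Assumption: single-linkage clustering cut at a fixed height h stands in for
# the clustering rule a real Mapper implementation would use.
mapper_sketch <- function(X, f, num_intervals = 5, percent_overlap = 50, h = 0.2) {
  p <- percent_overlap / 100
  lo <- min(f)
  hi <- max(f)
  len <- (hi - lo) / (1 + (num_intervals - 1) * (1 - p))  # intervals exactly cover [lo, hi]
  step <- len * (1 - p)
  vertices <- list()  # one vertex = row indices of one cluster inside one bin
  for (i in seq_len(num_intervals)) {
    a <- lo + (i - 1) * step
    b <- if (i == num_intervals) hi else a + len
    idx <- which(f >= a & f <= b)  # bin = preimage of [a, b] under f
    if (length(idx) == 0) next
    if (length(idx) == 1) {
      labels <- 1L
    } else {
      hc <- hclust(dist(X[idx, , drop = FALSE]), method = "single")
      labels <- cutree(hc, h = h)
    }
    for (k in unique(labels)) vertices[[length(vertices) + 1]] <- idx[labels == k]
  }
  # Edge = two clusters that share at least one data point
  n <- length(vertices)
  edges <- matrix(integer(0), ncol = 2)
  if (n >= 2) {
    for (u in 1:(n - 1)) {
      for (v in (u + 1):n) {
        if (length(intersect(vertices[[u]], vertices[[v]])) > 0) edges <- rbind(edges, c(u, v))
      }
    }
  }
  list(vertices = vertices, edges = edges)
}

# Usage: 100 points on the unit circle, filtered by the x-coordinate
theta <- pi * (2 * (0:99) + 1) / 100
X <- cbind(cos(theta), sin(theta))
g <- mapper_sketch(X, f = X[, 1], num_intervals = 5, percent_overlap = 50, h = 0.2)
length(g$vertices)  # 8
nrow(g$edges)       # 8
```

With the x-coordinate as filter, each end of the x-range gives one cluster and each of the three middle bins splits into an upper and a lower arc, so the network has 8 vertices joined into a single loop: the circle's shape survives the reduction.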

15 Introduction

16 Different types of data sets

17 Creating overlapping bins
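One common way to build overlapping bins for a real-valued filter is to choose the interval length so that num_intervals intervals, each overlapping its neighbor by percent_overlap percent, exactly cover the filter's range. A minimal sketch; the helper name overlapping_intervals is hypothetical.

```r
# Hypothetical helper: endpoints of num_intervals overlapping intervals covering [lo, hi]
overlapping_intervals <- function(lo, hi, num_intervals, percent_overlap) {
  p <- percent_overlap / 100
  # len + (num_intervals - 1) * step = hi - lo, with step = len * (1 - p)
  len <- (hi - lo) / (1 + (num_intervals - 1) * (1 - p))
  starts <- lo + (seq_len(num_intervals) - 1) * len * (1 - p)
  cbind(start = starts, end = starts + len)
}

bins <- overlapping_intervals(0, 1, num_intervals = 3, percent_overlap = 50)
bins  # rows: [0, 0.5], [0.25, 0.75], [0.5, 1]
```

Raising percent_overlap lengthens every interval, so more points land in two bins and more edges appear in the resulting network.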

18 Filter function: eccentricity
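Eccentricity measures how far a point sits from the "center" of the data. A minimal sketch, assuming the definition $E_p(x) = \left(\frac{1}{N}\sum_{y} d(x, y)^p\right)^{1/p}$ with Euclidean $d$ (our recollection of the definition in Singh, Mémoli, and Carlsson; the software behind these slides may differ). The helper name eccentricity is hypothetical.

```r
# Hypothetical helper: eccentricity filter E_p(x) = (mean over y of d(x, y)^p)^(1/p)
eccentricity <- function(X, p = 1) {
  D <- as.matrix(dist(X))                               # pairwise Euclidean distances
  if (is.infinite(p)) return(unname(apply(D, 1, max)))  # p = Inf: farthest distance
  unname((rowSums(D^p) / nrow(D))^(1 / p))
}

X <- matrix(c(0, 1, 2), ncol = 1)  # three points on a line
eccentricity(X, p = 1)             # 1, 2/3, 1: the middle point is least eccentric
eccentricity(X, p = Inf)           # 2, 1, 2
```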

19 knn distance with k = 5, 3 intervals, 50% overlap

20 knn distance with k = 5, 3 intervals, 50% overlap

21 knn distance with k = 5, 50% overlap
Panels: 3 intervals, 5 intervals, 10 intervals, 100 intervals

22 knn distance with k = 50, 50% overlap
Panels: 3 intervals, 5 intervals, 10 intervals, 100 intervals
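The knn distance filter behaves like an inverse density estimate. A minimal sketch, assuming it means the distance from each point to its k-th nearest neighbor (some implementations average over the k nearest neighbors instead); the helper name knn_distance is hypothetical. Points in dense regions get small values and isolated points get large ones, which is why changing k from 5 to 50 changes the picture.

```r
# Hypothetical helper: distance from each point to its k-th nearest neighbor
knn_distance <- function(X, k = 5) {
  D <- as.matrix(dist(X))
  # sort(row)[1] is the point's zero distance to itself, so the k-th neighbor is at k + 1
  unname(apply(D, 1, function(row) sort(row)[k + 1]))
}

X <- matrix(c(0, 1, 2, 3, 10), ncol = 1)  # four clustered points and one outlier
knn_distance(X, k = 1)                    # 1 1 1 1 7: the outlier stands out
```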

23 Distance Matrix Eigenvector, Mean Centered Distance Matrix
Order of eigenvector: 0; 5 intervals, 50% overlap
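A sketch of a distance-matrix eigenvector filter. We assume "mean centered" means double-centering as in classical multidimensional scaling (square the distances, remove row and column means) and that eigenvectors are taken in decreasing order of eigenvalue, with order 0 the leading one. The tool used for these slides may center differently; the helper name dm_eigenvector_sketch is hypothetical.

```r
# Hypothetical helper: eigenvector of a double-centered distance matrix as a filter
dm_eigenvector_sketch <- function(X, order = 0) {
  D <- as.matrix(dist(X))
  n <- nrow(D)
  J <- diag(n) - matrix(1 / n, n, n)  # centering matrix I - (1/n) 1 1^T
  B <- -0.5 * J %*% (D^2) %*% J       # double-centered squared distances
  e <- eigen(B, symmetric = TRUE)     # eigenvalues sorted in decreasing order
  e$vectors[, order + 1]              # order 0 = leading eigenvector
}

# Points on a line: the leading eigenvector is proportional to the centered position
X <- cbind(1:10, 0)
v0 <- dm_eigenvector_sketch(X, order = 0)
cor(v0, 1:10)  # +1 or -1 (the sign of an eigenvector is arbitrary)
```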

24 Distance Matrix Eigenvector, Mean Centered Distance Matrix
Order of eigenvector: 1; 5 intervals, 50% overlap

25 Distance Matrix Eigenvector, Mean Centered Distance Matrix
Order of eigenvector: 1; 20 intervals, 20% overlap

26 Distance Matrix Eigenvector, Mean Centered Distance Matrix
Order of eigenvector: 1; 20 intervals, 50% overlap

27 Distance Matrix Eigenvector, Mean Centered Distance Matrix
Order of eigenvector: 1; 20 intervals, 80% overlap

28 Distance Matrix Eigenvector, Mean Centered Distance Matrix
Order of eigenvector: 1; 20 intervals, 80% overlap; balanced

29 Introduction
Examples:
1.) $f(x) = \|x\|$
2.) $g(x_1, \ldots, x_{n-1}) = x_n$
3.) DSGA decomposition of the original tumor vector into the Normal component (its linear-model fit onto the Healthy State Model) and the Disease component (the vector of residuals).
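The first two example functions are easy to compute as filters on a data matrix with one point per row. In this sketch we read $g$ as returning the last coordinate of each point (our interpretation of the slide's $g$); the helper names norm_filter and last_coord_filter are hypothetical.

```r
norm_filter <- function(X) sqrt(rowSums(X^2))  # f(x) = ||x||, Euclidean norm of each row
last_coord_filter <- function(X) X[, ncol(X)]  # last coordinate x_n of each row

X <- rbind(c(3, 4), c(0, 1))  # two points in R^2
norm_filter(X)                # 5 1
last_coord_filter(X)          # 4 1
```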

30 Some quantitative analysis is also possible

31 3.2.2.2 Insight by Ranked Variables
Going back to the Titanic example, the KS statistics show that the variable “Sex” is the most strongly related to passenger death. We could generally assume that men gave up their places in lifeboats to women. Furthermore, it is possible to deduce the subtler reasons for the deaths in each group. The passengers in group A died for two reasons: they were men, and their cabin class was low. The passengers in group B died because they were men. Finally, the passengers in group C died because they were staying in third class, even though most of them were women.
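Ranking variables by a two-sample Kolmogorov–Smirnov statistic can be sketched with base R's ks.test: for each variable, compare its values inside a group against its values in the rest of the data and sort by the statistic D. This is an illustrative sketch on toy data, not the procedure or the Titanic data from the paper; the helper name rank_by_ks and the column names are hypothetical.

```r
# Hypothetical helper: rank columns of df by the KS statistic D between a group and the rest
rank_by_ks <- function(df, in_group) {
  stats <- sapply(df, function(col) {
    # suppressWarnings: ties (e.g. 0/1 variables such as sex) only affect the p-value
    unname(suppressWarnings(ks.test(col[in_group], col[!in_group]))$statistic)
  })
  sort(stats, decreasing = TRUE)
}

in_group <- rep(c(TRUE, FALSE), each = 10)  # first 10 rows form the group
df <- data.frame(
  strong = c(11:20, 1:10),                  # group values all above the rest
  weak   = c(seq(1, 19, 2), seq(2, 20, 2))  # group and rest interleave
)
ranked <- rank_by_ks(df, in_group)
ranked  # strong = 1, weak = 0.1
```

D ranges from 0 (identical distributions) to 1 (completely separated), so the top-ranked variable is the one that best distinguishes the group.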

32 Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition
Singh, Gurjeet; Memoli, Facundo; Carlsson, Gunnar

33

34 We propose a method which can be used to reduce high dimensional data sets into simplicial complexes with far fewer points which can capture topological and geometric information at a specified resolution.

35

36 Building blocks for a simplicial complex
0-simplex = vertex = $v$
1-simplex = edge = $\{v_1, v_2\}$. Note that the boundary of this edge is $v_2 + v_1$.
2-simplex = triangle = $\{v_1, v_2, v_3\}$, with edges $e_1$, $e_2$, $e_3$. Note that the boundary of this triangle is the cycle $e_1 + e_2 + e_3 = \{v_1, v_2\} + \{v_2, v_3\} + \{v_1, v_3\}$.
The building blocks for a simplicial complex consist of 0-simplices, which are zero-dimensional vertices; 1-simplices, which are one-dimensional edges; and 2-simplices, which are two-dimensional triangles.
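The mod-2 boundary used on this slide (where $\{v_1, v_2\}$ has boundary $v_2 + v_1$) can be sketched in base R: the boundary of a simplex is the sum of the faces obtained by dropping one vertex, and in a sum of simplices any face that appears an even number of times cancels. The helper names boundary and boundary_chain are hypothetical; vertices are labeled by integers.

```r
# Faces of a simplex: drop one vertex at a time (a single vertex has empty boundary)
boundary <- function(simplex) {
  if (length(simplex) == 1) return(list())
  lapply(seq_along(simplex), function(i) simplex[-i])
}

# Mod-2 boundary of a chain (a list of simplices): keep faces occurring an odd number of times
boundary_chain <- function(chain) {
  faces <- unlist(lapply(chain, boundary), recursive = FALSE)
  if (length(faces) == 0) return(character(0))
  keys <- vapply(faces, function(s) paste(sort(s), collapse = ","), character(1))
  counts <- table(keys)
  names(counts)[counts %% 2 == 1]
}

triangle <- c(1, 2, 3)
boundary_chain(list(triangle))      # "1,2" "1,3" "2,3": the cycle e1 + e2 + e3
boundary_chain(boundary(triangle))  # character(0): each vertex appears twice and cancels
```

The second call shows why the boundary of the triangle is a cycle: its own boundary is zero.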

37 Creating a simplicial complex
1.) Next, add 1-dimensional edges (1-simplices). Note: these edges must connect two vertices; that is, the boundary of an edge is two vertices.

38 Creating a simplicial complex

39 Creating a simplicial complex
n.) Add n-dimensional n-simplices, $\{v_1, v_2, \ldots, v_{n+1}\}$. The boundary of an n-simplex is a cycle consisting of (n-1)-simplices.
In step n, we can add an n-simplex, but only if its boundary, a sum of (n-1)-dimensional simplices, was created in the previous step (step n-1, where we added (n-1)-simplices). In our example, the highest-dimensional simplex is a 3-simplex, this solid tetrahedron, so we have a 3-dimensional simplicial complex. This partitioning of an object into 0-simplices, 1-simplices, 2-simplices, etc. is called a triangulation of the object. An object is called a simplicial complex if it has a triangulation. Let us now triangulate a couple of familiar examples.
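The rule on this slide (an n-simplex may be added only once its boundary faces are present) amounts to requiring that a collection of simplices be closed under taking faces. A minimal check in base R with a hypothetical helper name; as a usage example we take the hollow tetrahedron (all vertices, all edges, and the four triangles), which triangulates the sphere.

```r
# Hypothetical helper: TRUE if every face of every simplex is also in the collection
is_simplicial_complex <- function(simplices) {
  key <- function(s) paste(sort(s), collapse = ",")
  keys <- vapply(simplices, key, character(1))
  for (s in simplices) {
    if (length(s) > 1) {
      # checking faces one dimension down suffices: those faces are checked in turn
      for (i in seq_along(s)) {
        if (!(key(s[-i]) %in% keys)) return(FALSE)
      }
    }
  }
  TRUE
}

# Hollow tetrahedron on vertices 1..4
vertices  <- as.list(1:4)
edges     <- combn(4, 2, simplify = FALSE)
triangles <- combn(4, 3, simplify = FALSE)
sphere <- c(vertices, edges, triangles)

is_simplicial_complex(sphere)                  # TRUE
is_simplicial_complex(c(vertices, triangles))  # FALSE: triangles added without their edges
```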

40 Example: Triangulating the disk.
disk $= \{x \in \mathbb{R}^2 : \|x\| \le 1\}$. Recall that a triangle is topologically equivalent to a circular disk. So as long as I have 3 edges forming a triangle, I can add a 2-dimensional face.

41 Example: Triangulating the circle.
disk $= \{x \in \mathbb{R}^2 : \|x\| \le 1\}$. I now get to punch my topological triangle in order to form the bottom hemisphere. First image from

42 Example: Triangulating the sphere.
sphere $= \{x \in \mathbb{R}^3 : \|x\| = 1\}$. And finally, we fill in the topological triangles with two-dimensional faces. We fill in this front face with a triangle. We fill in this side face with a triangle, as well as this back face. The bottom hemisphere is also a topological triangle. Its boundary consists of three edges, so we can also fill it in with a face. In case you don’t see that,

43 Triangulation of a torus

44 Distance Matrix Eigenvector, Mean Centered Distance Matrix
Order of eigenvector: 1; 20 intervals, 80% overlap

45

46 Create overlapping bins:

47 Create overlapping bins:

48 Which choice refers to resolution?
We propose a method which can be used to reduce high dimensional data sets into simplicial complexes with far fewer points which can capture topological and geometric information at a specified resolution. Resolution means ????

49 The idea is to provide another tool for a generalized notion of coordinatization for high dimensional data sets. Coordinatization can of course refer to a choice of real-valued coordinate functions on a data set,

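50 Dimensionality Reduction: Given a data set $D \subset \mathbb{R}^N$
Want: an embedding $f : D \to \mathbb{R}^n$, where $n \ll N$, which “preserves” the structure of the data. Many reduction methods build it from real-valued functions $f_1 : D \to \mathbb{R}, f_2 : D \to \mathbb{R}, \ldots, f_n : D \to \mathbb{R}$, giving $(f_1, f_2, \ldots, f_n) : D \to \mathbb{R}^n$. Many are linear, $M : \mathbb{R}^N \to \mathbb{R}^n$, $Mx = y$, but there are also nonlinear dimensionality reduction algorithms.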

51 Example: Principal component analysis (PCA)
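PCA is the standard example of a linear reduction $Mx = y$. A sketch with base R's prcomp on the scaled iris measurements (the choice of data and of $n = 2$ components is ours): the component scores are the centered data multiplied by the first two columns of the rotation matrix, so $M$ is the transpose of those columns.

```r
X <- scale(as.matrix(iris[, 1:4]))  # N = 4 standardized measurements
pca <- prcomp(X)                    # principal component analysis
M <- t(pca$rotation[, 1:2])         # linear map R^4 -> R^2 (a 2 x 4 matrix)

# Apply y = M x to every centered row; prcomp centers the data internally
Xc <- sweep(X, 2, pca$center)
Y <- Xc %*% t(M)                    # 150 x 2 embedding

max(abs(Y - pca$x[, 1:2]))          # ~0: matches prcomp's own scores
```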

52 https://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction

53 $f_1 : D \to \mathbb{R}, f_2 : D \to \mathbb{R}, \ldots, f_n : D \to \mathbb{R}$; $(f_1, f_2, \ldots, f_n) : D \to \mathbb{R}^n$

54 Goal: $f : D \to S^1$ which “preserves” the structure of the data.
Circle courtesy of knotplot.com
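For data lying near a circle, the angle is a natural circle-valued coordinate $f : D \to S^1$: nearby points get nearby angles, and going once around the data goes once around $S^1$, which no single continuous real-valued coordinate can do without folding points together. A minimal sketch on synthetic points (our own example), representing $S^1$ by angles in $[0, 2\pi)$:

```r
theta <- seq(0, 2 * pi, length.out = 13)[-13]  # 12 evenly spaced angles in [0, 2*pi)
X <- cbind(cos(theta), sin(theta))             # points on the unit circle in R^2

# Circle-valued coordinate: angle of each point, moved from (-pi, pi] into [0, 2*pi)
f_circle <- atan2(X[, 2], X[, 1]) %% (2 * pi)

# A real-valued coordinate such as x folds the circle:
# f_real[2] and f_real[12] (theta = pi/6 and 11*pi/6) are equal
f_real <- X[, 1]

max(abs(f_circle - theta))  # ~0: the angle recovers each point's position
```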

55 circle courtesy of knotplot.com

56 The idea is to provide another tool for a generalized notion of coordinatization for high dimensional data sets. Coordinatization can of course refer to a choice of real-valued coordinate functions on a data set, but other notions of geometric representation (e.g., the Reeb graph [Ree46]) are often useful and reflect interesting information more directly.

57 https://en.wikipedia.org/wiki/Reeb_graph

