Main Project total points: 500


200/500 = 40% finished by March 27
Introduction, Background, Partial Results/Discussion, Acknowledgement, Author contribution, funding/conflicts, References
250/500 = 50% finished by April 5
400/500 = 80% finished by April 17
500/500 = 100% finished by April 26

XRDS • SUMMER 2014 • VOL. 20 • NO. 4

1,797 data points
Data point: 8x8 matrix
Distance metric: Euclidean
Filter function: principal SVD values
Node colors: filter values, red = high and blue = low
Node labels: most frequently occurring digit in the associated clusters
5 intervals with 50 percent overlap.
15 intervals with 50 percent overlap.
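The "5 intervals with 50 percent overlap" setting describes how the range of filter values is covered before clustering. The following is a minimal sketch of one common convention, not necessarily the one used to produce these figures: equal-length intervals in which each interval shares half its length with the next. The function names `overlapping_cover` and `assign_to_intervals` and the toy filter values are illustrative assumptions.

```python
def overlapping_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with n equal-length intervals.

    Assumed convention: adjacent intervals share `overlap` (a fraction
    in [0, 1)) of their length.
    """
    if n_intervals < 1 or not 0 <= overlap < 1 or hi <= lo:
        raise ValueError("need n_intervals >= 1, 0 <= overlap < 1, hi > lo")
    # n intervals of length L, each shifted by L * (1 - overlap), span
    # exactly hi - lo, so L = (hi - lo) / (1 + (n - 1) * (1 - overlap)).
    length = (hi - lo) / (1 + (n_intervals - 1) * (1 - overlap))
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def assign_to_intervals(filter_values, intervals):
    """For each interval, list the indices of points whose filter value lies in it."""
    return [
        [i for i, f in enumerate(filter_values) if a <= f <= b]
        for a, b in intervals
    ]


# Toy filter values, made up for illustration (not the digits data).
cover = overlapping_cover(0.0, 6.0, n_intervals=5, overlap=0.5)
members = assign_to_intervals([0.5, 1.5, 2.0, 5.9], cover)
```

Because the intervals overlap, a point can land in more than one of them. Mapper then clusters the points inside each interval separately and connects clusters that share points.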

We currently maintain 360 data sets as a service to the machine learning community.

Source: E. Alpaydin, C. Kaynak, Department of Computer Engineering, Bogazici University, 80815 Istanbul Turkey, alpaydin '@' boun.edu.tr

Data Set Information: We used preprocessing programs made available by NIST to extract normalized bitmaps of handwritten digits from a preprinted form. From a total of 43 people, 30 contributed to the training set and a different 13 to the test set. The 32x32 bitmaps are divided into nonoverlapping blocks of 4x4, and the number of on pixels is counted in each block. This generates an input matrix of 8x8 where each element is an integer in the range 0..16. This reduces dimensionality and gives invariance to small distortions. For info on NIST preprocessing routines, see M. D. Garris, J. L. Blue, G. T. Candela, D. L. Dimmick, J. Geist, P. J. Grother, S. A. Janet, and C. L. Wilson, NIST Form-Based Handprint Recognition System, NISTIR 5469, 1994.

Attribute Information: All input attributes are integers in the range 0..16. The last attribute is the class code 0..9.
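The block-counting step described above can be sketched as follows. This is an illustrative reimplementation in plain Python, not the NIST preprocessing code; the function name `block_counts` and the toy bitmap are assumptions made for the example.

```python
def block_counts(bitmap, block=4):
    """Count 'on' pixels (1s) in each nonoverlapping block x block tile.

    `bitmap` is a square list of lists of 0/1 values. A 32x32 bitmap with
    block=4 yields an 8x8 matrix whose entries lie in the range 0..16.
    """
    size = len(bitmap)
    if size % block or any(len(row) != size for row in bitmap):
        raise ValueError("bitmap must be square with side divisible by block")
    n = size // block
    return [
        [
            sum(
                bitmap[r][c]
                for r in range(br * block, (br + 1) * block)
                for c in range(bc * block, (bc + 1) * block)
            )
            for bc in range(n)
        ]
        for br in range(n)
    ]


# Toy bitmap (made up): the top-left 4x4 block fully on, plus one pixel
# in the bottom-right corner.
bitmap = [[0] * 32 for _ in range(32)]
for r in range(4):
    for c in range(4):
        bitmap[r][c] = 1
bitmap[31][31] = 1

features = block_counts(bitmap)
```

Shifting a stroke by a pixel usually changes a block count by only a small amount, which is the sense in which the 8x8 representation is tolerant of small distortions.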

http://archive. ics. uci
http://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits
http://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.names

5. Number of Instances
optdigits.tra    Training    3823
optdigits.tes    Testing     1797

We will (most likely) NOT use TDA mapper for regression analysis.
https://en.wikipedia.org/wiki/Regression_analysis

http://www.bigdata.uni-frankfurt.de/wp-content/uploads/2015/10/Evaluating-Ayasdi’s-Topological-Data-Analysis-For-Big-Data_HKim2015.pdf

“Color ranges over red to blue and it has different meanings, depending on the type of attributes. For the continuous values, color represents an average of value. A red node contains data samples that have higher average values. In contrast, a blue node contains lower average values. In contrast, for the categorical values, color represents a value concentration.”

Analyze your data
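The coloring rule quoted above can be sketched as a per-node summary value. This is a hedged illustration, not Ayasdi's implementation: the quote does not define "value concentration", so the sketch assumes it means the share of a node's samples that take the most common category. Both function names are hypothetical.

```python
from collections import Counter
from statistics import mean


def continuous_color_value(values):
    """Average of a continuous attribute over the samples in one node.

    Higher averages would map toward red, lower ones toward blue.
    """
    return mean(values)


def categorical_color_value(values):
    """Share of the node's samples that take the most common category.

    Assumption: this is one possible reading of 'value concentration'.
    """
    if not values:
        raise ValueError("node has no samples")
    top_count = Counter(values).most_common(1)[0][1]
    return top_count / len(values)


# Toy node contents, made up for illustration.
avg = continuous_color_value([1.0, 2.0, 3.0])
share = categorical_color_value(["a", "a", "b", "a"])
```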

3.2.2.2 Insight by Ranked Variables
Going back to the Titanic example, the results of the KS statistic show that the variable “Sex” is the most strongly related to passenger deaths. We could generally assume that men gave up their places in the lifeboats to women. Furthermore, it is feasible to deduce the subtle reasons for the deaths in each group. The passengers in group A died for two reasons: they were men and their cabin class was low. The passengers in group B died because they were men. Finally, the passengers in group C died because they were staying in third class, even though most of them were women.
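Ranking variables by the KS statistic can be sketched as follows. This is a minimal illustration, assuming the ranking compares each variable's distribution inside a selected group with its distribution in the remaining data using the two-sample Kolmogorov-Smirnov statistic; the report's exact procedure may differ. The variable names `x` and `y` and all values are made-up toy data, not the Titanic data.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    # Both empirical CDFs are step functions that only change at sample
    # points, so the largest gap is attained at one of those points.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


# Toy data: values of two numerically coded variables inside a selected
# group of nodes and in the rest of the data set.
group = {"x": [1, 1, 1, 0], "y": [3, 1, 2, 3]}
rest = {"x": [0, 0, 0, 1], "y": [1, 3, 2, 2]}

scores = {name: ks_statistic(group[name], rest[name]) for name in group}
ranked = sorted(scores, key=scores.get, reverse=True)
```

A variable whose distribution differs most between the group and the rest gets the highest score and appears first in `ranked`.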