Character legibility: which details are important, and how might it be measured?
John Hayes, Jim Sheedy, Yu-Chi Tai, Vinsunt Donato, David Glabe

Presentation transcript:

Character legibility: which details are important, and how might it be measured? John Hayes, Jim Sheedy, Yu-Chi Tai, Vinsunt Donato, David Glabe

Objective
Determine the features of letters that are most associated with individual letter legibility
Measure features in several ways:
–Individual letter features
–Statistical characteristics of the individual components in the singular value decomposition of the letter image

Method
Reanalysis of a previous dataset
Letter legibility was measured with 40 subjects who performed a distance-threshold legibility task
Using regression analysis, we determined the characteristics of letters that were predictive of legibility
In a second analysis, we examined the statistical characteristics of the eigenvalues from the singular value decomposition of each letter

Stimulus Set
10 letters (a, c, e, m, n, o, r, s, v, and w)
11 fonts (Baskerville, Bodoni, Centaur, Consolas, DIN, Futura, Garamond, Georgia, Helvetica, Rockwell, and Verdana)

Method 1: Letter Features
Common features among all letters: letter height, letter width, max width of main stroke, min width of main stroke, serifs
Many letters had additional unique features

[Figure: letter annotated with the measured features: main stroke minimum width, opening size, maximum height of letter, maximum vertical dimension of main stroke, main stroke maximum width, maximum width of letter]

Analysis
Stepwise regression of the same letter across different fonts, identifying the features that contributed the most unique variance to legibility (a sketch follows below)
Both linear and quadratic components were considered, since legibility can increase with a particular feature up to a point and then decrease
Exclusion from the model did not necessarily mean a feature was unimportant; it may simply have been correlated with other variables that accounted for slightly more variability
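The stepwise procedure itself is not shown in the deck; as a rough illustration, forward feature selection over linear and squared terms can be sketched with scikit-learn. This is a stand-in for classical stepwise regression, not the authors' exact procedure, and the file name and column names are hypothetical:

```python
# Sketch: forward feature selection with linear and quadratic terms,
# standing in for classical stepwise regression. File and column names
# ("letter_features.csv", "legibility", ...) are hypothetical.
import pandas as pd
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

df = pd.read_csv("letter_features.csv")
features = ["letter_height", "letter_width",
            "main_stroke_max_width", "main_stroke_min_width"]

# Quadratic terms let legibility rise with a feature up to a point, then fall.
for f in features:
    df[f + "_sq"] = df[f] ** 2

X = df[features + [f + "_sq" for f in features]]
y = df["legibility"]

# Greedily add the predictor that most improves cross-validated fit.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=3, direction="forward")
selector.fit(X, y)
print("Selected:", list(X.columns[selector.get_support()]))
```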

[Figure: annotated letter showing main stroke minimum width, opening size, maximum height of letter, maximum vertical dimension of main stroke, main stroke maximum width, maximum width of letter]
Significant factors: Max height of letter, MS minimum width, MS width ratio (max/min), Max vertical dim of stroke, Serif

[Figure: annotated letter showing cap opening size, maximum bowl height, maximum bowl width]
Significant factors: Max height of letter, MS minimum width, Max bowl width

[Figure: annotated letter showing cross-stroke angle, cap opening width, cap opening height, cross-stroke width]
Significant factors: Max height of letter, Max width of letter, MS minimum width, MS width ratio (max/min), Max vertical dim of stroke, Bottom-to-cross-stroke/total letter height, Cap opening width, Cross-stroke width, Cross-stroke angle (degrees), Serif

[Figure: annotated letter showing maximum lower opening size]

[Figure: annotated letter showing opening size]
Significant factors: Max height of letter, MS minimum width, MS width ratio (max/min), Opening size, Serif

[Figure: annotated letter showing maximum bowl width, maximum bowl height]
Significant factors: Max height of letter, Max width of letter, MS minimum width, MS width ratio (max/min)

[Figure: annotated letter showing maximum width of horizontal stroke, horizontal length of horizontal stroke, minimum width of horizontal stroke, width of horizontal stroke at attachment to main stroke]
Significant factors: Max height of letter, Min width of horizontal stroke, Width ratio (max/min), Width of horizontal stroke at attachment to main stroke, Serif

[Figure: annotated letter showing backslash angle; opening size, upper curve; vertical dimension of stroke, lower curve; max horizontal width of stroke, lower curve; max horizontal width of stroke, upper curve; vertical dimension of stroke, upper curve; opening size, lower curve; vertical distance between S curves]
Significant factors: Max height of letter, Max width of letter, MS minimum width, MS width ratio (max/min), Max width of stroke perpendicular to point of tangency of vertical dimension (upper curve), Max vertical dimension of upper stroke, Ratio of the previous two parameters, Max width of stroke perpendicular to point of tangency of vertical dimension (lower curve), Max vertical dimension of stroke (lower curve), Ratio of the previous two parameters, Serif

[Figure: annotated letter showing opening size]
Significant factors: Max height of letter, Max width of letter, MS minimum width, MS width ratio (max/min), Opening size, Serif

[Figure: annotated letter showing left upper opening size, right upper opening size, lower opening size]
Significant factors: Max height of letter, Max width of letter, MS minimum width, MS width ratio (max/min)

Summary statistics for each of the stepwise models. The predicted relative legibility for each letter is determined by applying the mean attribute value to the stepwise regression models. The observed relative legibility was the average legibility across fonts and subjects. The model R² includes both subjects and letter attributes; the attribute R² considers only the effect of the letter characteristics.
[Table: one row per letter (a, c, e, m, n, o, r, s, v, w) with columns Observed Relative Legibility, Predicted Relative Legibility, Between-S Variance, Model R², Within-S Variance, Letter Attributes Variance, Attribute R²; the numeric values were not preserved in the transcript]

Conclusions for Individual Letter Characteristics
Demonstrated significant relationships between individual letter attributes and relative legibility
We need the advice of font designers to tell us whether this information is helpful in the design process
We also need to test some of these relationships in fonts with poor legibility, modifying them with the suggested improvements, to determine whether the relationship between attributes and legibility is causal
Replication with other measures of legibility and other fonts will help determine whether these findings are robust

Singular Value Decomposition
SVD separates a single matrix into a set of ordered, independent matrices that completely define the original matrix. The order is based on the amount of variance accounted for by each component.
A letter can be transformed into a numerical matrix of 0s and 1s based on the on/off state of its sub-pixels
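As an illustration of that transformation, a glyph can be rasterized and thresholded with Pillow and NumPy. The font file, canvas size, and threshold below are assumptions for illustration, not the study's stimuli:

```python
# Sketch: rasterize a letter and threshold it into a 0/1 matrix.
# The font file, canvas size, and threshold are assumptions.
import numpy as np
from PIL import Image, ImageDraw, ImageFont

font = ImageFont.truetype("DejaVuSans.ttf", 64)   # assumed font file
img = Image.new("L", (64, 80), color=255)         # white grayscale canvas
ImageDraw.Draw(img).text((4, 4), "e", font=font, fill=0)

A = (np.array(img) < 128).astype(int)             # 1 where ink is "on"
print(A.shape, "with", A.sum(), "on pixels")
```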

SVD model
A_{m×n} = U_{m×m} S_{m×n} V^T_{n×n}
–A is the original array based on the picture
–The columns of U are the orthonormal eigenvectors of AA^T
–The columns of V are the orthonormal eigenvectors of A^T A
–S is the diagonal array of singular values (referred to below as eigenvalues), which weight the contribution of the elements of the eigenvectors
Taking one element s_i of S at a time, s_i u_i v_i^T gives the separate rank-one components of the singular value decomposition
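A minimal NumPy sketch of this decomposition and of the cumulative partial reconstructions shown on the slides that follow, assuming A is a binary letter matrix like the one in the sketch above:

```python
# Sketch: SVD of a binary letter matrix A (e.g. from the rasterization
# sketch above) and the cumulative rank-k reconstructions shown below.
import numpy as np

U, s, Vt = np.linalg.svd(A.astype(float), full_matrices=False)

def rank_k(k):
    """Sum of the first k rank-one components s_i * u_i * v_i^T."""
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

approx5 = rank_k(5)            # the "eigenvalues 1..5" image
full = rank_k(len(s))          # complete matrix
assert np.allclose(full, A)    # all components recover the original
```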

Hypothesis
The simpler the structure, the more legible the letter
The statistical properties of the eigenvalues from the SVD provide information about the simplicity of the structure
The summary statistics used were: the first eigenvalue; the sum of the first 2, 5, 10, or 20 eigenvalues; and the slope of the first 5 or 10 eigenvalues (a sketch follows below)
Rationale: the more variance accounted for in the first few eigenvalues, the simpler the structure
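A minimal sketch of those summary statistics, assuming the eigenvalues (singular values) are normalized to sum to 1 as in the scree plot later in the deck:

```python
# Sketch: summary statistics of a letter's eigenvalue (singular value)
# spectrum, normalized so that the values sum to 1.0.
import numpy as np

def eigen_features(s):
    s = np.asarray(s, dtype=float)
    s = s / s.sum()                     # normalize: values sum to 1.0
    feats = {"E1": s[0]}
    for k in (2, 5, 10, 20):
        feats[f"Sum{k}"] = s[:k].sum()  # variance in the first k components
    for k in (5, 10):
        # Steeper (more negative) slope = faster drop-off = simpler structure.
        feats[f"Slope{k}"] = np.polyfit(np.arange(k), s[:k], 1)[0]
    return feats
```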

Identify the following letter
[Image slides: the first eigenvalue converted back to an image, followed by cumulative reconstructions from the first two through the first eight eigenvalues, and finally the complete matrix, revealed as a Verdana m]

What is this letter?
[Image slides: cumulative reconstructions from the first eigenvalue through the first eight eigenvalues, and finally the complete matrix, revealed as a Centaur e]

Method of Analysis
Stepwise regression of the statistical eigenvalue properties on legibility, using the same dataset as the individual-letter-characteristics analysis
Pixel density was added to the model, though it is not part of the SVD
A jackknife procedure was used to assess the predictive ability of the model (a sketch follows below):
–The analysis was run eleven times, each time excluding a different font
–The legibility of the missing font was predicted from the other fonts
–The r² of the predicted legibility with the actual legibility was significant at .42
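The jackknife can be sketched with scikit-learn's LeaveOneGroupOut, treating font as the group. The predictor set and all column names here are hypothetical placeholders:

```python
# Sketch: leave-one-font-out jackknife. The predictor set and column
# names ("density_sq", "sum5", "e1", "font", "legibility") are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneGroupOut

df = pd.read_csv("svd_features.csv")   # hypothetical input file
X, y, fonts = df[["density_sq", "sum5", "e1"]], df["legibility"], df["font"]

preds = np.empty(len(y))
for train, test in LeaveOneGroupOut().split(X, y, groups=fonts):
    model = LinearRegression().fit(X.iloc[train], y.iloc[train])
    preds[test] = model.predict(X.iloc[test])   # predict the held-out font

r2 = np.corrcoef(preds, y)[0, 1] ** 2           # r² of predicted vs. actual
print(f"Jackknife r2 = {r2:.2f}")
```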

[Plot: the first 20 eigenvalues per letter; values range from 0 to 1, and the eigenvalues are normalized so that they sum to 1.0]

Excluded Font   Variables             R²
Baskerville     Density², Sum5, E1    .52
Bodoni          Density², Sum5, E1    .49
Centaur         Density², Sum5, E1    .50
Consolas        Density², Sum5        .53
DIN             Density², Sum5        .48
Futura          Density², Sum5, E1    .54
Garamond        Density², Sum5, E1    .50
Georgia         Density², Sum5, E1    .50
Helvetica       Density², Sum5, E1    .50
Rockwell        Density², Sum5, E1    .50
Verdana         Density², Sum20       .46
All Fonts       Density², Sum5, E1    .51

Summary for SVD
Density squared, the sum of the first five eigenvalues, and the value of the first eigenvalue are the most frequent predictors of legibility
About 50% of the variability in legibility is accounted for by the SVD + density models
The more information represented in the first few eigenvalues, the higher the legibility

Future Directions
In this study, legibility was measured by the identification of a single letter flanked by two other letters
We plan to test letter legibility based on the similarity of the first eigenvalue of the letters to the right and left, alone and in combination with the target, to determine whether we can identify a confusion index (a hypothetical sketch follows below)
We also wish to apply this methodology to paragraphs in different fonts, to determine whether a simpler structure is easier to read
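One way such a confusion index might be prototyped (purely a hypothetical sketch of the idea, not the authors' method): compare the rank-one first-eigenvalue images of neighboring letters, for example by cosine similarity:

```python
# Hypothetical sketch of a confusion index: cosine similarity between the
# rank-one (first-eigenvalue) images of two letters. Assumes both letters
# are rasterized at the same size.
import numpy as np

def first_component(A):
    U, s, Vt = np.linalg.svd(A.astype(float), full_matrices=False)
    return s[0] * np.outer(U[:, 0], Vt[0, :])

def confusion_index(A, B):
    a, b = first_component(A).ravel(), first_component(B).ravel()
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```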