Automated scoring of student trees

Presentation transcript:

Automated scoring of student trees: two models of algorithmic judgement (how to tell if a tree is any good without using your brain).

Last winter, some two hundred freshman biology students were asked to perform a simple task of sorting organisms to indicate how they are related to each other. The goal was to find out, first, how well they understood evolutionary relationships prior to taking Biology 101. The more interesting goal was to understand what sorts of mistakes they made, and to try to understand what they believe about phylogenetic relationships when they enter college. Here are some of the trees they produced.

Some of the trees are pretty good...

... and some are not so good.

We’d like to analyze these trees to determine which ones show some understanding of the relevant classifications, and which do not. Ideally, we’d like to do this without using expensive human brains to do the sorting. The following slides will show two ways to determine whether trees have organisms grouped into vertebrates and invertebrates, one based on graph relationships and the other based on the organisms’ locations on the screen.

Automated analysis of student trees based on graph relations: the shortest-path method.

How can we tell if organisms are well-grouped? In a tree structure, a group is a set of nodes descended from a common ancestor. This method makes use of this fact to determine group relationships based on nodes’ distances from one another on the graph. We begin by examining the distances between members of the same group – vertebrates – in a well-formed tree.

Correct groupings:
Distance from rat to human: 2
Distance from rat to bird: 4
Average distance between vertebrates: 3
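The distance counting above can be sketched as a breadth-first search over the tree's connections. This is a minimal illustration, not the presenters' implementation: the tree in `EDGES`, the internal node names, and the helpers `build_adjacency` and `path_length` are all assumptions made up for the example. The tree was chosen so that the individual rat distances match the ones quoted on the slides.

```python
from collections import deque

# Hypothetical tree (NOT the one drawn on the slide). Internal nodes are
# branch points; leaves are organisms. Each pair is one drawn connection.
EDGES = [
    ("root", "vertebrates"), ("root", "invertebrates"),
    ("vertebrates", "mammals"), ("vertebrates", "birds"),
    ("mammals", "rat"), ("mammals", "human"),
    ("birds", "bird"),
    ("invertebrates", "molluscs"), ("invertebrates", "insects"),
    ("molluscs", "snail"), ("insects", "beetle"),
]


def build_adjacency(edges):
    """Turn a list of undirected edges into a node -> neighbours mapping."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def path_length(adjacency, start, goal):
    """Number of edges on the shortest path, or None if goal is unreachable."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == goal:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


if __name__ == "__main__":
    adjacency = build_adjacency(EDGES)
    for other in ("human", "bird", "snail", "beetle"):
        print("rat ->", other, path_length(adjacency, "rat", other))
```

Returning `None` for unreachable nodes makes it easy to detect a tree whose nodes are not all connected, which is the case the graph-based method cannot score.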

Next, we look at the distances between members of the vertebrates and the invertebrates on the tree.

Correct groupings:
Distance from rat to snail: 6
Distance from rat to beetle: 6
Average distance from vertebrates to invertebrates: 6
Average distance between invertebrates: 3

Next, we look at the same distances for an incorrectly structured tree.

One of these trees has the right idea, and the other hasn’t.

Incorrect groupings:
Distance from rat to human: 6
Distance from rat to bird: 4
Average distance between vertebrates: 5.3

Incorrect groupings:
Distance from rat to beetle: 2
Distance from rat to snail: 6
Average distance from vertebrates to invertebrates: 4.5

In a well-formed tree, the members of a group will be clustered under one branch of the tree. This means that the average distance between members of a group will be smaller than the average distance between members of the group and non-members. If a tree is not well-formed, the in-group and out-of-group distances will be similar. We can derive a grouping score by dividing the average out-of-group distance by the average in-group distance. A score of one indicates random placement; higher scores indicate greater clustering of vertebrates or invertebrates.

Incorrect groupings:
Out-of-group distance: 5.3
In-group distance: 4.5
Grouping score = 1.1

Correct groupings:
Out-of-group distance: 6
In-group distance: 3
Grouping score = 2
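The grouping score described above is a ratio of two averages, and might be computed along the following lines. This is only a sketch: the function name `grouping_score`, the distance table `PAIR_DISTANCES`, and the four-organism example are assumptions for illustration, not the slides' data. Any distance function, such as a shortest-path count over the student's tree, could be passed in instead.

```python
from itertools import combinations, product
from statistics import mean

# Hypothetical pairwise distances for a tiny, well-grouped tree.
PAIR_DISTANCES = {
    frozenset(("rat", "human")): 2,
    frozenset(("snail", "beetle")): 2,
    frozenset(("rat", "snail")): 6,
    frozenset(("rat", "beetle")): 6,
    frozenset(("human", "snail")): 6,
    frozenset(("human", "beetle")): 6,
}

VERTEBRATES = ["rat", "human"]
INVERTEBRATES = ["snail", "beetle"]


def table_distance(a, b):
    """Look up the distance between two organisms, in either order."""
    return PAIR_DISTANCES[frozenset((a, b))]


def grouping_score(distance, group, others):
    """Average out-of-group distance divided by average in-group distance.

    `group` needs at least two members and `others` at least one,
    otherwise there is nothing to average and mean() raises an error.
    """
    in_group = mean(distance(a, b) for a, b in combinations(group, 2))
    out_of_group = mean(distance(a, b) for a, b in product(group, others))
    return out_of_group / in_group


if __name__ == "__main__":
    print(grouping_score(table_distance, VERTEBRATES, INVERTEBRATES))
```

With the averages quoted for the correctly grouped tree, the same division gives 6 / 3 = 2.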

This works well if the student is kind enough to connect all of the nodes together. But what do we do if the student's tree is not connected?

Using convex hulls to determine groupings purely from spatial relationships.

This is the “correct grouping” tree, with all of its connections removed.

Clearly, we can no longer count connections to establish groups.

However, the grouping information can be recovered by examining the convex hulls enclosing the organisms.

A convex hull is the smallest convex curve enclosing a set of points. If we draw the convex hulls surrounding two sets of points, for example vertebrates and invertebrates, we know that the points form separate groups if their hulls do not collide. This gives us a simple test for whether groups are ideally separated or not. But what if the groups are only a little bit mixed? If we can figure out the minimum number of nodes we have to remove to eliminate the collision, we have an indication of how badly the two groups are mixed. Here’s how our collision elimination algorithm does that:

First, the convex hulls are identified. The area of overlap is shaded in red.
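The hull identification and the yes/no collision test could be sketched as follows. The slides do not say which algorithms were used, so this example swaps in two standard ones: Andrew's monotone chain for the hull and a separating-axis test for the collision. The screen coordinates and all function names are hypothetical, and the sketch assumes each group has at least three points that are not all on one line.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hulls_collide(hull_a, hull_b):
    """Separating-axis test for two convex polygons.

    Returns False as soon as some edge normal separates the two hulls.
    Hulls that merely touch are reported as colliding.
    """
    for polygon in (hull_a, hull_b):
        n = len(polygon)
        for i in range(n):
            x1, y1 = polygon[i]
            x2, y2 = polygon[(i + 1) % n]
            nx, ny = y2 - y1, x1 - x2  # a vector perpendicular to this edge
            proj_a = [nx * x + ny * y for x, y in hull_a]
            proj_b = [nx * x + ny * y for x, y in hull_b]
            if max(proj_a) < min(proj_b) or max(proj_b) < min(proj_a):
                return False
    return True


# Hypothetical screen positions of organisms (not taken from the slides).
VERTEBRATES = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
SEPARATE_INVERTEBRATES = [(5, 5), (7, 5), (6, 7)]
MIXED_INVERTEBRATES = [(1, 1), (7, 5), (6, 7)]

if __name__ == "__main__":
    hull_v = convex_hull(VERTEBRATES)
    print(hulls_collide(hull_v, convex_hull(SEPARATE_INVERTEBRATES)))
    print(hulls_collide(hull_v, convex_hull(MIXED_INVERTEBRATES)))
```

The separating-axis test only answers whether the hulls collide; it does not produce the overlap region itself, which needs a polygon intersection step.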

Next, the organisms are identified for elimination.

The algorithm seeks the node closest to the centroid of the collision area, and removes it.

The process is repeated until there is no area of collision.
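The whole elimination loop might look something like the sketch below. It is one possible reading of the steps described, under stated assumptions: the overlap region is found with Sutherland-Hodgman clipping, its centroid with the shoelace formula, and the node to remove is chosen from both groups together. None of these details are confirmed by the slides, the coordinates are invented, and the greedy rule is not guaranteed to find the true minimum number of removals.

```python
import math


def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def clip(subject, clipper):
    """Sutherland-Hodgman: the part of `subject` inside convex CCW `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        inputs, output = output, []
        for j in range(len(inputs)):
            p, q = inputs[j - 1], inputs[j]
            dp, dq = cross(a, b, p), cross(a, b, q)
            if (dp >= 0) != (dq >= 0):  # edge p->q crosses the clip line
                t = dp / (dp - dq)
                output.append((p[0] + t * (q[0] - p[0]),
                               p[1] + t * (q[1] - p[1])))
            if dq >= 0:
                output.append(q)
        if not output:
            break
    return output


def overlap_centroid(polygon):
    """Centroid of a polygon (shoelace formula), or None if it has no area."""
    area2 = cx = cy = 0.0
    for i in range(len(polygon)):
        (x1, y1), (x2, y2) = polygon[i - 1], polygon[i]
        w = x1 * y2 - x2 * y1
        area2 += w
        cx += (x1 + x2) * w
        cy += (y1 + y2) * w
    if abs(area2) < 1e-12:
        return None
    return (cx / (3 * area2), cy / (3 * area2))


def eliminate_collisions(group_a, group_b):
    """Remove the node nearest the overlap centroid until the hulls are apart."""
    a, b = list(group_a), list(group_b)
    removed = []
    while len(a) >= 3 and len(b) >= 3:
        centre = overlap_centroid(clip(convex_hull(a), convex_hull(b)))
        if centre is None:  # no area of collision left
            break
        nearest = min(a + b, key=lambda p: math.dist(p, centre))
        (a if nearest in a else b).remove(nearest)
        removed.append(nearest)
    return removed


# Hypothetical screen positions: one invertebrate sits among the vertebrates.
VERTEBRATES = [(0, 0), (4, 0), (4, 4), (0, 4)]
INVERTEBRATES = [(6, 0), (10, 0), (10, 4), (6, 4), (3, 2)]

if __name__ == "__main__":
    print(eliminate_collisions(VERTEBRATES, INVERTEBRATES))
```

The length of the returned list serves as the mixing indicator, and the nodes in it are the ones that would be highlighted for the student.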

The misplaced nodes are highlighted, identifying the problem areas for the student. Further analysis can identify patterns of mistakes, showing areas needing attention from teachers of biology.