Taxonomy Lecture 12. Topics Tutorial Review Classification Frame Terminology Classical Taxonomy Using Classifications –In system use –In system development.

Presentation transcript:

Taxonomy Lecture 12

Topics Tutorial Review Classification Frame Terminology Classical Taxonomy Using Classifications –In system use –In system development Review Preview

Tutorial Review-Dating System Outer join –to include unmatched persons as well as matched Select.. from person left join pair –to include only unmatched : where partner is null –(from Placement visit) use with reference tables Do updates before displaying status Use tables within tables for layout Complex calculation has to be repeated –In Oracle/ SQL server, procedure can be stored in DBMS Multi-user issues –Use single queries or transactions for atomicity –Still get problems with ‘dirty’ data – screen allows match for ‘albert’ but albert already matched Refresh shows blank screen – now refresh home screen only An Update would be rejected (but can be ignored)
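The two outer-join patterns in this review can be tried out with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database; the table names person and pair and the column partner come from the slide, while the name columns, the sample rows and the function unmatched_demo are assumptions made only for illustration.

```python
import sqlite3


def unmatched_demo():
    """Return (all persons with partner or None, persons with no partner)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema and data: only 'person', 'pair' and 'partner'
    # are named on the slide.
    conn.executescript("""
        CREATE TABLE person (name TEXT PRIMARY KEY);
        CREATE TABLE pair (name TEXT, partner TEXT);
        INSERT INTO person VALUES ('albert'), ('beth'), ('carol');
        INSERT INTO pair VALUES ('albert', 'beth'), ('beth', 'albert');
    """)
    # Left join keeps unmatched persons as well as matched ones.
    everyone = conn.execute(
        "SELECT person.name, pair.partner FROM person "
        "LEFT JOIN pair ON pair.name = person.name "
        "ORDER BY person.name"
    ).fetchall()
    # Adding 'WHERE partner IS NULL' keeps only the unmatched persons.
    unmatched = conn.execute(
        "SELECT person.name FROM person "
        "LEFT JOIN pair ON pair.name = person.name "
        "WHERE pair.partner IS NULL "
        "ORDER BY person.name"
    ).fetchall()
    conn.close()
    return everyone, unmatched


everyone, unmatched = unmatched_demo()
print(everyone)
print(unmatched)
```

With this sample data, carol is the only person without a row in pair, so she appears with a null partner in the first result and alone in the second.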

Classification Errors (Information Retrieval)
                 Relevant                             Irrelevant
Retrieved        true positive (TP)                   false positive (FP, Type I error)
Not retrieved    false negative (FN, Type II error)   true negative (TN)
Precision = TP / (TP + FP) = TP / Retrieved
Recall = TP / (TP + FN) = TP / Relevant
Efficiency = (TP + TN) / (TP + TN + FP + FN) = (TP + TN) / Full Collection

Example Calculation: filtering
Columns: Good, Spam; rows: reject, accept
TP = 3, FP = 5, FN = 4, TN = 6
Precision = TP / (TP + FP) = 3/8
Recall = TP / (TP + FN) = 3/7
Efficiency = (TP + TN) / (TP + TN + FP + FN) = 9/18 = 50%
Recall > Precision => not quite balanced
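The arithmetic of this example can be checked with a short sketch. The counts TP = 3, FP = 5, FN = 4 and TN = 6 are the ones implied by the fractions 3/8, 3/7 and 9/18 on the slide; the function name classification_measures is an assumption for illustration.

```python
from fractions import Fraction


def classification_measures(tp, fp, fn, tn):
    """Return (precision, recall, efficiency) as exact fractions."""
    precision = Fraction(tp, tp + fp)    # TP / Retrieved
    recall = Fraction(tp, tp + fn)       # TP / Relevant
    efficiency = Fraction(tp + tn, tp + fp + fn + tn)  # correct / Full Collection
    return precision, recall, efficiency


precision, recall, efficiency = classification_measures(tp=3, fp=5, fn=4, tn=6)
print(precision, recall, efficiency)  # 3/8 3/7 1/2
print(recall > precision)             # True
```

Exact fractions are used so that the results can be compared directly with the slide's figures rather than with rounded decimals.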

Classification and Systems Design
Steps in Classification
–defining the domain (what kinds of things are to be classified)
–creating the taxonomy (the set of categories), its purpose and force
–defining the representation of individuals
–defining the mapping between individuals and categories
–coding the categories
–creating automatic classifiers
–assisting human classifiers
–assisting users to interpret categorical information
–evaluating classification performance
–supporting evolution of taxonomy and classifiers
“An early step towards understanding any set of phenomena is to learn what kinds of things there are in the set – to develop a taxonomy” Herbert Simon

Classification in the News Criminal Justice as a Classifier –Murder, Manslaughter or Innocent Is ‘Munchausen by Proxy’ a real psychological condition? Prisoners of war – US invents a new category for the Guantanamo Bay prisoners Blood groups: –A, B, AB, O –RH+, RH- Classification of Cloud types (Cumulus, Cirrus…) by Luke Howard 1802 Hip evaluation to determine priority for replacement Text classification to bring sense to the Internet

Categories in Information Systems Many systems require the user to classify things in the real world into categories in order to process them: –Files and documents on disk –Facilities in the University (helpdesk, reception..) –Skills in a Placements system –Budget headings, Nominal Ledger headings –Complaints –Fault priority On the system, categories can be clearly distinguished: –Codes for each category But the user typically has the task of mapping the real, complex things into the appropriate categories and interpreting categorical information

Categories in IS theory Much of IS theory is based on a taxonomy: –Problem / solution –Method / methodology / technique.. –ER model –Data Flow Diagram –Soft Systems Analysis - CATWOE –Logical / Physical –SWOT analysis: Strengths / Weaknesses / Opportunities / Threats –Objective, Goal, Requirement, Constraint

Terminology
Category / Class
–A group of similar objects
Binary Category
–An object is either in the category or not
Taxonomy
–A set of Categories, sometimes organised into a hierarchy, for a common purpose
–Multiple Taxonomies may be applied to the same population of objects
Categorisation / Classification
–The task of placing objects into the appropriate Category / Class
Clustering
–The process of identifying similar objects

A dodgy taxonomy The Argentinean writer Jorge Luis Borges (‘Imaginary Beasts’, ‘Labyrinths’..) quotes a ‘certain Chinese encyclopedia’ in which animals are divided into: A) belonging to the Emperor B) embalmed C) tame D) suckling pigs E) sirens F) fabulous G) stray dogs H) included in the present classification I) frenzied J) innumerable K) drawn with a very fine camel hair brush L) et cetera M) having just broken the water pitcher N) that from a long way off look like flies

ABC Classifier Machine Human Categories/Classes Taxonomy

ABC Classifier Machine Human Categories/Classes Taxonomy Categories not Mutually Exclusive An object can be put in any of several categories

ABC Classifier Machine Human Categories/Classes Taxonomy Categories not Complete Some objects don’t belong anywhere

ABC Classifier Machine Human Categories/Classes Taxonomy Categories not Balanced Some categories much larger than others

ABC Classifier Machine Human Categories/Classes Taxonomy Categories Inconsistent Categories lack a single organising principle

Taxonomy design
Categories must be:
–Mutually exclusive: every object in at most one category
–Complete (exhaustive): every object in at least one category
–Balanced: categories divide objects evenly
–Consistent: same characteristics used throughout
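Three of these design criteria can be checked mechanically once a sample of objects has been classified. The sketch below is only illustrative: the function check_taxonomy, the fault-report objects and the category names are all assumptions, and the slide gives no numeric rule for "balanced", so the sketch simply reports category sizes. Consistency (one organising principle) is a judgement about the category definitions and is not tested here.

```python
def check_taxonomy(assignments, categories):
    """Check a classified sample against the taxonomy design criteria.

    assignments maps each object to the set of categories it was placed in.
    """
    # Mutually exclusive: no object in more than one category.
    overlapping = sorted(o for o, cats in assignments.items() if len(cats) > 1)
    # Complete: no object left without a category.
    unplaced = sorted(o for o, cats in assignments.items() if not cats)
    # Balanced: report how many objects fall in each category.
    sizes = {c: 0 for c in categories}
    for cats in assignments.values():
        for c in cats:
            sizes[c] += 1
    return {
        "mutually_exclusive": not overlapping,
        "complete": not unplaced,
        "overlapping": overlapping,
        "unplaced": unplaced,
        "sizes": sizes,
    }


# Hypothetical sample: fault reports classified by a helpdesk.
sample = {
    "printer jam": {"hardware"},
    "login fails": {"software", "network"},
    "coffee machine": set(),
    "disk full": {"hardware"},
}
report = check_taxonomy(sample, ["hardware", "software", "network"])
print(report)
```

In this sample "login fails" breaks mutual exclusivity and "coffee machine" breaks completeness, which are the first two failure cases the lecture illustrates.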

Kinds of classification
Classical
–Classes defined by presence of features
  Square : 4 sides, equal length, equal angles
  Rectangle : 4 sides, equal angles
  Triangle : 3 sides, equal length, equal angles
Probabilistic
–Classes defined by weighted sum of features
  ‘bird’ moves, winged, feathered, sings, lays eggs
  Is a robin a bird? Is an emu a bird?
Exemplar (prototype)
–Classes defined by one or more key examples
  Robin is a central example of ‘bird’
  Chicken is more remote example
Which kind is used in IS Theory?
Which kind is used in IS Use?
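The probabilistic kind, where class membership depends on a weighted sum of features, can be sketched as follows. The feature names are the ones listed for 'bird'; the weights, the threshold and the feature sets chosen for each animal are assumptions made up for illustration, not values from the lecture.

```python
# Hypothetical weights for the 'bird' features named on the slide.
BIRD_WEIGHTS = {
    "moves": 1.0,
    "winged": 2.0,
    "feathered": 3.0,
    "sings": 1.0,
    "lays eggs": 2.0,
}


def weighted_score(features, weights):
    """Sum the weights of the features the object actually has."""
    return sum(w for name, w in weights.items() if name in features)


def is_member(features, weights, threshold):
    """An object is in the class if its weighted score reaches the threshold."""
    return weighted_score(features, weights) >= threshold


# Illustrative feature sets (assumptions).
robin = {"moves", "winged", "feathered", "sings", "lays eggs"}
emu = {"moves", "winged", "feathered", "lays eggs"}
bat = {"moves", "winged"}

for name, features in [("robin", robin), ("emu", emu), ("bat", bat)]:
    print(name, weighted_score(features, BIRD_WEIGHTS),
          is_member(features, BIRD_WEIGHTS, threshold=6.0))
```

Under these assumed weights the robin scores higher than the emu, which matches the idea that some members are more central to a class than others, while both still pass the threshold.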

Clustering Clustering techniques find groups of similar objects Used in data mining to identify customer groups with similar buying behaviour… Mathematical Techniques –k-nearest neighbour –ID3 to create decision tree Human Techniques –Card sorting

Classifying
Learning Classifiers
–Based on sample of population
–Classified by hand
–Split into two parts
  The training set used to compute the classifier
  The test set used to test the ability of the classifier
–Many kinds of classifiers available, all need good understanding of statistics e.g. Naïve Bayesian, Decision Tree, SVM
–Threshold set to balance recall and precision
Rule and example based for human classifier but performance varies with experience and skill
–E.g. book classification, Yahoo directory classification, medical diagnosis
–Human classifiers need to be trained too
–If classification done by end-users, classification is likely to be inconsistent
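The split of a hand-classified sample into a training set and a test set can be sketched as below. The function name split_sample, the 30% test fraction, the fixed seed and the sample documents are assumptions for illustration; the lecture does not prescribe a split ratio.

```python
import random


def split_sample(labelled, test_fraction=0.3, seed=0):
    """Shuffle a hand-classified sample and split it into (training, test)."""
    items = list(labelled)
    # A fixed seed makes the shuffle repeatable between runs.
    random.Random(seed).shuffle(items)
    n_test = int(round(len(items) * test_fraction))
    # The first n_test shuffled items are held back for testing.
    return items[n_test:], items[:n_test]


# Hypothetical sample of (document, hand-assigned category) pairs.
sample_docs = [(f"doc{i}", "spam" if i % 3 == 0 else "good") for i in range(10)]
training, test = split_sample(sample_docs)
print(len(training), len(test))  # 7 3
```

The test items never take part in computing the classifier, so recall and precision measured on them indicate how the classifier performs on objects it has not seen.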

Tutorial Read ‘Ten Taxonomy Myths’ Problem: A team of consultants has been hired to assist a local voluntary organisation whose aim is to help local people locate organisations which provide relevant services. They have a web site and publish a newsletter – How would you advise them to classify the organisations for ease of recall? What taxonomies would be appropriate? Binary or multi-category? What information would you hold about each organisation? How would you gather information on the effectiveness of your taxonomies?

Review
3-tier, 4-tier web architecture – describe, explain, terminology, typical interactions
SQL & PHP
–No exam questions to write SQL or PHP but reading knowledge required – up to outer joins and example scripts
DBMS comparison and selection
Entity-Relationship modelling – revision, application
Data flow – specification of data flows, XML
Sequence diagrams – construct from description
Agile Development and Extreme Programming – description, application, comparison with life-cycle
Frames – rationale, role in IS development, basic recognition in a problem description of simple frames and the following in detail
Matching Frame – typical applications, fitness function, recognising nominal, ordinal, interval and ratio scales, use of weights
Classification Frame – typical applications, terminology, calculation of recall and precision, guidelines for constructing a taxonomy

Preview Learning Frame Business Processes Scenarios and Use cases Object-Relational DBMS Data Quality ….

Black board Suggest additional topics Suggest additional resources Ask questions Give me feedback