Representing a Computer Science Research Organization on the ACM Computing Classification System. Boris Mirkin, School of Computer Science and Information Systems.

Presentation transcript:

1 Representing a Computer Science Research Organization on the ACM Computing Classification System
Boris Mirkin, School of Computer Science and Information Systems, Birkbeck College, University of London, United Kingdom
Susana Nascimento and Luís Moniz Pereira, Computer Science Department and Centre for Artificial Intelligence (CENTRIA), Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, Portugal

2 Motivation: an Objective Portrayal of a Research Organisation as a Whole
Give an overview of the structure of the scientific subjects being developed in the organisation.
Position the organisation over the ACM-CCS ontology.
Assess scientific subjects that do not fit well into ACM-CCS: these are potential growth points or other breakthrough developments.
Plan research restructuring and investment.
Give an overview of a scientific field being developed in a country, with a quantitative assessment of controversial areas, e.g. where the level of activity is not sufficient or the level of activity exceeds the level of results.

3 ACM-CCS: Classification Level 1
A. General Literature
B. Hardware
C. Computer Systems Organization
D. Software
E. Data
F. Theory of Computation
G. Mathematics of Computing
H. Information Systems
I. Computing Methodologies
J. Computer Applications
K. Computing Milieux
[Figure: tree with root CS and the first-level nodes A to K.]

4 Cluster-Lift Method
Express the Research Activities of a CS Organization (RAO) as a set of CLUSTERS of ACM-CCS subjects:
  Captures RAO in a straightforward way.
  Gives no information away about individual members or teams.
  Can be implemented on different levels of the taxonomy.
  Needs good clustering techniques.
MAP individual clusters to ACM-CCS and GENERALISE them:
  A new approach.
  Extendable to other ontologies and activities.

5 Electronic Survey Tool for Data Collection

6 Generic Survey Output: fuzzy memberships over all subjects in the 3rd layer of ACM-CCS

7 Fuzzy Similarity between ACM-CCS Subjects
Contribution by a respondent: [f(i)], the membership vector over all subjects i in the 3rd layer of ACM-CCS, taken from the survey.
A(i,j) = f(i)·f(j), the product, for all ACM-CCS 3rd-layer subjects i and j.
The matrices A(i,j) are summed over all individuals, weighted according to their span ranges.
The resulting fuzzy similarity measure between two ACM-CCS subjects is proportional to the number and importance of research activities in both subjects (details can be presented).
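The summation described on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `similarity_matrix` is hypothetical, and since the slide does not say how the span-range weights are computed, they are simply passed in as an assumed input.

```python
import numpy as np

def similarity_matrix(memberships, weights=None):
    """Sum the per-respondent products f(i)*f(j), weighted per respondent.

    memberships: array of shape (respondents, subjects), one fuzzy
                 membership vector f per row.
    weights:     one weight per respondent (assumed given; the slide derives
                 them from "span ranges" without further detail).
    """
    F = np.asarray(memberships, dtype=float)
    if weights is None:
        weights = np.ones(F.shape[0])
    w = np.asarray(weights, dtype=float)
    # sum over respondents r of w[r] * outer(F[r], F[r])
    return (F * w[:, None]).T @ F

# Toy data: two respondents, three subjects (made-up numbers).
F_demo = [[0.6, 0.4, 0.0],
          [0.0, 0.5, 0.5]]
A_sim = similarity_matrix(F_demo, weights=[1.0, 2.0])
```

The result is symmetric, and an entry A(i,j) is non-zero only when at least one respondent reports activity in both subjects i and j.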

8 Building Overlapping Subject Clusters
Additive Clustering with Iterative Extraction (ADDI-S)
Given the similarity matrix, the additive clustering problem is to find, one by one, K clusters and their intensity weights that minimize the sum of squared errors.
Interpretable parameters: the cluster intensity and the cluster's contribution to the explanation of the data scatter.
Leads to tight clusters:
  A subject i belongs to a cluster S if its average similarity with S is higher than half of the average similarity within S.
  S is also well separated from the rest, because for each entity j ∉ S, its average similarity with S is less than that threshold.
Computationally feasible.
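The one-by-one extraction can be illustrated with a simplified greedy search. This is a sketch in the spirit of ADDI-S, not the authors' implementation. It assumes the least-squares model a(i,j) ≈ λ·s(i)·s(j) over all pairs including the diagonal, so that for a fixed cluster S the optimal intensity λ is the average similarity within S and the explained part of the scatter is λ²|S|². The function names `one_cluster` and `extract_clusters` are hypothetical.

```python
import numpy as np

def one_cluster(A, seed):
    """Grow a cluster from `seed`, toggling one entity at a time while the
    explained scatter g = lam**2 * |S|**2 keeps increasing."""
    n = A.shape[0]
    member = np.zeros(n, dtype=bool)
    member[seed] = True

    def score(mask):
        size = mask.sum()
        lam = A[np.ix_(mask, mask)].sum() / size**2  # average within-cluster similarity
        return lam, lam**2 * size**2

    lam, g = score(member)
    while True:
        best = None
        for j in range(n):
            trial = member.copy()
            trial[j] = not trial[j]          # add j if outside, remove if inside
            if not trial.any():
                continue                     # never empty the cluster
            t_lam, t_g = score(trial)
            if t_lam > 0 and t_g > g + 1e-12 and (best is None or t_g > best[2]):
                best = (trial, t_lam, t_g)
        if best is None:
            return member, lam, g
        member, lam, g = best

def extract_clusters(A, k):
    """Extract up to k clusters, subtracting each found cluster from the residual."""
    residual = np.asarray(A, dtype=float).copy()
    scatter = (residual**2).sum()
    found = []
    for _ in range(k):
        runs = [one_cluster(residual, s) for s in range(len(residual))]
        member, lam, g = max(runs, key=lambda r: r[2])
        if lam <= 0:
            break
        found.append((np.flatnonzero(member).tolist(), lam, 100 * g / scatter))
        residual[np.ix_(member, member)] -= lam
    return found

# Toy similarity matrix with two groups of subjects (made-up numbers).
A_demo = np.array([[1.0, 0.9, 0.8, 0.0, 0.0],
                   [0.9, 1.0, 0.85, 0.0, 0.0],
                   [0.8, 0.85, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0, 0.1, 1.0]])
clusters = extract_clusters(A_demo, 2)
```

Each tuple in `clusters` holds the member indices, the intensity λ, and the contribution to the data scatter in percent. Because clusters are fitted to the residual matrix, a subject may end up in more than one cluster, which is how overlapping clusters arise.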

9 Generalising Subject Clusters Mapped onto ACM-CCS: Good and Bad Cases
Blue cluster is tight: all topics are in one ACM-CCS subject.
Red cluster is dispersed over many ACM-CCS subjects.
[Figure: the two clusters shown on the ACM-CCS tree rooted at CS.]

10 Lifting a Subject Cluster onto the Ontology: Elementary Structures
The set of subject clusters, together with their ‘head subjects’, ‘gaps’ and ‘offshoots’, constitutes what can be called the profile of the organization under study.
The total count of ‘head subjects’, ‘gaps’ and ‘offshoots’, each type weighted accordingly, can be used for scoring the extent of the fit between a research grouping and the ontology.

11 Parsimonious Lifting of a Subject Cluster onto ACM-CCS
Plural solutions: which one is better?
Mapping (B) is better than (A) if ‘gaps’ are much cheaper than additional ‘head subjects’.
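The comparison between mappings (A) and (B) can be made concrete with a weighted count. The sketch below rests on assumptions that the slides do not spell out: a gap is taken to be a child of a head subject that is missing from the cluster, an offshoot is a cluster topic lying outside every head subject, and the weights, the function name `lifting_penalty` and the toy taxonomy are all hypothetical.

```python
def lifting_penalty(taxonomy, cluster, heads, w_head=1.0, w_gap=0.2, w_off=0.5):
    """Weighted count of head subjects, gaps and offshoots for one lifting.

    taxonomy: dict mapping a parent subject to the set of its child topics.
    cluster:  set of topics in the subject cluster.
    heads:    set of subjects chosen as head subjects (a topic that is not a
              key of `taxonomy` covers only itself).
    """
    covered = set()
    for h in heads:
        covered |= set(taxonomy.get(h, {h}))
    gaps = covered - cluster        # covered by a head subject but not in the cluster
    offshoots = cluster - covered   # in the cluster but outside all head subjects
    penalty = w_head * len(heads) + w_gap * len(gaps) + w_off * len(offshoots)
    return penalty, gaps, offshoots

# Toy taxonomy (not the real ACM-CCS): subject C with four child topics.
toy_taxonomy = {"C": {"C1", "C2", "C3", "C4"}}
toy_cluster = {"C2", "C3", "C4"}

# Mapping (A): every topic is its own head subject, so there are no gaps.
penalty_a, gaps_a, _ = lifting_penalty(toy_taxonomy, toy_cluster, {"C2", "C3", "C4"})
# Mapping (B): lift the cluster to C and accept C1 as a gap.
penalty_b, gaps_b, _ = lifting_penalty(toy_taxonomy, toy_cluster, {"C"})
```

With these assumed weights, mapping (B) costs one head subject plus one cheap gap, while mapping (A) costs three head subjects, so (B) is preferred. Raising `w_gap` above `w_head` reverses the outcome.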

12 Real Case Study: 2006 Survey of CS at FCT-Universidade Nova de Lisboa
Survey conducted in our Department in 2006.
Participation: 30 individuals.
Each one supplied three ACM-CCS 2nd-level topics.
26 of the 59 topics at ACM-CCS 2nd level are covered.
Additive clustering algorithm: ADDI-S.
Six subject clusters found:
  cl1 = {F1, F3, F4, D3} (contribution 27.08%)
  cl2 = {C2, D1, D2, D3, D4, F3, F4, H2, H3, H5, I2, I6} (contribution 17.34%)
  cl3 = {C2, C3, C4} (contribution 5.13%)
  cl4 = {F4, G1, H2, I2, I3, I4, I5, I6, I7} (contribution 4.42%)
  cl5 = {E1, F2, H2, H3, H4} (contribution 4.03%)
  cl6 = {C4, D1, D2, D4, K6} (contribution 4.00%)
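A few summary figures follow directly from the cluster list on this slide. The snippet below only re-tabulates the reported clusters; treating the contributions as additive shares of the data scatter is an assumption based on the additive clustering model, and the variable names are illustrative.

```python
from collections import Counter

# Clusters and contributions (in percent) as reported on the slide.
reported = {
    "cl1": ({"F1", "F3", "F4", "D3"}, 27.08),
    "cl2": ({"C2", "D1", "D2", "D3", "D4", "F3", "F4",
             "H2", "H3", "H5", "I2", "I6"}, 17.34),
    "cl3": ({"C2", "C3", "C4"}, 5.13),
    "cl4": ({"F4", "G1", "H2", "I2", "I3", "I4", "I5", "I6", "I7"}, 4.42),
    "cl5": ({"E1", "F2", "H2", "H3", "H4"}, 4.03),
    "cl6": ({"C4", "D1", "D2", "D4", "K6"}, 4.00),
}

# How many clusters each topic belongs to.
topic_counts = Counter(t for topics, _ in reported.values() for t in topics)
distinct_topics = sorted(topic_counts)
shared_topics = sorted(t for t, c in topic_counts.items() if c > 1)
total_contribution = sum(share for _, share in reported.values())

print(len(distinct_topics), "distinct topics appear in the clusters")
print(len(shared_topics), "topics belong to more than one cluster")
print(f"summed contribution: {total_contribution:.2f}%")
```

Under the additivity assumption, the six clusters together account for 62.00% of the data scatter. They involve 24 distinct topics, 12 of which belong to more than one cluster.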

13 Profile of DI-FCT-UNL (2006 Survey)
[Figure: the subject clusters lifted onto the ACM-CCS tree rooted at CS, with the 2nd-level topics of E, G, I and K shown individually. Legend: Head subject, Offshoot, Gap. Cluster labels in the figure: C. Computer Systems Organization; F. Theory of Computation; D. Software; I. Computing Methodologies; D. Software and H. Information Systems; H. Information Systems.]

14 Analysis
The most contributing cluster, with head subject F. Theory of Computation, comprises a very tight group.
The next most contributing cluster has two head subjects, D. Software and H. Information Systems, and several offshoots among the other head subjects, indicating that this cluster should be the structure underlying a certain unity of the department.
There are only 3 offshoots outside the department’s head subjects:
  E1. Data Structures, from H. Information Systems;
  G1. Numerical Analysis, from I. Computing Methodologies;
  K6. Management of Computing and Information Systems, from D. Software.
As all of them seem natural, they could potentially be added to the list of collateral links of the ACM ontology.