© Brigitte Jörg June 4th, 2008 in Maribor, Slovenia Project Results Analyzing European Research Competencies in IST – Results from a European SSA Project.

Analyzing European Research Competencies in IST: Results from a European SSA Project
Brigitte Jörg (DFKI), Jure Ferlez (IJS), Hans Uszkoreit (DFKI), Mitja Jermol (IJS)

Project Information
- Funding Organization: European Commission
- Funding Program: Sixth Framework Programme (FP6), IST, 3rd Call
- Project Type: Specific Support Action (SSA)
- Duration: 32 months (April 2005 to November 2007)
- Project Co-ordination: DFKI GmbH
- Technical Co-ordination: Jozef Stefan Institute (IJS)
- Technology Partners: DFKI, IJS, Ontotext, CCLRC
- Project Consortium: 15 partners from EU Member States (MS), New Member States (NMS) and Associated Candidate Countries (ACC)

Project Consortium
- Deutsches Forschungszentrum für Künstliche Intelligenz, Germany
- Institute Jozef Stefan, Slovenia
- Ontotext Lab, Sirma AI EAD, Bulgaria
- RTD Talos, Cyprus
- Institute of Information Theory and Automation, Czech Republic
- Archimedes Foundation, Estonia
- Computer and Automation Research Institute, Hungarian Academy of Sciences, Hungary
- Institute of Mathematics and Computer Science, University of Latvia, Latvia
- Lithuanian Innovation Centre, Lithuania
- Projects in Motion, Malta
- Technical University of Silesia, Poland
- National Institute for R&D in Informatics, Romania
- Slovak University of Technology, Slovakia
- TUBITAK, Turkey
- The Science and Technology Facilities Council, UK (formerly CCLRC, UK)

Technology Partners
- DFKI (Co-ordinator): "LT World" Portal, Information Extraction, Semantic Web
- Jozef Stefan Institute (Technical Co-ordinator): "Project Intelligence", Data Mining, Social Network Analysis
- Ontotext: "KIM Semantic Annotation Platform"
- euroCRIS: "CERIF" Standard, Access to Data

Project Objectives
- Set up and populate an information portal on IST research
- Provide information about RTD actors and their experience and expertise
- Provide innovative and automated services
- Promote RTD competencies in specific fields
- Support partner search for IST proposals and commercial projects

Presentation Outline
- Information Repository
- Data Collection
- Data Integration / Data Cleaning
- Evaluation of Results
- Analytic Tools
- Overall Conclusion

Repository Features
- Information Repository (CERIF 2004) containing
  - Organisation
  - Person
  - Project
  - Publications
- Data Collection (CERIF XML) from
  - National CRISs
  - National Collections
  - Web Crawls
  - Community Support
- Data Integration into ONE single dataset
  - to enable analysis at European level
- Data Cleaning with
  - Supervised Machine Learning Methods (Active Learning)

Repository Data Analysis
- Duplicate records are inherent in single datasets
- Even more duplicate records appear after merging single datasets
- Most obvious duplicates concern organisations and persons
  - no significant number of duplicate projects
  - publications have been ignored
- Duplicate records are a known problem

Formal Problem Definition (Winkler 2006)
- Problem: duplicate detection in record set A
- Given: a set of records A
- Classify: every pair (a, b) ∈ A × A as belonging to either
  - M (set of true matches) or
  - U (set of true non-matches)
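
The pair classification above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `classify_pairs`, the toy records and the naive case-insensitive decision rule are hypothetical, and the sketch enumerates unordered pairs of distinct records rather than the full product A × A.

```python
from itertools import combinations


def classify_pairs(records, is_match):
    """Split all unordered pairs of distinct records into M and U."""
    matches, non_matches = [], []
    for a, b in combinations(records, 2):
        if is_match(a, b):
            matches.append((a, b))      # pair goes into M
        else:
            non_matches.append((a, b))  # pair goes into U
    return matches, non_matches


# Hypothetical toy records; the decision rule is deliberately naive.
records = ["DFKI GmbH", "dfki gmbh", "Jozef Stefan Institute"]
M, U = classify_pairs(records, lambda a, b: a.lower() == b.lower())
print(len(M), len(U))  # prints "1 2"
```

The number of pairs grows quadratically with the number of records, so comparing all pairs is only practical on small sets.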

IST World Problem Definition
- Heuristic Analysis of Random Samples: National Datasets / Cordis Datasets
  - most obvious duplicates found inside the Cordis FP5 and Cordis FP6 datasets and across the Cordis FP5 and FP6 datasets
  - not so many duplicates found in national datasets
  - a lot of duplicate person records across all datasets
  - no duplicate records found in project datasets
  - only some duplicate records across project datasets
  - publications have not been examined
- Decision taken with respect to the IST World scope
  - do not touch project records
  - ignore publication records
  - find a solution for person records (IST World Community)
  - concentrate on cleaning organisation records

Data Cleaning Application: Problems with Organisation Records
Most entries had slightly different names caused by additional special characters or character modifications:
- Capitalization, Lowercase Letters
- Blanks, Extra Spaces
- Hyphens
- Quotes
- Comma in Different Places
- Article in Name
- Full Stop in Name
- Incomplete Names
- English Translation
- Word Order
- Language-Specific Characters (Jorg instead of Jörg)
- Special Characters (wrong encoding, e.g. &, ?)
- Mixture of Organisation Names and Department Names
- Differences in Addresses
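
Several of the surface variations listed above (case, extra spaces, hyphens, quotes, commas, full stops, articles, language-specific characters) can be reduced by a simple normalisation step before comparison. The sketch below is an assumption for illustration, not the project's cleaning code; the function name `normalize_org_name` and the article list are hypothetical. It does not address incomplete names, translations, word order or mixed department names.

```python
import re
import unicodedata

ARTICLES = {"the"}  # assumption: only the English article is handled


def normalize_org_name(name: str) -> str:
    """Return a lowercase, accent-free, punctuation-free form of a name."""
    # Decompose accented characters and drop the combining marks (Jörg -> Jorg).
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    lowered = unaccented.lower()
    # Treat hyphens, quotes, commas and full stops as word separators.
    cleaned = re.sub(r"[-\"'“”„,.]", " ", lowered)
    # Splitting on whitespace also collapses blanks and extra spaces.
    tokens = [t for t in cleaned.split() if t not in ARTICLES]
    return " ".join(tokens)


print(normalize_org_name('The  "Jožef Stefan" Institute,'))
# prints "jozef stefan institute"
```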

IST World Dataset Integration
- Organisation Names: Fulltext Indexing
- Querying Organisation Names + Location
  (1) Name/Location Strings (Bag of Words)
  (2) Word/Character Order (String Kernels)
  (3) Spelling Errors (Edit Distance Measure)
  (4) Normalization of (1) to (3)
- Human Decision: M = Match, U = Non-Match, - = unknown
- Machine Learning (Support Vector Machine): M = Match, U = Non-Match, - = unknown
- Machine Decision: M = Match, U = Non-Match
- Knowledge about Records
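
Two of the similarity signals named above, bag of words and edit distance, can be sketched as follows. The concrete choices are assumptions made for illustration: Jaccard overlap stands in for the bag-of-words comparison, Levenshtein distance for the edit distance measure, and dividing by the longer string length for the normalization. String kernels and the support vector machine itself are not shown, and all function names are hypothetical.

```python
def bag_of_words_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the lowercase word sets of two strings."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def pair_features(a: str, b: str) -> list:
    """Feature vector for one record pair, e.g. as classifier input."""
    return [bag_of_words_similarity(a, b),
            edit_similarity(a.lower(), b.lower())]
```

Note that the bag-of-words score ignores word order while the edit-based score is sensitive to it, which is one reason to keep the signals as separate features.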

Active Learning Application
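
The slides do not say which active learning strategy was used. One common strategy is uncertainty sampling, sketched here as an assumption: the human is asked to label the record pairs whose classifier score lies closest to the decision boundary. The scores, the margin value and both function names are hypothetical; with a support vector machine the score would typically be the signed distance to the separating hyperplane.

```python
def select_for_labeling(scores, budget):
    """Return the ids of the `budget` pairs with scores closest to zero."""
    ranked = sorted(scores, key=lambda pair_id: abs(scores[pair_id]))
    return ranked[:budget]


def provisional_label(score, margin=1.0):
    """Map a signed score to M (match), U (non-match) or - (unknown)."""
    if score >= margin:
        return "M"
    if score <= -margin:
        return "U"
    return "-"


# Hypothetical scores for four record pairs.
scores = {"p1": 2.3, "p2": -0.1, "p3": 0.4, "p4": -1.8}
print(select_for_labeling(scores, budget=2))  # prints "['p2', 'p3']"
```

After the human labels the selected pairs, the classifier is retrained and the selection is repeated.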

Evaluation of Results in the CORDIS FP6 Dataset
- human evaluation of 1000 organisation record pairs
  - 30 M correct; 934 U correct
  - 1 M incorrect; 35 U incorrect
  - 97% precision
  - 46% recall
- integration approach worked well
- can be used for large-scale integration tasks
- Result: semi-automated identification of 4000 duplicates with high accuracy and a reasonable recall
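
The reported precision and recall can be reproduced from the four counts, assuming that "M incorrect" means a pair labelled M that is truly a non-match and "U incorrect" means a pair labelled U that is truly a match. The function and parameter names are illustrative.

```python
def precision_recall(true_m, false_m, false_u):
    """Precision and recall for the match class M."""
    precision = true_m / (true_m + false_m)  # share of predicted M that is right
    recall = true_m / (true_m + false_u)     # share of true matches that was found
    return precision, recall


precision, recall = precision_recall(true_m=30, false_m=1, false_u=35)
print(f"precision={precision:.0%} recall={recall:.0%}")
# prints "precision=97% recall=46%"
```

Under this reading, 30 of 31 predicted matches are correct, while 35 of the 65 true matches in the sample are missed.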

Analytic Tools
- Advanced Tools
  - Collaboration Diagram
  - Competence Diagram
- Experimental Tools
  - Collaboration Trends
  - Competence Trends
  - Consortia Prediction
  - Semantic Search

How to Analyze or Generate a Diagram
- define a query in the IST World Portal
- get a list of result records matching the query
- generate diagrams based on the results

Competence Diagram
- Query: IST SSA projects within FP6
- Aim: investigate the thematic range of SSA projects in FP6
- Thematic Areas (Blue Clouds): SEMANTIC, HEALTH, LEGAL, CHANGING, ROADMAP, SOFTWARE
- Projects (Red Dots): linked with full record in repository

Competence Diagram
- Query: IST SSA projects within FP6
- Aim: investigate the thematic range of SSA projects in FP6
- Goals (List of Keywords): DEMENTIA, PEOPLE, MEDICAL, STANDARDS, …
- Configuration of Result Space: 40% of result list, 30 topics

Competence Diagram
- Query: IST SSA projects within FP6
- Aim: investigate the thematic range of SSA projects in FP6
- Diagram labels: Goals, Themes
- Configuration of Result Space: 40% of result list, 30 topics

Collaboration Diagram
- Query: IST SSA projects within FP6
- Aim: investigate the collaboration of SSA partners in FP6
- Diagram labels: Number of joint partners, Project
- Configuration of Result Space: 20% of result list
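
The "number of joint partners" shown in the collaboration diagram can be read as an edge weight between two projects. A minimal sketch of that reading, assuming each project is given as a set of partner identifiers; the function name and the toy data are hypothetical and do not come from the IST World repository.

```python
from itertools import combinations


def joint_partner_edges(project_partners):
    """Map each project pair to the number of partners they share."""
    edges = {}
    for p1, p2 in combinations(sorted(project_partners), 2):
        shared = len(set(project_partners[p1]) & set(project_partners[p2]))
        if shared:  # only connect projects with at least one joint partner
            edges[(p1, p2)] = shared
    return edges


# Hypothetical projects and partner identifiers.
projects = {
    "Project A": {"Org 1", "Org 2", "Org 3"},
    "Project B": {"Org 2", "Org 3"},
    "Project C": {"Org 4"},
}
print(joint_partner_edges(projects))
# prints "{('Project A', 'Project B'): 2}"
```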

Evaluation of Analytic Tools
- IST World made it possible to perform the defined tasks
  - for more details see the full paper in the Proceedings
- All analytics depend on the underlying data
- The analytic tools are very powerful

Evaluation of Queries
- Query execution performed in March 2008
- Queried datasets: IST World / Cordis
  - IST World Portal:
  - CORDIS Search:

Results of Query Evaluation
Discovered inconsistencies with Cordis data:
- "FP6" string: 30 of 80 relevant records missed the string
- "SSA" string: 15 of 208 relevant records missed the string
- "Specific Support Action" string: 15 of 208 relevant records missed the string
- Dates (year of the call): not consistently recorded
- Query 1: 22 projects contained the strings "Coordination Action", "Specific Targeted Action", "Integrated Project", or others
- An investigation of the results of Query 1 in Cordis revealed that 80 projects of the result list are missing in IST World
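
Checks of this kind can be sketched as two small helpers: one that finds records lacking an expected label string, and one that lists records returned by one portal but absent from the other. The record identifiers, texts and function names are hypothetical; the sketch does not query Cordis or IST World.

```python
def missing_label(records, label):
    """Ids of records whose text does not contain the expected label."""
    return [rid for rid, text in records.items()
            if label.lower() not in text.lower()]


def missing_in_target(source_ids, target_ids):
    """Ids present in the source result list but absent from the target."""
    return sorted(set(source_ids) - set(target_ids))


# Hypothetical record texts keyed by record id.
records = {
    "r1": "FP6 Specific Support Action",
    "r2": "Sixth Framework SSA",
    "r3": "fp6 SSA",
}
print(missing_label(records, "FP6"))                # prints "['r2']"
print(missing_in_target(["a", "b", "c"], ["b"]))    # prints "['a', 'c']"
```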

Overall Conclusion
- Integration Method:
  - could be further developed
  - test data could be used to generate a better classification model
  - feature generation could be improved by using ontological knowledge
  - transfer learning methods might be helpful for re-use of the learned model
- Evaluation of Large Datasets:
  - very difficult
  - needs expert knowledge
- Analytic Tools:
  - depend on the quality of the underlying data
  - are very powerful for the investigation of large datasets

European Research Dataset (entries), January 2008
- European Research: Orgs, Proj, Exp, Pubs
- Bulgaria: 794 Orgs, 73 Proj, Exp, Pubs
- Cyprus: 29 Orgs
- Czech Republic: 183 Orgs, 163 Proj, 164 Exp
- Estonia: 75 Orgs, 1256 Proj, 6726 Exp, Pubs
- Hungary: 2665 Orgs, 1297 Proj, 2425 Exp
- Latvia: 106 Orgs, 830 Proj, 701 Exp
- Lithuania: 102 Orgs
- Malta: 58 Orgs, 27 Proj, 898 Exp, 180 Pubs
- Poland: 1451 Orgs, 2179 Proj, 7392 Exp, Pubs
- Romania: 169 Orgs, 68 Proj, 87 Exp
- Serbia: 60 Orgs, 2278 Exp, Pubs
- Slovenia: 1723 Orgs, 3748 Proj, Exp
- Slovakia: 56 Orgs, 432 Proj, 683 Exp
- Turkey: 285 Orgs
- EPRI-start: 286 Orgs, 275 Exp
- Cordis FP5+FP6: Orgs, Proj, Exp
- Community: 61 Orgs, 41 Proj, 435 Exp

Beyond the Project
- IST World is online:
- Registration is free
- Create your Competence Map / Collaboration Map
- Continuation is planned …