DATA SCIENCE SOLUTION FOR RESEARCH: Because today science is complex, multidisciplinary and multi-stakeholder

1 DATA SCIENCE SOLUTION FOR RESEARCH: Because today science is complex, multidisciplinary and multi-stakeholder. Twitter - Facebook - LinkedIn. Paris - San Francisco - Luxembourg. © MyScienceWork

2 Innovative monitoring solution for research assessment
Who are we? Democratizing science. Free discovery tool & promotion platform for scientists and research professionals. Data-oriented solution for digital asset management and analytics. Innovative monitoring solution for research assessment.
MSW (MyScienceWork) aims to make science more open and accessible to all. This is reflected in 3 pillars: (1) Popularization and scientific communication: we work with a network of science journalists to popularize and promote scholarly outputs, and we provide access to scientific publications through our open database of more than 70M publications and patents. (2) Polaris OS, which aims to provide research institutes with an expert data management solution. (3) Sirius: Polaris OS goes hand in hand with the data-processing expertise we have developed over the past 8 years.

3 MyScienceWork Database Partners & Figures
Strong partnerships. 70 million scientific publications. 568+ sources indexed from open institutional repositories and publishers. 12 million patents.

4 Let’s talk about how to build an open repository that lets scientific institutions analyze and process data

5 The Story: From a research institute's need to the project launch
Institutional repository, interoperability, data science, innovative technologies, UX/UI.
Polaris OS is an open-source solution developed in partnership with the French Institute for Demographic Studies (INED). We started the project with them last year. At the time, they wanted an institutional repository solution that could address these specific challenges:
(1) Give their researchers an easy way to deposit publications: a good UX/UI, auto-completion of metadata, and useful tools (bibliographic management, pushing publications to other repositories…).
(2) Control the quality of the data produced by their research activities, in order to increase the visibility of their research outputs in other databases (national or thematic repositories…) and on the web, and to produce analytics graphs and evaluation reports on a solid basis.
(3) Get a sustainable solution. It should have a flexible metadata model, be based on open and innovative (data-oriented) technologies, and be interoperable with other frameworks and infrastructures. It should offer an easy environment for new development and be as autonomous as possible to manage (no IT expertise required).
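Auto-completion of deposit metadata can be seen as merging a suggestion into a partially filled form. The suggestion might come from a DOI lookup or from PDF extraction. The TypeScript sketch below shows one plausible merge rule, which assumes the researcher's own input must never be overwritten. The `DepositForm` fields and the `autocomplete` and `fillIfEmpty` functions are hypothetical and are not taken from Polaris OS. The lookup itself is left out.

```typescript
// Hypothetical deposit form: every field is optional because the researcher
// may have filled in only part of it before auto-completion runs.
interface DepositForm {
  title?: string;
  doi?: string;
  journal?: string;
  year?: number;
  authors?: string[];
}

// Copy one field from `source` into `target` only if the target's value is empty.
function fillIfEmpty<K extends keyof DepositForm>(target: DepositForm, source: DepositForm, key: K): void {
  const current: unknown = target[key];
  const isEmpty =
    current === undefined ||
    current === "" ||
    (Array.isArray(current) && current.length === 0);
  if (isEmpty && source[key] !== undefined) {
    target[key] = source[key];
  }
}

// Merge a suggestion (e.g. from a DOI lookup or PDF extraction) into the form
// without ever overwriting what the researcher typed.
function autocomplete(form: DepositForm, suggestion: DepositForm): DepositForm {
  const result: DepositForm = { ...form };
  const keys = Object.keys(suggestion) as (keyof DepositForm)[];
  for (const key of keys) {
    fillIfEmpty(result, suggestion, key);
  }
  return result;
}
```

An untouched author list (an empty array) is treated as empty, so it gets filled from the suggestion. A title the researcher typed is kept as-is.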

6 The Challenge: Build a data-oriented solution that guarantees sustainable evolution.
What issue do we address? A large amount of data is available on the Internet. The rise of the Semantic Web (Web 3.0) has also driven an enormous amount of work worldwide on standardizing metadata exchange. Even so, institutions still face major challenges in managing their data. Several standards have been developed concurrently, and it is difficult to build bridges between every one of them. There are also often striking discrepancies between three kinds of solutions: those that store data in complex databases, those that harvest and ingest new documents, and those that visualize them in a user-friendly way. When we started thinking about the project, we identified the main challenge as building a data-oriented solution that can guarantee sustainable evolution.
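One common way to avoid building a bridge between every pair of metadata standards is to map each standard into a single pivot model. With N formats, that means N mappings into the pivot instead of one converter per pair of formats. The minimal TypeScript sketch below assumes a hypothetical `CommonRecord` pivot. It also uses simplified subsets of a Dublin Core record and of a CrossRef work record. The element and field names follow those formats, and everything else is illustrative.

```typescript
// Hypothetical pivot ("common") model that every source format is mapped into.
interface CommonRecord {
  title: string;
  authors: string[];
  year?: number;
  doi?: string;
}

// Simplified Dublin Core record (element names from the DC element set).
interface DublinCore {
  "dc:title": string;
  "dc:creator": string[];
  "dc:date"?: string; // e.g. "2018-05-14"
  "dc:identifier"?: string[];
}

// Simplified subset of a CrossRef "work" record.
interface CrossRefWork {
  DOI: string;
  title: string[];
  author?: { given?: string; family: string }[];
  issued?: { "date-parts": number[][] };
}

function fromDublinCore(dc: DublinCore): CommonRecord {
  const date = dc["dc:date"];
  const year = date ? parseInt(date.slice(0, 4), 10) : undefined;
  // Keep the first identifier that looks like a DOI (DOIs start with "10.").
  const doi = (dc["dc:identifier"] || []).find((id) => id.startsWith("10."));
  return { title: dc["dc:title"], authors: dc["dc:creator"].slice(), year, doi };
}

function fromCrossRef(work: CrossRefWork): CommonRecord {
  // Normalize authors to "Family, Given" so both sources agree.
  const authors = (work.author || []).map((a) => (a.given ? `${a.family}, ${a.given}` : a.family));
  const parts = work.issued ? work.issued["date-parts"][0] : undefined;
  return { title: work.title[0] || "", authors, year: parts ? parts[0] : undefined, doi: work.DOI };
}
```

Adding a new standard then means writing one more `fromX` function. No bridge to every existing format is needed.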

7 The Solution: A data-driven solution for digital asset management & an analytics dashboard.
To help institutions overcome technical and budgetary hurdles, we developed an open-source repository designed to analyze and process data. We see this as a major technological breakthrough for improving data management, analyzing research impact, and delivering a better user experience. Polaris OS integrates into the ecosystem of research institutes. It imports and exports all types of data (scientific, technological, financial, and managerial) to and from their existing information systems, in order to organize, refine, and enrich that data.
Polaris OS combines:
Enterprise Resource Planning (ERP): Integrate external streams: the ability to interface the solution with external sources coming from our clients' IT departments. Aggregate data under a common model: the ability to gather data scattered across various databases. Roles & users management: fine-grained access control to the platform, on both the back office and the front office.
Extract Transform Load (ETL): Pipelines are used to handle streams of data. A pipeline is a succession of functions used to format, complete, filter, transform and validate an input. Pipelines are extremely flexible and give access to a wide range of transformations and completions. Complex functions can be designed using transducers (compositions of simple functions that always return the same output for the same input). Data can be retrieved over a wide range of protocols, including ODBC (connection to relational databases), SOAP-XML (used primarily with Java and Hibernate), REST (the standard way to exchange data on the Internet today), and SFTP (secure file transfer over SSH).
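The pipeline and transducer ideas above can be sketched in TypeScript. This is a hypothetical illustration and not the Polaris OS pipeline API. The `Row` type, the example steps and the default language `"en"` are all assumptions. Part 1 runs a succession of format, complete, filter and validate steps over each record. Part 2 builds a similar cleanup from small pure functions composed as transducers, so no intermediate array is allocated per step.

```typescript
// A record flowing through the pipeline (field names are hypothetical).
type Row = Record<string, string>;

// --- Part 1: a pipeline as a succession of steps ---
// Each step returns the (possibly modified) row, or null to drop it.
type Step = (row: Row) => Row | null;

function runPipeline(steps: Step[], rows: Row[]): Row[] {
  const out: Row[] = [];
  for (const row of rows) {
    let current: Row | null = row;
    for (const step of steps) {
      if (current === null) break; // row was filtered out by an earlier step
      current = step(current);
    }
    if (current !== null) out.push(current);
  }
  return out;
}

const exampleSteps: Step[] = [
  (r) => ({ ...r, title: (r.title || "").trim() }),   // format
  (r) => (r.language ? r : { ...r, language: "en" }), // complete (assumed default)
  (r) => (r.title === "" ? null : r),                  // filter
  (r) => (/^\d{4}$/.test(r.year || "") ? r : null),    // validate: 4-digit year
];

// --- Part 2: transducers ---
// A reducer folds one item into an accumulator; a transducer turns a reducer
// into another reducer, so simple pure steps compose into a single pass.
type Reducer<A, R> = (acc: R, item: A) => R;
type Transducer<A, B> = <R>(next: Reducer<B, R>) => Reducer<A, R>;

const mapT = <A, B>(f: (a: A) => B): Transducer<A, B> =>
  (next) => (acc, item) => next(acc, f(item));

const filterT = <A>(keep: (a: A) => boolean): Transducer<A, A> =>
  (next) => (acc, item) => (keep(item) ? next(acc, item) : acc);

// compose2(t1, t2) applies t1's transformation first, then t2's.
const compose2 = <A, B, C>(t1: Transducer<A, B>, t2: Transducer<B, C>): Transducer<A, C> =>
  (next) => t1(t2(next));

// Final reducer: collect rows into an array.
const appendRow: Reducer<Row, Row[]> = (acc, item) => {
  acc.push(item);
  return acc;
};

// Trim titles, then drop rows whose title ends up empty.
const cleanTitles = compose2(
  mapT((r: Row): Row => ({ ...r, title: (r.title || "").trim() })),
  filterT((r: Row) => r.title !== ""),
);
```

Usage: `rows.reduce(cleanTitles(appendRow), [] as Row[])` runs the whole composed transformation in a single pass over `rows`.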
Content Management System (CMS): Templates describe the way a page looks (whether it has a header, a footer, a horizontal or vertical menu, …). Menus can be changed (items can be added, moved, removed, …). A widget is a basic element with a single purpose: searching, browsing, showing an image or a text, and so on. Widgets are placed on a 12-column grid based on a standard CSS framework used by web designers and integrators.
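The sketch below shows how widgets might be laid out on a 12-column grid. Each widget declares a width in columns, and widgets are packed row by row so that no row exceeds 12 columns. The `Widget` type, the greedy packing rule and the Bootstrap-style `col-md-N` class names are assumptions for illustration. The slide only says the grid is based on a standard CSS framework.

```typescript
// Hypothetical widget descriptor: a kind and a width in grid columns (1..12).
interface Widget {
  kind: string; // e.g. "search", "browse", "image", "text"
  span: number;
}

const GRID_COLUMNS = 12;

// Greedily pack widgets into rows so that no row exceeds 12 columns,
// preserving the order in which the editor placed them.
function layoutRows(widgets: Widget[]): Widget[][] {
  const rows: Widget[][] = [];
  let current: Widget[] = [];
  let used = 0;
  for (const w of widgets) {
    if (!Number.isInteger(w.span) || w.span < 1 || w.span > GRID_COLUMNS) {
      throw new RangeError(`Invalid span ${w.span} for widget "${w.kind}"`);
    }
    if (used + w.span > GRID_COLUMNS) {
      // Not enough room left: close the current row and start a new one.
      rows.push(current);
      current = [];
      used = 0;
    }
    current.push(w);
    used += w.span;
  }
  if (current.length > 0) rows.push(current);
  return rows;
}

// Map a widget to a Bootstrap-style column class (an assumed convention).
const columnClass = (w: Widget): string => `col-md-${w.span}`;
```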

8 Key Success Point: Innovative Technologies
Solution built on innovative & open technologies. REACTIVE & RESPONSIVE: Vue.js. SEMANTIC SEARCH ENGINE: Elasticsearch. DATA SCIENCE: NLP, ML, AI. FRAMEWORK: Node.js. FILE MANAGEMENT: Minio.
I would like to emphasize a few points that make Polaris OS unique. First of all, we chose the technologies used to build the solution very carefully, with 2 ideas in mind: the technologies must be open, and they must be innovative (data-oriented). For example, we chose: Vue.js, one of the most promising technologies for developing reactive interfaces; Elasticsearch, in our view currently the best search engine for data; Node.js: framework…

9 Key Success Point: 4 Interfaces. Dedicated user interfaces
RESEARCHERS: metadata auto-completion (PDF extraction, CrossRef…); the right balance between mandatory and optional information; thesaurus/controlled vocabulary lists; useful tools (bibliographic management tool, extraction…).
RESEARCH DIRECTOR: analytics dashboard (collaboration, research trends, financial insights…); customized reports for evaluation and communication; bibliographic management tool.
CURATORS/LIBRARIANS: easy-to-use publication review tool; automated embargo management; control of new vocabulary entries within a list…
ADMIN: well-documented solution (GitHub); low level of IT knowledge needed; easy setup of users and accounts.
Second point: interfaces have to be designed for every type of user. In our case, we identified 4 types of users, so we set up 4 customized interfaces.
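Two of the curator features listed above are automated embargo management and control of new vocabulary entries. A small TypeScript sketch can illustrate both. The `DepositedFile` shape, the ISO date format for the embargo end, and the lowercase normalization of keywords are all assumptions and not documented Polaris OS behavior.

```typescript
// Hypothetical file record with an optional embargo end date ("YYYY-MM-DD").
interface DepositedFile {
  name: string;
  embargoUntil?: string;
}

// A file becomes publicly available once its embargo date has been reached.
// Date-only ISO strings are parsed as UTC midnight; a malformed date yields
// NaN, the comparison is false, and the file stays embargoed (fail closed).
function isPubliclyAvailable(file: DepositedFile, today: Date): boolean {
  if (!file.embargoUntil) return true;
  return today.getTime() >= Date.parse(file.embargoUntil);
}

// Split submitted keywords into known controlled-vocabulary terms and new
// entries that a curator must review before they are added to the list.
function checkKeywords(
  keywords: string[],
  vocabulary: Set<string>,
): { known: string[]; pending: string[] } {
  const known: string[] = [];
  const pending: string[] = [];
  for (const kw of keywords) {
    const normalized = kw.trim().toLowerCase();
    (vocabulary.has(normalized) ? known : pending).push(normalized);
  }
  return { known, pending };
}
```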

10 Key Success Point: Openness. Open data and interoperability
In a technical environment that evolves constantly and quickly, a solution needs to be built on a framework that guarantees sustainable evolution. Polaris OS addresses the following shortcomings of open repositories:
Infrastructure/integration: facilitate integration between systems and databases.
Data reuse: structure, clean and enrich the different data formats in order to optimize their management and reusability.
New open technologies and standards: use them to build a sustainable solution.
The goal is a solution that gives the institution more visibility and can be fully integrated into the customer's system.
Incoming data flow: be able to integrate all kinds of sources; flexible metadata model (no predefined data model).
Outgoing data flow: open/push it; be able to integrate all kinds of sources; flexible metadata model; SEO compliant (Google, Google Scholar).
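For the SEO point, Google Scholar's inclusion guidelines recommend exposing bibliographic metadata as Highwire Press-style `citation_*` meta tags in each record's HTML page. The minimal TypeScript sketch below renders such tags from a hypothetical `ScholarlyRecord`. The record shape is an assumption. The tag names (`citation_title`, `citation_author`, `citation_publication_date`, `citation_journal_title`, `citation_pdf_url`) come from those guidelines.

```typescript
// Hypothetical record shape exposed on a publication's landing page.
interface ScholarlyRecord {
  title: string;
  authors: string[];       // one entry per author, e.g. "Doe, Jane"
  publicationDate: string; // e.g. "2018/5/14", or just "2018"
  journal?: string;
  pdfUrl?: string;
}

// Escape characters that would break a double-quoted HTML attribute.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Render one <meta> tag per field; authors get one tag each, in order.
function scholarMetaTags(r: ScholarlyRecord): string {
  const tags: [string, string][] = [["citation_title", r.title]];
  for (const author of r.authors) tags.push(["citation_author", author]);
  tags.push(["citation_publication_date", r.publicationDate]);
  if (r.journal) tags.push(["citation_journal_title", r.journal]);
  if (r.pdfUrl) tags.push(["citation_pdf_url", r.pdfUrl]);
  return tags
    .map(([name, content]) => `<meta name="${name}" content="${escapeHtml(content)}">`)
    .join("\n");
}
```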

11 Key Success Point: Analytics
AI, ML, NLP, DATA SCIENCE.
Last but not least, once your data are cleaned and structured, you are ready to use a very powerful « Analytics Dashboard »: productivity metrics & impact metrics; collaboration & reference graphs; data visualization for strategic intelligence. The system can also export reports with a customized or predefined cover.
With your complete database, we will be able to create your own analytics reports, for example: average number of funded studies by country, by type of funding, by type of collaboration; trends in research project topics (by country, type of funding…); fields of the studies (based on your “research categories”).
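Reports such as the "average number of funded studies by country" are, at their core, group-by aggregations. The sketch below assumes a hypothetical `Study` record. It interprets "average" as studies per year, averaged over all years present in the dataset. The actual dashboard may define these metrics differently.

```typescript
// Hypothetical study record.
interface Study {
  country: string;
  fundingType: string; // e.g. "public", "private"
  year: number;
}

// Count items per group, where the grouping key is computed by `keyOf`.
function countBy<T>(items: T[], keyOf: (item: T) => string): Map<string, number> {
  const counts = new Map<string, number>();
  for (const item of items) {
    const key = keyOf(item);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Average number of studies per year for each group, dividing by the number
// of distinct years in the whole dataset (so years with no study count as 0).
function averagePerYear(studies: Study[], keyOf: (s: Study) => string): Map<string, number> {
  const years = new Set(studies.map((s) => s.year));
  const result = new Map<string, number>();
  countBy(studies, keyOf).forEach((count, key) => result.set(key, count / years.size));
  return result;
}
```

Passing a composite key such as `` (s) => `${s.country}/${s.fundingType}` `` gives the "by country, by type of funding" breakdown without any new code.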

12 THANK YOU! YANN MAHE +33(0)
