Evaluating state of the art in AI

Presentation transcript:

Hi, I’m Deshraj from CloudCV. CloudCV is an open source organization that aims to simplify the process of AI research. I’ll be talking about our latest project, EvalAI.

Team: This is our team. We are a group of graduate students, engineers, and researchers, supported by more than 30 open source contributors who work with us on the project.

The problem: Benchmarking progress in AI is hard. Comparing a new technique with existing approaches is a critical component of research, and in a field as fast-moving as AI, reproducibility is especially important. However, it is becoming increasingly difficult to reproduce numbers from published papers and to reliably compare new algorithms with existing approaches. Some common problems are inconsistencies arising from differences in algorithm implementation, from using different splits of the same dataset, or from not using consistent evaluation metrics.

To benchmark progress, hundreds of AI challenges have been proposed. In computer vision, we have challenges like ImageNet and the MS COCO Challenges; in data science, we have had the Netflix Challenge and the KDD Cup; and so on. However, a centralized platform to both host and participate in such challenges is still lacking.

What is EvalAI? EvalAI is an open-source web platform that aims to evaluate the state of the art in AI. Its goal is to help AI researchers and students host, collaborate on, and participate in AI challenges organized around the globe. https://evalai.cloudcv.org

Wait, isn’t this just like Kaggle? A question we’re often asked is: doesn’t Kaggle already do this? The central difference is this: on Kaggle, an AI researcher has to choose from Kaggle’s predefined set of evaluation schemes, and often that is just not enough. With EvalAI, on the other hand, any kind of challenge can be hosted. (Slide lists, for EvalAI as a challenge platform: generic challenges, protocols and phases, open source, portable hosting.)

A challenge on EvalAI is designed around two entities: challenge hosts and challenge participants. Now, I will be talking about some of the crucial features in EvalAI for both hosts and participants.

Host: Open Source. We have open-sourced EvalAI so that challenge hosts can deploy it on their own servers and run their challenges. Link: https://github.com/Cloud-CV/EvalAI

Host: Custom Evaluation Protocols and Phases. We have designed a versatile backend framework that supports user-defined evaluation metrics and multiple evaluation phases within a single challenge.
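To make "user-defined metrics and multiple phases" concrete, here is a minimal sketch of how a host might wire metrics to phases. It is not EvalAI's actual interface: the phase names, the `PHASES` registry, and the `evaluate` function are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# A metric takes predictions and ground truth and returns a score.
Metric = Callable[[List[str], List[str]], float]


def accuracy(predictions: List[str], ground_truth: List[str]) -> float:
    """Fraction of predictions that exactly match the ground truth."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground truth differ in length")
    if not ground_truth:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)


def error_rate(predictions: List[str], ground_truth: List[str]) -> float:
    """Complement of accuracy, as a second host-defined metric."""
    return 1.0 - accuracy(predictions, ground_truth)


# Hypothetical registry: each phase of one challenge reports its own metrics.
PHASES: Dict[str, Dict[str, Metric]] = {
    "dev": {"accuracy": accuracy},
    "test": {"accuracy": accuracy, "error_rate": error_rate},
}


def evaluate(phase: str, predictions: List[str],
             ground_truth: List[str]) -> Dict[str, float]:
    """Run every metric registered for the given phase."""
    if phase not in PHASES:
        raise KeyError(f"unknown phase: {phase}")
    return {name: metric(predictions, ground_truth)
            for name, metric in PHASES[phase].items()}
```

Under these assumptions, adding a phase or a metric is a one-line change to the registry, which is the kind of flexibility the talk describes.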

Host: Hosting is simple. Hosting a challenge is streamlined. One can create a challenge on EvalAI using the intuitive UI or using a zip configuration file.

Participant: Fast Evaluation. For participants, EvalAI provides a fast and robust backend based on the map-reduce framework that speeds up evaluation. This makes it much faster for researchers to reproduce results from technical papers and to perform reliable and accurate analyses. (Slide figure: MapReduce Job Execution, Logical View, with stages Map, Shuffle, Reduce, and Output.)
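The map-reduce idea can be sketched in a few lines. This is only an illustration under stated assumptions, not EvalAI's backend: the function names are hypothetical, the metric is plain exact-match accuracy, and a thread pool stands in for a real cluster. Because every chunk contributes to a single result, there is effectively one key and the shuffle stage is trivial here.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]     # (prediction, ground truth)
Partial = Tuple[int, int]  # (correct, total) for one chunk


def split_into_chunks(pairs: Sequence[Pair], size: int) -> List[Sequence[Pair]]:
    """Split the submission into fixed-size chunks for the map step."""
    if size <= 0:
        raise ValueError("chunk size must be positive")
    return [pairs[i:i + size] for i in range(0, len(pairs), size)]


def map_chunk(pairs: Sequence[Pair]) -> Partial:
    """Map step: score one chunk independently of the others."""
    correct = sum(1 for prediction, truth in pairs if prediction == truth)
    return correct, len(pairs)


def combine(a: Partial, b: Partial) -> Partial:
    """Reduce step: merge two partial results."""
    return a[0] + b[0], a[1] + b[1]


def evaluate_parallel(predictions: List[str], ground_truth: List[str],
                      chunk_size: int = 2, workers: int = 4) -> float:
    """Accuracy computed chunk by chunk, then reduced to one number."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground truth differ in length")
    pairs = list(zip(predictions, ground_truth))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, split_into_chunks(pairs, chunk_size)))
    correct, total = reduce(combine, partials, (0, 0))
    return correct / total if total else 0.0
```

For CPU-bound metrics in Python, worker processes or separate machines would be needed to see a real speedup; threads are used here only to keep the sketch self-contained.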

As its first challenge, EvalAI hosted this year’s VQA 2.0 challenge. Some background: last year, the VQA 1.0 challenge was hosted on Codalab, and on average evaluation would take 410 seconds.

This year, the dataset for the VQA 2.0 challenge is twice as large. Despite this, we’ve found that our parallelized backend on EvalAI takes only a third of the time it took on Codalab. Twice the data in a third of the time is a 6x speedup!
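The 6x figure combines two ratios from the talk: the dataset size (2x) and the evaluation time (1/3). A quick check of the arithmetic, assuming "speedup" here means examples evaluated per unit time and that both ratios are measured against the VQA 1.0 run on Codalab (the helper name is illustrative):

```python
from fractions import Fraction


def throughput_speedup(data_ratio, time_ratio) -> Fraction:
    """Speedup in examples evaluated per unit time.

    data_ratio: new dataset size divided by old dataset size.
    time_ratio: new evaluation time divided by old evaluation time.
    """
    return Fraction(data_ratio) / Fraction(time_ratio)


# Twice the data, a third of the time.
speedup = throughput_speedup(2, Fraction(1, 3))
print(speedup)  # 6
```

Measured in wall-clock time alone, the same numbers give a 3x improvement; the factor of 6 appears once the doubled dataset is taken into account.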

Participant: Centralized leaderboards. On the front end, we display centralized leaderboards that update in real time to help participants and organizers track progress. (Slide shows the VQA 2.0 public leaderboard.)

Finally, we have designed a clean and intuitive user experience. To top it all off, EvalAI is completely open source, so new features can easily be added by the community.

Progress: Overall, we are building a centralized platform for all AI challenges, whether they are hosted on EvalAI, on a fork of EvalAI, or as any other web application. So far: 30+ open source contributors, 3 times with Google Summer of Code, and 1100+ GitHub issues and PRs.

Upcoming features: Challenge creation on EvalAI. We are working on some really cool features right now, and I would like to give an overview of some of them. The first one is creating a challenge on EvalAI. One can create a challenge either using a zip configuration file or using the intuitive UI of EvalAI (releasing soon).
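The talk does not describe what goes inside the zip configuration file, so the following is only a sketch of the general idea: bundling a challenge description and an evaluation script into one archive. The file names and config fields are hypothetical and should not be read as EvalAI's actual format.

```python
import io
import json
import zipfile
from typing import Any, Dict


def build_challenge_zip(config: Dict[str, Any], evaluation_script: str) -> bytes:
    """Bundle a challenge description and its evaluation code into a zip.

    The archive layout (challenge_config.json, evaluation_script.py) is an
    assumption made for this sketch.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("challenge_config.json", json.dumps(config, indent=2))
        archive.writestr("evaluation_script.py", evaluation_script)
    return buffer.getvalue()


# Hypothetical challenge description with two phases.
example_config = {
    "title": "Example Challenge",
    "phases": ["dev", "test"],
    "leaderboard_metrics": ["accuracy"],
}
example_zip = build_challenge_zip(
    example_config, "def evaluate(submission):\n    return {}\n"
)
```

Keeping everything in one archive means a challenge can be versioned, reviewed, and re-created on another deployment without clicking through a UI.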

Upcoming features: Central leaderboard for challenges hosted as third-party applications. We are also working on adding support for challenges hosted on third-party web applications, which can be linked with EvalAI using a simple webhook service that publishes the results on EvalAI’s leaderboard on the fly: for each submission, the third-party server sends an HTTP POST to EvalAI. This way, challenge hosts can restrict access to the test annotations to themselves and still have their challenge featured on EvalAI along with all the results.
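From the third-party server's side, the webhook flow could look roughly like the sketch below. The endpoint URL and the payload fields are assumptions, since the talk only says that one HTTP POST is sent per submission; only the Python standard library is used.

```python
import json
import urllib.request
from typing import Dict

# Hypothetical endpoint; the real webhook URL is not given in the talk.
WEBHOOK_URL = "https://evalai.example.org/webhooks/results"


def build_result_request(challenge: str, participant_team: str,
                         scores: Dict[str, float],
                         url: str = WEBHOOK_URL) -> urllib.request.Request:
    """Build the HTTP POST that reports one evaluated submission."""
    payload = {
        "challenge": challenge,
        "participant_team": participant_team,
        "scores": scores,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def publish_result(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Only scores travel over the wire in this sketch, which matches the stated goal: the test annotations never leave the host's own server.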

Upcoming features: EvalAI Python APIs. Another important feature is the EvalAI Python APIs. Using these, participants can submit results to EvalAI from a Python console in the participant's terminal:
    >>> import evalai
    >>> evalai.submit(challenge="vqa", data={"foo": "bar"})
    Submission Successful

To summarize, our ultimate goal is to build a centralized platform for hosting, participating in, and collaborating on AI challenges organized around the globe, and we hope to help in benchmarking progress in AI.

Interested in hosting a challenge? If you are interested in hosting a challenge on EvalAI, please reach out to us at team@cloudcv.org.

Thank you! That’s all from my side. Thanks for listening. :) evalai.cloudcv.org, deshraj@cloudcv.org