Algoval: Evaluation Server Past, Present and Future
Simon Lucas
Computer Science Dept, Essex University
25 January, 2002
Architecture Evolution
– Version 1: Centralised evaluation of Java submissions (Spring 2000)
– Version 2: Distributed evaluation using Java RMI (Summer 2001)
– Version 3: Distributed evaluation using XML over HTTP (Spring 2002)
Competitions
– Post-Office Sponsored OCR Competition (Autumn 2000)
– IEEE Congress on Evolutionary Computation 2001
– IEEE WCCI 2002
– ICDAR 2003
– Wide range of contests: OCR, Sequence Recognition, Object Recognition
Sample Results
More Details
Parameterised Algorithms
– League table entries can include the parameters that were used to configure the algorithm
– This allows developers to observe the effect of different parameter settings on the performance measures
– E.g.: problems.seqrec.SNTupleRecognizer?n=4&gap=11&eps=0.01
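A parameterised league table entry of this kind can be split into an algorithm name and its parameter settings. The following is a minimal sketch, assuming the entry follows URL query-string conventions (`?` before the first parameter, `&` between parameters). The helper name `parse_algorithm_spec` is hypothetical and is not part of Algoval.

```python
from urllib.parse import parse_qsl


def parse_algorithm_spec(spec):
    """Split 'package.ClassName?key=value&...' into (name, params).

    Parameter values are kept as strings; converting them to numbers
    is left to the algorithm being configured.
    """
    name, _, query = spec.partition("?")
    params = dict(parse_qsl(query))
    return name, params


name, params = parse_algorithm_spec(
    "problems.seqrec.SNTupleRecognizer?n=4&gap=11&eps=0.01"
)
print(name)
print(params)
```

Keeping the parameters next to the algorithm name means each distinct setting can appear as its own league table row.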
Centralised
– System restricted submissions to be written in Java, for security reasons
  – Java programs can be run within a highly restrictive security manager
– Does not scale well under heavy load
– Many researchers unwilling to convert their algorithm implementations to Java
Centralised II
– Can measure every aspect of an algorithm's performance
  – Speed
  – Memory requirements (static, dynamic)
– All algorithms compete on a level playing field
– Very difficult for an algorithm to cheat
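Because a centralised server runs each submission itself, it can time the run and track memory use directly. The sketch below illustrates the idea in Python rather than Java (the original system ran Java submissions), using the standard `time` and `tracemalloc` modules. The function name `measure` is an assumption made for illustration, and `tracemalloc` only sees allocations made through Python's memory allocator.

```python
import time
import tracemalloc


def measure(algorithm, data):
    """Run algorithm(data) once; return (result, seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = algorithm(data)
    seconds = time.perf_counter() - start
    # get_traced_memory() returns (current, peak) in bytes.
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, seconds, peak_bytes


# Stand-in "algorithm": sorting a reversed list of integers.
result, seconds, peak_bytes = measure(sorted, list(range(10000, 0, -1)))
print(f"{seconds:.4f} s, peak {peak_bytes} bytes")
```

Memory tracing slows the run down, so a real evaluator would probably measure speed and memory in separate passes.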
Distributed
– Researchers can test their algorithms against others without submitting their code
– Results on new datasets can be generated immediately for all clients that are connected to the evaluation server
– Results are generated by the same evaluation method, so meaningful comparisons can be made between different algorithms
Distributed (RMI)
– Based on Java’s Remote Method Invocation (RMI)
– Works, but client programs still need access to a Java Virtual Machine
– However, the algorithms can now be implemented in any language
– There may still be some work converting the Java data structures to the native language
Distributed II
– Since most computation is done on the clients' machines, it scales well
– Researchers can implement their algorithms in any language they choose; the implementation just has to talk to the evaluation proxy on their machine
– When submitting an algorithm, it is also possible to specify URLs for the author and the algorithm
– Visitors to the web-site can view league tables, then follow links to the algorithm and its implementer
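A league table whose rows carry author and algorithm links might be represented as follows. This is a sketch under stated assumptions: the class `LeagueEntry`, its field names, the use of error rate as the ranking measure, and every value in the example are placeholders invented for illustration, not Algoval's actual data model or results.

```python
from dataclasses import dataclass


@dataclass
class LeagueEntry:
    algorithm: str           # algorithm name, optionally with its parameters
    error_rate: float        # produced by the server's evaluation method
    author_url: str = ""     # optional link to the implementer
    algorithm_url: str = ""  # optional link to a description of the algorithm


def league_table(entries):
    """Return entries ranked best-first (lowest error rate first)."""
    return sorted(entries, key=lambda e: e.error_rate)


# Placeholder entries; names, scores and URLs are made up.
entries = [
    LeagueEntry("AlgorithmA", 0.12,
                "http://example.org/author-a", "http://example.org/algorithm-a"),
    LeagueEntry("AlgorithmB", 0.08),
]
for rank, e in enumerate(league_table(entries), start=1):
    print(rank, e.algorithm, e.error_rate, e.author_url or "-")
```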
Distributed (RMI)
UML Sequence
Remote Participation
– Developers download a kit
– Interface their algorithm to the spec
– Run a command-line batch file to invoke their algorithm on a specified problem
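The participation steps can be pictured as implementing a small interface and calling it from a command-line entry point. The sketch below is only an illustration in Python: the interface `Algorithm`, its methods `train` and `classify`, the toy `MajorityClass` algorithm and the `main` runner are all hypothetical, and the actual developer kit was Java based.

```python
import argparse
from abc import ABC, abstractmethod


class Algorithm(ABC):
    """Hypothetical interface a developer would implement."""

    @abstractmethod
    def train(self, examples):
        """examples: list of (pattern, label) pairs."""

    @abstractmethod
    def classify(self, pattern):
        """Return a label for one pattern."""


class MajorityClass(Algorithm):
    """Toy algorithm: always predicts the most frequent training label."""

    def train(self, examples):
        labels = [label for _, label in examples]
        self.label = max(set(labels), key=labels.count)

    def classify(self, pattern):
        return self.label


def main(argv=None):
    parser = argparse.ArgumentParser(description="Run an algorithm on a named problem")
    parser.add_argument("problem", help="name of the problem to attempt")
    args = parser.parse_args(argv)
    algorithm = MajorityClass()
    # Stand-in training data; a real kit would fetch it for args.problem.
    algorithm.train([("x1", "a"), ("x2", "b"), ("x3", "a")])
    label = algorithm.classify("x4")
    print(args.problem, label)
    return label


main(["demo-problem"])
```

A batch file would then only need to call this entry point with the problem name as its argument.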
Features of RMI
– Handles object serialization, so problem specifications can easily include complex data structures
– Fragile: changes to the Java classes may require developers to download a new developer kit
– Does not work well through firewalls
– HTTP tunnelling can solve some problems, but has limitations (e.g. no callbacks)
XML Version
– While Java RMI is platform independent (any platform with a JVM), XML is language independent
– XML version is HTTP based
– No known problems with firewalls
XML Version
– Each client (algorithm under test)
  – parses XML objects (e.g. datasets)
  – sends back XML objects (e.g. pattern classifications) to the server
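The client's parse-then-reply cycle can be sketched with Python's standard `xml.etree.ElementTree` module. The element and attribute names (`dataset`, `pattern`, `classifications`, `classification`, `id`, `label`) are assumptions made for illustration, since the slides do not give the actual XML schema, and the threshold classifier is a stand-in for the algorithm under test.

```python
import xml.etree.ElementTree as ET

# Hypothetical dataset document, as the server might send it.
DATASET_XML = """
<dataset name="demo">
  <pattern id="p1">0.2 0.9</pattern>
  <pattern id="p2">0.8 0.1</pattern>
</dataset>
"""


def classify(features):
    """Stand-in for the algorithm under test."""
    return "A" if features[0] < 0.5 else "B"


def respond(dataset_xml):
    """Parse a dataset document and build a classifications document."""
    dataset = ET.fromstring(dataset_xml)
    reply = ET.Element("classifications", dataset=dataset.get("name", ""))
    for pattern in dataset.findall("pattern"):
        features = [float(x) for x in pattern.text.split()]
        ET.SubElement(reply, "classification",
                      id=pattern.get("id"), label=classify(features))
    return ET.tostring(reply, encoding="unicode")


print(respond(DATASET_XML))
```

In the HTTP-based design the reply string would be sent back to the server in an HTTP request; that transport step is omitted here.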
Pattern recognition servers
– Reside at particular URLs
– Can be trained on specified or supplied datasets
– Can respond to recognition requests
Example Request
– Recognize this word:
– Given the dictionary at: –
– And the OCR training set at: –
– Respond with your 10 best word hypotheses
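The "10 best word hypotheses" part of such a request could be answered along the following lines. This is a rough sketch, not the contest's method: a noisy character string stands in for the word image, the dictionary is an in-memory list rather than a document fetched from a URL, candidates are scored with `difflib.SequenceMatcher`, and the XML element names are hypothetical.

```python
import difflib
import xml.etree.ElementTree as ET


def best_hypotheses(noisy_word, dictionary, n=10):
    """Rank dictionary words by string similarity to a noisy reading."""
    scored = [
        (difflib.SequenceMatcher(None, noisy_word, word).ratio(), word)
        for word in dictionary
    ]
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:n]


def to_xml(hypotheses):
    """Encode ranked (score, word) pairs as an XML reply."""
    root = ET.Element("hypotheses")
    for rank, (score, word) in enumerate(hypotheses, start=1):
        item = ET.SubElement(root, "word", rank=str(rank), score=f"{score:.3f}")
        item.text = word
    return ET.tostring(root, encoding="unicode")


# Made-up dictionary and noisy reading, for illustration only.
dictionary = ["essex", "server", "sever", "seven", "serve"]
print(to_xml(best_hypotheses("servcr", dictionary)))
```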
Issues
– How general to make problem specs
  – Could set up separate problems for OCR and face recognition, or a single problem called ImageRecognition
– How does the software effort scale?
Software Scalability
– Suppose we have:
  – A algorithms implemented in L languages
  – D datasets
  – P problems
  – E algorithm evaluators
– How will our software effort scale with respect to these numbers?
Scalability (contd.)
– Consider server and clients
– More effort at the server can mean less effort for clients
– For example, language-specific interfaces and wrappers can be defined
– This makes participation in a particular language much less effort
– This could be done on demand
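One way to make the trade-off concrete is to count the pieces of glue code needed under two extremes. The counting model below is an assumption introduced purely for illustration and is not taken from the slides: it supposes that without a shared format every (language, problem) pair needs its own adapter, while with a shared format the server supplies one wrapper per language and one format definition per problem.

```python
def adapters_without_shared_format(languages, problems):
    """Assumed model: one hand-written adapter per (language, problem) pair."""
    return languages * problems


def adapters_with_shared_format(languages, problems):
    """Assumed model: one wrapper per language plus one format per problem."""
    return languages + problems


for L, P in [(2, 3), (5, 10)]:
    print(L, P,
          adapters_without_shared_format(L, P),
          adapters_with_shared_format(L, P))
```

Under these assumptions the glue code grows multiplicatively in the first case and additively in the second, which is the sense in which extra server-side effort could reduce the effort for clients.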
Summary
– Independent, automatic algorithm evaluation
– Makes sound scientific and economic sense
– Existing system works but has some limitations
– Future XML-based system will overcome these
– Then need to get people using this
– Future contests will help
– Industry support will benefit both academic research and commercial exploitation