Java Analysis Studio & Object Oriented Data Analysis (in Java) KEK 25th May 2000 Tony Johnson - SLAC
Contents Overview of Java Why Java for Data Analysis Java Analysis Studio Recently added features Using Java for Reconstruction Linear Collider Simulation Framework Is Java fast enough for Data Analysis? HEP-wide java libraries Conclusions Demo
History of Java 1991 James Gosling at Sun creates Java language (née Oak) Targeted at consumer electronics - cable top boxes, VCR, TV etc. Goal was reliability not speed 1994 Hot Java Web browser written (in Java) Supports Applets - Downloadable programs that run inside web browser Java licensed by Netscape, Oracle, Microsoft many others Huge hype surrounding “Web Programming language” 1997 Java 1.1 released with many standard libraries Sun’s mantra becomes “Write Once Run Anywhere” Enthusiastically supported by all major hardware and many software vendors Microsoft begins to have second thoughts 1998 Java 2 released, even more standard libraries Now truly general purpose language Sun (and DOJ) sue Microsoft
Java Architecture More than just a Web Tool Java is a fully functional, platform independent, object-oriented language Powerful set of machine independent libraries, including GUI library. Totally Buzzword Compliant Simple, Object Orientated, Distributed, Dynamic, Robust, Secure, Architecture Neutral, Portable, High Performance, Multithreaded. Interpreted? Java Source code -> Compiler -> Java “Bytecodes” -> (Mac, Unix, PC) Bytecode Interpreter or JIT Compiler -> Machine Code Compiled + Interpreted. Dynamic Optimization may make Java faster than statically compiled languages (in principle).
Java Features Simple But not trivial…you need to read a book Syntax very close to C++ No backwards compatibility issues Some features of C++ which add undue complexity dropped. Good stepping stone to (or from) C++ Clean and Efficient Object-Oriented Language Language features guide programmer toward reliable programming habits Robust Extensive Compile-Time checking of code Second level of run-time checking of code Memory management done by system, not by programmer No pointers to mess up (Java uses references rather than pointers) Chances of program running as designed without the need for time-consuming debugging are greatly increased.
Java Features (continued) Highly Portable Java works today on NT, Win95/98, Unix (including Linux), Mac, VMS Personal Java - Windows CE, Palm Pilot Programs written in Java are very portable Move to another platform and it just works Care needed with AWT GUI components (obsolete) and web browsers Lifetime of HEP experiments > OS lifetime. Lifetime of Java > Lifetime of HEP experiment?? Encourages true modularity Build entire framework for HEP experiment in Java Abstract away underlying systems (batch system, IO system etc.)
Java Features (continued) Distributed Built in support for Internet protocols, URL’s, HTTP, Remote Method Invocation, Corba, Database access etc. Secure Bytecode “verifier”, padded cell (c.f. Web Browser) Multithreaded Language has direct support for multithreading Dynamic Libraries can change without recompiling programs that use them Can dynamically load and unload code during program execution Can move objects across the network (agents), or store them in databases and retrieve them later.
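The "Dynamic" point above can be illustrated with a minimal sketch that loads a class by name at run time and instantiates it. The class name `java.util.ArrayList` is only a stand-in for a user-supplied module, and `DynamicLoadDemo` is a hypothetical name; this is written against a current JDK and is not how JAS itself loads code.

```java
import java.util.List;

final class DynamicLoadDemo {
    /** Loads the named class at run time and creates an instance via its no-argument constructor. */
    static Object instantiate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException ex) {
            // Class missing, no suitable constructor, or constructor failed.
            throw new IllegalStateException("Cannot load " + className, ex);
        }
    }

    public static void main(String[] args) {
        // The class name is an ordinary string; it could come from a file or the network.
        Object obj = instantiate("java.util.ArrayList");
        @SuppressWarnings("unchecked")
        List<String> list = (List<String>) obj;
        list.add("loaded at run time");
        System.out.println(obj.getClass().getName() + " -> " + list);
    }
}
```

Because the name is resolved only when `instantiate` runs, the calling program does not need to be recompiled when a new class is made available on the class path.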
Java Libraries and API’s Standard Libraries and API’s 2D + 3D graphics + GUI (Swing) + Imaging + Printing Database connectivity (JDBC) + ODMG Collections, IO (Serialization), Data Compression Networking, Sockets, SSL, Corba, RMI Java Beans (components), Help Multimedia, Sound, Speech Security, Code Signing, Cryptography Math, Arbitrary Precision Math Shared Data (Collaborative Applications) Huge “Community-Ware” software archive IBM alone has hundreds of Java resources on its Alphaworks site
Java Tools Popularity of Java = many tools And they are cheap (or even free) Development Environments (IDE’s) Editor, Compiler, Debugger, WYSIWYG GUI designer, Source control Automatic Documentation generators Memory and CPU Optimizers Since debugging time is minimal you might actually have time to use them Object Modelers Many commercial sets of components
Java Limitations? No operator overloading Annoying for complex numbers, matrices, 3/4-vectors Perhaps more often abused than sensibly used Lightweight Objects (value semantics) may overcome this Bugs sometimes slow to be fixed Printing, Imaging bugs existed for >1 year Perhaps “Community Source License” will help Little control over Memory Allocation Integration with C++ could be better Standardization lacking Sun had promised to submit Java to ISO for standardization, but has so far failed to deliver
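To make the operator-overloading limitation concrete, here is a minimal sketch of a 4-vector class in which a named method stands in for what would be `operator+` in C++. The class `FourVector` and its method names are hypothetical and are not taken from JAS or any HEP library.

```java
final class FourVector {
    final double e, px, py, pz;

    FourVector(double e, double px, double py, double pz) {
        this.e = e;
        this.px = px;
        this.py = py;
        this.pz = pz;
    }

    /** Component-wise sum; plays the role operator+ would play in C++. */
    FourVector plus(FourVector o) {
        return new FourVector(e + o.e, px + o.px, py + o.py, pz + o.pz);
    }

    /** Invariant mass sqrt(E^2 - |p|^2), clamped at zero to guard against rounding. */
    double mass() {
        double m2 = e * e - (px * px + py * py + pz * pz);
        return Math.sqrt(Math.max(0.0, m2));
    }

    public static void main(String[] args) {
        // Two back-to-back massless particles (units arbitrary).
        FourVector a = new FourVector(45.6, 0, 0, 45.6);
        FourVector b = new FourVector(45.6, 0, 0, -45.6);
        // With operator overloading this would read (a + b).mass().
        System.out.println(a.plus(b).mass());
    }
}
```

The method-call form is more verbose than `a + b`, but chains such as `a.plus(b).plus(c)` remain readable for short expressions.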
Why Java for HEP Computing? Previous generation of experiments used Fortran + Data Management System (== Jazelle, Zebra, BOS) Solves Three Problems Ability to Represent Complex Data Structures Persistence (i.e. read in and write out complex structures) Run time access to named data in structures (for analysis) Now time has marched on and modern experiments use C++ Represent Complex Data Persistence Run time access to data Still need to build (or buy and deploy) data management system (e.g. Root, Objectivity) Java Represent Complex Data Persistence (serialization) Run time access to data (reflection) support built-in to language
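The two built-in mechanisms named above, serialization for persistence and reflection for run-time access to named data, can be sketched in a few lines using only `java.io` and `java.lang.reflect`. The `Track` class and its fields are hypothetical, chosen only for illustration, and the sketch is written against a current JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;

final class PersistenceDemo {
    /** Hypothetical event-data class; field names are illustrative only. */
    static final class Track implements Serializable {
        private static final long serialVersionUID = 1L;
        double momentum;
        int charge;

        Track(double momentum, int charge) {
            this.momentum = momentum;
            this.charge = charge;
        }
    }

    /** Persistence: writes the object to a byte array and reads a copy back. */
    static Object roundTrip(Serializable obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException ex) {
            throw new IllegalStateException(ex);
        }
    }

    /** Run-time access: looks up a field by its name using reflection. */
    static Object fieldByName(Object obj, String name) {
        try {
            Field f = obj.getClass().getDeclaredField(name);
            f.setAccessible(true);
            return f.get(obj);
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip(new Track(12.5, -1));
        System.out.println("momentum = " + fieldByName(copy, "momentum"));
        System.out.println("charge   = " + fieldByName(copy, "charge"));
    }
}
```

A file or network stream could replace the in-memory byte array without changing the `Track` class, which is the sense in which no separate data management system is required for simple cases.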
Where would HEP use Java? GUI systems online + control (not really any alternative) Event Display Reconstruction+Simulation packages? Data Analysis tasks Offline Online Event Generators
Java Analysis Studio Experiment independent analysis tools for High Energy Physics data
Introduction to JAS JAS starts from experience with SLD interactive data analysis IDA (Toby Burnett) + SLD extensions Integrates ideas from Reason, Hippodraw, LHC++, Histoscope, … Exploit advantages of Java Cross platform, dynamic loading, GUI, many standard API’s – networking, HTML, etc. Aim is to solve real life physicist problems Want to get input from as many people as possible. System is flexible enough to change.
JAS Overview Modular Java Toolkit for Analysis of HEP data Data Format Independent Experiment Independent Supports arbitrarily complex analysis modules written in Java Rich Graphical User Interface (GUI) with: Data Explorer Flexible Histogram + Scatterplot display Histogram manipulation+fitting Built-in Editor/Compiler (for writing analysis modules) Extensible via plugins User extensible via Object Orientated API's Written entirely in Java so will run on any platform with a Java VM (JDK 1.1 or better) Support: Windows 95/98/NT + Linux + Solaris Works on: DEC + SGI + Mac
JAS Components JASHist (Plot Bean) Fitting Framework Functions Fitters Analysis Framework GUI Framework Plugin Histogram Accumulation 3-4 Vector Utilities Data Interface Histo/Plot Adaptor Network Adapter Particle Properties Jet Finder PAW SQL StdHEP
Data Access Classes Analyze local or remote data User interface independent of Data Location Does not assume fast network (works well at 28.8 kbps) Analysis code moves (transparently) to data Desktop Client DIM Local Data Network Data Server DIM Remote Data
Remote Data Analysis GUI Data Analysis Engine Users Java Code Experiment Interface Java Compiler + Debugger Experiment Extensions (Event Display) TCP/IP Network Padded Cell C++ Code Data Zebra Jazelle Paw Root Objectivity
Distributed Data Analysis Network Data Server Desktop Client Network Data Controller Distributed Data Data Server DIM Data Server DIM Data Server DIM Data Server DIM Data Server DIM Data Server DIM
Plot Display Package 1-d/2-d Histogram/ScatterPlot Display multiple axes, direct user interaction, overlays, fitting
Java Analysis Studio GUI
Example Analysis Code (Track Recon)
Demo
New Features Modular Plot Component Can be used in other applications GUI, servlets Model-view-controller design Supports many display styles, 1d, 2d, scatterplot, fitting, slices, user interaction, XML for data interchange with other apps. jEdit Editor Full featured program editor Syntax highlighting, indenting, bracket matching Expect to be able to integrate advanced features Debugging, auto-completion
New Features – HTML support
New Features – WIRED Plugin
New Features – AIDA support AIDA is attempt to standardize HEP histogram interface Abstract interface C++ and Java supported Multiple implementations JAS now supports AIDA interface Now possible to create JAS histograms from C++ C++ Program AIDA JNI Java AIDA JAS
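The idea of an abstract histogram interface with multiple implementations can be sketched as follows. The interface `Hist1D` and class `ArrayHist1D` are hypothetical names invented for this illustration; they are NOT the real AIDA interfaces, whose method names and signatures may differ.

```java
final class HistogramInterfaceDemo {
    /** Hypothetical abstract interface; not the real AIDA API. */
    interface Hist1D {
        void fill(double x);
        int entries();
        int binEntries(int bin);
    }

    /** One possible implementation: fixed-width bins stored in an int array. */
    static final class ArrayHist1D implements Hist1D {
        private final int[] bins;
        private final double lo, hi;
        private int entries;

        ArrayHist1D(int nBins, double lo, double hi) {
            this.bins = new int[nBins];
            this.lo = lo;
            this.hi = hi;
        }

        @Override public void fill(double x) {
            entries++;
            // Out-of-range (and NaN) values are counted in entries only.
            if (!(x >= lo && x < hi)) return;
            int bin = (int) ((x - lo) / (hi - lo) * bins.length);
            bins[Math.min(bin, bins.length - 1)]++;
        }

        @Override public int entries() { return entries; }

        @Override public int binEntries(int bin) { return bins[bin]; }
    }

    /** Analysis code depends only on the interface, so implementations can be swapped. */
    static void fillAll(Hist1D h, double[] values) {
        for (double v : values) h.fill(v);
    }

    public static void main(String[] args) {
        Hist1D h = new ArrayHist1D(10, 0.0, 10.0);
        fillAll(h, new double[] {0.5, 1.5, 1.7, 9.99, 10.0, -1.0});
        System.out.println("entries = " + h.entries() + ", bin 1 = " + h.binEntries(1));
    }
}
```

Since `fillAll` sees only `Hist1D`, a different backing implementation (for example one that forwards fills to another process) could be substituted without touching the analysis code.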
New Features – G4 interface
Future Features - 3D Support
Usage Babar using for Online Monitoring Using Online Monitoring API HTML Pages with embedded plots Custom Overlays US Linear Collider Studies Have an entire recon+analysis package written in Java Using JAS as analysis interface Making use of remote data access using repository at University of Pennsylvania CLEO Using plot bean for online displays Other smaller scale users All giving very valuable feedback Helping to produce more reliable solution
OpenSource – Anyone can Contribute! All source code now stored in CVS Use any CVS client for anonymous (read-only) access We recommend jCVS (pure Java CVS client) Source code all web browsable Implemented using jCVS servlet Write access can be given to interested developers Intend to put entire code under LGPL Platform independent build system Uses jmk - pure java make-like tool To build entire system on any platform with CVS and Java:
    cvs co jas
    cd jas
    java -jar jmk.jar
Documentation LCD Tutorial exists Nice step by step tutorial for beginners Examples are all based on LCD but can be used by anyone Starts from very beginning Slowly adding information to Users Guide Still nowhere near complete How To being created to cover specific topics Servlets How To HTML How To XML How To Online API How To Working on Fitting How To JavaDoc generated API documentation available Documentation remains weak link We are aware of this and are working on producing more documentation Also need more design specs/internals documentation to make open source model more effective
Java for Reconstruction/Simulation Dual Goals: Contribute to Linear Collider Detector/Physics Studies Experiment with using Java for full offline reconstruction and analysis package
LC Detector studies in US Goals: Detailed Study of physics processes in a variety of possible LC Detectors. Reference Small and Large detectors Full simulation with GISMO Switch to Geant4, when ready Analysis using Paw C++ & Root Java & JAS Software Requirements Flexibly handle different detector geometries and technologies Rapid development of variety of reconstruction and analysis algorithms
Java package hep.lcd Reconstruction Processors Track finder+fitter written Interface to Fortran fitter in progress Several clustering algorithms Parameterized MC Processors Can read generator input or Gismo output Track and Cluster smearing Analysis Utilities Event Shape + Thrust utilities Jet finder [Jade, Durham] Histogramming Event Displays Simple 2D Event display Full 3D WIRED event display Framework Driver framework interactively control calling of processors debugging/histogramming Parameter (Constant) access driven by detector geometry MC event input (StdHEP format) IO system based on Java IO random access files Can be run inside JAS or standalone
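For readers unfamiliar with the two jet finders named above, the sketch below computes the standard textbook pair-distance measures: JADE, y_ij = 2 E_i E_j (1 - cos θ_ij) / E_vis², and Durham, y_ij = 2 min(E_i², E_j²) (1 - cos θ_ij) / E_vis². The class and method names are hypothetical; this is not the hep.lcd code, and it shows only the distance measure, not the full clustering loop.

```java
final class JetMeasureDemo {
    /** Cosine of the opening angle between two non-zero 3-momenta (px, py, pz). */
    static double cosTheta(double[] p1, double[] p2) {
        double dot = p1[0] * p2[0] + p1[1] * p2[1] + p1[2] * p2[2];
        double n1 = Math.sqrt(p1[0] * p1[0] + p1[1] * p1[1] + p1[2] * p1[2]);
        double n2 = Math.sqrt(p2[0] * p2[0] + p2[1] * p2[1] + p2[2] * p2[2]);
        return dot / (n1 * n2);
    }

    /** JADE measure: 2 E_i E_j (1 - cos theta) / E_vis^2. */
    static double jade(double ei, double ej, double cos, double eVis) {
        return 2.0 * ei * ej * (1.0 - cos) / (eVis * eVis);
    }

    /** Durham (kT) measure: 2 min(E_i^2, E_j^2) (1 - cos theta) / E_vis^2. */
    static double durham(double ei, double ej, double cos, double eVis) {
        double eMin = Math.min(ei, ej);
        return 2.0 * eMin * eMin * (1.0 - cos) / (eVis * eVis);
    }

    public static void main(String[] args) {
        // Two perpendicular particles in an event with 100 units of visible energy.
        double cos = cosTheta(new double[] {1, 0, 0}, new double[] {0, 2, 0});
        System.out.println("JADE   y = " + jade(10.0, 20.0, cos, 100.0));
        System.out.println("Durham y = " + durham(10.0, 20.0, cos, 100.0));
    }
}
```

In a full jet finder the pair with the smallest y_ij is merged repeatedly until every remaining pair exceeds a chosen cut y_cut; the two algorithms differ only in which measure is used.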
Event Display
Java for Reconstruction/Simulation Looks very promising Have been able to develop framework very fast People have no problem learning and using it Performance looks good Future Java interface to Geant4?
Reconstruction Performance
Java Performance Summary Is Java Fast Enough for Physics Analysis? Yes Time gained in development well worth runtime overhead Good design has more effect on final speed than language Many tools available to help optimize code Java will continue to get faster More information - ACM 1999 Java Grande Conference THE JAVA PERFORMANCE REPORT
HEP-wide Java libraries FreeHEP Java library Extract common code from JAS+WIRED Add other utilities (not highly HEP specific) Encapsulated Postscript generator JACO – Java to C++ interface Encourage others to look at what is there We welcome contributions from others HEP library – more physics specific 3 and 4 vectors, jet finders, MC generators Histogramming package (AIDA)
HEP-wide Java libraries FreeHEP library already has useful stuff in it, HEP library just getting started Both libraries in CVS Read access available to anyone Write access to qualified developers Web Site Contributions welcome
Conclusions Java is a very useful language+environment that could be very beneficial to HEP in many areas. Could Java be used for entire offline for major experiment? Technically - Yes Will Java Survive long enough? Need ISO standard Need to see how market forces play out. Programming in Java is Fun!! Spend time architecting an elegant solution to problem to be solved Not Reinventing the wheel, Debugging someone else’s problem Porting to different platforms
More Information… Java Analysis Studio FreeHEP library US Linear Collider Reconstruction WIRED AIDA