DART Developing Toolkits for e-Research Dr Jeff McDonell, DART Project Director July 2006
2 What is DART?
Dataset Acquisition / Accessibility / Annotation e-Research Technology
Acknowledgements: DART is a proof-of-concept project funded by the Department of Education, Science and Training (DEST) to support collaborative research in Australia through the Managed Environment for Research Repository Infrastructure (MERRI) program.
3 Why DART?
The Australian government is funding several e-Research projects, like DART, to:
- enable publicly funded research to be publicly available
- support research collaboration, which means sharing: to be shared, data and information need to be stored, retained, defined and secured so that many collaborators can access them
- establish common standardised software / middleware applications that are adaptable to many research capabilities
- help develop world-class Australian research
4 What is DART trying to achieve?
- To develop software tools to handle the data and information management requirements of the complete research lifecycle
- To collect and manage large datasets associated with instruments such as sensor networks, synchrotrons, telescopes, etc.
- To support collaborative research and annotation needs
- To deal with intellectual property, privacy and security issues
- To create customised portals for research demonstrators
- To handle research publication, discovery and access
or to put it another way…
5 DART - version 1
6 DART - version 2
7 DART - version 3
8 What should researchers expect from DART?
- A reliable place to store data, not just the lab server, a home PC or a DVD somewhere!
- Useful tools for researchers to use in their everyday research
- Software tools focused solely on the management of data & information
- Potentially a customised portal applicable to their research field
- A standard and secure method of storing, accessing, analysing and annotating research results
- Easier ways to collaborate, share information and publish results
9 DART design criteria
- Identify best-of-breed solutions in each area
- Use Open Standards and choose Open Source software if possible
- Leverage existing work and expertise: don't reinvent the wheel!
- Use real-world research environments to test proof-of-concept
- Identify common frameworks for:
  - security (with help from the MAMS & e-Security Framework projects)
  - network transport
  - integration across all 27 DART work packages
10 DART demonstrators
The three research areas chosen as DART demonstrators are:
- X-Ray Crystallography
- Climatology
- Digital History
The use of demonstrators is designed to show the value of an end-to-end lifecycle approach and to test proof-of-concept outcomes from DART.
Monash, Queensland & James Cook University researchers are involved in all three demonstrators.
11 How to build demonstrators?
The DART demonstrator tasks are to:
- Engage with suitable researchers at each partner university for each of the three selected research areas
- Define the research activities applicable to each area
- Embed Information Management specialists into research teams
- Construct a custom-designed prototype DART portal, incorporating software applications specific to each research discipline
- Progressively refine the model as the DART project adds new features and services
12 DART logistics
DEST funding of A$3.235 million involving:
- 3 partners:
  - Monash University (host) in Melbourne (MU)
  - University of Queensland in Brisbane (UQ)
  - James Cook University in Townsville (JCU)
- 5 technical areas of focus within the DART work packages (WPs)
- 7 Chief Investigators
- 18-month project, expected to be completed by mid-2007
- 27 separate DART work packages
- 40+ project team members!
13 DART chief investigators Andrew Treloar Asad Khan David Abramson Ann Monotti (Project Architect) Jane Hunter Xiaofang Zhou Ian Atkinson
15 DART work packages
The 27 work packages (WPs) cover five technical areas:
- Data Collection and Monitoring
- Storage and Interoperability
- Content and Rights
- Annotation and Assessment
- Discovery and Access
16 Data Collection and Monitoring
Developing front-end research processes
1. Connect instruments and sensors effectively to the network - JCU
2. Connect instruments to repositories with Storage Resource Broker (SRB) via the Common Instrument Middleware Architecture (CIMA) - JCU
3. Ensure data is of sufficient quality to warrant curation - JCU
4. Online remote access to working instruments and sensors - UQ
5. Improve intelligence of the storage framework - JCU
18 Storage and Interoperability (part 1)
Developing middleware tools
1. Facilitate distributed data management with Fedora - UQ
2. Improve interoperability between SRB and Fedora - MU
3. Support richer metadata to enhance discovery - UQ
5. Support data replication systems, such as SRB, Globus & GFarm - MU
6. Allow simulation data to be retrieved or dynamically regenerated - MU
19 Storage and Interoperability (part 2)
Developing secure data transfer and storage
4. Secure service for transferring data from instruments and sensors to repositories via the Grid - MU
7. Develop pre-processing system for secondary storage - MU
8. Pilot long-distance, high-speed and secure data transfers between repositories - MU
9. Scope and pilot storage infrastructure requirements - MU
20 Manage Data
1. User requests data acquisition
2. Acquiring CD or DVD static data
3. Acquiring dynamic instrument or sensor data
4. Acquiring static or dynamic SAN data
5. Storing raw data in Primary Storage
6. Pre-processing of the raw data
7. Storing pre-processed data in Secondary Storage
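The Manage Data steps above can be read as a simple pipeline: a request triggers acquisition from one of several sources, the raw data lands in Primary Storage, and a pre-processed copy lands in Secondary Storage. The following is a minimal sketch of that flow, not DART's implementation. The `DataStore` class, the `manage_data` function, and the `strip()` stand-in for pre-processing are all hypothetical names chosen for illustration, and the stores are plain in-memory dictionaries rather than SRB or Fedora repositories.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class DataStore:
    """In-memory stand-in for a Primary or Secondary datastore (hypothetical)."""
    name: str
    objects: Dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, payload: bytes) -> None:
        self.objects[key] = payload


def manage_data(
    request_id: str,
    acquire: Callable[[], bytes],
    preprocess: Callable[[bytes], bytes],
    primary: DataStore,
    secondary: DataStore,
) -> List[str]:
    """Run the slide's steps in order and return a log of what happened."""
    log = [f"request {request_id} received"]        # step 1: user request
    raw = acquire()                                  # steps 2-4: CD/DVD, instrument or SAN source
    log.append(f"acquired {len(raw)} bytes")
    primary.put(request_id, raw)                     # step 5: raw data into Primary Storage
    log.append(f"stored raw data in {primary.name}")
    processed = preprocess(raw)                      # step 6: pre-processing
    log.append(f"pre-processed to {len(processed)} bytes")
    secondary.put(request_id, processed)             # step 7: result into Secondary Storage
    log.append(f"stored pre-processed data in {secondary.name}")
    return log


primary = DataStore("primary")
secondary = DataStore("secondary")
log = manage_data(
    "run-001",
    acquire=lambda: b"  frame-1\nframe-2\n\n",       # assumed sample payload
    preprocess=lambda raw: raw.strip(),               # placeholder pre-processing
    primary=primary,
    secondary=secondary,
)
```

Passing `acquire` and `preprocess` as callables keeps the three acquisition routes (static media, live instruments, SAN) interchangeable, which is one plausible way to model the branching shown on the slide.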
21 Potential use of GridSphere as the front-end DART portal
22 Turning Data into Information
[Figure panels: protein crystallography raw data; 3D atomic structure of protein after processing]
23 Secure Data Replication via the Network to DART Partners
- Data collected at JCU, MU and UQ is stored in Primary Datastores
- Processed data is securely transferred to Secondary Datastores
[Diagram labels: Monash University, James Cook University, University of Queensland, QCIF / QPSF Grid, Primary, Secondary, Sensor Network, AARNet / GrangeNet]
24 Content and Rights
Collecting data sources into institutional repositories
1. Move data from personal repositories into trusted alternatives - MU
2. Reduce barriers to content acquisition by rights assignment for non-Science researchers (Creative Commons) - UQ
3. As above, for Science researchers (Science Commons) - UQ
4. Improve management practices in research communities - MU
5. Assist researchers to deposit datasets and other digital objects into institutional repositories - MU
6. Clarify legal issues around intellectual property (IP), information security and privacy - MU
25 Annotation and Assessment
Including collaboration tools for research
1. Allow researchers to annotate each other's work - UQ
2. Improve annotation & deposit rates by allowing end user control - UQ
3. Help annotation services contribute to the life and productivity of research communities - UQ
4. Foster wiki-based collaborative work practices in research teams - JCU
27 Discovery and Access
Searching, browsing and discovering resources
1. Improve repository deposit rates, sharing & reuse by user access - MU
2. Improve repository deposit rates, sharing & reuse by improving discoverability - MU
3. Reduce effort for creating metadata schemas & improve interoperability - UQ
28 Discovery examples
29 DART Demonstrators
X-Ray Crystallography
- Focussing on diffractometers and protein crystallography
- Using CIMA instrument interfaces
- James Whisstock (MU), Jenny Martin (UQ) and Ian Atkinson (JCU) are the major researchers involved
Climatology
- Focussing on ocean and atmospheric data, e.g. merging of data around Heron Island (in the Great Barrier Reef) to predict weather
- Amanda Lynch (MU) and Stuart Kininmonth (AIMS) are early adopters
30 DART 3rd Demonstrator
Digital History
Three key projects with a Humanities and Social Sciences focus:
- Gugu Badhun Digital History project - JCU
- Women on Farms project - MU
- Western Cape Community Agreement project - UQ
Dealing with video storage & management, annotations, survey data, authorisation and security, community involvement, etc.
31 DART Deliverables
32 What will DART deliver?
By mid 2007, DART aims to provide:
- Practical and workable software tools for researchers to use for their daily data and information management requirements
- Working proof-of-concept demonstrations, including customised research portals
- Strong feedback from researchers in all the demonstrators
- Reports recommending best practice in several areas
- Assessment of the value of the DART integrated lifecycle approach
- A clear understanding of how to turn proof-of-concept into robust production-ready systems
Note: DART WILL NOT be delivering production services!
33 How far has DART progressed?
- Fast startup: started Dec 2005, DART now has 40+ staff and researchers on board
- Collaborative project: 27 WPs in 3 partner universities (Monash, Queensland, James Cook)
- Effectively managed: 7 Chief Investigators, strong Project Office & Board of Management
- Grounded in research practice: building three demonstrators with research teams from the 3 partners
- Common standards used to develop generic software tools: Fedora, GridSphere, SRB, Kepler, XACML, Shibboleth, Annotea, CIMA, Plone
34 Key DART Achievements
- Strong progress in data capture and instrument integration
- Investigating storage and replication of very large datasets (up to Petabytes) across diverse networks
- Have placed Information Management staff into key research teams, addressing their data and information management requirements
- Developing annotation software for 3-D models, video and audio
- IP and privacy are being reviewed by a Law Faculty
- Investigating Creative Commons & Science Commons licensing
- Working to utilise Shibboleth, PKI and Grid security standards
- Developing search tools, metadata schema registry, wiki tools, etc.
35 DARTs in use by the ARCHER?
ARCHER is a new DEST-funded project for 2007 that will take the proof-of-concept outcomes of DART and turn them into production-ready ARCHER software tools.
These tools will be developed into modular middleware web services, customised through dedicated task forces to suit the needs of the:
- nine NCRIS priority research capabilities, plus
- two specialised task forces for the Humanities & Social Sciences
36 Useful DART tools for ARCHER (part 1)
Compute/storage:
- Interface to instruments & sensors, CIMA video and Kepler workflows
- Interface to distributed computing (HPC / Grid)
- Interface between the hardware and DART software tools
- Secure access to large-scale data storage & repositories (SRB, Fedora)
Data Quality:
- Pre-analysis of data to automatically detect faulty or degraded data
- Seamless replication of data for backup and disaster recovery
- Support for multiple data replication systems (SRB / GFarm / Globus)
- Efficient, fault-tolerant transfer of large datasets between systems
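To make the replication and fault-tolerant transfer items above more concrete, here is a minimal sketch of one common approach: copy a file, compare checksums, and retry on mismatch. This is an assumption for illustration only. The slides do not say how DART verifies replicas, and the `sha256_of` and `replicate` helpers are hypothetical names built on the Python standard library rather than on SRB, GFarm or Globus.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, target: Path, max_attempts: int = 3) -> int:
    """Copy source to target, retrying until the checksums match.

    Returns the number of attempts used; raises IOError if all attempts fail.
    """
    expected = sha256_of(source)
    for attempt in range(1, max_attempts + 1):
        shutil.copyfile(source, target)
        if sha256_of(target) == expected:   # replica verified
            return attempt
    raise IOError(f"replica of {source} failed verification")


# Demonstration with a throwaway file standing in for a dataset.
with tempfile.TemporaryDirectory() as workdir:
    source = Path(workdir) / "dataset.bin"
    target = Path(workdir) / "replica.bin"
    source.write_bytes(b"example sensor readings" * 1000)
    attempts = replicate(source, target)
    replica_matches = target.read_bytes() == source.read_bytes()
```

Hashing in fixed-size chunks keeps memory use flat regardless of dataset size, which matters for the very large datasets the project describes. A real transfer service would also need resumable transfers and network-level security, which this sketch leaves out.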
37 Useful DART tools for ARCHER (part 2)
Software tools:
- Manage metadata, including defining, storing, searching, etc.
- Manage authentication and authorisation, plus data security
- Deal with Science/Creative Commons licensing, IP and privacy issues
- Provide secure annotations for documents, datasets, video, audio, etc.
Usability:
- GridSphere portal to tie all the software tools together
- Migrate data from personal to institutional storage
- Support for legacy applications
- Collaborate using research-centric wiki / weblog communication tools
38 Acknowledgements Without the hard work of all these people, DART would just not happen! UQ JCU MU
39 DART Contacts
Web: dart.edu.au
Project Phone
Questions?