DART Project Some Issues around User Requirements Tom Denison, Stefanie Kethers, Nicholas McPhee, Natalie Pang Monash University
2 Outline General process and method Current Sources Requirements: General Thoughts Preliminary findings Digital History (Women on Farms Gathering) Crystallography and Climate Research Preliminary Conclusions Next steps
3 Process and Method Interviews Focus groups Goal - Finding out about: current processes; weaknesses / strengths in the process; what should change (and how), and what should not; feasibility of different scenarios (e.g., distributed repositories, central repository) Data collection: audio recordings, process diagrams
4 Process Diagrams [Figure: process diagram notation, with legend entries Actor, Information Flow, Information Flow Quality, Information Flow Content, Information Flow Medium]
5 Sample Interview Run-through General - What is your field? How long have you been working in this field? What sort of projects do you work on? Data sets - What kind of data sets do you work on (data types, size, source, etc.)? How do you collect your data? Does the data need to be processed before it can be used? Data management - Do you manage your own research data? How do you store your data? Have you encountered any problems in managing your data (e.g., security, version control, back-ups, providing adequate descriptions and consistent locations)? Ethics - Do you need ethics clearance for your work? If so, are any restrictions placed on the (re)use of the data collected?
6 Sample Interview Run-through (cont’d) Collaborative Research - Do you work with researchers from different disciplines? Do your collaborators require access to your data sets (or vice versa)? Scenarios - Please describe 1-2 scenarios where you worked on data sets. Where do you see room for improvement? What would need to happen to enable these improvements? Central repositories - Are you aware of any data repositories that you could currently contribute to, and do you use them? Where do you see problems with central data repositories in general, and in your field? Do you see any advantages / disadvantages to data repositories in relation to your field?
7 Current Sources Crystallography – Nick McPhee, Robyn Polan Climate Research – Nick McPhee Digital History – Natalie Pang Medicine: 1 interview with a postdoc researcher working in allergies 1 meeting with Neil Clarke (Monash e-research) General: Several articles, including Microsoft Research report: “Towards 2020 Science” The Australian, 26/7/06: “Dealing with the Data Deluge” More references provided on last slide
8 Requirements: General Thoughts Access Control Control over data should rest with researchers Perceived security important Data access for researchers only vs public access Metadata access for researchers only vs public access Usability Discipline-specific language / interface Easy-to-use interface (can mean different things to different people!) Alignment with Work Processes Interoperability with existing software systems Interoperability with existing metadata schemata
9 Requirements: Preliminary Findings (Digital History: Women on Farms Gathering Heritage Collection) Based on earlier work using action research methodology Earlier work has established a portal showcasing the heritage collection of the community (Women on Farms Gathering) Joint effort between Museum Victoria, the Women on Farms Gathering community, and Monash University (Faculties of Arts and IT) Researchers: Curators from Museum Victoria Researchers from the community and University of Otago on Rural Studies, Farming, and Geography Faculty of Arts (Monash) on History and Women's Studies Faculty of IT (Monash) on Community Informatics and Collaborative Design Principles/Processes
10 Requirements: Preliminary Findings (Digital History: Women on Farms Gathering Heritage Collection) Portal reflects key work processes between Museum and the community (showcasing emerging practices in museology) Datasets: symbolic objects as collection items, and stories/memories around these objects Typical process: Museum & researchers put up data; collective agreement (collaborative tools are essential); negotiation & discourses; Museum & researchers export data back to own research agenda Findings arising from previous work / recommendations for future work: Ownership in community: creation of content needs to rest with researchers (collaborative work) Annotation and metadata significant in organising content (stories, geography and memories) to provide meaning to social sciences (history and place) Qualitative datasets are rich by nature and need to be organised and presented in their contexts Need to support multiple media types (oral and visual history)
11 Requirements: Preliminary Findings (Digital History: Women on Farms Gathering Heritage Collection) Datasets and their metadata, annotations, and descriptions need to be interoperable for use by Museum’s cataloguing system Usability and information organisation Disparities between researchers with broadband and slow connections need to be managed Digital history is collected and presented through the website: openness in receiving and contributing stories Support for conflict resolution is required in the case of disputes over stories
12 Preliminary Findings: Crystallography and Climate Research Trust/Privacy of Data Researchers want/need a trustworthy system (linked to control) Researchers work in competitive fields; keeping data private is essential Data validation is vital. In Crystallography accurate data “to the bit” is a MUST Security Rights and Access Control - Public and private access, project/group access, individual access Control Researchers desire strong control over their data Too much automation may lead to mistrust of results Ability to manually undertake some automated tasks is useful
13 Preliminary Findings: Crystallography and Climate Research Archiving and Curation Medical data such as Crystallography data sets must be archived for 7 years, and in Climate Research there is a need for long-term storage for trend analysis Data Ownership - How will this be organised through the portal? Through access rights/roles? Who owns the data in the short and long term? Metadata Good metadata is needed for the Demonstrators Schemata for Crystallography and Climate Research are still being determined Support for specific file formats e.g. Crystallography CIF files, Climate Research MM5 (v3) files. Naming conventions for data sets. How will this be organised? Other Issues Links between data sets and publications, e.g. through ARROW and VITAL/VALET. How can we facilitate this? Is it feasible?
14 Preliminary Findings: Crystallography and Climate Research Time/Effort (Workflow) The portal should not complicate researchers' work or increase the time it takes The portal should not dramatically alter their workflow unless… Benefits Researchers are interested in added value “What can the portal offer me that I cannot do already? Can it make my life easier (save me time)?” Portal needs to be “sold” to researchers For Crystallography the benefits are clear: a move from a manual system to a digital system. New abilities and functions (i.e. CIMA) What about Climate Research? Digital system already exists. Projects run on the Sun Grid, and the data is stored there as well. What can a portal offer them? Self-configuring .ksh scripts? An interface to display model output? Job scheduling? -> KEPLER?
15 Preliminary Conclusions Both the literature and everything we have done to date point to the fact that researchers are overwhelmingly concerned with their own, or their team's, work, and that the effective management of existing data and workflows is a dramatically more significant driver than notions of potential collaborative research. While this may change over time, it suggests that, at the moment, the design of any portal should be kept as simple as possible and focus on providing access to tools of immediate benefit to current research priorities.
16 Next Steps Availability Lack of availability of experts: an important issue for developing user requirements Interviews at JCU: Demonstrator projects + additional researchers Monash Crystallography focus group Monash Climate Research interviews Monash Digital History interviews Monash Research Admin / Management interviews: Ethics office / Ethics committee
17 References Lost in a Sea of Science Data: Towards 2020 Science: eportA4.pdf The Draft Report of the American Council of Learned Societies’ Commission on Cyberinfrastructures for Humanities and Social Sciences. ICPSR Guide to Social Science Data Preparation and Archiving. More references in our work-in-progress bibliography at packages/cr/cr1/dart_lit_list_v4.doc/view