Changing Cultures, Building Standards Linda Beebe Senior Director, PsycINFO
ICSTI Annual Meeting 2012 About 12 years ago Supplemental Materials emerged with a bang!
And authors and publishers did:
◦ Text (extended methodology sections, bibliographies, survey results, derivations...)
◦ Tables and figures
◦ Multimedia
◦ Gene sequences, protein structures, chemical compounds, structures, 3-D images
◦ Computer programs: algorithms, code, executables
◦ Datasets, and raw research data
◦ No standards
◦ Very different cultures and practices from one discipline to another
◦ Inconsistent identifiers
◦ Poor metadata
◦ Lack of discovery tools
◦ Abuse of readers and reviewers
Business Policies & Practices cover selecting, editing, hosting, assuring discoverability, referencing, packaging, maintaining links, providing context, and preserving. Technical Recommendations emphasize metadata, persistent identifiers, preservation, packaging and exchange. Bi-directional linking using DOIs, emphasis on persistent linking reliability. Flexibility and simplicity to support either a simple approach or the most detailed and granular metadata. Clear definitions of metadata elements. Attention to preservation and migration, including saving of objects along the migration chain. Nearing Final Publication
The following 2 slides from Howard Ratner are a good reminder of the growth. Borrowed with permission from his talk at the December 2011 STM Innovations meeting. Ideas generated by the STM Future Lab Committee.
NEW ACCESS TO CONTENT
◦ API platforms for third-party developers available at Elsevier, Springer, NPG, IEEE (search)
◦ Getting ready for launch: IoPP, T&F, CABI
◦ Many more expected to follow
Diagram labels: Curiosity-driven R&D; Granularity of content; Semantics; Let the outside world in; Our content your way; Create cross-publisher standards (common metadata, full text formats); HTML5; API platforms; XHTML; Third-party apps; App store; Linked data; Linked open data; RDF; Mobile productivity; Multi-device productivity; Seamlessly linked platforms; M-commerce; Mobile; Transmedia items; Voice activation
RESEARCH DATA
Diagram labels:
◦ Data objects are first-class research objects
◦ Make data interactive: share the actual workflow of the researcher? Graphics represent data sets; how to open them up?
◦ Actionable data
◦ Data creation: what formats do users want?
◦ Common standards; authoring tools; how to treat supplemental files to journals?
◦ Guidelines for: reuse and sharing; incentives and barriers; editorial policies
◦ Discoverability of data; big data; deep linking; repositories; DataCite
◦ Bibliographic tools; user behaviour; Mendeley; CiteSeer; ColWiz; ReadCUBE
◦ Data journal
“Hard sciences” such as Physics and Chemistry: long history of handling supplemental material and requiring access to data. Disciplines that study human subjects (psychology, sociology, health sciences): far less likely to have such practices. There is growing interest in standards and other support for data deposits and access.
Study of Matter
◦ AAAS: must deposit in approved repository.
◦ ACS: must submit data and deposit.
◦ AGU: must deposit data in approved repository.
◦ ASPB: must submit to journal.
Study of Humans
◦ APA: to date only expected to supply for verification.
◦ APS: no requirements
◦ ASA: no requirements posted
◦ AAA: no requirements posted
In the “softer” sciences, increased quantities of data are scattered on laptops, in file drawers, on the web—all in danger of being lost, even thrown away. Question: how do we preserve these data and make them available for further research?
What constitutes data? What must the author do to it? Who will maintain it? What about confidentiality? How does one cite data?... And many more
◦ Webster's: factual information (as measurements or statistics) used as a basis for reasoning, discussion, or calculation.
◦ Chaim Zins (2006): statistical observations and other recordings or collections of evidence.
◦ NSF: any information that can be stored in digital form and accessed electronically, including, but not limited to, numeric data, text, publications, sensor data streams, video, audio, algorithms, software, models and simulations, images, etc.
◦ Altman & King: systematic compilation of measurements for machine reading; must be systematically organized and described.
Report on Integration of Data and Publications, October 17, Susan Reilly, Wouter Schallier, Sabine Schrimpf, Eefke Smit, and Max Wilkinson. Retrieved 10/11/2012 from / 2011_12_5_ODE_Report_On_Integration_of_Data_and_Publications.pdfhttp://
Replication standard: sufficient information to enable a third party to replicate with no additional information from the author (King 1995). So authors must:
◦ Provide clear metadata.
◦ Code consistently and list coding instructions.
◦ Explain how data were used.
◦ Provide all raw data.
◦ Organize data in a way that can be used by others.
Making data available requires a different workflow and more work, but makes for a better scientist.
Natural sciences: many options such as Crystallography, ChemStar, ChemSpider, PubChem, PANGAEA.
Life sciences: Protein Data Bank, now one entity with data from former banks in US, Europe, and Japan. Also Dryad, National Biological Information Infrastructure.
Social sciences: not so many options.
Inter-university Consortium for Political and Social Research (ICPSR), U of Michigan
◦ Data deposit and management
◦ Publication-Related Archive quickly available, but ICPSR does not process.
Institute for Quantitative Social Science (IQSS) Dataverse Network, Harvard
◦ Maintains dataverses (individual repositories).
◦ Delivers formal persistent citations.
IQSS Dataverse Network terms and conditions (paraphrased): Agree not to use materials to obtain information that could ID subjects in any way, produce links that could ID them or do anything that could constitute invasion of privacy or breach of confidentiality. Also, will not download or use in any way prohibited by applicable law. And will always include the bibliographic citation for the data in any publication that references the data.
Like any citation, it must contain basic elements that identify the dataset as unique: Title, Author, Date, Version, Persistent Identifier.
DataCite, the organization that manages DOIs for data, recommends:
Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier.
Example: Irino, T; Tada, R (2009): Chemical and mineral compositions of sediments from ODP Site 127-797. Geological Institute, University of Tokyo.
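The DataCite pattern (creator, year, title, then version, publisher, resource type, identifier) can be sketched in a few lines of Python. This is only an illustration: the `DatasetCitation` class and `format_citation` function are hypothetical names, not part of any DataCite tool, and the sketch assumes that optional elements are simply skipped when absent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Hypothetical container for the citation elements named on the slide."""
    creator: str
    year: int
    title: str
    publisher: str
    version: Optional[str] = None
    resource_type: Optional[str] = None
    identifier: Optional[str] = None


def format_citation(c: DatasetCitation) -> str:
    """Join the elements in the order Creator (Year): Title. Version.
    Publisher. ResourceType. Identifier., skipping any that are missing."""
    parts = [f"{c.creator} ({c.year}): {c.title}"]
    for element in (c.version, c.publisher, c.resource_type, c.identifier):
        if element:
            parts.append(element)
    return ". ".join(parts) + "."


# The example from the slide, which gives no version, type, or identifier.
example = DatasetCitation(
    creator="Irino, T; Tada, R",
    year=2009,
    title="Chemical and mineral compositions of sediments from ODP Site 127-797",
    publisher="Geological Institute, University of Tokyo",
)
print(format_citation(example))
```

Keeping the elements as separate fields, rather than one free-text string, is what lets a repository or publisher regenerate the citation consistently and attach a persistent identifier later.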
How do we know the data have not changed? Altman & King (2007) advocated the Universal Numeric Fingerprint (UNF), a short fixed-length string of numbers and characters. Example: UNF: 3: ZNQRI xOBffg?== in which the 3 is the version number and the suffix is the fingerprint. If the fingerprint changes, the set is a new version of the data.
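The idea behind a data fingerprint can be illustrated with a short Python sketch. It is not the UNF algorithm itself; it is a simplified stand-in that assumes numbers are rounded to 7 significant digits, hashed with SHA-256, and truncated to a short fixed-length string. The `FP:demo:` prefix and the function names are hypothetical.

```python
import base64
import hashlib


def normalize(value, digits=7):
    """Render a value canonically so that equivalent data hash identically."""
    if isinstance(value, bool):
        return str(value)
    if isinstance(value, (int, float)):
        # Scientific notation with a fixed number of significant digits.
        return format(float(value), f".{digits - 1}e")
    return str(value)


def fingerprint(rows, digits=7):
    """Return a short fixed-length fingerprint for a table of values.

    Simplified illustration only; NOT the UNF specification.
    """
    h = hashlib.sha256()
    for row in rows:
        for cell in row:
            h.update(normalize(cell, digits).encode("utf-8"))
            h.update(b"\x00")  # cell separator
        h.update(b"\n")  # row separator
    short = base64.b64encode(h.digest()[:16]).decode("ascii")
    return f"FP:demo:{short}"


original = [[1, "a"], [2.5, "b"]]
same_data_other_types = [[1.0, "a"], [2.5, "b"]]
edited = [[1, "a"], [2.6, "b"]]

print(fingerprint(original) == fingerprint(same_data_other_types))
print(fingerprint(original) == fingerprint(edited))
```

Because the fingerprint is computed from normalized content rather than from the file, the same data stored in a different format yields the same string, while any change to a value yields a different one.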
◦ Just like citing other sources of information: encourages findability, credits the creator, makes any impact trackable.
◦ Promotes more and better science, as it enables reuse and verification of data.
◦ Rewards the data producer; may encourage others to deposit data.
◦ DataCite: very international, with members around the world (CDL & Purdue US members, Microsoft & ICPSR associates)
◦ CODATA: International Council for Science, Committee on Data for Science & Technology
◦ International Association for Social Science Information Services & Technology (IASSIST)
◦ Data-PASS: Data Preservation Alliance for the Social Sciences, membership organization of archives and research centers to date
Linkability and Citability of Research Data
◦ Responsibilities for researchers, data archives, publishers
◦ Co-responsibility for bi-directional linking between datasets and publications using persistent identifiers
◦ Support for data reuse
Issued in June 2012. Joined by CrossRef in July.
Need for collaboration among all major participants in the Research Cycle: Researcher, Institution, Funder, Publisher, Data Manager
Funder mandates for data sharing plans encourage new thinking from some disciplines. Connection with the publications is needed.
FundRef: new initiative within CrossRef
◦ Collaboration between publishers and funders to make connections between grants and resulting publications
◦ Pilot for publishers to create and submit standard metadata with funder name and grant number.
◦ Working group includes several publishers and funders.
Established to solve the name ambiguity problem in scholarly communications by creating a registry of persistent unique identifiers for individual researchers. Provides an open and transparent linking mechanism between ORCID, other identifiers, and research objects: pubs, grants, patents, etc. Governed by a board representing all stakeholders. Launching this month.
Designed to facilitate information exchange about research and scholarship. Funded by NIH, National Center for Research Resources. Initially, 7 academic institutions, but growing. APA has an instance: a semantic web of information to support interconnectedness and trust, and to support maintenance of research data.
Data standards, changing cultures, new infrastructures will help us avoid the tumult we’ve experienced with supplemental materials.
Past expectation: psychologists do not withhold data and will share for verification of results.
New expectation: authors must agree to share data.
Psychologists worried:
◦ Potential nefarious uses: unscrupulous people could twist the data or hector the people the author is trying to help.
◦ Well-intentioned but inept secondary analysis: they might get it wrong!
◦ Loss of potential publications for self: I haven’t written all my articles from this data!
◦ But most common fear: loss of academic credit for what may be years of data collection.
Archives of Scientific Psychology is very different from other APA journals in 4 regards:
◦ Authors must submit data to APA or approved repository.
◦ Journal is electronic only.
◦ It is an open access/author pays model.
◦ Authors must submit two methods sections: 1 scientific and 1 in lay language.
Authors sign a Collaboration Agreement specifying that others may reuse their data. Researchers who wish to reuse the data must sign a Collaboration Agreement stating:
1. They will not do anything to reveal identity of subjects.
2. They will not engage in “gotcha” publishing (running analyses to prove the author wrong and publishing the results).
3. They will offer the original data collector co-authorship.
◦ Change the paradigm for use and reuse of data in psychological research by assuring full attribution and credit for the original creator of the data.
◦ Contribute to the culture of transparency and prevention of fraud in science.
◦ Maintain APA’s high standards for peer-reviewed literature and contributions to science.
The jury is still out, but manuscripts are coming in.
As all the participants in the scholarly communications process work to enhance access to data, there undoubtedly will be more revolutionary changes.
Linda Beebe, Senior Director, PsycINFO, American Psychological Association
By building standards, we can change cultures.