Q&A series with ECS OpenCon 2017 speakers
ECS will be hosting its first-ever OpenCon event on October 1 in National Harbor, MD. OpenCon will be ECS’s first large community event aimed at creating a culture of change in how research is designed, shared, discussed, and disseminated, with the ultimate goal of making scientific progress faster.
During ECS’s OpenCon, Meredith Morovati, executive director of Dryad, will give a talk on open data.
The following conversation is part of a series with speakers from the upcoming ECS OpenCon. Read the rest of the series.
ECS: How and why did Dryad get its start; and how has it grown since then?
Meredith Morovati: Editors from journals in the fields of evolution and life science (some of them competing journals) were becoming concerned that it was difficult to find the data that supported the literature. The “policy” of asking authors to share data after the fact had failed. In 2011, twelve of these editors came together to remedy this and developed the Joint Data Archiving Policy (JDAP). JDAP required, as a condition of publication, that data be archived in an appropriate public archive, and it stated that data are products of the scientific enterprise in their own right. These editors argued that data must be preserved and remain usable in the future. JDAP is now a model for requiring data as a condition of publishing an article.
The use of Dryad was not stipulated as part of this policy, but Dryad became the preferred solution because of the one-to-one relationship it maintains between datasets and the scholarly literature. In addition, Dryad curates data to ensure high-quality metadata and is committed to discoverability.
ECS: What are the benefits of open data sharing?
MM: The benefits of data sharing can vary depending on the field. For some, it is the ability to build on existing data for new studies. For instance, one MRI dataset originated in the field of neurology but has great potential for other fields, such as machine learning.
Another benefit is that sharing data, especially early in the process, can speed up collaboration and innovation. For example, some Dryad data are associated with data papers. Data papers are scholarly works in which the main output is the dataset and its description; they are common in some fields. Early publication of a dataset from a citizen science project that identified animals in the Serengeti not only resulted in a citable publication for the authors, but also quickly generated new scholarship in other areas such as ecology, education, and artificial intelligence.
Another clear benefit of open data is transparency. By sharing data, we improve the public’s trust in science. We are living in a divisive time in which people have lost confidence in science and facts seem to be in doubt. Ensuring that publicly funded research remains open to the public increases both transparency and trust.
Other benefits include career enhancement and reproducibility. Publishing data with a citable DOI in a repository has been shown to increase citations to the linked literature, and making data open is an essential step toward enabling others to reproduce one’s work.
ECS: What is perhaps the most urgent need researchers have with respect to data?
MM: In a recent survey of Dryad submitters, we found that although about half of them planned to submit their data as part of their funding agreement, very few budgeted for it. Let that sink in for a minute! While researchers now realize that data need to be archived as part of their funding obligations, they overwhelmingly do not (or cannot) allocate resources for it. Additionally, fewer than half of the researchers we surveyed were able to archive their data during the grant period. This means that even if researchers plan to charge data submission to their grant, they may not be able to do so.
Funders, publishers, and the open science community are also recognizing that data archiving is not free, so it is vital now to advocate for setting aside resources for this endeavor.
ECS: For researchers, what other areas do you see that need to intersect with open data?
MM: To produce a good open dataset, one of the most important things to do (besides ensuring data quality) is proper documentation. Much angst can be avoided if the researcher considers the end (re)user, who would benefit from well-structured and annotated data.
In addition, researchers might consider which file formats are best for open science. While Dryad is format-agnostic (we accept all formats), we do encourage open formats, which have a better chance of being usable in the future.
ECS: How can producing open data be sustainable yet affordable for all?
MM: In my mind there are two main issues here. The first is how to properly resource the archiving of data. The second is how to make sure the data will be available for the long term.
At Dryad, quality archiving means that a curator reviews data to check their quality and to ensure they are ready to be used in an open, public archive. Curators look for sensitive or human-subjects data, intellectual property issues, and similar concerns. This is an important requirement for sustainability, and it has associated costs. To address those costs, we developed a data publication charge (DPC), which is levied at the time the data are published. It is currently $120 USD, and we encourage learned societies, publishers, and funders to pay this fee on behalf of their communities. With DPCs, we’ve found a way to keep archiving fairly affordable, but for this to continue, we need the support of journals and funders.
We have also spent time addressing the need for long-term accessibility of all Dryad data. Many publishers store supplemental data but have no safeguards against temporary or permanent loss. Protecting against that loss is the job of a repository, and not all publishers have the bandwidth to deal with it. Dryad has recently partnered with an organization to ensure access to data for the long term.