Xiaogang (or Marshall, as he has been called at ITC) graduated from ITC in 2011 with a PhD, supervised by Professor Freek van der Meer and Dr. John Carranza. He is now a postdoctoral research associate at the Tetherless World Constellation, Rensselaer Polytechnic Institute, USA, working on Semantic eGeoscience.
I began to think about a blog on this topic after reading a few papers about open code and open data published in Nature and Nature Geoscience in November 2014. Later on I noticed that the editorial office of Nature Geoscience had compiled a cluster of articles themed on transparency in science (www.nature.com/ngeo/focus/transparency-in-science/index.html), which created an excellent context for further discussion of open science.
A few weeks later I attended the American Geophysical Union (AGU) Fall Meeting in San Francisco, CA, a giant meeting with more than 20,000 attendees. My personal focus was the presentations, workshops and social activities of the Earth and Space Science Informatics group. If I had to summarize the seven-day meeting experience with a few keywords, I would choose Data Rescue, Open Access, the Gap between Geo and Info, Semantics, Community of Practice, Bottom-up, and Linking. Putting my AGU meeting experience together with my thoughts on the Nature and Nature Geoscience papers, it was time for me to finish this blog.
Besides journals' incentives for data sharing and open-source policies, we can extend the discussion of software and data publication, reuse, citation and attribution by shedding more light on both the technological and social aspects of an environment for open science.
Open science can be considered a socio-technical system. One part of the system is a way to track where everything goes; another is a design of appropriate incentives. The emerging technological infrastructure for data publication adopts an approach analogous to paper publication and has been facilitated by community standards for dataset description and exchange, such as DataCite (www.datacite.org), Open Archives Initiative-Object Reuse and Exchange (www.openarchives.org/ore) and the Data Catalog Vocabulary (www.w3.org/TR/vocab-dcat). Software publication may, in a simple way, use a similar approach, which calls for community efforts on standards for code curation, description and exchange, such as Working towards Sustainable Software for Science (http://wssspe.researchcomputing.org.uk). Simply minting digital object identifiers for code in a repository makes software publication little different from data publication (see also: www.sciforge-project.org/2014/05/19/10-non-trivial-things-github-friends-can-do-for-science/). Attention is also required to code quality, metadata, licence, version and derivation, as well as to metrics for evaluating the value and/or impact of a software publication.
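To make the dataset-description side of this concrete, here is a minimal sketch of a record using the Data Catalog Vocabulary mentioned above. The dataset, person and DOI identifiers are hypothetical placeholders, not real records.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# Hypothetical example record; all identifiers and the DOI are placeholders.
<http://example.org/dataset/volcano-spectra>
    a dcat:Dataset ;
    dct:title      "Hyperspectral measurements of volcanic soils" ;
    dct:creator    <http://example.org/person/xiaogang-ma> ;
    dct:identifier "doi:10.0000/example.12345" ;
    dct:license    <http://creativecommons.org/licenses/by/4.0/> ;
    dcat:distribution [
        a dcat:Distribution ;
        dcat:downloadURL <http://example.org/data/volcano-spectra.csv> ;
        dcat:mediaType   "text/csv"
    ] .
```

A software publication could be described in much the same shape, with extra terms for version, derivation and runtime requirements, which is exactly where community standardization efforts are still needed.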
Metrics underpin the design of incentives for open science. An extended set of metrics (called altmetrics) has been developed for evaluating research impact and has already been adopted by leading publishers such as the Nature Publishing Group (www.nature.com/press_releases/article-metrics.html). Factors in altmetrics include how many times a publication has been viewed, discussed, saved and cited. On my flight back from the AGU meeting, it was very interesting to read news about funders' attention to altmetrics (www.nature.com/news/funders-drawn-to-alternative-metrics-1.16524) in the 11 December 2014 issue of Nature that I had picked up from the NPG booth in the meeting's exhibition hall. For a software publication, the metrics might also count how often the code is run, the use of code fragments, and derivations from the code. A software citation indexing service, similar to the Data Citation Index (http://wokinfo.com//products_tools/multidisciplinary/dci/) of Thomson Reuters, could be developed to track citations among software, datasets and literature, and to facilitate software search and access.
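As a rough illustration of how such factors might be combined into a single number, here is a small sketch. The event types and weights are invented for the example; they are not an established altmetrics formula.

```python
# Hypothetical composite "impact score" for a software publication.
# The event types and weights below are illustrative assumptions, not a standard.
WEIGHTS = {
    "views": 0.1,        # landing-page views
    "saves": 0.5,        # bookmarks in reference managers
    "discussions": 1.0,  # blog posts, tweets, forum threads
    "citations": 5.0,    # formal citations in the literature
    "runs": 0.2,         # times the code itself was executed
    "derivations": 3.0,  # forks or derived codebases
}

def impact_score(events: dict) -> float:
    """Weighted sum of recorded events; unknown event types are ignored."""
    return sum(WEIGHTS.get(kind, 0.0) * count for kind, count in events.items())

score = impact_score({"views": 1200, "citations": 4, "runs": 350, "derivations": 2})
print(score)  # 120.0 + 20.0 + 70.0 + 6.0 = 216.0
```

The hard part, of course, is not the arithmetic but agreeing as a community on which events to record and how to weight them in research assessment.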
Open science would help everyone, including the authors, but it can be laborious and boring to record all the fiddly details. Fortunately, fiddly details are what computers are good at. Advances in technology are enabling the categorization, identification and annotation of various entities, processes and agents in research, as well as the linking and tracing among them. In our June 2014 Nature Climate Change article, we discussed the issue of provenance in global change research (http://www.nature.com/nclimate/journal/v4/n6/full/nclimate2141.html). Such work on provenance captures and further extends the scope of metrics development. Yet incorporating these metrics into incentive design requires the science community to find an appropriate way to use them in research assessment. Progress has recently been made: the NSF has renamed the Publications section of funding applicants' biographical sketches as Products and allowed datasets and software to be listed (www.nsf.gov/pubs/2013/nsf13004/nsf13004.jsp). To fully establish the technological infrastructure and incentive metrics for open science, more community efforts are still needed.
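As a closing illustration of the provenance capture discussed above, here is a minimal trace expressed with the W3C PROV-O vocabulary; all identifiers are hypothetical.

```turtle
@prefix prov: <http://www.w3.org/ns/prov#> .

# Hypothetical trace: a figure derived from a dataset by an analysis activity.
<http://example.org/figure/temperature-trend>
    a prov:Entity ;
    prov:wasDerivedFrom <http://example.org/dataset/station-records> ;
    prov:wasGeneratedBy <http://example.org/activity/trend-analysis-2014> .

<http://example.org/activity/trend-analysis-2014>
    a prov:Activity ;
    prov:used              <http://example.org/dataset/station-records> ;
    prov:wasAssociatedWith <http://example.org/person/xiaogang-ma> .

<http://example.org/person/xiaogang-ma>
    a prov:Agent .
```

Once such traces exist, reuse and derivation of data and code can be counted mechanically: exactly the kind of bookkeeping that is laborious for authors but trivial for machines.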