Following a data citation reference through the publishing process:
In my role as the Data Workflows Specialist at the University of Michigan Library, I review large datasets and code deposits. I also support various aspects of our research data repository, Deep Blue Data, https://deepblue.lib.umich.edu/data, which is based on Samvera Hyrax. My colleagues and I have been working to improve the connections between our system and other systems so that we can gather metrics for our datasets.
In the spring of 2021, a researcher I regularly work with informed me that he had included a citation to his dataset in the References section of a paper he had just submitted to AGU JGR Planets, https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2021JE006875. I thought it was an excellent opportunity to follow one of our datasets through the whole process, from a mention in the References section all the way to the DataCite Data Metrics badge, https://support.datacite.org/docs/displaying-usage-and-citations-in-your-repository, in the Deep Blue Data repository indicating that the dataset has been cited.
This is the rough process (Figure 1):
Citation for the dataset, https://doi.org/10.7302/zck4-0058, as displayed in Deep Blue Data (Figure 2):
Once the article has been officially published, the citation is fully marked up and hyperlinked in the AGU article references, https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2021JE006875, and the DOI resolves back to the Deep Blue Data deposit (Figure 3):
Interestingly, the Google Scholar link (circled in Figure 3), https://scholar.google.com/scholar?hl=en&q=Arbic%2C+B.+K.%2C+%26+Schindelegger%2C+M.+%282021%29.+Long%E2%80%90term+Earth%E2%80%90Moon+evolution+with+high%E2%80%90level+orbit+and+ocean+tide+models+%5BDataset%5D.+University+of+Michigan%E2%80%93Deep+Blue+Data.+https%3A%2F%2Fdoi.org%2F10.7302%2FZCK4-0058, resolves to the AGU article itself in a circular manner, NOT to the dataset itself (Figure 4):
Rather than pointing to Google Scholar, AGU could point to DataCite Commons, https://commons.datacite.org/doi.org/10.7302/zck4-0058 (they already link to Crossref for other citations), or even to Google Dataset Search, https://datasetsearch.research.google.com/search?query=%22Long-term%20Earth-Moon%20evolution%20with%20high-level%20orbit%20and%20ocean%20tide%20models%22&docid=L2cvMTFwdmgyMjEyZA%3D%3D (Figure 5).
The publisher, AGU/Wiley, makes the article metadata available to Crossref as XML. Figure 6 displays it in a somewhat more readable format, as returned by the Crossref API, https://api.crossref.org/v1/works/10.1029/2021JE006875. Citation #7 in Figure 6 shows how the dataset citation looked prior to the Nov 2021 “fix” from AGU/Wiley (see the upcoming article from Shelley Stall and others from FORCE11). Citation #8 is included for reference to show how a regular article citation is displayed. Note that the DOI is called out on #8 (Figure 6):
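For anyone who wants to repeat this check without reading the raw JSON, here is a minimal sketch of how the reference list could be summarized. It assumes the Crossref works response keeps its reference list under `message.reference` and that a recognized DOI appears under the key `DOI`, as it does in the record above. The function names are my own and are only illustrative.

```python
import json
import urllib.parse
import urllib.request

# Public Crossref works endpoint (same API as the URL above, without the /v1 prefix).
CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_crossref_work(doi):
    """Fetch the Crossref JSON record for an article DOI (makes a network call)."""
    url = CROSSREF_WORKS + urllib.parse.quote(doi, safe="/")
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["message"]


def summarize_references(work):
    """Return one (key, doi_or_None, has_unstructured_text) tuple per reference."""
    summary = []
    for ref in work.get("reference", []):
        summary.append((ref.get("key"), ref.get("DOI"), "unstructured" in ref))
    return summary


# Example usage (requires network access):
# work = fetch_crossref_work("10.1029/2021JE006875")
# for key, doi, unstructured in summarize_references(work):
#     print(key, doi or "(no DOI field)", "unstructured" if unstructured else "")
```

A reference with no DOI field is one that a downstream system is unlikely to match to a dataset automatically.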
Here is the vendor XML feed from the publisher, https://api.crossref.org/works/10.1029/2021JE006875/transform/application/vnd.crossref.unixsd+xml (Figure 7):
As of December 2021, after the AGU/Wiley fix, the dataset reference in Crossref is listed as an “unstructured citation” that includes the DOI (Figure 8):
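Because the DOI now lives only inside free text, any system that wants to count this citation has to dig it out. Here is a rough sketch of what that could look like. The regular expression is a commonly used approximation of DOI syntax, not an official definition, and the sample string is the citation text as it appears in the Google Scholar link above.

```python
import re

# Approximate DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
# This is an assumption that covers most modern DOIs, not a formal grammar.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s"<>]+')


def find_dois(text):
    """Return the distinct DOIs found in free text, lowercased, in order of appearance."""
    dois = []
    for match in DOI_PATTERN.findall(text):
        # Drop punctuation that commonly trails a DOI at the end of a citation.
        doi = match.rstrip(".,;)").lower()
        if doi not in dois:
            dois.append(doi)
    return dois


citation = (
    "Arbic, B. K., & Schindelegger, M. (2021). Long‐term Earth‐Moon evolution "
    "with high‐level orbit and ocean tide models [Dataset]. University of "
    "Michigan–Deep Blue Data. https://doi.org/10.7302/ZCK4-0058"
)
print(find_dois(citation))
```

DOIs are case-insensitive, which is why the sketch lowercases them before comparing: the article cites 10.7302/ZCK4-0058 while Deep Blue Data displays 10.7302/zck4-0058.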
Here is the official XML feed from the publisher to Crossref after the AGU/Wiley fix (Figure 9):
Unfortunately, the AGU/Wiley fix does not help DataCite recognize this mention in the references as a citation to the dataset, as the DataCite Commons JSON shows (Figure 10):
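The same check can be scripted against the DataCite REST API. This is a minimal sketch, assuming the record at `https://api.datacite.org/dois/<doi>` exposes a `citationCount` attribute and a `relatedIdentifiers` list whose entries carry `relationType` and `relatedIdentifier`. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# DataCite REST API endpoint for a single DOI record.
DATACITE_DOIS = "https://api.datacite.org/dois/"


def fetch_datacite_attributes(doi):
    """Fetch the DataCite attributes for a dataset DOI (makes a network call)."""
    url = DATACITE_DOIS + urllib.parse.quote(doi, safe="/")
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["data"]["attributes"]


def citation_summary(attributes):
    """Report the citation count and any IsCitedBy links recorded for the dataset."""
    related = attributes.get("relatedIdentifiers") or []
    return {
        "citationCount": attributes.get("citationCount") or 0,
        "isCitedBy": [
            rel.get("relatedIdentifier")
            for rel in related
            if rel.get("relationType") == "IsCitedBy"
        ],
    }


# Example usage (requires network access):
# print(citation_summary(fetch_datacite_attributes("10.7302/zck4-0058")))
```

A count of zero together with an empty IsCitedBy list would match what the figures here show: the mention in the article has not reached DataCite as a citation.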
Figure 11 shows how the citation appears in the DataCite Commons, https://commons.datacite.org/doi.org/10.7302/zck4-0058, frontend:
Figure 12 shows how the citation appears in the DataCite Search, https://search.datacite.org/works/10.7302/zck4-0058, frontend:
Finally, Figure 13 displays the backend of the DataCite Data Metrics badge, https://support.datacite.org/docs/displaying-usage-and-citations-in-your-repository (HTML underlying the graphic):
The DataCite Data Metrics badge displays no indication of citations (Figure 14):
I also tracked a dataset DOI, https://doi.org/10.7302/pa6y-fb55, that is mentioned in the “Data Availability” statement, https://www.nature.com/articles/s41467-021-27827-y#data-availability, but not in the “References” section of an article in Nature (Figure 15):
Unfortunately, there is no indication of the existence of a data availability statement in the metadata shared with Crossref, https://api.crossref.org/v1/works/10.1038/s41467-021-27827-y.
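One way to confirm that a dataset DOI is missing from a Crossref record, rather than just hidden in a field you did not think to look at, is to search every string in the parsed JSON. This is a small sketch of that idea. The function name and the path format are my own invention, and the input is assumed to be the already-parsed JSON from the Crossref URL above.

```python
def find_string_paths(node, needle, path=""):
    """Return the paths of all string values in parsed JSON that contain needle.

    Matching is case-insensitive because DOIs are case-insensitive.
    """
    needle = needle.lower()
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            hits.extend(find_string_paths(value, needle, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            hits.extend(find_string_paths(value, needle, f"{path}/{index}"))
    elif isinstance(node, str) and needle in node.lower():
        hits.append(path)
    return hits


# Example usage with a record loaded elsewhere, e.g. via json.load():
# print(find_string_paths(record, "10.7302/pa6y-fb55"))
```

An empty list means the DOI does not appear anywhere in the metadata, which is consistent with the data availability statement not being passed along.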
After this research, I’m not sure where or how our researchers are supposed to cite their datasets for them to be counted by the systems that “count.” Any advice on how this should be done would be greatly appreciated. I would also be happy to discuss any of this further or do testing!
For more citation fun: https://apps.lib.umich.edu/blogs/bits-and-pieces/contributing-citation-datacite-iscitedby