Love your Data through Documentation


Documenting your data is kind of like eating your spinach. You know that you need to do it to keep your data healthy, but it’s not something that you look forward to. Good documentation takes an investment of time and energy. It can feel like grunt work, or like it is slowing you down when you really want to keep making progress on your research.

The challenge is that the value of the data that you produce can be compromised if it is not adequately documented. If people (or your future self) cannot understand the decisions that you made and the steps you took to generate, process, analyze or summarize your data sets, then it will be difficult to trust your data and by extension your research findings. With increasing pressure on researchers to share their data publicly for others to review and use, documenting the work that you do with your data is more important than ever.

But how much and what kind of documentation is needed for your data set? This can be a difficult question to answer, as many research fields are still deciding on and developing norms and practices around data. Relevant guidance may exist from expert sources in your field (such as the guidance produced by ICPSR for the Social Sciences, or the primer produced by DataONE for ecological sciences) or from funding agencies (such as the NIH Data Sharing Policy). If you do not know whether guidance exists for your field, ask a librarian for help! We will be happy to assist you in tracking down relevant resources on documenting your data set.

If guidance does not exist for your field, or even if it does, you may want to keep a few rules of thumb in mind as you determine what to document about your data.

Distance: Who else besides you will need to understand and trust your data? Think of documentation as a series of concentric circles moving outward, with yourself at the center. You may not need a great deal of documentation for yourself because you are intimately familiar with your own work. Moving out from the center, the next ring holds your collaborators. They will be familiar with your data from working closely with you, but they will still likely need more documentation than you would to understand and trust your data. The next ring contains other researchers doing work similar to your own. They would know the basic principles and practices behind your data, but to make sense of it they would need more documentation than the people working with you directly. The outer rings may include researchers in related disciplines or the general public, each of whom would need increasing amounts, and perhaps different types, of documentation for your data. Thinking about your potential audiences ahead of time will help you plan the amount and type of documentation you need to generate.

Time: How long do you anticipate your data being useful? If your answer is “a long time” or “indefinitely”, you may need to put extra effort into developing rich documentation. The more time passes, the harder it may be to recall the decisions that you made and the work that you did with your data. This is especially true once you cease to develop the data set and move on to other research projects, or even on to new jobs. The best time to capture the work that you are doing with your data is right after you have done it. It is very difficult, not to mention time-consuming, to try to generate documentation weeks, months, or years after you have done the work. Think of documenting your work as you do it as a favor to your future self.

Templates: What will you absolutely need to capture as a part of your documentation? Identifying what information will be needed is important, but it is only half of the way toward successfully documenting your work. The other half is designing a system that makes it easier to capture the information you will need. One approach would be to identify the specific characteristics of the data and then turn them into headers for your lab notebook (or whatever method you intend to use). Writing down these headers as a template for every entry you make, and filling them in, will get you into the habit of capturing this information over time. Think of this as creating a recipe for successfully capturing the information you need. After all, even spinach can be delicious if you follow your favorite recipe.
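If you keep your notes electronically, the template idea can be turned into a few lines of code. The sketch below is only one possible way to do it, written in Python. The header names and the new_entry function are made up for illustration, not taken from any standard, so swap in the characteristics that matter for your own data. Any header you leave unfilled shows up as "TODO", which makes gaps easy to spot later.

```python
from datetime import date

# Hypothetical headers for illustration only; replace them with the
# characteristics of your data that you decided you need to capture.
TEMPLATE_HEADERS = [
    "Date",
    "Data file(s)",
    "What I did",
    "Why I did it",
    "Tools and versions",
    "Open questions",
]


def new_entry(fields):
    """Build one notebook entry as text, with every template header present.

    `fields` maps header names to what you want to record. Headers you do
    not supply are marked "TODO" so that gaps stay visible.
    """
    values = {"Date": date.today().isoformat()}  # default to today's date
    values.update(fields)

    # Reject headers that are not part of the template, to catch typos.
    unknown = set(values) - set(TEMPLATE_HEADERS)
    if unknown:
        raise ValueError(f"Not in template: {sorted(unknown)}")

    lines = [f"{header}: {values.get(header, 'TODO')}" for header in TEMPLATE_HEADERS]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example entry; the file name and details are invented.
    print(new_entry({
        "Data file(s)": "survey_responses.csv",
        "What I did": "Removed duplicate rows",
        "Why I did it": "Some respondents submitted the form twice",
    }))
```

Pasting the printed entry into a running log file right after each working session keeps every entry in the same shape, which is the habit the template is meant to build.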

However you decide to develop documentation for your data, it’s best to plan it out ahead of time. Documentation is an essential component of a healthy data set and therefore needs to be a part of your ongoing plan of action. Planning out where, when, what and how to document is critical to developing a healthy data set that can withstand time and go the distance.

1 Comment

on Feb. 16, 1:24pm

One reason to document that I've been thinking about lately is that it makes the publishing process easier. If you document the project from the beginning, then when you start writing that article two to three years, or more, later, all the information you need is right there. You don't have to search your memory, look at the umpteen documents you've created, or talk to your collaborators asking "how/why did we do it this way?" When you get comments back from reviewers, you have one document that you can refer to, and believe me, it sounds better if you say "my documentation shows" rather than "I remember that".