Document your research software

Last updated on 2024-09-23 | Edit this page

Estimated time: 55 minutes

Overview

Questions

  • What can I do to make my code more easily understandable?
  • What information should go into comments?
  • What are docstrings and what information should go into docstrings?

Objectives

  • Know what makes a good documentation
  • Learn what tools can be used for writing documentation
  • Be able to motivate a balanced decision: sometimes READMEs are absolutely enough

Use these slides as a guidance.

The main purpose of this lesson is to make sure participants understand that DOCUMENTATION IS IMPORTANT. The goal is more to trigger participants then to teach them all the different ways one could document a project. It is good to communicate this (and that this will give more time for the other parts of the workshop).

Why we teach this lesson


Specific motivations:

  • Code documentation becomes quickly unmanageable if not part of the source code.
  • It helps people to quickly use your code thus reducing the time spent to explain over and again to new users.
  • It helps people to collaborate.
  • It improves the design of your code.

What makes a good documentation?


Exercise: Think of good and bad examples

Write down your thoughts in the collaborative documents. Respond with emojis 👍 🙀 to your colleagues’ answers. - Think of projects of which you like the documentation. What do you like about them? - Think of projects for which you don’t like the documentation. What don’t you like about them? Are you missing anything?

NB: You can choose a mature library with lots of users for this exercise, but try to also think of less mature projects you had to collaborate on, or papers you had to reproduce.

  • It is important to document code
  • Think about the people reading your documentation (your audience)
  • Depending on the purpose and state of the project documentation needs to meet different criteria. 
  • For most scientific projects, in-code documentation and a well thought out README file is enough.
  • Documentation should be tracked with corresponding code

Types of documentation


There are different types of documentation:

  • README files
  • in-code documentation
  • API documentation
  • Tutorials

We will discuss a few of them in more depth.

Writing good README files


The README file is the first thing a user/collaborator sees. It should include:

  • A descriptive project title
  • Motivation (why the project exists)
  • How to setup
  • Copy-pastable quick start code example
  • Link or instructions for contributing
  • Recommended citation

Exercise README: Draft or improve a README for one of your recent projects (in breakout rooms)

Try to draft a brief README or review a README which you have written for one of your projects.

You can work individually, but you could also discuss whether anything can be improved on your neighbour’s README file(s).

Think about the user (which can be a future you) of your project, what does this user need to know to use or contribute to the project? And how do you make your project attractive to use or contribute to?

(Optional): Try the https://hemingwayapp.com/ to analyse your README file and make your writing bold and clear.

In-code documentation


In-code documentation: - Makes code more understandable - Explains decisions we made

When not to use in-code documentation:

  • When the code is self-explanatory
  • To replace good variable/function names
  • To replace version control
  • To keep old (zombie) code around

Readable code vs commented code

PYTHON

# convert from degrees celsius to fahrenheit
def convert(d):
    return d * 5 / 9 + 32

vs

PYTHON

def celsius_to_fahrenheit(degrees):
    return degrees * 5 / 9 + 32

Writing good comments - In-code-1: Comments

Let’s take a look at two example comments (comments in Python start with #):

Comment A

PYTHON

  # now we check if temperature is below -50
  if temperature < -50:
      print("ERROR: temperature is too low")

Comment B

PYTHON

  # we regard temperatures below -50 degrees as measurement errors
  if temperature < -50:
      print("ERROR: temperature is too low")

Which of these comments is more useful? Can you explain why?

  • Comment A describes what happens in this piece of code. This can be useful for somebody who has never seen Python or a program, but for somebody who has, it can feel like a redundant commentary.
  • Comment B is probably more useful as it describes why this piece of code is there, i.e. its purpose.

What are “docstrings” and how can they be useful?


Here is function fahrenheit_to_celsius which converts temperature in Fahrenheit to Celsius.

The first set of examples uses regular comments:

PYTHON

# This function converts a temperature in Fahrenheit to Celsius.
def fahrenheit_to_celsius(temp_f: float) -> float:
    temp_c = (temp_f - 32.0) * (5.0/9.0)
    return temp_c

The second set uses docstrings or similar concepts. Please compare the two (above and below):

PY

def fahrenheit_to_celsius(temp_f: float) -> float:
    """
    Converts a temperature in Fahrenheit to Celsius.

    Parameters
    ----------
    temp_f : float
        The temperature in Fahrenheit.

    Returns
    -------
    float
        The temperature in Celsius.
    """

    temp_c = (temp_f - 32.0) * (5.0/9.0)
    return temp_c

Docstrings can do a bit more than just comments: - Tools can generate help text automatically from the docstrings. - Tools can generate documentation pages automatically from code.

It is common to write docstrings for functions, classes, and modules.

Good docstrings describe: - What the function does - What goes in (including the type of the input variables) - What goes out (including the return type)

Naming is documentation: Giving explicit, descriptive names to your code segments (functions, classes, variables) already provides very useful and important documentation. In practice you will find that for simple functions it is unnecessary to add a docstring when the function name and variable names already give enough information.

User/API documentation


  • What if a README file is not enough?
  • How do I easily create user documentation?

Tools

You can use the following tools to generate user or API documentation:

Sphinx (documentation generator)

  • creates nicely-formatted HTML pages out of .md or .rst files
  • programming language independent

Github pages (deploy your documentation)

  • set up inside your GitHub repository
  • automatically deploys your Sphinx-generated documentation

You can show the example documentation deployed on GitHub pages here: https://esciencecenter-digital-skills.github.io/good-practices-documentation-example/

Then, you can show that this content comes from simple markdown files, like: https://github.com/esciencecenter-digital-skills/good-practices-documentation-example/blob/main/doc/another-feature.md?plain=1

In addition, you can explain that with a few settings you can automatically generate documentation from docstrings. You can give https://nanopub.readthedocs.io/en/latest/reference/client.html as an example.

Key Points

  • Depending on the purpose and state of the project documentation needs to meet different criteria.
  • Good README files provide a good landing place for anyone that is new to your project
  • Comments should describe the why for your code not the what.
  • Writing docstrings can be a good way to write documentation while you type code since it also makes it possible to query that information from outside the code or to auto-generate documentation pages.