Document your research software
Last updated on 2024-09-23 | Edit this page
Overview
Questions
- What can I do to make my code more easily understandable?
- What information should go into comments?
- What are docstrings and what information should go into docstrings?
Objectives
- Know what makes a good documentation
- Learn what tools can be used for writing documentation
- Be able to motivate a balanced decision: sometimes READMEs are absolutely enough
Why we teach this lesson
Specific motivations:
- Code documentation becomes quickly unmanageable if not part of the source code.
- It helps people to quickly use your code thus reducing the time spent to explain over and again to new users.
- It helps people to collaborate.
- It improves the design of your code.
What makes a good documentation?
Exercise: Think of good and bad examples
Write down your thoughts in the collaborative documents. Respond with emojis 👍 🙀 to your colleagues’ answers. - Think of projects of which you like the documentation. What do you like about them? - Think of projects for which you don’t like the documentation. What don’t you like about them? Are you missing anything?
NB: You can choose a mature library with lots of users for this exercise, but try to also think of less mature projects you had to collaborate on, or papers you had to reproduce.
- It is important to document code
- Think about the people reading your documentation (your audience)
- Depending on the purpose and state of the project documentation needs to meet different criteria.
- For most scientific projects, in-code documentation and a well thought out README file is enough.
- Documentation should be tracked with corresponding code
Types of documentation
There are different types of documentation:
- README files
- in-code documentation
- API documentation
- Tutorials
We will discuss a few of them in more depth.
Writing good README files
The README file is the first thing a user/collaborator sees. It should include:
- A descriptive project title
- Motivation (why the project exists)
- How to setup
- Copy-pastable quick start code example
- Link or instructions for contributing
- Recommended citation
Exercise README: Draft or improve a README for one of your recent projects (in breakout rooms)
Try to draft a brief README or review a README which you have written for one of your projects.
You can work individually, but you could also discuss whether anything can be improved on your neighbour’s README file(s).
Think about the user (which can be a future you) of your project, what does this user need to know to use or contribute to the project? And how do you make your project attractive to use or contribute to?
(Optional): Try the https://hemingwayapp.com/ to analyse your README file and make your writing bold and clear.
In-code documentation
In-code documentation: - Makes code more understandable - Explains decisions we made
When not to use in-code documentation:
- When the code is self-explanatory
- To replace good variable/function names
- To replace version control
- To keep old (zombie) code around
Readable code vs commented code
vs
Writing good comments - In-code-1: Comments
Let’s take a look at two example comments (comments in Python start
with #
):
Comment A
PYTHON
# now we check if temperature is below -50
if temperature < -50:
print("ERROR: temperature is too low")
Comment B
PYTHON
# we regard temperatures below -50 degrees as measurement errors
if temperature < -50:
print("ERROR: temperature is too low")
Which of these comments is more useful? Can you explain why?
- Comment A describes what happens in this piece of code. This can be useful for somebody who has never seen Python or a program, but for somebody who has, it can feel like a redundant commentary.
- Comment B is probably more useful as it describes why this piece of code is there, i.e. its purpose.
What are “docstrings” and how can they be useful?
Here is function fahrenheit_to_celsius
which converts
temperature in Fahrenheit to Celsius.
The first set of examples uses regular comments:
PYTHON
# This function converts a temperature in Fahrenheit to Celsius.
def fahrenheit_to_celsius(temp_f: float) -> float:
temp_c = (temp_f - 32.0) * (5.0/9.0)
return temp_c
The second set uses docstrings or similar concepts. Please compare the two (above and below):
PY
def fahrenheit_to_celsius(temp_f: float) -> float:
"""
Converts a temperature in Fahrenheit to Celsius.
Parameters
----------
temp_f : float
The temperature in Fahrenheit.
Returns
-------
float
The temperature in Celsius.
"""
temp_c = (temp_f - 32.0) * (5.0/9.0)
return temp_c
Docstrings can do a bit more than just comments: - Tools can generate help text automatically from the docstrings. - Tools can generate documentation pages automatically from code.
It is common to write docstrings for functions, classes, and modules.
Good docstrings describe: - What the function does - What goes in (including the type of the input variables) - What goes out (including the return type)
Naming is documentation: Giving explicit, descriptive names to your code segments (functions, classes, variables) already provides very useful and important documentation. In practice you will find that for simple functions it is unnecessary to add a docstring when the function name and variable names already give enough information.
User/API documentation
- What if a README file is not enough?
- How do I easily create user documentation?
Tools
You can use the following tools to generate user or API documentation:
Sphinx (documentation generator)
- creates nicely-formatted HTML pages out of .md or .rst files
- programming language independent
Github pages (deploy your documentation)
- set up inside your GitHub repository
- automatically deploys your Sphinx-generated documentation
Key Points
- Depending on the purpose and state of the project documentation needs to meet different criteria.
- Good README files provide a good landing place for anyone that is new to your project
- Comments should describe the why for your code not the what.
- Writing docstrings can be a good way to write documentation while you type code since it also makes it possible to query that information from outside the code or to auto-generate documentation pages.