<!-- .slide: data-state="title" -->
# Distributing Software
===
<!-- .slide: data-state="standard" -->
## Why distribute?
1. For reproducibility/transparency of research
2. For developers that want to reuse parts of your software/methods
3. For users that want to use your software
note:
There are many potential reasons for distributing your software. In science, these three
scenarios are the most common. Note that users and developers can include yourself!
===
<!-- .slide: data-state="standard" -->
## Kinds of software
Script, program, notebook, package, library, image, container, it's confusing!
Let's clear that up first
note:
Software can seem very confusing, because there are many different terms that are used
to describe various kinds of software, and they're often used sloppily as well. So let's
start with explaining a bit how software is put together, and what all these terms mean.
===
<!-- .slide: data-state="standard" -->
### Functions (or Procedures)
<div style="width: 40%; float: left; margin-top: 3%">
- Ordered list of instructions
- Like a recipe
- Can refer to other recipes
- Closed box
</div>
<div style="width: 60%; float: right; margin-top: 3%">
<pre style="font-size: 70%; text-align: left">
def minmax(list_of_things):
sorted_things = sorted(list_of_things)
smallest = sorted_things[0]
largest = sorted_things[-1]
return smallest, largest
</pre>
</div>
note:
To start, let's simplify everything way down and look at a basic component of software,
the function. (There's more to it, but this is all we need here and gives a good idea.)
A function, or procedure which is really a better term but not what most people use
anymore, is a list of instructions, like a recipe.
To make lasagna, you use sheets of dough, Bechamel sauce, and tomato sauce, and stack
them up in alternating layers before cooking everything in the oven. Of course tomato
sauce does not grow on trees, so you'll need a sub-recipe to make that, and it's the
same with functions: they call other functions.
Like with cooking, where you might get some ready-made pasta or sauce, the functions
called from a function can be made by someone else. Your laptop has tens or hundreds of
thousands of functions in it to do everything it does.
===
<!-- .slide: data-state="standard" -->
### Scripts and Notebooks
<div style="width: 50%; float: left; margin-top: 3%">
- Single file with function(s)
- Sit at the top of a pyramid
- Use libraries
- Run interactively
- User handles errors
</div>
<div style="width: 50%; float: right; margin-top: 0%">
<img src="media/scripts_notebooks.png" width="100%">
</div>
note:
A script is a small bit of software, usually a single file with one or a few functions
in it. A notebook is a script that is run in a special environment, like Jupyter, which
shows the script as a document with blocks of code and results (e.g. graphs) embedded in
it.
Scripts and notebooks are run interactively by a user, who can see the code and handles
any errors, if they occur. The functions called by the functions in a script are usually
not in the script itself, but come from libraries.
===
<!-- .slide: data-state="standard" -->
### Libraries
<div style="width: 50%; float: left; margin-top: 5%">
- Lower levels of the pyramid
- Ready-made functionality
- Have an interface
- Handle errors itself
</div>
<div style="width: 50%; float: right; margin-top: 0%">
<img src="media/scripts_notebooks.png" width="100%">
</div>
note:
Libraries are collections of functions intended for use by others, either other
libraries, or scripts, or even people. They contain ready-made functionality organised
around a topic. For example there are plotting libraries that can make different kinds
of graphs, or libraries that can solve particular kinds of equations.
Libraries have an interface or API, which consists of functions that the user is
supposed to call. The library will probably have some internal functions as well.
Libraries need to handle any errors if they occur, either by working around the problem
or by reporting it to the user in a well-defined way.
Designing APIs and handling errors is hard, so making a good library is not so easy, but
libraries are very reusable.
===
<!-- .slide: data-state="standard" -->
### Programs
<div style="width: 50%; float: left; margin-top: 3%">
- At the top, or including all
- Bigger than a script
- Not meant for programming
- User interface
</div>
<div style="width: 50%; float: right; margin-top: 3%">
<img src="media/programs.png" width="100%">
</div>
note:
Program is a very generic term. A script is a small program, and the borderline isn't
very clear. A program can do things on its own, so a library is not a program. Programs
often include all the libraries that they use, but you can also say that program X
depends on library Y. A program is not meant for programmers, but for end users, with
which it communicates through some kind of user interface, be it graphical or using text
or on the command line.
===
<!-- .slide: data-state="standard" -->
### Packages
<div style="width: 50%; float: left; margin-top: 3%">
- A file containing a collection of software
- Intended for distribution
- Usually holds a library or a program
- Has metadata
- Listed in a package index
- Installed using a package manager
</div>
<div style="width: 50%; float: right; margin-top: 6%">
<img src="media/packages.png" width="100%">
</div>
note:
Packages are often used to distribute software. A package is an archive file (e.g. a
ZIP file) that contains software, usually a library or a program. There is also metadata
that describes, among other things, licenses and authors and other packages containing software that is
needed by this one.
Packages are usually uploaded to a package index, an online store of packages where
users can find them and download them, e.g. PyPI or CRAN. Finding, downloading and
installing is done using a tool called a package manager, like pip or conda.
Especially older programming languages may not have a standard package index or package
manager, e.g. C++ and Fortran.
===
<!-- .slide: data-state="standard" -->
### Images and containers
<div style="width: 50%; float: left; margin-top: 3%">
- A file containing a collection of software
- Everything, including the OS
- Cannot be combined
- Can be run like a program, as a container
</div>
<div style="width: 50%; float: right; margin-top: 3%">
<img src="media/images_containers.png" width="100%">
</div>
note:
A virtual machine image or container image is a file containing a collection of
software, like a package. However, an image contains absolutely everything needed,
including (most of) the operating system. If you have an image, then you can run the
program inside it on your computer. This will create a kind of virtual computer inside
your computer (that's called a Virtual Machine or a Container) inside which the software
runs. Containers cannot be combined and you can't call them from a script, so they're
only useful for programs, not for libraries, and the user needs special software to be
able to run the program.
===
<!-- .slide: data-state="standard" -->
### Scenario 1: Review & reproduce
- You have a script
- The script uses some libraries
- You want others to see and run it
===
<!-- .slide: data-state="standard" -->
### Distributing scripts/notebooks
- Make script/notebook available for download
- Make environment file, e.g. requirements.txt, environment.yml
- User uses a package manager to create environment
- User runs the script/notebook
- Optional: use an image (mostly long-term archival)
note:
In this case, you have a script or a notebook, and also an environment (e.g. using Conda
or virtualenv) with the libraries you use installed in it. This environment is really
just a folder on your computer with the libraries in it, and some machinery that makes
it so that when the script calls a function that is in a library, the library will be
found and the function run.
So, you need to share your script or notebook of course, but the user will also need a
similar environment to run it, otherwise those library functions are missing. You can
(and should!) facilitate that by making an environment file that describes all the
packages containing the libraries that you're using. If you then make that available,
the user can create a similar environment on their computer and run your script.
Software that manages these environments, like Conda and pip/virtualenv, can make these
environment files for you, and also read one and create an environment by downloading
all the needed packages from the Internet, so this is not too difficult.
Another option is to use an image, but 1) they're very big, 2) they require special
software, and 3) they're much more work to create. They do have the advantage that they
work without an Internet, or in the future when all that old stuff is no longer online.
===
<!-- .slide: data-state="standard" -->
### Scenario 2: A reusable method
- You have created a new method
- Should be somewhat generic
- To be used by programmers
- Needs a library!
note:
Scientists sometimes develop new methods for calculating something, for example
acoustics in a room. They will likely have a script that calculates things for their
paper, and that script can be shared as in Scenario 1, but it would also be nice if
people could use their method for calculating acoustics in all sorts of scenarios, in
combination with other calculations.
In this case, what's really needed is a library. As mentioned before, making libraries
isn't that easy, so they may need some help from a Research Software Engineer to design
a good interface. Another option, and often a better one actually, is to contribute to
an existing widely used library. That will give your code more exposure, and may get you
some help from its maintainers as well, in the form of reviews.
===
<!-- .slide: data-state="standard" -->
### Distributing libraries
- Make a package containing the library
- Add metadata with dependencies
- Upload the package to a repository
note:
The best way to distribute a library is by making a package for it, if there is a
suitable packaging system available. Most modern programming languages have one, e.g.
PyPI and pip for Python, or CRAN and install.packages() for R, or Cargo for Rust.
So your steps here are to find the right packaging system based on the programming
language the library is written in, then creating a package for that system (there is
usually a manual/tutorial available), and uploading the package to the repository so
that users can use their package manager to install it. Make sure to list all the
dependencies in the metadata!
For languages like Fortran or C++ that don't have a standard package manager, you could
try Conda, or make packages for an operating system-specific packaging system like
APT/DPKG (Debian/Ubuntu Linux), DNF/RPM (Red Hat Linux), HomeBrew or macPorts (macOS) or
even EasyBuild, Spack or Nix (on High-Performance Computing)
===
<!-- .slide: data-state="standard" -->
### Scenario 3: A whole application
- Not so common in science
- You have created a program
- To be used by end users
note:
Scientists don't often make whole applications with user interfaces that anyone can just
pick up and use. That's a lot of work, and you don't get credit for it in science.
Nevertheless there are examples of this, e.g. MaxEnt in ecology, and it could happen. In
this case, what you have is a program.
===
<!-- .slide: data-state="standard" -->
### Distributing programs
- In an app store!
- As an installer?
- As a package?
- As an image?
note:
There are a few options for distributing a program. Packaging systems can be used,
especially if the program is intended for use by programmers, or by more knowledgeable
users. An App Store is basically a packaging system for applications, and that would be
a good option too if available.
Another option is to make an installer. That's a single file that is a working program,
like a package, but it is also itself a program that when run will install your program
onto the user's system. If you've installed software on Windows then you may have seen
one, although they've mostly been replaced by app stores and may cause the operating
system to warn about suspicious software.
Finally, you could make an image here, but as before, it would be very large, and it
would require special software on the user's computer, so in the here and now it's not a
good option. It would be more resistant to changes in the operating system that break
the program, because it includes the old version of the operating system. And then
again, running an old operating system is a security risk.
===
<!-- .slide: data-state="standard" -->
### Updating software
- Can't we just put it online and be done?
- For scripts, you probably can
- Libraries and programs need maintenance
note:
Scripts are built on top of libraries, which sit on top of an operating system, which
sits on top of hardware, and all that stuff is constantly changing. Furthermore, people change and expect to use software in different ways over
time, and that also needs to be taken into account.
For example, the LimePy library in the AMUSE astrophysics software uses a function in
the SciPy library that does not exist anymore. The old version of SciPy it was written
for was designed for an old version of Python, which is no longer supported and
shouldn't be used anymore, for security reasons. Also, a script using LimePy may use
other libraries that require a newer SciPy, and you can't install multiple versions of
the same thing at the same time, so you won't be able to run that script anymore.
Because of this, all software needs to be updated from time to time in order to continue
to work. This is tricky in science, which works in terms of projects that are finished
and then replaced by something else. For scripts, it's probably not so important because
people can still read the script to see what you did, and update it if they see the
need.
For libraries and programs, it's a different story, because those are used over a longer
period, and for libraries other things get built on top of them, and we'd like all that
to continue to work.
===
<!-- .slide: data-state="standard" -->
### Versioning
- Use [Semantic Versioning](semver.org) for libraries (X.Y.Z)
- Less important for applications
note:
For libraries, it's important that you communicate about changes with programmers using
your library. If the changes affect your interface, then they will have to modify their
software to work with your new version. This is sometimes necessary and sometimes a good
idea, as long as it doesn't happen too often. You do need to communicate it clearly
however.
A good way to do that is to use Semantic Versioning. This uses X.Y.Z versions, and from
version 1.0.0 promises that everything built on top of the library will continue to work
as long as the first number remains the same. The second number is changed to mark
adding new functions, and the third number is incremented if the new version only fixes
bugs.
===
### Summary
- Script/notebook: publish with environment
- Library: package and upload
- Program: app store, package, installer