Software Version Control

<!-- .slide: data-state="title blue_overlay yellow_flag yellow_strip purple_half_circle_bottom purple_blob right_e_top" -->
# Version Control

What is version control and why should I use it?

note:
You are probably using version control every day. Word and other document editors store a history of changes for you, which you can undo. Google Docs has an even more intricate system for reviewing history, etc.

===

<!-- .slide: data-state="standard center" -->
### Version Control

*A system that organizes and records changes to a (set of) file(s) and/or their metadata over time, allowing one to revisit specific versions later.*

<div class="fragment" data-fragment-index="1">
<img src="https://swcarpentry.github.io/git-novice/fig/phd101212s.png" width="38%">
<small> <a href="https://phdcomics.com">Piled Higher and Deeper</a> by Jorge Cham </small>
</div>

note:
Strictly speaking, this is a form of version control: separate versions are stored, commented on, and organized in some way. Let's see a more practical, systematic, and robust way of doing this...

===

<!-- .slide: data-state="standard center" -->
## Documents are...

<div>
... a series of changes
<img style="height: 30vh; margin: 0; padding: 0;" src="https://swcarpentry.github.io/git-novice/fig/play-changes.svg"/>
</div>

note:
In version control systems, documents start with a base version (which may or may not be empty) and record all the changes that happened on top of that base version. Because of this you can always "play back" to an earlier version or compare separate iterations, without having to store near-identical variations of the same documents.

==

<!-- .slide: data-state="standard center" -->
## Software is...

... a collection of one or more documents.

- Code
- Documentation
- Environment & infrastructure files
- ...

note:
Any piece of code is nothing more than a (plain) text document, not too different from a Word file. The same is generally true for most other files comprising the software.
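
As a rough illustration of what recording a change to a plain text file can look like, the sketch below uses Python's standard `difflib` module to compute a unified diff between two versions of a small file. The file name `greet.py` and its contents are made up for this example, and this is not how `git` stores data internally; it only shows that a change can be expressed as a handful of removed and added lines.

```python
import difflib

# Two hypothetical versions of the same plain-text source file.
old_version = [
    "def greet():\n",
    "    print('Hello')\n",
]
new_version = [
    "def greet(name):\n",
    "    print('Hello', name)\n",
    "    return name\n",
]


def make_diff(old, new):
    """Return the lines of a unified diff that turns `old` into `new`."""
    return list(
        difflib.unified_diff(old, new, fromfile="a/greet.py", tofile="b/greet.py")
    )


diff_lines = make_diff(old_version, new_version)
# Lines starting with "-" were removed, lines starting with "+" were added.
print("".join(diff_lines))
```

The printed output has the same general shape as the diffs shown by version control tools: a header naming both versions, followed by the removed and added lines.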
Because these files are plain text, changes are very easy and computationally efficient to track. Even complex (binary, data, image, ...) files can ultimately be rendered in plain text, and therefore tracked using version control, although their changes are not easy to visualize and every change multiplies the storage required.

===

<!-- .slide: data-state="standard center" -->
## Changes...

... are stored using a version control (VC) system (usually `git`).

<img src="media/git-diff.png" width="70%">

A single unit of change is called a [commit](version_control_terminology), and is typically associated with a brief [commit message](version_control_terminology), a [commit hash](version_control_terminology) (SHA), and other metadata.
<!-- .element: class="fragment" data-fragment-index="1" -->

note:
Commit hashes are unique references to a commit, while commit messages are human readable descriptions. Other metadata can include the author(s), timestamp, etc.

==

<!-- .slide: data-state="standard center" -->
## A commit...

... can be of any size or type:

- Single line additions/deletions/changes
- Additions/deletions/changes of (multiple) large sections
- Adding or deleting files
- Moving files into different folders
- ...

note:
A commit is the fundamental unit of change in version control, but its scope is not strictly defined. Ideally, a commit should be large enough to represent a meaningful improvement while remaining small enough to focus on a single, coherent change. Balancing these factors helps maintain clarity and makes it easier to track, review, and revert changes when needed.

==

<!-- .slide: data-state="standard center" -->
## Sequential commits...

... form a log

The data structure that contains the software plus the commit history is called a [repository](version_control_terminology).

<img src="media/git-log.png" width="70%">

note:
A good rule of thumb for commit sizing is whether you can write a clear, concise commit message summarizing the change.
If the commit is too small, the message will simply describe the specific action taken. If it's too large, summarizing it succinctly becomes difficult. Striking the right balance also allows the commit log to be read almost like a history book.

===

<!-- .slide: data-state="standard" -->
## Keeping track

#### Question:

Say you have dozens (hundreds, thousands, ...) of old versions, how do you manage to find a specific/useful previous version?

- Effective committing (sizing & messages) <!-- .element: class="fragment" data-fragment-index="1" -->
- Versioning systems <!-- .element: class="fragment" data-fragment-index="1" -->
- Change logs and DOIs for stable/important versions <!-- .element: class="fragment" data-fragment-index="1" -->

note:
We've talked extensively about commit hygiene above, so we will now discuss the other two aspects.

==

<!-- .slide: data-state="standard" -->
## Versioning

Use a logical system to keep track of (stable) versions<br>
... and document the system used in the repository.

The most common are:

<ul>
<li><a href="https://semver.org/">Semantic Versioning (SemVer)</a>, and</li>
<li><a href="https://calver.org/">Calendar Versioning (CalVer)</a>.</li>
</ul>

note:
Make sure that whatever versioning system you use is also documented in the repo, so that others (and future you) can see what the logic is.

==

<!-- .slide: data-state="standard" -->
## SemVer Terminology

MAJOR.MINOR.PATCH

- MAJOR: backwards incompatible changes
  - ⚠️ MAJOR=0: API may change any time
- MINOR: added backwards compatible functionality
- PATCH: backwards compatible bugfixes
- 1.0.0 defines public API

note:
Not every change or addition needs a new version number; this is what commits are for.
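
The MAJOR.MINOR.PATCH rules can be sketched in a few lines of Python. This is a minimal illustration, not a full SemVer implementation: the names `Version`, `parse` and `bump` and the change labels `"breaking"`, `"feature"` and `"bugfix"` are made up for this example, and pre-release tags, build metadata and the special rules for `0.y.z` versions are ignored.

```python
from typing import NamedTuple


class Version(NamedTuple):
    """A simplified MAJOR.MINOR.PATCH version (no pre-release or build tags)."""

    major: int
    minor: int
    patch: int

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}.{self.patch}"


def parse(text: str) -> Version:
    """Turn a string such as '1.4.2' into a Version."""
    major, minor, patch = (int(part) for part in text.split("."))
    return Version(major, minor, patch)


def bump(version: Version, change: str) -> Version:
    """Return the next version for a hypothetical change label."""
    if change == "breaking":  # backwards incompatible change
        return Version(version.major + 1, 0, 0)
    if change == "feature":  # backwards compatible functionality
        return Version(version.major, version.minor + 1, 0)
    if change == "bugfix":  # backwards compatible bugfix
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown change type: {change}")


current = parse("1.4.2")
print(bump(current, "bugfix"))    # 1.4.3
print(bump(current, "feature"))   # 1.5.0
print(bump(current, "breaking"))  # 2.0.0

# Versions compare numerically per component, not alphabetically as strings.
print(parse("1.10.0") > parse("1.9.0"))  # True
print("1.10.0" > "1.9.0")                # False
```

The last two lines show why version numbers are best compared component by component: as plain strings, `1.10.0` would sort before `1.9.0`.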
==

<!-- .slide: data-state="standard" -->
## Using SemVer

<div style="float: left; width: 49%;">

### Pros

- Widely used
- "Maturity"
  - `0.2.0` vs `3.1.4`
- Long-term support (LTS) versions
  - Maintain older versions with fresh patches
- `1.0.0` as a milestone

</div>
<div class="fragment" style="float: right; width: 49%;">

### Cons

- No info about "freshness"
  - How old is version `2.7.1`?
- "Bigger is better", marketing
- Superstition (e.g. 13, 14)
- `1.0.0` as a milestone

</div>

==

<!-- .slide: data-state="standard" -->
## CalVer Examples

These "projects" use CalVer to show freshness.

- RSECon25: `YY`
  - See https://rsecon25.society-rse.org/
  - A new edition every year
- certifi 2025.10.5: `YYYY.MM.DD`
  - See https://pypi.org/project/certifi/#history
  - Certificates expire quickly, so they need to be recent
- Ubuntu 26.04.1: `YY.0M.PATCH`
  - See https://releases.ubuntu.com/
  - Operating systems need to be updated regularly
  - Using SemVer patch addition for "Service Packs"

==

<!-- .slide: data-state="standard" -->
## CalVer Terminology

- `YYYY` Full year - 2006, 2016, 2106
- `YY` Short year - 6, 16, 106
- `0Y` Zero-padded year - 06, 16, 106
- `MM` Short month - 1, 2 ... 11, 12
- `0M` Zero-padded month - 01, 02 ... 11, 12
- `WW` Short week (since start of year) - 1, 2, 33, 52
- `0W` Zero-padded week - 01, 02, 33, 52
- `DD` Short day - 1, 2 ... 30, 31
- `0D` Zero-padded day - 01, 02 ...
30, 31

note:
- Common assumptions are using the UTC timezone and the Gregorian calendar
- The [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) date format (YYYY-0M-0D) is recommended, since sorting alphabetically results in sorting chronologically as well

==

<!-- .slide: data-state="standard" -->
## Using CalVer

<div style="float: left; width: 49%;">

### Pros

- Communicates "freshness"
  - Version from this month would be rather recent
- No "bigger is better" marketing

</div>
<div class="fragment" style="float: right; width: 49%;">

### Cons

- No indication of "maturity"
  - `0.2.0` vs `3.1.4`
- No indication of API breakage
  - Is my code compatible with version `2016.10`?

</div>

==

<!-- .slide: data-state="standard" -->
## Choosing a Versioning Scheme

- Use **SemVer** when compatibility is important.
- Pick **CalVer** if you are releasing on a schedule and timing is important.

When in doubt, read up more about the differences [1, 2].

<style>
.footnote_link { color: white !important; }
.footnote_link:hover { color: var(--nlesc-purple) !important; }
</style>
<footer style="font-size: smaller;">
[1]: <a class="footnote_link" href="https://gosink.in/versioning-strategies-explained-semver-to-calver-and-beyond-and-which-one-should-you-choose-2/">gosink.in/versioning-strategies-explained-semver-<br>to-calver-and-beyond-and-which-one-should-you-choose-2/</a> <br>
[2]: <a class="footnote_link" href="https://sensiolabs.com/blog/2025/semantic-vs-calendar-versioning">sensiolabs.com/blog/2025/semantic-vs-calendar-versioning</a>
</footer>

==

<!-- .slide: data-state="standard" -->
## Change logs & DOIs

Keep a human readable log summarizing the changes of each new version.

Archive important versions on a service such as [Zenodo](https://zenodo.org/), which can mint a DOI for a release of a repository hosted on, for example, [GitHub](https://github.com/) or [GitLab](https://gitlab.com/).
<img src="media/ChangeLog.png" width="45%">

<small>Excerpt of release notes for <a href="https://github.com/astral-sh/ruff/releases/tag/0.8.6">ruff linter v0.8.6</a></small>

note:
This is the equivalent of the "commit message" at the level of a new version. There are a few (near) synonyms used for this: change log, release notes, change history, etc. However, it is separate from the commit log or commit history, as noted earlier.

===

<!-- .slide: data-state="standard center" -->
## Collaborating using VC

More than one...
<!-- .element: class="fragment" data-fragment-index="0" -->

<div style="float: left; width: 49%;" class="fragment" data-fragment-index="1">
... source of change...
<img style="height: 80%; padding-top: 20px;" src="media/versions.svg"/>
</div>
<div style="float: right; width: 49%;" class="fragment" data-fragment-index="2">
... can be merged.
<img style="height: 80%; padding-top: 20px;" src="media/merge.svg"/>
</div>

note:
When collaborating you might have various versions (sets of changes) that co-exist at the same time on so-called "branches". Modern version control software can usually merge multiple changes into a single document automatically.

==

<!-- .slide: data-state="standard center" -->
## Branching

A project can have many [branches](version_control_terminology), which may or may not get [merged](version_control_terminology) back into the main version.

<img src="media/git-branching-turing-way.png" width="60%" alt="Several development branches in Git.">

<small> Image by: <a href="https://book.the-turing-way.org/reproducible-research/vcs/vcs-workflow-branches">The Turing Way</a></small>

**What uses for branches can you think of, other than "feature branches"?**
<!-- .element: class="fragment" data-fragment-index="1" -->

note:
The main branch is supposed to be a stable version that one can mostly rely on to work as expected.
Changes created in branches may get merged back into the stable version, or may persist (or die) as a parallel version.

Branches here are indicated as "feature branches", i.e. branches used while creating new features in a code base. Other uses of branches:

- a sandbox or playground, for trying out different things without "damaging" the stable version
- release management
- custom/user specific changes
- ...

==

<!-- .slide: data-state="standard center" -->
## Keeping a centralized repository

<img style="height: 350px;" src="https://www.researchgate.net/profile/Mark-Ziemann/publication/371671830/figure/fig2/AS:11431281168661745@1687060872300/Distributed-version-control-Adapted-from-48.png" alt="Distributed version control. Adapted from [48]."/>

<small>[The five pillars of computational reproducibility: Bioinformatics and beyond](https://www.researchgate.net/figure/Distributed-version-control-Adapted-from-48_fig2_371671830)</small>

note:
Collaborative code developers often make use of a remote server (like GitHub or GitLab) as a central repository from which all other repositories derive. But this is not the only way you can use version control to do collaborative development. Separate local repositories are similar to, but distinct from, individual branches of the main repository.

==

<!-- .slide: data-state="standard center" -->
## Hosting the main repository

<img src="media/repository_logos_focused.png" width="80%">
<!-- non-focused image can be found until SHA 8c658f43, v1.6.0 -->

note:
Many different tools exist specifically for collaborative version control of computer source code and other simple text-based documents. Git (for version control) with GitLab and GitHub (for collaboration) are the mainstream, used by many and with lots of features. We recommend against using any of the other tools unless the users already know what they are doing or have very strong reasons for doing so.
GitLab has an open source edition (Community Edition) and offers a self-managed option, allowing organizations to host and manage their own GitLab instances on-premises or in their private cloud environments. This provides full control over data and customization. Conversely, GitHub is owned by Microsoft and uses (some) proprietary software, but is more widely used and more people will be familiar with the interface and functionality. Its self-hosted option (GitHub Enterprise Server) is a paid product, but it does allow for private repositories.

===

<!-- .slide: data-state="standard center" -->
## What can go wrong?

<img src="media/conflict.png" height="80%">

note:
If changes are made to the same section (usually the same or consecutive line(s) of text) of a document, a [merge conflict](version_control_terminology) arises. Changes cannot be automatically merged, as the version control system cannot know which version or which combination to use. Human intervention is required and can involve rolling back a change, finding common ground between changes, etc. Resolving merge conflicts can be time-consuming and error prone, especially for large conflicts. Therefore it is a good idea for teams to agree on some basic practices to avoid creating conflicts in the first place.

==

<!-- .slide: data-state="standard" -->
## Merge conflicts

If two people change the same line...

```bash
This line contains a typos.
```

... [merge conflicts](version_control_terminology) may arise:
<!-- .element: class="fragment" data-fragment-index="1" -->

```bash
<<<<<<< HEAD (Current Change)
This line contained a typo.
=======
This line contained typos.
>>>>>>> feature-branch (Incoming Change)
```
<!-- .element: class="fragment" data-fragment-index="1" -->

note:
If changes are made to the same section (usually the same or consecutive line(s) of text) of a document, a merge conflict arises. Changes cannot be automatically merged, as the version control system cannot know which version or which combination to use.
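
The reasoning behind this can be sketched as a three-way comparison of a single line against its common ancestor. The function `merge_line` and the exception `MergeConflict` are hypothetical names used only for this illustration; real tools such as `git` compare whole regions of a file rather than isolated lines, so this is a simplified model and not the actual algorithm.

```python
class MergeConflict(Exception):
    """Raised when both sides changed the same line in different ways."""


def merge_line(base: str, ours: str, theirs: str) -> str:
    """Merge one line, given the common ancestor and two edited versions."""
    if ours == theirs:   # both sides agree (or neither changed anything)
        return ours
    if ours == base:     # only "theirs" changed the line
        return theirs
    if theirs == base:   # only "ours" changed the line
        return ours
    # Both sides changed the line differently: no automatic answer exists.
    raise MergeConflict(f"ours: {ours!r} / theirs: {theirs!r}")


base = "This line contains a typos."
ours = "This line contained a typo."
theirs = "This line contained typos."

# Only one side changed the line, so the merge is automatic.
print(merge_line(base, ours, base))

# Both sides changed the same line, so a person has to decide.
try:
    merge_line(base, ours, theirs)
except MergeConflict as conflict:
    print("conflict:", conflict)
```

In the second case neither edit can be preferred on purely mechanical grounds, which is why the decision is handed back to a person.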
Human intervention is required and can involve rolling back a change, finding common ground between changes, etc. Resolving merge conflicts can be time-consuming and error prone, especially for large conflicts. Therefore it is a good idea for teams to agree on some basic practices to avoid creating conflicts in the first place.

==

<!-- .slide: data-state="standard" -->
## Avoiding merge conflicts

<div style="display: grid; grid-template-columns: repeat(2, 2fr); gap: 10px; text-align: center;">
  <div class="fragment" data-fragment-index="1">
    <div>
      <img src="media/communication.png" style="height: 100px;">
      <strong>Communication</strong>
      <ul>
        <li>Who is working on what?</li>
        <li>Follow or create common standards</li>
      </ul>
    </div>
  </div>
  <div class="fragment" data-fragment-index="2">
    <div>
      <img src="media/updating.png" style="height: 100px;">
      <strong>Frequent updates</strong>
      <ul>
        <li><a href="version_control_terminology">Push and pull</a> changes regularly</li>
        <li>Review each other's work before merging</li>
      </ul>
    </div>
  </div>
  <div class="fragment" data-fragment-index="3">
    <div>
      <img src="media/workflow.png" style="height: 100px;">
      <strong>Organized workflow</strong>
      <ul>
        <li>Commit hygiene and feature branches</li>
        <li>Avoid "scope creep"</li>
      </ul>
    </div>
  </div>
  <div class="fragment" data-fragment-index="4">
    <div>
      <img src="media/tools.png" style="height: 100px;">
      <strong>Tools are your friends</strong>
      <ul>
        <li><em>Kanban boards</em> for task assignment</li>
        <li><em>Linters</em> for style adherence</li>
      </ul>
    </div>
  </div>
</div>
<small>All icons from <a href="https://icons8.com/icons/pricing">Icons8</a></small>

note:
Keep in mind that it is unrealistic to prevent all conflicts. This is fine, but they will need some attention to resolve. Communication is key!
In general: the more files you touch, the shorter the branch should live to avoid merge conflicts.

A strategy to specifically avoid is the "one branch per student" model: such branches usually diverge so much that they never get merged.

==

<!-- .slide: data-state="standard" -->
## Short-lived Feature Branches

One dedicated branch for one task (feature)

<div class="fragment" data-fragment-index="1">
<strong>Iterate quickly:</strong> <br>
<ul>
<li>create branch</li>
<li>make changes</li>
<li>merge changes</li>
<li>delete branch</li>
</ul>
</div>

note:
A task can be anything, small or large: fixing a typo, updating a reference, adding some documentation, fixing a bug, adding some new functionality, improving performance, performing a backend rewrite. Deleting branches after merging helps keep an overview of what is actively being worked on.

===

<!-- .slide: data-state="standard" -->
## Key Points

- Version control is like unlimited undo in MS Word... and more!
- Version control streamlines working in parallel
- A remote repository is often used as a central hub for collaborative development
- Communication is key to avoid conflicting versions of the same documents

===

<!-- .slide: data-state="keepintouch" -->
[www.esciencecenter.nl](https://www.esciencecenter.nl)

info@esciencecenter.nl

020 - 460 47 70