Andre's Blog
Perfection is when there is nothing left to take away
What is build metadata good for?

Build metadata in Semantic Versioning is a commonly misunderstood concept. It often sparks passionate online discussions on whether build metadata should be allowed in package repositories, and some of the confusion even seeps into prominent online services and applications.

What is build metadata, anyway?

Build metadata is tucked at the end of the application version, such as 004 in 1.2.3+004 or 2020-12-05.1 in 1.2.3+2020-12-05.1, and identifies the underlying build. It plays an important role in tracking application builds towards a future release and, to a lesser extent, in identifying the exact build of a released application.

The latter point is more relatable to developers who deal with application binaries, which cannot always be easily traced back to a source commit or a specific pipeline build by the application version alone. When debugging a crash dump, one would always look at the application version and the build number inside the dump in order to use the correct debug symbols, which must be preserved by the build system because they cannot be built again from a source tag.

The former point, however, is where the waters get muddy, probably because popular online sources describe what build metadata is, but not what it is supposed to be used for. The Semantic Versioning specification describes build metadata as follows, but provides little detail on how it is intended to be used and why it is excluded from version ordering.

10. Build metadata MAY be denoted by appending a plus sign and a series of dot separated identifiers immediately following the patch or pre-release version. Identifiers MUST comprise only ASCII alphanumerics and hyphens [0-9A-Za-z-]. Identifiers MUST NOT be empty. Build metadata MUST be ignored when determining version precedence. Thus two versions that differ only in the build metadata, have the same precedence.

This clause differs from the clauses describing other version components in that it does not require numeric identifiers to be interpreted as numbers. Whenever build metadata is compared, it is therefore compared as text, so 12 compares less than 2, for example. This discourages numeric comparisons, even though it can be worked around by adding leading zeros.
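As a rough illustration, the Python sketch below splits a version into its core, pre-release and build parts, checks that two versions differing only in build metadata have the same precedence, and then compares build identifiers as plain text. The regular expression is deliberately simplified, and the names split_version and same_precedence are made up for this example. It is not a complete Semantic Versioning parser.

```python
import re

# Simplified pattern: MAJOR.MINOR.PATCH, optional -prerelease, optional +build.
SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)"
    r"(?:-([0-9A-Za-z.-]+))?"
    r"(?:\+([0-9A-Za-z.-]+))?$"
)

def split_version(text):
    """Return ((major, minor, patch), prerelease, build) for a version string."""
    match = SEMVER.match(text)
    if match is None:
        raise ValueError(f"not a semantic version: {text!r}")
    major, minor, patch, prerelease, build = match.groups()
    return (int(major), int(minor), int(patch)), prerelease, build

def same_precedence(a, b):
    """True when two versions differ, at most, in their build metadata."""
    return split_version(a)[:2] == split_version(b)[:2]

print(split_version("1.2.3+2020-12-05.1"))
print(same_precedence("2.0.0+025", "2.0.0+032"))  # True

# Build identifiers are text, so "12" sorts before "2" ...
print(sorted(["2", "12"]))      # ['12', '2']
# ... unless they are padded with leading zeros to the same width.
print(sorted(["012", "002"]))   # ['002', '012']
```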

This leaves many people wondering how one would distinguish application versions 2.0.0+025 and 2.0.0+032 and how to maintain build metadata in packages, given that most public repositories, like npm, silently drop build metadata.

The short answer is that build metadata has little value after an application has been published to a public repository, but is absolutely invaluable when an application is being prepared for the next release. Let's have a closer look at each of these cases.

Application consumer view

Consider a release timeline of a typical database-driven application, as seen from the application consumer's point of view. Note that the diagram below is a calendar timeline and not a source repository branch.

In this timeline, the application was released as version 1.0.0, followed by two patches with bug fixes, versioned as 1.0.1 and 1.0.2. After the last patch release, the application vendor announced a future date for a new version 2.0.0.

Application Consumer's View of a Release

The application consumer can reasonably expect that when 2.0.0 is released, they will be able to upgrade from version 1.0.2 to version 2.0.0, which will be done via vendor-provided application upgrade tools for the database schema and data.

Notice that nowhere in this view can the application consumer see build metadata. Even if they choose to participate in a beta testing program, they would install a pre-release version, such as 2.0.0-beta.1, and would never see build metadata in the form of 2.0.0+025.

From the application consumer's point of view, there is no such thing as package build metadata. The only place where they may see build information is some hidden corner of the application UI.

Development and QA teams view

The same timeline looks quite different behind the curtain, where the development and QA teams work their magic towards the release of version 2.0.0.

From QA's perspective, every application package they receive from the development team may potentially be released without having to repackage it in some way, so they expect each package to be versioned as 2.0.0, and the only way QA can distinguish these packages is to use build information. This is where it gets interesting.

People intuitively expect that it should be possible to upgrade the installed version 2.0.0+025 to a new version 2.0.0+032 because otherwise it would be a dead-end installation. Well, it actually is, and not because Semantic Versioning calls for it, but rather because application upgrades are expected to move application state from one released version to another, not between different builds of the same version, and Semantic Versioning just documents it.

Upgrading an application between builds does not make much sense because such an upgrade would be used only internally and only until a release, so all the effort that would go into working with rather volatile code, database and configuration would largely be in vain.

As a side note, applications with large databases may benefit from incremental upgrades, but those upgrades would typically be done based on the presence of various application state bits, such as database columns or configuration attributes, not using application version and build information.
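A minimal sketch of such a state-based upgrade step, using Python's built-in sqlite3 module, might look like this. The customers table and its email column are hypothetical, and a real upgrade tool would track many more state bits than a single column.

```python
import sqlite3

def has_column(conn, table, column):
    """Check the live schema instead of a stored version or build number."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)

def upgrade(conn):
    """Apply only the steps whose resulting state is still missing."""
    applied = []
    if not has_column(conn, "customers", "email"):
        conn.execute("ALTER TABLE customers ADD COLUMN email TEXT")
        applied.append("customers.email")
    return applied

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
print(upgrade(conn))  # first run adds the column
print(upgrade(conn))  # second run finds nothing to do
```

Because each step looks at the database itself, the same script behaves the same way no matter which build produced the current state.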

With this in mind, QA are not supposed to just install each new build on top of the previous one, which would require an upgrade from 2.0.0+025 to 2.0.0+032, but instead are expected to restore their test database to the version 1.0.2 state, as the application consumer would have it, and then upgrade the application from version 1.0.2 to 2.0.0, which just happens to be 2.0.0+032 under the covers in this scenario. This process would repeat for every new build, as shown in the diagram below.

Development and QA Teams' View of a Release

Once QA pick a release candidate, it will be deployed to the staging environment and the same upgrade pattern will be repeated by the team managing application staging for every new release candidate. The last release candidate will be published for the intended application audience as an official release.

Throughout the entire process, actual application packages would be used, not some specially crafted ones to accommodate building, testing and versioning requirements. This gives QA confidence that the release package is the exact same package they thoroughly tested and not a new package built again from the same source tag or repackaged from the same artifacts.

Build metadata vs. pre-release versions

When package repositories intended for public package distribution are used in build pipelines, people face the fact that such repositories will not accept packages that differ only in build metadata and often end up following the common online advice to use pre-release versions for tracking build numbers (e.g. 2.0.0-25), which is even promoted by Microsoft in their Azure DevOps docs.

This workaround may seem like a workable solution at first, but it is fundamentally broken in several ways.

Intermediate build packages disguised as pre-releases have to provide working upgrade functionality, so developers have to write unnecessarily complex upgrade scripts, compared to upgrade scripts working against a well-defined database schema of the previous officially released application version. Not only would this be a throw-away effort that is used only once, but it also means that the actual upgrade that would be performed by the application consumer will not be tested, because these internal pre-release upgrades were done in testing instead.

After all testing is done against the final pre-release package, in order to produce an actual release package, it would have to be repackaged or a new release package would have to be built from the same artifacts or same source tag, which introduces new steps into the release process and may break during repackaging or in the release pipeline.

A pre-release application published to a public repository without repackaging will be shown with a note that this release may be less stable, which is not something one would want on a shiny new release that is supposed to be stable. This point is moot, though, for applications that are released only for their own deployment teams to use.

Lastly, if an actual pre-release is planned, things get even more complicated because various release management integrations will need to distinguish between pre-release version variants intended just for tracking build information and those used as actual pre-release versions.

A pre-release version is intended to indicate that a publicly-released application is less stable or is not feature-complete, but otherwise is no different from a target application release and is subject to version ordering and possible upgrades to and from this version, depending on the application complexity and quality of support. Build metadata is intended to track build maturity when preparing an application for a public release of any kind, including a pre-release.

It's better with build metadata

Application versions with build metadata should not be viewed as a sequence of versions with an upgrade path between them, but rather as a pool of candidates for the planned application release at a specific version planned by the product owner. This candidate pool may contain package versions 2.0.0+025, 2.0.0+026 and 2.0.0+032, but as far as all development and QA environments can see, version 2.0.0 is resolved as the package file versioned as 2.0.0+032 and all other package files are invisible.
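The idea of a candidate pool can be sketched in a few lines of Python. This sketch assumes that the package standing in for a version is simply the one with the highest build number in the pool; in practice it would be whichever build was promoted last. The names build_key and resolve are hypothetical.

```python
def build_key(build):
    """Sort key for dot-separated build identifiers, numeric parts as numbers."""
    key = []
    for part in build.split("."):
        if part.isdecimal():
            key.append((0, int(part), ""))
        else:
            key.append((1, 0, part))
    return key

def resolve(pool, version):
    """Return the package in `pool` that stands for `version`, or None."""
    candidates = []
    for package in pool:
        core, _, build = package.partition("+")
        if core == version and build:
            candidates.append((build_key(build), package))
    if not candidates:
        return None
    return max(candidates)[1]

pool = ["2.0.0+025", "2.0.0+026", "2.0.0+032", "1.0.2+007"]
print(resolve(pool, "2.0.0"))  # 2.0.0+032
```

Note that the build number is used here only to pick one candidate out of the pool, not to define an upgrade path between candidates.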

A poor man's way to maintain such a pool of packages is to use multiple network shares to store packages and move them between these shares as a way to promote packages to the next release level.

For example, a build pipeline may name package files to include build metadata, such as myapp-2.0.0+030.tgz, and may place them on a network share used by the automated integration testing environment. One of the package files with a passing integration testing grade may be moved (promoted) to a QA share for testing and later to a release candidate share and, finally, to a release share, at which point it can be published to a package repository, which will drop the build metadata.
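A promotion step of this kind could be sketched as follows. The stage directory names and the promote function are assumptions made for this example, and a temporary directory stands in for the network shares.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical share names, in promotion order.
STAGES = ["integration", "qa", "release-candidate", "release"]

def promote(root, package, from_stage):
    """Move a package file to the next stage directory and return its new path."""
    index = STAGES.index(from_stage)
    if index + 1 >= len(STAGES):
        raise ValueError(f"{from_stage!r} is the last stage")
    target_dir = Path(root) / STAGES[index + 1]
    target_dir.mkdir(parents=True, exist_ok=True)
    source = Path(root) / from_stage / package
    target = target_dir / package
    # The file is moved, not rebuilt, so the tested bytes are the promoted bytes.
    shutil.move(str(source), str(target))
    return target

with tempfile.TemporaryDirectory() as root:
    share = Path(root) / "integration"
    share.mkdir()
    (share / "myapp-2.0.0+030.tgz").write_bytes(b"")
    promoted = promote(root, "myapp-2.0.0+030.tgz", "integration")
    print(promoted.parent.name)  # qa
```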

A better and more straightforward way to do the same is to use a package repository that recognizes build information and resolves it into the application version that is being worked on. For example, JFrog's Artifactory will accept packages with the same version and different build numbers and will expose one of them as the one and only application package available in this repository for this version.

Multiple repositories may be set up for different stages of application testing, such as a repository for integration testing, one for QA and one for staging, and each stage will work with the last promoted build number from the previous stage as if it is the only application version that ever existed.

JFrog's Artifactory Build Integration

Another advantage of build package repositories is that such repositories can be configured to connect to a number of public repositories transparently, as if those public packages came from the same repository configured for its stage, which means that QA just have to manage one package repository URL instead of dealing with a mix of package files and standard repository installations.

It is worth noting that while artifact feeds in Azure DevOps resemble Artifactory and even use similar terminology, such as promoting packages through release views, artifact feeds do not allow build numbers in package versions and expect people to use pre-release versions instead, which makes these artifact feeds fairly useless for the build promotions described above. At best, they can be used as internal repositories for private released packages, with transparent upstream package feeds.

Privately Deployed Applications

Website-type applications that are deployed within a single organization may not even require Semantic Versioning because their deployments (releases) are focused on specific features and not on a traditional set of loosely related features grouped for a public release under a semantic version that hints at a potential risk level for all of those features applied on top of the previous version.

Such applications may have new features deployed weekly, with simple and complex features intermixing in any way the product owner sees fit. Trying to juggle semantic versioning for such deployments could be counterproductive, as major and minor versions could be jumping up every week when large features are queued for deployment week after week.

However, all these features still need to be built and tested at various deployment stages and builds need to be identified at each stage, not only for automated consumption, but also for team members to be able to communicate which build works and which does not in conversations, as well as to identify which specific build some feature was released in.

This makes an interesting point that one can have an application without an explicitly spelled-out multilevel version, but not without a build number, whether it is explicitly expressed in package names or implied by the build pipeline machinery.

For example, an Azure DevOps build pipeline may produce files myapp-backend.zip and myapp-ui.zip as build artifacts and will have build information stored in a set of build pipeline variables, such as Build.DefinitionName and Build.BuildNumber. These artifacts and variables may be fed into a release pipeline, which is structured as a sequence of stages, each representing a deployment environment. Intermediate stages are used for testing and the final stage is the production stage. Build artifacts are passed from stage to stage, along with build pipeline variables, which makes build information unnecessary in file names, as the first set of artifacts that makes it to the production stage is the actual release.

As a side note, some privately deployed applications may be maintained by different development teams within the same organization and may still benefit from Semantic Versioning, as if their components were publicly released.

Build metadata vs. build numbers

Build metadata is the term used for build identification in Semantic Versioning, and it may be a bit misleading because the specification provides little detail on how build identifiers should be structured. Some of its examples even suggest that build metadata does not have to be sequential, such as the one that contains a commit hash, which does work as a pure build identifier, but makes it harder to locate builds without some additional information, such as a build date.

Another common name for a build identifier is a build number, which implies some form of sequencing and makes it easier to find a specific build among a set of builds ordered in some predictable way, as new builds keep coming off of a build pipeline.

Build identifiers point to the location of all build artifacts for each build, not only those that end up in the released package. These additional artifacts may include a list of addressed issues, debug symbols, various test results, code coverage results and other similar artifacts, some of which cannot be produced reliably later, even for the same source repository tag.

In practical terms, many people choose to maintain build numbers in the source, perhaps because many applications print build numbers along with the application version, so source code may sound like a good place for them. I used to do this as well, until I realized that build numbers do not belong in the source, for a couple of reasons.

Probably the most important one is that the same source may be used to produce different builds, so having one shared build counter in the source will produce confusing results.

For example, if myapp is being planned for a 2.1.0 release, feature branches my-12 and my-34 may be created for development. Each of them may start with the same source, so each will have the same initial build number and CI builds on each branch will produce myapp-2.1.0+005.tar.gz and there will be no way to tell them apart if any of these artifacts ends up in a shared location. Similarly, CI builds and nightly builds will use the same build counter and may also cause confusion.

A less obvious reason is that one needs to increment build numbers at some point. Some try to tie this increment to a successful build, so they check out the source, increment the build number, make a build and, if it was successful, commit the changed build number to the source repository. The problem is that the tagged commit from which the build was made does not contain the incremented build number, which may be a source of confusion when troubleshooting issues later.

Build numbers belong in the build system. Some of the newer build environments accommodate more complex requirements for build numbers, which makes them more like the build metadata that Semantic Versioning describes, because they contain more than just numeric sequences.

For example, Azure DevOps pipelines allow naming builds in the pipeline YAML, so one can have a version-only name for a release branch:

name: 2.1.0+$(Rev:r)

and, in a feature branch, a name composed of the version, the branch name (which is typically named after its user story) and a numeric build number:

name: 2.1.0+$(Build.SourceBranchName).$(Rev:r)

The resulting build name is exposed in a variable $(Build.BuildNumber), so the first pipeline will produce an artifact myapp-2.1.0+17.tar.gz and the second one will produce myapp-2.1.0+my-12.2.tar.gz. Build pipelines also allow explicit counters for arbitrary prefixes, which allows one to create a build number maintained by the pipeline that can be passed into the application (e.g. $[counter(variables.VERSION, 1)]).
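To make the shape of these names concrete, here is a small Python sketch that composes the same artifact names outside of any pipeline and checks each build identifier against the character set allowed by the specification. The functions build_metadata and artifact_name are hypothetical helpers and not part of Azure DevOps.

```python
import re

# Characters allowed in a build identifier by Semantic Versioning.
IDENTIFIER = re.compile(r"[0-9A-Za-z-]+")

def build_metadata(revision, branch=None):
    """Join an optional branch name and a revision into build metadata."""
    parts = [str(revision)] if branch is None else [branch, str(revision)]
    for part in parts:
        if IDENTIFIER.fullmatch(part) is None:
            raise ValueError(f"invalid build identifier: {part!r}")
    return ".".join(parts)

def artifact_name(app, version, revision, branch=None):
    """Compose an artifact file name such as myapp-2.1.0+17.tar.gz."""
    return f"{app}-{version}+{build_metadata(revision, branch)}.tar.gz"

print(artifact_name("myapp", "2.1.0", 17))          # myapp-2.1.0+17.tar.gz
print(artifact_name("myapp", "2.1.0", 2, "my-12"))  # myapp-2.1.0+my-12.2.tar.gz
```

A branch name containing a slash or an underscore would be rejected here, which is one reason to keep branch names used in build metadata short and simple.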

Not all popular build environments are as good for tracking build numbers, though. For example, GitHub Actions does not allow variables in workflow names and shares the same run number counters between all source branches, regardless of the workflow name, so build numbers on a single branch would skip numbers when other branches are built. Even worse, GitHub does not distinguish workflow names from workflow run names and merely uses the workflow name property in YAML to report both, so when workflows with different names from different branches run, all past workflow runs get the name of the currently running workflow. It's quite a mess.

Conclusion

Product versions are managed by product owners and indicate the level of changes in the released application, compared to previous versions of the same application, hinting at how much of a risk it would be to upgrade to the new version. Version component values are always assigned by the product owner and even when their increments are automated, the logic of what is incremented and when must be controlled entirely by the product owner to produce meaningful versions.

Build metadata, on the other hand, uniquely identifies every build and is used mostly by development and QA teams when working towards the next application release. Build metadata is always auto-advanced monotonically by the build system within the application version that is being prepared for a release. Build metadata plays absolutely no role in managing released packages and application upgrades.

Confusion between these two usually starts with not being able to upgrade packages that only differ in build metadata and eventually ends up in using either the pre-release part of the version or the patch level as a surrogate build number, which results in releasing untested repackaged artifacts, in patch levels skipping numbers or in pre-release packages being used as final packages. Any of these and similar hacks go against the grain of what Semantic Versioning sets out to accomplish.

Diagrams in this post are created with app.diagrams.net.
