It's time for standardized labels

We will hate ourselves if we don't do this soon

Mar 06, 2021

Here's a list of technologies that I'd wager you have at least one or two of in your stack:

Amazon Web Services / Google Cloud / DigitalOcean
Kubernetes / Nomad
Prometheus / Datadog / SignalFx
Elasticsearch + Kibana / LogDNA / Splunk
FireHydrant (right?)

You might look at this list and go, "that's a pretty typical stack nowadays," and you'd be right. But all of these technologies also support labels.

What is a label?

A label is a method of organizing something. Labels are used to attach small pieces of information to an asset. They're effective ways of denoting what something is, its expiration date, or who manufactured it. Labels are everywhere, from a banana in the grocery store to your Kubernetes deployment running in AWS.

Software is badly labeled.

We have all collectively agreed (whether we realize it or not) on one label: name. It's the name of our repositories, Kubernetes deployments, AWS accounts, and domains. We're really good at giving things a name consistently. Names are an easy way to identify something; they always will be.

However, the name label is no longer adequate for the complexities of our jobs as engineers. We need to know what revision a process is currently running. We need to discover every memory metric for all of our Go applications. We need to know which processes are a part of our Kafka pipeline. Sadly, the lowly name label cannot deliver on these needs.

The power of keys and values

It is the absolute wild west out there when it comes to labeling software systems. For a start, no one has even agreed on what to call the thing. AWS and DigitalOcean use "tags" on their assets. Kubernetes and Prometheus have settled on "labels." Fundamentally, labels and tags aren't very different. A label is a key/value; a tag is the application of that label to something.

A name sticker at a speed dating event is the label, and placing it on your nicely pressed shirt makes it a tag. You look great.

Having a key with a value is simple but powerful. When you introduce the concept of a key that has a value, it means you can search on the value's intent, not just ask "hey, do you have this value?" It ties the value of the label to something tangible. It's why we see "Hi, my name is" on those stickers for awkward social events.

Keywords, on the other hand, are unstructured words attached to an asset. To a person reading them, keywords have implied keys, but to computers, they're utterly meaningless. For example, if I keyword a DigitalOcean droplet "rails@5.2.1", I can read that on a screen and go, "oh, this is probably a Rails application." But ask a database to give you a list of Rails applications, and now it has to search every keyword and effectively guess what you mean.
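Here's a minimal sketch of that difference in Python. Everything in it is hypothetical: the two-droplet inventory, the unprefixed label keys, and the find_by_keyword / find_by_label helpers are invented for illustration. The keyword search has to guess with substring matching, while the label search asks about one specific key.

```python
# Hypothetical inventory: each asset carries free-form keywords and key/value labels.
ASSETS = [
    {
        "id": "droplet-1",
        "keywords": ["rails@5.2.1"],
        "labels": {"framework": "rails", "framework-version": "5.2.1"},
    },
    {
        "id": "droplet-2",
        "keywords": ["guardrails-proxy"],
        "labels": {"component": "guardrails-proxy"},
    },
]


def find_by_keyword(assets, needle):
    """Substring search over every keyword: the computer has to guess intent."""
    return [a["id"] for a in assets if any(needle in k for k in a["keywords"])]


def find_by_label(assets, key, value):
    """Exact lookup: the key states what the value means."""
    return [a["id"] for a in assets if a["labels"].get(key) == value]


keyword_hits = find_by_keyword(ASSETS, "rails")
label_hits = find_by_label(ASSETS, "framework", "rails")
print(keyword_hits)  # both droplets match, including the "guardrails" false positive
print(label_hits)    # only the Rails droplet
```

The keyword search drags in the "guardrails" droplet because it can't tell a framework from a name. The label search doesn't have that problem.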

Standard labels will set us free.

We have far too many things operating in our software stacks to keep running them without standard labels. Kubernetes has recommended labels, but that's Kubernetes. We need to follow the FDA's example and create a labeling standard that everything uses. Across the list of technologies above, a standardized labeling scheme would let us find all related assets in our system. We need to define standard labels that can be applied to assets at every layer of the stack. My proposal: a standard called OpenAsset.io.

The labels

This standard does not need 100 label definitions to be successful. It needs to cover the broadest set of use cases without tiring out hands. The more options a labeling standard has, the less valuable it becomes: automation tools get harder to maintain, and engineers simply omit labels because typing them feels redundant (admit it, you've done this).

Format

All OpenAsset label keys start with "openasset.io" and must parse as a URL: a host followed by a path. Sticking to that shape means other parties (vendors, internal tools) can parse keys quickly, since nearly every language has a URL parser. The domain/path format also makes it easy to write a check that recognizes the host "openasset.io".
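As a rough sketch of what such a check could look like, here is one way to do it with Python's standard URL parser. The function names are my own, and one assumption matters: keys like "openasset.io/name" carry no scheme, so the sketch prepends "//" to make the parser read the first segment as the host.

```python
from urllib.parse import urlsplit

OPENASSET_HOST = "openasset.io"


def parse_label_key(key):
    """Split a label key into (host, name) using the standard URL parser.

    Keys such as "openasset.io/name" have no scheme, so "//" is prepended
    to make the parser treat the first segment as the host.
    """
    parts = urlsplit("//" + key)
    return parts.hostname, parts.path.lstrip("/")


def is_openasset_key(key):
    """True when the key's host is openasset.io and a name follows it."""
    host, name = parse_label_key(key)
    return host == OPENASSET_HOST and name != ""


print(parse_label_key("openasset.io/framework-version"))
print(is_openasset_key("app.kubernetes.io/name"))
```

A vendor tool could use a check like this to pick out the labels it understands and ignore everything else on the asset.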

Specified keys and their purpose

openasset.io/name: "laddertruck"

The most straightforward label of all: what is the name of this asset?

openasset.io/language: "ruby"

What programming language is this asset/application written in?

openasset.io/language-version: "3.0.0"

What version of the programming language does this asset use?

openasset.io/framework: "rails"

If this asset uses a framework, which framework is it?

openasset.io/framework-version: "5.2.1"

What is the version of the framework this asset is using?

openasset.io/component: "web"

What component of your stack is this asset a part of?

openasset.io/deployed-by: "weave"

What deploys this asset to an environment?

openasset.io/revision: "3c12d41301a7eca481c8eda0564d79a935bafd27"

What is the revision for this asset? This can be a git commit, semver, etc.

openasset.io/tier: "4"

What service tier does this asset have?

An example with Kubernetes

Kubernetes is probably the best example of how this standard could be applied. The API allows filtering by key presence and value set.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pubsub
  namespace: laddertruck
  labels:
    openasset.io/name: "laddertruck"
    openasset.io/language: "ruby"
    openasset.io/language-version: "3.0.0"
    openasset.io/framework: "rails"
    openasset.io/framework-version: "5.2.1"
    openasset.io/component: "laddertruck"
    openasset.io/deployed-by: "weave"
    openasset.io/tier: "4"
    openasset.io/revision: "3c12d41301a7eca481c8eda0564d79a935bafd27"
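
To make "key presence and value set" concrete, here is a small sketch that builds a Kubernetes label selector string from both kinds of requirement. The build_selector helper is hypothetical; the selector syntax it emits (a bare key for presence, "key in (a, b)" for a value set, commas to AND requirements together) is the one Kubernetes documents for label selectors.

```python
def build_selector(present=(), value_sets=None):
    """Build a Kubernetes label selector string.

    present:    keys that must exist, whatever their value.
    value_sets: mapping of key -> allowed values (set-based "in" requirement).
    Requirements are comma-separated, which Kubernetes treats as a logical AND.
    """
    requirements = list(present)
    for key, values in (value_sets or {}).items():
        requirements.append(f"{key} in ({', '.join(values)})")
    return ",".join(requirements)


selector = build_selector(
    present=["openasset.io/revision"],
    value_sets={"openasset.io/language": ["ruby", "go"]},
)
print(selector)
# openasset.io/revision,openasset.io/language in (ruby, go)
```

The resulting string can be passed to kubectl's -l (--selector) flag, for example when listing deployments, to find everything written in Ruby or Go that also declares a revision.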

My vision for this labeling standard is knowing I can take the labels above and quickly search in my logs for openasset.io/revision: "3c12d41301a7eca481c8eda0564d79a935bafd27" and instantly see all records for that revision. This becomes especially powerful when combined with your infrastructure provider. If I have a database hosted on AWS RDS powering Laddertruck web, I can tag that database in AWS with openasset.io/component: "laddertruck".
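A sketch of that idea, assuming you could pull labeled records from each system into one place. The source names and record shapes are invented for illustration; the label keys and values are the ones from the example above.

```python
REVISION = "3c12d41301a7eca481c8eda0564d79a935bafd27"

# Hypothetical records gathered from three different systems.
RECORDS = [
    {
        "source": "kubernetes",
        "labels": {
            "openasset.io/component": "laddertruck",
            "openasset.io/revision": REVISION,
        },
    },
    {
        "source": "aws-rds",
        "labels": {"openasset.io/component": "laddertruck"},
    },
    {
        "source": "logs",
        "labels": {"openasset.io/revision": REVISION},
    },
]


def sources_with_label(records, key, value):
    """Return the source of every record carrying exactly this key/value."""
    return [r["source"] for r in records if r["labels"].get(key) == value]


by_revision = sources_with_label(RECORDS, "openasset.io/revision", REVISION)
by_component = sources_with_label(RECORDS, "openasset.io/component", "laddertruck")
print(by_revision)   # ['kubernetes', 'logs']
print(by_component)  # ['kubernetes', 'aws-rds']
```

The same one-line query works against every system because the key is spelled the same way everywhere, which is the whole point of the standard.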

What does this unlock?

Discoverability

The most apparent advancement teams gain by utilizing a standardized labeling scheme is discoverability. Logs, metrics, deploys, etc., are all easily found since there's no variance in the labels used. Knowing what assets are running and what they do helps break down village knowledge barriers.

Billing insights

When you tag your infrastructure with standard labels, it makes billing insights and management easier to understand and maintain. Most infrastructure providers (such as GCP and AWS) allow getting insights on asset spend filtered by labels.
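As an illustration of the kind of rollup this enables, here is a sketch that totals spend per label value. The line items and their costs are made-up numbers, not real billing data, and spend_by_label is a hypothetical helper rather than any provider's API.

```python
from collections import defaultdict

# Hypothetical billing line items; costs are invented for illustration.
LINE_ITEMS = [
    {"cost": 120.0, "labels": {"openasset.io/name": "laddertruck",
                               "openasset.io/component": "web"}},
    {"cost": 80.0, "labels": {"openasset.io/name": "laddertruck",
                              "openasset.io/component": "database"}},
    {"cost": 45.5, "labels": {}},
]


def spend_by_label(line_items, key):
    """Sum cost per value of one label key; unlabeled items get their own bucket."""
    totals = defaultdict(float)
    for item in line_items:
        totals[item["labels"].get(key, "(unlabeled)")] += item["cost"]
    return dict(totals)


totals = spend_by_label(LINE_ITEMS, "openasset.io/name")
print(totals)  # {'laddertruck': 200.0, '(unlabeled)': 45.5}
```

The "(unlabeled)" bucket is worth keeping: it shows how much spend you still can't attribute to anything.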

Access management

Using an open labeling standard allows building tools that enforce rules about access, such as SSH and resource creation.

Incident management

By labeling assets with the keys/values proposed, a team's incident management process becomes even more flexible. Defining tiers, components, and revisions gives responding engineers more context about the degraded service.

Why do I care

I've built several internal tools in my career. I worked on the internal inventory management system at DigitalOcean (named Atlantis) for a bit. I helped build service deployment and discovery at Namely, and now I'm making FireHydrant, which offers service catalogs. After years of watching CNCF technologies explode, everyone move to the cloud, and orchestrators like Kubernetes take over, it has become painfully evident that our industry needs this standard. We currently live in an unorganized hell of infrastructure and services. It's time we get organized.
