Tainting and Labeling Kubernetes Nodes to Run Special Workloads: A quick guide that is finally NOT confusing

All right folks, I intend to keep this one short and that’s what I will do. Tainting and labeling nodes is supposed to be easy, but the official documentation (1, 2) makes it unnecessarily confusing. So I think maybe I can help fill in the gap.

I will use one of our business requirements at Buffer as the example for this post.

Quick recap

So, we need a few nodes that are dedicated to running cronjobs, and nothing else. At the same time, we want to make sure the cronjobs are scheduled to these nodes, and nowhere else. This means we need 2 things:

  • Tainted nodes that don’t take other workloads
  • A workload that only goes to the destination nodes

Now, let’s start with the nodes, then move on to the workload.

Nodes

Since the requirement is broken down into 2 aspects (see above), there are 2 things we will need to specify for nodes. As always, kops is my weapon of choice.

In kops, you can do this by editing the instance group:

kops edit ig <INSTANCE GROUP IN INTEREST>

Tainting nodes

This prevents other workloads from being scheduled to them. It’s achieved by 2 lines in the instance group spec: a taints key followed by the taint itself.
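As a minimal sketch, those 2 lines could look like this inside the instance group. The taint is assumed to be the same one used in the kubectl command later in this post; the spec: key is shown only for context and the rest of the instance group is left out.

```yaml
# Fragment of a kops InstanceGroup, as opened by `kops edit ig`.
# Only the taints key and its single entry are the 2 lines in question.
spec:
  taints:
  - dedicated=frequent-cronjob-nodes:NoSchedule   # key=value:effect
```

After saving, the change usually only reaches the nodes once you run kops update cluster --yes followed by a rolling update of the instance group.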

Labeling nodes

This helps a specialized workload locate the nodes. It’s achieved by 2 lines in the instance group spec: a nodeLabels key followed by the label itself.
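A sketch of those 2 lines, assuming the same label key and value as the kubectl label command later in this post. As before, spec: is only there for context.

```yaml
# Fragment of a kops InstanceGroup, as opened by `kops edit ig`.
# nodeLabels is a map of label key to label value.
spec:
  nodeLabels:
    kops.k8s.io/instancegroup: frequent-cronjob-node
```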

I know there are people out there who don’t use kops. If you are one of them, here are 2 commands to help:

kubectl taint nodes <NODE IN INTEREST> dedicated=frequent-cronjob-nodes:NoSchedule

kubectl label nodes <NODE IN INTEREST> kops.k8s.io/instancegroup=frequent-cronjob-node

Workload

As with the nodes, we will need to do 2 things to the deployment/cronjob YAML file. I’m including a complete YAML to save our eyes from this (yeah, you know what I’m talking about).
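Here is a rough sketch of what a complete cronjob manifest with both pieces might look like. The name, schedule, image and command are hypothetical placeholders, and batch/v1 assumes a reasonably recent cluster (older ones use batch/v1beta1). The taint and label values are taken from the kubectl commands above.

```yaml
apiVersion: batch/v1              # assumption: Kubernetes 1.21 or newer
kind: CronJob
metadata:
  name: frequent-cronjob          # hypothetical name
spec:
  schedule: "*/5 * * * *"         # hypothetical schedule: every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: job             # hypothetical container
            image: busybox
            command: ["sh", "-c", "date"]
          # 1. Tolerate the taint so the pod is allowed on the dedicated nodes
          tolerations:
          - key: "dedicated"
            operator: "Equal"
            value: "frequent-cronjob-nodes"
            effect: "NoSchedule"
          # 2. Select the label so the pod goes to those nodes and nowhere else
          nodeSelector:
            kops.k8s.io/instancegroup: frequent-cronjob-node
```

Both blocks belong to the pod spec, which is why they sit at the same level as containers.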

Tolerating taints

This makes sure the workload can be scheduled to the tainted nodes. It’s achieved by a tolerations block in the pod spec that matches the taint’s key, value and effect.
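A sketch of such a block, assuming the dedicated=frequent-cronjob-nodes:NoSchedule taint from the nodes section:

```yaml
# Pod spec fragment. In a cronjob it sits under
# spec.jobTemplate.spec.template.spec; in a deployment, under spec.template.spec.
tolerations:
- key: "dedicated"                  # left side of the taint
  operator: "Equal"                 # key and value must both match
  value: "frequent-cronjob-nodes"   # right side of the taint
  effect: "NoSchedule"              # same effect as the taint
```

Keep in mind that a toleration only allows the pod onto the tainted nodes. It does not pull the pod there, which is why the node selection step is still needed.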

Specifying destination nodes

This makes sure the workload is only scheduled to the specified nodes. It’s achieved by 2 lines in the pod spec: a nodeSelector key followed by the node label.
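A sketch of those 2 lines, assuming the label applied to the nodes earlier in this post:

```yaml
# Pod spec fragment, at the same level as containers and tolerations.
# The pod is only scheduled to nodes that carry this exact label.
nodeSelector:
  kops.k8s.io/instancegroup: frequent-cronjob-node
```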

Profit

This is it. We can now rest assured that the right workloads will go to the right nodes. In this way, we can start building specialized node groups for specialized workloads, say GPU nodes for machine learning or memory-intensive nodes for local caching.

I hope this helps in any way. Until next time, please feel free to hit me up on Twitter should you have any questions. 😀
