Azure Kubernetes (AKS) Security Best Practices Part 1 of 4: Designing Secure Clusters and Container Images

Microsoft’s Azure Kubernetes Service (AKS), launched in June 2018, has become one of the most popular managed Kubernetes services. As with any infrastructure platform or Kubernetes service, though, Azure customers have to make important decisions and formulate a plan for creating and maintaining secure AKS clusters. While many of these requirements and responsibilities apply to all Kubernetes clusters, regardless of where they are hosted, AKS also has some specific requirements that users must consider and act on to safeguard their AKS clusters, and the workloads their organizations run on them, from breaches and other malicious attacks.

This post launches a four-part series going over best practices for creating and operating secure AKS clusters and the containerized applications that run on those clusters. In this installment, we will focus on what you need to know when planning and creating your AKS clusters. We will also cover another fundamental piece of secure AKS workloads: building and using secure container images.

Cluster Design

Securing AKS clusters requires a security-minded design. Understanding the fundamentals of Kubernetes security, along with the AKS-specific security options, before creating clusters will make those clusters easier to secure and manage.

Some critical AKS security features can only be enabled at cluster creation time. If you have existing clusters that were created without those features, it is highly advisable to create new clusters with them enabled and migrate your existing workloads.

Using consistent configurations across all clusters will also make them easier to manage via automation and prevent issues stemming from an incorrect assumption that all clusters have the same protections. In particular, clusters used for different application lifecycles, like development and staging, should have the same security settings as production. Identical environments allow for testing the security posture of a cluster and its workload before promotion to a production environment. This practice also helps to ensure applications running on clusters with those settings will still function as expected by the time they are deployed to production.
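To make "identical environments" something a pipeline can verify, each environment's security settings can be diffed against production. The sketch below assumes a simplified, hypothetical per-cluster settings dictionary; the keys `rbac`, `network_policy`, and `private_cluster` are illustrative names, not an AKS API. In practice those values might be pulled from `az aks show` output or from your infrastructure-as-code templates.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified security settings per environment.
CLUSTERS: Dict[str, Dict[str, object]] = {
    "production": {"rbac": True, "network_policy": "calico", "private_cluster": True},
    "staging": {"rbac": True, "network_policy": "calico", "private_cluster": True},
    "development": {"rbac": True, "network_policy": None, "private_cluster": False},
}


def find_drift(
    clusters: Dict[str, Dict[str, object]], baseline: str = "production"
) -> List[Tuple[str, str, object, object]]:
    """Return (cluster, setting, expected, actual) for every setting that
    differs from the baseline cluster."""
    expected = clusters[baseline]
    drift = []
    for name, settings in clusters.items():
        if name == baseline:
            continue
        # Settings missing from a cluster are reported with actual=None.
        for key, want in expected.items():
            got = settings.get(key)
            if got != want:
                drift.append((name, key, want, got))
    return drift


if __name__ == "__main__":
    for cluster, setting, want, got in find_drift(CLUSTERS):
        print(f"{cluster}: {setting} is {got!r}, production uses {want!r}")
```

Failing a promotion step whenever `find_drift` returns a non-empty list keeps lower environments from silently diverging from production.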

Enable Kubernetes RBAC

Why: Kubernetes Role-Based Access Control (RBAC) provides the preferred method for controlling authorization to a cluster’s Kubernetes API, both for users and for workloads in the cluster. AKS adds the option of integrating Kubernetes RBAC with Azure Active Directory (AAD), which requires a cluster with Kubernetes RBAC enabled.

What to do: RBAC currently can only be enabled at AKS cluster creation time. It is enabled by default for new clusters.
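One way to confirm that existing clusters were created with RBAC is to inspect the cluster description returned by the Azure CLI. This minimal sketch assumes the JSON from `az aks show --resource-group <rg> --name <name> --output json` contains an `enableRbac` boolean and an `aadProfile` object that is null when Azure AD integration is off; verify those field names against the CLI version you use.

```python
import json


# Assumption: field names follow `az aks show -o json` output
# ("enableRbac", "aadProfile"); adjust them if your CLI version differs.
def rbac_status(cluster_json: str) -> dict:
    """Summarize RBAC and AAD settings from `az aks show` JSON."""
    cluster = json.loads(cluster_json)
    return {
        # A missing field is treated as "not enabled" so the check fails safe.
        "rbac_enabled": bool(cluster.get("enableRbac", False)),
        # aadProfile is assumed to be null or absent when AAD is not integrated.
        "aad_integrated": cluster.get("aadProfile") is not None,
    }
```

An audit job could pipe the CLI output into a script that calls `rbac_status(sys.stdin.read())` and exits non-zero when `rbac_enabled` is false, flagging clusters that need to be rebuilt.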

Enable Network Policies

Why: Kubernetes Network Policies provide firewall controls to segment network traffic to and from workloads in a cluster.

What to do: Network Policy can only be enabled at AKS cluster creation time. Your options depend on your cluster’s network type.

Clusters using basic networking, with the kubenet plugin for the cluster network, support using the Calico CNI (Container Network Interface) to implement network policies.
For clusters with advanced networking, which use the Azure CNI, users can choose between Calico and Azure Network Policy.
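Because a wrong plugin and policy combination cannot be corrected after the cluster exists, the options above can be encoded as a pre-flight check in cluster-creation scripts. This sketch uses the strings `kubenet`, `azure`, and `calico`, which match values accepted by the `--network-plugin` and `--network-policy` options of `az aks create`; the table only reflects the options described in this post and may fall behind as AKS evolves.

```python
# Network policy providers available per network plugin, as described above.
SUPPORTED_POLICIES = {
    "kubenet": {"calico"},
    "azure": {"calico", "azure"},
}


def validate_network_config(network_plugin: str, network_policy: str) -> None:
    """Raise ValueError if the plugin/policy pair is not supported."""
    allowed = SUPPORTED_POLICIES.get(network_plugin)
    if allowed is None:
        raise ValueError(f"unknown network plugin: {network_plugin!r}")
    if network_policy not in allowed:
        raise ValueError(
            f"network policy {network_policy!r} is not available with "
            f"{network_plugin!r}; choose one of {sorted(allowed)}"
        )
```

Running this before invoking `az aks create` turns a decision that is permanent for the cluster's lifetime into an early, explicit error.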

Both implementations enforce the standard Kubernetes NetworkPolicy API, so you can use the same network policy manifests with either provider.
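As an example of a provider-neutral policy, a common first step is a default-deny rule for inbound traffic in each application namespace. The sketch below builds that manifest using only the core `networking.k8s.io/v1` API, so it should be enforced the same way by either provider; the namespace `my-app` is a placeholder. Because `kubectl` accepts JSON as well as YAML, the printed output can be piped to `kubectl apply -f -`.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """Build a NetworkPolicy that blocks all inbound traffic to every pod
    in `namespace` unless another policy explicitly allows it."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            # An empty podSelector selects every pod in the namespace.
            "podSelector": {},
            # Declaring Ingress with no ingress rules denies all inbound traffic.
            "policyTypes": ["Ingress"],
        },
    }


if __name__ == "__main__":
    print(json.dumps(default_deny_ingress("my-app"), indent=2))
```

Additional policies that allow specific traffic can then be layered on top, since Kubernetes network policies are additive.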

Create Private Clusters

This feature is currently in preview and therefore may not be suitable for production use until it reaches general availability.

Why: Private AKS clusters have all their control plane components, including the cluster’s Kubernetes API service, in a private RFC1918 network space. This limits access and keeps all traffic within Azure’s networks. Access to the API can then be locked down by the user to specific VNets. Without this feature, the cluster’s API has a public IPv4 address and all traffic to it, including from the cluster’s node pools, goes over public network space.

Note that private clusters do have some limitations. For example, connections to the cluster API that do not originate from the cluster’s node pools must go through an Azure VM, such as a bastion host, in a VNet that can reach the API. Azure also charges for the Private Endpoint resource needed to make the Kubernetes API available to VNets.

What to do: The private cluster option can only be selected at AKS cluster creation time.
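Since the point of a private cluster is that its API server sits in RFC 1918 space, a quick post-creation sanity check is to resolve the API server's FQDN from a VM in a connected VNet (for example with Python's `socket.gethostbyname`) and confirm the result is a private address. This minimal sketch covers only the address check. It lists the three RFC 1918 blocks explicitly because the `ipaddress` module's `is_private` attribute also accepts other ranges, such as loopback.

```python
import ipaddress

# The three private IPv4 blocks defined by RFC 1918.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if `address` is an IPv4 address inside RFC 1918 space."""
    ip = ipaddress.ip_address(address)
    # IPv6 addresses are never part of RFC 1918 space.
    if ip.version != 4:
        return False
    return any(ip in net for net in RFC1918_NETWORKS)
```

A public result for the API server address would suggest the cluster was not created as a private cluster, or that DNS is resolving the public endpoint.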

Container Images

Kubernetes workloads are built around container images, so ensuring that those images are difficult to exploit and free of known security vulnerabilities should be a cornerstone of your AKS security strategy.

Build Secure Images

Why: Following a few best practices for building secure container images will minimize the exploitability of running containers and simplify both security updates and scanning. Images that contain only the files required for the application’s runtime and which do not include frequently exploited or vulnerable tools like Linux package managers, web or other network clients, or Unix shells make it much more difficult for malicious attacks to compromise or exploit the containers in your cluster.

What to do:

Use a minimal, secure base image. Google’s “Distroless” images are a good choice because they do not install OS package managers or shells.
Only install tools needed for the container’s application. Debugging tools should be omitted from production containers.
If a long-running application only needs network tools like curl at pod start time, consider using a separate init container, or delivering the data through a more Kubernetes-native method such as a ConfigMap, instead of putting those exploitable tools in the application image.
If you do need to install additional OS packages, remove the package manager from the image in a later build step.
Keep your images up-to-date. This practice includes watching for new versions of both the base image and any third-party tools you install.

Use a Vulnerability Scanner

Why: Using containers free of known software security vulnerabilities requires ongoing vigilance. All the images deployed to a cluster should be scanned regularly by a scanner that keeps an up-to-date database of CVEs (Common Vulnerabilities and Exposures).

What to do: Use an image scanner. Azure has a scanner service available in preview mode, or you can choose your own paid or open source scanner. Be sure to choose a solution that scans for vulnerabilities in OS packages and in third-party runtime libraries for the programming languages your software uses. For example, the Apache Struts vulnerability exploited in the Equifax hack would not have been detected by an OS package scan alone, because Struts is a set of Java libraries, and most Linux distributions no longer manage software library dependencies that are not required by system services.

To address CVEs found in your internally maintained images, your organization should have a policy for updating and replacing already-deployed images that are known to have serious, fixable vulnerabilities. Image scanning should be part of your CI/CD pipeline, and images with high-severity, fixable CVEs should generate an alert and fail the build.
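The build-failing policy above boils down to a simple filter over the scanner's report. Every scanner has its own output schema, so this sketch assumes a hypothetical, simplified list of findings with `id`, `severity`, and `fixed_version` fields; the `HIGH`/`CRITICAL` threshold is likewise an example choice rather than a standard.

```python
import sys
from typing import Iterable, List, Mapping

# Example threshold: severities that should block a build when fixable.
BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_findings(findings: Iterable[Mapping]) -> List[str]:
    """Return CVE IDs that are high severity AND have a fix available.
    The field names (id, severity, fixed_version) are hypothetical."""
    return [
        f["id"]
        for f in findings
        if str(f.get("severity", "")).upper() in BLOCKING_SEVERITIES
        and f.get("fixed_version")  # only CVEs with an available fix block the build
    ]


def gate(findings: Iterable[Mapping]) -> int:
    """Print an alert per blocking CVE and return a process exit status."""
    blocked = blocking_findings(findings)
    for cve in blocked:
        print(f"ALERT: fixable high-severity vulnerability {cve}", file=sys.stderr)
    # A non-zero exit status fails the pipeline step.
    return 1 if blocked else 0
```

A CI step would parse the scanner report into this shape and call `sys.exit(gate(findings))`, so any fixable high-severity CVE stops the image from being promoted.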

If you also deploy third-party container images in your cluster, scan those as well. If those images have serious fixable vulnerabilities that do not seem to get addressed by the maintainer, you should consider creating your own images for those tools.

Future installments in this series will discuss best practices for AKS application workloads and proper care and feeding of your AKS clusters. In part two, we will cover networking safeguards to protect your AKS cluster’s nodes and infrastructure.