Azure Kubernetes Service (AKS) Security Best Practices, Part 2 of 4: Networking

In part one of this series on Azure Kubernetes Service (AKS) security best practices, we covered how to plan and create AKS clusters to enable crucial Kubernetes security features like RBAC and network policies. We also discussed best practices for creating secure images to deploy to your AKS cluster and the need for performing regular vulnerability scans on those images.

This post covers the networking infrastructure of AKS clusters and how to lock those networks down to protect against both external attacks and misconfigured workloads inside the cluster.

Networking

Zero-trust principles have gained wide acceptance as a security best practice, especially for orchestrators like Kubernetes that can host multi-tenant workloads on shared infrastructure. The practice involves using tight access controls, following the principle of least privilege when granting access, encrypting all in-flight data, and requiring other services to prove their identity instead of blindly accepting connections just because they originated from the same shared network. Adopting a strict zero-trust environment requires a great deal of planning and management of the infrastructure and applications, but even achieving a subset of those goals can greatly benefit the security of your AKS clusters and their workloads.

Limit Node SSH Access

Why: By default, the SSH port on the nodes is open to all pods running in the cluster. Preventing direct SSH access from the pod network to the nodes helps limit the blast radius if a container in a pod becomes compromised. Azure's recommended method of getting SSH access to nodes, via a jump pod deployed in the AKS cluster, relies on allowing SSH access from the pod network to the nodes. To avoid that exposure, you can create and use a bastion VM instead.

What to do: Find the Network Security Group(s) for your AKS subnet(s). Add an inbound security rule with a low, unused priority number (NSG rules with lower numbers are evaluated first) and the following values:

Source address prefixes: The CIDR block(s) of the AKS subnet(s) in the VNet
Destination port ranges: 22
Destination address prefixes: VirtualNetwork
Protocol: TCP
Access: Deny

Note that Azure does not officially support using network security group rules to limit a subnet’s internal traffic for AKS (see the note in the introduction section of the linked page), but the above rule still works as expected.

If network policy is enabled in your cluster, you can also block pod access to the nodes' SSH ports with a Kubernetes Network Policy. Combining both methods provides the best protection. However, the Kubernetes Network Policy API does not support cluster-wide egress policies. Network policies are namespace-scoped, so you must make sure a policy is added to every namespace, and that requires ongoing vigilance. Clusters that use Calico for network policy have another option. Calico supports additional resource types beyond the standard Kubernetes network policy types, including GlobalNetworkPolicy, which applies to the entire cluster.
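As a sketch of the Calico option, the GlobalNetworkPolicy below denies TCP traffic from every pod to port 22 on the node subnet. It assumes that Calico network policy is enabled and that 10.240.0.0/16 stands in for your AKS subnet CIDR (a hypothetical value). It also assumes you apply it with calicoctl, or with kubectl if the Calico API server is installed. Calico evaluates policies with lower order values first and places Kubernetes NetworkPolicies at order 1000. The trailing Allow rule keeps all other pod egress working, but Allow is a final decision, so it also overrides namespaced egress NetworkPolicies such as the metadata policy later in this post. Clusters that rely on those need a different structure, for example Calico tiers where your Calico version supports them.

```yaml
# Sketch: cluster-wide Calico policy blocking pod SSH to the nodes.
# Assumption: 10.240.0.0/16 is a placeholder for your AKS subnet CIDR.
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-pod-ssh-to-nodes
spec:
  # Lower order values are evaluated before higher ones.
  order: 100
  # Selects every endpoint Calico manages, including all pods.
  selector: all()
  types:
  - Egress
  egress:
  # Drop pod traffic to TCP port 22 anywhere in the node subnet.
  - action: Deny
    protocol: TCP
    destination:
      nets:
      - 10.240.0.0/16
      ports:
      - 22
  # Allow everything else. This is a final decision and takes
  # precedence over namespaced NetworkPolicies (order 1000).
  - action: Allow
```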

Limit Network Access to the Kubernetes API

Why: By default in AKS, the Kubernetes API server for each cluster has a public IP address and no firewall restrictions limiting access. Restricting the API endpoint to only the IP addresses that absolutely require access reduces the risk from unpatched or future vulnerabilities in the Kubernetes API server, exploitation of stolen credentials, and DDoS attacks.

What to do: If you aren't using a private AKS cluster (one with no public endpoint for the Kubernetes API), you can follow these instructions to restrict which public IP addresses can access an AKS cluster's Kubernetes API.

Block Pod Access to VM Instance Metadata

Why: When accessed from an Azure VM, the Azure VM instance metadata endpoint returns a great deal of information about the VM's configuration. Depending on the VM and cluster configuration, that information can include Azure Active Directory tokens. By default, any container running on an AKS node can reach this endpoint. Most workloads do not need this information, and giving them access to it carries substantial risk.

What to do: Add a network policy in all user namespaces to block pod egress to the metadata endpoint.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-instance-metadata
spec:
  # An empty podSelector selects every pod in the namespace.
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0    # Preferably something smaller here
        except:
        - 169.254.169.254/32    # Azure instance metadata endpoint

Or, if you are already using network policy allow lists to control pod egress traffic, be sure to exclude 169.254.169.254/32 from the allowed IP blocks.

If some workloads absolutely need access to the VM metadata, you can either add exceptions with label-scoped policies or use a service mesh that supports path-based restrictions on an endpoint.
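Kubernetes network policies are additive: a connection is allowed if any policy selecting the pod allows it. So one way to grant a label-scoped exception is a second policy in the same namespace, alongside deny-instance-metadata, that opens the metadata endpoint only to opted-in pods. This is a minimal sketch. The label metadata-access: allowed is hypothetical, and it assumes the workload reaches the metadata service over plain HTTP on port 80.

```yaml
# Sketch: opt-in exception to the deny-instance-metadata policy.
# Assumption: "metadata-access: allowed" is a hypothetical label.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-instance-metadata
spec:
  # Only pods carrying the opt-in label are selected.
  podSelector:
    matchLabels:
      metadata-access: allowed
  policyTypes:
  - Egress
  egress:
  # Allow HTTP to the instance metadata endpoint for those pods only.
  - to:
    - ipBlock:
        cidr: 169.254.169.254/32
    ports:
    - protocol: TCP
      port: 80
```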

Restrict Cluster Egress Traffic

Why: Restricting egress network traffic to known, necessary external endpoints limits what an attacker can do from a compromised workload in your cluster, such as exfiltrating data or downloading additional tools.

What to do: AKS provides several options for controlling cluster egress traffic. They can be used separately or together for better protection.

Use Kubernetes network policies to limit pod egress endpoints. Policies need to be created for every namespace or workload.
Use the Calico Network Policy option in AKS, which extends Kubernetes Network Policy with additional resource types, including the non-namespaced GlobalNetworkPolicy.
Use Azure Firewall to control cluster egress from the VNet. If you use this method, identify the external endpoints that the cluster's nodes (not necessarily the workload pods) need to reach to function properly, and add firewall exceptions for them.
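To illustrate the first option, the namespaced policy below turns pod egress into an allow list. Pods in the namespace may only make DNS lookups to the cluster's CoreDNS pods and HTTPS connections to a single external range. Several values here are assumptions to verify in your own cluster. The namespace my-app is hypothetical, and 203.0.113.0/24 is a documentation range standing in for a real dependency. AKS's CoreDNS pods are assumed to carry the label k8s-app: kube-dns. The kubernetes.io/metadata.name namespace label is set automatically on Kubernetes 1.21 and later. Because 169.254.169.254 is not on the list, this policy also blocks the metadata endpoint.

```yaml
# Sketch: per-namespace egress allow list.
# Assumptions: namespace "my-app" and 203.0.113.0/24 are placeholders.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-egress
  namespace: my-app
spec:
  # Apply to every pod in the namespace; unlisted egress is denied.
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  # Allow DNS lookups to CoreDNS in kube-system.
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  # Allow HTTPS to one known external range.
  - to:
    - ipBlock:
        cidr: 203.0.113.0/24
    ports:
    - protocol: TCP
      port: 443
```

Placing namespaceSelector and podSelector in the same list item requires both to match, which limits DNS traffic to the CoreDNS pods rather than to all of kube-system.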

Disable HTTP Application Routing

Why: The AKS HTTP Application Routing add-on is an ingress controller that creates public DNS entries for Kubernetes Services deployed in a cluster and exposes them on the cluster load balancer. While this feature can simplify developing and testing applications, it lacks options for securing the exposed services, such as the ability to secure HTTP services with TLS. It should never be enabled in a production cluster, and even in development clusters it can pose a risk by exposing untested, unsecured applications to the Internet.

What to do: Do not enable the HTTP Application Routing add-on for any cluster. If it has already been enabled in a cluster, remove it.

Deploy a Service Mesh

Why: Service meshes have emerged in the past few years to address the networking complexities of microservices. Different service mesh offerings have different feature sets, but most of them offer advanced traffic management, observability, and critical zero-trust network support by providing seamless pod-to-pod encryption, authentication, and authorization.

What to do: Evaluate which service mesh offering best meets your organization’s needs. Istio, Linkerd, and Consul can all be deployed to AKS clusters. Service meshes can be quite powerful, but they can also require a great deal of configuration to secure workloads properly.
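As one example of what that configuration looks like, assuming a default Istio installation with sidecar-injected workloads, the sketch below shows two resources. The first is a mesh-wide PeerAuthentication in the root namespace (istio-system by default) that requires mutual TLS for pod-to-pod traffic. The second is an AuthorizationPolicy that admits only one caller to a workload. The backend and frontend namespaces, the app: backend label, and the frontend service account are hypothetical. Check which security.istio.io API version your Istio release serves.

```yaml
# Sketch: require mTLS mesh-wide (applies in Istio's root namespace).
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT
---
# Sketch: only the hypothetical "frontend" service account may call
# pods labeled app: backend in the hypothetical "backend" namespace.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-frontend-only
  namespace: backend
spec:
  selector:
    matchLabels:
      app: backend
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/frontend/sa/frontend"]
```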

Coming up in part three of this series, we will cover the options available in AKS and Kubernetes for enforcing security best practices for your container runtimes.