Whitelist Ingress Access into AKS Clusters

If you read my last post on setting up ExternalDNS and CertManager on AKS you may have noticed that ingress to published services is open to Internet traffic. In this post I’ll look at two simple ways to lock down services so that only specific addresses have ingress access. First we’ll cover how to use the network security group on the agent pool subnet to limit access, and then we’ll take a look at some features of NGINX that offer a bit more flexibility in who has access to what. We’ll also look at some changes we’ll need to make to cert-manager once ingress is locked down.

Problem

In my last post I set up cert-manager and external-dns services on AKS so that DNS records and SSL certificates were managed automatically for services deployed to AKS. However, in that configuration ingress was open to the Internet. Anyone who stumbles onto your service could attempt logins or, if the service has no authentication, simply use it! What we’d like to do is limit access to our services based on source network CIDR.

Solution

Agent Pool Subnet NSG

The first thing to note about limiting access to your cluster is that AKS creates an NSG for the agent pool subnet when the cluster is created. When you install NGINX, its LoadBalancer Service causes inbound port rules to be added to this NSG, by default for ports 80 and 443. If you modify the NSG rules manually, your changes will be overwritten every time the security group is updated for the Service. However, we can use the loadBalancerSourceRanges option of the Kubernetes Service that manages the LoadBalancer. This writes the source restrictions into the managed NSG rules themselves, so whenever the inbound rules are updated the source ranges are maintained and managed for us.

apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  ports:
  - port: 8765
    targetPort: 9376
  selector:
    app: example
  type: LoadBalancer
  loadBalancerSourceRanges:
  - 130.211.204.1/32
  - 130.211.204.2/32

Helm also has an option for setting this field on the Service.


>> CIDRS=130.211.204.1/32,130.211.204.2/32
>> helm upgrade nginx-ingress stable/nginx-ingress \
      --wait --namespace ingress --reuse-values \
      --set controller.service.loadBalancerSourceRanges="{$CIDRS}"

At this point if you view the inbound rules on the NSG attached to the agent pool subnet you’ll see that the Source is now set to the CIDRs you provided.
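
If you’d like to confirm this from the command line, something along these lines should do it with the Azure CLI (my-aks-rg and my-aks-cluster are placeholders for your own resource group and cluster name):

>> NODE_RG=$(az aks show -g my-aks-rg -n my-aks-cluster --query nodeResourceGroup -o tsv)
>> az network nsg list -g $NODE_RG -o table
>> az network nsg rule list -g $NODE_RG --nsg-name {NSG Name} -o table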

NGINX Ingress Options

Using the agent pool NSG is a great way to manage inbound ingress access, but it has the limitation of being inflexible. It locks down ingress to the specified CIDRs for the whole cluster, so if you want some services exposed to one set of IP ranges and other services exposed to a different set, you don’t have that kind of control. Enter NGINX.

NGINX has lots of configuration options available. The whitelist-source-range option sets the allowed source IPs for a particular Ingress. It can be set globally on the controller, individually on each Ingress, or both. The global setting acts as the default and individual Ingress objects can override it.

The global option can be set using a ConfigMap that overrides the default values for the NGINX controller. If we use Helm again to configure this option you’d have something like the following. Note that the commas in the CIDR list need to be escaped in the --set value so Helm doesn’t treat them as separators between values:


>> CIDRS='130.211.204.1/32\,130.211.204.2/32'
>> helm upgrade --install nginx-ingress stable/nginx-ingress \
  --wait --namespace ingress --reuse-values \
  --set controller.config.whitelist-source-range="$CIDRS" \
  --set controller.service.externalTrafficPolicy=Local

In order to use whitelist-source-range NGINX needs to see the external IP of incoming requests. Typically NGINX would only see the NAT’d cluster-internal address, but because we want to limit access by external source IP we need the request’s original source IP to be preserved. This is done by setting controller.service.externalTrafficPolicy=Local on the NGINX controller, as in the command above.
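
You can check that the policy took effect on the controller Service; the Service name below assumes the default naming from the stable/nginx-ingress chart with the release name used above, so adjust it if yours differs. The command should print Local.

>> kubectl get svc nginx-ingress-controller -n ingress \
      -o jsonpath='{.spec.externalTrafficPolicy}'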

Again, this sets the global configuration of NGINX, so every Ingress object will inherit this whitelist unless it is overridden at the Ingress level. The override is done by setting an annotation on each Ingress object that you want to expose to a different set of IPs. For instance, if we have a service we want to open to all traffic we can use an Ingress object similar to the following to override the global whitelist.

---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: nginx
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/whitelist-source-range: "0.0.0.0/0"
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  tls:
  - hosts:
    - nginx.example.com
    secretName: tls-secret
  rules:
  - host: nginx.example.com
    http:
      paths:
      - backend:
          serviceName: nginx-svc
          servicePort: 80
        path: /
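
As a quick sanity check, hitting the host with curl from machines inside and outside an Ingress’s effective whitelist should show the difference: NGINX returns a 403 Forbidden for requests from addresses that aren’t whitelisted, while allowed addresses reach the backend. The hostname here is just the sample one from the Ingress above.

>> curl -I https://nginx.example.com/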

Let’s Encrypt Changes

NOTE: If you used the scripts in the last post to set up cert-manager and external-dns you’ll have to make some changes to cert-manager before locking down your services. This is because the last post used the HTTP-01 solver for Let’s Encrypt to verify domain ownership. If you limit ingress to your cluster, Let’s Encrypt will no longer be able to reach it to verify your domains over HTTP.

Let’s Encrypt supports different types of domain ownership verification, and one option that works well on Azure is DNS. With DNS validation, Let’s Encrypt asks for specific TXT records to be created that it can check before issuing SSL certificates. Since we used Azure DNS zones for our DNS entries in the last post, we’ll make a few modifications so that cert-manager can create those records.
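
If you’re curious what this looks like while a certificate is being issued, cert-manager creates a TXT record named _acme-challenge under the hostname being validated, which you can watch with dig (the hostname here is the sample one used earlier):

>> dig +short TXT _acme-challenge.nginx.example.com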

To enable this we’ll need to do two things. First, we’ll create a Kubernetes secret containing the password of a service principal that cert-manager can use to modify DNS records. Second, we’ll change the ClusterIssuer from the HTTP-01 solver to the DNS-01 solver. The following script fragments can be used with the details of a service principal. The service principal will need Contributor access to the DNS zone so that cert-manager can make the appropriate changes; if you use the same service principal we created last time that will work just fine.
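
If you still need to grant that access, a role assignment scoped to just the DNS zone is enough. This is a sketch using the same variables defined in the fragment below, so set DNS_RG, DOMAIN_NAME and SP_APPID first:

>> ZONE_ID=$(az network dns zone show -g $DNS_RG -n $DOMAIN_NAME --query id -o tsv)
>> az role assignment create --assignee $SP_APPID --role Contributor --scope $ZONE_ID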

SP_APPID={Service Principal AppID}
SP_PASSWORD={Service Principal Password}
SP_TENANTID={Service Principal Tenant}
SUBSCRIPTION_ID={Subscription Id}
AZURE_CLOUD={Which Azure Cloud You are Using}
DOMAIN_NAME={Domain that will be managed}
DNS_RG={Resource Group for DNS Zone}
EMAIL={Let's Encrypt Email Address}

>> helm upgrade --install cert-manager \
    --wait --namespace cert-manager \
    --version v0.13.0 \
    jetstack/cert-manager

>> kubectl create secret generic azuredns-config \
    --from-literal=client-secret=$SP_PASSWORD \
    -n cert-manager 

# Create Prod LetsEncrypt ClusterIssuer
>> cat <<-EOF | kubectl apply -f -
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: $EMAIL 
    privateKeySecretRef:
      name: letsencrypt
    solvers:
    - dns01:
        azuredns:
          clientID: $SP_APPID 
          clientSecretSecretRef:
            name: azuredns-config
            key: client-secret
          subscriptionID: $SUBSCRIPTION_ID 
          tenantID: $SP_TENANTID 
          resourceGroupName: $DNS_RG 
          hostedZoneName: $DOMAIN_NAME
          environment: $AZURE_CLOUD 
EOF

At this point you should have cert-manager configured to create the DNS records Let’s Encrypt needs to verify domain ownership. Once that is completed and working you can lock down ingress into your Kubernetes cluster without breaking certificate issuance.
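
To confirm the new issuer is working before you tighten things up, a couple of quick checks with kubectl should be enough; the issuer name is the one created above, and certificates should move to Ready once their DNS-01 challenges complete.

>> kubectl describe clusterissuer letsencrypt
>> kubectl get certificates --all-namespaces
>> kubectl describe challenges --all-namespaces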