
Using Istio with Kubernetes native sidecars on Azure Kubernetes Service


In my previous blog post, I showed you how to check for specific feature gates on an Azure Kubernetes Service cluster.

-> https://www.danielstechblog.io/show-enabled-feature-gates-on-an-azure-kubernetes-service-cluster/

This applies especially to the SidecarContainers feature gate, which is enabled on Azure Kubernetes Service clusters running Kubernetes version 1.29 or higher.

The SidecarContainers feature gate brings support for running sidecar containers as init containers. A service mesh proxy container, for instance, now starts before the main container, which solves a couple of issues with service mesh proxies in Kubernetes.

It was introduced in Kubernetes version 1.28 as an alpha version and graduated to beta with Kubernetes version 1.29.

-> https://kubernetes.io/blog/2023/08/25/native-sidecar-containers/
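
Under the hood, a native sidecar is simply an init container with restartPolicy: Always, which keeps it running alongside the main container for the pod's entire lifetime. A minimal sketch of what this looks like in a pod specification, with placeholder names and images:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-native-sidecar
spec:
  initContainers:
    # restartPolicy: Always turns this init container into a native sidecar
    - name: proxy-sidecar
      image: example.registry.io/proxy:latest
      restartPolicy: Always
  containers:
    - name: app
      image: example.registry.io/app:latest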

Today, I am walking you through how to use Istio with Kubernetes native sidecars on Azure Kubernetes Service.

As stated in the Istio blog post from 2023, an environment variable called ENABLE_NATIVE_SIDECARS needs to be set to true.

-> https://istio.io/latest/blog/2023/native-sidecars/

I use the IstioOperator custom resource definition to define my Istio installation configuration options in a YAML file.

The following configuration activates the Kubernetes native sidecar support in Istio.

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  namespace: istio-system
  name: istiocontrolplane
spec:
  components:
    ...
  meshConfig:
    ...
  values:
    global:
      ...
    pilot:
      env:
        PILOT_ENABLE_STATUS: true
        ENABLE_NATIVE_SIDECARS: true
    sidecarInjectorWebhook:
      rewriteAppHTTPProbe: true
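
The configuration can then be rolled out with istioctl, sketched here under the assumption that the file is saved as istio.yaml:

❯ istioctl install -f istio.yaml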

After applying the IstioOperator configuration, we check if the istio-proxy is now running as an init container. For that, I deployed a simple container application in its own namespace.

❯ kubectl images -c 1,2
[Summary]: 1 namespaces, 3 pods, 9 containers and 2 different images
+----------------------------+--------------------+
|            Pod             |     Container      |
+----------------------------+--------------------+
| go-webapp-64cc9779d4-8kp7m | go-webapp          |
+                            +--------------------+
|                            | (init) istio-init  |
+                            +--------------------+
|                            | (init) istio-proxy |
+----------------------------+--------------------+
| go-webapp-64cc9779d4-f4hrf | go-webapp          |
+                            +--------------------+
|                            | (init) istio-init  |
+                            +--------------------+
|                            | (init) istio-proxy |
+----------------------------+--------------------+
| go-webapp-64cc9779d4-mrbc9 | go-webapp          |
+                            +--------------------+
|                            | (init) istio-init  |
+                            +--------------------+
|                            | (init) istio-proxy |
+----------------------------+--------------------+

As seen in the above output, the istio-proxy is now running as a Kubernetes native sidecar.
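
If the kubectl-images plugin is not at hand, a plain kubectl query provides a similar view, shown here with one of the pod names from the output above:

❯ kubectl get pod go-webapp-64cc9779d4-8kp7m \
  -o jsonpath='{range .spec.initContainers[*]}{.name}{" -> restartPolicy: "}{.restartPolicy}{"\n"}{end}'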

You can find the full example IstioOperator configuration file on my GitHub repository.

-> https://github.com/neumanndaniel/kubernetes/blob/master/istio/istio-1.21.yaml


Egress traffic blocking with Cilium cluster-wide network policies on Azure Kubernetes Service


Today, we talk about how to block egress traffic with Cilium cluster-wide network policies on Azure Kubernetes Service. For this, we need an Azure Kubernetes Service cluster with Cilium installed via the bring-your-own CNI approach.

Azure CNI powered by Cilium unfortunately only partially supports Cilium network policies. Cilium cluster-wide network policies and Cilium CIDR groups are not officially supported, even though the required custom resource definitions are installed on the Azure Kubernetes Service cluster.

Can I use CiliumNetworkPolicy custom resources instead of Kubernetes NetworkPolicy resources?

CiliumNetworkPolicy custom resources are partially supported. Customers may use FQDN filtering as part of the Advanced Container Networking Services feature bundle.

-> https://learn.microsoft.com/en-us/azure/aks/azure-cni-powered-by-cilium?WT.mc_id=AZ-MVP-5000119#frequently-asked-questions

The Cilium version I used is RC.1 of the upcoming 1.17 release. Why not the latest stable version 1.16? I will explain that later in this blog post.

Egress traffic blocking is used to prevent network traffic to malicious network entities, to country-level network CIDR ranges (so-called geo-blocking), and the like.

Using Cilium cluster-wide network policies with Cilium CIDR groups is an easy way to achieve this without setting up additional infrastructure or services. As security, including network security, is a multi-layer approach, the solution presented today is only one building block, but a powerful one.

Configuration

The entire configuration in this example is kept simple and becomes a bit more complex when it covers the geo-blocking approach. Geo-blocking requires constant updates to the Cilium CIDR group and involves third-party services like MaxMind.

In my example, I am using the three IP addresses of my blog and added a /32 at the end to define them in CIDR notation.

The corresponding Cilium CIDR group template is shown below.

apiVersion: cilium.io/v2alpha1
kind: CiliumCIDRGroup
metadata:
  name: egress-traffic-blocking
  labels:
    policy: egress-traffic-blocking
spec:
  externalCIDRs:
    - 217.160.0.92/32
    - 217.160.0.111/32
    - 217.160.223.1/32

Without being referenced by a Cilium cluster-wide or Cilium network policy, a Cilium CIDR group is just another resource in Kubernetes and does not affect egress traffic at all.

apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: egress-traffic-blocking
  labels:
    policy: egress-traffic-blocking
spec:
  endpointSelector: {}
  enableDefaultDeny:
    egress: false
  egressDeny:
    - toCIDRSet:
        - cidrGroupRef: egress-traffic-blocking

As seen above, the Cilium cluster-wide network policy uses an empty endpointSelector to select every endpoint in the Azure Kubernetes Service cluster that the policy should be applied to, in order to block egress traffic.

We set the enableDefaultDeny setting for egress to false, as we want to allow all egress traffic except for the ranges defined in the Cilium CIDR group.

Under egressDeny, we reference the Cilium CIDR group. If needed, we could add additional CIDR group references to the template, as sketched below.
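
A sketch of what that would look like, referencing a second, hypothetical CIDR group named geo-blocking:

  egressDeny:
    - toCIDRSet:
        - cidrGroupRef: egress-traffic-blocking
        # hypothetical additional CIDR group reference
        - cidrGroupRef: geo-blocking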

After rolling out both templates to the Azure Kubernetes Service cluster, we test the egress traffic blocking in an application pod using the curl command and hubble observe to monitor if the egress traffic is blocked as intended.

root@bash:/# curl https://www.danielstechblog.de
------------------------------------------------------------------------------------------------------------------------------------------
❯ hubble observe --drop-reason-desc POLICY_DENY -f
Jan 15 07:19:58.346: default/bash:57152 (ID:21894) <> 217.160.0.92:443 (ID:16777218) policy-verdict:L3-Only EGRESS DENIED (TCP Flags: SYN)
Jan 15 07:19:58.346: default/bash:57152 (ID:21894) <> 217.160.0.92:443 (ID:16777218) Policy denied by denylist DROPPED (TCP Flags: SYN)

Terminal view of curl and hubble observe command with output

As seen above, the egress traffic is blocked successfully by the defined Cilium cluster-wide network policy.

Difference between Cilium 1.16 and 1.17

There have been some heavy investments into Cilium’s CIDR group capabilities for the upcoming 1.17 release.

First and foremost, how Cilium handles the identities for CIDR ranges. Before 1.17, Cilium created its own identity for every CIDR range. In my example, that would be three identities, as seen below.

❯ kubectl exec -it cilium-6dddh -c cilium-agent -- cilium-dbg identity list
ID         LABELS
1          reserved:host
           reserved:kube-apiserver
2          reserved:world
...
16777217   cidr:217.160.0.111/32
           reserved:world
16777218   cidr:217.160.223.1/32
           reserved:world
16777219   cidr:217.160.0.92/32
           reserved:world

With only a few hundred CIDR ranges in a CIDR group, this should not be a concern. However, in the geo-blocking scenario, it can be tens of thousands of CIDR ranges, which pushes Cilium before 1.17 to its limits.

In Cilium 1.17, an identity is only created for the entire CIDR group and not for every CIDR range that is part of the CIDR group.

❯ kubectl exec -it cilium-vplkw -c cilium-agent -- cilium-dbg identity list
ID         LABELS
1          reserved:host
           reserved:kube-apiserver
2          reserved:world
...
16777218   cidrgroup:io.cilium.policy.cidrgroupname/egress-traffic-blocking
           reserved:world

Looking back at the geo-blocking scenario, that means we would only have one identity for all the tens of thousands of CIDR ranges, which dramatically improves performance and scaling.

Summary

Doing egress traffic blocking on Azure Kubernetes Service using Cilium is straightforward with its CIDR groups and cluster-wide network policies, especially with the improvements in the upcoming 1.17 release.

The examples can be found on my GitHub repository.

-> https://github.com/neumanndaniel/kubernetes/tree/master/cilium/egress-traffic-blocking

Egress traffic blocking with Calico global network policies on Azure Kubernetes Service


In my last blog post, I covered how to do egress traffic blocking with Cilium bring-your-own CNI on Azure Kubernetes Service, as Azure CNI powered by Cilium does not officially support Cilium cluster-wide network policies and Cilium CIDR groups.

-> https://www.danielstechblog.io/egress-traffic-blocking-with-cilium-cluster-wide-network-policies-on-azure-kubernetes-service/

In addition to the Cilium option on Azure Kubernetes Service, there has been and still is the option to deploy an Azure Kubernetes Service cluster with Azure CNI and Calico for the network policy part.

So, in today’s blog post, we talk about how to block egress traffic with Calico global network policies on Azure Kubernetes Service. For this, we need an Azure Kubernetes Service cluster with Azure CNI and Calico for the network policy part or alternatively with Calico installed via the bring-your-own CNI approach.

Egress traffic blocking is used to prevent network traffic to malicious network entities, to country-level network CIDR ranges (so-called geo-blocking), and the like.

Using Calico global network policies with Calico global network sets is an easy way to achieve this without setting up additional infrastructure or services. As security, including network security, is a multi-layer approach, the solution presented today is only one building block, but a powerful one.

Configuration

The entire configuration in this example is kept simple and becomes a bit more complex when it covers the geo-blocking approach. Geo-blocking requires constant updates to the Calico global network set and involves third-party services like MaxMind.

In my example, I am using the three IP addresses of my blog and added a /32 at the end to define them in CIDR notation.

The corresponding Calico global network set template is shown below.

apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkSet
metadata:
  name: egress-traffic-blocking
  labels:
    policy: egress-traffic-blocking
spec:
  nets:
    - 217.160.0.92/32
    - 217.160.0.111/32
    - 217.160.223.1/32

Without being referenced by a Calico global network policy, a Calico global network set is just another resource in Kubernetes and does not affect egress traffic at all.

apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  name: egress-traffic-allow
  labels:
    policy: egress-traffic-allow
spec:
  order: 1000
  types:
    - Egress
  egress:
    - action: Allow
---
apiVersion: crd.projectcalico.org/v1
kind: GlobalNetworkPolicy
metadata:
  name: egress-traffic-blocking
  labels:
    policy: egress-traffic-blocking
spec:
  order: 0
  types:
    - Egress
  egress:
    - action: Log
      destination:
        selector: policy == 'egress-traffic-blocking'
    - action: Deny
      destination:
        selector: policy == 'egress-traffic-blocking'

As seen above, we require two Calico global network policies.

The first one with order 1000 ensures that all egress traffic not blocked by the egress-traffic-blocking policy is allowed. Otherwise, egress traffic would be blocked by default after applying the first global network policy targeting egress traffic.

The second Calico global network policy references the Calico global network set via a defined label to block egress traffic targeting the specified CIDR ranges. Besides blocking egress traffic, we would like to generate logs for it. Hence, we use the action Log in the policy definition. It is important that the Log action comes before the Deny action, as actions are processed in the order they are defined in the template.

After rolling out all three templates to the Azure Kubernetes Service cluster, we test the egress traffic blocking in an application pod using the curl command and tail on the Kubernetes node’s syslog to monitor if the egress traffic is blocked as intended.

root@bash:/# curl https://www.danielstechblog.de
-----------------------------------------------------------------------------------------------------------------------
sh5.1# tail -f /var/log/messages | grep 'calico-packet'
2025-01-19T16:13:46.406155+00:00 aks-default-30331449-vmss000000 kernel: [ 6236.551431] calico-packet: IN=azv7c40892f581 OUT=eth0 MAC=aa:aa:aa:aa:aa:aa:da:a0:27:a7:b9:41:08:00 SRC=10.244.0.179 DST=217.160.0.92 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=27413 DF PROTO=TCP SPT=53940 DPT=443 WINDOW=64240 RES=0x00 SYN URGP=0
2025-01-19T16:13:46.406172+00:00 aks-default-30331449-vmss000000 kernel: calico-packet: IN=azv7c40892f581 OUT=eth0 MAC=aa:aa:aa:aa:aa:aa:da:a0:27:a7:b9:41:08:00 SRC=10.244.0.179 DST=217.160.0.92 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=27413 DF PROTO=TCP SPT=53940 DPT=443 WINDOW=64240 RES=0x00 SYN URGP=0

Terminal - tail from Syslog to show Calico logs

As seen above, the egress traffic is blocked successfully by the defined Calico global network policy.

Calico only produces log output for network policies when we use the Log action. The log output ends up in the Syslog and not in the Calico pod's stdout log. On an Azure Linux Kubernetes node, the Syslog can be found in /var/log/messages. So, gathering those logs requires extra configuration if you do not already collect the Syslog with your logging solution.
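
For an ad-hoc look at a node's Syslog, a sketch using kubectl debug could look like the following; the node name is taken from the log output above, and the busybox image choice is an assumption:

❯ kubectl debug node/aks-default-30331449-vmss000000 -it --image=busybox \
  -- chroot /host sh -c "tail -f /var/log/messages | grep calico-packet"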

Summary

Doing egress traffic blocking on Azure Kubernetes Service using Calico is straightforward with its global network sets and global network policies.

The examples can be found on my GitHub repository.

-> https://github.com/neumanndaniel/kubernetes/tree/master/calico/egress-traffic-blocking

Using Cilium Hubble Exporter to log blocked egress traffic on Azure Kubernetes Service


In one of my previous blog posts, I covered how to do egress traffic blocking with Cilium bring-your-own CNI on Azure Kubernetes Service.

-> https://www.danielstechblog.io/egress-traffic-blocking-with-cilium-cluster-wide-network-policies-on-azure-kubernetes-service/

Today, we look into the Cilium Hubble Exporter, which lets us write Hubble flows to the Cilium agent log output. Thus, Hubble flows can be collected by the logging solution running on an Azure Kubernetes Service cluster.

On my Azure Kubernetes Service cluster, I use Fluent Bit for the log collection and Azure Data Explorer as the logging backend.

Enable Cilium Hubble Exporter

The Cilium Hubble Exporter can be enabled in two different modes: static or dynamic. We use the dynamic mode, which provides several advantages over the static mode. It allows you to configure multiple filters and does not require restarting the Cilium agent to apply changes.

-> https://docs.cilium.io/en/stable/observability/hubble/configuration/export/

Using the Helm chart, we set hubble.export.dynamic.enabled to true to deploy the Cilium Hubble Exporter in its default configuration.

-> https://artifacthub.io/packages/helm/cilium/cilium?modal=values&path=hubble.export.dynamic

❯ helm upgrade --install cilium cilium/cilium --version 1.17.0 \
  --wait \
  --namespace kube-system \
  --kubeconfig "$KUBECONFIG" \
  ...
  --set hubble.export.dynamic.enabled=true

We do that to retrieve the config map structure for the Cilium Hubble Exporter configuration. The config map is called cilium-flowlog-config.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-flowlog-config
  namespace: kube-system
data:
  flowlogs.yaml: |
    flowLogs:
    - excludeFilters: []
      fieldMask: []
      filePath: /var/run/cilium/hubble/events.log
      includeFilters: []
      name: all
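
The same structure can also be retrieved directly from the running cluster:

❯ kubectl get configmap cilium-flowlog-config --namespace kube-system --output yaml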

Once we have the structure, we start by fine-tuning the Cilium Hubble Exporter to log only Hubble flows for egress traffic that has been denied by a Cilium network or cluster-wide network policy. Furthermore, those Hubble flows should be logged to stdout instead of to the Cilium agent's file system.

Fine-tune Cilium Hubble Exporter

The fine-tuning of the Cilium Hubble Exporter requires some deeper looks into the flow API documentation as well as into Cilium's source code to gather the required configuration keys and values.

-> https://docs.cilium.io/en/stable/_api/v1/flow/README/#flowfilter
-> https://github.com/cilium/cilium/blob/v1.17.0/pkg/monitor/api/types.go
-> https://github.com/cilium/cilium/blob/v1.17.0/pkg/monitor/api/drop.go

Before we start this journey, let us configure the filePath the Hubble flows should be logged to.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-flowlog-config
  namespace: kube-system
data:
  flowlogs.yaml: |
    flowLogs:
    - name: egress-traffic-blocking
      excludeFilters: []
      fieldMask: []
      filePath: /dev/stdout
      includeFilters: []

Why do we use /dev/stdout and not a file stored in the Cilium agent's file system? The answer to that is simple. We want to collect this log data with an already existing logging solution. In our example, Fluent Bit ingests those logs into Azure Data Explorer.

Now comes the interesting part. Remember, we only want Hubble flows for egress traffic that has been denied by a network policy. For that, we add the traffic_direction condition to the includeFilters section with the value EGRESS.

Identifying the correct value for the event_type condition is a bit tricky and requires a look into Cilium’s source code.

-> https://github.com/cilium/cilium/blob/v1.17.0/pkg/monitor/api/types.go#L19-L58

As we want to log blocked egress traffic, we need to identify the value that stands for the type DROPPED. Looking at the comment in the source code, the counting starts at 0, and 1 is the value for the type DROPPED.

-> https://github.com/cilium/cilium/blob/v1.17.0/pkg/monitor/api/types.go#L25

The sub_type for traffic denied by a network policy is much easier to identify by looking at the Go map errors: it is 181.

-> https://github.com/cilium/cilium/blob/v1.17.0/pkg/monitor/api/drop.go#L17-L103
-> https://github.com/cilium/cilium/blob/v1.17.0/pkg/monitor/api/drop.go#L79

Our final Cilium Hubble Exporter configuration is shown below.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cilium-flowlog-config
  namespace: kube-system
data:
  flowlogs.yaml: |
    flowLogs:
    - name: egress-traffic-blocking
      excludeFilters: []
      fieldMask: []
      filePath: /dev/stdout
      includeFilters:
      - event_type:
        - type: 1
          sub_type: 181
        traffic_direction:
        - EGRESS

Before we roll out our final configuration, we update the Cilium installation to not automatically create the config map for us.

❯ helm upgrade --install cilium cilium/cilium --version 1.17.0 \
  --wait \
  --namespace kube-system \
  --kubeconfig "$KUBECONFIG" \
  ...
  --set hubble.export.dynamic.enabled=true \
  --set hubble.export.dynamic.config.configMapName=cilium-flowlog-config \
  --set hubble.export.dynamic.config.createConfigMap=false

Afterward, we apply our Cilium Hubble Exporter configuration.
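
A sketch of the rollout, assuming the final configuration is stored in a file called cilium-flowlog-config.yaml:

❯ kubectl apply -f cilium-flowlog-config.yaml

We then check with the following command whether the configuration was applied successfully.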

❯ kubectl logs cilium-9rl9n cilium-agent | grep "Configuring Hubble event exporter"
time="2025-01-23T07:50:22.887800089Z" level=info msg="Configuring Hubble event exporter" flowLogName=egress-traffic-blocking options="{0x3c55a00 0x3c561c0 [] [] map[] [] [0x3b817c0] []}" subsys=hubble

The next step is to test whether everything works correctly by running the curl command against a blocked CIDR range from within a pod.

❯ kubectl logs cilium-9rl9n cilium-agent -f | grep '"flow":'
{"flow":{"time":"2025-01-23T07:59:23.501253423Z","uuid":"62b81705-ed79-4e5b-8f10-70a06970104d","verdict":"DROPPED","drop_reason":181,"ethernet":{"source":"5e:a0:3e:10:9e:71","destination":"82:a8:b0:95:ff:15"},"IP":{"source":"100.64.0.183","destination":"217.160.0.92","ipVersion":"IPv4"},"l4":{"TCP":{"source_port":44240,"destination_port":443,"flags":{"SYN":true}}},"source":{"ID":3810,"identity":14406,"cluster_name":"aks-azst-2","namespace":"default","labels":[...],"pod_name":"bash"},"destination":{"identity":16777218,"labels":["cidrgroup:io.cilium.policy.cidrgroupname/egress-traffic-blocking","cidrgroup:policy=egress-traffic-blocking","reserved:world"]},"Type":"L3_L4","node_name":"aks-azst-2/aks-nodepool1-25275757-vmss000000","node_labels":[...],"event_type":{"type":1,"sub_type":181},"traffic_direction":"EGRESS","file":{"name":"bpf_lxc.c","line":1360},"drop_reason_desc":"POLICY_DENY","Summary":"TCP Flags: SYN"},"node_name":"aks-azst-2/aks-nodepool1-25275757-vmss000000","time":"2025-01-23T07:59:23.501253423Z"}

Terminal view of curl and kubectl logs command with output

As seen above, the egress traffic denied by a network policy is logged correctly to stdout of the Cilium agent. Furthermore, Fluent Bit ingests the Hubble flow output into the Azure Data Explorer cluster.

Azure Data Explorer KQL query view

Summary

The Cilium Hubble Exporter is a powerful functionality of Cilium to write Hubble flows as logs to a specified output, whether it is a file or directly to stdout.

However, the configuration of the Cilium Hubble Exporter requires a deeper look into the flow API and Cilium’s source code and has a steeper learning curve than other Cilium features.

The example configuration can be found on my GitHub repository.

-> https://github.com/neumanndaniel/kubernetes/tree/master/cilium/hubble-exporter

Using Hubble CLI’s automatic port forwarding


This will be a rather short blog post today, but it will highlight a new feature in the Hubble CLI in version 1.17 and later.

Since version 1.17, the option -P has been added to the Hubble CLI. It enables automatic port forwarding to the Hubble relay in the Kubernetes cluster.

Terminal - Showing Hubble CLI's -P option for automatic port forwarding

As seen in the screenshot above, with the option -P, you can immediately run the Hubble CLI command you want without first using cilium hubble port-forward to connect to the Hubble relay in a separate process.

For comparison, have a look at the screenshot below that shows the process before version 1.17 or without using the option -P.

Terminal - Showing Hubble CLI's port forwarding without using option -P

Even though this is only a small new feature, it is one that makes day-to-day usage of the Hubble CLI much more convenient than before.
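
As a sketch of the day-to-day difference, the previous two-step process becomes a one-liner:

# Before Hubble CLI 1.17: separate port-forward process required
❯ cilium hubble port-forward &
❯ hubble observe -f

# Since Hubble CLI 1.17: automatic port forwarding via -P
❯ hubble observe -P -f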

More details about the implementation can be found in the PR.

-> https://github.com/cilium/cilium/pull/35483

Azure Load Balancer Health Event Logs


In February, Microsoft announced the general availability of the Azure Load Balancer health event logs.

-> https://azure.microsoft.com/en-us/updates?WT.mc_id=AZ-MVP-5000119&id=481818

Those health event logs are part of the diagnostic logs of an Azure Load Balancer.

Azure portal - Diagnostic log settings Azure Load Balancer

As seen in the screenshot above, I have configured them on the Azure Load Balancer that is part of my Azure Kubernetes Service cluster and send those logs to an Azure Log Analytics workspace.

The following types of health events are published when they are detected: DataPathAvailabilityWarning, DataPathAvailabilityCritical, NoHealthyBackends, HighSnatPortUsage, and SnatPortExhaustion.

-> https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-health-event-logs?WT.mc_id=AZ-MVP-5000119#health-event-types-and-publishing-frequency

The first two events are published when platform issues in Azure affect the data path availability. The other three events notify you when something has happened that you can solve yourself within the Azure Load Balancer or application configuration.

When you receive the NoHealthyBackends event, your application behind the Azure Load Balancer is affected and has a complete outage.

Azure portal - Log Analytics KQL Azure Load Balancer Health Events

So, should you depend on this health event to detect an outage of your application? The answer is no. It is an addition to your existing monitoring solution that provides additional detection from within the platform.

In the past, I have written two blog posts about how to detect and mitigate SNAT port exhaustion in Azure.

-> https://www.danielstechblog.io/detecting-snat-port-exhaustion-on-azure-kubernetes-service/
-> https://www.danielstechblog.io/preventing-snat-port-exhaustion-on-azure-kubernetes-service-with-virtual-network-nat/

When you still need to rely on an outbound public IP configuration for an Azure Load Balancer instead of using an Azure NAT Gateway, then the health events HighSnatPortUsage and SnatPortExhaustion are for you. Both events allow you to detect and mitigate an SNAT port exhaustion faster than without those events.
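
As a hedged sketch, these events can also be queried from the Log Analytics workspace via the Azure CLI. Note that the ALBHealthEvent table and its column names are assumptions derived from the diagnostic log category:

❯ az monitor log-analytics query \
  --workspace "<workspace-id>" \
  --analytics-query "ALBHealthEvent | where HealthEventType in ('HighSnatPortUsage', 'SnatPortExhaustion') | project TimeGenerated, LoadBalancerResourceId, HealthEventType, Description"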

Summary

The new Azure Load Balancer health events complement the existing Load Balancer metrics and support you in your day-to-day operations for Azure services that rely on Azure Load Balancer.

-> https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-health-event-logs?WT.mc_id=AZ-MVP-5000119

Provide additional metadata information to Cilium for IP addresses outside of the Kubernetes cluster scope


In Cilium, IP addresses that do not belong to the pod CIDR or Kubernetes service CIDR range, or to some special ranges like the Kubernetes API server, are recognized as the reserved:world identity. So to say, they do not belong to the Kubernetes cluster scope known to Cilium itself.

-> https://docs.cilium.io/en/stable/gettingstarted/terminology/#special-identities

When you start using DNS-based Cilium network policies, you automatically add additional metadata information, identity labels, to the IP addresses that the FQDN resolves to.

-> https://docs.cilium.io/en/stable/security/dns/

However, you might want to add additional metadata information to IP addresses that are part of the reserved:world identity and not covered by a DNS-based Cilium network policy. The question is now, how do you do that?

Let us have a look into an Azure Kubernetes Service cluster, running Cilium in BYOCNI mode, and the two special IP addresses that Azure uses for the internal DNS service and the Instance Metadata Service, IMDS for short.

If you have not configured a custom DNS server for the Virtual Network that the Azure Kubernetes Service cluster uses, then CoreDNS and the Virtual Machine Scale Set instances use Azure's internal DNS service, which operates under the IP address 168.63.129.16. Azure's IMDS operates under the IP address 169.254.169.254.

Looking into the network traffic of the kube-system namespace using Cilium’s Hubble UI, we see outbound traffic from the CoreDNS pods to the IP address 168.63.129.16 on port 53.

Cilium's Hubble UI before applying the CIDR group definition.
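
The same traffic can be filtered on the CLI with hubble observe, sketched here assuming an established connection to the Hubble relay:

❯ hubble observe --to-ip 168.63.129.16 --port 53 -f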

As not everyone is aware of those two Azure-specific IP addresses, we want to provide additional context for them.

In Cilium we can achieve this by using Cilium’s CIDR groups. The following two CIDR groups make Cilium aware of the two IP addresses.

apiVersion: cilium.io/v2alpha1
kind: CiliumCIDRGroup
metadata:
  name: azure-imds
  labels:
    k8s-app: azure-imds
spec:
  externalCIDRs:
    - 169.254.169.254/32
---
apiVersion: cilium.io/v2alpha1
kind: CiliumCIDRGroup
metadata:
  name: azure-internal-dns
  labels:
    k8s-app: azure-internal-dns
spec:
  externalCIDRs:
    - 168.63.129.16/32
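
Rolling out both CIDR groups is a single kubectl apply, using the file name from the GitHub repository linked at the end of this post:

❯ kubectl apply -f azure-specific-ip-addresses.yaml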

Once applied to the Azure Kubernetes Service cluster, we see the additional metadata information for the IP 168.63.129.16.

Cilium's Hubble UI after applying the CIDR group definition.

According to the CIDR group definition, the destination labels cidrgroup:io.cilium.policy.cidrgroupname/azure-internal-dns and cidrgroup:k8s-app=azure-internal-dns are added. Also, the destination identity changes to the one for the CIDR group.

This allows us to provide additional context for IP addresses that reside outside of the Kubernetes cluster scope.

The example CIDR group definition can be found on my GitHub repository.

-> https://github.com/neumanndaniel/kubernetes/blob/master/cilium/metadata-information/azure-specific-ip-addresses.yaml
