If you’ve been following our recent Kubernetes migration blog, you already know the journey has been full of challenges. From configuring pods to tackling networking issues, it’s been a rollercoaster. We’ve explored several tricky problems in previous blogs, and today, we invite you to put on your detective hat and join us as we investigate another Kubernetes mystery.
The Mysterious Case of NXDomain Errors
Imagine this: You’re checking your Kubernetes observability tools, and suddenly you notice something strange: over a million NXDomain errors! What could be causing this? Let’s break it down together.
What Are NXDomain Errors?
Before we jump in, let’s test your DNS knowledge:
Pop Quiz: What does an NXDomain error indicate?
A) A domain exists but is unreachable.
B) A domain doesn’t exist.
C) A domain is experiencing high latency.
(Take a moment to think! Scroll down for the answer...)
The Answer: If you guessed B) A domain doesn’t exist, you’re right! These errors occur when a DNS query is made for a non-existent domain.
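To see what that answer looks like on the wire, here is a minimal Python sketch that reads the response code from a raw DNS header. It assumes the standard 12-byte DNS header, where the low four bits of the flags field hold the response code and the value 3 means NXDOMAIN. The helper name `rcode_name` and the sample bytes are made up for illustration.

```python
import struct

# Response codes carried in the low 4 bits of the DNS header flags field.
RCODES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def rcode_name(header: bytes) -> str:
    """Return the response code name from a 12-byte DNS header."""
    if len(header) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    # Six unsigned 16-bit fields in network byte order:
    # id, flags, and the four record counts.
    _id, flags, _qd, _an, _ns, _ar = struct.unpack("!6H", header[:12])
    code = flags & 0x000F
    return RCODES.get(code, f"RCODE{code}")


# Hypothetical header: id 0x1234, flags 0x8183 (response, RD, RA, rcode 3).
sample = struct.pack("!6H", 0x1234, 0x8183, 1, 0, 0, 0)
print(rcode_name(sample))
```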
Unraveling the Clues
We took a closer look at the logs and found something unusual: external domains were mysteriously gaining extra suffixes like .cluster.local or .internal.cloudapp.net. Here are two examples:
gmail.googleapis.com.cluster.local
oauth2.googleapis.com.es52e2p4cafzg4m1it5a.bx.internal.cloudapp.net
Now, let’s put your troubleshooting skills to the test:
What do you think is happening here?
A) These domains are being redirected intentionally.
B) Kubernetes is modifying external domains.
C) A rogue service is interfering with DNS.
(Think about it before scrolling!)
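The Answer: B) Kubernetes is modifying external domains. More precisely, the DNS resolver inside each pod appends these suffixes, following settings that Kubernetes writes into the pod. But why? Let’s find out.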
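If you want to sift through your own logs for this pattern, a rough Python sketch like the one below can split a logged name back into the original external name and the suffix that was tacked on. The function name `split_search_suffix` is hypothetical, and the suffix list is an assumption built from the two examples above; in practice you would take it from the pod's search domains.

```python
def split_search_suffix(name, suffixes):
    """Split a logged name into (original_name, appended_suffix).

    Returns (name, None) when no known suffix matches.
    """
    clean = name.rstrip(".").lower()
    # Check longer suffixes first so the most specific one wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        s = suffix.strip(".").lower()
        if clean.endswith("." + s):
            return clean[: -(len(s) + 1)], s
    return clean, None


# Assumed suffix list, taken from the two log examples in this post.
SUFFIXES = [
    "svc.cluster.local",
    "cluster.local",
    "es52e2p4cafzg4m1it5a.bx.internal.cloudapp.net",
]

for logged in (
    "gmail.googleapis.com.cluster.local",
    "oauth2.googleapis.com.es52e2p4cafzg4m1it5a.bx.internal.cloudapp.net",
):
    print(split_search_suffix(logged, SUFFIXES))
```

Grouping the results by original name is a quick way to see which external services account for most of the noise.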
How Kubernetes Handles DNS Queries
To solve this puzzle, we need to understand how DNS queries are resolved inside Kubernetes. When a pod performs a DNS lookup, its resolver doesn’t always send the name as-is. Instead, it applies the search domains and the ndots option that Kubernetes configures for the pod.
Here’s a fun experiment: Try running the following command inside a Kubernetes pod:
cat /etc/resolv.conf
What do you see? You should find an entry for search domains and an ndots value. These settings influence how Kubernetes resolves domain names.
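To make those two settings easier to spot, here is a small Python sketch that pulls them out of resolv.conf text. The sample content, including the nameserver IP and the search list, is hypothetical, so expect different values in your own cluster. The function name `parse_resolv_conf` is also just an illustration.

```python
def parse_resolv_conf(text):
    """Extract nameservers, search domains, and ndots from resolv.conf text."""
    # The resolver falls back to ndots:1 when the option is absent.
    config = {"nameservers": [], "search": [], "ndots": 1}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            config["nameservers"].append(values[0])
        elif keyword == "search":
            config["search"] = values  # a later search line replaces an earlier one
        elif keyword == "options":
            for opt in values:
                if opt.startswith("ndots:"):
                    config["ndots"] = int(opt.split(":", 1)[1])
    return config


# Hypothetical file content; paste the output of `cat /etc/resolv.conf` here.
SAMPLE = """\
nameserver 10.0.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""

print(parse_resolv_conf(SAMPLE))
```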
Connecting the Dots
Because the ndots value was set to 5, the resolver treated gmail.googleapis.com, which contains only two dots, as an incomplete domain and appended the search domains before trying the name itself, turning it into:
gmail.googleapis.com.svc.cluster.local
gmail.googleapis.com.cluster.local
These domains don’t exist, leading to the dreaded NXDomain errors!
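The ordering rule can be modeled in a few lines of Python. This is a simplified sketch of the usual resolver behavior, not the resolver's actual code: a name with fewer dots than ndots goes through the search domains first, a name with at least that many dots is tried as-is first, and a trailing dot skips the search list entirely. The search list below is an assumption that matches the two names above, and `candidate_queries` is a made-up helper name.

```python
def candidate_queries(name, search, ndots):
    """List the names a resolver would try, in order, for one lookup."""
    if name.endswith("."):
        # A trailing dot marks the name as fully qualified: no search list.
        return [name.rstrip(".")]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded  # enough dots: try the name as-is first
    return expanded + [name]      # too few dots: search domains first


# Assumed search list, matching the expanded names shown above.
SEARCH = ["svc.cluster.local", "cluster.local"]

for ndots in (5, 2):
    print(ndots, candidate_queries("gmail.googleapis.com", SEARCH, ndots))
```

A real resolver stops at the first name that resolves, so with a lower ndots value the expanded names after the real one are never queried for a domain that exists.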
Fixing the Problem
Now that we’ve cracked the case, let’s apply the fix. Here’s how you can customize DNS settings to prevent Kubernetes from modifying external domains:
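apiVersion: v1
kind: Pod
metadata:
  namespace: default
  name: dns-example
spec:
  containers:
    - name: test
      image: nginx
  # "None" ignores the cluster DNS defaults; all settings come from dnsConfig
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 1.2.3.4  # placeholder; use your DNS server's IP
    searches:  # placeholder search domains
      - ns1.svc.cluster-domain.example
      - my.dns.search.suffix
    options:
      - name: ndots
        value: "2"  # names with two or more dots are tried as-is first
      - name: edns0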
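To picture what the pod ends up with, the Python sketch below renders a dnsConfig like the one above into resolv.conf-style text. It approximates the file Kubernetes generates for the pod and is not the kubelet's actual code; the function name `render_resolv_conf` is hypothetical, and the dictionary simply mirrors the manifest.

```python
def render_resolv_conf(dns_config):
    """Render a pod dnsConfig mapping as resolv.conf-style text."""
    lines = [f"nameserver {ns}" for ns in dns_config.get("nameservers", [])]
    searches = dns_config.get("searches", [])
    if searches:
        lines.append("search " + " ".join(searches))
    options = []
    for opt in dns_config.get("options", []):
        value = opt.get("value")
        # Options with a value become name:value, flags stay bare.
        options.append(f"{opt['name']}:{value}" if value is not None else opt["name"])
    if options:
        lines.append("options " + " ".join(options))
    return "\n".join(lines) + "\n"


# Mirrors the dnsConfig section of the manifest above.
DNS_CONFIG = {
    "nameservers": ["1.2.3.4"],
    "searches": ["ns1.svc.cluster-domain.example", "my.dns.search.suffix"],
    "options": [{"name": "ndots", "value": "2"}, {"name": "edns0"}],
}

print(render_resolv_conf(DNS_CONFIG), end="")
```

Comparing this text with the output of cat /etc/resolv.conf inside the new pod is a quick check that the settings took effect.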
The Outcome: A Smooth DNS Experience
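By adjusting the DNS configuration, we prevent Kubernetes from mistakenly modifying external queries. With ndots lowered to 2, a name such as gmail.googleapis.com has enough dots to be looked up as-is first, so the search domains are no longer tried ahead of the real name. This removes the NXDomain errors for such names and ensures external services resolve correctly.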