Kubernetes ‘exec’ DNS failure – Updated

UPDATE: While the /etc/hosts fix described below works, the correct way to do this is to add a DNS search suffix. If your nodes get their IP configuration from DHCP, set the suffix in your DHCP configuration. If you’re using static IP addresses, run the following commands on each node. Replace <ifname> with the name of your network interface (e.g. eno1, eth0) and <domain.name> with the domain suffix you want appended.

# This change is immediate, but not persistent
sudo resolvectl domain <ifname> <domain.name>
# This makes it permanent
## Turns out, this sets the global search domain, but still fails
## echo "Domains=<domain.name>" | sudo tee -a /etc/systemd/resolved.conf
## Netplan is what is setting the interface info, so be sure to edit its configuration
sudo sed -i 's|search: \[\]|search: \[ <domain.name> \]|' /etc/netplan/<netplan file>
# Apply the edited Netplan configuration
sudo netplan apply

From https://askubuntu.com/a/1211705


I have finally migrated all of my containers from my docker-ce server to Kubernetes (a microk8s server). The point was to be able to wipe the docker-ce server and build a microk8s cluster, which has been done and was super easy!

However, after getting the cluster set up, I wasn’t able to exec into certain pods from a remote machine with kubectl. The error I was getting was:

Error from server: error dialing backend: dial tcp: lookup <node-name>: Temporary failure in name resolution

Because I originally had only a single node, my kubectl config referenced the original node’s IP address directly. I also noticed that the error only happened when the pod was running on a node other than the API server I was connecting to. When I changed the API server in my kube config to the node that hosted the pod, exec worked.

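After a lot of playing with kube-dns and coredns, it really came down to something easy and obvious. For an exec, the API server I was talking to has to connect to the node that hosts the pod, and it looks that node up by name. From one node, I couldn’t resolve the short name of the other node, and therefore node1 couldn’t proxy to node2 to run the exec.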
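A quick way to confirm this kind of problem is to check, on each node, whether the other nodes' short names resolve. Below is a minimal sketch, assuming getent is available on the nodes. The function name check_resolves and the node names node1 and node2 are placeholders for illustration, not anything from microk8s.

```shell
#!/usr/bin/env bash
# Report whether each given hostname resolves on this machine.
# Prints one "OK" or "FAIL" line per name and returns 1 if any name failed.
check_resolves() {
  local name rc=0
  for name in "$@"; do
    # getent goes through the system resolver (hosts file, DNS, search domains)
    if getent hosts "$name" >/dev/null 2>&1; then
      echo "OK   $name"
    else
      echo "FAIL $name"
      rc=1
    fi
  done
  return "$rc"
}

# Example (placeholder node names), run on every node in the cluster:
#   check_resolves node1 node2
```

If a short name shows FAIL on one node but the fully qualified name resolves, a missing search domain is a likely cause.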

While there are multiple ways I could have fixed this (and I did get the right DNS suffixes added to DHCP too), I ended up editing /etc/hosts on each node and making sure there was an entry for the other node. Ta-da, exec works across nodes now.
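For reference, the /etc/hosts change can be scripted so that it is safe to run more than once. This is only a sketch of one way to do it, not the exact steps I ran. The function name add_host_entry, the HOSTS_FILE override, and the 192.0.2.x addresses are made up for illustration, so substitute your real node names and IPs. Writing to /etc/hosts needs root.

```shell
#!/usr/bin/env bash
# Append "<ip> <name>" to a hosts file unless the name is already listed.
# HOSTS_FILE defaults to /etc/hosts; point it at a copy to try this out safely.
add_host_entry() {
  local ip="$1" name="$2" file="${HOSTS_FILE:-/etc/hosts}"
  # Look for the name as a whole field on a non-comment line,
  # ignoring anything after an inline "#".
  if awk -v n="$name" '
      /^[[:space:]]*#/ { next }
      {
        for (i = 2; i <= NF; i++) {
          if ($i ~ /^#/) break
          if ($i == n) found = 1
        }
      }
      END { exit !found }
    ' "$file"; then
    echo "already present: $name"
  else
    printf '%s %s\n' "$ip" "$name" >> "$file"
    echo "added: $ip $name"
  fi
}

# Example (placeholder values), run as root:
#   on node1: add_host_entry 192.0.2.12 node2
#   on node2: add_host_entry 192.0.2.11 node1
```

The existence check means re-running the script does not pile up duplicate lines, but it also will not update an entry whose IP address has changed.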