I installed Prometheus (which is based on the busybox image) in Kubernetes 1.17. After investigating, I realised that the problem may be in the busybox image. It seems to be similar to https://github.com/kubernetes/kubernetes/issues/66924#issuecomment-411806846

Part of the Kubernetes deployment config:

    - args:
      - -c
      - while true; do nslookup alertmanager-bot; sleep 10; done
      command:
      - /bin/sh
      image: busybox:1.31.1

Pod log:

    Server: 10.245.0.10
    Address: 10.245.0.10:53
    ** server can't find alertmanager-bot.monitoring.svc.cluster.local: NXDOMAIN
    *** Can't find alertmanager-bot.svc.cluster.local: No answer
    *** Can't find alertmanager-bot.cluster.local: No answer
    *** Can't find alertmanager-bot.monitoring.svc.cluster.local: No answer
    *** Can't find alertmanager-bot.svc.cluster.local: No answer
    *** Can't find alertmanager-bot.cluster.local: No answer

CoreDNS log:

    coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.561Z [INFO] 10.244.0.215:43144 - 19456 "AAAA IN alertmanager-bot.cluster.local. udp 48 false 512" NXDOMAIN qr,aa,rd 141 0.000202924s
    coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.562Z [INFO] 10.244.0.215:43144 - 19456 "A IN alertmanager-bot.monitoring.svc.cluster.local. udp 63 false 512" NOERROR qr,aa,rd 124 0.000145229s
    coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.562Z [INFO] 10.244.0.215:43144 - 19456 "A IN alertmanager-bot.svc.cluster.local. udp 52 false 512" NXDOMAIN qr,aa,rd 145 0.000084224s
    coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.562Z [INFO] 10.244.0.215:43144 - 19456 "A IN alertmanager-bot.cluster.local. udp 48 false 512" NXDOMAIN qr,aa,rd 141 0.000056272s
    coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.562Z [INFO] 10.244.0.215:43144 - 19456 "AAAA IN alertmanager-bot.monitoring.svc.cluster.local. udp 63 false 512" NOERROR qr,aa,rd 156 0.000060009s
    coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.562Z [INFO] 10.244.0.215:43144 - 19456 "AAAA IN alertmanager-bot.svc.cluster.local. udp 52 false 512" NXDOMAIN qr,aa,rd 145 0.000051978s

Pod log with busybox 1.28.4:

    Name: alertmanager-bot
    Address 1: 10.245.48.126 alertmanager-bot.monitoring.svc.cluster.local
    Server: 10.245.0.10
    Address 1: 10.245.0.10 kube-dns.kube-system.svc.cluster.local

CoreDNS log:

    coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:34:42.790Z [INFO] 10.244.0.204:53241 - 3 "AAAA IN alertmanager-bot.monitoring.svc.cluster.local. udp 63 false 512" NOERROR qr,aa,rd 156 0.000207196s
    coredns-84c79f5fb4-bspnj coredns 2020-06-11T14:34:42.792Z [INFO] 10.244.0.204:57444 - 4 "A IN alertmanager-bot.monitoring.svc.cluster.local. udp 63 false 512" NOERROR qr,aa,rd 124 0.000175375s

/etc/resolv.conf (the same on both images):

    / # cat /etc/resolv.conf
    nameserver 10.245.0.10
    search monitoring.svc.cluster.local svc.cluster.local cluster.local
    options ndots:5
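For context, the `search` and `ndots` settings in that resolv.conf determine which names a stub resolver tries for the short name `alertmanager-bot`. Below is a minimal Python sketch of the conventional resolv.conf search-list semantics. The function `candidate_names` is a hypothetical helper written for illustration; it is not code from busybox or any libc, and real resolvers differ in details (for example, whether the candidates are queried one after another or all at once).

```python
def candidate_names(name, search, ndots=1):
    """Return the fully qualified names a stub resolver would try, in order.

    Illustrative model of resolv.conf 'search' and 'options ndots:N' only.
    """
    # A trailing dot marks the name as already absolute: no search list is used.
    if name.endswith("."):
        return [name]

    absolute = name + "."
    searched = [f"{name}.{domain}." for domain in search]

    # With at least `ndots` dots the name is tried as-is first;
    # otherwise the search domains are tried first.
    if name.count(".") >= ndots:
        return [absolute] + searched
    return searched + [absolute]


# Values taken from the resolv.conf shown in the report.
search = ["monitoring.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for candidate in candidate_names("alertmanager-bot", search, ndots=5):
    print(candidate)
```

The first three candidates are the same three names that appear in the CoreDNS log for busybox 1.31.1; only the `monitoring.svc.cluster.local` one exists, which is why the other two are answered with NXDOMAIN.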
I see this problem in the uclibc version:

```
$ kubectl run -it --rm evanstest-b5 --image=busybox:1.32.0-uclibc -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup opentelemetry-collector.observability.svc.cluster.local
Server: 10.0.0.10
Address: 10.0.0.10:53

*** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer
*** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer

/ #
```

The musl and glibc versions partially work:

```
$ kubectl run -it --rm evanstest-b3 --image=busybox:1.32.0-musl -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup opentelemetry-collector.observability.svc.cluster.local
Server: 10.0.0.10
Address: 10.0.0.10:53

*** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer

Name: opentelemetry-collector.observability.svc.cluster.local
Address: 10.0.228.84

/ #
$ kubectl run -it --rm evanstest-b4 --image=busybox:1.32.0-glibc -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup opentelemetry-collector.observability.svc.cluster.local
Server: 10.0.0.10
Address: 10.0.0.10:53

Name: opentelemetry-collector.observability.svc.cluster.local
Address: 10.0.228.84

*** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer

/ #
```

The Kubernetes community has been working around this issue for a couple of years by pinning to busybox:1.28 - what can we do to help fix this issue?
It looks like this is a duplicate of https://bugs.busybox.net/show_bug.cgi?id=11161, which was closed, but apparently not resolved.
(In reply to evans.tucker from comment #1)
> I see this problem in the uclibc version

uclibc has rather primitive support for resolver routines, so nslookup on uclibc is likely to be buggy: nslookup just overrides the default DNS server address, then uses the standard name resolution in libc to get addresses. Even something as trivial as setting the DNS server address to an IPv6 address does not work.

The LEDE project contributed a better version of nslookup, which is enabled by setting FEATURE_NSLOOKUP_BIG=y, but it requires a libc with decent resolver support: it uses ns_initparse(), ns_msg_count(), ns_parserr() etc. It will not compile on uclibc (I did not try uclibc-ng).

So: what version of busybox is it? Is FEATURE_NSLOOKUP_BIG=y?

> The musl and glibc versions partially work:
> $ kubectl run -it --rm evanstest-b3 --image=busybox:1.32.0-musl -- sh
> If you don't see a command prompt, try pressing enter.
> / # nslookup opentelemetry-collector.observability.svc.cluster.local
> Server: 10.0.0.10
> Address: 10.0.0.10:53
>
> *** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer
>
> Name: opentelemetry-collector.observability.svc.cluster.local
> Address: 10.0.228.84

What is the exact problem in the above? It says that the AAAA request was answered with no records ("No answer") and that the A request was successfully answered with an IP address.
Aha. I see. We indeed differ from bind. Fixed in git:

commit 49fd1d69babc6945175068e8fe4c85713fe5fcdb (HEAD -> master)
Date:   Thu Dec 31 01:39:44 2020 +0100

    nslookup: do not print "No answer" for NODATA replies, closes 13006

    Only print when there was no answer at all.
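
For readers unfamiliar with the distinction the commit message relies on: a NODATA reply is one where the server answers NOERROR but returns no records of the requested type (the AAAA case in this report), whereas NXDOMAIN means the name does not exist at all. The Python sketch below classifies a reply from its 12-byte DNS header. It is an illustration only, not the busybox implementation: `classify_reply` and `make_header` are hypothetical names, and CNAME chains and referrals are ignored for simplicity.

```python
import struct

RCODE_NOERROR = 0
RCODE_NXDOMAIN = 3


def classify_reply(packet):
    """Classify a raw DNS reply (bytes); None stands for 'nothing received'."""
    if packet is None or len(packet) < 12:
        return "no reply"

    # DNS header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (six 16-bit fields).
    _id, flags, _qd, ancount, _ns, _ar = struct.unpack("!6H", packet[:12])
    rcode = flags & 0x000F  # the response code is the low 4 bits of the flags

    if rcode == RCODE_NXDOMAIN:
        return "NXDOMAIN"
    if rcode == RCODE_NOERROR and ancount == 0:
        return "NODATA"  # name exists, but has no record of the requested type
    if rcode == RCODE_NOERROR:
        return "answer"
    return f"rcode {rcode}"


def make_header(rcode, ancount):
    """Hypothetical helper: build just a reply header with QR, AA and RD set."""
    flags = 0x8000 | 0x0400 | 0x0100 | rcode
    return struct.pack("!6H", 19456, flags, 1, ancount, 0, 0)


print(classify_reply(make_header(RCODE_NOERROR, 1)))   # e.g. the A query
print(classify_reply(make_header(RCODE_NOERROR, 0)))   # e.g. the AAAA query
print(classify_reply(make_header(RCODE_NXDOMAIN, 0)))  # non-existent name
print(classify_reply(None))                            # nothing came back
```

Under this model, the AAAA lookup for a service that only has an IPv4 address falls into the NODATA case rather than the "no reply" case.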