Bug 13006 - nslookup problems
Status: RESOLVED FIXED
Alias: None
Product: Busybox
Classification: Unclassified
Component: Networking
Version: 1.31.x
Hardware: PC Linux
Importance: P5 blocker
Target Milestone: ---
Assignee: unassigned
 
Reported: 2020-06-11 14:36 UTC by teke97@gmail.com
Modified: 2020-12-31 00:42 UTC

Description teke97@gmail.com 2020-06-11 14:36:26 UTC
I have installed Prometheus in Kubernetes 1.17, in a deployment that is based on the busybox image.
After investigating, I realised that the problem may be in the busybox image.
The problem seems to be similar to https://github.com/kubernetes/kubernetes/issues/66924#issuecomment-411806846




Part of the Kubernetes deployment config:

        - args:
          - -c
          - while true; do nslookup alertmanager-bot; sleep 10; done
          command:
          - /bin/sh
          image: busybox:1.31.1




pod log:

Server:		10.245.0.10
Address:	10.245.0.10:53

** server can't find alertmanager-bot.monitoring.svc.cluster.local: NXDOMAIN

*** Can't find alertmanager-bot.svc.cluster.local: No answer
*** Can't find alertmanager-bot.cluster.local: No answer
*** Can't find alertmanager-bot.monitoring.svc.cluster.local: No answer
*** Can't find alertmanager-bot.svc.cluster.local: No answer
*** Can't find alertmanager-bot.cluster.local: No answer




coredns log:

coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.561Z [INFO] 10.244.0.215:43144 - 19456 "AAAA IN alertmanager-bot.cluster.local. udp 48 false 512" NXDOMAIN qr,aa,rd 141 0.000202924s
coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.562Z [INFO] 10.244.0.215:43144 - 19456 "A IN alertmanager-bot.monitoring.svc.cluster.local. udp 63 false 512" NOERROR qr,aa,rd 124 0.000145229s
coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.562Z [INFO] 10.244.0.215:43144 - 19456 "A IN alertmanager-bot.svc.cluster.local. udp 52 false 512" NXDOMAIN qr,aa,rd 145 0.000084224s
coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.562Z [INFO] 10.244.0.215:43144 - 19456 "A IN alertmanager-bot.cluster.local. udp 48 false 512" NXDOMAIN qr,aa,rd 141 0.000056272s
coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.562Z [INFO] 10.244.0.215:43144 - 19456 "AAAA IN alertmanager-bot.monitoring.svc.cluster.local. udp 63 false 512" NOERROR qr,aa,rd 156 0.000060009s
coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:29:27.562Z [INFO] 10.244.0.215:43144 - 19456 "AAAA IN alertmanager-bot.svc.cluster.local. udp 52 false 512" NXDOMAIN qr,aa,rd 145 0.000051978s


pod log with busybox 1.28.4:

Name:      alertmanager-bot
Address 1: 10.245.48.126 alertmanager-bot.monitoring.svc.cluster.local
Server:    10.245.0.10
Address 1: 10.245.0.10 kube-dns.kube-system.svc.cluster.local

coredns log:

coredns-84c79f5fb4-vkc7j coredns 2020-06-11T14:34:42.790Z [INFO] 10.244.0.204:53241 - 3 "AAAA IN alertmanager-bot.monitoring.svc.cluster.local. udp 63 false 512" NOERROR qr,aa,rd 156 0.000207196s
coredns-84c79f5fb4-bspnj coredns 2020-06-11T14:34:42.792Z [INFO] 10.244.0.204:57444 - 4 "A IN alertmanager-bot.monitoring.svc.cluster.local. udp 63 false 512" NOERROR qr,aa,rd 124 0.000175375s



/ # cat /etc/resolv.conf # the same on both images
nameserver 10.245.0.10
search monitoring.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
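
For reference, the names queried in the logs above follow from the `search` and `ndots` settings. Below is a minimal Python sketch of the conventional resolv.conf search-list expansion. The function name `candidate_names` is hypothetical and this is not busybox's code; the actual query order in a given libc or nslookup build may differ.

```python
def candidate_names(name, search, ndots):
    """Return the names a conventional stub resolver would try, in order.

    Assumption: behaviour as described in resolv.conf(5). A name with a
    trailing dot is absolute; a name with fewer than `ndots` dots is tried
    with each search domain first and as-is last; otherwise it is tried
    as-is first.
    """
    if name.endswith("."):
        # Absolute name: the search list is not applied.
        return [name]
    expanded = [f"{name}.{domain}" for domain in search]
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]


search = ["monitoring.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for candidate in candidate_names("alertmanager-bot", search, ndots=5):
    print(candidate)
```

With `ndots:5`, even a fully qualified service name such as `alertmanager-bot.monitoring.svc.cluster.local` (4 dots) goes through the search list first unless it is written with a trailing dot.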
Comment 1 evans.tucker 2020-12-29 21:21:47 UTC
I see this problem in the uclibc version:

```
$ kubectl run -it --rm evanstest-b5 --image=busybox:1.32.0-uclibc -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup opentelemetry-collector.observability.svc.cluster.local
Server:		10.0.0.10
Address:	10.0.0.10:53

*** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer

*** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer

/ # 
```

The musl and glibc versions partially work:

```
$ kubectl run -it --rm evanstest-b3 --image=busybox:1.32.0-musl -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup opentelemetry-collector.observability.svc.cluster.local
Server:		10.0.0.10
Address:	10.0.0.10:53

*** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer

Name:	opentelemetry-collector.observability.svc.cluster.local
Address: 10.0.228.84

/ # 

$ kubectl run -it --rm evanstest-b4 --image=busybox:1.32.0-glibc -- sh
If you don't see a command prompt, try pressing enter.
/ # nslookup opentelemetry-collector.observability.svc.cluster.local
Server:		10.0.0.10
Address:	10.0.0.10:53

Name:	opentelemetry-collector.observability.svc.cluster.local
Address: 10.0.228.84

*** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer

/ #
```

The Kubernetes community has been working around this issue for a couple of years by pinning to busybox:1.28 - what can we do to help fix this issue?
Comment 2 evans.tucker 2020-12-29 21:25:17 UTC
It looks like this is a duplicate of https://bugs.busybox.net/show_bug.cgi?id=11161, which was closed, but apparently not resolved.
Comment 3 Denys Vlasenko 2020-12-30 13:52:01 UTC
(In reply to evans.tucker from comment #1)
> I see this problem in the uclibc version

uclibc has rather primitive support for resolver routines, so nslookup on uclibc is likely to be buggy: nslookup just overrides the default DNS server address, then uses the standard name resolution in libc to get addresses.

Even such a trivial thing as setting the DNS server address to an IPv6 address does not work.

The LEDE project contributed a better version of nslookup. It is enabled by setting FEATURE_NSLOOKUP_BIG=y, but it requires a libc with decent resolver support: it uses ns_initparse(), ns_msg_count(), ns_parserr() etc. It will not compile on uclibc (did not try uclibc-ng).

So. What version of busybox is it?
Is FEATURE_NSLOOKUP_BIG=y?

> The musl and glibc versions partially work:
> $ kubectl run -it --rm evanstest-b3 --image=busybox:1.32.0-musl -- sh
> If you don't see a command prompt, try pressing enter.
> / # nslookup opentelemetry-collector.observability.svc.cluster.local
> Server:		10.0.0.10
> Address:	10.0.0.10:53
>
> *** Can't find opentelemetry-collector.observability.svc.cluster.local: No answer
>
> Name:	opentelemetry-collector.observability.svc.cluster.local
> Address: 10.0.228.84

What is the exact problem in the above? It says that the AAAA request was answered with no records ("No answer") and the A request was successfully answered with an IP address.
Comment 4 Denys Vlasenko 2020-12-31 00:42:35 UTC
Aha. I see. We indeed differ from bind. Fixed in git:

commit 49fd1d69babc6945175068e8fe4c85713fe5fcdb (HEAD -> master)
Date:   Thu Dec 31 01:39:44 2020 +0100

    nslookup: do not print "No answer" for NODATA replies, closes 13006

    Only print when there was no answer at all.
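
To illustrate the distinction the commit message draws, here is a rough Python sketch (not the busybox C code) that classifies a raw DNS reply by the RCODE and ANCOUNT fields of its 12-byte header, as laid out in RFC 1035. The function name `classify_reply` and the returned labels are made up for illustration; `None` stands for a query that got no reply at all.

```python
import struct

NOERROR = 0
NXDOMAIN = 3


def classify_reply(packet):
    """Classify a raw DNS reply; `packet` is bytes, or None if nothing arrived."""
    if packet is None:
        return "no answer"
    if len(packet) < 12:
        raise ValueError("DNS message shorter than the 12-byte header")
    # Header fields: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (big-endian).
    _id, flags, _qdcount, ancount, _nscount, _arcount = struct.unpack(
        "!6H", packet[:12]
    )
    rcode = flags & 0x000F  # RCODE is the low 4 bits of the flags word
    if rcode == NXDOMAIN:
        return "NXDOMAIN"
    if rcode != NOERROR:
        return f"RCODE {rcode}"
    # NOERROR with an empty answer section: the name exists, but it has
    # no records of the requested type.
    return "NODATA" if ancount == 0 else "answer"


# Header only, flags qr,aa,rd with RCODE 0 and zero answer records.
aaaa_reply = struct.pack("!6H", 19456, 0x8500, 1, 0, 1, 0)
print(classify_reply(aaaa_reply))
```

Under this split, a NOERROR reply with zero answer records (such as an AAAA reply for a name that only has an A record) is NODATA, and only the `None` case corresponds to there being no answer at all.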