Bug 5954

Summary: res_nclose never returns
Product: uClibc Reporter: Greg Beresford <greg.beresford>
Component: NetworkingAssignee: Bernhard Reutner-Fischer <aldot>
Status: RESOLVED FIXED    
Severity: minor CC: uclibc-cvs
Priority: P5    
Version: 0.9.34   
Target Milestone: ---   
Hardware: PC   
OS: Linux   
Host: arm-buildroot-linux-uclibcgnueabi Target: arm-buildroot-linux-uclibcgnueabi
Build: x86_64-linux-gnu
Attachments: demo program
uClibc .config
bug-5954a.patch
bug-5954b.patch

Description Greg Beresford 2013-02-19 16:58:46 UTC
Created attachment 4760 [details]
demo program

The connman 1.10 package in buildroot uses the res_nclose function as part of shutting down it's internal resolver representation. Since commit
1f9808dbf74fb771f6a631a4f37a2f2c3b9e4643 this function has a noreturn attribute, and a while(1)..; at the end of it, causing it to spin forever and use 100% cpu when res_nclose is called.

This doesn't seem correct and the noreturn attribute and the loop should probably be removed.

I'm using uClibc daily snapshot 20120213, built by buildroot 2012.02_rc1, with the attached .config. Specifically, gcc 4.6.3 and binutils 2.21.1.

Also attached is a little demo program which works when compiled with my host toolchain, but hangs forever after printing 'Initialised' when compiled with uClibc and run on my target.
Comment 1 Greg Beresford 2013-02-19 16:59:11 UTC
Created attachment 4766 [details]
uClibc .config
Comment 2 Bernhard Reutner-Fischer 2013-02-19 18:03:07 UTC
(In reply to comment #0)
> Created attachment 4760 [details]
> demo program
> 
> The connman 1.10 package in buildroot uses the res_nclose function as part of
> shutting down it's internal resolver representation. Since commit
> 1f9808dbf74fb771f6a631a4f37a2f2c3b9e4643 this function has a noreturn
> attribute, and a while(1)..; at the end of it, causing it to spin forever and
> use 100% cpu when res_nclose is called.
> 
> This doesn't seem correct and the noreturn attribute and the loop should
> probably be removed.

yes, looks like.
Comment 3 Bernhard Reutner-Fischer 2013-02-20 12:48:10 UTC
Created attachment 4772 [details]
bug-5954a.patch

revert aforementioned commit
Comment 4 Bernhard Reutner-Fischer 2013-02-20 12:55:02 UTC
Created attachment 4778 [details]
bug-5954b.patch

Wouldn't we need to also clear that res_state that you pass in like in the b.patch?
Consider your testcase:

#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>
#include <netdb.h>
#include <assert.h>

int main(int argc, char **argv)
{
    int r;
    struct __res_state state;

    r = res_ninit(&state);
    if (r < 0)
        herror("init");
    printf("Initialised\n");
#ifdef __UCLIBC__
    assert (state._u._ext.nscount >= 0);
#else
    /* Looks like glibc did not populate the nameservers yet,
       perhaps since there was no query yet? FIX the testcase,
       make a query against that user-res-state..
     */
    assert (state._u._ext.nscount == 0);
#endif
    res_nclose(&state);

    printf("Closed\n");
    assert (state._u._ext.nscount == 0);

    return 0;
}

That connman application does odd things, i guess it should not need to do that at all in the first place..
Comment 5 Greg Beresford 2013-02-20 14:47:07 UTC
Yes - it looks like res_nclose should close the res_state that is passed, and not the global one.

Connman only uses the res_ninit/res_nclose with a user res_state so that it can read the nameservers out to add to its own internal list. So connman won't care what really happens in res_nclose, but I imagine other applications might get an unpleasant surprise if res_nclose closes the global state, not the user-supplied one.

On closer inspection, connman expects different behaviour in the population of res_state than what uClibc provides. It expects res.nscount and res.nsaddr_list to contain both v4 and v6 nameservers, and only gets the ipv6 address from res._u._ext.nsaddrs[i]->sin6_addr if res.nsaddr_list[i].sin_family == AF_INET6. 

Example largely created from code taken from connman:

/etc/resolv.conf:
nameserver 127.0.0.1
nameserver ::1
nameserver ::2

restest2.c:
#include <sys/types.h>
#include <netinet/in.h>
#include <arpa/nameser.h>
#include <resolv.h>
#include <netdb.h>

int main(int argc, char **argv)
{
    int r, i;
    struct __res_state state;

    r = res_ninit(&state);
    if (r < 0)
        herror("init");
    printf("Initialised: %d nameservers, %d ext nameservers\n", state.nscount, state._u._ext.nscount);


    for (i = 0; i < state.nscount; i++) {
        char buf[100];
        int family = state.nsaddr_list[i].sin_family;
        void *sa_addr = &state.nsaddr_list[i].sin_addr;

        if (family != AF_INET && state._u._ext.nsaddrs[i]) {
            family = AF_INET6;
            sa_addr = &state._u._ext.nsaddrs[i]->sin6_addr;
        }

        if (family != AF_INET && family != AF_INET6)
            continue;

        if (inet_ntop(family, sa_addr, buf, sizeof(buf)))
            printf("Read nameserver: %s\n", buf);
    }

    res_nclose(&state);

    printf("Closed: %d nameservers, %d ext nameservers\n", state.nscount, state._u._ext.nscount);

    return 0;
}

glibc output:
Initialised: 3 nameservers, 0 ext nameservers
Read nameserver: 127.0.0.1
Read nameserver: ::1
Read nameserver: ::2
Closed: 3 nameservers, 0 ext nameservers

uClibc output:
Initialised: 1 nameservers, 3 ext nameservers
Read nameserver: 127.0.0.1
Closed: 1 nameservers, 3 ext nameservers

I haven't got time to try to put together a patch now, but I can when I return from holiday in 2 weeks.
Comment 6 Bernhard Reutner-Fischer 2014-04-01 16:48:23 UTC
I suppose this was fixed on master by 3d791bda2e68136e5cfc52b5386e0db805b5d3ba and 950fcf0f68732a9aafacb5dfd33e7a7612141722

thanks!