Bug 13776 - udhcpc allows short lease to expire
Summary: udhcpc allows short lease to expire
Status: RESOLVED FIXED
Alias: None
Product: Busybox
Classification: Unclassified
Component: Networking (show other bugs)
Version: 1.33.x
Hardware: All Linux
: P5 normal
Target Milestone: ---
Assignee: unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-04-29 20:05 UTC by Paul
Modified: 2022-04-29 20:47 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:


Attachments
log of new udhcpc code (4.02 KB, text/plain)
2021-05-14 04:41 UTC, Paul
Details
Avoid infinite loop broadcasting renew (725 bytes, patch)
2021-05-22 13:43 UTC, Paul
Details
log of new udhcpc code with patch (2.44 KB, text/plain)
2021-05-22 13:48 UTC, Paul
Details
log of even newer udhcpc code (5.61 KB, text/plain)
2021-06-01 04:40 UTC, Paul
Details
log of same udhcpc code with new modem firmware (10.61 KB, text/plain)
2021-10-08 03:21 UTC, Paul
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Paul 2021-04-29 20:05:53 UTC
The 2 lines below pretend that the lease is 122s when it is in fact shorter. The udhcp_run_script() informs the client of the actual lease time not the make believe one.

https://git.busybox.net/busybox/tree/networking/udhcp/dhcpc.c#n1749

A modem that gives out a temporary IP address with a 60s lease time to dd-wrt will see the lease time reach zero. Busybox dhcpc.c will only attempt to renew after half the 122s make believe lease time (61s) which is after the actual lease time (60s) has already expired.

The fudging of tiny lease times first appeared here enforcing a 10s minimum:
"udhcpc: guard against zero lease time"

https://git.busybox.net/busybox/commit/networking/udhcp/dhcpc.c?id=fc9e1080320e52fb97ae694627d8101af64f3dad

It then became 16s here:
"udhcpc: code shrink"

https://git.busybox.net/busybox/commit/networking/udhcp/dhcpc.c?id=0fd4347cedd1d32341e827c04dd36aaaf2e8040d

Lastly a large jump to enforcing a 122s minimum:

"udhcpc: ensure at least one unicast renew attempt"

https://git.busybox.net/busybox/commit/networking/udhcp/dhcpc.c?id=c05aa6a776ab2420a42c041a3b5d45db587fd9ef

After the expired lease, unpredictable connection failures occur between my modem and router.
Comment 1 Denys Vlasenko 2021-04-29 23:39:27 UTC
Please add more details.

Your ISP gives you 60 second-long leases?

You yourself configured DCP server to hand out 60 second lease? If yes, why?
Comment 2 Paul 2021-04-30 02:55:12 UTC
The modem is supplied by the ISP so I have no control over it. When the modem boots up it immediately (within seconds) gives the router a private ipv4 address (192.168.x.x) with a 60 second lease. The modem has a web interface on that private subnet so users can view how efforts to connect to the internet are coming along as well as some statistics, firmware status...

Once the modem does get online it gives the router a publicly routable IPv4 address with a lease of at least an hour.

With stock router firmware the handoff from the short leased private address to the long leased public address was fast and smooth. With dd-wrt (which uses busybox's udhcpc) the handoff is long, shaky and sometimes fails altogether.

I'm not a networking expert and don't even understand what the comment "udhcpc: ensure at least one unicast renew attempt" means when the change to 122s was introduced. The short lease does reach zero every time which can't be good.
Comment 3 Denys Vlasenko 2021-04-30 08:58:02 UTC
(In reply to Paul from comment #2)
The modem is supplied by the ISP so I have no control over it. When the modem boots up it immediately (within seconds) gives the router a private ipv4 address (192.168.x.x) with a 60 second lease.

Please clarify. Modem and router are two different devices? Modem is connected to ISP, and router is connected to this modem? Or I misunderstand?

> Once the modem does get online it gives the router a publicly routable IPv4 address with a lease of at least an hour.
> With stock router firmware the handoff from the short leased private address to the long leased public address was fast and smooth. With dd-wrt (which uses busybox's udhcpc) the handoff is long, shaky and sometimes fails altogether.

Please describe how it happens when it fials altogether. Waiting an extra minute should not cause a permanent failure, I need to know more to understand why that happens. udhcpc log (with increased verbosity if necessary) should show what's going on.

> I'm not a networking expert and don't even understand what the comment "udhcpc: ensure at least one unicast renew attempt" means when the change to 122s was introduced.

It means "avoid always renewing the lease using broadcasts".
Comment 4 Paul 2021-04-30 13:59:57 UTC
You are exactly right. The ISP's modem has a coaxial cable coming in. It also has a single ethernet port going out to the wireless router's WAN port. I have installed dd-wrt on the wireless router that I bought/own.

After 61s (half of the 122s "assumed" lease time) udhcpc will try to renew but this fails since the actual lease (60s) is already expired. There are timeouts, delays, repeated renew reattempts with progressively shorter timeouts until udhcpc eventually decides that the lease is lost. After a longer than normal delay, this usually ends up with the internet working but maybe once every week or two, manual dhcp release/renew or even router reboot are required. The modem is turned off nightly for family reasons but the router is always on.

There also could be a bug in dd-wrt but the long delays and dd-wrt displaying a remaining lease time of zero during all of the failed renew attempts, is a result of this 122s lease assumption. Note that udhcpc assumes 122s but correctly reports 60s to dd-wrt.

I'll try to get some actual udhcpc logs.

What is special about 60s timeouts? Why is udhcpc selecting (60s + 1s )*2 ?

Thanks for the help/explanations.
Comment 5 Denys Vlasenko 2021-05-01 10:37:56 UTC
Fixed in git:

commit a1a77ad5eae7384dd6b1ccaa436a1df48c3c9cd7
Author: Denys Vlasenko <vda.linux@googlemail.com>
Date:   Sat May 1 11:54:08 2021 +0200

    udhcpc[6]: untangle "timeout" and "remaining lease"; reduce min lease to 30 seconds

    This allows to fix a problem that we wait for renew replies
    for up to half the lease (!!!) if they never come.

    Make it so that lease of 60 seconds is not "rounded up" to 120 seconds -
    set lower "sanity limit" to 30 seconds.

    After 3 failed renew attempts, switch to rebind.

    After this change, we can have more flexible choice of when to do
    the first renew - does not need to be equal to lease / 2.
Comment 6 Paul 2021-05-01 14:28:59 UTC
The decoupling of timeout and remaining lease is great. I admit I don't fully understand the implementation. The 30 second sanity lower limit will play well with my modem's initial 60s lease. The max 3 failed renew limit will speed things up to figure out the lease is lost which happens on my modem by design with the private ipv4 address once the publicly routable one is obtained.

Fantastic work, thanks!
Comment 7 Paul 2021-05-14 04:41:08 UTC
Created attachment 8941 [details]
log of new udhcpc code

Dropped the new udhcpc source code into my dd-wrt build. Makefile still says v1.33.0 but the udhcpc code only is from master.

The new code now uses the 60s lease and renews it successfully after half that (so every 30s) a few times (nice!)

Once the modem has a public IP address it no longer allows renewal of the private IP address and the new code now only attempts to renew 3 times (nice!)

Unfortunately after the 3 failed renew attempts it starts broadcasting renew attempts forever.

This line of code is always true so we are stuck:

https://git.busybox.net/busybox/tree/networking/udhcp/dhcpc.c#n1548

Before this code change we were checking if timeout was positive which eventually became false which allowed us to reach the code below ie:

/* Timed out, enter init state */

I added a paul_* debug message printing lease_remaining any time it changes and it starts at zero then always stays at 60. We no longer ever reach the /* Timed out, enter init state */ code. Those lease_remaining log messages also print the timeout variable in case that helps.
Comment 8 Paul 2021-05-22 13:43:10 UTC
Created attachment 8971 [details]
Avoid infinite loop broadcasting renew
Comment 9 Paul 2021-05-22 13:48:08 UTC
Created attachment 8976 [details]
log of new udhcpc code with patch
Comment 10 Paul 2021-05-22 14:16:59 UTC
Created a patch to avoid the infinite loop broadcasting renew attempts shown in the previously attached logs. Used the same code that limits us to three renew attempts to also limit us to three broadcast renew attempts. This works in the case of my modem (new logs attached).

I don't know if this can fail if the REBINDING state is entered through a different code path with packet_num not being reset. Could we go straight from failed retries to /* Timed out, enter init state */ without going through all the broadcast renew attempt failures?

Perhaps it would be a better idea to reduce the lease_remaining variable as retries happen to eventually reach zero or less and allow us to reach /* Timed out, enter init state */ that way?

If I can provide any useful information to create a proper fix please let me know.
Comment 11 Denys Vlasenko 2021-05-31 22:48:04 UTC
Hopefully fixed now:

commit 687f41f10bd5f493e82395e127f7d4d52b11d308
Date:   Tue Jun 1 00:19:03 2021 +0200

    udhcpc[6]: fix "untangle timeout and remaining lease" fallout

    As reported in bug 13776, before this fix the renew never times out.
Comment 12 Paul 2021-06-01 04:40:17 UTC
Created attachment 8986 [details]
log of even newer udhcpc code

Nailed it! Everything is working as designed:

1) private IP 60s lease obtained and renewed twice
2) three failed renews after modem has public IP
3) three failed broadcasting renew after modem has public IP
4) lease lost, entering init state and public IP lease obtained

The attached log of the new code includes "paul_" messages that display "lease_remaining" and "timeout" each time "lease_remaining" is modified. Just delete these lines to see the regular logs.

The commit is June 1st and the log messages testing the commit are May 31st. :) I'm easily amused...
Comment 13 Paul 2021-10-08 03:21:46 UTC
Created attachment 9111 [details]
log of same udhcpc code with new modem firmware

New modem firmware takes almost six minutes to respond to the "broadcasting discover" after the modem has acquired the public IP address. In the previous firmware it would respond immediately. Leaving this bug closed as this is likely a bug for the ISP or even the modem provider.
Comment 14 Denys Vlasenko 2021-10-08 14:55:50 UTC
It took only 18 seconds for your router to time out on renew's and reach "lease lost, entering init state" state.

However, then the modem did not respond at all to your broadcast discovers, for ~6 minutes.
Comment 15 Paul 2022-04-29 20:47:18 UTC
Modem got another automatic firmware update from the ISP and busybox udhcpc is running perfectly again so as expected the modem firmware was to blame.