Bug 9471 - wget doesn't work on my system (http)
Status: RESOLVED FIXED
Alias: None
Product: Busybox
Classification: Unclassified
Component: Networking
Version: unspecified
Hardware: PC Linux
Importance: P5 normal
Target Milestone: ---
Assignee: unassigned
URL:
Keywords:
Duplicates: 9981
Depends on:
Blocks:
 
Reported: 2016-12-09 05:12 UTC by mrs.sub
Modified: 2017-06-22 04:01 UTC

See Also:
Host:
Target:
Build:


Description mrs.sub 2016-12-09 05:12:45 UTC
Hi. wget refuses to wget on my systems (ArchLinux 4.8.12 and custom 4.9-rc8)

busybox versions: 
    v1.26.0.git snapshot from site (2016-12-09 00:20)
    v1.25.1 from site              (2016-10-07 15:24)

GCC version:
    6.2.1 20160830 (from ArchLinux repositories)



After I request `wget http://localhost/` it returns

   | Connecting to localhost (127.0.0.1:80)
   | wget: error getting response



I've tried building it in many different ways: statically linked, dynamically linked, 
using glibc or musl. The result was always the same. Meanwhile, wget from busybox 1.25.1 
from the official Arch repos works. An `strace` comparison gave me a hint that the source 
code has been changed:

strace from version that *works*:

    socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
    connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
    ioctl(3, TIOCGWINSZ, 0x7ffcd6cf39a8)    = -1 ENOTTY (Inappropriate ioctl for device)
    writev(3, [{iov_base="GET / HTTP/1.1\r\nHost: localhost\r"..., iov_len=72}, {iov_base=NULL, iov_len=0}], 2) = 72
    setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={900, 0}}, {it_interval={0, 0}, it_value={899, 999701}}) = 0
    readv(3, [{iov_base="", iov_len=0}, {iov_base="HTTP/1.1 200 OK\r\nServer: nginx/1"..., iov_len=1024}], 2) = 497

strace from version that *doesn't work*:

    socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
    connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
    ioctl(3, TIOCGWINSZ, 0x7fff515b5668)    = -1 ENOTTY (Inappropriate ioctl for device)
    writev(3, [{iov_base="GET / HTTP/1.1\r\nHost: localhost\r"..., iov_len=72}, {iov_base=NULL, iov_len=0}], 2) = 72
  * shutdown(3, SHUT_WR)                    = 0
    setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={900, 0}}, {it_interval={0, 0}, it_value={899, 966622}}) = 0
    readv(3, [{iov_base="", iov_len=0}, {iov_base="", iov_len=1024}], 2) = 0


And then I've found this commit: https://git.busybox.net/busybox/commit/networking/wget.c?id=de3da6bf87a579a344b0581c6f2ce6a40166b432 
I commented the line out, recompiled busybox, and got wget working. I don't really know 
if it affects others' systems. It may be that my host is misconfigured.

My guess is that the kernel closes the write side of the connection before the whole 
request is transmitted, but I really don't know.
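
On the guess above: with TCP, `shutdown(SHUT_WR)` queues the FIN behind data that has already been written, so the request itself should not be cut short. The following is a minimal Python sketch of that behaviour on a loopback connection, not busybox code; the helper names `count_bytes_until_eof` and `send_then_half_close` and the 1,000,000-byte payload are assumptions made for illustration.

```python
import socket
import threading

def count_bytes_until_eof(listener, result):
    """Accept one connection and count every byte received until EOF."""
    conn, _ = listener.accept()
    with conn:
        total = 0
        while True:
            chunk = conn.recv(65536)
            if not chunk:  # EOF: the peer's FIN, delivered after all queued data
                break
            total += len(chunk)
    result.append(total)

def send_then_half_close(payload_size):
    """Send payload_size bytes, half-close, and return what the peer received."""
    result = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        reader = threading.Thread(
            target=count_bytes_until_eof, args=(listener, result), daemon=True
        )
        reader.start()
        with socket.create_connection(listener.getsockname(), timeout=5) as client:
            client.sendall(b"x" * payload_size)
            # Same call that shows up in the strace of the failing wget.
            client.shutdown(socket.SHUT_WR)
            reader.join(timeout=5)
    return result[0]

received = send_then_half_close(1_000_000)
print(received)
```

If the receiver sees the full payload before EOF, the half-close by itself is not truncating the request, which points at the peer's reaction to the FIN rather than at the kernel.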
Comment 1 Denys Vlasenko 2016-12-10 19:41:17 UTC
Works for me:

$ strace -ozz -tt -s99 ./busybox wget http://localhost/
Connecting to localhost (127.0.0.1:80)
index.html           100% |*******************************|  1321   0:00:00 ETA

Log:

20:39:28.722827 write(3, "GET / HTTP/1.1\r\nHost: localhost\r\nUser-Agent: Wget\r\nConnection: close\r\n\r\n", 72) = 72
20:39:28.722953 shutdown(3, SHUT_WR)    = 0
20:39:28.723047 alarm(900)              = 900
20:39:28.723123 read(3, "HTTP/1.0 200 OK\r\n", 4096) = 17
20:39:28.723517 alarm(900)              = 900
20:39:28.723591 read(3, "\r\n<html><head><title>Index of /</title>\n<style>\ntable {\nwidth:100%;\nbackground-color:#fff5ee;\nborde"..., 4096) = 1323
Comment 2 4mlinux 2017-01-11 16:41:54 UTC
I confirm this bug. I ran some tests with my domain (www.4mlinux.com), which is managed via CloudFlare. When its IP is not hidden, wget is able to connect. When the real IP is hidden (the standard setting in CloudFlare and similar services), wget ends up with the network unreachable error. 
I also tried to connect to various popular sites in my country with the same result: wget was able to reach only some of them.

The problem described above did not exist in the BusyBox 1.25 series (and earlier).

Regards,
zk1234
Comment 3 4mlinux 2017-01-11 17:01:33 UTC
If someone wants to reproduce this error: 4mlinux.com is now set up to hide its real IP. You will be able to "wget" 4mlinux.com using the BusyBox 1.25 series, while the BusyBox 1.26 series won't be able to connect.

Regards,
zk1234
Comment 4 Denys Vlasenko 2017-01-11 19:09:08 UTC
(In reply to 4mlinux from comment #3)
This is what happens:

20:03:03.976590 write(2, "Connecting to ", 14) = 14
20:03:03.976804 write(2, "4mlinux.com", 11) = 11
20:03:03.976953 write(2, " (", 2)       = 2
20:03:03.977106 write(2, "104.28.8.32:80", 14) = 14
20:03:03.977201 write(2, ")\n", 2)      = 2
20:03:03.977282 alarm(900)              = 0
20:03:03.977353 socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 3
20:03:03.977439 connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("104.28.8.32")}, 16) = 0
20:03:03.982732 fcntl(3, F_GETFL)       = 0x2 (flags O_RDWR)
20:03:03.982829 ioctl(3, TCGETS, 0xffdd354c) = -1 ENOTTY (Inappropriate ioctl for device)
20:03:03.982917 write(3, "GET / HTTP/1.1\r\nHost: 4mlinux.com\r\nUser-Agent: Wget\r\nConnection: close\r\n\r\n", 74) = 74
20:03:03.983025 shutdown(3, SHUT_WR)    = 0
20:03:03.983110 alarm(900)              = 900
20:03:03.983174 read(3, "", 1024)       = 0
20:03:03.997494 write(2, "wget: error getting response\n", 29) = 29
20:03:03.997859 exit(1)                 = ?
20:03:03.998347 +++ exited with 1 +++

The peer simply does not return anything. It closes its connection.

Probably it detects wget closing its writing end (the shutdown(3, SHUT_WR) thing).

The point is, closing the write side of the socket is _valid_ for HTTP.
wget sent the full request, it won't be sending anything more: it will only receive the response, and that's it.
(It even said so with "Connection: close" header, although it should work without that too).
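
The exchange described above can be sketched in Python against a local peer that tolerates the half-close. This is an illustration only: `tolerant_server`, `fetch_with_half_close`, `demo` and the canned `RESPONSE` are hypothetical names and data, not part of busybox or of any real HTTP server.

```python
import socket
import threading

# Canned reply used by the toy server below (an assumption for this sketch).
RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok"

def tolerant_server(listener):
    """Read one request up to the blank line, then answer regardless of any FIN."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        conn.sendall(RESPONSE)

def fetch_with_half_close(addr):
    """Send a full request, close only the write side, then read until EOF."""
    with socket.create_connection(addr, timeout=5) as s:
        s.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
        s.shutdown(socket.SHUT_WR)  # nothing more to send; reading still works
        reply = b""
        while True:
            chunk = s.recv(1024)
            if not chunk:
                break
            reply += chunk
        return reply

def demo():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        server = threading.Thread(target=tolerant_server, args=(listener,), daemon=True)
        server.start()
        reply = fetch_with_half_close(listener.getsockname())
        server.join(timeout=5)
    return reply

reply = demo()
print(reply.split(b"\r\n")[0].decode())
```

A peer written this way answers the request even though the client's FIN has already arrived, which matches the "Works for me" log in comment 1.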
Comment 5 Denys Vlasenko 2017-01-11 19:19:23 UTC
Applied workaround in git, but I think it's a broken server.
Comment 6 4mlinux 2017-01-11 21:53:00 UTC
(In reply to Denys Vlasenko from comment #5)
Wget "talks to" cloudflare-nginx (the real 4MLinux LAMP server is, in a sense, hidden). CloudFlare (and the like) are becoming more and more popular. Therefore I decided to report this issue.

Thanks for your explanation!
zk1234
Comment 7 Denys Vlasenko 2017-01-12 09:18:43 UTC
(In reply to 4mlinux from comment #6)
> Wget "talks to" cloudflare-nginx

This explains things. Cloudflare is a tool to counteract DDoS attacks. Many DDoS attacks are done with tools which try to open a gazillion connections, and to do that efficiently, the attacking tool _closes_ its write side (to conserve its resources and be able to send a larger number of requests).

So Cloudflare thinks that "this guy looks like an attacker (even though its request is technically valid, browsers don't do that (they use persistent connections, for one))" and drops the connection.
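
The suspected behaviour can be modelled with a toy peer in Python. This is purely a hypothetical model: it is not Cloudflare's actual logic, which this report does not show. The names `strict_server` and `run_once` and the 0.5 second wait are assumptions chosen so the sketch behaves predictably on loopback.

```python
import select
import socket
import threading

RESPONSE = b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nok"

def strict_server(listener):
    """Hypothetical peer: drop the client if it half-closes before the reply."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(1024)
            if not chunk:
                return
            request += chunk
        # Wait briefly; a readable socket whose peeked data is empty means a FIN.
        readable, _, _ = select.select([conn], [], [], 0.5)
        if readable and conn.recv(1, socket.MSG_PEEK) == b"":
            return  # treat the early half-close as an aborted client
        conn.sendall(RESPONSE)

def run_once(half_close):
    """Fetch from the toy peer, optionally calling shutdown(SHUT_WR) first."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        server = threading.Thread(target=strict_server, args=(listener,), daemon=True)
        server.start()
        with socket.create_connection(listener.getsockname(), timeout=5) as s:
            s.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")
            if half_close:
                s.shutdown(socket.SHUT_WR)
            reply = b""
            while True:
                chunk = s.recv(1024)
                if not chunk:
                    break
                reply += chunk
        server.join(timeout=5)
    return reply

print(len(run_once(half_close=True)))   # empty reply, like read(3, "", 1024) = 0
print(len(run_once(half_close=False)))  # full reply when the write side stays open
```

Against such a peer the only difference between the working and failing client is the `shutdown` call, which is consistent with the two strace logs in the description.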
Comment 8 Neil MacLeod 2017-06-22 04:01:28 UTC
*** Bug 9981 has been marked as a duplicate of this bug. ***