Bug 6836

Summary: wget redirected page use wrong authentication info
Product: Busybox Reporter: Dalei Liu <daleiliu>
Component: NetworkingAssignee: unassigned
Status: RESOLVED FIXED    
Severity: minor CC: busybox-cvs
Priority: P5    
Version: 1.22.x   
Target Milestone: ---   
Hardware: Other   
OS: Linux   
Host: Target:
Build:
Attachments: wget patch to add --user and --password

Description Dalei Liu 2014-02-01 00:58:51 UTC
When wget an URL with username/password and received a 302 message, the second request sometimes uses wrong username/password info.

For example, run this command:
wget http://user:password@primary.example.com/my-test-file

If the server sends back 302 and redirect to:
http://backup.example.com:80/my-test-file

There will be a chance it uses "backup.example.com:80" or some other random string as username/password.
Comment 1 Dalei Liu 2014-02-01 01:09:32 UTC
I checked networking/wget.c file and found that in parse_url() function, the h->user pointed to h->allocated, which will be freed each time it parses a new url.  So when it's called the second time after 302 response in line 889, the h->user pointed to a freed area.

Here is my patch as a workaround, but I didn't think it's a good fix.  I'm not sure if a redirected page should use new location url with or without old username/password.  If it's not allowed, probably we should set h->user to NULL in parse_url().  Or maybe we should implement --user and --password just like GNU wget to avoid confusion.

diff --git a/networking/wget.c b/networking/wget.c
index d6c509e..8b2800d 100644
--- a/networking/wget.c
+++ b/networking/wget.c
@@ -327,13 +327,21 @@ static void parse_url(const char *src_url, struct host_info *h)
 
 	sp = strrchr(h->host, '@');
 	if (sp != NULL) {
+		char *user;
+
 		// URL-decode "user:password" string before base64-encoding:
 		// wget http://test:my%20pass@example.com should send
 		// Authorization: Basic dGVzdDpteSBwYXNz
 		// which decodes to "test:my pass".
 		// Standard wget and curl do this too.
 		*sp = '\0';
-		h->user = percent_decode_in_place(h->host, /*strict:*/ 0);
+		user = percent_decode_in_place(h->host, /*strict:*/ 0);
+		if (user) {
+			// release previous user info
+			if (h->user)
+				free(h->user);
+			h->user = strdup(user);
+		}
 		h->host = sp + 1;
 	}
Comment 2 Denys Vlasenko 2014-02-03 13:11:16 UTC
Thanks for investigating!

This should fix it:

http://git.busybox.net/busybox/commit/?id=d353bfff467517608af7468198431d32406bb943
Comment 3 Dalei Liu 2014-02-03 17:34:31 UTC
Thanks for the quick response.  This patch will surely work.  But here is what I observed from GNU wget:

1. It only sends user/password after the 401 from the server.

2. If the first http request has user/password inside the URL, it will use it for this original request but not the second 302 redirected URL.  It makes sense because the user/password is only a part of the original URL.

3. If --user and --password is specified, it applies to both original and redirected requests, and if there is another set of user/password inside the URL, wget will override them with the --user/--password settings.

In the patch, it implies the redirected URL inherits the user/password from the original URL, which may or may not what the user wants.  I could not find a define behaviour of this situation from the RFCs, but it makes it a little bit different between busybox wget and GNU version.
Comment 4 Dalei Liu 2014-02-03 22:44:45 UTC
Created attachment 5210 [details]
wget patch to add --user and --password

Attached file is my patch to add --user and --password long options to wget.  Besides the new options, this patch does not apply user/password inside an URL to the redirected page, which is different than the previous patches.
Comment 5 Denys Vlasenko 2014-02-04 12:26:46 UTC
(In reply to comment #3)
> Thanks for the quick response.  This patch will surely work.  But here is what
> I observed from GNU wget:
> 
> 1. It only sends user/password after the 401 from the server.
> 
> 2. If the first http request has user/password inside the URL, it will use it
> for this original request but not the second 302 redirected URL.  It makes
> sense because the user/password is only a part of the original URL.

This will work for GNU wget since after redirect to a protected resource,
it gets 401 from it, and responds with Authorization: header.

Since we don't do "wait for 401" thing, we should send Authorization:
right away.
Comment 6 Dalei Liu 2014-02-04 17:29:56 UTC
Yes, GNU wget only sends Authorization after 401, but doesn't reuse the credentials in the original URL.
Comment 7 Dalei Liu 2014-02-04 17:33:04 UTC
It's not about when we send Authorization header, right away or after 401, but about if we want to reuse the user/password inside the original URL.
Comment 8 Denys Vlasenko 2014-02-11 10:29:12 UTC
(In reply to comment #6)
> Yes, GNU wget only sends Authorization after 401, but doesn't reuse the
> credentials in the original URL.

What happens when the following occurs:

- wget contacts the server with normal request
- server responds with 401
- wget resends request, this time with Authorization: header with username/pwd from URL
- server responds with 302
- wget resends normal request
- server responds with 401
- wget resends request with Authorization: header with WHICH username/pwd?