Bug 14396 - Github schema is broken
Summary: Github schema is broken
Status: RESOLVED WONTFIX
Alias: None
Product: buildroot
Classification: Unclassified
Component: Other
Version: 2021.08.2
Hardware: All Linux
Importance: P5 normal
Target Milestone: ---
Assignee: Yann E. MORIN
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-11-26 15:13 UTC by Olga
Modified: 2021-11-28 16:49 UTC
CC List: 2 users

See Also:
Host:
Target:
Build:


Attachments

Description Olga 2021-11-26 15:13:58 UTC
Github seems to have adopted a new schema for releases: instead of:

https://github.com/<user>/<project>/archive/<version>/<project>-<version>.tar.gz

it's now:

https://github.com/<user>/<project>/archive/refs/tags/<version>.tar.gz

which is then redirected to a location containing <project>-<version>.tar.gz.
This effectively breaks at least jsmn and gtest, but probably all github packages.
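For illustration, the two schemas can be compared with a small shell sketch (the helper names are mine, not Buildroot's). The breakage comes from the fact that, under the new schema, the `<project>-<version>.tar.gz` name no longer appears in the request URL, only in the redirect target:

```shell
# old_github_url: the schema the Buildroot github macro relies on;
# the requested filename embeds <project>-<version>.
old_github_url() {
    echo "https://github.com/$1/$2/archive/$3/$2-$3.tar.gz"
}

# new_github_url: the schema Github switched to; the filename part
# is just <version>.tar.gz, and works for tags only.
new_github_url() {
    echo "https://github.com/$1/$2/archive/refs/tags/$3.tar.gz"
}

old_github_url google googletest release-1.11.0
new_github_url google googletest release-1.11.0
```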

Steps to reproduce:

1. Download a fresh Buildroot, e.g. the master branch.
2. Run make menuconfig and select gtest
3. In menuconfig, disable BR2_BACKUP_SITE
4. Run make source

This results in GTest download failing with error 404:

wget --passive-ftp -nd -t 3 -O '/home/user/projects/buildroot/output/build/.gtest-1.11.0.tar.gz.FcavV0/output' 'https://github.com/google/googletest/archive/release-1.11.0/gtest-1.11.0.tar.gz' 
--2021-11-26 18:12:27--  https://github.com/google/googletest/archive/release-1.11.0/gtest-1.11.0.tar.gz
Location: https://codeload.github.com/google/googletest/tar.gz/release-1.11.0/gtest-1.11.0 [following]
--2021-11-26 18:12:27--  https://codeload.github.com/google/googletest/tar.gz/release-1.11.0/gtest-1.11.0
Connecting to proxy.elvees.com (proxy.elvees.com)|192.168.1.15|:3128... connected.
Proxy request sent, awaiting response... 404 Not Found
2021-11-26 18:12:28 ERROR 404: Not Found.
Comment 1 Jens Maus 2021-11-27 12:03:38 UTC
I am seeing the same behaviour here, so I can reproduce it.
Comment 2 Olga 2021-11-27 21:12:15 UTC
To save maintainers some trouble, I've been brainstorming how to fix this issue.

The main problem I see is that the value that should be assigned to <PACKAGE>_SOURCE and the actual name of the .tar.gz now differ. The bluntest solution is to define <PACKAGE>_SOURCE for each package as something like release-$(PACKAGE_VERSION).tar.gz, but that would mean all mirrors of those packages fail and have to be rebuilt.

Instead, maybe $(call github,author,repo,version) should define a fake URI protocol called github, which allows us to bodge the download process via a new script located in support/download/github, saving the original <PACKAGE>_SOURCE definitions while allowing download from unusual URLs that github provides.
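A rough sketch of what such a backend's URI rewriting could look like (the github:// URI format, the function name, and the backend interface are all hypothetical, not existing Buildroot code):

```shell
#!/bin/sh
# Hypothetical sketch for a support/download/github backend: rewrite
# a fake "github://<user>/<project>/<version>" URI into the new
# archive URL, so the original <PACKAGE>_SOURCE name can be kept for
# the output file.

github_uri_to_url() {
    rest=${1#github://}
    user=${rest%%/*};    rest=${rest#*/}
    project=${rest%%/*}; version=${rest#*/}
    echo "https://github.com/$user/$project/archive/refs/tags/$version.tar.gz"
}

# The real backend would then fetch to the caller-provided output file:
#   wget -O "$output" "$(github_uri_to_url "$uri")"
```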
Comment 3 Yann E. MORIN 2021-11-27 21:32:43 UTC
Olga, All,

Thanks for the report.

Indeed, this is very broken: for tags, github.com returns 404, while for
hashes, it returns 500. It sometimes also returns 502 or 504 in either
case.

> it's now:
> https://github.com/<user>/<project>/archive/refs/tags/<version>.tar.gz

This unfortunately does not work either: it returns 500:

    wget https://github.com/google/googletest/refs/tags/release-1.11.0.tar.gz
    [...]
    HTTP request sent, awaiting response... 500 Internal Server Error

Another try a few minutes later yielded:

    HTTP request sent, awaiting response... 502 Bad Gateway

Besides, the URL makes it look like it would only work for tags anyway, not
for hashes.

Of course, this is not going to be easy to fix, and there does not seem to
be an obvious fix anyway...

My opinion is that we should stop using the forges' archive mechanisms, which
have proven flaky over time, and instead use git for what it is, using git-clone
to retrieve git-hosted code (or svn et al.). This will however raise two issues.
First, we have a lot (like a *lot*) of packages using the github macro; fixing
them is quite easy with a bit of sed, and manual review/tweaks in corner cases.
Second, this will trigger much bigger downloads, which can be a bit overwhelming
for large repositories. That latter issue is mitigated by the use of the local
git cache, so it is only really a problem on the first download.
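As a sketch of that git-clone approach (function names are mine, not Buildroot's): a shallow clone keeps the first-download cost down, and git-archive can recreate a <project>-<version>/ tarball layout locally, although the result is not byte-identical to Github's generated archives:

```shell
# Fetch only the named tag, with minimal history, instead of using
# the forge's archive endpoint.
fetch_git_tag() {
    url=$1; tag=$2; dest=$3
    git clone --depth 1 --branch "$tag" "$url" "$dest"
}

# Recreate a <project>-<version>/ tarball layout from the clone
# (note: not byte-identical to a Github-generated archive).
make_archive() {
    dest=$1; project=$2; version=$3; out=$4
    git -C "$dest" archive --format=tar.gz \
        --prefix="$project-$version/" -o "$out" HEAD
}
```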

But right now, github.com seems to be having issues; it is very slow to return
any result, and even the website is very unresponsive... I'll try to investigate
further tomorrow (GMT+1)...

Regards,
Yann E. MORIN.
Comment 4 Olga 2021-11-27 21:54:24 UTC
(In reply to Yann E. MORIN from comment #3)
> This unfortunately does not work either: it returns 500:
>
>    wget https://github.com/google/googletest/refs/tags/release-1.11.0.tar.gz
>    [...]
>    HTTP request sent, awaiting response... 500 Internal Server Error
>
>Another try a few minutes later yielded:
>
>    HTTP request sent, awaiting response... 502 Bad Gateway

That was because you missed the "/archive/" part of the URL between <project> and /refs/tags/. With that part, it currently works.
Comment 5 Yann E. MORIN 2021-11-27 22:14:21 UTC
(In reply to Olga from comment #4)
> That was because you missed the "/archive/" part of the URL between <project> and /refs/tags/.

Indeed. And although this does work for tags, it does not for hashes.

It seems the old schema works for hashes, though. But given a FOO_VERSION,
there is no systematic way to automatically tell hashes apart from tags,
so we have no way to decide how to expand the github macro one way or the other.
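One could imagine a heuristic, shown here only to illustrate why it cannot be made systematic: a full git commit hash is exactly 40 (or, for sha256 repositories, 64) hex characters, but nothing stops a tag from matching the same pattern:

```shell
# Imperfect heuristic: classify a FOO_VERSION as a hash if it is all
# lowercase hex of the right length. A tag that happens to match this
# pattern would be misclassified, which is why this is not systematic.
looks_like_hash() {
    case $1 in
        *[!0-9a-f]*) return 1 ;;  # any character outside [0-9a-f]: not a hash
    esac
    [ ${#1} -eq 40 ] || [ ${#1} -eq 64 ]
}
```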

Regards,
Yann E. MORIN.
Comment 6 Jens Maus 2021-11-28 08:39:15 UTC
(In reply to Yann E. MORIN from comment #3)

> Second, this will trigger much bigger downloads, which can be a bit overwhelming
> for large repositories. That latter issue is smoothened by the use of the local
> git cache, so this is only really a problem on the first download.

Please keep in mind that there are a lot of CI-driven build environments out there (like in my case) where no git cache is available, so packages are always freshly downloaded. The direct archive download, rather than a full-fledged git clone on each run, resulted in significantly smaller (i.e., faster) downloads and thus faster CI build times.
Comment 7 Yann E. MORIN 2021-11-28 15:52:55 UTC
FTR: this is being tracked on Github's side at: https://github.com/github/feedback/discussions/8149
Comment 8 Yann E. MORIN 2021-11-28 16:28:01 UTC
(In reply to Jens Maus from comment #6)
> there are a lot of CI driven build environments out there (like in my case)
> where no git cache environment is available, thus packages are always freshly
> downloaded

Yes, sure, this was the reason for the github helper macro in the first place.

But if Github changes their download scheme (this is not the first time), there
is no way to keep our releases from breaking at some point. For example,
2021.02.7 (the latest LTS release) is now broken, and so is 2021.08.2 (the
latest maintenance release).

And if we have no way to detect whether we need the hash scheme or the tag
scheme, what is the alternative?

(In reply to Olga from comment #2)
> The bluntest solution is to define <PACKAGE>_SOURCE for each package as
> something like release-$(PACKAGE_VERSION).tar.gz,

For some packages, the tag is exactly the version string, for others a
leading 'v' is added, others have a custom, non-standard prefix (like gtest,
which has 'release-').
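To illustrate that variation, a hypothetical version-to-tag mapping would need per-package special cases; only gtest's 'release-' prefix is taken from this report, and the "acme-foo" package name is made up:

```shell
# Hypothetical per-package version-to-tag mapping:
#   gtest:    custom prefix ("release-" + version, per this report)
#   acme-foo: tag is exactly the version string (made-up example)
#   default:  common leading 'v'
version_to_tag() {
    package=$1; version=$2
    case $package in
        gtest)    echo "release-$version" ;;
        acme-foo) echo "$version" ;;
        *)        echo "v$version" ;;
    esac
}
```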

But such a mass fixup is not very nice... It needs careful sed replacements
and a lot of review time to catch corner cases... We should try to find
another solution...

> but that means that all mirrors of those packages will fail and have to
> be rebuilt.

sources.buildroot.org is automatically rebuilt, so that is not an issue for
us. Others who maintain their own mirror will have to, well, just maintain
it by pushing the new archives as needed. ;-)

Regards,
Yann E. MORIN.
Comment 9 Yann E. MORIN 2021-11-28 16:49:31 UTC
Olga, All,

Github has reverted the change now:

https://github.com/github/feedback/discussions/8149#discussioncomment-1712006

As they said they will avoid such a regression in the future, we can
(hopefully!) consider the issue fixed, I believe.

Thanks for the report! :-)

Regards,
Yann E. MORIN.