Bug 13671 - openSSH server closes connection before authentication is finished
Summary: openSSH server closes connection before authentication is finished
Status: RESOLVED WORKSFORME
Alias: None
Product: buildroot
Classification: Unclassified
Component: Other
Version: 2021.02
Hardware: All Linux
Importance: P5 normal
Target Milestone: ---
Assignee: unassigned
URL:
Keywords:
Duplicates: 13626
Depends on:
Blocks:
 
Reported: 2021-03-23 19:28 UTC by Geert Lens
Modified: 2021-08-17 16:59 UTC

See Also:
Host:
Target:
Build:


Attachments
OpenSSH client and server logs and defconfig (24.73 KB, text/plain)
2021-03-23 19:30 UTC, Geert Lens

Description Geert Lens 2021-03-23 19:28:39 UTC
After updating from v2019.02.09 to v2021.02, I am not able to connect to my board via SSH. The OpenSSH server on my device closes the connection abruptly and exits with 255 before the keys can be exchanged. I have tried to use a password, but this gave the same result. I am using the default sshd_config as this was working with v2019.02.09.

I attached the server and client logs as well as the defconfig for more information.
Comment 1 Geert Lens 2021-03-23 19:30:24 UTC
Created attachment 8831 [details]
OpenSSH client and server logs and defconfig
Comment 2 Matt Weber 2021-03-23 20:01:39 UTC
We've noticed a similar issue and haven't quite narrowed it down. So far we have observed that everything works from 2020.02 LTS through the latest master with a GCC 8.x Buildroot internal toolchain. However, when we switched to GCC 9.x (Bootlin stable toolchain) while using 2020.02 LTS, we noticed this same behavior with packet type 50 (https://tools.ietf.org/html/rfc4252#section-6 ).
Comment 3 Peter Seiderer 2021-03-23 20:45:41 UTC
Additional/similar bug report:

https://bugs.busybox.net/show_bug.cgi?id=13626

Older OpenSSH login problem report:

http://lists.busybox.net/pipermail/buildroot/2020-August/289111.html
http://lists.busybox.net/pipermail/buildroot/2020-September/291853.html

Dropbear login problem with password and BR2_TARGET_GENERIC_PASSWD_SHA512:

http://lists.busybox.net/pipermail/buildroot/2020-August/288682.html

By the way, you disabled BR2_TARGET_ENABLE_ROOT_LOGIN; how did you set up the root and/or test account/password?

And another reported OpenSSH login problem:

http://lists.busybox.net/pipermail/buildroot/2020-August/289111.html
http://lists.busybox.net/pipermail/buildroot/2020-September/291379.html
Comment 4 Geert Lens 2021-03-23 21:16:28 UTC
I overwrite the default root password with the correct SHA512 hash and add the authorized public key using the post-build.sh script of my board.
Comment 5 Peter Seiderer 2021-03-23 22:10:08 UTC
I can reproduce what may be the same problem on an RPi4 with this defconfig:

BR2_arm=y
BR2_cortex_a72=y
BR2_ARM_FPU_NEON_VFPV4=y
BR2_TOOLCHAIN_EXTERNAL=y
BR2_TARGET_GENERIC_PASSWD_SHA512=y
BR2_ROOTFS_DEVICE_CREATION_DYNAMIC_EUDEV=y
BR2_ROOTFS_MERGED_USR=y
BR2_SYSTEM_BIN_SH_BASH=y
BR2_SYSTEM_DHCP="eth0"
BR2_SYSTEM_DEFAULT_PATH="/bin:/sbin:/usr/bin:/usr/sbin"
BR2_TARGET_TZ_INFO=y
BR2_ROOTFS_POST_BUILD_SCRIPT="board/raspberrypi4/post-build.sh"
BR2_ROOTFS_POST_IMAGE_SCRIPT="board/raspberrypi4/post-image.sh"
BR2_LINUX_KERNEL=y
BR2_LINUX_KERNEL_CUSTOM_TARBALL=y
BR2_LINUX_KERNEL_CUSTOM_TARBALL_LOCATION="$(call github,raspberrypi,linux,967d45b29ca2902f031b867809d72e3b3d623e7a)/linux-967d45b29ca2902f031b867809d72e3b3d623e7a.tar.gz"
BR2_LINUX_KERNEL_DEFCONFIG="bcm2711"
BR2_LINUX_KERNEL_DTS_SUPPORT=y
BR2_LINUX_KERNEL_INTREE_DTS_NAME="bcm2711-rpi-4-b"
BR2_LINUX_KERNEL_NEEDS_HOST_OPENSSL=y
BR2_PACKAGE_BUSYBOX_SHOW_OTHERS=y
BR2_PACKAGE_STRACE=y
BR2_PACKAGE_RPI_FIRMWARE=y
BR2_PACKAGE_RPI_FIRMWARE_VARIANT_PI4=y
BR2_PACKAGE_RPI_FIRMWARE_CONFIG_FILE="board/raspberrypi4/config_4.txt"
BR2_PACKAGE_DBUS=y
BR2_PACKAGE_LIBCAP=y
BR2_PACKAGE_OPENSSH=y
BR2_PACKAGE_KMOD_TOOLS=y
BR2_PACKAGE_UTIL_LINUX_AGETTY=y
BR2_PACKAGE_UTIL_LINUX_FSCK=y
BR2_PACKAGE_UTIL_LINUX_MOUNT=y
BR2_TARGET_ROOTFS_EXT2=y
BR2_TARGET_ROOTFS_EXT2_4=y
BR2_TARGET_ROOTFS_EXT2_SIZE="120M"
# BR2_TARGET_ROOTFS_TAR is not set
BR2_PACKAGE_HOST_DOSFSTOOLS=y
BR2_PACKAGE_HOST_GENIMAGE=y
BR2_PACKAGE_HOST_MTOOLS=y


On the serial console I get the following log in case of ssh login abort/failure:

[  110.415395] audit: type=1326 audit(110.409:3): auid=4294967295 uid=1001 gid=1001 ses=4294967295 pid=248 comm="sshd" exe="/usr/sbin/sshd" sig=31 arch=40000028 syscall=403 compat=0 ip=0xb6b9b766 code=0x0
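The fields of this audit record can be decoded by hand. The following is only a minimal sketch: the constants are copied in manually (so it builds without <linux/audit.h>), the helper names are hypothetical, signal 31 is assumed to be SIGSYS as on ARM and x86 Linux, and only the two syscall numbers that appear in this report are named.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied by hand; AUDIT_ARCH_ARM is EM_ARM combined with the
 * little-endian flag used in audit "arch=" fields. */
#define SKETCH_EM_ARM         40u
#define SKETCH_AUDIT_ARCH_LE  0x40000000u
#define SKETCH_AUDIT_ARCH_ARM (SKETCH_EM_ARM | SKETCH_AUDIT_ARCH_LE)
#define SKETCH_SIGSYS         31   /* assumption: ARM/x86 Linux numbering */

/* Hypothetical helper: name the 32-bit ARM syscalls seen in this report. */
const char *sketch_arm_syscall_name(uint32_t arch, int nr)
{
    assert(nr >= 0);
    if (arch != SKETCH_AUDIT_ARCH_ARM)
        return "unknown-arch";
    switch (nr) {
    case 403: return "clock_gettime64";
    case 407: return "clock_nanosleep_time64";
    default:  return "other";
    }
}

/* Returns 1 if the record describes a SIGSYS kill on one of those syscalls. */
int sketch_is_time64_sigsys(uint32_t arch, int sig, int nr)
{
    return arch == SKETCH_AUDIT_ARCH_ARM &&
           sig == SKETCH_SIGSYS &&
           (nr == 403 || nr == 407);
}
```

Called with the values from the record (arch=0x40000028, sig=31, syscall=403), the second helper reports a match, which agrees with the strace output: the seccomp filter killed sshd on clock_gettime64.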


Strace output looks like the following:

243   write(6, "\0\0\0e\0\0\0\23ecdsa-sha2-nistp256\0\0\0J\0"..., 105 <unfinished ...>
248   read(5,  <unfinished ...>
243   <... write resumed>)              = 105
248   <... read resumed>"\7\0\0\0e\0\0\0\23ecdsa-sha2-nistp256\0\0\0J"..., 106) = 106
243   poll([{fd=6, events=POLLIN}, {fd=7, events=POLLIN}], 2, -1 <unfinished ...>
248   clock_gettime64(CLOCK_BOOTTIME,  <unfinished ...>) = ?
248   +++ killed by SIGSYS +++
243   <... poll resumed>)               = 2 ([{fd=6, revents=POLLIN|POLLHUP}, {fd=7, revents=POLLHUP}])
243   --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=248, si_uid=1001, si_status=SIGSYS, si_utime=4, si_stime=1} ---


The call to clock_gettime64() is aborted with SIGSYS, but there is already a (doubled) entry for it in openssh-8.4p1/sandbox-seccomp-filter.c (maybe __NR_clock_gettime64 is not defined), see e.g. [1]...

[1] http://lists.busybox.net/pipermail/buildroot/2020-August/289369.html
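One way to test the "maybe __NR_clock_gettime64 is not defined" guess is to compile a small probe with the same cross toolchain. This is only a sketch with hypothetical function names; it assumes that <sys/syscall.h> exposes the same __NR_* macros that sandbox-seccomp-filter.c sees when it is built.

```c
#include <assert.h>
#include <stdio.h>
#if defined(__linux__)
#include <sys/syscall.h>   /* pulls in the toolchain's __NR_* definitions */
#endif

/* Returns 1 if the headers used at compile time define the macro. */
int headers_define_clock_gettime64(void)
{
#ifdef __NR_clock_gettime64
    return 1;
#else
    return 0;
#endif
}

int headers_define_clock_nanosleep_time64(void)
{
#ifdef __NR_clock_nanosleep_time64
    return 1;
#else
    return 0;
#endif
}

/* Prints one line per macro and returns how many of the two are defined. */
int print_time64_header_report(void)
{
    int a = headers_define_clock_gettime64();
    int b = headers_define_clock_nanosleep_time64();
    int count = a + b;

    assert(count >= 0 && count <= 2);
    printf("__NR_clock_gettime64: %s\n", a ? "defined" : "not defined");
    printf("__NR_clock_nanosleep_time64: %s\n", b ? "defined" : "not defined");
    return count;
}
```

If both lines say "not defined" for a 32-bit ARM build, the #ifdef-guarded SC_ALLOW() entries in the OpenSSH filter compile to nothing. On a 64-bit host toolchain the macros are normally absent anyway, so the probe is only meaningful when built with the target toolchain.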
Comment 6 Peter Seiderer 2021-03-23 22:32:27 UTC
The following patch/hack fixed the problem for my testcase:

--- openssh-8.4p1/sandbox-seccomp-filter.c_orig	2021-03-23 23:15:02.131964000 +0100
+++ openssh-8.4p1/sandbox-seccomp-filter.c	2021-03-23 23:24:24.388408285 +0100
@@ -189,6 +189,11 @@
 #ifdef __NR_clock_gettime
 	SC_ALLOW(__NR_clock_gettime),
 #endif
+
+#ifndef __NR_clock_gettime64
+#define __NR_clock_gettime64 403
+#endif
+
 #ifdef __NR_clock_gettime64
 	SC_ALLOW(__NR_clock_gettime64),
 #endif
@@ -252,6 +257,11 @@
 #ifdef __NR_clock_nanosleep
 	SC_ALLOW(__NR_clock_nanosleep),
 #endif
+
+#ifndef __NR_clock_nanosleep_time64
+#define __NR_clock_nanosleep_time64 407
+#endif
+
 #ifdef __NR_clock_nanosleep_time64
 	SC_ALLOW(__NR_clock_nanosleep_time64),
 #endif
Comment 7 Geert Lens 2021-03-24 11:35:40 UTC
When I use the Arm ARM 2020.11 toolchain (GCC 10.2, GDB 10.1, glibc 2.31, Binutils 2.35.1) to build 2021.02, OpenSSH is killed when I try to log in.
I made a strace using that setup and got this:

[pid 14957] write(2, "debug3: mm_request_send entering"..., 52 <unfinished ...>
[pid 15027] write(7, "\0\0\0/\0\0\0\6\0\0\0'input_userauth_reque"..., 51 <unfinished ...>
[pid 14957] <... write resumed>)        = 52
[pid 15027] <... write resumed>)        = 51
[pid 14957] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 2, -1 <unfinished ...>
[pid 15027] write(7, "\0\0\08\0\0\0\7\0\0\0000user_specific_delay:"..., 60 <unfinished ...>
[pid 14957] <... poll resumed>)         = 2 ([{fd=5, revents=POLLIN}, {fd=6, revents=POLLIN}])
[pid 15027] <... write resumed>)        = 60
[pid 14957] read(6,  <unfinished ...>
[pid 15027] clock_gettime(CLOCK_BOOTTIME,  <unfinished ...>
[pid 14957] <... read resumed>"\0\0\0/", 4) = 4
[pid 15027] <... clock_gettime resumed>{tv_sec=3303, tv_nsec=218403125}) = 0
[pid 14957] read(6,  <unfinished ...>
[pid 15027] write(7, "\0\0\0[\0\0\0\7\0\0\0Sensure_minimum_time_"..., 95 <unfinished ...>
[pid 14957] <... read resumed>"\0\0\0\6\0\0\0'input_userauth_request: "..., 47) = 47
[pid 15027] <... write resumed>)        = 95
[pid 14957] write(2, "debug2: input_userauth_request: "..., 59 <unfinished ...>
[pid 15027] clock_nanosleep_time64(CLOCK_REALTIME, 0, {tv_sec=0, tv_nsec=24392751410935043},  <unfinished ...>
[pid 14957] <... write resumed>)        = 59
[pid 15027] <... clock_nanosleep_time64 resumed> <unfinished ...>) = ?
[pid 14957] poll([{fd=5, events=POLLIN}, {fd=6, events=POLLIN}], 2, -1 <unfinished ...>
[pid 15027] +++ killed by SIGSYS +++
<... poll resumed>)                     = 2 ([{fd=5, revents=POLLIN}, {fd=6, revents=POLLIN}])
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=15027, si_uid=1001, si_status=SIGSYS, si_utime=8, si_stime=4} ---

It looks similar to what you reproduced; the only difference is that here it is the call to clock_nanosleep_time64() that is aborted with SIGSYS.

When I use the Arm ARM 2019.2 toolchain (GCC 9.2.1, GDB 8.3.0, glibc 2.30, Binutils 2.33.1) to build 2021.02, OpenSSH is not killed and I am able to log in.
Do you have any idea what could cause this?

Furthermore, I am using the Linux kernel v4.14.78 in my project.
Comment 8 Peter Seiderer 2021-03-25 21:09:02 UTC
(In reply to Geert Lens from comment #7)

No other advice at the moment than to avoid the buggy (?) pre-built toolchains and
change to Buildroot-built ones (and/or avoid/fix uclibc?), or to patch openssh
specifically for your toolchain/openssh failure (see the suggested patch above),
or to disable the seccomp filter for openssh (I have not yet investigated whether
or how that is possible)...
Comment 9 romain.naour 2021-03-27 16:51:50 UTC
Hello,

I believe it's an issue between the kernel headers version and glibc >= 2.31.
I'm able to reproduce it with the Arm ARM 2020.11 toolchain and the Bootlin stable 2020.08-1 toolchain, but not with the Bootlin bleeding-edge 2020.08-1 toolchain.

The Arm ARM 2020.11 toolchain provides 4.20.3 kernel headers, while the Bootlin bleeding-edge 2020.08-1 toolchain provides 5.4.61. Both use glibc 2.31.

Since glibc 2.31, there is a table of system call numbers that comes from kernel 5.4 [1]. But syscalls like __NR_clock_gettime64 were only added in kernel 5.1 [2], so older kernel headers do not define them.

It seems we need a toolchain with kernel headers >= 5.1 to work around the issue.

[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=4cf0d223052dabb9caed29e1e91e1d61933e14fb
[2] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=48166e6ea47d23984f0b481ca199250e1ce0730a

Best regards,
Romain
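The "kernel headers >= 5.1" condition from this comment can be written down as a small check. This is only a sketch with hypothetical helper names; the packing mirrors the kernel's KERNEL_VERSION() macro, and the 5.1 threshold is the one stated above.

```c
#include <assert.h>

/* Same packing as the kernel's KERNEL_VERSION() macro, with the
 * sublevel clamped to 255 so it fits in one byte. */
unsigned long sketch_kernel_version(unsigned major, unsigned minor,
                                    unsigned sublevel)
{
    assert(major <= 255 && minor <= 255);
    if (sublevel > 255)
        sublevel = 255;
    return ((unsigned long)major << 16) + ((unsigned long)minor << 8) + sublevel;
}

/* Returns 1 if headers of this version are new enough (>= 5.1) to
 * define the time64 syscall numbers such as __NR_clock_gettime64. */
int headers_new_enough_for_time64(unsigned major, unsigned minor,
                                  unsigned sublevel)
{
    return sketch_kernel_version(major, minor, sublevel) >=
           sketch_kernel_version(5, 1, 0);
}
```

With the versions quoted in this comment, the 4.20.3 headers fail the check and the 5.4.61 headers pass it, matching which toolchains reproduce the problem.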
Comment 10 romain.naour 2021-03-28 17:41:43 UTC
See https://lore.kernel.org/lkml/20190719170343.GA13680@linux.intel.com/

"Using __NR_clock_gettime64 instead of __NR_clock_gettime breaks userspace
applications that use seccomp filtering to block syscalls, as applications
are completely unaware of the newly added of __NR_clock_gettime64, e.g.
sshd gets zapped on syscall(403) when attempting to ssh into the system."

The issue doesn't seem to be fixed when running kernel 5.10.7 on a system built with a toolchain using 4.19.x kernel headers.
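A toy model may help show why a newer running kernel does not help: the allowlist is fixed when sshd is compiled, while the C library chooses the syscall at run time. This sketch is not the real seccomp API, the names are hypothetical, and 263 is assumed to be the 32-bit ARM number for clock_gettime (403 is taken from the audit log and patch above).

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CLOCK_GETTIME    263  /* assumption: ARM EABI number */
#define SKETCH_NR_CLOCK_GETTIME64  403  /* from the audit log and patch */

/* Toy stand-in for the filter: returns 1 if nr is on the allowlist,
 * 0 if the real filter would kill the process with SIGSYS. */
int sketch_filter_allows(const int *allowlist, size_t len, int nr)
{
    assert(allowlist != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (allowlist[i] == nr)
            return 1;
    }
    return 0;
}
```

An allowlist built against 4.19.x headers would contain only the first number, so a glibc that issues syscall 403 is rejected regardless of the kernel version. Adding 403 to the list, as the patch in comment 6 does, lets the call through.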
Comment 11 Geert Lens 2021-03-30 10:18:20 UTC
Hi,

Thank you for all your help clarifying why this is happening. For the time being I have applied the patch/hack that Peter provided, and I will switch to a toolchain that provides the needed headers when one becomes available.

Best regards,
Geert
Comment 12 Peter Korsgaard 2021-08-17 16:59:33 UTC
*** Bug 13626 has been marked as a duplicate of this bug. ***