Bug 11166

Summary: Erlang bad argument on valid uint64 when crosscompiled on 64-bit host
Product: buildroot Reporter: Frank Vasquez <frankv>
Component: Other Assignee: Frank Hunleth <fhunleth>
Status: RESOLVED WONTFIX    
Severity: major CC: buildroot
Priority: P3    
Version: 2018.02.1   
Target Milestone: 2018.08   
Hardware: PC   
OS: Linux   
Host: x86-64 Target: armv7
Build:

Description Frank Vasquez 2018-07-21 05:43:10 UTC
We cross-compiled Erlang for a 32-bit ARM target on an x86 64-bit host.
We ran our Erlang application which uses bitcask on the target.
Shortly afterwards our application "crashed with reason: bad argument in call to bitcask_nifs:keydir_get_int".

We cross-compiled Erlang for a 32-bit ARM target on an x86 32-bit host.
We ran our Erlang application which uses bitcask on the target.
Our application ran continuously without crashing.
Comment 1 Frank Hunleth 2018-07-21 06:35:21 UTC
Is it possible to reproduce this with a small Erlang application? Or with any other code that calls a NIF that runs enif_get_uint64()? This sounds more like an issue with how bitcask was cross-compiled. I assume bitcask was cross-compiled outside of Buildroot, right?
Comment 2 Frank Vasquez 2018-08-03 22:35:16 UTC
Thank you for your prompt reply, Frank.  I was away on vacation and am just getting back to this now.

>  I assume bitcask was cross-compiled outside of Buildroot, right?

Yes and no.  Our application is being built using rebar3 as opposed to Buildroot's built-in rebar-package support.  Our package .mk file looks like this.

define BEAMCOIN_BUILD_CMDS
    $(MAKE) $(TARGET_CONFIGURE_OPTS) -C $(@D) compile \
        PATH='$(BR2_EXTERNAL_HELIUM_PATH)/output/host/lib/erlang/bin:$(PATH)' \
        CPATH='$(BR2_EXTERNAL_HELIUM_PATH)/output/target/usr/lib/erlang/usr/include:$(CPATH)' \
        LDFLAGS='-L$(BR2_EXTERNAL_HELIUM_PATH)/output/build/erlang-20.0/lib/erl_interface/obj/arm-buildroot-linux-gnueabihf -fPIC -shared'
endef

$(eval $(generic-package))

And the underlying Makefile looks like this.

REBAR=./rebar3

compile:
	$(REBAR) compile

You can see from BEAMCOIN_BUILD_CMDS that I set PATH, CPATH and LDFLAGS so that the NIFs cross-compile correctly.  I don't know if I am going about that the right way.  At least the NIFs appear to build fine on x86 32-bit hosts.  My team is doing bleeding-edge blockchain work in Erlang (much of it already open source), so they really want to use rebar3.
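
One way to sanity-check that a NIF built this way really targets the 32-bit ARM board is to look at the ELF header of the resulting shared object. The following is a minimal sketch in C and not part of the package above; the helper name elf_class_bits is hypothetical. It only inspects the first five bytes of the file contents: the ELF magic and the EI_CLASS byte (1 for 32-bit, 2 for 64-bit).

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 32 or 64 depending on the EI_CLASS byte of an
 * ELF header, or -1 if the buffer does not start with the ELF magic or the
 * class byte is not recognized.
 */
int elf_class_bits(const unsigned char *ident, size_t len)
{
    if (len < 5)
        return -1;
    if (ident[0] != 0x7f || ident[1] != 'E' || ident[2] != 'L' || ident[3] != 'F')
        return -1;
    switch (ident[4]) {      /* EI_CLASS */
    case 1:  return 32;      /* ELFCLASS32 */
    case 2:  return 64;      /* ELFCLASS64 */
    default: return -1;
    }
}
```

A NIF .so built for the armv7 target should report 32 whether the build host is 32-bit or 64-bit. This only confirms the word size of the object; it says nothing about which headers were used to compile it.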

> Is it possible to reproduce this with a small Erlang application?

I believe so.  I will try to assemble a small repro using just rebar3 and bitcask.

Cheers,
Frank
Comment 3 Frank Vasquez 2018-08-16 18:51:09 UTC
I have a very simple repro for this bug now.  This bitcask example is taken straight from Joe Armstrong's Programming Erlang book.  Here I am running the example on a Buildroot image cross-compiled on a 32-bit Linux VM.


# erl -pa ebin
Erlang/OTP 20 [erts-9.0] [source] [smp:2:2] [ds:2:2:10] [async-threads:10] [kernel-poll:false]

Eshell V9.0  (abort with ^G)
1> Handle = bitcask:open("some_db", [read_write]).
#Ref<0.4215703536.2955149313.140816>
2> N = 1.
1
3> bitcask:put(Handle, <<"some_key">>, term_to_binary(N)).
ok


And here I am running the same example on a Buildroot image cross-compiled on a 64-bit Linux VM.

# erl -pa ebin
Erlang/OTP 20 [erts-9.0] [source] [smp:2:2] [ds:2:2:10] [async-threads:10] [kernel-poll:false]

Eshell V9.0  (abort with ^G)
1> Handle = bitcask:open("some_db", [read_write]).
#Ref<0.992227993.3221749761.137832>
2> N = 1.
1
3> bitcask:put(Handle, <<"some_key">>, term_to_binary(N)).
** exception error: bad argument
     in function  bitcask_nifs:keydir_get_int/3
        called as bitcask_nifs:keydir_get_int(#Ref<0.992227993.3221880833.137823>,
                                              <<"some_key">>,
                                              18446744073709551615)
     in call from bitcask_nifs:keydir_get/3 (/home/frank/nextgate/rootfs/output/build/erlccbug-0.1.0/_build/default/lib/bitcask/src/bitcask_nifs.erl, line 181)
     in call from bitcask:do_put/5 (/home/frank/nextgate/rootfs/output/build/erlccbug-0.1.0/_build/default/lib/bitcask/src/bitcask.erl, line 1760)
     in call from bitcask:put/3 (/home/frank/nextgate/rootfs/output/build/erlccbug-0.1.0/_build/default/lib/bitcask/src/bitcask.erl, line 298)

Line 181 of bitcask_nifs.erl is the call to keydir_get_int(Ref, Key, Epoch) below.

keydir_get(Ref, Key, Epoch) ->
    case keydir_get_int(Ref, Key, Epoch) of
        E when is_record(E, bitcask_entry) ->
            <<Offset:64/unsigned-native>> = E#bitcask_entry.offset,
            E#bitcask_entry{offset = Offset};
        _ ->
            not_found
    end.

The bad argument exception is coming from this NIF code.

ERL_NIF_TERM bitcask_nifs_keydir_get_int(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
{
    bitcask_keydir_handle* handle;
    ErlNifBinary key;
    uint64 epoch; //intentionally odd type to get around warnings

    if (enif_get_resource(env, argv[0], bitcask_keydir_RESOURCE, (void**)&handle) &&
        enif_inspect_binary(env, argv[1], &key) &&
        enif_get_uint64(env, argv[2], &epoch))
    {
        ...
    }
    else
    {
        return enif_make_badarg(env);
    }
}

My theory is that the enif_get_uint64(env, argv[2], &epoch) condition is returning false, which should not happen for a valid 64-bit value such as 18446744073709551615 (2^64 - 1) if Erlang is cross-compiled correctly.  Maybe Uint64 is incorrectly defined as unsigned long, or something to that effect, when cross-compiling on a 64-bit VM.
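
To make the theory concrete: the epoch in the trace, 18446744073709551615, is 2^64 - 1. It fits in a 64-bit unsigned type but not in a 32-bit unsigned long, so a getter working at 32 bits would have to reject it. Below is a small C sketch of that range check; fits_in_unsigned is a hypothetical helper, not an erl_nif function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if value is representable in an unsigned
 * integer that is `bits` wide, 0 otherwise.
 */
int fits_in_unsigned(uint64_t value, unsigned bits)
{
    if (bits >= 64)
        return 1;                       /* every uint64_t fits */
    return value <= ((UINT64_C(1) << bits) - 1);
}
```

Called with the epoch from the trace, the helper accepts it at 64 bits and rejects it at 32 bits, which is the behavior the theory predicts for a getter that is working with the wrong width.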
Comment 4 Frank Hunleth 2018-08-16 22:45:37 UTC
I agree with your assessment. 

I did a quick test on a 32-bit target (64-bit host) locally and Erlang's Uint64 is compiled to "unsigned long long" which is correct for my target. Looking through the Erlang source code, I can see how if SIZEOF_LONG were detected incorrectly that you'd have the problem that you're seeing. It seems to be fine for me. 

Can you check that sizeof(Uint64) is 8 in both your NIF and in Erlang when they're compiled? You can modify the Erlang source code in your output/build/erlang directory and then do a "make erlang-rebuild all".

Also, is your simple example and Buildroot tree somewhere public?
Comment 5 Frank Vasquez 2018-08-17 21:34:31 UTC
Thank you for looking into this, Frank.  I'm relieved that Uint64 compiles to "unsigned long long" on your 64-bit host.

> Can you check that sizeof(Uint64) is 8 in both your NIF and in Erlang when they're compiled?

I ran this test on our device.

Eshell V10.0  (abort with ^G)
1> byte_size(binary:encode_unsigned(16#ffffffffffffffff)).
8

I also instrumented erl_nif.c as follows and rebuilt Erlang.

int enif_inspect_binary(ErlNifEnv* env, Eterm bin_term, ErlNifBinary* bin)
{
    printf("enif_inspect_binary: size of Uint64 is %zu\n", sizeof(Uint64));
    ...
}

And here is what I got when I opened a bitcask store on our device.

1> Handle = bitcask:open("some_db", [read_write]).
enif_inspect_binary: size of Uint64 is 8
enif_inspect_binary: size of Uint64 is 8
#Ref<0.1026228257.3490185218.216151>

So both Erlang and erl_nif.c agree that the size of a Uint64 is 8 bytes.
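
The checks above cover the emulator side. For the NIF side, a compile-time assertion in the NIF's own C source is one option. The following is a sketch only: demo_uint64 is a hypothetical stand-in so that the snippet compiles without erl_nif.h; in a real NIF the type under test would be ErlNifUInt64.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for ErlNifUInt64 so the sketch builds on its own. */
typedef unsigned long long demo_uint64;

/* Fails the build (C11) if the type is not exactly 64 bits wide. */
_Static_assert(sizeof(demo_uint64) * CHAR_BIT == 64,
               "64-bit NIF integer type is not 64 bits wide");

/* Runtime counterpart, e.g. for printing from a debug build. */
size_t demo_uint64_size(void)
{
    return sizeof(demo_uint64);
}
```

If the type seen by the NIF were narrower than 64 bits, the build would stop at the assertion instead of producing a NIF that fails at run time.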

Something odd I noticed is that instrumenting the suspect enif_get_uint64 resulted in no debug output when calling the bitcask:put function that is throwing.

#if HAVE_INT64 && SIZEOF_LONG != 8 
int enif_get_int64(ErlNifEnv* env, ERL_NIF_TERM term, ErlNifSInt64* ip)
{
    return term_to_Sint64(term, ip);
}

int enif_get_uint64(ErlNifEnv* env, ERL_NIF_TERM term, ErlNifUInt64* ip)
{
    printf("enif_get_uint64: sizeof Uint64 is %zu\n", sizeof(Uint64));
    /* return term_to_Uint64(term, ip); FIXME */
    return 1;
}
#endif /* HAVE_INT64 && SIZEOF_LONG != 8 */

It's as if enif_get_uint64 is never being executed.  I thought maybe the bad argument exception was suppressing the printf, so I modified enif_get_uint64 to always return 1.  The Erlang shell still reports the bad argument exception at line 181.
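
One possible explanation for the missing output, offered as an untested assumption rather than a diagnosis: if the headers the NIF was compiled against believe long is 8 bytes, the name enif_get_uint64 could be mapped at compile time onto a different getter, so the instrumented function would never be called. The sketch below models that situation in plain C. Every name in it (DEMO_HEADER_SIZEOF_LONG, rt_get_ulong, rt_get_uint64, demo_get_uint64, nif_accepts_epoch) is hypothetical, and none of it is the real erl_nif API.

```c
#include <stdint.h>

/* Counts calls to the "instrumented" 64-bit getter. */
int rt_get_uint64_calls = 0;

/* Models the runtime's getter for a 32-bit unsigned long on the target. */
int rt_get_ulong(uint64_t term, uint32_t *out)
{
    if (term > UINT32_MAX)
        return 0;                 /* value does not fit: caller sees badarg */
    *out = (uint32_t)term;
    return 1;
}

/* Models the runtime's dedicated 64-bit getter. */
int rt_get_uint64(uint64_t term, uint64_t *out)
{
    rt_get_uint64_calls++;
    *out = term;
    return 1;
}

/* Assumption: the header seen by the NIF was configured for an 8-byte long. */
#define DEMO_HEADER_SIZEOF_LONG 8

#if DEMO_HEADER_SIZEOF_LONG == 8
typedef uint32_t demo_epoch_t;    /* what "unsigned long" really is on the target */
#define demo_get_uint64 rt_get_ulong
#else
typedef uint64_t demo_epoch_t;
#define demo_get_uint64 rt_get_uint64
#endif

/* Models the NIF's argument check: 1 if the epoch is accepted, 0 for badarg. */
int nif_accepts_epoch(uint64_t term)
{
    demo_epoch_t epoch;
    return demo_get_uint64(term, &epoch);
}
```

With the macro set to 8, nif_accepts_epoch rejects 18446744073709551615 and rt_get_uint64 is never entered, which would match both symptoms. Whether the real build behaves this way depends on which configuration header the rebar3 build picks up, and that has not been verified here.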

> Also, is your simple example and Buildroot tree somewhere public?

The Buildroot tree for our device is currently still in a private GitHub repo but I could create a similar Buildroot tree for a BeagleBone Black and publish that for you to clone.  Our device is built on TI's Sitara AM5728 processor so it is very similar to a BeagleBoard x15.
Comment 6 Frank Vasquez 2018-08-20 17:16:21 UTC
> Also, is your simple example and Buildroot tree somewhere public?

Here it is.

https://github.com/fvasquez/buildroot

See the following commit for my rebar3-related modifications.

https://github.com/fvasquez/buildroot/commit/6411ce9d06a1880e082674231a1902f364fc4da8

Make sure to check out the bbb-bitcask branch before building.  This branch is based on the Buildroot 2018.02.1 tag since that's what we're running on our device.

$ git checkout bbb-bitcask
$ make bbb-bitcask_defconfig
$ make

I booted the resulting image on a BeagleBone Black and verified that my bug still repros.

# erl
Eshell V9.0  (abort with ^G)
1> Handle = bitcask:open("some_db", [read_write]).
#Ref<0.3081199234.2883585.143186>
2> N = 1.
1
3> bitcask:put(Handle, <<"some_key">>, term_to_binary(N)).
** exception error: bad argument
     in function  bitcask_nifs:keydir_get_int/3
        called as bitcask_nifs:keydir_get_int(#Ref<0.3081199234.3014657.143177>,
                                              <<"some_key">>,
                                              18446744073709551615)
     in call from bitcask_nifs:keydir_get/3 (/home/frank/buildroot/output/build/erlang-bitcask-0.1.0/_build/default/lib/bitcask/src/bitcask_nifs.erl, line 181)
     in call from bitcask:do_put/5 (/home/frank/buildroot/output/build/erlang-bitcask-0.1.0/_build/default/lib/bitcask/src/bitcask.erl, line 1760)
     in call from bitcask:put/3 (/home/frank/buildroot/output/build/erlang-bitcask-0.1.0/_build/default/lib/bitcask/src/bitcask.erl, line 298)

Make sure to delete the some_db directory or rename your bitcask store before re-running bitcask:open.
Comment 7 Frank Vasquez 2018-10-20 07:36:28 UTC
We have moved on from Bitcask to a more robust key-value store.  Marking issue as resolved.