We cross-compiled Erlang for a 32-bit ARM target on an x86 64-bit host and ran our Erlang application, which uses bitcask, on the target. Shortly afterwards our application "crashed with reason: bad argument in call to bitcask_nifs:keydir_get_int".

We then cross-compiled Erlang for a 32-bit ARM target on an x86 32-bit host and ran the same application on the target. This time the application ran continuously without crashing.
Is it possible to reproduce this with a small Erlang application? Or with any other code that calls a NIF that runs enif_get_uint64()? This sounds more like an issue with how bitcask was cross-compiled. I assume bitcask was cross-compiled outside of Buildroot, right?
Thank you for your prompt reply, Frank. I was away on vacation and am just getting back to this now.

> I assume bitcask was cross-compiled outside of Buildroot, right?

Yes and no. Our application is being built using rebar3 as opposed to Buildroot's built-in rebar-package support. Our package .mk file looks like this.

    define BEAMCOIN_BUILD_CMDS
    	$(MAKE) $(TARGET_CONFIGURE_OPTS) -C $(@D) compile \
    		PATH='$(BR2_EXTERNAL_HELIUM_PATH)/output/host/lib/erlang/bin:$(PATH)' \
    		CPATH='$(BR2_EXTERNAL_HELIUM_PATH)/output/target/usr/lib/erlang/usr/include:$(CPATH)' \
    		LDFLAGS='-L$(BR2_EXTERNAL_HELIUM_PATH)/output/build/erlang-20.0/lib/erl_interface/obj/arm-buildroot-linux-gnueabihf -fPIC -shared'
    endef

    $(eval $(generic-package))

And the underlying Makefile looks like this.

    REBAR=./rebar3

    compile:
    	$(REBAR) compile

You can see from BEAMCOIN_BUILD_CMDS that I set PATH, CPATH and LDFLAGS so that the NIFs cross-compile correctly. I don't know if I am going about that the right way. At least the NIFs appear to build fine on x86 32-bit hosts. My team is doing bleeding edge blockchain work in Erlang (much of it already open source) so they really want to use rebar3.

> Is it possible to reproduce this with a small Erlang application?

I believe so. I will try to assemble a small repro using just rebar3 and bitcask.

Cheers,
Frank
I have a very simple repro for this bug now. This bitcask example is taken straight from Joe Armstrong's Programming Erlang book.

Here I am running the example on a Buildroot image cross-compiled on a 32-bit Linux VM.

    # erl -pa ebin
    Erlang/OTP 20 [erts-9.0] [source] [smp:2:2] [ds:2:2:10] [async-threads:10] [kernel-poll:false]

    Eshell V9.0 (abort with ^G)
    1> Handle = bitcask:open("some_db", [read_write]).
    #Ref<0.4215703536.2955149313.140816>
    2> N = 1.
    1
    3> bitcask:put(Handle, <<"some_key">>, term_to_binary(N)).
    ok

And here I am running the same example on a Buildroot image cross-compiled on a 64-bit Linux VM.

    # erl -pa ebin
    Erlang/OTP 20 [erts-9.0] [source] [smp:2:2] [ds:2:2:10] [async-threads:10] [kernel-poll:false]

    Eshell V9.0 (abort with ^G)
    1> Handle = bitcask:open("some_db", [read_write]).
    #Ref<0.992227993.3221749761.137832>
    2> N = 1.
    1
    3> bitcask:put(Handle, <<"some_key">>, term_to_binary(N)).
    ** exception error: bad argument
         in function bitcask_nifs:keydir_get_int/3
            called as bitcask_nifs:keydir_get_int(#Ref<0.992227993.3221880833.137823>, <<"some_key">>, 18446744073709551615)
         in call from bitcask_nifs:keydir_get/3 (/home/frank/nextgate/rootfs/output/build/erlccbug-0.1.0/_build/default/lib/bitcask/src/bitcask_nifs.erl, line 181)
         in call from bitcask:do_put/5 (/home/frank/nextgate/rootfs/output/build/erlccbug-0.1.0/_build/default/lib/bitcask/src/bitcask.erl, line 1760)
         in call from bitcask:put/3 (/home/frank/nextgate/rootfs/output/build/erlccbug-0.1.0/_build/default/lib/bitcask/src/bitcask.erl, line 298)

Line 181 of bitcask_nifs.erl is the call to keydir_get_int(Ref, Key, Epoch) below.

    keydir_get(Ref, Key, Epoch) ->
        case keydir_get_int(Ref, Key, Epoch) of
            E when is_record(E, bitcask_entry) ->
                <<Offset:64/unsigned-native>> = E#bitcask_entry.offset,
                E#bitcask_entry{offset = Offset};
            _ ->
                not_found
        end.

The bad argument exception is coming from this NIF code.
    ERL_NIF_TERM bitcask_nifs_keydir_get_int(ErlNifEnv* env, int argc, const ERL_NIF_TERM argv[])
    {
        bitcask_keydir_handle* handle;
        ErlNifBinary key;
        uint64 epoch; //intentionally odd type to get around warnings

        if (enif_get_resource(env, argv[0], bitcask_keydir_RESOURCE, (void**)&handle) &&
            enif_inspect_binary(env, argv[1], &key) &&
            enif_get_uint64(env, argv[2], &epoch))
        {
            ...
        }
        else
        {
            return enif_make_badarg(env);
        }
    }

My theory is that the enif_get_uint64(env, argv[2], &epoch) condition is returning false, which should rarely be the case if Erlang is cross-compiled correctly. Maybe Uint64 is incorrectly defined as unsigned long or something to that effect when cross-compiling on a 64-bit VM.
I agree with your assessment. I did a quick test on a 32-bit target (64-bit host) locally and Erlang's Uint64 is compiled to "unsigned long long" which is correct for my target. Looking through the Erlang source code, I can see how if SIZEOF_LONG were detected incorrectly that you'd have the problem that you're seeing. It seems to be fine for me. Can you check that sizeof(Uint64) is 8 in both your NIF and in Erlang when they're compiled? You can modify the Erlang source code in your output/build/erlang directory and then do a "make erlang-rebuild all". Also, is your simple example and Buildroot tree somewhere public?
Thank you for looking into this, Frank. I'm relieved that Uint64 compiles to "unsigned long long" on your 64-bit host.

> Can you check that sizeof(Uint64) is 8 in both your NIF and in Erlang when they're compiled?

I ran this test on our device.

    Eshell V10.0 (abort with ^G)
    1> byte_size(binary:encode_unsigned(16#ffffffffffffffff)).
    8

I also instrumented erl_nif.c as follows and rebuilt Erlang.

    int enif_inspect_binary(ErlNifEnv* env, Eterm bin_term, ErlNifBinary* bin)
    {
        printf("enif_inspect_binary: size of Uint64 is %u\n", sizeof(Uint64));
        ...
    }

And here is what I got when I opened a bitcask store on our device.

    1> Handle = bitcask:open("some_db", [read_write]).
    enif_inspect_binary: size of Uint64 is 8
    enif_inspect_binary: size of Uint64 is 8
    #Ref<0.1026228257.3490185218.216151>

So both Erlang and erl_nif.c agree that the size of a Uint64 is 8 bytes.

Something odd I noticed is that instrumenting the suspect enif_get_uint64 resulted in no debug output when calling the bitcask:put function that is throwing.

    #if HAVE_INT64 && SIZEOF_LONG != 8
    int enif_get_int64(ErlNifEnv* env, ERL_NIF_TERM term, ErlNifSInt64* ip)
    {
        return term_to_Sint64(term, ip);
    }

    int enif_get_uint64(ErlNifEnv* env, ERL_NIF_TERM term, ErlNifUInt64* ip)
    {
        printf("enif_get_uint64: sizeof Uint64 is %u\n", sizeof(Uint64));
        /* return term_to_Uint64(term, ip); FIXME */
        return 1;
    }
    #endif /* HAVE_INT64 && SIZEOF_LONG != 8 */

It's as if enif_get_uint64 is never being executed. I thought maybe the bad arg exception was suppressing the printf so I modified enif_get_uint64 to always return 1. The Erlang shell still reports a bad arg exception at line 181.

> Also, is your simple example and Buildroot tree somewhere public?

The Buildroot tree for our device is currently still in a private GitHub repo but I could create a similar Buildroot tree for a BeagleBone Black and publish that for you to clone. Our device is built on TI's Sitara AM5728 processor so it is very similar to a BeagleBoard x15.
> Also, is your simple example and Buildroot tree somewhere public?

Here it is.

https://github.com/fvasquez/buildroot

See the following commit for my rebar3-related modifications.

https://github.com/fvasquez/buildroot/commit/6411ce9d06a1880e082674231a1902f364fc4da8

Make sure to check out the bbb-bitcask branch before building. This branch is based off of the Buildroot 2018.02.01 tag since that's what we're running on our device.

    $ git checkout bbb-bitcask
    $ make bbb-bitcask_defconfig
    $ make

I booted the resulting image on a BeagleBone Black and verified that my bug still repros.

    # erl
    Eshell V9.0 (abort with ^G)
    1> Handle = bitcask:open("some_db", [read_write]).
    #Ref<0.3081199234.2883585.143186>
    2> N = 1.
    1
    3> bitcask:put(Handle, <<"some_key">>, term_to_binary(N)).
    ** exception error: bad argument
         in function bitcask_nifs:keydir_get_int/3
            called as bitcask_nifs:keydir_get_int(#Ref<0.3081199234.3014657.143177>, <<"some_key">>, 18446744073709551615)
         in call from bitcask_nifs:keydir_get/3 (/home/frank/buildroot/output/build/erlang-bitcask-0.1.0/_build/default/lib/bitcask/src/bitcask_nifs.erl, line 181)
         in call from bitcask:do_put/5 (/home/frank/buildroot/output/build/erlang-bitcask-0.1.0/_build/default/lib/bitcask/src/bitcask.erl, line 1760)
         in call from bitcask:put/3 (/home/frank/buildroot/output/build/erlang-bitcask-0.1.0/_build/default/lib/bitcask/src/bitcask.erl, line 298)

Make sure to delete the some_db directory or rename your bitcask store before re-running bitcask:open.
We have moved on from Bitcask to a more robust key-value store. Marking issue as resolved.