Getting SIGSEGV in xtrans_decode_block (libraw_fuji_compressed.cpp)

Hi Alex,

On my LibRaw port for Android, I'm getting a SIGSEGV (signal 11 (SIGSEGV), code 2 (SEGV_ACCERR)) in the xtrans_decode_block function when calling "libraw_unpack" on a RAF file reported by a user (X-T20; see link below). I tried to find out exactly where it crashes (a little tricky under Android) and found that it fails somewhere in the last loop. I logged the state at line 712; the last output before the crash is:
xtrans_decode_block 22 15 512 (22 = g_even_pos, 15 = g_odd_pos, 512 = line_width)

What's strange is that I can't reproduce the crash elsewhere: with the provided Windows binaries the file is identified correctly, and RawDigger also opens it without problems. Can you point me in a direction: what might I be doing wrong, or what do I need to adjust to fix the segmentation fault?

I'm using a copy of the current git repository.

Link to raf file: https://www.dropbox.com/s/5qgn5arqyaphu37/DSCF3128.RAF?dl=0

Best Regards
Torsten

I ran the file through valgrind (under FreeBSD) and do not see any Fuji decoder problems (buffer overruns, etc.), so I have no idea yet.

BTW, which exact version do you use? In my working copy, line 712 is the { bracket after the while statement.

-- Alex Tutubalin @LibRaw LLC

Hi Alex,

Yes, that's the line; I put the log call right after the bracket:

{__android_log_print(ANDROID_LOG_INFO,"libraw","decode_block %d %d %d",g_even_pos,g_odd_pos,line_width);
...
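When the same trace has to be compared between the Android build and a desktop build (such as the Windows binaries that open the file fine), a small wrapper can keep a single logging line for both. This is only a sketch: `LIBRAW_TRACE` and `format_decode_trace` are hypothetical names; only `__android_log_print` and its arguments come from the post above.

```cpp
#include <cassert>
#include <cstdio>

#ifdef __ANDROID__
#include <android/log.h>
// On Android: send trace lines to logcat under the "libraw" tag.
#define LIBRAW_TRACE(...) __android_log_print(ANDROID_LOG_INFO, "libraw", __VA_ARGS__)
#else
// Elsewhere: print to stderr, one line per call, so device and desktop runs can be compared.
#define LIBRAW_TRACE(...) (std::fprintf(stderr, __VA_ARGS__), std::fputc('\n', stderr))
#endif

// Hypothetical helper: formats the same loop state that was logged in
// xtrans_decode_block, e.g. "decode_block 22 15 512".
// Returns what snprintf returns (the length of the full message).
inline int format_decode_trace(char *buf, std::size_t size, int g_even_pos, int g_odd_pos, int line_width)
{
  assert(buf != nullptr && size > 0);
  return std::snprintf(buf, size, "decode_block %d %d %d", g_even_pos, g_odd_pos, line_width);
}
```

Inside the loop, the call would then read `LIBRAW_TRACE("decode_block %d %d %d", g_even_pos, g_odd_pos, line_width);` on either platform.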

This is the output I see:

10-23 17:45:33.390 25527-25749/com.tssystems.photomate3 I/libraw: libraw_unpack /storage/6CE4-CA17/raw/raf/DSCF3128.RAF
10-23 17:45:33.391 25527-25749/com.tssystems.photomate3 I/libraw: decode_block 0 1 512
10-23 17:45:33.391 25527-25749/com.tssystems.photomate3 I/libraw: decode_block 2 1 512
10-23 17:45:33.391 25527-25749/com.tssystems.photomate3 I/libraw: decode_block 4 1 512
10-23 17:45:33.391 25527-25749/com.tssystems.photomate3 I/libraw: decode_block 6 1 512
10-23 17:45:33.391 25527-25749/com.tssystems.photomate3 I/libraw: decode_block 8 1 512
10-23 17:45:33.391 25527-25749/com.tssystems.photomate3 I/libraw: decode_block 10 3 512
10-23 17:45:33.391 25527-25749/com.tssystems.photomate3 I/libraw: decode_block 12 5 512
10-23 17:45:33.391 25527-25749/com.tssystems.photomate3 I/libraw: decode_block 14 7 512
10-23 17:45:33.391 25527-25749/com.tssystems.photomate3 I/libraw: decode_block 16 9 512
10-23 17:45:33.391 25527-25749/com.tssystems.photomate3 I/libraw: decode_block 18 11 512
10-23 17:45:33.391 25527-25749/com.tssystems.photomate3 I/libraw: decode_block 20 13 512
10-23 17:45:33.391 25527-25749/com.tssystems.photomate3 I/libraw: decode_block 22 15 512
10-23 17:45:33.638 25773-25773/? A/DEBUG:     #00 pc 00000000000e3c8c  /data/app/com.tssystems.photomate3-2/lib/arm64/liblibraw.so
10-23 17:45:33.638 25773-25773/? A/DEBUG:     #01 pc 00000000000e31ec  /data/app/com.tssystems.photomate3-2/lib/arm64/liblibraw.so (_ZN6LibRaw19xtrans_decode_blockEP21fuji_compressed_blockPK22fuji_compressed_paramsi+2344)
10-23 17:45:33.638 25773-25773/? A/DEBUG:     #02 pc 00000000000e4a30  /data/app/com.tssystems.photomate3-2/lib/arm64/liblibraw.so (_ZN6LibRaw17fuji_decode_stripEPK22fuji_compressed_paramsixj+284)
10-23 17:45:33.639 25773-25773/? A/DEBUG:     #03 pc 00000000000e515c  /data/app/com.tssystems.photomate3-2/lib/arm64/liblibraw.so (_ZN6LibRaw16fuji_decode_loopEPK22fuji_compressed_paramsiPxPj+120)
10-23 17:45:33.639 25773-25773/? A/DEBUG:     #04 pc 00000000000e5040  /data/app/com.tssystems.photomate3-2/lib/arm64/liblibraw.so (_ZN6LibRaw24fuji_compressed_load_rawEv+748)
10-23 17:45:33.639 25773-25773/? A/DEBUG:     #05 pc 00000000000efa84  /data/app/com.tssystems.photomate3-2/lib/arm64/liblibraw.so (_ZN6LibRaw6unpackEv+1944)
10-23 17:45:33.639 25773-25773/? A/DEBUG:     #06 pc 0000000000101b18  /data/app/com.tssystems.photomate3-2/lib/arm64/liblibraw.so (libraw_unpack+64)

I'm also a little puzzled. I changed basically nothing and used the latest git version (pulled today). The only change I made for Android was to implement the "swab" function manually, but my implementation should behave the same as the original from the C library. I also saw no calls to this function during X-Trans decoding.

Regards
Torsten
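For reference, a hand-written swab replacement can be checked against the POSIX description: copy n bytes from the source to the destination, exchanging each pair of adjacent bytes, and do nothing for negative n. The sketch below is not Torsten's code; `portable_swab` is a hypothetical name chosen so it does not clash with a system swab().

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for POSIX swab(): copies n bytes from `from` to `to`,
// exchanging each pair of adjacent bytes. A negative n copies nothing, and a
// trailing odd byte is not written (POSIX leaves that case unspecified).
// Like swab(), it assumes the two buffers do not overlap.
inline void portable_swab(const void *from, void *to, std::ptrdiff_t n)
{
  assert(n <= 0 || (from != nullptr && to != nullptr));
  const unsigned char *src = static_cast<const unsigned char *>(from);
  unsigned char *dst = static_cast<unsigned char *>(to);
  for (std::ptrdiff_t i = 0; i + 1 < n; i += 2)
  {
    dst[i] = src[i + 1];
    dst[i + 1] = src[i];
  }
}
```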

Is there any way to get the exact code location that fails? Maybe a core dump or similar?

-- Alex Tutubalin @LibRaw LLC

Hi Alex,

My experience with debugging C code on Android is very mixed. However, I'll try to find a way to generate a dump or something similar to see where it fails, and I'll report back.

Thanks
Torsten

Hi Alex,

So after quite a lot of coffee, I found a way to translate the stack trace into actual source lines. Here's the result:

********** Crash dump: **********
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x7246192bb8
 
Routine fuji_decode_sample_odd(fuji_compressed_block*, fuji_compressed_params const*, unsigned short*, int, int_pair*) at libraw_fuji_compressed.cpp:490
 
Routine LibRaw::xtrans_decode_block(fuji_compressed_block*, fuji_compressed_params const*, int) at  libraw_fuji_compressed.cpp:727
 
Routine LibRaw::fuji_decode_strip(fuji_compressed_params const*, int, long long, unsigned int) at libraw_fuji_compressed.cpp:919
 
Routine LibRaw::fuji_decode_loop(fuji_compressed_params const*, int, long long*, unsigned int*) at libraw_fuji_compressed.cpp:999
 
Routine LibRaw::fuji_compressed_load_raw() at libraw_fuji_compressed.cpp:983
 
Routine LibRaw::unpack() at libraw_cxx.cpp:2811 (discriminator 3)
 
Routine libraw_unpack at libraw_c_api.cpp:130
 
Routine Java_com_tssystems_Libraw_open at libraw.c:51

To me it looks like the access to the grads array via the gradient variable goes past the end of the array, but I'm not sure. What do you think? Would dumping any of the variables help?

Just a note: the value of the gradient variable right before it fails is 2272, which is far higher than all the others (they are around 250). Maybe something is going wrong here?

Thanks
Torsten

Both 2272 and 250 are too high for the gradient variable, because the table size is only 41.

gradient is calculated by:

#define fuji_quant_gradient(i, v1, v2) (9 * i->q_table[i->q_point[4] + (v1)] + i->q_table[i->q_point[4] + (v2)])
(at line 402:   grad = fuji_quant_gradient(params, Rb - Rf, Rc - Rb); )

q_table contains values from -4 to +4, so the maximum gradient value is 9 * 4 + 4 = 40.

Could you please dump the q_table values, and also the values used to calculate the gradient: Rb - Rf and Rc - Rb?

-- Alex Tutubalin @LibRaw LLC
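The bound of 40 can be checked by brute force: with every table entry in [-4, +4], 9 * q1 + q2 ranges over [-40, 40]. The sketch assumes, as the 41-entry table size suggests, that the index into grads is the absolute value of that result; `quant_gradient` and `max_abs_gradient` are made-up names for illustration.

```cpp
#include <cassert>
#include <cstdlib>

// Same arithmetic as fuji_quant_gradient, applied to two values already looked
// up in q_table (q1 for the v1 lookup, q2 for the v2 lookup).
constexpr int quant_gradient(int q1, int q2) { return 9 * q1 + q2; }

// Hypothetical helper: the largest |gradient| reachable when every q_table
// entry lies in [qmin, qmax]. For safe indexing it must stay below 41.
inline int max_abs_gradient(int qmin, int qmax)
{
  assert(qmin <= qmax);
  int best = 0;
  for (int q1 = qmin; q1 <= qmax; ++q1)
    for (int q2 = qmin; q2 <= qmax; ++q2)
    {
      const int g = std::abs(quant_gradient(q1, q2));
      if (g > best)
        best = g;
    }
  return best;
}
```

With entries limited to [-4, 4] the result is 40; if the entries could be as large as 254, it would jump to 2540, far outside the table.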

Hi,

Here are some dumps:

Ra 13272 Rb 12193 Rc 16359 Rd 16232 Rg 14169
 
q_table dump (First number in each row is the array index):
[all 252 before]
11-05 14:23:04.740 16080 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252 252
11-05 14:23:04.740 16096 252 252 252 252 252 252 252 252 252 252 252 252 253 253 253 253
[all 253 in between]
11-05 14:23:04.740 16288 253 253 253 253 253 253 253 253 253 253 253 253 253 253 253 253
11-05 14:23:04.740 16304 253 253 253 253 253 253 253 253 253 253 253 253 253 254 254 254
11-05 14:23:04.740 16320 254 254 254 254 254 254 254 254 254 254 254 254 254 254 254 254
11-05 14:23:04.740 16336 254 254 254 254 254 254 254 254 254 254 254 254 254 254 254 254
11-05 14:23:04.740 16384 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
11-05 14:23:04.740 16448 2 2 3 3 3 3 3 3 3 3 3 3 3 3 3 3
11-05 14:23:04.740 16512 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
11-05 14:23:04.740 16560 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
11-05 14:23:04.740 16576 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
11-05 14:23:04.740 16592 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
11-05 14:23:04.740 16608 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
11-05 14:23:04.740 16624 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
11-05 14:23:04.740 16720 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[all 4 afterwards]

// Edit: The really strange thing is that the Ra...Rg values change every time I run it before the crash finally occurs. For instance, now I get

Ra 16383 Rb 0 Rc 0 Rd 0 Rg 0
 
or
 
Ra 0 Rb 0 Rc 0 Rd 0 Rg 849

before it crashes. It seems something is really going wrong with the values?

For some other strange reason, it almost seems as if something is running in parallel, because I can't see the full dump of q_table at once. But this could also be an issue with how Android transfers the log, since the log buffer throughput is somewhat limited. I don't have OpenMP enabled.

It looks like I know the answer:
q_table is defined as:
char *q_table; /* quantization table */
So it is meant to be a signed char, from -128 to 127.

It is initialized with values from -4 to +4 at lines 110-130.

Is there any chance that your char type is unsigned by default?

-- Alex Tutubalin @LibRaw LLC
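This explanation fits the dump: 252, 253 and 254 are -4, -3 and -2 read back as unsigned 8-bit values (256 - 4 = 252, and so on), and the failing gradient of 2272 equals 9 * 252 + 4, which is consistent with the gradient formula applied to a "-4" entry read as 252 plus a +4 entry. The crashing library is the arm64 build, and on AArch64 Linux/Android plain char is unsigned. A minimal demonstration, with made-up helper names:

```cpp
#include <cassert>
#include <limits>

// The table initializer writes -4..+4. These helpers show what such a value
// reads back as through an explicitly signed and an explicitly unsigned char.
// Plain char behaves like one or the other depending on the platform ABI.
inline int through_signed_char(int v) { return static_cast<signed char>(v); }
inline int through_unsigned_char(int v) { return static_cast<unsigned char>(v); }

// True on typical x86/x86-64 targets, false on AArch64 Linux/Android.
constexpr bool plain_char_is_signed = std::numeric_limits<char>::is_signed;
```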

Amazing, yes, you are completely right; that's why it was always failing. For gcc, I added
LOCAL_CFLAGS += -fsigned-char
to the makefile, and now it works perfectly and decodes like a charm! As I read, whether plain char is signed is implementation-defined in C and depends on the architecture, so this error makes perfect sense. But maybe there is a way to declare it explicitly as signed char in the code, so that this does not happen to others?

Thanks again!

Yes, the signedness of the default char type is implementation-defined (although it is signed in all our test/production environments).

We'll change the char definition to signed char (or maybe to int8_t; we'll see).

Thank you for reporting the bug.

-- Alex Tutubalin @LibRaw LLC
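A sketch of the fix described above: declaring the table with int8_t (or signed char) makes its signedness independent of the compiler default, so the build no longer depends on -fsigned-char. The struct, its field names and the clamping "quantizer" are hypothetical simplifications, not LibRaw's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical, simplified stand-in for the decoder parameters. The point is
// only the element type: int8_t is signed whatever plain char defaults to.
struct quant_table_sketch
{
  std::vector<std::int8_t> q_table; // was `char *q_table` in the report
  int zero_index;                   // plays the role of q_point[4]
};

static_assert(std::is_signed<std::int8_t>::value, "int8_t is always signed");

// Placeholder quantizer: clamps a difference d in [-range, range] to -4..+4.
// LibRaw derives its own thresholds instead; this is for illustration only.
inline quant_table_sketch make_quant_table_sketch(int range)
{
  assert(range >= 0);
  quant_table_sketch t;
  t.zero_index = range;
  t.q_table.resize(static_cast<std::size_t>(2 * range + 1));
  for (int d = -range; d <= range; ++d)
  {
    const int q = d < -4 ? -4 : (d > 4 ? 4 : d);
    t.q_table[static_cast<std::size_t>(d + range)] = static_cast<std::int8_t>(q);
  }
  return t;
}

// Same arithmetic as fuji_quant_gradient; int8_t entries promote to int.
inline int sketch_gradient(const quant_table_sketch &t, int v1, int v2)
{
  return 9 * t.q_table[static_cast<std::size_t>(t.zero_index + v1)] +
         t.q_table[static_cast<std::size_t>(t.zero_index + v2)];
}
```

Because the negative entries stay negative on every platform, the gradient stays within [-40, 40] and its absolute value fits the 41-entry table.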