Hi there.
As part of initial integration testing of LibRaw 0.22 with my application, I came across an issue whereby the new version unpacks (before demosaicing) FujiFilm GFX100 II images almost 10x slower than version 0.21.5. I've benchmarked other camera formats and not seen such a difference. Reviewing the release notes, I see the GFX100 II is now officially supported. I notice a very subtle difference between the 0.22 and 0.21.5 outputs, but nothing that would explain such a huge difference in performance.
I'll triple-check my compilation settings and eventually go through the LibRaw repo and diff the relevant FujiFilm source files, but my question at this stage is: is there a known reason for such a disparity in performance between versions, specifically for GFX100 II images? Did the RAF file decoder change radically?
Any help would be appreciated.
Regards,
Sean.
| Attachment | Size |
|---|---|
| | 226.09 KB |

Quick test (unpack only) with LibRaw compiled via make -f Makefile.dist (so no OpenMP), on three files with different encodings:
0.22.0:
0.21.5:
CPU: Intel(R) Atom(TM) CPU C3758 @ 2.20GHz (2200.21-MHz K8-class CPU)
Storage: fast SSD (nvme)
So, please provide the test file you use for benchmarking so we can study this in more depth.
-- Alex Tutubalin @LibRaw LLC
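For readers who want to reproduce an unpack-only timing like the one described above, here is a minimal sketch of a timing helper. The helper name `median_ms` is hypothetical and not part of LibRaw; it times any callable and reports the middle sample over several runs. In a real comparison the callable would presumably wrap LibRaw's `open_file()` and `unpack()` calls (with `recycle()` between runs), built once against each library version with identical compiler flags.

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <vector>

// Hypothetical helper: runs `work` `runs` times and returns the middle
// wall-clock sample in milliseconds (the upper median for even counts).
// Returns 0.0 if `runs` is less than 1.
double median_ms(int runs, const std::function<void()>& work) {
  if (runs < 1) return 0.0;
  std::vector<double> samples;
  samples.reserve(static_cast<std::size_t>(runs));
  for (int i = 0; i < runs; ++i) {
    const auto t0 = std::chrono::steady_clock::now();
    work();  // e.g. open + unpack of one RAW file
    const auto t1 = std::chrono::steady_clock::now();
    samples.push_back(
        std::chrono::duration<double, std::milli>(t1 - t0).count());
  }
  std::sort(samples.begin(), samples.end());
  return samples[samples.size() / 2];
}
```

Using the median rather than a single run reduces the influence of disk cache warm-up and scheduler noise, which matters when the files are large and storage is fast.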
Hi Alex.
Thanks for looking into this.
Mea culpa! Your response prompted me to look into this again, so I checked my LibRaw 0.22 compile flags and found that the LibRaw VS project did not have the /openmp flag set for Debug mode. This accounted for the big performance difference (on a 16-core machine).
Very sorry to have you chase a ghost.
I'll be more thorough before posting next time.
Regards,
Sean.
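A quick way to catch a missing OpenMP flag like this is to check the standard `_OPENMP` macro, which compilers define only when OpenMP is enabled (for example via /openmp in MSVC or -fopenmp in clang and GCC). The sketch below is a generic check, not a LibRaw API, and the function name `openmp_build_info` is hypothetical. It reports only the settings of the file it is compiled in, so it would need to be built in the same project and configuration as the library to be meaningful.

```cpp
#include <string>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: describes whether this translation unit
// was compiled with OpenMP support enabled.
std::string openmp_build_info() {
#ifdef _OPENMP
  // _OPENMP expands to the yyyymm date of the supported OpenMP spec.
  return "OpenMP enabled (_OPENMP=" + std::to_string(_OPENMP) +
         ", max threads=" + std::to_string(omp_get_max_threads()) + ")";
#else
  return "OpenMP disabled";
#endif
}
```

Printing this string once per build configuration (Debug and Release) makes a configuration mismatch visible before any benchmarking starts.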
For the files you provided, with OpenMP enabled, the difference is negligible:
Also, the diff in src/decoders/fuji_compressed.cpp is very small (it changes error handling in the OpenMP case):
    diff --git a/src/decoders/fuji_compressed.cpp b/src/decoders/fuji_compressed.cpp
    index acea0825..40d92d78 100644
    --- a/src/decoders/fuji_compressed.cpp
    +++ b/src/decoders/fuji_compressed.cpp
    @@ -229,9 +229,9 @@ static inline void fuji_fill_buffer(fuji_compressed_block *info)
     {
       if (info->cur_pos >= info->cur_buf_size)
       {
    +    bool needthrow = false;
         info->cur_pos = 0;
         info->cur_buf_offset += info->cur_buf_size;
    -    bool needthrow = false;
     #ifdef LIBRAW_USE_OPENMP
     #pragma omp critical
     #endif
    @@ -1155,14 +1155,16 @@ void LibRaw::fuji_decode_loop(fuji_compressed_params *common_info, int count, IN
       const int lineStep = (libraw_internal_data.unpacker_data.fuji_total_lines + 0xF) & ~0xF;
     #ifdef LIBRAW_USE_OPENMP
       unsigned errcnt = 0;
    -#pragma omp parallel for private(cur_block)
    +#pragma omp parallel for private(cur_block) shared(errcnt)
     #endif
       for (cur_block = 0; cur_block < count; cur_block++)
       {
    -    try{
    +    try
    +    {
           fuji_decode_strip(common_info, cur_block, raw_block_offsets[cur_block], block_sizes[cur_block],
                             q_bases ? q_bases + cur_block * lineStep : 0);
    -    } catch (...)
    +    }
    +    catch (...)
         {
     #ifdef LIBRAW_USE_OPENMP
     #pragma omp atomic

In effect, the errcnt variable is now explicitly declared OpenMP-shared (it is updated atomically when an error is caught), so the performance difference should be negligible.
I can only recommend performing detailed profiling of both versions and comparing where exactly you're experiencing performance degradation at the individual operator level.
Since I don't see any performance differences on our end, there's nothing to look for there.
-- Alex Tutubalin @LibRaw LLC
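To illustrate the error-handling pattern touched by the diff above, here is a small self-contained sketch: a parallel loop in which each iteration catches its own exceptions and bumps a shared counter atomically. It is not LibRaw code. `decode_block` and `decode_all` are hypothetical stand-ins, and the sketch keys off the standard `_OPENMP` macro rather than `LIBRAW_USE_OPENMP`. Without OpenMP enabled, the same code compiles and runs serially.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a per-strip decoder: throws for blocks
// flagged as bad (non-zero entry in `bad`).
static void decode_block(int block, const std::vector<int>& bad) {
  if (bad[static_cast<std::size_t>(block)] != 0)
    throw std::runtime_error("bad block");
}

// Decodes all blocks and returns how many of them failed.
unsigned decode_all(const std::vector<int>& bad) {
  const int count = static_cast<int>(bad.size());
  unsigned errcnt = 0;
  int cur_block;
#ifdef _OPENMP
#pragma omp parallel for private(cur_block) shared(errcnt)
#endif
  for (cur_block = 0; cur_block < count; cur_block++) {
    try {
      decode_block(cur_block, bad);
    } catch (...) {
      // An exception must not leave the parallel region, so it is
      // caught per iteration and only counted here.
#ifdef _OPENMP
#pragma omp atomic
#endif
      errcnt++;
    }
  }
  return errcnt;
}
```

The atomic increment runs only on the error path, so in the normal case (no failed blocks) the shared counter adds no synchronization cost to the loop.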
Followup:
compiled with clang 19.1.7
Compilation flags: -O3 -fopenmp
-- Alex Tutubalin @LibRaw LLC