Thanks for the reply.
So I hope there is still scope for further optimization.

The problem is that a 2 MP raw image takes more than 2 seconds to process, which does not meet the requirement. (I am using a HummingBoard with the iMX6Q uSoM by SolidRun, running Ubuntu 12.04.)