ESP32 cam JPEG encoding on the fly

Many makers start their journey with the ESP32 camera by flashing the CameraWebServer example. This sketch allows you to access the camera feed in your browser.

Then, in the ESP32 cam Quickstart tutorial, you learnt that you can strip away the whole controls GUI and just stream the video.

This tutorial is an advanced variation on that one: I will show you how to process the raw pixels from the camera before streaming them over HTTP to your browser.

Hardware/Software requirements

To follow this tutorial you will need an ESP32 cam with lots of RAM (8 or 16 Mbit) and EloquentEsp32cam library version >= 2.7.7.
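
To get a feeling for why RAM matters here, this is a minimal sketch of the arithmetic: an uncompressed RGB565 frame takes 2 bytes per pixel. The helper name rgb565_frame_bytes is made up for this example; it is not part of the EloquentEsp32cam library.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes needed to hold one uncompressed RGB565 frame (2 bytes per pixel).
// Hypothetical helper, for illustration only.
constexpr size_t rgb565_frame_bytes(uint16_t width, uint16_t height) {
    return static_cast<size_t>(width) * height * 2;
}

// A few common resolutions, checked at compile time.
static_assert(rgb565_frame_bytes(320, 240) == 153600, "QVGA: 150 KiB");
static_assert(rgb565_frame_bytes(640, 480) == 614400, "VGA: 600 KiB");
static_assert(rgb565_frame_bytes(800, 600) == 960000, "SVGA: about 938 KiB");
```

Every step up in resolution multiplies the size of the raw frame, which is why the sketch in this tutorial stays at QVGA.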

Arduino IDE Tools configuration for ESP32S3

!!! Install ESP32 core version 2.x, version 3.x won't work !!!

Board: ESP32S3 Dev Module
Upload Speed: 921600
USB Mode: Hardware CDC and JTAG
USB CDC On Boot: Disabled
USB Firmware MSC On Boot: Disabled
USB DFU On Boot: Disabled
Upload Mode: UART0 / Hardware CDC
CPU Frequency: 240MHz (WiFi)
Flash Mode: QIO 80MHz
Flash Size: 4MB (32Mb)
Partition Scheme: Huge APP (3MB No OTA/1MB SPIFFS)
Core Debug Level: Info
PSRAM: OPI PSRAM
Arduino Runs On: Core 1
Events Run On: Core 1
Erase All Flash Before Sketch Upload: Disabled
JTAG Adapter: Disabled

End result

I'll show you what we're going to implement. The video below shows the camera video feed where the image is "negative". This is achieved entirely via pixel manipulation, not by applying the built-in filter of the camera sensor!

[Video: JPEG encoding on the fly]

Arduino sketch

This is the sketch that implements the end result.

Filename: Encode_Frame_on_the_Fly.ino

/**
 * Alter camera pixels before sending them via MJPEG stream
 * (requires enough RAM to run)
 * (expect 0.5 - 2 FPS)
 *
 * BE SURE TO SET "TOOLS > CORE DEBUG LEVEL = INFO"
 * to turn on debug messages
 */
#define WIFI_SSID "SSID"
#define WIFI_PASS "PASSWORD"
#define HOSTNAME  "esp32cam"

#include <eloquent_esp32cam.h>
#include <eloquent_esp32cam/viz/mjpeg.h>

using namespace eloq;
using namespace eloq::viz;

uint16_t jpeg_length = 0;
size_t tick = 0;


// prototype of the function that will
// re-encode the frame on-the-fly
void reencode_frame(WiFiClient *client, camera_fb_t* frame);

// prototype of the function that will
// put JPEG-encoded data back into the frame
size_t buffer_jpeg(void * arg, size_t index, const void* data, size_t len);


/**
 *
 */
void setup() {
    delay(3000);
    Serial.begin(115200);
    Serial.println("__RE-ENCODE MJPEG STREAM__");

    // camera settings
    // replace with your own model!
    camera.pinout.aithinker();
    camera.brownout.disable();
    // higher resolution cannot be handled
    camera.resolution.qvga();
    camera.quality.best();

    // since we want to access the raw pixels
    // capture in RGB565 format
    // keep in mind that you need a lot of RAM to store
    // all this data at high resolutions
    // (e.g. QVGA = 320 x 240 x 2 = 153600 bytes, about 150 kB)
    camera.pixformat.rgb565();

    // MJPEG settings
    mjpeg.onFrame(&reencode_frame);

    // init camera
    while (!camera.begin().isOk())
        Serial.println(camera.exception.toString());

    // connect to WiFi
    while (!wifi.connect().isOk())
        Serial.println(wifi.exception.toString());

    // start mjpeg http server
    while (!mjpeg.begin().isOk())
        Serial.println(mjpeg.exception.toString());

    // assert camera can capture frames
    while (!camera.capture().isOk())
        Serial.println(camera.exception.toString());

    Serial.println("Camera OK");
    Serial.println("WiFi OK");
    Serial.println("MjpegStream OK");
    Serial.println(mjpeg.address());
}

/**
 *
 */
void loop() {
    // nothing to do here, MJPEG server runs in background
}


/**
 * Apply your custom processing to pixels
 * then encode to JPEG.
 * You will need to modify this
 */
void reencode_frame(WiFiClient *client, camera_fb_t* frame) {
    // log how much time elapsed from last frame
    const size_t now = millis();
    const uint16_t height = camera.resolution.getHeight();
    const uint16_t width = camera.resolution.getWidth();

    ESP_LOGI("benchmark", "%d ms elapsed from last frame", now - tick);
    tick = now;

    // frame->buf contains RGB565 data
    // that is, 2 bytes per pixel
    //
    // in this test, we're going to do a "negative" effect
    // feel free to replace this with your own code
    for (uint16_t y = 0; y < height; y++) {
        uint16_t *row = (uint16_t*) (frame->buf + width * 2 * y);

        for (uint16_t x = 0; x < width; x++) {
            // read pixel and parse to R, G, B components
            const uint16_t pixel = row[x];
            uint16_t r = (pixel >> 11) & 0b11111;
            uint16_t g = (pixel >> 5) & 0b111111;
            uint16_t b = pixel & 0b11111;

            // actual work: make negative
            r = 31 - r;
            g = 63 - g;
            b = 31 - b;

            // re-pack to RGB565
            row[x] = (r << 11) | (g << 5) | b;
        }
    }

    // encode to jpeg
    uint8_t quality = 90;

    frame2jpg_cb(frame, quality, &buffer_jpeg, NULL);
    ESP_LOGI("var_dump", "JPEG size=%d", jpeg_length);
}


/**
 * Put JPEG-encoded data back into the original frame
 * (you don't have to modify this)
 */
size_t buffer_jpeg(void *arg, size_t index, const void* data, size_t len) {
    if (index == 0) {
        // first MCU block => reset jpeg length
        jpeg_length = 0;
    }

    if (len == 0) {
        // encoding is done
        camera.frame->len = jpeg_length;
        return 0;
    }

    jpeg_length += len;

    // override input data
    memcpy(camera.frame->buf + index, (uint8_t*) data, len);

    return len;
}

I'm going to explain each block in detail.

setup()

The setup function configures all the components of the sketch: the camera, the MJPEG HTTP server and the WiFi connection. Refer to the ESP32 cam Quickstart tutorial for more details.

loop()

This is empty, since all the streaming logic is handled in a background task.

reencode_frame()

Once a client has connected to view the stream, this function is called each time a new frame is ready to be sent to the browser. You can hook into it to alter the frame, or replace it with your own, before it reaches the user.

In our demo, we decode the RGB565 pixels, negate each component and re-pack them into RGB565. The line

frame2jpg_cb(frame, quality, &buffer_jpeg, NULL);

encodes the RGB565 data into JPEG.
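
To make the per-pixel step easier to experiment with, here is a minimal sketch that isolates it into a standalone function. Both function names are made up for this example. The second version relies on the fact that inverting a 5-bit or 6-bit component (31 - r, 63 - g) is the same as flipping its bits, so the whole pixel can be inverted in one operation.

```cpp
#include <cstdint>

// Negative of one RGB565 pixel, the long way: unpack, invert, re-pack.
// Hypothetical helper, mirrors the inner loop of reencode_frame().
inline uint16_t negate_rgb565(uint16_t pixel) {
    uint16_t r = (pixel >> 11) & 0x1F;  // 5 bits of red
    uint16_t g = (pixel >> 5) & 0x3F;   // 6 bits of green
    uint16_t b = pixel & 0x1F;          // 5 bits of blue

    r = 31 - r;
    g = 63 - g;
    b = 31 - b;

    return static_cast<uint16_t>((r << 11) | (g << 5) | b);
}

// Same result in a single operation: flip all 16 bits.
inline uint16_t negate_rgb565_fast(uint16_t pixel) {
    return static_cast<uint16_t>(~pixel);
}
```

Because the negative is a plain bit inversion, it gives the same picture whichever order the two bytes of each pixel are stored in. Filters that treat R, G and B differently (grayscale, tinting, thresholding on one channel) do depend on that order, so if the colors look off with your own filter, the byte order of the frame buffer is worth checking. It is not confirmed here.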

buffer_jpeg()

This function is called by the frame2jpg_cb encoding routine with chunks of JPEG-encoded data. We simply copy each chunk back into the camera frame buffer, overwriting what would otherwise be sent to the user.
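
If you would rather not write into a global frame, the same callback pattern can be sketched with an explicit destination buffer and a capacity check. This is only an illustration under a few assumptions: JpegSink and sink_write are made-up names, and the chunk semantics (index is the offset of the chunk, len == 0 marks the end) are taken from the buffer_jpeg() function above rather than from the encoder's documentation.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical destination buffer, handed to the callback through `arg`.
struct JpegSink {
    uint8_t *buf;     // where the JPEG bytes go
    size_t capacity;  // bytes available at buf
    size_t length;    // bytes written so far
};

// Same signature as buffer_jpeg() in the sketch.
size_t sink_write(void *arg, size_t index, const void *data, size_t len) {
    JpegSink *sink = static_cast<JpegSink *>(arg);

    if (index == 0)
        sink->length = 0;  // first chunk of a new image

    if (len == 0)
        return 0;  // encoding is done, nothing to copy

    if (index + len > sink->capacity)
        return 0;  // chunk would not fit: refuse it instead of overflowing

    std::memcpy(sink->buf + index, data, len);
    sink->length = index + len;

    return len;
}
```

In the sketch you would pass the sink as the last argument, for example frame2jpg_cb(frame, quality, &sink_write, &sink), assuming the encoder forwards that argument to the callback as arg, which is what the signature suggests. How the encoder reacts when the callback returns 0 for a chunk is not covered here.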

Speed

Speed is low. On my Freenove S3 camera it takes 300-400 ms to modify the pixels and encode them to JPEG. Add to that the WiFi lag and you can expect 1-2 FPS as a realistic estimate.
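
The relation between time per frame and frame rate can be sketched as follows. The helper name is made up, and the 300 ms of network overhead mentioned in the comments is an arbitrary figure for illustration, not a measurement.

```cpp
// Frames per second for a given time per frame, in milliseconds.
// Hypothetical helper, for illustration only.
constexpr float fps_from_frame_ms(float frame_ms) {
    return 1000.0f / frame_ms;
}

// 400 ms of processing alone caps the stream at 2.5 FPS.
// With an assumed extra 300 ms for the network, 700 ms per frame
// works out to roughly 1.4 FPS.
static_assert(fps_from_frame_ms(400.0f) == 2.5f, "processing only");
static_assert(fps_from_frame_ms(500.0f) == 2.0f, "2 FPS needs 500 ms per frame");
```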

If you can, you should stream the original JPEG data as-is and spare the CPU a lot of strain. Use this code only if strictly necessary.
