ESP32 cam JPEG encoding on the fly
Many makers start their journey with the ESP32 camera by flashing the CameraWebServer example. That sketch lets you access the camera feed in your browser.
Then, in the ESP32 cam Quickstart tutorial, you learned that you can strip away the controls GUI and just stream the video.
This tutorial is an advanced variation on that one: I will show you how to process the raw pixels from the camera before streaming them over HTTP to your browser.
Hardware/Software requirements
To follow this tutorial you will need an ESP32 cam with lots of RAM (8 or 16 Mbit) and EloquentEsp32cam library version >= 2.7.7.
Arduino IDE Tools configuration for ESP32S3
!!! Install ESP32 core version 2.x, version 3.x won't work !!!
Board: ESP32S3 Dev Module
Upload Speed: 921600
USB Mode: Hardware CDC and JTAG
USB CDC On Boot: Disabled
USB Firmware MSC On Boot: Disabled
USB DFU On Boot: Disabled
Upload Mode: UART0 / Hardware CDC
CPU Frequency: 240MHz (WiFi)
Flash Mode: QIO 80MHz
Flash Size: 4MB (32Mb)
Partition Scheme: Huge APP (3MB No OTA/1MB SPIFFS)
Core Debug Level: Info
PSRAM: OPI PSRAM
Arduino Runs On: Core 1
Events Run On: Core 1
Erase All Flash Before Sketch Upload: Disabled
JTAG Adapter: Disabled
End result
Let me show you what we're going to implement. The video below shows the camera feed with a "negative" effect applied to the image. This is achieved entirely via pixel manipulation, not by applying the camera sensor's built-in filter!
/**
 * Alter camera pixels before sending them via MJPEG stream
 * (requires enough RAM to run)
 * (expect 0.5 - 2 FPS)
 *
 * BE SURE TO SET "TOOLS > CORE DEBUG LEVEL = INFO"
 * to turn on debug messages
 */
#define WIFI_SSID "SSID"
#define WIFI_PASS "PASSWORD"
#define HOSTNAME "esp32cam"

#include <eloquent_esp32cam.h>
#include <eloquent_esp32cam/viz/mjpeg.h>

using namespace eloq;
using namespace eloq::viz;

uint16_t jpeg_length = 0;
size_t tick = 0;

// prototype of the function that will
// re-encode the frame on-the-fly
void reencode_frame(WiFiClient *client, camera_fb_t* frame);

// prototype of the function that will
// put JPEG-encoded data back into the frame
size_t buffer_jpeg(void *arg, size_t index, const void* data, size_t len);

/**
 *
 */
void setup() {
    delay(3000);
    Serial.begin(115200);
    Serial.println("__RE-ENCODE MJPEG STREAM__");

    // camera settings
    // replace with your own model!
    camera.pinout.aithinker();
    camera.brownout.disable();
    // higher resolution cannot be handled
    camera.resolution.qvga();
    camera.quality.best();
    // since we want to access the raw pixels,
    // capture in RGB565 format.
    // keep in mind that you need a lot of RAM to store
    // all this data at high resolutions
    // (e.g. QVGA = 320 x 240 x 2 = 150 kB)
    camera.pixformat.rgb565();

    // MJPEG settings
    mjpeg.onFrame(&reencode_frame);

    // init camera
    while (!camera.begin().isOk())
        Serial.println(camera.exception.toString());

    // connect to WiFi
    while (!wifi.connect().isOk())
        Serial.println(wifi.exception.toString());

    // start mjpeg http server
    while (!mjpeg.begin().isOk())
        Serial.println(mjpeg.exception.toString());

    // assert camera can capture frames
    while (!camera.capture().isOk())
        Serial.println(camera.exception.toString());

    Serial.println("Camera OK");
    Serial.println("WiFi OK");
    Serial.println("MjpegStream OK");
    Serial.println(mjpeg.address());
}

/**
 *
 */
void loop() {
    // nothing to do here, MJPEG server runs in background
}

/**
 * Apply your custom processing to pixels
 * then encode to JPEG.
 * You will need to modify this
 */
void reencode_frame(WiFiClient *client, camera_fb_t* frame) {
    // log how much time elapsed from last frame
    const size_t now = millis();
    const uint16_t height = camera.resolution.getHeight();
    const uint16_t width = camera.resolution.getWidth();

    ESP_LOGI("benchmark", "%d ms elapsed from last frame", now - tick);
    tick = now;

    // frame->buf contains RGB565 data
    // that is, 2 bytes per pixel
    //
    // in this test, we're going to do a "negative" effect
    // feel free to replace this with your own code
    for (uint16_t y = 0; y < height; y++) {
        uint16_t *row = (uint16_t*) (frame->buf + width * 2 * y);

        for (uint16_t x = 0; x < width; x++) {
            // read pixel and parse to R, G, B components
            const uint16_t pixel = row[x];
            uint16_t r = (pixel >> 11) & 0b11111;
            uint16_t g = (pixel >> 5) & 0b111111;
            uint16_t b = pixel & 0b11111;

            // actual work: make negative
            r = 31 - r;
            g = 63 - g;
            b = 31 - b;

            // re-pack to RGB565
            row[x] = (r << 11) | (g << 5) | b;
        }
    }

    // encode to jpeg
    uint8_t quality = 90;
    frame2jpg_cb(frame, quality, &buffer_jpeg, NULL);
    ESP_LOGI("var_dump", "JPEG size=%d", jpeg_length);
}

/**
 * Put JPEG-encoded data back into the original frame
 * (you don't have to modify this)
 */
size_t buffer_jpeg(void *arg, size_t index, const void* data, size_t len) {
    if (index == 0) {
        // first MCU block => reset jpeg length
        jpeg_length = 0;
    }

    if (len == 0) {
        // encoding is done
        camera.frame->len = jpeg_length;
        return 0;
    }

    jpeg_length += len;
    // override input data
    memcpy(camera.frame->buf + index, (uint8_t*) data, len);

    return len;
}
setup()
The setup function configures all the components of the sketch: the camera, the MJPEG HTTP server and the WiFi. Refer to the ESP32 cam Quickstart tutorial for more details.
loop()
This is empty, since all the streaming logic is handled in a background task.
reencode_frame()
This function gets called, once a client has connected to the stream, each time a new frame is ready to be sent to the browser. You can hook into it to alter what will be sent to the user and replace the original frame with your own.
In our demo, we decode the RGB565 pixels, negate each component, and re-pack them into RGB565. The line
frame2jpg_cb(frame, quality, &buffer_jpeg, NULL);
encodes the RGB565 data into JPEG.
buffer_jpeg()
This function is called by the frame2jpg_cb encoding routine with chunks of JPEG-encoded data. We simply copy the produced data back into the camera frame buffer to override what will be sent to the user.
Speed
Speed is low. On my Freenove S3 camera it takes 300-400 ms to modify the pixels and encode them to JPEG. Add to that the WiFi lag and you can expect 1-2 FPS as a realistic estimate.
If you can, stream the original JPEG data as-is and spare the CPU a lot of strain. Use this code only if strictly necessary.