The Beginner-Friendly Guide To Esp32 Camera Object Detection (FOMO)
The definitive guide to running Edge Impulse FOMO Object Detection on the Esp32 Camera with little effort, even if you're a beginner.

Introduction
Object detection is the task of detecting an object of interest inside an image. Until a couple of years ago, this task required powerful computers, due to the complexity of the models and the prohibitive number of math operations to perform.
Thanks to platforms like Edge Impulse, however, the entry barrier for beginners has become much lower and it is now possible to:
- easily train an object detection model in the cloud
- (not so easily) deploy this model to the Esp32 camera
Sadly, the Esp32 camera doesn't seem to be a first-class citizen on the Edge Impulse platform, and the (scarce) documentation and tutorials available online miss a lot of fundamental pieces.
The purpose of this post is to help you develop and deploy your very own object detection model to your Esp32 camera with detailed, easy-to-follow steps, even if you're a beginner with Edge Impulse or Arduino programming.
Let's start!
Setting up the Environment
To follow this tutorial, you will need the following hardware / software:
- an Esp32 camera. The model doesn't matter, but it needs at least 4 MB of PSRAM. (AiThinker, M5Stack, EspEye all work fine)
- a free account on Edge Impulse
- the EloquentEsp32Cam library version 1.1.1 (install from the Arduino IDE Library Manager)
- ~30 minutes of your time
If you've never used Edge Impulse, I suggest you watch a couple of video tutorials online before we start, otherwise you may get lost later on (there's an official tutorial playlist on YouTube).
If you've never used the EloquentEsp32Cam library, it's a library I wrote to make it painless to get the most out of the Esp32 camera boards with very little effort. This very post is an extract from the eBook I've been writing around this library. You can check the eBook's dedicated page here.

Object Detection using ESP32 Camera
To perform object detection on our Esp32 camera board, we will follow these steps:
1. Use EloquentEsp32Cam to collect images
2. Use Edge Impulse to label the images
3. Use Edge Impulse to train the model
4. Use Edge Impulse to export the model into an Arduino library
5. Use EloquentEsp32Cam to run the model
Before this post existed, steps 1. and 5. were not as easy as they should've been.
I invite you to stop reading this post for a minute and search Google for "Esp32-cam object detection": read the first 4-5 tutorials in the results and tell me if you're able to deploy such a project.
I was not.
And I bet you won't be either.
Enough words, it's time to start tinkering.

1. Collect images from the Esp32 camera
This is the first part where things get weird in the other tutorials. Many of them suggest that you use images from Google to train your ML model. Others suggest collecting data with your smartphone.
How random images from the web, or from a 40 MP camera, are supposed to help train a good model that will run on $2 camera hardware is beyond me.
That said, I believe that to get good results, we need to collect images from our own Esp32 camera board.
This used to be hard, but fear no more. I promise you will complete this task in a matter of minutes, thanks to the tools I created for you.
1.A Upload the Image Collection Sketch to your Board
You need to install the EloquentEsp32Cam library from the Library Manager. It is mandatory that you install version 1.1.1 (later versions may work, but it's not guaranteed).

After you've installed the EloquentEsp32Cam library, navigate to File > Examples > EloquentEsp32Cam > 26_Collect_Images and upload the sketch to your board.
(Yes, this is the 26th example from the aforementioned eBook. There are a lot of other interesting examples to check out.)
In case you want to know what's inside, here's the code.
// 26_Collect_Images.ino
#define MAX_RESOLUTION_XGA 1

/**
 * Run a development HTTP server to capture images
 * for TinyML tasks
 */
#include "esp32cam.h"
#include "esp32cam/http/FomoImageCollectionServer.h"

using namespace Eloquent::Esp32cam;

Cam cam;
Http::FOMO::CollectImagesServer http(cam);

void setup() {
    Serial.begin(115200);
    delay(3000);
    Serial.println("Init");

    /**
     * Replace with your camera model.
     * Available: aithinker, m5, m5wide, wrover, eye, ttgoLCD
     */
    cam.aithinker();
    cam.highQuality();
    cam.highestSaturation();
    cam.xga();

    while (!cam.begin())
        Serial.println(cam.getErrorMessage());

    // replace with your SSID and PASSWORD
    while (!cam.connect("SSID", "PASSWORD"))
        Serial.println(cam.getErrorMessage());

    while (!http.begin())
        Serial.println(http.getErrorMessage());

    Serial.println(http.getWelcomeMessage());
    cam.mDNS("esp32cam");
}

void loop() {
    http.handle();
}
Don't forget to replace SSID and PASSWORD with your WiFi credentials!
Once the upload is done, open the Serial Monitor and take note of the IP address of the camera.
1.B Create your Shooting Setup
Since this is our first project on Esp32 camera object detection, I want you to succeed and get good results easily. Even though it may not be strictly necessary, I suggest you create a "fixed shooting setup": either fix your board to a table with some tape, or use a cheap camera mount. Then point it at a flat, monochrome surface (a wall, a closet...): the TinyML model will learn the objects' features better.
Also try to keep the room well lit: the Esp32 camera performs poorly in low-light conditions.
For each object you want to recognize (start with 1 or 2 the first time), you will put that object in front of the camera and, while the capture is going on, move it around a bit, rotate it...
This ensures the model will learn to detect the objects in different positions and orientations.
1.C Collect Images from the Browser
Here's where things start to become exciting. You will collect the images for the ML model from the browser. No additional software is required.
Pro tip: if your router supports mDNS (most do), you can enter `esp32cam.local` as address instead of the IP address. This way you don't have to open the Serial Monitor every time you power up the board (IP address may change, mDNS name stays the same).
Open a new tab in the browser and enter the IP address of your Esp32 camera (or navigate to http://esp32cam.local if you used mDNS). You can preview the real-time video stream from the camera. Use this preview to setup the correct position and orientation.
When you're ready to start, click Start collecting and the camera frames will start appearing below on the page (continue reading before doing anything).
You may experience a bit of lag: don't worry, just keep collecting.

Images will be 96x96 because that's a recommended size for Edge Impulse image models. We're actually going to use 48x48 images during training, since 96x96 is too large to fit on our Esp32-cam board, but it's better to have a higher resolution available in case we decide to move to a more capable board later on.
Now follow these EXACT steps (they guarantee you won't have trouble later on):
1. put nothing in front of the camera. Collect 15-20 images. Pause, download and clear
2. put the first object in front of the camera. Collect 30-40 images while moving the object around. Pause, download and clear
3. repeat step 2 for each object you have
After you finish, you will have one zip of images for each object + one for "no object / background".
Extract the zips and move to the next step.
2. Use Edge Impulse to Label the Images
For object detection to work, we need to label the objects we want to recognize.
There are a few tools online, but Edge Impulse has one integrated that works well enough for our project. Register a free account on edgeimpulse.com if you don't have one already.
Create a new project, name it something like esp32-cam-object-detection, then choose Images > Classify multiple objects > Import existing data.
Now follow these EXACT steps to speed up the labelling:
1. click Select files and select all the images in the "no object / background" folder; check Automatically split between training and testing; then hit Begin upload
2. click Labelling queue in the bar on top
3. always click Save labels without doing anything! (since there's no object in these images, we'll use them for background modelling)
4. once done for all the images, go back to Upload data in the bar on top
5. click Select files and select all the images in the first object folder. Only upload the images of a single folder at a time! Check Automatically split between training and testing; then hit Begin upload
6. go to Labelling queue in the top bar and draw a box around the object you want to recognize. On the right, make sure Label suggestions: Track objects between frames is selected
7. label all the images. Make sure to adjust the bounding box to fit the object while leaving a few pixels of padding
8. repeat steps 4-7 for each object
If you upload all the images at once, the labelling queue will mix the different objects and you will waste a lot more time drawing the bounding boxes. Be smart!

At this point, you will have all the data you need to train the model.
3. Use Edge Impulse to Train the Model
If you've ever used Edge Impulse, you know this part is going to be pretty easy.
Navigate to Impulse design in the left menu; enter 48 as both image width and image height, then select Fit shortest axis as resize mode.
Add the Image processing block, the Object detection learning block and save the impulse.

Now navigate to Impulse design > Image on the left, select Grayscale as color depth and hit Save parameters.
Next, click on Generate features. It should take less than a minute to complete, depending on the number of images.
Now navigate to Impulse design > Object detection, set the number of training cycles to 30 and the learning rate to 0.005, then click Choose a different model right below the FOMO block and select FOMO (Faster Objects, More Objects) MobileNetV2 0.1. This is the smallest model we can train and the only one that fits fine on the Esp32 camera.

Hit Start training and wait until it completes. It can take 2-3 minutes depending on the number of images.

To get accurate estimates of inferencing time and memory usage, as shown in the image above, be sure to select "Espressif ESP-EYE" as the target board in the top-right corner of the Edge Impulse page.
If you're satisfied with the results, move to the next step. If you're not, you have to:
- collect more / better images. Good input data and labelling are a critical part of Machine Learning
- increase the number of training cycles (don't go over 50, it's almost useless)
- decrease the learning rate to 0.001. It may help, or it may not
Now that all looks good, it's time to export the model.
4. Use Edge Impulse to Export the Model into an Arduino Library
This is the shortest step.
Navigate to Deployment in the left menu, select Arduino Library, scroll down and hit Build.
A zip containing the model library will download.

Keep this ready for the next step.
5. Use EloquentEsp32Cam to Run the Object Detection Model
This last part is the one I struggled with the most.
As a beginner, you won't find any good tutorial on the web on how to deploy an Edge Impulse FOMO model to the Esp32 camera.
Until now, of course!
I'll detail these steps as much as possible so you will have a hard time failing.
5.A Create a new Sketch
Open the Arduino IDE and create a new sketch. Select Esp32 Dev Module as the board and enable external PSRAM, if available on your board.
5.B Install the Libraries
You need to install the EloquentEsp32Cam library from the Library Manager. It is mandatory that you install version 1.1.1.

Next, navigate to Sketch > Include library > Add .zip library and select the zip you downloaded from Edge Impulse.
5.C Copy the Object Detection Sketch
Copy the following sketch contents.
// 27_EdgeImpulse_FOMO.ino
#define MAX_RESOLUTION_VGA 1

/**
 * Run Edge Impulse FOMO model on the Esp32 camera
 */

// replace with the name of your library
#include <esp32-cam-object-detection_inferencing.h>
#include "esp32cam.h"
#include "esp32cam/tinyml/edgeimpulse/FOMO.h"

using namespace Eloquent::Esp32cam;

Cam cam;
TinyML::EdgeImpulse::FOMO fomo;

void setup() {
    Serial.begin(115200);
    delay(3000);
    Serial.println("Init");

    cam.aithinker();
    cam.highQuality();
    cam.highestSaturation();
    cam.vga();

    while (!cam.begin())
        Serial.println(cam.getErrorMessage());
}

void loop() {
    if (!cam.capture()) {
        Serial.println(cam.getErrorMessage());
        delay(1000);
        return;
    }

    // run FOMO model
    if (!fomo.detectObjects(cam)) {
        Serial.println(fomo.getErrorMessage());
        delay(1000);
        return;
    }

    // print found bounding boxes
    if (fomo.hasObjects()) {
        Serial.printf("Found %d objects in %d millis\n", fomo.count(), fomo.getExecutionTimeInMillis());

        fomo.forEach([](size_t ix, ei_impulse_result_bounding_box_t bbox) {
            Serial.print(" > BBox of label ");
            Serial.print(bbox.label);
            Serial.print(" at (");
            Serial.print(bbox.x);
            Serial.print(", ");
            Serial.print(bbox.y);
            Serial.print("), size ");
            Serial.print(bbox.width);
            Serial.print(" x ");
            Serial.print(bbox.height);
            Serial.println();
        });
    }
    else {
        Serial.println("No objects detected");
    }
}
Even if you've never used the EloquentEsp32Cam library and you're a beginner, you should still be able to get what's going on in the sketch: at each loop, we take a photo, run the object detection model and, for each object found, we print its label, position and size.
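If you only care about a specific object, you can inspect each bounding box's label inside the forEach callback. Here's a minimal, hypothetical fragment (it reuses the fomo object from the sketch above; "pen" is just a placeholder for one of your own labels):

// hypothetical fragment: react only to one of your own labels
if (fomo.hasObjects()) {
    fomo.forEach([](size_t ix, ei_impulse_result_bounding_box_t bbox) {
        // bbox.label holds the class name you assigned in Edge Impulse
        if (strcmp(bbox.label, "pen") == 0) {
            Serial.print("Pen found at (");
            Serial.print(bbox.x);
            Serial.print(", ");
            Serial.print(bbox.y);
            Serial.println(")");
        }
    });
}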
5.D (Optional) Fix the Edge Impulse Library
A few days after I wrote this tutorial, the library downloaded from Edge Impulse stopped compiling out of the box.
The compiler complains about std::fmin and std::fmax.
If you get a similar error, here's how to fix it:
- navigate to Arduino root > libraries > esp32-cam-object-detection_inferencing > src > edge-impulse-sdk > tensorflow > lite > micro > tensor_utils_common.cpp
- search for const double rmin. For me, it is at line 84
- replace const double rmin = std::fmin(0, *minmax.first); with const double rmin = (*minmax.first) > 0 ? 0 : (*minmax.first);
- on the next line, replace const double rmax = std::fmax(0, *minmax.second); with const double rmax = (*minmax.second) < 0 ? 0 : (*minmax.second);
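For reference, after the edit those two lines should look something like this (the exact line numbers in your copy of the library may differ):

// tensor_utils_common.cpp, around line 84
const double rmin = (*minmax.first) > 0 ? 0 : (*minmax.first);
const double rmax = (*minmax.second) < 0 ? 0 : (*minmax.second);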
Save, go back to the Arduino IDE and compile again. Now it should succeed.
5.E Run
This is going to be the most rewarding step of the whole tutorial.
Save the sketch, hit upload, open the Serial Monitor and watch the predictions scrolling while you put your objects in front of the camera.

The text may not be easily readable, but the object detection takes ~180 ms to run. That's pretty fast, if you ask me.
To Summarize
I walked you step by step through how to deploy a FOMO object detection model on your Esp32 camera, even if you are a beginner. The steps we implemented are:
- Collect images using the tool from the EloquentEsp32Cam library
- Quickly label the images in Edge Impulse
- Train a FOMO object detection model in the cloud
- Deploy the model back to our Esp32 camera and integrate it into our own sketch
Before this tutorial came out, collecting images and deploying the model back to the Esp32 camera were very hard to implement for a beginner.
All the existing tutorials were very skimpy on these two fundamental steps. No one (and I challenge you to prove me wrong) ever showed how to integrate the FOMO code into an existing sketch without using the (messy) default example.
If you wanted to, e.g., blink an LED every time an object was detected, you had to figure it out by yourself by reading through all the (verbose) code of the default Edge Impulse example sketch.
Thanks to the EloquentEsp32Cam library, you can now easily do this and a lot more!
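As an example, here's a minimal sketch of how the LED idea could look. It follows the same flow as 27_EdgeImpulse_FOMO.ino above; the LED pin (GPIO 13) is only an assumption, so replace it with a free pin of your board and adjust the camera model to yours:

// hypothetical example: keep an LED on while at least one object is detected
#define MAX_RESOLUTION_VGA 1

#include <esp32-cam-object-detection_inferencing.h>
#include "esp32cam.h"
#include "esp32cam/tinyml/edgeimpulse/FOMO.h"

#define LED_PIN 13  // assumption: an LED wired to a free GPIO of your board

using namespace Eloquent::Esp32cam;

Cam cam;
TinyML::EdgeImpulse::FOMO fomo;

void setup() {
    Serial.begin(115200);
    pinMode(LED_PIN, OUTPUT);

    // same camera setup as the sketch above (replace with your model)
    cam.aithinker();
    cam.vga();

    while (!cam.begin())
        Serial.println(cam.getErrorMessage());
}

void loop() {
    // take a photo and run the FOMO model, as in the sketch above
    if (!cam.capture() || !fomo.detectObjects(cam)) {
        delay(1000);
        return;
    }

    // LED on while at least one object is in the frame, off otherwise
    digitalWrite(LED_PIN, fomo.hasObjects() ? HIGH : LOW);
}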
What's next?
Now that you have the FOMO tool at your disposal, you can create whichever project you like that leverages machine vision on the cheap Esp32 camera board.
You may also want to learn all the other magic tricks that are built inside the EloquentEsp32Cam library, such as:
- motion detection
- Telegram messaging
- face detection
- person detection
- line crossing detection
- color blob detection
I wrote an entire eBook dedicated to this.

If you appreciated this tutorial, I would appreciate it if you shared it on social media.
If you have comments, don't hesitate to use the form below.