30 minutes to your very own Esp32 Image Recognition
Create a working image recognition system that runs at 60 FPS in 1 Kb of RAM (without TensorFlow)

Have you ever wanted to perform image classification on your cheap Esp32-cam in a matter of minutes?
Do you want it to be easy and fast?
This project is for you!
Learn how to quickly implement your own image classification system on the Esp32-cam following 3 steps:
- collect images from Esp32-cam to create a dataset
- train a Machine Learning classifier on your PC to classify images
- deploy that classifier to your Esp32-cam
Image classification is the task of classifying an entire image as belonging to one of a given set of classes. It is not to be confused with object detection, which is the task of localizing objects of interest inside a given image.
Image Classification that is Fast
Image classification is not something entirely new on the Esp32-cam and other microcontrollers, thanks to TensorFlow for Microcontrollers and no-code platforms like Edge Impulse.
They come with pre-trained Neural networks of varying size and complexity that you can leverage to implement your own image recognition system.
But...
Neural Networks for image classification are heavyweight: they can take anywhere from 50 Kb to 500 Kb of RAM.
Since your cheap Esp32-cam usually comes with limited RAM, you will often be forced to opt for a low complexity, low accuracy network.
What's more, weight brings time complexity with it: classifying an image on the Esp32-cam usually takes about 500 ms (source: Edge Impulse blog).
Can we do better?
Can we do faster?
Yes, we can!
Image classification on Esp32-cam can be implemented in 30 minutes, with minimal code configuration, thanks to the Eloquent Arduino ecosystem of libraries: once deployed, it takes 1 kb of RAM and runs at 60 FPS.
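To put the two speed figures side by side, here is a quick back-of-the-envelope calculation. It only uses the numbers quoted above (60 FPS and about 500 ms per image); the function names are made up for illustration.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available for each frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def frames_per_second(latency_ms: float) -> float:
    """Frame rate you get if each frame takes latency_ms to process."""
    return 1000.0 / latency_ms


print(f"60 FPS leaves {frame_budget_ms(60):.1f} ms per frame")    # 16.7 ms
print(f"500 ms per frame is {frames_per_second(500):.0f} FPS")    # 2 FPS
```

In other words, running at 60 FPS means the whole capture-to-label loop has to fit in under 17 ms, roughly 30 times less than the 500 ms figure.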
Follow the next steps to get up and running!
Hardware Requirements
To follow this project the only requirement is an Esp32 camera.
You can find many models on the market:
- from Ai Thinker (the most widely used)
- from Espressif
- from M5Stack
I can't recommend enough the cameras from M5Stack because they come with 4 Mb external PSRAM, but any from the above list should work.

Software requirements
To capture the images from the Esp32-cam with ease, you need to install the EloquentEsp32Cam library. It is available on the Arduino IDE Library Manager.
To collect the images on your PC and train the Machine Learning model, you have to install the everywhereml Python package.
Create a new Python project and run
pip install "everywhereml>=0.2.19"
Video walkthrough
from logging import basicConfig, INFO
from IPython.display import IFrame
basicConfig(level=INFO)
IFrame('https://www.youtube.com/embed/ZxRnVBiN_y4?rel=0', width=694, height=390)
Step 1 of 5: Load the 4_Video_Feed.ino example sketch
The first step in creating a Machine Learning model is to collect data.
Since the Esp32-cam quality is pretty low, I recommend that you:
- fix the camera in position with tape and don't let it move
- use artificial illumination if possible (image quality degrades in low light conditions)

Something as simple as a plain background will work best
To keep acquisition speed fast, we will capture at QQVGA resolution (160 x 120). If your project requires you to capture at higher resolutions, change the sketch accordingly.
Image classification often happens at even lower resolutions anyway, so if you're not using the large version of the image for other purposes, QQVGA is the best choice.
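To get a feel for why resolution matters on a board with little RAM, the small calculation below compares the size of a raw frame at QQVGA with a frame at 40 x 30 (the resolution used later in this project). It assumes one byte per pixel, i.e. an uncompressed grayscale frame; that is an assumption made for illustration, since the camera normally streams JPEG.

```python
def gray_frame_bytes(width: int, height: int) -> int:
    """Bytes needed for an uncompressed grayscale frame (1 byte per pixel, assumed)."""
    return width * height


qqvga = gray_frame_bytes(160, 120)
small = gray_frame_bytes(40, 30)

print(f"QQVGA (160 x 120): {qqvga} bytes")         # 19200 bytes
print(f"Downscaled (40 x 30): {small} bytes")      # 1200 bytes
print(f"QQVGA is {qqvga // small}x larger")        # 16x larger
```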
Once your setup is ready, load the 4_Video_Feed.ino example sketch from the EloquentEsp32Cam library examples. It is reported here for faster copy-paste.
Once loaded, the Esp32-cam will connect to your WiFi network and start an HTTP video streaming server you can access from any web browser.
#include "esp32cam.h"
#include "esp32cam/http/LiveFeed.h"

// Replace with your WiFi credentials
#define WIFI_SSID "SSID"
#define WIFI_PASS "PASSWORD"

// 80 is the port to listen to
// You can change it to whatever you want, 80 is the default for HTTP
Eloquent::Esp32cam::Cam cam;
Eloquent::Esp32cam::Http::LiveFeed feed(cam, 80);

void setup() {
    Serial.begin(115200);
    delay(3000);
    Serial.println("Init");

    /**
     * Configure camera model
     * Allowed values are:
     * - aithinker()
     * - m5()
     * - m5wide()
     * - eye()
     * - wrover()
     */
    cam.aithinker();
    cam.highQuality();
    cam.qqvga();

    while (!cam.begin())
        Serial.println(cam.getErrorMessage());

    while (!cam.connect(WIFI_SSID, WIFI_PASS))
        Serial.println(cam.getErrorMessage());

    while (!feed.begin())
        Serial.println(feed.getErrorMessage());

    // make the camera accessible at http://esp32cam.local
    cam.viewAt("esp32cam");

    // display the IP address of the camera
    Serial.println(feed.getWelcomeMessage());
}

void loop() {
}
Open the Serial Monitor to take note of the IP address of the Esp32 camera.
If your router supports mDNS (most do), you won't need the IP address and will be able to use an easier, non-changing hostname.
To check if this is the case, click on the following link: http://esp32cam.local
If you can see the live video feed, it is working!
If not, either 1) your router does not support mDNS, or 2) your PC is not connected to the same WiFi network as your Esp32.
Close the window now, otherwise the next code won't work!
If you only see a frozen image (or no image at all) on Windows, you may have to disable your antivirus!
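If you prefer to check the mDNS hostname from Python instead of the browser, a minimal sketch could look like the one below. It only asks your operating system to resolve the name; whether `.local` names resolve depends on your OS and network setup, so treat a failure as a hint, not as proof. The `resolve_host` helper is a made-up name, not part of any library used in this project.

```python
import socket


def resolve_host(hostname, resolver=socket.gethostbyname):
    """Return the IP address for hostname, or None if it cannot be resolved.

    The resolver argument exists only so the function can be tried out
    without a network; by default it uses the system resolver.
    """
    try:
        return resolver(hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


# Example usage (uncomment on a PC connected to the same WiFi as the camera):
# ip = resolve_host("esp32cam.local")
# print(ip if ip else "Hostname not found: use the IP from the Serial Monitor")
```

If the lookup returns None, fall back to the IP address printed on the Serial Monitor.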
Step 2 of 5: Collect images from Esp32-cam over HTTP
Now that the Esp32-cam video stream is available over the WiFi network, we can run a program that collects the frames over HTTP.
We will make use of the MjpegCollector class, which needs the URL of the Esp32-cam web server (the one you can read on the Serial Monitor).
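For intuition about what collecting frames from an MJPEG stream involves, here is a naive sketch that splits a chunk of stream bytes into individual JPEG images by looking for the JPEG start (FF D8) and end (FF D9) markers. This is not how MjpegCollector is implemented (I have not shown its internals here); it is a simplified illustration, and `extract_jpeg_frames` is a hypothetical helper.

```python
SOI = b"\xff\xd8"  # JPEG "start of image" marker
EOI = b"\xff\xd9"  # JPEG "end of image" marker


def extract_jpeg_frames(buffer: bytes):
    """Split raw stream bytes into complete JPEG frames.

    Returns (frames, remainder): remainder holds the bytes after the last
    complete frame, to be prepended to the next chunk read from the stream.
    Naive on purpose: it does not handle JPEGs with embedded thumbnails.
    """
    frames = []
    pos = 0
    while True:
        start = buffer.find(SOI, pos)
        if start == -1:
            break
        end = buffer.find(EOI, start + 2)
        if end == -1:
            break  # frame not complete yet, wait for more data
        frames.append(buffer[start:end + 2])
        pos = end + 2
    return frames, buffer[pos:]
```

In practice you only need the MjpegCollector class shown next; the sketch is just there to demystify what "collecting frames over HTTP" means.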
"""
Collect images from Esp32-cam web server
"""
from logging import basicConfig, INFO
from everywhereml.data import ImageDataset
from everywhereml.data.collect import MjpegCollector
# you need to manually create this folder in the current working directory
base_folder = 'microcontroller_boards'
# copy here the address printed on the Serial Monitor
# (the one after "MJPEG stream available at")
IP_ADDRESS_OF_ESP = 'http://esp32cam.local:81'
basicConfig(level=INFO)
try:
    # if our dataset folder already exists, load it
    image_dataset = ImageDataset.from_nested_folders(
        name='Dataset',
        base_folder=base_folder
    )
except FileNotFoundError:
    # if the dataset folder does not exist, collect the samples
    # from the Esp32-cam web server
    # duration is how long (in seconds) the program will collect
    # the images for each class
    #
    # After each class collection, you may need to manually create the
    # subfolder to store the class images.
    #
    # Follow the instructions accurately!
    mjpeg_collector = MjpegCollector(address=IP_ADDRESS_OF_ESP)
    image_dataset = mjpeg_collector.collect_many_classes(
        dataset_name='Dataset',
        base_folder=base_folder,
        duration=30
    )
print(image_dataset)
ImageDataset[Dataset](num_images=3973, num_labels=4, labels=['no_board', 'pi', 'portenta', 'wio'])
The above snippet will start an interactive data collection procedure: it will ask for a class name and collect the frames for the given amount of time, until you decide to exit.
First of all, enter background as the class name and let the camera capture frames of nothing. If you skip this step, the model will ALWAYS try to classify the image as one of your objects, even if there's nothing in the scene!
Next, put the objects you want to classify in front of the camera, enter the object name in the input field and press [Enter]. The frame collection will start after 2 seconds.

Put the objects in front of the camera before starting the collection process
Move the object a little in front of the camera to capture slight variations and make the model more robust.
Once you're done collecting frames, you can get a preview of them and check the quality of your work.
"""
Display a preview of the captured images
"""
image_dataset.preview(
    samples_per_class=10,
    rows_per_class=2,
    figsize=(20, 10)
)

If you find that some images are bad or totally wrong, take some time to delete them.
If you feel that you may need to capture more images, do so.
Take all the time you need to collect a high-quality dataset, because in Machine Learning "garbage in, garbage out"!
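After deleting bad images, it is worth checking that the classes are still roughly balanced. The sketch below counts the image files in each class subfolder. It assumes the dataset is stored as one subfolder per class inside the base folder and that the images are JPEG or PNG files; both are assumptions on my side, so adapt them to what you find on disk. The function name is made up.

```python
from pathlib import Path

# File extensions treated as images (an assumption: adjust to your dataset)
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(base_folder):
    """Return {class_name: number_of_images} for each subfolder of base_folder."""
    counts = {}
    for class_dir in sorted(Path(base_folder).iterdir()):
        if class_dir.is_dir():
            counts[class_dir.name] = sum(
                1
                for f in class_dir.iterdir()
                if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


# Example usage:
# print(count_images_per_class('microcontroller_boards'))
```

If one class ends up with far fewer images than the others, capture some more frames for it before moving on.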
Step 3 of 5: Create an Image Recognition pipeline
Having our very own dataset of images, we need a way to transform each image into something a Machine Learning model can classify.
With Neural Networks, you usually feed the raw image as input and the network learns by itself how to extract meaningful features from it.
With traditional Machine Learning it's different: we have to extract the features ourselves.
But don't worry, you don't have to do this on your own.
The everywhereml package has all the tools you need.
First of all, our feature extractor will work with grayscale images, so let's convert the dataset from RGB to Gray.
"""
Image classification with HOG works on grayscale images at the moment
So convert images to grayscale in the range 0-255
"""
image_dataset = image_dataset.gray().uint8()
"""
Preview grayscale images
"""
image_dataset.preview(
    samples_per_class=10,
    rows_per_class=2,
    figsize=(20, 10),
    cmap='gray'
)
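If you are curious about what an RGB to grayscale conversion does under the hood, here is a plain Python sketch. It uses the common luminance weights (0.299, 0.587, 0.114); I am not claiming this is the exact formula everywhereml applies, only that it is a widely used one. The function names are illustrative.

```python
def rgb_to_gray(pixel):
    """Convert one (r, g, b) pixel with 0-255 channels to a 0-255 gray level."""
    r, g, b = pixel
    gray = round(0.299 * r + 0.587 * g + 0.114 * b)
    return min(255, max(0, gray))  # clamp to the uint8 range


def image_to_gray(image):
    """Convert an image stored as rows of (r, g, b) tuples to rows of gray levels."""
    return [[rgb_to_gray(p) for p in row] for row in image]


tiny = [[(255, 255, 255), (0, 0, 0)],
        [(255, 0, 0), (0, 0, 255)]]
print(image_to_gray(tiny))  # [[255, 0], [76, 29]]
```

Each pixel goes from three numbers to one, so the data to process shrinks by a factor of 3.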

Now it's time to actually convert the images to feature vectors.
There exist many feature extractors for images: in this project we will make use of Histogram of Oriented Gradients (HOG).
It is lightweight and pretty fast, so it's a good fit for embedded environments like the Esp32-cam.
To speed the processing up, we will rescale our source image to a lower resolution (40 x 30).
If you later find your classifier achieves low accuracy, you may want to tweak this resolution and see how it impacts both accuracy and execution time.
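The kind of rescaling involved can be sketched in a few lines: for each pixel of the small image, pick the corresponding pixel of the large one and skip the rest (nearest-neighbor sampling). This mirrors the preprocess step of the exported C++ pipeline shown later in this article, but it is my own simplified Python version, with a made-up function name, working on a flat list of gray levels.

```python
def downsample(pixels, src_w, src_h, dst_w, dst_h):
    """Nearest-neighbor downsampling of a flat, row-major grayscale image."""
    dx = src_w / dst_w  # horizontal step in the source image
    dy = src_h / dst_h  # vertical step in the source image
    out = []
    for y in range(dst_h):
        src_row = int(y * dy) * src_w  # offset of the source row to sample
        for x in range(dst_w):
            out.append(pixels[src_row + int(x * dx)])
    return out


# A 4 x 4 image with pixel values 0..15, reduced to 2 x 2
print(downsample(list(range(16)), 4, 4, 2, 2))  # [0, 2, 8, 10]
```

Going from 160 x 120 to 40 x 30 keeps one pixel out of every 4 x 4 patch, which is why this step is so cheap on the microcontroller.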
"""
Create an image recognition pipeline with HOG feature extractor
"""
from everywhereml.preprocessing.image.object_detection import HogPipeline
from everywhereml.preprocessing.image.transform import Resize
pipeline = HogPipeline(
    transforms=[
        Resize(width=40, height=30)
    ]
)
# Convert images to feature vectors
feature_dataset = pipeline.fit_transform(image_dataset)
feature_dataset.describe()
HOG: 100%|██████████| 3973/3973 [00:05<00:00, 664.37it/s]
statistic | hog0 | hog1 | hog2 | hog3 | hog4 | hog5 | hog6 | hog7 | hog8 | hog9 | ... | hog126 | hog127 | hog128 | hog129 | hog130 | hog131 | hog132 | hog133 | hog134 | target |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 3973.000000 | 3973.000000 | 3973.000000 | 3973.000000 | 3973.000000 | 3973.000000 | 3973.000000 | 3973.000000 | 3973.000000 | 3973.000000 | ... | 3973.0 | 3973.0 | 3973.0 | 3973.0 | 3973.000000 | 3973.000000 | 3973.000000 | 3973.000000 | 3973.000000 | 3973.000000 |
mean | 0.004144 | 0.013228 | 0.382175 | 0.197764 | 0.065186 | 0.038199 | 0.571028 | 0.607736 | 0.103253 | 0.001294 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.001653 | 0.008949 | 0.035351 | 0.276698 | 0.390667 | 1.502643 |
std | 0.011126 | 0.031442 | 0.209502 | 0.075928 | 0.030605 | 0.027203 | 0.082439 | 0.099930 | 0.027841 | 0.005890 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.012827 | 0.041605 | 0.112141 | 0.125652 | 0.139484 | 1.117609 |
min | 0.000000 | 0.000000 | 0.042983 | 0.006226 | 0.000000 | 0.000000 | 0.330738 | 0.219745 | 0.012631 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.105724 | 0.000000 |
25% | 0.000000 | 0.000000 | 0.236713 | 0.148486 | 0.042255 | 0.023224 | 0.517313 | 0.547261 | 0.085016 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.190454 | 0.287429 | 1.000000 |
50% | 0.000000 | 0.000000 | 0.320816 | 0.188273 | 0.064217 | 0.036321 | 0.582529 | 0.623214 | 0.103585 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.254019 | 0.379817 | 2.000000 |
75% | 0.000000 | 0.008949 | 0.485955 | 0.236276 | 0.085064 | 0.049494 | 0.615910 | 0.675409 | 0.122275 | 0.000000 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 0.320986 | 0.441451 | 3.000000 |
max | 0.101935 | 0.309142 | 1.000000 | 0.623691 | 0.176081 | 0.337037 | 1.000000 | 0.938173 | 0.184347 | 0.059208 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.189369 | 0.426923 | 1.000000 | 1.000000 | 1.000000 | 3.000000 |
8 rows × 136 columns
"""
Print pipeline description
"""
print(pipeline)
ImagePipeline: HogPipeline
---------
- Resize(from=(160, 120), to=(40, 30), pixformat=gray)
> HOG(block_size=8, bins=9, cell_size=3)
The output of the above code is a dataset made of feature vectors, instead of images. These feature vectors are now suitable for Machine Learning models.
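To make the idea of a HOG feature vector less abstract, here is a simplified pure Python sketch. It follows the same structure as the C++ code exported later in this article (8 x 8 blocks, 9 orientation bins of 20 degrees, groups of 3 blocks normalized by their largest value), but it is my own rewrite for illustration: it uses math.atan2 instead of the fast arctangent approximation, so the numbers will not match the device exactly.

```python
import math


def hog_features(pixels, width=40, height=30, block_size=8, bins=9):
    """Simplified HOG on a flat, row-major grayscale image."""
    blocks_x = width // block_size    # 5 blocks per row for a 40 px wide image
    blocks_y = height // block_size   # 3 block rows for a 30 px tall image
    bin_width = 180.0 / bins          # 20 degrees per bin
    features = []

    for by in range(blocks_y):
        for bx in range(blocks_x):
            hist = [0.0] * bins
            # skip the 1-pixel border of each block so neighbors always exist
            for y in range(by * block_size + 1, (by + 1) * block_size - 1):
                for x in range(bx * block_size + 1, (bx + 1) * block_size - 1):
                    i = y * width + x
                    gx = pixels[i + 1] - pixels[i - 1]
                    gy = pixels[i + width] - pixels[i - width]
                    angle = abs(math.degrees(math.atan2(gy, gx)))
                    bin_idx = min(int(angle / bin_width), bins - 1)
                    hist[bin_idx] += math.hypot(gx, gy)  # gradient magnitude
            features.extend(hist)

    # normalize each group of 3 consecutive blocks by its largest value
    group = 3 * bins
    for start in range(0, len(features), group):
        peak = max(max(features[start:start + group]), 1e-4)
        for i in range(start, min(start + group, len(features))):
            features[i] /= peak
    return features


# A 40 x 30 image that is black on the left half and white on the right half
edge_image = [255 if x >= 20 else 0 for y in range(30) for x in range(40)]
print(len(hog_features(edge_image)))  # 135
```

With 3 x 5 blocks and 9 bins per block you get 3 x 5 x 9 = 135 features, which is the length of the feature vector reported in the table above (136 columns including the target).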
To get a visual idea of how informative the extracted features are, we can plot a pairplot of them.
A pairplot compares each feature against the others in a grid format. By highlighting each class with a different color, you can quickly tell whether the features are able to "isolate" a class (if you can do this by eye, a Machine Learning classifier will be able to as well!).
"""
Plot pairplot of features.
Feel free to open the image in a new window to see it at full scale.
In the next line:
- n is the number of points to plot (the greater the value, the longer it takes)
- k is the number of features (values greater than 10 become messy)
"""
feature_dataset.plot.features_pairplot(n=200, k=8)
/Users/simone/PycharmProjects/PGPackaging/HyperML/venv/lib/python3.8/site-packages/sklearn/feature_selection/_univariate_selection.py:112: UserWarning: Features [ 81 82 126 127 128 129] are constant.
  warnings.warn("Features %s are constant." % constant_features_idx, UserWarning)
/Users/simone/PycharmProjects/PGPackaging/HyperML/venv/lib/python3.8/site-packages/sklearn/feature_selection/_univariate_selection.py:113: RuntimeWarning: invalid value encountered in divide
  f = msb / msw

In this case, we can clearly see that while the wio class and the empty (no_board) class are well clustered, the pi and portenta classes always overlap to some degree.
This tells us that the classifier will mis-label them sometimes.
Another kind of visualization is UMAP.
UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction algorithm.
It takes a feature vector of length N and "compresses" it to, in our case, length 2, while trying to preserve the topological structure of the original data.
By collapsing the feature vectors to (x, y) pairs, we can plot them on a scatter plot.
"""
Plot UMAP of features
If features are discriminative, we should see well defined clusters of points
"""
feature_dataset.plot.umap()

If we see well-defined clusters of points (as in the above image), it means that our features do a great job at describing each class.
If you feel like the pairplot and the UMAP "disagree", that is true only to some extent. UMAP does a lot of heavy lifting on the data to preserve cluster isolation; most Machine Learning models won't do so. In the context of TinyML, you should give more importance to the pairplot.
Step 4 of 5: Train a Machine Learning classifier
From the above graphics we can say that our features are pretty good at characterizing our data, so it is time to train a classifier.
There are many available, but one of the most effective is Random Forest. You can tweak its configuration as you prefer, but the values below should work fine in most cases.
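If you have never used a Random Forest, the idea is simple: several small decision trees each cast a vote, and the class with the most votes wins. The toy sketch below mimics the voting scheme of the exported C++ classifier shown later (each tree returns a class index and a score, scores are summed per class, and the largest sum wins). The three trees and their thresholds are invented for illustration; a real forest learns them from your data.

```python
LABELS = ["no_board", "pi", "portenta", "wio"]


# Three hand-written toy "trees": each returns (class_index, score).
def tree_a(x):
    return (1, 10.0) if x[0] <= 0.5 else (2, 8.0)


def tree_b(x):
    return (1, 9.0) if x[1] <= 0.3 else (0, 7.0)


def tree_c(x):
    return (2, 8.0) if x[0] > 0.2 else (0, 7.0)


def forest_predict(trees, x, num_classes):
    """Sum the score of each tree on its chosen class and return the argmax."""
    votes = [0.0] * num_classes
    for tree in trees:
        class_idx, score = tree(x)
        votes[class_idx] += score
    # ties are resolved in favor of the lowest class index
    return max(range(num_classes), key=votes.__getitem__)


trees = [tree_a, tree_b, tree_c]
print(LABELS[forest_predict(trees, [0.4, 0.1], len(LABELS))])  # pi
print(LABELS[forest_predict(trees, [0.9, 0.9], len(LABELS))])  # portenta
```

Because every tree is just a chain of if/else comparisons, the whole forest translates to very compact and fast C++ code, which is what makes it a good fit for the Esp32-cam.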
"""
Create and fit RandomForest classifier
"""
from everywhereml.sklearn.ensemble import RandomForestClassifier
for i in range(10):
    clf = RandomForestClassifier(n_estimators=5, max_depth=10)
    # fit on train split and get accuracy on the test split
    train, test = feature_dataset.split(test_size=0.4, random_state=i)
    clf.fit(train)
    print('Score on test set: %.2f' % clf.score(test))

# now fit on the whole dataset
clf.fit(feature_dataset)
Score on test set: 1.00
Score on test set: 1.00
Score on test set: 1.00
Score on test set: 1.00
Score on test set: 1.00
Score on test set: 1.00
Score on test set: 1.00
Score on test set: 1.00
Score on test set: 1.00
Score on test set: 1.00
RandomForestClassifier(base_estimator=deprecated, bootstrap=True, ccp_alpha=0.0, class_name=RandomForestClassifier, class_weight=None, criterion=gini, estimator=DecisionTreeClassifier(), estimator_params=('criterion', 'max_depth', 'min_samples_split', 'min_samples_leaf', 'min_weight_fraction_leaf', 'max_features', 'max_leaf_nodes', 'min_impurity_decrease', 'random_state', 'ccp_alpha'), max_depth=10, max_features=sqrt, max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=5, n_jobs=None, num_outputs=4, oob_score=False, package_name=everywhereml.sklearn.ensemble, random_state=None, template_folder=everywhereml/sklearn/ensemble, verbose=0, warm_start=False)
Depending on your dataset, you can expect your accuracy to range from 0.7 to 1.
If it is lower (or too low for your use case), you can:
- improve your dataset (collect more images, fix your setup)
- tweak the resize parameter of the HogPipeline to a higher resolution
- tweak the RandomForestClassifier parameters (see documentation)
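Before changing anything, it helps to know which classes the model confuses. Given a list of true labels and a list of predicted labels (however you obtain them), the sketch below counts the (true, predicted) pairs and reports the most frequent mistake. The labels in the example are invented to show the output format; they are not results from this project, and the function names are made up.

```python
from collections import Counter


def confusion_counts(true_labels, predicted_labels):
    """Count how many times each (true, predicted) pair occurs."""
    return Counter(zip(true_labels, predicted_labels))


def most_confused(true_labels, predicted_labels):
    """Return ((true, predicted), count) for the most frequent error, or None."""
    errors = Counter({
        pair: n
        for pair, n in confusion_counts(true_labels, predicted_labels).items()
        if pair[0] != pair[1]
    })
    return errors.most_common(1)[0] if errors else None


# Invented labels, for illustration only
y_true = ["pi", "pi", "pi", "portenta", "wio"]
y_pred = ["pi", "portenta", "portenta", "pi", "wio"]
print(most_confused(y_true, y_pred))  # (('pi', 'portenta'), 2)
```

If most errors come from a single pair of classes, collecting more (and more varied) images of those two objects is usually a better investment than tuning the classifier.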
If you're satisfied, it's time to port the whole system to your Esp32-cam.
Step 5 of 5: Port to Esp32
The last step is to convert the HogPipeline and RandomForestClassifier to C++ code that can run on your Esp32-cam.
This process is very straightforward, since you only need a line of code.
Create a new project in the Arduino IDE to hold all the following files.
"""
Export pipeline to C++
Replace the path with your actual sketch path
"""
print(pipeline.to_arduino_file(
    filename='path-to-sketch/HogPipeline.h',
    instance_name='hog'
))
#ifndef UUID5351829856
#define UUID5351829856

#ifndef UUID5351832016
#define UUID5351832016

/**
 * HOG(block_size=8, bins=9, cell_size=3)
 */
class HOG {
    public:

        /**
         * Transform input image
         */
        template<typename T, typename U>
        bool transform(T *input, U *output) {
            uint16_t f = 0;
            uint16_t block = 0;
            float hog[135] = {0};

            // compute gradients
            for (uint16_t blockY = 0; blockY < 3; blockY++) {
                const uint16_t blockOffsetY = blockY * 320;

                for (uint16_t blockX = 0; blockX < 5; blockX++) {
                    const uint16_t blockOffsetX = blockX * 8;
                    float hist[9] = {0};

                    for (uint16_t _y = 1; _y < 7; _y += 1) {
                        const uint16_t rowOffset = blockOffsetY + _y * 40 + blockOffsetX;
                        const uint16_t rowOffsetBefore = rowOffset - 40;
                        const uint16_t rowOffsetAfter = rowOffset + 40;

                        for (uint16_t _x = 1; _x < 7; _x += 1) {
                            const uint16_t offset = rowOffset + _x;
                            const uint16_t offsetBefore = rowOffsetBefore + _x;
                            const uint16_t offsetAfter = rowOffsetAfter + _x;
                            const float gy = ((float) input[offsetAfter]) - input[offsetBefore];
                            const float gx = ((float) input[offset + 1]) - input[offset - 1];
                            const float g = sqrt(gy * gy + gx * gx);
                            uint8_t angle = abs(this->arctan(gy, gx) * 180 / 3.141592653589793f / 20);

                            if (angle >= 8) angle = 8;
                            hist[angle] += g;
                        }
                    }

                    for (uint16_t i = 0; i < 9; i++)
                        hog[f++] = hist[i];

                    block += 1;

                    // end of cell, normalize
                    if ((block % 3) == 0) {
                        const uint16_t offset = (block - 3) * 9;
                        float maxGradient = 0.0001;

                        for (uint16_t i = 0; i < 27; i++) {
                            const float h = hog[offset + i];

                            if (h > maxGradient)
                                maxGradient = h;
                        }

                        for (uint16_t i = 0; i < 27; i++) {
                            hog[offset + i] /= maxGradient;
                        }

                        maxGradient = 0.0001;
                    }
                }
            }

            // copy over
            for (uint16_t i = 0; i < 135; i++)
                output[i] = hog[i];

            return true;
        }

    protected:

        /**
         * optional atan2 approximation for faster calculation
         */
        float arctan(float y, float x) {
            float r = 0;

            if (abs(y) < 0.00000001)
                return 0;
            else if (abs(x) < 0.00000001)
                return 3.14159274 * (y > 0 ? 1 : -1);
            else {
                float a = min(abs(x), abs(y)) / max(abs(x), abs(y));
                float s = a * a;
                r = ((-0.0464964749 * s + 0.15931422) * s - 0.327622764) * s * a + a;

                if (abs(y) > abs(x))
                    r = 1.57079637 - r;
            }

            if (x < 0)
                r = 3.14159274 - r;
            if (y < 0)
                r = -r;

            return r;
        }
};

#endif

/**
 * ImagePipeline: HogPipeline
 * ---------
 * - Resize(from=(160, 120), to=(40, 30), pixformat=gray)
 * > HOG(block_size=8, bins=9, cell_size=3)
 */
class HogPipeline {
    public:
        static const size_t NUM_INPUTS = 1200;
        static const size_t NUM_OUTPUTS = 135;
        static const size_t WORKING_SIZE = 135;
        float features[135];

        /**
         * Extract features from input image
         */
        template<typename T>
        bool transform(T *input) {
            time_t start = micros();
            ok = true;

            preprocess(input);
            ok = ok && hog.transform(input, features);
            latency = micros() - start;

            return ok;
        }

        /**
         * Debug output feature vector
         */
        template<typename PrinterInterface>
        void debugTo(PrinterInterface &printer, uint8_t precision=5) {
            printer.print(features[0], precision);

            for (uint16_t i = 1; i < 135; i++) {
                printer.print(", ");
                printer.print(features[i], precision);
            }

            printer.print('\n');
        }

        /**
         * Get latency in micros
         */
        uint32_t latencyInMicros() {
            return latency;
        }

        /**
         * Get latency in millis
         */
        uint16_t latencyInMillis() {
            return latency / 1000;
        }

    protected:
        bool ok;
        time_t latency;
        HOG hog;

        template<typename T>
        void preprocess(T *input) {
            // grayscale rescaling
            const float dy = 4.0f;
            const float dx = 4.0f;

            for (uint16_t y = 0; y < 30; y++) {
                const size_t sourceOffset = round(y * dy) * 160;
                const size_t destOffset = y * 40;

                for (uint16_t x = 0; x < 40; x++)
                    input[destOffset + x] = input[sourceOffset + ((uint16_t) (x * dx))];
            }
        }
};

static HogPipeline hog;

#endif
"""
Export classifier to C++
Replace the path with your actual sketch path
The class_map parameter converts numeric classes to human-readable strings
"""
print(clf.to_arduino_file(
    filename='path-to-sketch/HogClassifier.h',
    instance_name='classifier',
    class_map=feature_dataset.class_map
))
#ifndef UUID5414136464 #define UUID5414136464 /** * RandomForestClassifier(base_estimator=deprecated, bootstrap=True, ccp_alpha=0.0, class_name=RandomForestClassifier, class_weight=None, criterion=gini, estimator=DecisionTreeClassifier(), estimator_params=('criterion', 'max_depth', 'min_samples_split', 'min_samples_leaf', 'min_weight_fraction_leaf', 'max_features', 'max_leaf_nodes', 'min_impurity_decrease', 'random_state', 'ccp_alpha'), max_depth=10, max_features=sqrt, max_leaf_nodes=None, max_samples=None, min_impurity_decrease=0.0, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=5, n_jobs=None, num_outputs=4, oob_score=False, package_name=everywhereml.sklearn.ensemble, random_state=None, template_folder=everywhereml/sklearn/ensemble, verbose=0, warm_start=False) */ class RandomForestClassifier { public: /** * Predict class from features */ int predict(float *x) { int predictedValue = 0; size_t startedAt = micros(); uint16_t votes[4] = { 0 }; uint8_t classIdx = 0; float classScore = 0; tree0(x, &classIdx, &classScore); votes[classIdx] += classScore; tree1(x, &classIdx, &classScore); votes[classIdx] += classScore; tree2(x, &classIdx, &classScore); votes[classIdx] += classScore; tree3(x, &classIdx, &classScore); votes[classIdx] += classScore; tree4(x, &classIdx, &classScore); votes[classIdx] += classScore; // return argmax of votes uint8_t maxClassIdx = 0; float maxVote = votes[0]; for (uint8_t i = 1; i < 4; i++) { if (votes[i] > maxVote) { maxClassIdx = i; maxVote = votes[i]; } } predictedValue = maxClassIdx; latency = micros() - startedAt; return (lastPrediction = predictedValue); } /** * Predict class label */ String predictLabel(float *x) { return getLabelOf(predict(x)); } /** * Get label of last prediction */ String getLabel() { return getLabelOf(lastPrediction); } /** * Get label of given class */ String getLabelOf(int8_t idx) { switch (idx) { case -1: return "ERROR"; case 0: return "no_board"; case 1: return "pi"; case 2: 
return "portenta"; case 3: return "wio"; default: return "UNKNOWN"; } } /** * Get latency in micros */ uint32_t latencyInMicros() { return latency; } /** * Get latency in millis */ uint16_t latencyInMillis() { return latency / 1000; } protected: float latency = 0; int lastPrediction = 0; /** * Random forest's tree #0 */ void tree0(float *x, uint8_t *classIdx, float *classScore) { if (x[112] <= 0.13642732053995132) { if (x[93] <= 0.10766664519906044) { if (x[79] <= 0.24860745668411255) { if (x[74] <= 0.6719709634780884) { *classIdx = 3; *classScore = 924.0; return; } else { *classIdx = 1; *classScore = 1068.0; return; } } else { *classIdx = 0; *classScore = 1009.0; return; } } else { if (x[104] <= 0.5145941078662872) { if (x[73] <= 0.15820211917161942) { if (x[33] <= 0.6580864191055298) { if (x[40] <= 0.09790854901075363) { if (x[78] <= 0.01255653840053128) { *classIdx = 3; *classScore = 924.0; return; } else { *classIdx = 2; *classScore = 972.0; return; } } else { *classIdx = 1; *classScore = 1068.0; return; } } else { *classIdx = 0; *classScore = 1009.0; return; } } else { if (x[103] <= 0.6198223605751991) { *classIdx = 3; *classScore = 924.0; return; } else { *classIdx = 2; *classScore = 972.0; return; } } } else { *classIdx = 2; *classScore = 972.0; return; } } } else { if (x[55] <= 0.2049400508403778) { if (x[22] <= 0.51405268907547) { if (x[78] <= 0.3790138363838196) { if (x[61] <= 0.22242896258831024) { if (x[84] <= 0.8479238152503967) { if (x[103] <= 0.2546844808384776) { *classIdx = 3; *classScore = 924.0; return; } else { if (x[42] <= 0.17196498066186905) { if (x[34] <= 0.5830237567424774) { if (x[32] <= 0.01859624870121479) { *classIdx = 2; *classScore = 972.0; return; } else { *classIdx = 2; *classScore = 972.0; return; } } else { if (x[13] <= 0.4550892412662506) { *classIdx = 2; *classScore = 972.0; return; } else { *classIdx = 1; *classScore = 1068.0; return; } } } else { *classIdx = 1; *classScore = 1068.0; return; } } } else { *classIdx = 1; 
*classScore = 1068.0; return; } } else { *classIdx = 3; *classScore = 924.0; return; } } else { if (x[3] <= 0.2019847333431244) { *classIdx = 2; *classScore = 972.0; return; } else { *classIdx = 1; *classScore = 1068.0; return; } } } else { if (x[47] <= 0.18128295242786407) { *classIdx = 3; *classScore = 924.0; return; } else { *classIdx = 2; *classScore = 972.0; return; } } } else { if (x[91] <= 0.02693597599864006) { if (x[74] <= 0.4636997729539871) { *classIdx = 2; *classScore = 972.0; return; } else { *classIdx = 1; *classScore = 1068.0; return; } } else { if (x[18] <= 0.329525426030159) { *classIdx = 2; *classScore = 972.0; return; } else { if (x[57] <= 0.7670303285121918) { if (x[50] <= 0.009573150891810656) { if (x[102] <= 0.40388256311416626) { if (x[15] <= 0.05786064267158508) { *classIdx = 2; *classScore = 972.0; return; } else { *classIdx = 1; *classScore = 1068.0; return; } } else { *classIdx = 1; *classScore = 1068.0; return; } } else { if (x[57] <= 0.7084416449069977) { if (x[119] <= 0.23356419801712036) { *classIdx = 1; *classScore = 1068.0; return; } else { *classIdx = 2; *classScore = 972.0; return; } } else { if (x[98] <= 0.023570695891976357) { *classIdx = 2; *classScore = 972.0; return; } else { *classIdx = 1; *classScore = 1068.0; return; } } } } else { *classIdx = 2; *classScore = 972.0; return; } } } } } } /** * Random forest's tree #1 */ void tree1(float *x, uint8_t *classIdx, float *classScore) { if (x[43] <= 0.2961336672306061) { if (x[71] <= 0.00032758235465735197) { if (x[34] <= 0.32378068566322327) { if (x[90] <= 0.11530342698097229) { if (x[33] <= 0.09442633390426636) { if (x[94] <= 0.5090316385030746) { if (x[102] <= 0.10444577783346176) { if (x[119] <= 0.12923103850334883) { *classIdx = 3; *classScore = 991.0; return; } else { *classIdx = 2; *classScore = 965.0; return; } } else { if (x[63] <= 0.46751388907432556) { *classIdx = 2; *classScore = 965.0; return; } else { if (x[55] <= 0.18274280056357384) { *classIdx = 3; *classScore = 
991.0; return; } else { *classIdx = 1; *classScore = 1009.0; return; } } } } else { *classIdx = 1; *classScore = 1009.0; return; } } else { if (x[35] <= 0.6535502374172211) { if (x[63] <= 0.900971919298172) { if (x[123] <= 0.42087164521217346) { *classIdx = 2; *classScore = 965.0; return; } else { if (x[44] <= 0.1323838010430336) { *classIdx = 1; *classScore = 1009.0; return; } else { *classIdx = 2; *classScore = 965.0; return; } } } else { if (x[107] <= 0.1853318065404892) { *classIdx = 0; *classScore = 1008.0; return; } else { *classIdx = 1; *classScore = 1009.0; return; } } } else { if (x[4] <= 0.044335125014185905) { if (x[47] <= 0.5021862834692001) { *classIdx = 1; *classScore = 1009.0; return; } else { *classIdx = 2; *classScore = 965.0; return; } } else { if (x[119] <= 0.22475799918174744) { *classIdx = 1; *classScore = 1009.0; return; } else { *classIdx = 2; *classScore = 965.0; return; } } } } } else { if (x[59] <= 0.15819567441940308) { if (x[99] <= 0.1404225081205368) { if (x[68] <= 0.041486963629722595) { if (x[46] <= 0.03549467585980892) { *classIdx = 2; *classScore = 965.0; return; } else { *classIdx = 1; *classScore = 1009.0; return; } } else { *classIdx = 1; *classScore = 1009.0; return; } } else { if (x[18] <= 0.4906560480594635) { *classIdx = 2; *classScore = 965.0; return; } else { *classIdx = 1; *classScore = 1009.0; return; } } } else { *classIdx = 3; *classScore = 991.0; return; } } } else { if (x[118] <= 0.1174352876842022) { if (x[99] <= 0.10693112015724182) { if (x[72] <= 0.33362777531147003) { *classIdx = 1; *classScore = 1009.0; return; } else { if (x[73] <= 0.6788644343614578) { *classIdx = 2; *classScore = 965.0; return; } else { *classIdx = 3; *classScore = 991.0; return; } } } else { if (x[35] <= 0.660439670085907) { *classIdx = 2; *classScore = 965.0; return; } else { *classIdx = 1; *classScore = 1009.0; return; } } } else { if (x[37] <= 0.004387721419334412) { if (x[120] <= 0.15494372276589274) { *classIdx = 1; *classScore = 1009.0; 
return; } else { *classIdx = 2; *classScore = 965.0; return; } } else { *classIdx = 2; *classScore = 965.0; return; } } } } else { if (x[124] <= 0.2986244857311249) { *classIdx = 3; *classScore = 991.0; return; } else { *classIdx = 2; *classScore = 965.0; return; } } } else { if (x[94] <= 0.1600039303302765) { *classIdx = 0; *classScore = 1008.0; return; } else { if (x[33] <= 0.2365989163517952) { if (x[118] <= 0.10640082601457834) { *classIdx = 3; *classScore = 991.0; return; } else { *classIdx = 1; *classScore = 1009.0; return; } } else { *classIdx = 2; *classScore = 965.0; return; } } } } /** * Random forest's tree #2 */ void tree2(float *x, uint8_t *classIdx, float *classScore) { if (x[73] <= 0.0031274217180907726) { if (x[67] <= 0.25749216973781586) { if (x[107] <= 0.18421800434589386) { *classIdx = 0; *classScore = 980.0; return; } else { *classIdx = 1; *classScore = 999.0; return; } } else { if (x[32] <= 0.07408507168292999) { if (x[103] <= 0.7989668846130371) { *classIdx = 1; *classScore = 999.0; return; } else { *classIdx = 2; *classScore = 974.0; return; } } else { if (x[48] <= 0.1834709495306015) { *classIdx = 1; *classScore = 999.0; return; } else { if (x[51] <= 0.0414448007941246) { *classIdx = 2; *classScore = 974.0; return; } else { *classIdx = 0; *classScore = 980.0; return; } } } } } else { if (x[119] <= 0.05613940209150314) { if (x[56] <= 0.23911834508180618) { if (x[124] <= 0.3264329582452774) { *classIdx = 3; *classScore = 1020.0; return; } else { if (x[58] <= 0.15283169224858284) { if (x[104] <= 0.31329280138015747) { *classIdx = 2; *classScore = 974.0; return; } else { *classIdx = 1; *classScore = 999.0; return; } } else { *classIdx = 2; *classScore = 974.0; return; } } } else { if (x[103] <= 0.99990114569664) { if (x[72] <= 0.3486354500055313) { if (x[55] <= 0.13719180971384048) { if (x[75] <= 0.3671320825815201) { *classIdx = 1; *classScore = 999.0; return; } else { *classIdx = 2; *classScore = 974.0; return; } } else { if (x[47] <= 
0.9765305519104004) { *classIdx = 1; *classScore = 999.0; return; } else { *classIdx = 2; *classScore = 974.0; return; } } } else { if (x[49] <= 0.11988793313503265) { *classIdx = 2; *classScore = 974.0; return; } else { if (x[59] <= 0.15384046733379364) { if (x[55] <= 0.27266792953014374) { *classIdx = 2; *classScore = 974.0; return; } else { *classIdx = 1; *classScore = 999.0; return; } } else { *classIdx = 3; *classScore = 1020.0; return; } } } } else { if (x[55] <= 0.24919167906045914) { if (x[86] <= 0.2719164863228798) { *classIdx = 2; *classScore = 974.0; return; } else { *classIdx = 1; *classScore = 999.0; return; } } else { *classIdx = 1; *classScore = 999.0; return; } } } } else { if (x[46] <= 0.22156193107366562) { *classIdx = 2; *classScore = 974.0; return; } else { *classIdx = 1; *classScore = 999.0; return; } } } } /** * Random forest's tree #3 */ void tree3(float *x, uint8_t *classIdx, float *classScore) { if (x[111] <= 0.11711716279387474) { if (x[103] <= 0.24427950382232666) { if (x[104] <= 0.0019440746400505304) { *classIdx = 0; *classScore = 983.0; return; } else { *classIdx = 3; *classScore = 1001.0; return; } } else { *classIdx = 2; *classScore = 998.0; return; } } else { if (x[103] <= 0.027491490356624126) { if (x[105] <= 0.2146984338760376) { *classIdx = 0; *classScore = 983.0; return; } else { if (x[112] <= 0.5272030308842659) { *classIdx = 3; *classScore = 1001.0; return; } else { *classIdx = 1; *classScore = 991.0; return; } } } else { if (x[57] <= 0.5472723841667175) { if (x[35] <= 0.46478767693042755) { if (x[108] <= 0.881377637386322) { if (x[7] <= 0.750912994146347) { *classIdx = 2; *classScore = 998.0; return; } else { *classIdx = 1; *classScore = 991.0; return; } } else { *classIdx = 3; *classScore = 1001.0; return; } } else { if (x[56] <= 0.16459540277719498) { *classIdx = 3; *classScore = 1001.0; return; } else { if (x[99] <= 0.23425475507974625) { *classIdx = 1; *classScore = 991.0; return; } else { if (x[117] <= 
0.5565855763852596) { *classIdx = 1; *classScore = 991.0; return; } else { *classIdx = 2; *classScore = 998.0; return; } } } } } else { if (x[91] <= 0.3536626696586609) { if (x[86] <= 0.45749130845069885) { if (x[94] <= 0.0066089634783566) { *classIdx = 1; *classScore = 991.0; return; } else { if (x[78] <= 0.3713982254266739) { if (x[46] <= 0.17745181918144226) { if (x[1] <= 0.018077485729008913) { if (x[49] <= 0.42618507146835327) { *classIdx = 2; *classScore = 998.0; return; } else { *classIdx = 1; *classScore = 991.0; return; } } else { *classIdx = 1; *classScore = 991.0; return; } } else { if (x[7] <= 0.6998633146286011) { *classIdx = 2; *classScore = 998.0; return; } else { *classIdx = 1; *classScore = 991.0; return; } } } else { *classIdx = 1; *classScore = 991.0; return; } } } else { *classIdx = 1; *classScore = 991.0; return; } } else { if (x[35] <= 0.3449694290757179) { *classIdx = 2; *classScore = 998.0; return; } else { *classIdx = 1; *classScore = 991.0; return; } } } } } } /** * Random forest's tree #4 */ void tree4(float *x, uint8_t *classIdx, float *classScore) { if (x[103] <= 0.002042085863649845) { if (x[73] <= 0.2877766191959381) { *classIdx = 0; *classScore = 979.0; return; } else { *classIdx = 3; *classScore = 977.0; return; } } else { if (x[68] <= 0.04799677059054375) { if (x[28] <= 0.5193119049072266) { if (x[29] <= 0.9195823073387146) { if (x[55] <= 0.18093488365411758) { if (x[93] <= 0.3958347737789154) { *classIdx = 3; *classScore = 977.0; return; } else { *classIdx = 2; *classScore = 1009.0; return; } } else { *classIdx = 1; *classScore = 1008.0; return; } } else { *classIdx = 2; *classScore = 1009.0; return; } } else { if (x[108] <= 0.8755479753017426) { *classIdx = 1; *classScore = 1008.0; return; } else { *classIdx = 3; *classScore = 977.0; return; } } } else { if (x[106] <= 0.09126594290137291) { if (x[47] <= 0.19886062294244766) { if (x[55] <= 0.20575668662786484) { if (x[57] <= 0.24623526632785797) { *classIdx = 3; *classScore = 
977.0; return; } else { if (x[56] <= 0.7031114101409912) { *classIdx = 2; *classScore = 1009.0; return; } else { *classIdx = 1; *classScore = 1008.0; return; } } } else { if (x[59] <= 0.15399130061268806) { *classIdx = 1; *classScore = 1008.0; return; } else { *classIdx = 3; *classScore = 977.0; return; } } } else { if (x[72] <= 0.3628241717815399) { if (x[71] <= 0.013572153635323048) { if (x[101] <= 0.4474247843027115) { if (x[29] <= 0.736113578081131) { *classIdx = 1; *classScore = 1008.0; return; } else { *classIdx = 2; *classScore = 1009.0; return; } } else { *classIdx = 0; *classScore = 979.0; return; } } else { *classIdx = 3; *classScore = 977.0; return; } } else { if (x[13] <= 0.6972875893115997) { *classIdx = 2; *classScore = 1009.0; return; } else { *classIdx = 3; *classScore = 977.0; return; } } } } else { if (x[18] <= 0.6039438545703888) { if (x[72] <= 0.1549278348684311) { *classIdx = 1; *classScore = 1008.0; return; } else { if (x[2] <= 0.3677712380886078) { if (x[86] <= 0.2421632707118988) { if (x[77] <= 0.009909989312291145) { if (x[64] <= 0.33217500150203705) { *classIdx = 1; *classScore = 1008.0; return; } else { *classIdx = 2; *classScore = 1009.0; return; } } else { *classIdx = 2; *classScore = 1009.0; return; } } else { *classIdx = 1; *classScore = 1008.0; return; } } else { if (x[111] <= 0.17989542707800865) { *classIdx = 3; *classScore = 977.0; return; } else { *classIdx = 1; *classScore = 1008.0; return; } } } } else { if (x[111] <= 0.1835588961839676) { if (x[131] <= 0.012005271390080452) { if (x[77] <= 0.060762908309698105) { *classIdx = 3; *classScore = 977.0; return; } else { *classIdx = 2; *classScore = 1009.0; return; } } else { *classIdx = 2; *classScore = 1009.0; return; } } else { if (x[48] <= 0.23723432421684265) { if (x[36] <= 0.027184411883354187) { if (x[35] <= 0.46229688823223114) { if (x[7] <= 0.7153618931770325) { *classIdx = 2; *classScore = 1009.0; return; } else { *classIdx = 1; *classScore = 1008.0; return; } } else { 
*classIdx = 1; *classScore = 1008.0; return; } } else { *classIdx = 2; *classScore = 1009.0; return; } } else { if (x[44] <= 0.10724342614412308) { if (x[91] <= 0.3177187889814377) { if (x[47] <= 0.16174422949552536) { *classIdx = 1; *classScore = 1008.0; return; } else { *classIdx = 2; *classScore = 1009.0; return; } } else { if (x[6] <= 0.6613056063652039) { *classIdx = 1; *classScore = 1008.0; return; } else { *classIdx = 2; *classScore = 1009.0; return; } } } else { if (x[118] <= 0.09889509528875351) { if (x[28] <= 0.7939835786819458) { if (x[35] <= 0.2191457599401474) { *classIdx = 2; *classScore = 1009.0; return; } else { *classIdx = 1; *classScore = 1008.0; return; } } else { *classIdx = 2; *classScore = 1009.0; return; } } else { if (x[92] <= 0.6761123239994049) { *classIdx = 2; *classScore = 1009.0; return; } else { *classIdx = 1; *classScore = 1008.0; return; } } } } } } } } } } }; static RandomForestClassifier classifier; #endif
And this is the main code to put in the .ino file.
#include "eloquent.h"
#include "eloquent/print.h"
#include "eloquent/tinyml/voting/quorum.h"
// replace 'aithinker' with your own model
// possible values are 'aithinker', 'eye', 'm5stack', 'm5wide', 'wrover'
#include "eloquent/vision/camera/aithinker.h"
#include "HogPipeline.h"
#include "HogClassifier.h"
Eloquent::TinyML::Voting::Quorum<7> quorum;
void setup() {
Serial.begin(115200);
delay(3000);
Serial.println("Begin");
camera.qqvga();
camera.grayscale();
while (!camera.begin())
Serial.println("Cannot init camera");
}
void loop() {
if (!camera.capture()) {
Serial.println(camera.getErrorMessage());
delay(1000);
return;
}
// apply HOG pipeline to camera frame
hog.transform(camera.buffer);
// classify the HOG features of the current frame
uint8_t prediction = classifier.predict(hog.features);
// get a stable prediction
// this is optional, but will improve the stability of predictions
int8_t stablePrediction = quorum.vote(prediction);
if (quorum.isStable()) {
eloquent::print::printf(
Serial,
"Stable prediction: %s \t(DSP: %d ms, Classifier: %d us)\n",
classifier.getLabelOf(stablePrediction),
hog.latencyInMillis(),
classifier.latencyInMicros()
);
}
camera.free();
}
Hit upload and put your objects in front of the camera: you will see the predicted label.
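To get a feeling for what the quorum step does, here is a minimal Python sketch of a majority vote over the last 7 predictions. This is an illustration only: the QuorumSketch class, its window handling and the -1 "not stable" return value are assumptions made for this example, not the actual code of Eloquent::TinyML::Voting::Quorum.

```python
from collections import Counter, deque


class QuorumSketch:
    """Hypothetical majority-vote smoother (NOT the library's actual code)."""

    def __init__(self, window=7):
        # keep only the most recent `window` predictions
        self.votes = deque(maxlen=window)

    def vote(self, prediction):
        """Record a prediction; return the majority label, or -1 if unstable."""
        self.votes.append(prediction)
        label, count = Counter(self.votes).most_common(1)[0]
        # assumption: stable only when the window is full
        # and one label holds a strict majority
        full = len(self.votes) == self.votes.maxlen
        return label if full and count > self.votes.maxlen // 2 else -1


quorum = QuorumSketch(window=7)
# a noisy stream: mostly class 1, with two spurious frames
stream = [1, 1, 2, 1, 1, 3, 1, 1]
results = [quorum.vote(p) for p in stream]
print(results)
```

With this kind of smoothing, a single misclassified frame does not change the reported label, at the cost of a short delay before the first stable prediction.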
Demo video
If you follow the above steps, you will end up with the following result.
You can see that the portenta and pi are mislabelled quite often: this is the expected result, as we saw from the features pairplot.
"""
Play demo video
"""
from IPython.display import Video
Video("assets/esp32 image object classification live demo.mp4", width=728)
DSP processing time is 12 milliseconds, while classification time is under 20 microseconds (about 1/1000th of the DSP time!). If you do the math, this translates to ~80 FPS, which is higher than the frame rate of your Esp32-cam.
In the next release of everywhereml, thanks to some approximated math, DSP time will drop to 6 milliseconds (roughly 160 FPS!).
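If you want to check the math yourself, this short Python sketch turns the latencies quoted above into a frame rate. The frames_per_second helper is a name made up for this example, and the result is an upper bound: it assumes that DSP and classification are the only per-frame costs, ignoring camera capture and serial printing.

```python
def frames_per_second(dsp_ms, classifier_us):
    """Upper bound on FPS given per-frame DSP and classification latency."""
    total_ms = dsp_ms + classifier_us / 1000.0
    return 1000.0 / total_ms


# latencies quoted in the text: 12 ms DSP today, 6 ms in the next release,
# and at most 20 us for the classifier
current_fps = frames_per_second(dsp_ms=12, classifier_us=20)
next_fps = frames_per_second(dsp_ms=6, classifier_us=20)
print(f"current: {current_fps:.0f} FPS, next release: {next_fps:.0f} FPS")
```

The classifier contributes almost nothing to the total: the frame rate is set by the HOG feature extraction.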
Take your Esp32-cam skills to the next level!
Did this project get your attention?
Do you want to become an expert in the use of the Esp32-cam, but you don't know where to start?
Look no further, I have the complete guide that you need: it's called "Mastering the Esp32 Camera" and I'm writing it right now.
It will contain a more in-depth version of this image recognition project and much more:
- Motion detection
- Color blob detection
- Passing people counter
- Line crossing counter
- Person detection
- Face detection
- TensorFlow YOLO
- Edge Impulse
You can buy the pre-sale right now (while I'm still writing it), which gives you access to:
- 50% discount on the retail price ($19.99 when done)
- early access to chapter drafts and code
- participate in polls and feature requests to shape the contents of the book while I'm writing
- join the Discord community with all the other people who bought this pre-sale
If you're hooked, don't wait any longer! Visit the dedicated page.

Conclusion
When it comes to image recognition on Esp32-cam, you have two options:
- If you're looking for the best accuracy possible, you should stick to Neural Networks: they achieve state-of-the-art results. Platforms like Edge Impulse will speed up your development time.
- If your goal is to implement something that works well and really fast, you now have an alternative thanks to the Eloquent Arduino libraries.