Running TensorFlow Lite on microcontrollers is a pain.
If you're just getting started and you follow the official tutorials on the TensorFlow blog or the Arduino website, you'll soon get lost: they are outdated, and many of the examples simply don't work.
This is a shame, because the ESP32 (and, even better, the ESP32-S3) is a powerful chip that can handle the computations of a TensorFlow model with little to no trouble.
The goal of this tutorial is to teach you:
- how to train a TensorFlow model in the browser, without installing anything on your PC
- how to run that model on your ESP32
Train & Export a TensorFlow NN
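Training happens in the browser (a Colab notebook), so there's nothing to install on your PC. Whatever tool you use, the export step boils down to converting the trained model to TensorFlow Lite (via tf.lite.TFLiteConverter) and dumping its bytes as a C array you can #include from your sketch. Here's a minimal sketch of that last byte-to-header step, so you can see what the generated irisModel.h actually contains; the function name to_c_header is illustrative, and tools like the everywhereml package or xxd -i do the same job for you:

```python
def to_c_header(model_bytes: bytes, var_name: str = "irisModel") -> str:
    """Dump a .tflite flatbuffer as a C header (what `xxd -i model.tflite` would produce)."""
    hex_bytes = ", ".join(f"0x{b:02x}" for b in model_bytes)
    return (
        f"const unsigned char {var_name}[] = {{{hex_bytes}}};\n"
        f"const unsigned int {var_name}_len = {len(model_bytes)};\n"
    )

# usage:
# with open("model.tflite", "rb") as f:
#     print(to_c_header(f.read()))
```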
Run a TensorFlow NN on ESP32
Running the exported network is pretty easy thanks to the EloquentTinyML library. You can install it from the Arduino IDE Library Manager.
Then copy the sketch below.
Filename: IrisExample.ino
/**
 * Run a TensorFlow model to predict the IRIS dataset
 * For a complete guide, visit
 * https://eloquentarduino.com/tensorflow-lite-esp32
 */
// replace with your own model
// include BEFORE <eloquent_tinyml.h>!
#include "irisModel.h"
// include the runtime specific to your board
// (either tflm_esp32 or tflm_cortexm)
#include <tflm_esp32.h>
// now you can include the eloquent tinyml wrapper
#include <eloquent_tinyml.h>

// sizing the arena is a trial-and-error process:
// when developing a new model, start with a high value
// (e.g. 10000), then decrease it until the model stops
// working; keep the smallest value that still works
#define ARENA_SIZE 2000

Eloquent::TF::Sequential<TF_NUM_OPS, ARENA_SIZE> tf;

// input samples, one per class
// (x0 is a class-0 sample from the IRIS dataset;
// fill x1 and x2 with class-1 and class-2 samples from your data)
float x0[4] = {0.22222222222f, 0.62500000000f, 0.06779661017f, 0.04166666667f};
float x1[4] = {0, 0, 0, 0}; // TODO: replace with a class-1 sample
float x2[4] = {0, 0, 0, 0}; // TODO: replace with a class-2 sample

void setup() {
    Serial.begin(115200);
    delay(3000);
    Serial.println("__TENSORFLOW IRIS__");

    // configure input/output
    // (not mandatory if you generated the .h model
    // using the everywhereml Python package)
    tf.setNumInputs(4);
    tf.setNumOutputs(3);

    // add required ops
    // (not mandatory if you generated the .h model
    // using the everywhereml Python package)
    tf.resolver.AddFullyConnected();
    tf.resolver.AddSoftmax();

    while (!tf.begin(irisModel).isOk())
        Serial.println(tf.exception.toString());
}

void loop() {
    // classify class 0
    if (!tf.predict(x0).isOk()) {
        Serial.println(tf.exception.toString());
        return;
    }

    Serial.print("expected class 0, predicted class ");
    Serial.println(tf.classification);

    // classify class 1
    if (!tf.predict(x1).isOk()) {
        Serial.println(tf.exception.toString());
        return;
    }

    Serial.print("expected class 1, predicted class ");
    Serial.println(tf.classification);

    // classify class 2
    if (!tf.predict(x2).isOk()) {
        Serial.println(tf.exception.toString());
        return;
    }

    Serial.print("expected class 2, predicted class ");
    Serial.println(tf.classification);

    // how long does a single prediction take?
    Serial.print("It takes ");
    Serial.print(tf.benchmark.microseconds());
    Serial.println("us for a single prediction");

    delay(1000);
}
Here's a brief explanation of the important parts of the sketch.
Import the libraries and instantiate the network
// replace with your own model
// include BEFORE <eloquent_tinyml.h>!
#include "irisModel.h"
// include the runtime specific for your board
// either tflm_esp32 or tflm_cortexm
#include <tflm_esp32.h>
// now you can include the eloquent tinyml wrapper
#include <eloquent_tinyml.h>
#define ARENA_SIZE 2000
Eloquent::TF::Sequential<TF_NUM_OPS, ARENA_SIZE> tf;
We begin with importing the library and the exported model.
The only thing you need to define is ARENA_SIZE. This value sets how much memory is allocated for the model. Finding the optimal value is a trial-and-error process, because there's no exact formula to apply: larger values will work, but leave less memory for your own code; smaller values will prevent the network from running correctly. I suggest you start with a large value (e.g. 10000) and decrease it until your model starts throwing errors about tensor allocation.
Model configuration and initialization
// replace with the correct values for your model
tf.setNumInputs(4);
tf.setNumOutputs(3);
// add required ops
tf.resolver.AddFullyConnected();
tf.resolver.AddSoftmax();
while (!tf.begin(irisModel).isOk())
Serial.println(tf.exception.toString());
These lines set the number of inputs/outputs and add the required operations to the model.
If you used the above Notebook to generate the model code, you can remove these lines since they're automatically handled for you.
If not, you have to register here the exact same layer types you added when you created your model (fully connected, conv2d, max pooling...), otherwise your network won't run.
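As a hypothetical example (the Add* op names below come from the TensorFlow Lite Micro op resolver; double-check them against the layers of your own model), a small convolutional network would need something like:

```cpp
// hypothetical ops for a Conv2D > MaxPool2D > Flatten > Dense > Softmax model;
// TF_NUM_OPS must be at least as large as the number of ops you register
tf.resolver.AddConv2D();
tf.resolver.AddMaxPool2D();
tf.resolver.AddReshape();        // Flatten layers map to Reshape ops
tf.resolver.AddFullyConnected();
tf.resolver.AddSoftmax();
```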
Once done, initialize the network by passing the exported model from Step 2.
Execution
// sample data
float x0[4] = {0.22222222222f, 0.62500000000f, 0.06779661017f, 0.04166666667f};
while (!tf.predict(x0).isOk())
    Serial.println(tf.exception.toString());
Finally, you can execute the network by passing it an input vector.
The input must always be an array, even if it holds a single value!
After the execution, you can access the results with tf.output(i), where i is the index of the output.
For example, the IRIS dataset has 3 labels (setosa, virginica, versicolor) and the model outputs 3 values, representing the probability of each class.
Since this is a classification task, you also have access to tf.classification, which returns the class with the highest probability.
To iterate over the results:
for (int i = 0; i < tf.numOutputs; i++) {
    Serial.print(tf.output(i));
    Serial.print(", ");
}