Level up your TinyML skills

Project #2 Color Classification

Get familiar with data collection and pre-processing for Machine Learning tasks

Color Classification

This project is a clone of one of the first tutorials for TensorFlow on Arduino, which uses the Arduino Nano 33 BLE Sense's built-in colorimeter to classify different fruits based on their color.

It is a pretty simple project, but we'll replicate it to get familiar with the Eloquent TinyML workflow for classification tasks for static datatypes.

Features definition

The Nano 33 BLE Sense colorimeter (or any other RGB color sensor like the TCS3200) can detect the RGB values of the object it is pointed at.

We will use these 3 values as input features for our Machine Learning model.

Even though many classifiers will work fine with the raw feature values (in the range 0 to 255), we will introduce a pre-processing pipeline to show how you can modify the data to make it a better fit for later classification.

Data collection

Our dataset will be made of 3-4 fruits (or any other kind of object, really), each of a different color.

We'll take 20 samples of each object, from different angles and distances to make sure we capture all the variability of each color.

To create a robust model, it would be ideal if you could change the room illumination while capturing the samples, so that our model is insensitive to light variations too.

To collect the samples, load the following sketch to your Arduino Nano 33 BLE Sense and run the Python code below.

If you don't have an Arduino Nano 33 BLE Sense and want to use your own color sensor (e.g. TCS3200), update the sensor setup and the line below the "// get RGB values" comment accordingly.

// file Fruit_identification_data_collection.ino


#include <Arduino_APDS9960.h>


void setup() {
  Serial.begin(9600);

  // check if sensor is ok
  while (!APDS.begin()) {
    Serial.println("Error initializing APDS9960 sensor.");
  }
}

void loop() {
  int r, g, b;
  
  // wait for sensor data
  while (!APDS.colorAvailable())
    delay(5);

  // get RGB values
  APDS.readColor(r, g, b);

  // print values to Serial
  Serial.print("RGB: ");
  Serial.print(r);
  Serial.print(",");
  Serial.print(g);
  Serial.print(",");
  Serial.println(b);

  delay(300);
}
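
Each line the sketch prints has the form "RGB: r,g,b". As a rough illustration of what a host-side collector has to do with such a line, here is a minimal sketch in plain Python. The function name parse_rgb_line is hypothetical and is not part of everywhereml; the actual SerialCollector implementation may well differ.

```python
def parse_rgb_line(line, start_of_frame="RGB:"):
    """Parse a line such as 'RGB: 7,5,4' into a tuple of three ints.

    Returns None when the line does not start with the frame marker
    or does not hold exactly three integer values.
    """
    line = line.strip()
    if not line.startswith(start_of_frame):
        return None

    payload = line[len(start_of_frame):]
    try:
        values = tuple(int(v) for v in payload.split(","))
    except ValueError:
        return None

    return values if len(values) == 3 else None


print(parse_rgb_line("RGB: 7,5,4"))                           # (7, 5, 4)
print(parse_rgb_line("Error initializing APDS9960 sensor."))  # None
```

Lines that do not carry the marker, such as the sensor error message, are simply skipped, which is why the sketch prefixes every reading with "RGB:".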

The Python code below makes use of the SerialCollector class. It is a utility class that makes it really easy to collect data from your development board without having to manually move data around.

# !pip install "everywhereml>=0.0.3"
from everywhereml.data import Dataset
from everywhereml.data.collect import SerialCollector

"""
Create a SerialCollector
Each data line is marked by the 'RGB:' string
Collect 20 samples for each class.
Replace the port with your own!
"""
try:
    rgb_dataset = Dataset.from_csv('rgb.csv', name='Color', target_name_column='target_name')
    
except FileNotFoundError:
    rgb_collector = SerialCollector(
        port='/dev/cu.usbmodem141401', 
        baud=9600, 
        start_of_frame='RGB:',
        feature_names=['r', 'g', 'b']
    )

    rgb_dataset = rgb_collector.collect_many_classes(
        dataset_name='RGB', 
        num_samples=20
    )
"""
Save dataset to file for later use
"""
rgb_dataset.df.to_csv('rgb.csv', index=False)
"""
Print summary of dataset
"""
rgb_dataset.describe()
               r          g          b     target
count  60.000000  60.000000  60.000000  60.000000
mean    6.316667   5.200000   4.050000   1.000000
std     1.935324   0.776782   0.501692   0.823387
min     3.000000   4.000000   3.000000   0.000000
25%     4.000000   5.000000   4.000000   0.000000
50%     7.000000   5.000000   4.000000   1.000000
75%     8.000000   6.000000   4.000000   2.000000
max     9.000000   7.000000   5.000000   2.000000
"""
Plot features pairplot
"""
rgb_dataset.plot.features_pairplot()

Pre-processing pipeline

It is often the case that your dataset needs to be pre-processed in some way to achieve good classification results.

Many classifiers, for example, work best when all the features lie in a given (or the same) range. If you were to classify a dataset of people where the height ranges from 0 to 2 m and the weight from 0 to 100 kg, a model that compares samples by distance would be dominated by the weight, simply because its numbers are larger.
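
To make the range problem concrete, here is a small sketch with made-up numbers (the three people and their measurements are hypothetical). It computes Euclidean distances, which is what distance-based classifiers such as k-nearest neighbors rely on, before and after rescaling both features to the 0-1 range.

```python
import math

# hypothetical (height in m, weight in kg) samples
a = (1.60, 70.0)
b = (1.95, 71.0)  # much taller than a, similar weight
c = (1.61, 75.0)  # about as tall as a, slightly heavier


def distance(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def rescale(p, ranges=((0.0, 2.0), (0.0, 100.0))):
    """Map each feature to 0-1 using the ranges quoted in the text."""
    return tuple((x - lo) / (hi - lo) for x, (lo, hi) in zip(p, ranges))


# raw values: weight dominates, so b looks closer to a than c does
print(distance(a, b) < distance(a, c))  # True

# rescaled values: the height difference now counts, so c is the closer one
print(distance(rescale(a), rescale(b)) > distance(rescale(a), rescale(c)))  # True
```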

More complex datasets will also have more complex pre-processing requirements and you will have to pipe multiple pre-processing steps to transform the raw values into meaningful features.

This is where the pipeline concept comes in.

Pipeline example from https://towardsdatascience.com/make-a-rock-solid-ml-model-using-sklearn-pipeline-926f2ccf4706

You can think of a pipeline as a list of steps that run in a waterfall: every step receives as input the output of the previous one.

Each step can perform any sort of manipulation on the data, and no step is (or should be) aware of the others, nor depends on them.
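
The waterfall idea can be sketched in a few lines of plain Python. This is an illustration of the concept only, not how everywhereml implements its Pipeline class; run_pipeline, subtract and multiply are hypothetical names.

```python
def run_pipeline(steps, x):
    """Feed x through each step in order; a step only sees the previous output."""
    for step in steps:
        x = step(x)
    return x


def subtract(amount):
    """Build a step that subtracts a constant from every value."""
    return lambda x: [v - amount for v in x]


def multiply(factor):
    """Build a step that multiplies every value by a constant."""
    return lambda x: [v * factor for v in x]


steps = [subtract(3), multiply(0.5)]
print(run_pipeline(steps, [3, 5, 9]))  # [0.0, 1.0, 3.0]
```

Neither step knows what comes before or after it, so steps can be added, removed or reordered without touching the others.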

For the sake of this project, we'll use a minimalistic pipeline made of a single step: a MinMaxScaler.

A MinMaxScaler maps each feature from the range observed in the dataset (for example, from 3 to 9 for r in the summary above) to a new range (usually from 0 to 1).
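
In formula form, the scaler computes x_scaled = (x - min) / (max - min) for each feature. Below is a minimal sketch of that idea, fed with the min and max values from the dataset summary above. The class name SimpleMinMaxScaler is hypothetical and only mimics the concept; it is not everywhereml's implementation.

```python
class SimpleMinMaxScaler:
    """Toy min-max scaler: learns per-feature min and 1 / (max - min)."""

    def fit(self, rows):
        columns = list(zip(*rows))
        self.offset = [min(c) for c in columns]
        # fall back to a scale of 1 for constant features to avoid dividing by zero
        self.scale = [
            1.0 / (max(c) - min(c)) if max(c) > min(c) else 1.0
            for c in columns
        ]
        return self

    def transform(self, row):
        return [(v - o) * s for v, o, s in zip(row, self.offset, self.scale)]


# two rows holding the per-feature min and max reported by describe()
scaler = SimpleMinMaxScaler().fit([(3, 4, 3), (9, 7, 5)])

print(scaler.offset)                # per-feature minimum
print(scaler.scale)                 # per-feature 1 / (max - min)
print(scaler.transform((6, 4, 4)))  # a reading halfway on r, at the minimum on g
```

The learned offsets (3, 4, 3) and scales (1/6, 1/3, 1/2) are the same numbers that appear in the offset and scale arrays of the generated Pipeline.h further down.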

"""
Create a pipeline for feature pre-processing
"""
from everywhereml.preprocessing import Pipeline, MinMaxScaler

rgb_pipeline = Pipeline(name='ColorPipeline', steps=[
    MinMaxScaler()
])
"""
Apply the pipeline to our dataset
"""
rgb_dataset.apply(rgb_pipeline)
<everywhereml.data.Dataset.Dataset at 0x12ba6c7c0>
"""
Print summary of dataset
You can now see that the max value of r, g and b is always 1
"""
rgb_dataset.describe()
               r          g          b     target
count  60.000000  60.000000  60.000000  60.000000
mean    0.552778   0.400000   0.525000   1.000000
std     0.322554   0.258927   0.250846   0.823387
min     0.000000   0.000000   0.000000   0.000000
25%     0.166667   0.333333   0.500000   0.000000
50%     0.666667   0.333333   0.500000   1.000000
75%     0.833333   0.666667   0.500000   2.000000
max     1.000000   1.000000   1.000000   2.000000

Machine Learning model

For such a simple dataset, almost any Machine Learning model will work fine. In this example, we'll use a Decision Tree, since it's fast and needs few resources.

from everywhereml.sklearn.tree import DecisionTreeClassifier

rgb_classifier = DecisionTreeClassifier()
rgb_classifier.fit(rgb_dataset)

print('Score on training set: %.2f' % rgb_classifier.score(rgb_dataset))
Score on training set: 1.00

Port to Arduino

We need to port two pieces of code to Arduino:

  1. the pre-processing pipeline
  2. the classifier

Both of them implement the to_arduino_file() method to perform this task.

"""
Port pipeline to C++
"""
print(rgb_pipeline.to_arduino_file(
    'sketches/ColorClassification/Pipeline.h', 
    instance_name='pipeline'
))
#ifndef UUID5027627216
#define UUID5027627216

namespace ColorPipeline {

    
        #ifndef UUID5027627408
#define UUID5027627408

/**
  * MinMaxScaler(low=0, high=1)
 */
class Step0 {
    public:

        /**
         * Transform input vector
         */
        bool transform(float *x) {
            
    for (uint16_t i = 0; i < 3; i++) {
        x[i] = (x[i] - offset[i]) * scale[i] + 0;
    }

    return true;


            return true;
        }

    protected:
        
    float offset[3] = {3.000000000f, 4.000000000f, 3.000000000f};
    float scale[3] = {0.166666667f, 0.333333333f, 0.500000000f};

};



#endif
    

     /**
      * Pipeline:
 * ---------
 *  > MinMaxScaler(low=0, high=1)
     */
    class Pipeline {
        public:
            static const uint16_t NUM_INPUTS = 3;
            static const uint16_t NUM_OUTPUTS = 3;
            static const uint16_t WORKING_SIZE = 3;
            float X[3];

            /**
             * Apply pipeline to input vector
             */
            bool transform(float *x) {
                memcpy(X, x, sizeof(float) * 3);

                size_t start = micros();
                bool isOk =
                
                     step0.transform(X)
                ;

                latency = micros() - start;

                return isOk;
            }

            /**
             * Debug output feature vector
             */
            template<typename PrinterInterface>
            void debugTo(PrinterInterface &printer, uint8_t precision=5) {
                printer.print(X[0]);

                for (uint16_t i = 1; i < 3; i++) {
                    printer.print(", ");
                    printer.print(X[i], precision);
                }

                printer.print('\n');
            }

            /**
 * Get latency in micros
 */
uint32_t latencyInMicros() {
    return latency;
}

/**
 * Get latency in millis
 */
uint16_t latencyInMillis() {
    return latency / 1000;
}

        protected:
            float latency;
            
                ColorPipeline::Step0 step0;
            
    };
}


static ColorPipeline::Pipeline pipeline;


#endif
"""
Port DecisionTree to C++
"""
print(rgb_classifier.to_arduino_file(
    'sketches/ColorClassification/Classifier.h', 
    instance_name='tree', 
    class_map=rgb_dataset.class_map
))
#ifndef UUID4945632512
#define UUID4945632512

/**
  * DecisionTreeClassifier(ccp_alpha=0.0, class_name=DecisionTreeClassifier, class_weight=None, criterion=gini, max_depth=None, max_features=None, max_leaf_nodes=None, min_impurity_decrease=0.0, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, num_outputs=3, package_name=everywhereml.sklearn.tree, random_state=None, splitter=best, template_folder=everywhereml/sklearn/tree)
 */
class DecisionTreeClassifier {
    public:

        /**
         * Predict class from features
         */
        int predict(float *x) {
            int predictedValue = 0;
            size_t startedAt = micros();

            
    
    if (x[0] <= 0.416666679084301) {
        
            
    predictedValue = 2;

        
    }
    else {
        
            
    if (x[0] <= 0.75) {
        
            
    if (x[2] <= 0.25) {
        
            
    predictedValue = 0;

        
    }
    else {
        
            
    predictedValue = 1;

        
    }

        
    }
    else {
        
            
    predictedValue = 0;

        
    }

        
    }



            latency = micros() - startedAt;

            return (lastPrediction = predictedValue);
        }


        

/**
 * Predict class label
 */
String predictLabel(float *x) {
    return getLabelOf(predict(x));
}

/**
 * Get label of last prediction
 */
String getLabel() {
    return getLabelOf(lastPrediction);
}

/**
 * Get label of given class
 */
String getLabelOf(int8_t idx) {
    switch (idx) {
        case -1:
            return "ERROR";
        
            case 0:
                return "yellow";
        
            case 1:
                return "pink";
        
            case 2:
                return "cyan";
        
        default:
            return "UNKNOWN";
    }
}


        /**
 * Get latency in micros
 */
uint32_t latencyInMicros() {
    return latency;
}

/**
 * Get latency in millis
 */
uint16_t latencyInMillis() {
    return latency / 1000;
}

    protected:
        float latency = 0;
        int lastPrediction = 0;

        
};



static DecisionTreeClassifier tree;


#endif
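
Finally, include both generated files in the classification sketch, which applies the pipeline to each reading and then classifies the result.

// file Fruit_identification_classification.ino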

#include <Arduino_APDS9960.h>
#include "Pipeline.h"
#include "Classifier.h"


void setup() {
  Serial.begin(9600);

  // check if sensor is ok
  while (!APDS.begin()) {
    Serial.println("Error initializing APDS9960 sensor.");
  }
}

void loop() {
  int r, g, b;
  
  // wait for sensor data
  while (!APDS.colorAvailable())
    delay(5);

  // get RGB values
  APDS.readColor(r, g, b);

  // print values to Serial
  Serial.print("RGB: ");
  Serial.print(r);
  Serial.print(",");
  Serial.print(g);
  Serial.print(",");
  Serial.println(b);
    
  // perform feature extraction
  // cast explicitly: int to float inside braces is a narrowing conversion
  float features[] = {(float) r, (float) g, (float) b};
    
  if (!pipeline.transform(features))
      return;

  // perform classification on pipeline result
  Serial.print("Predicted color: ");
  Serial.println(tree.predictLabel(pipeline.X));

  delay(300);
}
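
To sanity-check what the board computes, the two generated headers can be mirrored on the host. The sketch below transcribes the offset, scale, thresholds and labels from the generated Pipeline.h and Classifier.h shown above into plain Python. The function names are hypothetical, and your own constants will differ if you collect a different dataset.

```python
# constants copied from the generated Pipeline.h (Step0)
OFFSET = (3.0, 4.0, 3.0)
SCALE = (0.166666667, 0.333333333, 0.5)

# labels copied from getLabelOf() in the generated Classifier.h
LABELS = {0: "yellow", 1: "pink", 2: "cyan"}


def transform(rgb):
    """Same arithmetic as Step0::transform: (x - offset) * scale."""
    return [(v - o) * s for v, o, s in zip(rgb, OFFSET, SCALE)]


def predict(x):
    """Same branches as DecisionTreeClassifier::predict."""
    if x[0] <= 0.416666679084301:
        return 2
    if x[0] <= 0.75:
        return 0 if x[2] <= 0.25 else 1
    return 0


def predict_label(rgb):
    return LABELS[predict(transform(rgb))]


print(predict_label((3, 4, 3)))  # cyan
print(predict_label((7, 5, 4)))  # pink
print(predict_label((9, 5, 4)))  # yellow
```

If the labels printed on the Serial monitor disagree with this replica for the same raw RGB values, the headers on the board are probably out of date with respect to the trained model.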

Well done!

If you were able to follow the project from start to finish, you got a good overview of what a TinyML project looks like:

  1. define your features
  2. collect data
  3. train a model to classify that data
  4. deploy the model back to your board

These steps used to be very difficult to implement for those who are not expert C++ programmers, but now it's much easier thanks to the Eloquent ecosystem of libraries.


© Copyright 2022 Eloquent Arduino. All Rights Reserved.