Four technologies which tell us: “It’s the right time to bring intelligence to microcontrollers”
Neural networks have become the state of the art for data scientists in recent years. At the same time, data science has grown vastly in importance for product releases. It not only helps gain market insights, but also enables new customer value such as predictive maintenance, self-adjusting control loops or object recognition. To deliver the best customer experience, we often must provide this value with low latency, at low cost, and regardless of the current cloud connection state. To meet these expectations, it is important to run neural networks not only in the cloud, but also on the edge. There are several definitions of “the edge”: it may range from Kubernetes clusters and content delivery networks to simple microcontroller-based devices. On edge devices with limited computing power, the delivery and execution of neural networks becomes an increasing challenge.
Since 2019, new technologies and libraries have been published that help developers overcome these challenges:
- TensorFlow Lite
- STM32CubeMX and STM32Cube.AI
- Matlab Coder and Matlab Embedded Coder
- NNoM
- AWS IoT Greengrass
This blog post provides an overview of how the technologies work and concludes which one is most promising for us. It is written for firmware developers, data scientists and machine learning (ML) engineers who want to deploy and run a pre-trained model on a microcontroller.
Introduction to neural networks on microcontrollers
It is still a challenge to run neural networks on an embedded device. In this article, we define “embedded device” as follows: the device runs a microcontroller with up to 1000 DMIPS, with on-chip RAM (<1 MByte) and flash (<10 MByte) and limited use of external memory chips. These embedded devices do not run preemptive operating systems such as Linux or Android; the focus is on devices with a cooperative operating system or none at all.
Neural networks can learn – in machine learning terminology, this is called “training a neural network”. Training improves automated decision making, but it requires large amounts of data and high computing power. It should therefore ideally be performed in a scalable cloud environment, which also allows continuous training during the product's lifetime.
This leads to a first quality criterion for the application of neural networks on edge devices: it should be possible to update a deployed network. To avoid a firmware update for each new training cycle, we should separate what is code (the firmware) from what is data (the weights), which we can then exchange easily.
Neural networks are composed of connected neurons arranged in layers. The connection between two neurons is defined by a weight, a numeric value calculated during training.
To describe a neural network, we need two elements. The first is the architecture: it defines the number of neurons, their distribution over layers and which connections exist. This element does not change if we retrain the network. The second element is composed of the calculated weight values, also called “parameters”. These do change if we retrain the neural network.
On an embedded device, the network architecture is only changeable through a firmware update.
The weights, on the other hand, are exchangeable by normal data transmission; a firmware update should not be required.
It is therefore essential to separate these two description elements.
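To make the separation concrete, here is a minimal sketch of our own (independent of the tools discussed later): the layer structure is compiled into the firmware, while the weights live in a dedicated flash region that a plain data download can overwrite. All names and sizes are illustrative.

```cpp
#include <cstddef>

// Architecture: fixed at compile time, changeable only via firmware update.
struct DenseLayer {
    std::size_t inputs;
    std::size_t outputs;
    const float* weights;  // points into the updatable weights region
    const float* biases;
};

// Weights: placed in a dedicated flash section that an update mechanism can
// rewrite without touching the firmware image. The section name is an
// assumption for this sketch and would be reserved in the linker script.
__attribute__((section(".nn_weights")))
const float model_weights[154] = {0};  // filled by the training pipeline

// A tiny two-layer network description referencing that region.
const DenseLayer kNetwork[] = {
    {16, 8, &model_weights[0],   &model_weights[128]},  // 16*8 weights, 8 biases
    {8,  2, &model_weights[136], &model_weights[152]},  // 8*2 weights, 2 biases
};
```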
Introduction to the development process of neural networks
The following diagram shows the design and deployment process of a neural network which should run on a microcontroller.
For the design and training of neural networks, data scientists use tools such as Matlab, TensorFlow, Keras, ONNX, Caffe, Lasagne or ConvNetJS. Depending on the tool used to design the neural network, the output file format may differ, but it always includes the network architecture and the weights. In the “Generate C/C++ Code” step, a tool reads the network architecture and converts it into C code. Next, the C code is compiled and downloaded to the microcontroller. Supplied with the weights, the microcontroller can now run the neural network.
The quality criteria
Training: Training neural networks consumes large amounts of data and computing power. On microcontrollers, both computing power and data storage are limited. Training solely on the microcontroller is therefore reasonable only in the case of reinforcement learning, where learning and decision making are tightly coupled. For all other types of training, the neural network should be trained in the cloud and the trained model delivered to the microcontroller.
Re-training and continuous delivery: The capability to continuously train on new data delivered from sensors or other data sources to the cloud infrastructure. The re-trained neural network should ideally be delivered to the microcontroller automatically. It is important to keep downtimes and download times short.
Avoid long-term effects: Long-term effects such as heap accumulation in the firmware can lead to unwanted system states. To prevent the system from running out of free memory, it is recommended to avoid use of the heap whenever possible. The architecture of the neural network defines the execution sequence and memory footprint of the algorithm on the microcontroller; the weights determine the results of each execution and impact neither the memory footprint nor the execution sequence.
Thus, by separating the neural network architecture from the weights, it is possible to update the neural network with the latest training results without a firmware update.
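In practice this means all working buffers can be sized at compile time. A minimal sketch, assuming a fully connected network whose widest layer is known in advance (the bound below is illustrative):

```cpp
#include <cstddef>

// The architecture fixes the widest layer, so activation buffers can be
// allocated statically; no malloc/free at run-time, hence no heap growth.
constexpr std::size_t kMaxWidth = 64;  // illustrative upper bound
static float buf_a[kMaxWidth];
static float buf_b[kMaxWidth];  // ping-pong buffer for layer outputs

// One fully connected layer computed into a caller-provided buffer.
void dense_forward(std::size_t n_in, std::size_t n_out,
                   const float* weights, const float* biases,
                   const float* in, float* out) {
    for (std::size_t o = 0; o < n_out; ++o) {
        float acc = biases[o];
        for (std::size_t i = 0; i < n_in; ++i)
            acc += weights[o * n_in + i] * in[i];
        out[o] = acc;  // activation function omitted for brevity
    }
}
```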
Common tools: Provide an easy-to-use interface for data scientists and machine learning engineers. It should be easy to simulate, test and deploy a new model.
Flexibility: It should be possible to define different neural network architectures for different microcontroller architectures. Moreover, machine learning models for microcontrollers should not be limited to neural networks. Ideally, it should be possible to generate models of the following types:
- Any type of linear or logistic regression
- SVM with different kernel types
- Tree ensembles (random forests, gradient boosted trees)
- Neural networks
- …
Comparison of solutions
We have identified the following competing tools for the conversion of a neural network description into C code.
TensorFlow Lite
TensorFlow is a tool frequently used to describe and train neural networks in the cloud. TensorFlow Lite is designed to deliver the trained models to the edge (to different types of microcontrollers and microprocessors). It is released under the same open-source license as TensorFlow.
The following illustration shows how it works:
The TensorFlow Lite converter runs on the development machine: it receives a pre-trained model and serializes it into a so-called “FlatBuffer”. The interpreter, a static library integrated into the firmware, runs the neural network according to the description in the FlatBuffer. An advantage of this approach is that the firmware is generated only once and the model can be replaced during operation. While this is a clear advantage for the developers of the model, firmware developers dislike this behavior: it results in dynamic memory use and computation effort during operation, which in turn increases the need for sophisticated memory management and task scheduling.
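For the devices considered here, the relevant variant is TensorFlow Lite for Microcontrollers, whose interpreter plans all tensors inside a caller-provided, statically allocated “tensor arena” rather than the system heap. A minimal set-up sketch follows; the constructor signature has changed across library versions, and the model symbol, operator list and arena size are assumptions that depend on the actual model.

```cpp
#include <cstdint>

#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

// The FlatBuffer produced by the converter, e.g. stored in flash and
// replaceable by a data download. g_model_data is a placeholder name.
extern const unsigned char g_model_data[];

// All tensors are planned inside this arena; 16 KiB is an assumption.
constexpr int kArenaSize = 16 * 1024;
static uint8_t tensor_arena[kArenaSize];

float run_inference(const float* features, int n) {
    const tflite::Model* model = tflite::GetModel(g_model_data);

    // Register only the operators the model actually uses.
    tflite::MicroMutableOpResolver<2> resolver;
    resolver.AddFullyConnected();
    resolver.AddSoftmax();

    tflite::MicroInterpreter interpreter(model, resolver,
                                         tensor_arena, kArenaSize);
    interpreter.AllocateTensors();  // plans all buffers inside the arena

    TfLiteTensor* input = interpreter.input(0);
    for (int i = 0; i < n; ++i) input->data.f[i] = features[i];

    interpreter.Invoke();
    return interpreter.output(0)->data.f[0];  // first output value
}
```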
TensorFlow and TensorFlow Lite currently support many different types of neural networks, but do not support other machine learning models such as linear regression.
Matlab Coder and Matlab Embedded Coder
In addition to Python and R, Matlab and Octave are among the preferred tools for data scientists and machine learning engineers. Matlab offers toolboxes to automatically generate C code from Matlab code. To generate optimized C code for embedded devices, the Matlab toolboxes “Matlab Coder” and “Embedded Coder” are required.
The following illustration shows how it works:
The Embedded Coder toolbox generates C code from .m files or from Simulink models. What C code is generated depends on the functions used by the Matlab developer; the generated code can be restricted to purely static objects or also use dynamic objects. Matlab Coder can be used for any type of machine learning model. The interfaces of the generated model are defined by the developer in the function definitions in the .m files. Depending on this definition, the developer could create a parser/interpreter pair as in the TensorFlow Lite approach, or choose an approach as sketched in the diagram above.
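As an illustration of such an interface, assume a hand-written nn_predict.m whose signature takes the input vector and the weights as separate arguments. The C function emitted by the coder then mirrors that signature (all names and dimensions below are hypothetical), and keeping the weights as a run-time argument is precisely what makes them updatable without regenerating code:

```cpp
// Hypothetical interface emitted by Matlab Coder from a hand-written
// nn_predict.m with the signature  function y = nn_predict(x, weights).
// Actual names and types depend on the Matlab code and coder settings.
extern "C" double nn_predict(const double x[16], const double weights[154]);

// Firmware side: because the weights are a run-time argument rather than
// compiled-in constants, refreshing them is a plain data download. The
// buffer is assumed to live in an updatable flash region defined elsewhere.
extern const double updatable_weights[154];

double classify(const double features[16]) {
    return nn_predict(features, updatable_weights);
}
```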
The initial definition of the neural network requires the Matlab toolbox, which is closed-source, licensed software. This can increase development costs. Continuous training of an existing model, however, can be performed without the Embedded Coder toolbox and therefore does not incur additional costs. Note that this depends on the actual implementation and on whether the weights have been separated from the neural network architecture. If the neural network architecture itself needs to be changed, access to the toolbox is required again.
The generated C code can be compiled for any microcontroller and can be used to build a new firmware version whenever needed.
STM32CubeMX and STM32Cube.AI
STM32CubeMX is a graphical tool for configuring STM32 microcontrollers and generating the corresponding initialization C code. The extension STM32Cube.AI converts trained neural networks to C code. Both tools from ST offer a graphical user interface as well as a command-line interface for automation. Cube.AI also interfaces with machine learning frameworks such as TensorFlow Lite, Matlab and PyTorch. The tools generate code for ST microcontrollers only.
The following diagram shows how the source code is generated and run:
Cube.AI generates several .c and .h files and finally supplies a static library (“*.a” file) against which the source code is compiled. Whenever the weights or the neural network architecture need to be replaced, the firmware must be recompiled and redeployed. Because the “*.a” file is not open source, it is unclear whether the weights can be updated at run-time.
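For orientation, the generated code exposes a small C client API. The sketch below follows the ai_<network>_* naming pattern that ST documents for a network named “network”; exact macros and signatures vary between X-CUBE-AI versions. Notably, the weights are bound to the network at initialization time:

```cpp
#include "network.h"       // generated: network architecture and client API
#include "network_data.h"  // generated: weights table

// Activation (scratch) memory and I/O buffers, sized by generated macros.
static ai_u8    activations[AI_NETWORK_DATA_ACTIVATIONS_SIZE];
static ai_float in_data[AI_NETWORK_IN_1_SIZE];
static ai_float out_data[AI_NETWORK_OUT_1_SIZE];
static ai_handle network = AI_HANDLE_NULL;

void nn_init(void) {
    ai_network_create(&network, AI_NETWORK_DATA_CONFIG);

    // Weights and activation memory are passed in at initialization time.
    const ai_network_params params = AI_NETWORK_PARAMS_INIT(
        AI_NETWORK_DATA_WEIGHTS(ai_network_data_weights_get()),
        AI_NETWORK_DATA_ACTIVATIONS(activations));
    ai_network_init(network, &params);
}

void nn_run(void) {
    ai_buffer input[AI_NETWORK_IN_NUM]   = AI_NETWORK_IN;
    ai_buffer output[AI_NETWORK_OUT_NUM] = AI_NETWORK_OUT;
    input[0].data  = AI_HANDLE_PTR(in_data);
    output[0].data = AI_HANDLE_PTR(out_data);
    ai_network_run(network, input, output);  // one inference pass
}
```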
Neural Network on Microcontroller (NNoM)
NNoM is a high-level inference library for neural networks, built specifically for microcontrollers (cf. https://github.com/majianjia/nnom). The following diagram shows its architecture:
NNoM is free and open source under the Apache 2.0 license. An NNoM project consists of two programs: a Python program which runs the NNoM converter, and a C program which runs the actual neural network. The Python program takes a trained Keras model and a test data set as input and produces a “model.h” file. This file is then executed on the target device by the local backend, which is part of the NNoM framework.
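On the device side, usage is compact. The sketch below follows the pattern from the NNoM examples; the generated header name and the nnom_input_data buffer come from the converter and may differ between versions:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

#include "nnom.h"   // NNoM runtime
#include "model.h"  // generated by the converter; defines nnom_model_create()

static nnom_model_t* model;

void nn_setup(void) {
    // Builds the network described in model.h and allocates its buffers once.
    model = nnom_model_create();
}

void nn_step(const int8_t* sample, std::size_t len) {
    // nnom_input_data is the input buffer declared in the generated header.
    std::memcpy(nnom_input_data, sample, len);
    model_run(model);  // one inference pass; results land in the output layer
}
```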
It is laudable that the “model.h” file is generated along with a report of its expected memory footprint.
The approach shows similar characteristics to TensorFlow Lite.
AWS IoT Greengrass with Lambda functions
AWS IoT Greengrass with Lambda functions assumes that the edge computing power and the available memory are sufficient to run Lambda functions. A Lambda function requires a runtime for NodeJS, Python or even Java. As microcontrollers with limited memory cannot host such runtimes, AWS IoT Greengrass does not meet our needs.
Conclusion
A brief look into the following four technologies convinced us that it is the right time to deploy and run neural networks on microcontrollers. These technologies appear to be well maintained, and their output artifacts are focused on microcontrollers with limited computing power and memory.
- TensorFlow Lite
- STM32CubeMX and STM32Cube.AI
- Matlab Coder and Matlab Embedded Coder
- NNoM
Each of these technologies comes with different capabilities, different complexity and different terms of use. Which one is the right choice depends on the following aspects:
- The type of microcontroller
- The flexibility to use dynamically allocated memory in the firmware
- The preferences and experiences of the team
- The need for continuous training
We see the Matlab Coder toolbox as the most promising technology because it offers the highest level of flexibility, which allows us to use it in different projects.
Leave a comment and tell us about your experience with neural networks on microcontrollers.
Authors: Aaron Riedener, David Savi
Co-author: Simon Kurmann
Links:
https://www.st.com/en/embedded-software/x-cube-ai.html
https://github.com/majianjia/nnom
https://aws.amazon.com/de/greengrass/
https://ch.mathworks.com/de/products/matlab-coder.html
https://www.tensorflow.org/lite
Copyright Notice:
This blog contains third party pictures requiring the following notices:
Copyright 2020, Owner: Jianjia Ma, Project: https://github.com/majianjia/nnom
Licensed under the Apache License, Version 2.0 (the “License”);
you may not use this file except in compliance with the License.
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0