With CLIP's impressive image classification capabilities, wouldn't it be nice to use them on embedded devices?
But getting PyTorch and its dependencies to run on small ARM CPUs must be a major challenge, right?
And pretrained CLIP is certainly not a model made for embedded devices, like MobileNet is. Surely it can't fit into memory, can it?

Let's dig out the old RaspberryPi 3 from some half-forgotten box and order a camera module on Amazon for under 10 €.

RaspberryPi 3 with Camera Module

The capture is actually quite impressive for a cheap embedded solution. The lighting might need some improvement, but the resolution and focus should provide good material for detection.

Photo Capture of the 8MP Camera
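For reference, a capture like this can be scripted in a few lines of Python. This is just a sketch, assuming the legacy picamera library (the capture could equally be done with raspistill on the command line); the resolution and file name are arbitrary:

```python
from time import sleep
from picamera import PiCamera

# Sketch of a still capture with the legacy picamera stack
# (newer Raspberry Pi OS releases would use picamera2 instead).
camera = PiCamera()
camera.resolution = (1024, 768)  # arbitrary; the 8MP sensor supports much more

camera.start_preview()
sleep(2)  # give the sensor a moment to adjust exposure
camera.capture("capture.jpg")
camera.stop_preview()
camera.close()
```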

Fast forward through the PyTorch & CLIP setup steps...

Let’s try to get some CLIP demo code running on the Pi:

CLIP (RN101) running on RaspberryPi
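The demo is essentially the usage example from the OpenAI CLIP repository, pointed at the RN101 checkpoint. A minimal sketch, assuming a test image saved as capture.jpg and placeholder prompts:

```python
import torch
import clip
from PIL import Image

device = "cpu"  # no GPU on the Pi 3
model, preprocess = clip.load("RN101", device=device)

# capture.jpg and the prompts below are placeholders
image = preprocess(Image.open("capture.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a diagram", "a dog", "a cat"]).to(device)

with torch.no_grad():
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).numpy()

print("Label probs:", probs)
```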

And now let's zero-shot predict on the (resized) camera output:

CLIP (RN50) running on RaspberryPi
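Wired together, the whole zero-shot pipeline on the Pi could look roughly like this. Again a sketch, not the exact script used here: picamera, the prompt labels, and the RN50 checkpoint are assumptions, and preprocess takes care of resizing the frame to CLIP's 224x224 input:

```python
import torch
import clip
from PIL import Image
from picamera import PiCamera

device = "cpu"
model, preprocess = clip.load("RN50", device=device)

# Grab a single frame from the camera module (hypothetical capture path)
camera = PiCamera()
camera.resolution = (640, 480)
camera.capture("capture.jpg")
camera.close()

labels = ["a person", "a cat", "an empty room"]  # hypothetical prompts
image = preprocess(Image.open("capture.jpg")).unsqueeze(0).to(device)  # resizes to 224x224
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).numpy()[0]

for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
```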

Next up: image classification on a real-time stream from the camera. Let's see some latency measurements for CLIP.
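One straightforward way to get those numbers is to time the full forward pass per frame. A sketch of such a measurement, assuming a fixed test image and the RN50 checkpoint; the iteration count is arbitrary:

```python
import time
import torch
import clip
from PIL import Image

device = "cpu"
model, preprocess = clip.load("RN50", device=device)

image = preprocess(Image.open("capture.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a person", "a cat", "an empty room"]).to(device)

# Warm-up run so one-time overhead doesn't skew the numbers
with torch.no_grad():
    model(image, text)

timings = []
for _ in range(10):
    start = time.perf_counter()
    with torch.no_grad():
        model(image, text)
    timings.append(time.perf_counter() - start)

print(f"mean latency per frame: {sum(timings) / len(timings):.2f}s")
```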