30 December 2019, 16:57
Years ago, I watched a video of a Japanese research team that built a robot that plays the simple game of rock-paper-scissors. The bot is nearly incapable of losing because it is mind-blowingly fast at analysing and predicting your moves and playing the counter move (YouTube source). It was so cool that it has stuck with me ever since.
Every year, ML6 organises an in-house Christmas project week in which employees pitch ideas that we then try to build within that week. The focus is on impressive AI demos that showcase ML6’s in-house skills and knowledge.
Xander Steenbrugge and I went with the idea of a bot that plays rock-paper-scissors not by learning from every practised move and predicting yours, but by ‘cheating’: (ab)using its computing speed to identify the human opponent’s move and respond with the counter move before the opponent even realises what has happened.
Because humans are unlikely to notice a small delay (around 50 ms), the bot can imperceptibly hold back its response until the human opponent commits to a move. This leaves only a very limited window to capture a picture of the emerging human move and classify it correctly before making the counter move (which is also nicely illustrated in the research that inspired this project). The sooner we capture the frame used for classification, the faster the response. However, capturing too early carries a risk: the hand may not yet be recognisable enough for a decent classification, which results in a wrong counter move. Speed is of the essence in this whole setup.
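To make the timing constraint concrete, here is a back-of-the-envelope sketch. It assumes the ~50 ms imperceptible delay mentioned above, a 60 fps camera, and a worst case in which the move starts just after a frame was grabbed, so a full frame interval passes before the next capture. The function names are illustrative, the 25 ms and 300 ms inference times come from the model comparison later in this post, and actuation time is ignored, so the result is only a rough upper bound on what is left for the physical response.

```python
"""Rough latency budget for a bot that waits for the human, then counters.

Assumptions (illustrative): ~50 ms imperceptible delay, 60 fps camera,
worst case of one full frame interval before the next capture.
"""


def frame_interval_ms(fps: float) -> float:
    """Time between two consecutive camera frames, in milliseconds."""
    return 1000.0 / fps


def remaining_budget_ms(budget_ms: float, fps: float, inference_ms: float) -> float:
    """Worst-case time left after waiting for a frame and classifying it.

    A negative value means the response would no longer be imperceptible.
    """
    return budget_ms - frame_interval_ms(fps) - inference_ms


if __name__ == "__main__":
    for inference in (25.0, 300.0):
        left = remaining_budget_ms(50.0, 60.0, inference)
        print(f"inference {inference:5.1f} ms -> {left:7.1f} ms left")
```

With a 25 ms classifier roughly 8 ms remain in this worst case, while a 300 ms model overshoots the budget by far.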
Illustration of how the robot delays its action until the human opponent starts making a move, and only then tries to counter it:
We divide the build into two modules:
We order some parts online to start building the bot. One of our biggest concerns is finding the right camera, as the capture rate and image quality are crucial to this build. We order a couple of PlayStation Eye cameras because they can capture at 60 fps at a resolution of 640×480. This camera is quite popular in the maker world for its specifications and low price.
Since we built the demo before Edge TPUs were available, we ordered a Movidius Neural Compute Stick. The stick is optimised for inference on images, and we were very excited to test it out. The idea is to use it with a Raspberry Pi or an ASUS Chromebox we have lying around the office.
The first step is to make sure our hardware is up to the task. We set up a demo using our camera and the Movidius stick and run a couple of the example apps provided by the open-source Neural Compute Application Zoo (ncappzoo). Most notably, the streaming image classifier performs surprisingly well given the limited setup.
One of the bigger challenges is, needless to say, that we have no data to train the model. We therefore build a setup that collects labelled data by telling a person which move to play and then recording short videos of them playing it. Extracting multiple frames from each recorded video lets us quickly build a small dataset that is labelled from the start. We encourage (force) all our colleagues in the office to participate in the data collection. About 10 people participated, which resulted in roughly 1500 labelled images.
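A minimal sketch of the frame-extraction step, assuming each clip is stored as a video file and its label is the move the participant was told to play. The `skip_start` and `step` values (skip the first frames while the hand moves into position, then keep every fifth frame) are hypothetical tuning choices, not taken from the original build. The OpenCV calls used (`cv2.VideoCapture`, `read`, `cv2.imwrite`) are standard; OpenCV is imported inside the function so the pure helper works without it.

```python
import os


def keep_frame(index: int, skip_start: int = 10, step: int = 5) -> bool:
    """Decide whether frame `index` of a clip goes into the dataset.

    Hypothetical policy: drop the first `skip_start` frames, then keep
    every `step`-th frame to limit near-duplicate images.
    """
    return index >= skip_start and (index - skip_start) % step == 0


def extract_labelled_frames(video_path: str, label: str, out_dir: str,
                            skip_start: int = 10, step: int = 5) -> list:
    """Write sampled frames of one clip to out_dir/label/ and return (path, label) pairs.

    Every frame inherits the clip's label: the move the participant was asked to play.
    """
    import cv2  # lazy import so keep_frame can be used without OpenCV installed

    os.makedirs(os.path.join(out_dir, label), exist_ok=True)
    base = os.path.splitext(os.path.basename(video_path))[0]
    cap = cv2.VideoCapture(video_path)
    samples = []
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of the clip (or unreadable file)
            break
        if keep_frame(index, skip_start, step):
            path = os.path.join(out_dir, label, f"{base}_{index:04d}.png")
            cv2.imwrite(path, frame)
            samples.append((path, label))
        index += 1
    cap.release()
    return samples
```

Storing images in one folder per label keeps the dataset directly usable by most image-classification data loaders.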
We also try to add as much variation as possible, such as left vs. right hand, different sleeve colours and vertical vs. horizontal positions.
This self-built dataset is still very limited, so we apply a lot of data augmentation techniques such as warping, rotation, and altering contrast and lighting.
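A simplified NumPy sketch of such an augmentation step, assuming images are HxWx3 `uint8` arrays. It covers contrast and brightness jitter, horizontal mirroring (which also imitates left vs. right hands) and 90-degree rotations; warping and small-angle rotations need an image library such as OpenCV and are left out. The jitter ranges are illustrative guesses, not the settings used in the project.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return one randomly augmented copy of an HxWx3 uint8 image.

    Note: a 90/270-degree rotation swaps height and width for non-square images.
    """
    out = image.astype(np.float32)
    mean = out.mean()
    contrast = rng.uniform(0.7, 1.3)       # scale pixel values around the mean
    brightness = rng.uniform(-30.0, 30.0)  # shift all pixel values
    out = (out - mean) * contrast + mean + brightness
    if rng.random() < 0.5:
        out = out[:, ::-1]  # mirror horizontally: a left hand looks like a right hand
    out = np.rot90(out, k=int(rng.integers(0, 4)))  # rotate by 0, 90, 180 or 270 degrees
    return np.clip(out, 0, 255).astype(np.uint8)
```

Calling `augment` several times per source image with a seeded `np.random.default_rng(...)` multiplies the dataset while keeping runs reproducible.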
We quickly learn that the background, for example dark shadows or objects appearing in the picture, has a significant impact on the robustness of our model. So we go even further in augmenting the data by swapping the (white) backgrounds with images found online.
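One simple way to swap a white background is a colour threshold: every pixel whose channels are all close to white is replaced by the corresponding pixel of a background photo. The sketch below assumes both images are same-sized HxWx3 `uint8` arrays, and the threshold of 220 is a hypothetical value; a hard threshold can also catch very bright highlights on the hand, so a real pipeline may need a softer mask.

```python
import numpy as np


def swap_background(foreground: np.ndarray, background: np.ndarray,
                    threshold: int = 220) -> np.ndarray:
    """Replace near-white pixels of `foreground` with pixels from `background`.

    A pixel counts as background when all three channels are >= threshold.
    """
    if foreground.shape != background.shape:
        raise ValueError("foreground and background must have the same shape")
    mask = np.all(foreground >= threshold, axis=-1)  # HxW boolean mask
    return np.where(mask[..., None], background, foreground)
```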
Because the problem at its core is an image classification one, we start by applying transfer learning to some well-established models. We try ResNet and Inception, for example, which show great accuracy, but their inference time on the Movidius stick is simply too slow (> 300 ms). We also try a really slim model such as SqueezeNet, which is faster (~40 ms), but then the accuracy is lacking.
Since we only have a few output classes (rock, paper or scissors), we opt to create our own small custom network (four layers plus one dense layer). We get great results after a couple of iterations. The inference time on the Movidius stick is only ~25 ms, which means we can classify about 40 frames per second.
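The throughput figure follows from simple arithmetic: classifying frames one after another, the rate is 1000 ms divided by the per-frame latency, capped by the camera’s 60 fps. The sketch below assumes strictly sequential processing with no other overhead; the latencies are the ones quoted in this post.

```python
def max_classifications_per_second(inference_ms: float) -> float:
    """Classifier-bound rate when frames are processed strictly one after another."""
    return 1000.0 / inference_ms


def effective_rate(inference_ms: float, camera_fps: float = 60.0) -> float:
    """The slower of camera and classifier determines the usable frame rate."""
    return min(camera_fps, max_classifications_per_second(inference_ms))


if __name__ == "__main__":
    latencies_ms = {"ResNet/Inception": 300.0, "SqueezeNet": 40.0, "custom network": 25.0}
    for name, ms in latencies_ms.items():
        print(f"{name:>16}: {effective_rate(ms):5.1f} frames/s")
```

At 25 ms per frame the classifier, not the 60 fps camera, remains the bottleneck.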
The confusion matrix looks great as well; the only confusion arises between paper and scissors. Some players hold their hand vertically while playing, which most likely makes it harder to distinguish between the output classes (mainly paper vs. scissors) from a downward-looking camera position.
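For readers unfamiliar with the term, a confusion matrix counts how often each true class is predicted as each class: the diagonal holds correct predictions, and off-diagonal cells show which classes get mixed up. A minimal pure-Python sketch (the example labels in the usage note are made up, not project data):

```python
CLASSES = ("rock", "paper", "scissors")


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Return a nested list where matrix[i][j] counts true class i predicted as class j."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true_label, predicted in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted]] += 1
    return matrix
```

For toy labels `["rock", "paper", "paper", "scissors", "scissors"]` predicted as `["rock", "paper", "scissors", "scissors", "paper"]`, the result is `[[1, 0, 0], [0, 1, 1], [0, 1, 1]]`: the two off-diagonal ones sit exactly in the paper/scissors cells, the pattern described above.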
Having successfully trained a model, we implemented it in a small demo using OpenCV. The demo captures frames, does a little preprocessing on them (resizing the image and correcting the colours), classifies the image and responds within roughly 40 ms, which is a great result. So we went ahead and created a physical demo.
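The per-frame path of such a demo can be sketched as preprocess, classify, pick the counter move, and time the whole thing. This is an illustration, not the demo’s actual code: the 64×64 input size is an assumption, `classify` is a placeholder for the model call on the stick, and the resize and BGR-to-RGB conversion are written in NumPy to keep the sketch dependency-light (in OpenCV one would typically use `cv2.resize` and `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)`).

```python
import time

import numpy as np

# The move that beats each human move.
COUNTER_MOVE = {"rock": "paper", "paper": "scissors", "scissors": "rock"}


def preprocess(frame_bgr: np.ndarray, size: int = 64) -> np.ndarray:
    """Nearest-neighbour resize to size x size, BGR -> RGB, scale to [0, 1]."""
    h, w = frame_bgr.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    small = frame_bgr[rows[:, None], cols[None, :]]
    rgb = small[..., ::-1]  # OpenCV frames are BGR; reverse the channel order
    return rgb.astype(np.float32) / 255.0


def respond(frame_bgr: np.ndarray, classify) -> tuple:
    """Classify one frame and return (counter move, elapsed milliseconds).

    `classify` is a hypothetical callable: preprocessed image -> 'rock' | 'paper' | 'scissors'.
    """
    start = time.perf_counter()
    move = classify(preprocess(frame_bgr))
    counter = COUNTER_MOVE[move]
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return counter, elapsed_ms
```

Measuring with `time.perf_counter()` around the full path, rather than the model call alone, is what makes a figure like “responds within roughly 40 ms” meaningful.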
We wanted to build a real robotic hand that can perform rock, paper and scissors moves at a fast pace, so you can play against a physical opponent instead of just a screen. After failed experiments with servos (too slow), we chose compressed air to drive the fast movements of the fingers on our robotic hand.
The fingers are adapted from a robotic hand model whose design allows us to stretch and bend a finger with a single movement. All parts are printed on an Ultimaker 2; the rest of the materials are leftovers found around the workshop. We control the pneumatic actuators using an Arduino and a servo controller.
We design the hand with only four actuators, which drive the fingers, the wrist rotation and the up-and-down movement of the hand.
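A hypothetical sketch of how the host could tell the Arduino which actuators to engage for each move. The post only says that four actuators drive the fingers, the wrist rotation and the up-and-down movement, so the split into two finger groups (needed to form scissors), the names and the one-byte bitmask protocol below are all assumptions for illustration.

```python
# ASSUMED layout: two finger groups plus wrist and lift; bit i = ACTUATORS[i].
ACTUATORS = ("index_middle", "thumb_ring_pinky", "wrist", "lift")

# ASSUMED meaning: True = actuator engaged (finger group stretched).
MOVES = {
    "rock":     {"index_middle": False, "thumb_ring_pinky": False},
    "paper":    {"index_middle": True,  "thumb_ring_pinky": True},
    "scissors": {"index_middle": True,  "thumb_ring_pinky": False},
}


def encode_command(move: str, wrist: bool = False, lift: bool = False) -> bytes:
    """Pack the four actuator states for `move` into a single command byte."""
    states = dict(MOVES[move], wrist=wrist, lift=lift)
    value = 0
    for bit, name in enumerate(ACTUATORS):
        if states[name]:
            value |= 1 << bit
    return bytes([value])
```

On the host, such a byte could be sent with pyserial, e.g. `serial.Serial(port, 115200).write(encode_command("paper"))`; the Arduino sketch would then decode the bits and switch the corresponding valves through the servo controller. A single byte per move keeps the serial transfer negligible compared to the classification time.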
With the robotic hand and our fast model in place, we can now complete our build.
Overall, the build was a great success: we managed to get amazing results with a limited hardware setup and little data. If you are interested in trying out this demo, you can visit us at one of our many events or at our office in Ghent.
Special thanks to Xander Steenbrugge for being a great teammate on this project.