(Note: these posts are migrated from my previous medium.com blog)
I first learned about Google’s Cloud Vision API at this year’s Google I/O. Though it’s been out in beta since 2015, I had not heard of it, nor had the chance to try it out till today. I came across this blog post and was intrigued by the YouTube demo:
As always, I have an Intel Edison lying around so I decided to give it a try.
Before you begin:
Make sure your Edison has been updated to the latest firmware and has Wi-Fi set up; use the setup/configuration tool found here to do so.
You will also need a Google Cloud account with the Vision API enabled. Follow these instructions here to do so before proceeding.
Things you’ll need:
Intel Edison w/ Arduino Breakout Board (You could also use the mini breakout, but you might need a USB adapter to connect a webcam)
Logitech C270 Webcam (Any other USB webcam supported by Linux UVC drivers would work too)
Power Supply
Here’s how it’s all connected:
Let’s go!
For the USB webcam to work, make sure UVC drivers are installed and enabled; you can find instructions here on how to do that.
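With the drivers in place, plugging in the webcam should create a video device node. A quick sanity check (the device name may differ on your setup):
root@edison:~# ls -l /dev/video0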
Install ffmpeg. Git clone the edi-cam repository and run the shell script to install ffmpeg:
root@edison:~# cd /edi-cam/bin
root@edison:~# ./install_ffmpeg.sh
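(The clone step itself isn't shown above. Assuming the edi-cam project on GitHub — the URL below is my best guess, so double-check it — it would look something like this, run from / so that the /edi-cam/bin path exists:)
root@edison:/# git clone https://github.com/drejkim/edi-cam.git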
Install gcloud. This is the Google Cloud Node.js module that allows you to easily use Google Cloud APIs.
root@edison:~# npm install gcloud
Copy over the service account key JSON you created during setup (via scp or sftp). You can create a new one here if you've lost it.
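From your development machine, a plain scp does the job; in sketch form (the key file name, Edison IP, and destination path are placeholders for your own setup):
$ scp your-service-account-key.json root@<edison-ip>:/home/root/keyfile.json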
Run the code! Copy and paste this snippet into Vim or transfer the file over:
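The original embedded snippet didn't survive the migration, so here is a minimal sketch of what capture.js does, pieced together from the steps above: grab one frame from the webcam with ffmpeg, then send it to the Cloud Vision API via gcloud's label detection. The project ID, key file name, and image path are placeholders; treat this as a reconstruction, not the original gist.

// capture.js -- sketch: capture a webcam frame and label it with Cloud Vision
var exec = require('child_process').exec;

// Initialize gcloud with the service account key copied to the board.
var gcloud = require('gcloud')({
  projectId: 'your-project-id',   // placeholder: your Google Cloud project ID
  keyFilename: './keyfile.json'   // placeholder: the key JSON from the previous step
});
var vision = gcloud.vision();

var imagePath = '/home/root/capture.jpg';  // placeholder path for the captured frame

// Capture a single frame from the UVC webcam at /dev/video0 using ffmpeg.
exec('ffmpeg -y -f v4l2 -i /dev/video0 -vframes 1 ' + imagePath, function (err) {
  if (err) {
    console.error('ffmpeg capture failed:', err);
    return;
  }

  // Send the frame to the Vision API; verbose mode returns desc/mid/score objects.
  vision.detectLabels(imagePath, { verbose: true }, function (err, labels) {
    if (err) {
      console.error('Vision API error:', err);
      return;
    }
    console.log(labels);
  });
});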
root@edison:~# node capture.js
Results
Here’s the image that was captured by my webcam:
And here’s the returned JSON:
root@edison:~# node capture.js
[ { desc: 'cartoon', mid: '/m/0215n', score: 85.945672 },
  { desc: 'machine', mid: '/m/0dkw5', score: 74.98506900000001 },
  { desc: 'robot', mid: '/m/06fgw', score: 69.911 },
  { desc: 'gadget', mid: '/m/02mf1n', score: 67.246151 } ]
…I thought that was pretty cool :)
The Google Cloud Vision API actually has a lot of other powerful features, including analyzing emotional facial attributes, extracting and detecting text, and detecting faces, landmarks, labels, logos, and properties in your images.
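As a quick sketch of what that looks like with the same gcloud module (the method name is taken from the gcloud-node docs of that era, and the image path is a placeholder):

// Detect faces in the frame captured earlier.
vision.detectFaces('/home/root/capture.jpg', function (err, faces) {
  if (err) { return console.error(err); }
  // Each face result includes bounding box info plus attributes like joy, sorrow, and anger.
  console.log(faces);
});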
Vision capabilities perfectly complement robotic applications (e.g. a drone that tases you if you're not smiling, a spray paint bot that corrects graffiti grammar, etc.). I can't wait to see what kind of cool things people will make with this!