This article is part of #25DaysOfServerless. New challenges will be published every day from Microsoft Cloud Advocates throughout the month of December. Find out more about how Microsoft Azure enables your Serverless functions.
Have an idea or a solution? Share your thoughts on Twitter!
Do you want your app to see, hear, speak and even begin to reason? No worries! You don't need a degree in machine learning. Nowadays, you can choose between different AI services. One of them is Azure Cognitive Services, available as APIs, SDKs and services that help developers build intelligent applications. Cognitive Services are organized into groups, and each group supports different, generalized prediction capabilities. Computer Vision is the service to choose for processing and analysing images.
The Computer Vision API lets you analyse and describe images, detect objects, recognize text and more. You can either upload your image to the service or pass the image URL.
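For the URL variant, the underlying REST call can be sketched with Java's built-in HTTP client (Java 11+). This is a minimal sketch, not the author's solution: the `vision/v3.2/analyze` path is an assumption about the API version your resource exposes, and the endpoint, key and image URL are placeholders. The request is only built, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class AnalyzeRequestSketch {

    // Builds (but does not send) a POST request asking for description and tags.
    // Assumption: the "vision/v3.2/analyze" path; check the version of your resource.
    static HttpRequest buildAnalyzeRequest(String endpoint, String key, String imageUrl) {
        String base = endpoint.endsWith("/") ? endpoint : endpoint + "/";
        URI uri = URI.create(base + "vision/v3.2/analyze?visualFeatures=Description,Tags");
        // Naive JSON for illustration; use a JSON library if the URL may contain quotes.
        String body = "{\"url\":\"" + imageUrl + "\"}";
        return HttpRequest.newBuilder(uri)
                .header("Ocp-Apim-Subscription-Key", key)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder values, not real resources.
        HttpRequest request = buildAnalyzeRequest(
                "https://example.cognitiveservices.azure.com",
                "<your-key>",
                "https://example.com/market.jpg");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request with `HttpClient.newHttpClient().send(...)` would return the analysis as JSON. The SDK used later in this post wraps the same call.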
In this blog post I want to demonstrate the usage of the Cognitive Services API by solving one of the #25DaysOfServerless challenges.
Here in Munich, Germany, Felix is excited to go to a traditional Weihnachtsmarkt, a holiday market! He keeps texting photos to his friend Anna about all the fun things he's doing: drinking hot mulled Glühwein, going ice skating, shopping for presents. But Anna can't find her glasses, and can't clearly see what's in the pictures!
For today's challenge, Anna needs a service that, given an image, describes the image and gives some keywords about what it contains.
To solve this 15th challenge we need to:
- Find an image randomly or by keywords using an image API.
- Process and analyse the image to produce a matching description.
- (Optional) Display the image with the caption and keywords.
Author's Solution
This challenge can be seen as a continuation of challenge 7. Feel free to extend your own solution, or use the author's solution, which is written in Java.
We will use Cognitive Services and integrate Computer Vision to compute the description and tags for the images. To get started, open the Azure portal and search for Computer Vision.
The free tier allows 20 calls per minute and 5,000 calls per month. That is far more than we need for the challenge.
All you need is Key1 and your API endpoint, both of which you can find in the portal under Quick start.
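Both values are best kept out of the source code, for example in environment variables. Here is a minimal sketch of a fail-fast lookup, using the variable names that the solution reads further down. The `require` helper and the class name are hypothetical and not part of the author's code.

```java
import java.util.Map;

final class VisionConfigSketch {

    // Returns the trimmed value or fails early with a readable message.
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        try {
            require(System.getenv(), "COMPUTER_VISION_SUBSCRIPTION_KEY");
            String endpoint = require(System.getenv(), "COMPUTER_VISION_ENDPOINT");
            // Never print the key itself; the endpoint is fine to log.
            System.out.println("Computer Vision configured for " + endpoint);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing early here gives a clearer error than a `null` key surfacing later as a rejected request.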
In the source code, all we need to do is authenticate with the generated key and then let the API do its work. Next, think about which VisualFeatureTypes are relevant for your use case, as we want to keep it as simple as possible. For this challenge, we only need the description and tags, which explain what can be seen in the image. To receive the image analysis, call the Computer Vision service.
// Get environment variables and authenticate with computer vision
String CV_KEY = System.getenv("COMPUTER_VISION_SUBSCRIPTION_KEY");
String CV_API = System.getenv("COMPUTER_VISION_ENDPOINT");
ComputerVisionClient compVisClient = ComputerVisionManager.authenticate(CV_KEY)
        .withEndpoint(CV_API);
// This list defines the features to be extracted from the image.
List<VisualFeatureTypes> extractDescriptionAndTags = new ArrayList<>();
extractDescriptionAndTags.add(VisualFeatureTypes.DESCRIPTION);
extractDescriptionAndTags.add(VisualFeatureTypes.TAGS);
// Call the Computer Vision service and tell it to analyze the loaded image.
ImageAnalysis analysis = compVisClient.computerVision().analyzeImage().withUrl(imageUrl)
        .withVisualFeatures(extractDescriptionAndTags).execute();
Good to know: The response contains not only the captions and tags but also a confidence value for each of them. That can be very useful in scenarios where we need to know with a certain level of certainty whether the random image shows, for instance, a Christmas tree or not.
For this challenge, we will just print our findings with the confidence values to the console to satisfy our developer's curiosity.
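To illustrate how the confidence values could drive such a decision, here is a minimal sketch that keeps only tags above a threshold. It works on a plain map of tag name to confidence, which you could fill from `tag.name()` and `tag.confidence()`. The 0.8 cut-off, the helper names and the sample values are assumptions for illustration, not output from the service.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class TagConfidenceSketch {

    // Assumption: an arbitrary cut-off chosen for illustration.
    static final double MIN_CONFIDENCE = 0.8;

    // Keeps the names of all tags whose confidence reaches the threshold.
    static List<String> confidentTags(Map<String, Double> tagConfidences, double minConfidence) {
        return tagConfidences.entrySet().stream()
                .filter(e -> e.getValue() >= minConfidence)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // True if the given tag is present and confident enough.
    static boolean shows(Map<String, Double> tagConfidences, String tag, double minConfidence) {
        return tagConfidences.getOrDefault(tag, 0.0) >= minConfidence;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for analysis.tags().
        Map<String, Double> tags = new LinkedHashMap<>();
        tags.put("tree", 0.97);
        tags.put("outdoor", 0.91);
        tags.put("christmas tree", 0.42);

        System.out.println(confidentTags(tags, MIN_CONFIDENCE));           // prints [tree, outdoor]
        System.out.println(shows(tags, "christmas tree", MIN_CONFIDENCE)); // prints false
    }
}
```

Which threshold is sensible depends on how costly a wrong answer is in your scenario.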
// Display image captions and confidence values.
System.out.println("\nCaptions: ");
String captionString = "";
for (ImageCaption caption : analysis.description().captions()) {
captionString = caption.text();
System.out.printf("\'%s\' with confidence %f\n", caption.text(), caption.confidence());
}
// Display image tags and confidence values.
System.out.println("\nTags: ");
StringBuilder keywords = new StringBuilder();
for (ImageTag tag : analysis.tags()) {
keywords.append(tag.name());
keywords.append(" ");
System.out.printf("\'%s\' with confidence %f\n", tag.name(), tag.confidence());
}
Finally, we return the image with the matching caption and keywords to complete the challenge. ✨
return request.createResponseBuilder(HttpStatus.OK).body(String.format("Search for image with keywords:%s. You can see on this picture: %s. With the keywords %s. Got url: %s", resultText, captionString, keywords, imageUrl)).build();
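The loops above keep the last caption and separate keywords with spaces, which leaves a trailing space. As a possible variation, the sketch below picks the caption with the highest confidence and joins the keywords with commas. The class and method names and the sample values are hypothetical, and the inputs are plain collections rather than the SDK types.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ResponseTextSketch {

    // Returns the caption text with the highest confidence, or "" if there is none.
    static String bestCaption(Map<String, Double> captionConfidences) {
        return captionConfidences.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse("");
    }

    // Joins non-empty tag names with ", " and no trailing separator.
    static String joinKeywords(List<String> tagNames) {
        return tagNames.stream()
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.joining(", "));
    }

    static String responseText(String caption, List<String> tagNames, String imageUrl) {
        return String.format("You can see on this picture: %s. With the keywords: %s. Got url: %s",
                caption, joinKeywords(tagNames), imageUrl);
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the analysis result.
        Map<String, Double> captions = new LinkedHashMap<>();
        captions.put("a christmas market at night", 0.87);
        captions.put("a group of people", 0.55);
        List<String> tags = Arrays.asList("tree", "outdoor", "snow");

        System.out.println(responseText(bestCaption(captions), tags, "https://example.com/market.jpg"));
    }
}
```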
In Conclusion
We learned what Cognitive Services are and took a closer look at the machine comprehension of images.
Along the way, we saw an example of how to use the Computer Vision service:
- Register your service to receive your API keys.
- Create a list of visual feature types you want to extract.
- Analyze the image.
Want to submit your solution to this challenge? Build a solution locally and then submit an issue. If your solution doesn't involve code, you can record a short video and submit it as a link in the issue description. Make sure to tell us which challenge the solution is for. We're excited to see what you build! Do you have comments or questions? Add them to the comments area below.
Watch for surprises all during December as we celebrate 25 Days of Serverless. Stay tuned here on dev.to as we feature challenges and solutions! Sign up for a free account on Azure to get ready for the challenges!