Has Anthropic Claude just wiped out an entire industry?
If you have been following the news, you may have read about a new feature (or should I call it a product) in the Claude API - it is called Computer Use.
This technology is not just another nice feature of Claude, but could actually be the beginning of the end for many industries. But before we look at the implications, let's take a look at what Computer Use actually is.
Here is the official announcement from Anthropic regarding computer use:
https://www.anthropic.com/news/3-5-models-and-computer-use
But before we delve into it and its implications, I have to mention that controlling computers with AI is not really something new. There have been many projects that have gone in this direction using agents.
Some of the open source projects I have in my GitHub favourites:
https://github.com/e2b-dev/awesome-ai-agents
https://github.com/xlang-ai/OSWorld
These projects have a lot of potential to make this technology available to everyone, even on-premises using local models (the Claude solution is closed-source and currently relies heavily on Anthropic models).
However, as is often the case, the challenge is in the execution. The architecture of these agents may be similar, but it all comes down to the reasoning capabilities of the model.
If the model can reason well, then it can probably perform the tasks it is asked to perform with little or no user intervention.
Sound good? Let's have a look:
How it works
The idea is simple:
First, a client (such as a Python application) takes commands from the user and passes them to Claude along with a screenshot of the desktop environment.
Then Claude interprets the command and reads the image with the desktop information to determine what action to take based on the current state of the desktop.
The client then interprets the control commands from Claude and actually executes those commands with simulated mouse movements and clicks, acting as a human would.
Note, however, that Claude does not (and cannot) actually execute these instructions; it simply returns them as output tokens to the caller (the Python application in this case), and the application is responsible for executing them on the host machine (Linux, Mac, whatever).
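The client-side half of this loop can be sketched roughly as follows. The action names (`screenshot`, `left_click`, `type`) mirror those documented for Anthropic's computer tool, but the executor here is a hypothetical stand-in that only logs what a real client would do (for example via a library such as pyautogui):

```python
# Minimal sketch of the client-side dispatch, assuming the model returns
# tool-use actions shaped like Anthropic's documented computer tool.
# A real client would call the Anthropic API and drive the mouse/keyboard;
# here we only record the actions to show the control flow.

def execute_action(action: dict, log: list) -> None:
    """Dispatch one action returned by the model to the host machine."""
    kind = action["action"]
    if kind == "screenshot":
        # A real client captures the desktop and sends it back as an image.
        log.append("captured screenshot")
    elif kind == "left_click":
        x, y = action["coordinate"]
        log.append(f"click at ({x}, {y})")  # e.g. pyautogui.click(x, y)
    elif kind == "type":
        log.append(f"type {action['text']!r}")  # e.g. pyautogui.write(...)
    else:
        log.append(f"unsupported action: {kind}")

# Simulated sequence of actions as the model might return them:
log: list = []
for act in [
    {"action": "screenshot"},
    {"action": "left_click", "coordinate": [400, 300]},
    {"action": "type", "text": "hello"},
]:
    execute_action(act, log)

print(log)
```

The important point from above is visible in the structure: the model only ever *proposes* actions as output tokens; everything that touches the host machine happens in this client code.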
Here is a quickstart client application provided by Anthropic that uses Docker to run a virtual desktop and demonstrate this behaviour. As you can see, the screen is divided into two parts: on the left you can chat and give instructions, and on the right you can watch the agent actually operating the desktop.
Key Advancements
I have experimented a bit with RPA in the past, and more recently with Anthropic's Computer Use, and in my opinion there are two key differences:
First, unlike some RPAs that are tailored to specific scenarios, Computer Use is 100% generic and can work with any application and in any environment (as long as the client can control the mouse, keyboard, and screen of the OS).
Secondly, previous generic solutions did not have sufficient reasoning capabilities to perform complex tasks involving many steps, and most importantly were not able to recover when something unexpected happened.
With Computer Use in Claude, this seems to have improved significantly.
I tried it with some complex scenarios and it was able to execute them quite well.
Limitations
Despite the huge potential of this technology, there are a few limitations we need to be aware of.
Token Limitations
Depending on the length of the task, you may hit a per-minute or daily token limit when using the Anthropic API. Keep in mind that the conversation is not just text: your client must continuously send screenshots of your desktop, which dramatically increases the number of tokens used.
Note that the limits are much higher if you are using Bedrock rather than the Anthropic API. More information on current limits can be found on the provider's website.
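To get a feel for why screenshots dominate token usage, here is a rough estimate based on Anthropic's documented approximation for image tokens (roughly width × height / 750). The screen size and step count are illustrative assumptions, not figures from the announcement:

```python
# Rough estimate of image tokens per screenshot, using Anthropic's
# documented approximation: tokens ~= (width * height) / 750.
# Screen size and step count below are illustrative assumptions.

def screenshot_tokens(width: int, height: int) -> int:
    return (width * height) // 750

per_shot = screenshot_tokens(1024, 768)
print(per_shot)        # roughly a thousand tokens per screenshot

# A multi-step task that sends one screenshot per step adds up quickly:
print(30 * per_shot)
```

At around a thousand tokens per screenshot, a task with a few dozen steps can consume tens of thousands of input tokens on images alone, before counting any text.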
Cost
Depending on the task you want to perform, you will also need to consider the cost. As with the token limits, costs increase massively with the number of screenshots you take. Of course, this also depends a lot on the model you use and the caching you do.
Note that you can save a lot of money by using context caching, which is currently only available if you use the Anthropic API directly, not through AWS Bedrock. Context caching saves up to 90% of the cost of input tokens, so it is worth considering if the token limit is not an issue.
For more information on current pricing and context cache pricing, please visit the Anthropic website.
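A back-of-the-envelope estimate shows how much caching matters here. The price per million tokens below is a placeholder, not current Anthropic pricing; the only figure taken from the text above is the roughly 90% saving on cached input tokens:

```python
# Back-of-the-envelope input-token cost with and without context caching.
# PRICE_PER_MTOK is a placeholder, not real pricing; the ~90% saving on
# cached tokens is the figure cited in the text above.

PRICE_PER_MTOK = 3.00    # hypothetical $ per million input tokens
CACHED_DISCOUNT = 0.90   # cache reads cost ~10% of the base price

def input_cost(total_tokens: int, cached_tokens: int) -> float:
    """Dollar cost of input tokens when `cached_tokens` of them are cache hits."""
    fresh = total_tokens - cached_tokens
    return (fresh * PRICE_PER_MTOK
            + cached_tokens * PRICE_PER_MTOK * (1 - CACHED_DISCOUNT)) / 1_000_000

# 500k input tokens (mostly repeated screenshots), 80% of them cache hits:
print(round(input_cost(500_000, 400_000), 4))
# versus the same task with no caching at all:
print(round(input_cost(500_000, 0), 4))
```

Since a Computer Use session resends a lot of the same context on every turn, the cache-hit fraction tends to be high, which is exactly where the discount pays off.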
Speed of execution
Using Computer Use requires a lot of multimedia interaction with the Claude API, including the transfer of images. This has a significant impact on bandwidth, but also on the speed of execution of these tasks. For recurring tasks that are highly structured and can be handled by an API, Computer Use is not the right solution.
Anthropic Guardrails
Anthropic has strict guardrails on its models and does not allow every action the user requests. For example, it will not log into a website if you give it a username and password. It may also be reluctant to open certain applications or perform certain actions on some websites or applications. This unpredictability (it might work one time and fail another) can make or break your use case if you are relying on an entirely autonomous solution. For this reason, a human in the loop may be necessary at this stage of development.
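One pragmatic way to add that human to the loop is a gate on the client side: before executing an action the model proposed, check it against a list of sensitive operations and ask for confirmation. A minimal sketch follows; the action shape matches the dispatch loop described earlier, and which actions count as "sensitive" is an illustrative assumption:

```python
# Sketch of a human-in-the-loop gate: sensitive actions proposed by the
# model are held for approval instead of being executed automatically.
# Which actions count as "sensitive" is an illustrative assumption here.

SENSITIVE_ACTIONS = {"type"}  # e.g. typing text could submit credentials

def needs_approval(action: dict) -> bool:
    return action["action"] in SENSITIVE_ACTIONS

def run(actions: list, approve) -> list:
    """Execute actions, routing sensitive ones through the approver callback."""
    executed = []
    for act in actions:
        if needs_approval(act) and not approve(act):
            executed.append(("skipped", act["action"]))
        else:
            executed.append(("done", act["action"]))
    return executed

# Example: auto-deny everything sensitive (a real client would prompt the user).
result = run(
    [{"action": "left_click", "coordinate": [10, 10]},
     {"action": "type", "text": "hunter2"}],
    approve=lambda a: False,
)
print(result)
```

In a real client, `approve` would block on a confirmation prompt rather than a lambda, so the human decides exactly at the risky step while routine clicks stay fully automated.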