Daniel Huynh

Posted on Mar 29 • Edited on Apr 2

LaVague: Open-source Large Action Model to automate Selenium browsing

TL;DR

LaVague is an open-source project designed to automate menial tasks on behalf of its users. Many of these tasks are repetitive, time-consuming, and require little to no cognitive effort. By automating these tasks, LaVague aims to free up time for more meaningful endeavors, allowing users to focus on what truly matters to them.

Our GitHub
Our Discord
A Gradio demo to get started

The journey

Mithril Security was founded in Paris in 2021. Our first release was the open-source BlindAI, an AI deployment framework that uses Intel SGX secure hardware to deploy models inside secure enclaves.

BlindAI protects both data and models: it guarantees the privacy of data sent to an AI provider, and it protects model weights when they are deployed on premise.

BlindAI was audited by Quarkslab in 2023 and has been used by the Future of Life Institute.

We have always been passionate about AI and privacy and have been firm believers in open-source for security, transparency, and trust.

I will not bore you with the many frameworks we have developed to make AI more privacy-friendly, but if you care, you can also have a look at:

BastionLab: a remote data science framework with access control built-in
BlindBox: a framework to easily deploy Docker images inside Trusted Execution Environments
BlindLlama: a BlindBox v2, supported by the OpenAI Cybersecurity Grant program, to deploy Kubernetes images on Azure instances with vTPM
BlindChat: a framework to chat with local models fitting in your browser using transformers.js

We have also shared our analysis of the LLM ecosystem, from the Total Cost of Ownership of AI models to hallucination detection and the memorization of private data by LLMs. Our goal with these resources was to educate the market: to help organizations get started with AI in general and, for privacy-sensitive customers, to adopt our confidential AI stack.

All this was quite exciting to work on, but as a startup, we needed to find product-market fit.

It all started with a side project...

Before being CEO of Mithril Security, a privacy and security startup, I was an AI engineer by training and passion.

Ever since LLMs took off, I had been looking for an occasion to explore their potential, but my duties at Mithril kept me from putting in the time I wanted.

However, in early March 2024, I participated in a hackathon focused on function calling with LLMs. I really wanted to win the Apple Vision Pro, so I put in some effort to come up with a quick and dirty working demo. As I believe LLMs have the potential to automate mechanical tasks, like web browsing, I came up with a framework to automatically generate Selenium code to program a browser from natural language instructions.

I tried it, it worked, and voila! LaVague was born.

Because we at Mithril are firm believers in open source, I decided to open-source the project after the hackathon. I first announced it in a tweet, and it took off!

Following that, we managed to make #1 on Hacker News.

Those events led to the explosive growth of our project:

After seeing that much enthusiasm for LaVague and talking to early users, we realized that this project has huge potential to help developers in their automation journey.

After a (very) short exchange with my team, we agreed that the opportunity to democratize automation with AI was too exciting to pass up, so we decided to broaden our mission and allocate Mithril's resources to making LaVague the new standard to automate automation!

🌊LaVague: A new wave is coming

That’s where LaVague comes in!

LaVague is a Large Action Model framework whose goal is to automate automation. By leveraging LLMs under the hood, we make it easy to generate Selenium code to automate web interactions simply from human instructions.

You can see it in action below, where simple instructions are given to post on Hugging Face Social Posts:

You can play with it directly by using this Colab. You can also find our GitHub here.
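To make the idea concrete, here is a minimal sketch of the instruction-to-code loop described above. It is not LaVague's actual API: `PROMPT_TEMPLATE`, `instruction_to_selenium`, `extract_code`, and `fake_llm` are hypothetical names chosen for illustration, and the canned answer stands in for a real local or remote model. The sketch only builds a prompt from the page HTML and the instruction, asks a pluggable LLM callable for an answer, and extracts the generated Selenium code as text without running it.

```python
import re
from typing import Callable

# Hypothetical prompt; a real system would tune this carefully.
PROMPT_TEMPLATE = """You write Selenium code in Python.
A variable named `driver` already holds an open browser session.

HTML of the current page:
{html}

Instruction: {instruction}

Answer with one Python code block."""

FENCE = "`" * 3  # three backticks, built at runtime to keep this listing readable
CODE_BLOCK = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(llm_answer: str) -> str:
    """Return the first fenced code block, or the whole answer if there is none."""
    match = CODE_BLOCK.search(llm_answer)
    return (match.group(1) if match else llm_answer).strip()


def instruction_to_selenium(
    instruction: str, html: str, llm: Callable[[str], str]
) -> str:
    """Turn a natural language instruction into Selenium code (as text)."""
    prompt = PROMPT_TEMPLATE.format(html=html, instruction=instruction)
    return extract_code(llm(prompt))


def fake_llm(prompt: str) -> str:
    """Stand-in for a local or remote model; always returns the same answer."""
    return (
        "Here is the code:\n"
        + FENCE + "python\n"
        + 'driver.find_element(By.XPATH, "//button[@id=\'post\']").click()\n'
        + FENCE
    )


if __name__ == "__main__":
    page = "<button id='post'>Post</button>"
    print(instruction_to_selenium("Click on the Post button", page, fake_llm))
```

Because the model is passed in as a plain callable, the same function works whether the call goes to a model running locally or to a hosted endpoint. The generated code should be reviewed or sandboxed before it is executed against a real browser.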

Fun story: LaVague started as a hackathon project to win a Vision Pro in a local SF hackathon. While I unfortunately did not win the hackathon, I won much more than that: a Vision for automation!

We believe LLMs will not displace many people in the near future: they are not as flexible or intelligent as humans, and many jobs require exactly that flexibility and intelligence. However, with the proper engineering (prompt engineering, Chain of Thought, fine-tuning, etc.), they have great potential to help automate mundane tasks.

That is why our framework, LaVague, has immense potential to empower human agents in their day-to-day work by letting an AI take care of menial, mechanical tasks, like browsing a website for information or filling out forms. Humans can then focus on reasoning and planning and delegate the execution of mechanical tasks to machines.

Philosophy

Because we believe AI has the potential to profoundly impact our lives, such technology should be developed in the open.

That is why LaVague is an open-source framework, leveraging other open-source libraries, such as Hugging Face or LlamaIndex, under the hood. Because we want people to be able to have their own private LLMs to automate their tasks, LaVague natively supports both local and remote LLM calls to provide as much flexibility as possible.

Our key principle is that hackers hack for free. We want this to be a project by and for the AI community and beyond. All core components are developed openly, and we strive to guide this project to unlock the most value for the largest number.

Obviously, as a startup, we still need a monetization strategy. For LaVague, we have chosen an open-core approach: users will be able to use and modify LaVague at will, but some Enterprise features (security, compliance, audit, scalability, etc.) will be packaged and sold to the Enterprise market.

In addition, we will develop a hosted solution to make it easy for developers to get started with LaVague.

Roadmap

So, what is coming next for LaVague?

Our end goal is to automate automation and provide the ultimate tooling for developers to easily program pipelines to automate menial tasks.

Our first focus is to solve web automation. As most interactions happen on the internet today, providing an easy solution to interact with web resources could greatly help reduce time spent on menial tasks.

Therefore, our initial efforts will go into developing the best framework to generate web pipelines, with a first focus on Selenium workflows. As Selenium is an industry standard, it will be the first solution we support, though others, such as Playwright, will be integrated later.

Within three months, we aim to have both:

A decentralized and open dataset of web interactions, used to evaluate and train LaVague so that it generates correct Selenium code
A model with 95% accuracy on a representative dataset of internet interactions
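
How accuracy will be measured is not settled in this post. As one possible reading, the sketch below scores a generator by exact match between the generated code and a reference, after collapsing whitespace. The `Example` record, the `accuracy` function, and the exact-match criterion are all assumptions made for illustration; a real benchmark might instead compare the element that was acted on, or the resulting page state.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Example:
    """One hypothetical dataset record: an instruction, a page, a reference answer."""
    instruction: str
    html: str
    expected_code: str


def normalize(code: str) -> str:
    """Collapse all whitespace so formatting differences do not count as errors."""
    return " ".join(code.split())


def accuracy(
    examples: Iterable[Example], generate: Callable[[str, str], str]
) -> float:
    """Fraction of examples where the generated code matches the reference."""
    examples = list(examples)
    if not examples:
        raise ValueError("cannot compute accuracy on an empty dataset")
    hits = sum(
        normalize(generate(ex.instruction, ex.html)) == normalize(ex.expected_code)
        for ex in examples
    )
    return hits / len(examples)


if __name__ == "__main__":
    dataset = [
        Example("Click Post", "<button id='post'>Post</button>",
                "driver.find_element(By.ID, 'post').click()"),
        Example("Open login", "<a href='/login'>Log in</a>",
                "driver.find_element(By.LINK_TEXT, 'Log in').click()"),
    ]

    # Toy generator that ignores its inputs and always clicks the Post button.
    def always_post(instruction: str, html: str) -> str:
        return "driver.find_element(By.ID,  'post').click()\n"

    print(accuracy(dataset, always_post))
```

Exact match is strict: two snippets that locate the same element in different ways count as a miss, so this measure would understate a model's real success rate.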

A non-exhaustive list of items on this roadmap:

Fine-tuning a Gemma 7B for a better local model
Improving the precision and accuracy of the retriever when it is asked to find the relevant HTML of the current page
Building a Hub of functions created by LaVague
Integrating other frameworks, such as Playwright or Selenium IDE, with a browser plugin
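
To illustrate what the retriever item is about, here is a toy sketch of the retrieval step: given a page and an instruction, pick the few HTML elements most likely to matter, so that only those are shown to the LLM. This is not how LaVague's retriever is implemented. The names `ElementCollector` and `retrieve` are hypothetical, and plain word overlap stands in for whatever scoring a real retriever would use.

```python
import re
from html.parser import HTMLParser

INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
VOID_TAGS = {"input"}  # tags that never wrap text


class ElementCollector(HTMLParser):
    """Collect interactive elements as short snippets: opening tag plus text."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[str] = []
        self._open: int | None = None  # index of the element collecting text

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr_text = " ".join(f'{k}="{v}"' for k, v in attrs if v is not None)
        self.elements.append(f"<{tag} {attr_text}>" if attr_text else f"<{tag}>")
        self._open = None if tag in VOID_TAGS else len(self.elements) - 1

    def handle_data(self, data):
        if self._open is not None and data.strip():
            self.elements[self._open] += data.strip()

    def handle_endtag(self, tag):
        if tag in INTERACTIVE_TAGS:
            self._open = None


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric words found in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(html: str, instruction: str, top_k: int = 3) -> list[str]:
    """Return the top_k element snippets sharing the most words with the instruction."""
    collector = ElementCollector()
    collector.feed(html)
    collector.close()
    wanted = tokens(instruction)
    ranked = sorted(
        collector.elements,
        key=lambda element: len(tokens(element) & wanted),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    page = (
        '<nav><a href="/login">Log in</a></nav>'
        '<form><input name="q" placeholder="Search models">'
        '<button id="post">Post</button></form>'
    )
    print(retrieve(page, "Click on the Post button", top_k=1))
```

Word overlap fails as soon as the instruction and the page use different vocabulary ("sign in" versus "Log in"), which is one reason retrieval quality deserves its own roadmap item.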
Conclusion

Mithril Security has seen a lot since its inception in 2021. Even though our initial focus on enclaves for AI has not borne the fruit we hoped for, we are still working at a steady pace with partners like the Future of Life Institute to make AI confidentiality and transparency a reality!

We have started a new journey with LaVague, more focused on unlocking the full potential of AI to automate automation!

If you are interested in contributing, asking questions, or proposing features, do not hesitate to contact us on Discord! If you want professional support in your adoption of LaVague, you can also email us directly.
