Best Codes

Posted on Sep 17, 2024 • Edited on Sep 24, 2024

Top 5 AI Models YOU Can Run Locally on YOUR Device! 🤯

#ai #security #machinelearning #tutorial

What's up, folks? Did you know you can run an AI model on YOUR machine?! Let me explain.

Most AI models run on private servers far away (unless you happen to live near your server). This is because it takes a LOT of power to run an AI model. You can run some AI models on your personal device, but there is no guarantee that everybody has a device strong enough. Plus, for companies like OpenAI, running the AI on a big server makes it a lot faster for their users as well.

Most companies that offer AI services call an AI API rather than run the AI models themselves. After all, GPT-4 and Claude 3.5 Sonnet are some of the highest quality AI models, but neither OpenAI nor Anthropic (Claude) has made these models open source, so they cannot be run locally.

So, why not just use a chat website like ChatGPT.com to get access to powerful models?
Hmm. I should be clear. This article does not tell you how to run models as powerful as GPT-4 on your device. Most devices can't. This article is about how to run alternative open source models on your device, easily and efficiently.

Back to our question of 'why not just use ChatGPT?'. Here are a few reasons:

Security. You need to provide an API key to use an AI API, but an API key only identifies you; it does not magically protect your data. Reputable AI APIs and chat sites do encrypt traffic in transit with HTTPS, but that protection ends at the provider's server.

When you chat, your message is decrypted on the server so the model can read it, and the AI's response is generated there before being streamed back to you. The provider can therefore see, and possibly store, everything you send.

Stability. Most AI companies have reasonably stable chat interfaces. For example, OpenAI's website, https://chatgpt.com/, has great uptime: 99.7%. The most likely reason you wouldn't be able to access AI is an internet outage or loss of internet access. AI models do not require an internet connection, but connecting to a server where one is running does. So if you lose your internet connection, you lose AI access as well… unless you run AI models on your device. In that case your device is the server, so the internet doesn't matter.
Privacy. Many AI companies have weak privacy protections. Some collect your chat data to train AI models on (commonly euphemized as 'collecting telemetry'). Others have a human team manually review user chats to gather insights about their AI models.

Google Gemini Privacy — See https://support.google.com/gemini/answer/13594961

Running models locally addresses each of these issues quite well.

Running an AI model locally

It's pretty easy for a developer to run an AI model locally using the CLI, for example with Ollama or a similar service. But since this article has both the developer and non-developer audiences in mind, I'll be using an easier method, with an intuitive UI.
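For the developers in the audience, here is roughly what the scripted route can look like. This is a minimal sketch, assuming Ollama is installed and running on its default local port (11434) and that you have already pulled a model; the model name "llama3" and the helper names build_request and ask are my own assumptions for illustration, not something this article depends on.

```python
import json
import urllib.request

# Ollama's default local endpoint (assumes a stock install running on this machine).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request asking a local Ollama server for one complete reply."""
    payload = {
        "model": model,    # assumption: a model you have already pulled
        "prompt": prompt,
        "stream": False,   # ask for a single JSON object instead of a token stream
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Example (only works while the Ollama server is running):
# print(ask("llama3", "Explain what a local AI model is in one sentence."))
```

Notice that the request never leaves localhost, which is the whole point of running a model on your own device.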

There are a number of options, such as Alpaca (Linux only) or LM Studio (very slow in my testing), but I'm choosing GPT4All by Nomic AI, due to its cross-platform support and ease of use.

Download GPT4All

If you're going for privacy, be sure to opt out of any 'Telemetry' or 'Datalake' settings when you set up the app (you can change them later in settings if you miss it).

Now, you'll need to download some models to run, which is what this post is all about! Go to the models tab, then click 'Add model' in the top right.

Now, let's get downloading!

⚠️ Warning!

AI model files are often VERY large. Many models I'm recommending are in the 2-6 GB range, so if your computer is a bit old, check how much free disk space you have before downloading. If you want to test lots of models, you may want to remove the ones you don't use before trying another.
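If you'd rather check your free space with a script than dig through your file manager, something like this works. It's only a sketch: the 2 GB of headroom is an arbitrary safety margin I picked, the helper names are made up for this example, and I'm assuming the sizes in this article are decimal gigabytes.

```python
import shutil

GB = 1000 ** 3  # assumption: the file sizes in this article are decimal gigabytes


def free_bytes_at(path: str = ".") -> int:
    """Return the free space, in bytes, on the drive that holds `path`."""
    return shutil.disk_usage(path).free


def enough_space(model_size_gb: float, free_bytes: int, headroom_gb: float = 2.0) -> bool:
    """True if the model fits with some room to spare (headroom is an arbitrary margin)."""
    return free_bytes >= (model_size_gb + headroom_gb) * GB


# Example: would the 3.83 GB Nous Hermes 2 Mistral DPO file fit on this drive?
print(enough_space(3.83, free_bytes_at(".")))
```

Point free_bytes_at at the drive where GPT4All stores its models if that isn't the current folder.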

Note: Since the model search currently is not working for default models, I reference the position of the model in the model list throughout this article. These positions may have changed since this article's publication date.

1. Nous Hermes 2 Mistral DPO

This is a great overall model. It's fairly fast, fine-tuned, and reasonably knowledgeable.
The model has about 7 billion parameters.

File size: 3.83 GB
RAM required: 8 GB
Quantization: q4_0 (suitable for older systems)

You won't have to look too hard to find this model. It's #2 on the list, so you shouldn't need to use the search.

Click 'Download' (not shown in the image; far right of the model card) and wait for the model to be downloaded and installed for future use.

2. Small or Old Devices: Qwen2-1.5B-Instruct

This model isn't the sharpest knife in the drawer 😒. But if your device is not super powerful, this model is a great choice. It only has 1.5 billion parameters, but it is good at following instructions and interpreting data. All that sounds pretty nice, but this model is also very prone to hallucination, the word we use for when an AI confidently states something false (we don't call it lying, since the AI doesn't make the ethical choice to lie). Check out the image below to see what I mean.

File size: 0.89 GB
RAM required: 4 GB
Quantization: q4_0

This model is farther down on the list. To find it, just scroll down to the very bottom of the model list — don't search anything — and it should be there.

3. Llama 3 8B Instruct

This model is larger than other models suggested so far. Llama models are open-source and usually pretty 'smart'. They also have a very friendly personality and high quality training data. This particular model has 8 billion parameters.

File size: 4.34 GB
RAM required: 8 GB
Quantization: q4_0

This model is the first one on the list!

4. Mini Orca (Small)

This model is great at explaining, fairly small, and fast. It is very prone to hallucination, particularly in regard to math problems. I'd recommend this as an informational model rather than a chat model.

File size: 1.84 GB
RAM required: 4 GB (great for older systems)
Quantization: q4_0

Since this is a default model, searching for it won't rank it higher. Scroll down to the bottom of the model list, then go up three models, and you should see it.

5. Mistral Instruct

This model is a great model in general, and has licensing to be used commercially. It also doesn't have ethical limitations, so it will help you with anything — even naughty things.

File size: 3.83 GB
RAM required: 8 GB
Quantization: q4_0

This model is the third model in the default list.
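Curious where those file sizes come from? They follow roughly from the parameter count and the quantization. In the GGML/GGUF q4_0 format, weights are stored in blocks of 32, each taking 18 bytes (16 bytes of 4-bit values plus a 2-byte scale), which works out to about 4.5 bits per weight. Here is a back-of-the-envelope sketch. It assumes every weight is stored as q4_0 and ignores file metadata, so real files come out a little different; the function name is mine.

```python
Q4_0_BLOCK_WEIGHTS = 32  # weights per q4_0 block
Q4_0_BLOCK_BYTES = 18    # 16 bytes of packed 4-bit values + 2-byte fp16 scale


def estimate_q4_0_gib(n_params: float) -> float:
    """Rough file size in GiB if every weight were stored as q4_0.

    Assumption: real model files keep some tensors at higher precision and
    add metadata, so treat this as a ballpark figure only.
    """
    total_bytes = n_params / Q4_0_BLOCK_WEIGHTS * Q4_0_BLOCK_BYTES
    return total_bytes / 2**30


# Approximate parameter counts for the model sizes mentioned in this article.
for name, params in [("1.5B", 1.5e9), ("7B", 7e9), ("8B", 8e9)]:
    print(f"{name}: about {estimate_q4_0_gib(params):.2f} GiB")
```

The estimates land in the same ballpark as the sizes listed for each model, which is a handy sanity check before you download something much bigger.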

Now that we've downloaded a model or two, let's talk to one! This is pretty straightforward.

Chatting with a model

Click the chats tab in the sidebar:
Click the 'New Chat' button at the top of the sidebar.
Load a model. You can do this easily by clicking the load default model button, or by choosing a specific one in the top bar.

Now, you're all set to chat! After the model loads, send it a message and see how it goes. Try a smaller model to start out (Qwen, for example) for your first test.
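If you'd rather script a chat than click through the UI, GPT4All also has Python bindings (installed with pip install gpt4all). Here's a minimal sketch of the same flow. The helper names load_model and ask are mine, and the model file name in the example is an assumption, so swap in a model you have actually downloaded. As far as I know, the bindings will try to download a model that isn't already on disk, so keep the file-size warning in mind.

```python
def load_model(model_file: str):
    """Load a local model through the GPT4All Python bindings."""
    # Imported here so the rest of this sketch can be read and tested
    # without the package installed.
    from gpt4all import GPT4All
    return GPT4All(model_file)


def ask(model, prompt: str, max_tokens: int = 200) -> str:
    """Send one message inside a chat session and return the model's reply."""
    with model.chat_session():  # keeps the chat template and history for this block
        return model.generate(prompt, max_tokens=max_tokens)


# Example (the file name is an assumption; use one from your own model list):
# model = load_model("Meta-Llama-3-8B-Instruct.Q4_0.gguf")
# print(ask(model, "Say hello in five words."))
```

Everything here runs on your own device, just like the chat window in the app.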

Of course, while running AI models locally is a lot more secure and reliable, there are tradeoffs. For instance, local AI models are limited by the processing power of your device, so they can be pretty slow. They also aren't as 'smart' as many closed-source models, like GPT-4. Running models locally is not 'better' than running them in the cloud. It depends on your use case and preferences.

Well, thanks for reading!

Article by BestCodes. No content in this article was generated by AI, excepting images which depict the text output of AI models.

Check out my next post here!

3G Cell Service Has a HUGE Security Flaw.

Best Codes ・ Sep 24 '24

#mobile #learning #security #tutorial

Top comments (44)

Shrijal Acharya • Sep 18 '24

Good one buddy! What do you use to run these models locally? Is it Ollama?

harshit_lakhani • Sep 21 '24

You should try llmchat.co – it offers the best UI for interacting with local models using Ollama.

Best Codes • Sep 21 '24

Looks neat, I'll check it out!

Best Codes • Sep 18 '24

Thanks! I run the models with GPT4All (in this article). I also use Ollama, or the Alpaca UI for Ollama (Linux only).

Shrijal Acharya • Sep 19 '24

Oh, I missed where you mentioned Ollama and GPT4ALL. I just skimmed through the list.

Best Codes • Sep 19 '24

I've heard LM Studio is great as well. I'm gonna check it out! :)

TechFan71 • Sep 18 '24 • Edited

Thank you!
Another option to run an LLM locally is LM Studio. It is free for personal use, with Linux, Mac, and Windows versions. It provides a list with short descriptions of the supported models, which can be downloaded with a mouse click. You can also switch between them with a mouse click.
lmstudio.ai

Best Codes • Sep 19 '24

@techfan71, @robbenzo24, and @recoveringoverthinkr I tested LM Studio today. The UI was nice and very intuitive, but at the cost of speed. GPT4All was much faster, less laggy, and had a higher tokens-per-second output for the same models.

Plus, any features of LM Studio, such as easily switching models, starting an AI server, managing models, etc. are also in GPT4All.

Overall, I'd recommend GPT4All to most Linux, Windows, or macOS users, and Alpaca to users with small PCs.

Thank you all for your feedback! :D

recoveringOverthinker • Sep 19 '24

Thanks! You're awesome! I'll pass this along to my coworkers.

Best Codes • Sep 19 '24

Glad I could help! :)

TechFan71 • Sep 19 '24

Thank you for the comparison, I will try GPT4All with Linux.

Best Codes • Sep 19 '24

👍

recoveringOverthinker • Sep 19 '24

Good, someone beat me to mentioning LM Studio. I haven't checked it out but some folks at work have recommended it.

Best Codes • Sep 19 '24

I'm testing it today 🔥

Rob Benzo • Sep 18 '24 • Edited

Very cool never heard of it
Is it basically a UI for ollama?

TechFan71 • Sep 18 '24

Rob Benzo • Sep 18 '24

oh cool, will check it out!

Best Codes • Sep 18 '24

I've seen that one as well! Thank you for your feedback. :)

Bala • Oct 12 '24

Hi, I see folks commenting on using different models but I couldn't find anyone reporting results after trying one or more models from the article, with my limited time reviewing the comments.

I did try two of the models (#1 Nous Hermes 2 Mistral DPO & #3 Llama 3 8B Instruct) and my experience was not good. With 31 GB of RAM, the queries were taking longer to respond than I expected; but the main issue with GPT4All is that it did a poor job when I tried to chat with my local files using "LocalDocs". Has anyone had a different experience?

Best Codes • Oct 15 '24

Most models you can run locally are pretty weak. Not much you can do. If you want to run a better model, get a better device, use an API, or an AI server.

Amit Giri • Nov 16 '24

024545

Best Codes • Nov 16 '24

Noah • Sep 19 '24

I got a machine w/256 gb of ram 18 cores & 10tb of disk space. Got any models you can recommend for machines w/more memory?

Best Codes • Sep 19 '24

Wow, nice! 😲

I'd recommend this model here; it's a bit larger:

You can also try the Llama 3.1 8B or 70B parameter models (just search Meta-Llama-3.1-8b or Meta-Llama-3.1-70B).

If you think you can handle more, try the Meta-Llama-3.1-405B model — it's very large and powerful; one of the best open source models out there.

Noah • Sep 19 '24

Thank you very much for the recs, I will look into them. Appreciate the knowledge drop as I'm just starting to look at this stuff.

Best Codes • Sep 19 '24

No problem :)

Martin Baun • Sep 19 '24

Llama looks good!
But why no multi modal?

Best Codes • Sep 19 '24

I didn't include any multimodal models because there aren't many open-source ones, and because they can be a lot more intensive to run locally, and this article was focusing on smaller models that can run on a laptop or PC. :)

Martin Baun • Sep 20 '24

Ah, gotcha!

Best Codes • Sep 20 '24

Mohammad Kareem • Sep 18 '24

i'd be glad to run vscode on my machine without it turning into a stove

Best Codes • Sep 18 '24

Running an AI model is a bit more intensive than running VS Code. 🤪

Mohammad Kareem • Sep 18 '24

a bit you say lmao

Best Codes • Sep 19 '24

Haha

Von Colborn • Sep 20 '24

I'm curious as to what level of computing environment you all are using for running ~8 MB models, other than Apple M? hardware?

Best Codes • Sep 20 '24

An 8 MB model is tiny. Do you mean 8 GB? In that case, any device will do as long as you've got enough disk space (8 GB) and RAM (usually about 16 to 32 GB).

Alternate Existance • Sep 17 '24

love it thanks for sharing

Best Codes • Sep 17 '24

Thank you!

SP • Sep 19 '24

Does anyone know how to train a model on source code, to use locally? Which SLM should I use? And how do I do it?

Best Codes • Sep 19 '24

You can't exactly train a model on code and get something usable; you need chat data with a heavy use of code. Also, any codebase would probably be too small to make much of an AI off of.

You can use the Local Docs feature in GPT4All (which uses a text embeddings model, probably more what you're looking for) or Codeium in your editor to chat with your codebase.

Rob Benzo • Sep 18 '24 • Edited

Nice article, thanks for sharing 💖

Best Codes • Sep 18 '24

Thanks!
