Speaking with LLMs locally (offline internet)

Intro: Even though there are contrary opinions, more and more people are hearing that LLMs (large language models) are somewhat intelligent. To test these AI tools, it would be interesting to know how far the limits of their knowledge go. To this day, there are still doubts about the composition of the training datasets of some LLMs. There is a certain "mystery" about what these models are capable of answering or doing. Their training data probably includes a massive amount of digital books, files, images, audio, texts, signals, methods, and analyses. It is therefore expected that LLMs, including the one we use in this post, have a high potential to answer the most varied questions.

** The cover image was generated by a text-to-image generator from the prompt "A Llama fishing in a beautiful lake."

What will we do? In this context, this post shows how to run an open-source, free LLM on a personal computer. To test this local version of the LLM, we ask some questions that can reveal some of the difficulties that arise in interactions with these robots.

Maybe some coffee can help with this work! ☕

GIF of Test 5 (presented later in this post), in which Mistral-7B solves a calculus exercise.

Configuring the LLM chatbot (Mistral-7B)

The LLM chosen for this post is Mistral-7B, a model whose architecture is similar to Llama 2's. It is available in the llamafile format, an executable of around 4 GB that can be downloaded from a repository on the Hugging Face website.

Llamafile: More information about the llamafile format can be found at this link.

Credits: Both of the resources mentioned above are by the same author (thank you!).

Running the Mistral-7B locally

To run this model locally, we follow the steps below.

STEP 1. Download the executable (.llamafile) by clicking here. (A command-line alternative is shown after the notes below.)

Image of Hugging Face download page

Notes:

  • The name of the file downloaded in this tutorial is mistral-7b-instruct-v0.1-Q4_K_M.llamafile. For other versions of Mistral, the name will probably be different.
  • After downloading the llamafile, we can turn off our Internet connection.
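
For those who prefer the terminal, the download can also be done with curl, as shown below. The URL is a placeholder: replace it with the actual link of the .llamafile copied from the Hugging Face page.

# download the ~4 GB llamafile; -L follows redirects
# <HUGGING-FACE-URL> is a placeholder for the link copied from the model page
curl -L -o mistral-7b-instruct-v0.1-Q4_K_M.llamafile <HUGGING-FACE-URL>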

STEP 2. Change the permissions of the downloaded .llamafile file so that we can run it on our computer.

On Linux, this can be done using the command below:

chmod 755 mistral-7b-instruct-v0.1-Q4_K_M.llamafile
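# 755 gives the owner read/write/execute and everyone else read/execute;
# "chmod +x mistral-7b-instruct-v0.1-Q4_K_M.llamafile" would also work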

On Windows, instead of changing permissions, you typically rename the file so that it ends in .exe before running it.

STEP 3. Run the downloaded file. On Linux this can be done with

./mistral-7b-instruct-v0.1-Q4_K_M.llamafile

On Windows, we run the file by double-clicking on it.

STEP 4. Now just chat with Mistral-7B by accessing port 8080 on localhost (http://127.0.0.1:8080), which should open automatically in the computer's default web browser after the previous step completes.
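
Besides the web interface, the local server can also be queried directly over HTTP. Below is a minimal sketch, assuming the llama.cpp-style /completion endpoint that llamafile servers expose (the n_predict field caps the number of generated tokens):

# send a prompt to the local server and print the raw JSON response
curl -s http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain what a llamafile is in one sentence.", "n_predict": 128}'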


Local tests with the Mistral-7B

In order to test the Mistral-7B model, which is now running locally, we will ask it to answer some questions. For testing purposes, we will also ask it to change the level of understanding expected in its answers. Skills like this make our AI chatbot more than a robot that merely reads what is in a dictionary, encyclopedia, or database: it should also be capable of transmitting a message from a sending source to a receiving source, according to the requirements and characteristics of each.

Below are the questions and answers provided in the tests.

  • Test 1. "Solve this equation for x: x^2 - 4*x + 3 = 0." (Expected answer: x = 1, x = 3)

- Answer 1 (summarized): "To solve it, we can use factoring to rewrite the equation as (x-1)(x-3) = 0. This gives us two possible solutions: x = 1 or x = 3."

Test 1

  • Test 2. "Explain to an elementary school student how the for loop works in the Python language. Present a teaching example at this level of understanding."

- Answer 2 (summarized): "The for loop is used when we want to repeat a block or section of code once for each item in a sequence. As an example, consider a list numbers = [1, 2, 3, 4, 5]; printing each of its 5 numbers is done like this:

for n in numbers:
   print(n)

Test 2

  • Test 3. "Explain to a graduate student in the area of Computing how the for loop works in the Python language. Present a didactic example at this level of understanding."

   - Answer 3: Practically the same answer as in Test 2, but using language with more technical terms and presenting an even simpler example than the previous one.

Test 3

  • Test 4. "Please, could you tell a joke in Brazilian Portuguese?"

    • Answer 4:

Test 4

  • Test 5. "If x=sum_{i=3}^{102}(i-2), what is the value of x?" (Expected answer: x = 5050; a hand check appears right after this list.)

    • Answer 5:

Test 5
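
For reference, the expected answers of Tests 1 and 5 can be checked by hand. In LaTeX notation (our own worked check, not the model's output):

% Test 1: the corrected quadratic factors directly
(x - 1)(x - 3) = x^2 - 4x + 3 = 0 \quad \Rightarrow \quad x = 1 \ \text{or} \ x = 3

% Test 5: substitute k = i - 2, then sum the first 100 integers
x = \sum_{i=3}^{102} (i - 2) = \sum_{k=1}^{100} k = \frac{100 \cdot 101}{2} = 5050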

Comments on answers:

  • Test 1. Nailed it!

  • Tests 2 and 3. When asked to change the complexity level of its answers, the model understands the request and, in a way, follows it.

  • Test 4. Maybe if we had asked in English instead of Portuguese, or changed its hyperparameters, the joke would have been funnier.

  • Test 5. It got the answer wrong, but when we pointed out the error, it knew how to correct itself and justify its mistake. Not bad for a chatbot running locally, with limited resources. From an ethical point of view, it turned out to be a rather friendly robot.


License

Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.


Some next steps

Another option for extending this post is to test other free models that can run locally. Some options are shared at this link.

Can an LLM run on any hardware? With a few changes to what we did here, it is possible to create other free experiments locally. With around 6 GB of RAM it is already possible to play a "little". See more by clicking here.

Fun fact: Although some LLM models seem neutral in their answers, to this day it is impossible to really know how much these models know, how much they don't know, and how much unprecedented knowledge they can generate. For example, in the article "Generation of 3D molecules in pockets via a language model," the authors show how an LLM can help generate scientific knowledge with content that is novel in the literature: namely, they generated synthesizable 3D molecular structures.


If you try what is in this post, kindly share the results with us. Suggestions, criticisms, and opinions are welcome.
