DEV Community

Fine-Tune GPT-3 on custom datasets with just 10 lines of code using GPT-Index

Dhanush Reddy on February 12, 2023

The Generative Pre-trained Transformer 3 (GPT-3) model by OpenAI is a state-of-the-art language model that has been trained on a massive amount of ...
 
redsquares

Hi! Great project! Thanks in advance!

Is there a way to direct it to use mainly the indexed documents?

For example, with a prompt like "summarize the content of the indexed PDFs", is there a way to restrict the data used for analysis to the data folder?

Thanks

Pranav Tej

Yes, I ran it and it worked, but I ended up getting OpenAI errors, even though I have $18 of unused credit. I tried adding the same PDF as in your example to the data folder. It is 8 pages at most, so I am not sure why I was getting this error.

Dhanush Reddy

Can you try it once on your local system?
I suspect Replit's servers may be blocked by OpenAI, as the same code works on my own machine.

Pranav Tej

What about Colab?

Konstantinos N. Nikas

Hi Mr. Dhanush Reddy, thank you for your article and guide.
Could we similarly point it at an open-access government database, for which we have a username and password, and use GPT-3 to create official texts of the same type as those in that database?

Dhanush Reddy • Edited

@nikaskn, I would suggest you look at Llama Hub.

Alternatively, here is the main website: llamahub.ai, where you can find other data loaders for GPT Index.

Pranav Tej

Hi,

I am trying the above code. Could you please explain the following?

# load from disk
index = GPTSimpleVectorIndex.load_from_disk('index.json')

I have my PDF file in the data folder as you said, but what is index.json here?

Am I missing something?

Dhanush Reddy

Did you run main.py before running query.py?
Maybe you forgot that step.
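
The dependency between the two scripts can be sketched as a guard in query.py. This is only a minimal illustration, assuming main.py is the script that writes index.json and query.py only loads it. The helper names `index_is_ready` and `load_index` are hypothetical and not part of GPT-Index; only `GPTSimpleVectorIndex.load_from_disk` comes from the code quoted in this thread.

```python
from pathlib import Path

# File name taken from the load_from_disk call quoted earlier in the thread.
INDEX_PATH = Path("index.json")


def index_is_ready(path: Path = INDEX_PATH) -> bool:
    """Return True if the saved index file exists and is not empty."""
    return path.is_file() and path.stat().st_size > 0


def load_index(path: Path = INDEX_PATH):
    """Load the saved index, or fail with a hint if it has not been built yet."""
    if not index_is_ready(path):
        raise FileNotFoundError(
            f"{path} not found or empty: run main.py first to build and save the index."
        )
    # Imported lazily so the existence check works even without gpt_index installed.
    from gpt_index import GPTSimpleVectorIndex

    return GPTSimpleVectorIndex.load_from_disk(str(path))
```

With a guard like this, running query.py before main.py would stop with a message naming the missing file instead of a less obvious error.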

Pranav Tej

INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens
INFO:root:> [build_index_from_documents] Total embedding token usage: 192193 tokens

I get exactly the error above when I run query.py.
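
For context, both log lines above carry the `INFO` prefix that Python's `logging` module uses for informational records, so they read as token-usage reports rather than a failure in themselves. Below is a minimal sketch of pulling the counts out of such lines, assuming the format stays exactly as pasted. The function name `parse_token_usage` is made up for illustration and is not part of GPT-Index.

```python
import re

# Matches lines such as:
# INFO:root:> [build_index_from_documents] Total embedding token usage: 192193 tokens
USAGE_RE = re.compile(
    r"^[A-Z]+:[^:]*:> \[(?P<step>\w+)\] "
    r"Total (?P<kind>LLM|embedding) token usage: (?P<tokens>\d+) tokens$"
)


def parse_token_usage(lines):
    """Return {(step, kind): tokens} for every line that matches the pasted format."""
    usage = {}
    for line in lines:
        match = USAGE_RE.match(line.strip())
        if match:
            usage[(match["step"], match["kind"])] = int(match["tokens"])
    return usage


log = [
    "INFO:root:> [build_index_from_documents] Total LLM token usage: 0 tokens",
    "INFO:root:> [build_index_from_documents] Total embedding token usage: 192193 tokens",
]
print(parse_token_usage(log))
```

If the script really does fail, the actual error would most likely show up as a traceback after these lines in the console output.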

machinesrental

API not available

shyamgt

I am getting "ModuleNotFoundError: No module named 'langchain.utilities'" when running main.py. I have installed langchain, openai, and llama_index.

Dhanush Reddy

Hey @shyamgt, we are using GPT-Index and not langchain.
Did you copy my code correctly?

Maybe you forgot to install it (install it via pip install gpt-index).

Sujata

I am facing the same issue. This is the error log (on Windows 10, in a Python virtual environment):
Traceback (most recent call last):
  File "app.py", line 1, in <module>
    from gpt_index import SimpleDirectoryReader, GPTListIndex, GPTSimpleVectorIndex, LLMPredictor, PromptHelper
  File "F:\Projects\Other\ToDo\Self\SA\AppIoTProjects\GPT2\custombot\myenv\lib\site-packages\gpt_index\__init__.py", line 15, in <module>
    from gpt_index.indices.common.struct_store.base import SQLContextBuilder
  File "F:\Projects\Other\ToDo\Self\SA\AppIoTProjects\GPT2\custombot\myenv\lib\site-packages\gpt_index\indices\__init__.py", line 4, in <module>
    from gpt_index.indices.keyword_table.base import GPTKeywordTableIndex
  File "F:\Projects\Other\ToDo\Self\SA\AppIoTProjects\GPT2\custombot\myenv\lib\site-packages\gpt_index\indices\keyword_table\__init__.py", line 4, in <module>
    from gpt_index.indices.keyword_table.base import GPTKeywordTableIndex
  File "F:\Projects\Other\ToDo\Self\SA\AppIoTProjects\GPT2\custombot\myenv\lib\site-packages\gpt_index\indices\keyword_table\base.py", line 15, in <module>
    from gpt_index.indices.base import DOCUMENTS_INPUT, BaseGPTIndex
  File "F:\Projects\Other\ToDo\Self\SA\AppIoTProjects\GPT2\custombot\myenv\lib\site-packages\gpt_index\indices\base.py", line 19, in <module>
    from gpt_index.docstore import DOC_TYPE, DocumentStore
  File "F:\Projects\Other\ToDo\Self\SA\AppIoTProjects\GPT2\custombot\myenv\lib\site-packages\gpt_index\docstore.py", line 9, in <module>
    from gpt_index.readers.schema.base import Document
  File "F:\Projects\Other\ToDo\Self\SA\AppIoTProjects\GPT2\custombot\myenv\lib\site-packages\gpt_index\readers\__init__.py", line 34, in <module>
    from gpt_index.readers.web import (
  File "F:\Projects\Other\ToDo\Self\SA\AppIoTProjects\GPT2\custombot\myenv\lib\site-packages\gpt_index\readers\web.py", line 5, in <module>
    from langchain.utilities import RequestsWrapper
ModuleNotFoundError: No module named 'langchain.utilities'
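
The last frame of the traceback shows gpt_index's own readers/web.py importing `langchain.utilities`. That suggests the installed langchain release does not provide that submodule, though this is an assumption, since the thread does not list installed versions. Below is a small diagnostic sketch using only the Python standard library. The helper names `module_available` and `installed_version` are hypothetical.

```python
from importlib import metadata
from importlib.util import find_spec


def module_available(name: str) -> bool:
    """Return True if `name` (e.g. 'langchain.utilities') can be located for import."""
    try:
        return find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages first and raises if one of them is missing.
        return False


def installed_version(dist: str):
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are the ones mentioned in this thread.
    for dist in ("gpt-index", "llama-index", "langchain"):
        print(dist, installed_version(dist))
    print("langchain.utilities importable:", module_available("langchain.utilities"))
```

Running this inside the same virtual environment shows which versions are actually installed there and whether the submodule named in the error can be found.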