Fine-tune ChatGPT: what are the content and structure of the prompts dataset? #2990
spig95 started this conversation in Community | General
-
Hi @spig95, you can use awesome-chatgpt-prompts as an example dataset. It is a small dataset with hundreds of prompts.
-
Hello everyone,
I have read this article, which shows how Colossal-AI can be used to train ChatGPT. It motivated me to give Colossal-AI a try and build a virtual assistant on a specific topic. I have a lot of text on that topic that I would like to use to fine-tune ChatGPT.
After some research, I came across the `train_prompts.py` script. If I understand correctly, I can use this script to train ChatGPT (or to fine-tune a pretrained model). However, it is unclear to me what the data/prompts should look like and how they should be structured.
In particular, I found this line of code: `dataset = pd.read_csv(args.prompt_path)['prompt']`. It loads the dataset used in `trainer.fit(dataset, ...)`. Could someone kindly tell me what the CSV file at `args.prompt_path` contains and how it should be structured? If this is documented somewhere, I was not able to find it, and a link would suffice! Thanks for taking the time to read through my question!
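Judging only from the line `pd.read_csv(args.prompt_path)['prompt']`, the script needs a CSV file with a header row that includes a column named `prompt`, presumably one prompt per row. Any other columns are dropped by the `['prompt']` selection. The sketch below is a minimal, hypothetical example of such a file and of writing your own prompts into it. The `act` column mirrors what I believe awesome-chatgpt-prompts uses, but treat that layout, the sample rows, and the helper names `load_prompts`/`write_prompts` as assumptions, not the script's documented format.

```python
import io

import pandas as pd

# Hypothetical contents of the file at args.prompt_path.
# Only the "prompt" column matters to the training script; the "act"
# column (an assumption) is discarded by the ['prompt'] selection.
EXAMPLE_CSV = """act,prompt
Linux Terminal,"I want you to act as a Linux terminal. Reply only with the terminal output."
Travel Guide,"I want you to act as a travel guide. Suggest places to visit near my location."
"""


def load_prompts(path_or_buffer) -> pd.Series:
    # Same operation as the line in train_prompts.py:
    # read the CSV and keep only the "prompt" column as a Series of strings.
    return pd.read_csv(path_or_buffer)["prompt"]


def write_prompts(prompts, path_or_buffer) -> None:
    # Hypothetical helper: store a list of strings as a CSV with a single
    # "prompt" column. pandas quotes fields containing commas or newlines.
    pd.DataFrame({"prompt": prompts}).to_csv(path_or_buffer, index=False)


if __name__ == "__main__":
    dataset = load_prompts(io.StringIO(EXAMPLE_CSV))
    print(len(dataset))
    print(dataset.iloc[0])
```

With a real file you would pass its path instead of a `StringIO` buffer, e.g. `write_prompts(my_prompts, "prompts.csv")`, and then point `--prompt_path` (or whatever argument sets `args.prompt_path`) at it.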