Fine-Tuning GPT-4o-Mini for Blog Post Generation
The newly released GPT-4o-mini model, launched on July 18, surpasses GPT-3.5 and approaches GPT-4 in performance, at only half the cost of GPT-3.5 and with the fastest response speed in the series. OpenAI officially opened the fine-tuning interface for GPT-4o-mini today, offering 2M free training tokens per day until September 23, 2024.
1 Fine-Tuning Application Scenarios
For simple tasks, writing prompts is sufficient for the model to perform well. For more complex tasks, you can use the Chain of Thought technique to break down the task into multiple steps and reason through them step by step. However, for tasks requiring high precision and consistent output, fine-tuning is necessary.
The following table compares the pros and cons of these three methods and their application scenarios.
| Method | Advantages | Disadvantages | Application Scenarios |
|---|---|---|---|
| Fine-Tuning | Provides high-quality results | Requires significant time and resources to prepare data and train | Tasks needing stable, reliable, high-quality output |
| | Suitable for complex tasks and custom domains | Slow feedback loop; high training cost | Improving model performance in a specific task or domain |
| | Saves tokens and reduces latency | Requires knowledge of deep learning | Tasks needing high precision or a distinctive style, tone, or format |
| Prompting | Quick iteration and testing | Depends on the quality of the prompt design | Quick prototyping and testing of common tasks |
| | Suitable for initial exploration and general tasks | May not be accurate enough for complex tasks | When flexible adjustment of model output is needed |
| | No additional data preparation or training resources needed | Not suitable for tasks with many examples and complex logic | |
| Chain of Thought | Provides step-by-step logic and reasoning | Increases prompt complexity and length | Tasks requiring reasoning and logical steps |
| | Improves performance on complex tasks | Increases token usage and latency | Scenarios involving multi-step problem solving |
| | Easy to combine with other strategies and tools | May still fall short for very complex tasks | When a clear logical process and step-by-step execution are needed |
The No Free Lunch theorem tells us that no single method suits every scenario, and fine-tuning is no exception: it is not necessarily better than the other two approaches. It is, however, clearly well suited to "hard-to-describe" requirements, such as a specific style and tone. These three methods are also not mutually exclusive: well-designed prompts, or even chain-of-thought prompting, combined with a fine-tuned model may achieve better results.
For simple tasks like writing an article or a paragraph, prompts are sufficient. But a blog post written with SEO in mind involves many details, such as the frequency of core keywords. The large model may not fully grasp these details, and as a user you may struggle to express them in prompts. Writing such blog posts is therefore a good candidate for fine-tuning.
2 Preparing Data
Data needs to be organized in `jsonl` format, with each line being a single JSON object containing one training example.
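A minimal example of one training line, in the chat format used for fine-tuning (the content shown is hypothetical):

```json
{"messages": [{"role": "system", "content": "You are a blog writer specializing in SEO-friendly posts."}, {"role": "user", "content": "Write a blog post about ceramic tiles."}, {"role": "assistant", "content": "# Ceramic Tiles: A Buyer's Guide\n..."}]}
```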
You can also set weights on messages in multi-turn conversations; a weight of 0 indicates that the model should avoid learning from that assistant response.
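For instance, in the hypothetical multi-turn example below, the first assistant reply carries `"weight": 0` and is therefore not learned from, while the final, improved reply is:

```json
{"messages": [{"role": "system", "content": "You are a helpful blog writer."}, {"role": "user", "content": "Write a title about mosaics."}, {"role": "assistant", "content": "Mosaics are nice.", "weight": 0}, {"role": "user", "content": "Make it more engaging."}, {"role": "assistant", "content": "Mosaic Magic: Turning Broken Tiles into Art", "weight": 1}]}
```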
Of course, processing the data is the most time-consuming part. You can also use the dataset I created for fine-tuning large models, built by scraping over 3,000 pages across 13 categories from the reads.alibaba.com website. The open-source dataset includes the processed data, the raw data, and the crawler code.
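Since a single malformed line will cause the upload or the fine-tuning job to fail, it is worth sanity-checking the file locally before uploading. A minimal stdlib-only sketch (the file name and checks are my own assumptions, not part of the OpenAI tooling):

```python
import json

def validate_jsonl(path: str) -> int:
    """Return the number of valid training examples; raise on the first bad line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f, 1):
            if not line.strip():
                continue  # tolerate trailing blank lines
            record = json.loads(line)  # raises ValueError on malformed JSON
            messages = record.get("messages")
            assert isinstance(messages, list) and messages, f"line {i}: missing 'messages'"
            for m in messages:
                assert m.get("role") in {"system", "user", "assistant"}, f"line {i}: bad role"
                assert isinstance(m.get("content"), str), f"line {i}: missing content"
            count += 1
    return count
```

Running this over the training file before uploading catches JSON syntax errors and missing fields early, without spending any tokens.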
Upload the prepared data and record the returned file ID.
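A sketch using the official `openai` Python SDK (v1+), assuming `OPENAI_API_KEY` is set in the environment and the training file is named `data.jsonl`:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the training file; the purpose must be "fine-tune"
training_file = client.files.create(
    file=open("data.jsonl", "rb"),
    purpose="fine-tune",
)
print(training_file.id)  # record this file ID for the fine-tuning job
```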
3 Fine-Tuning the Model
Once the data is prepared, validated, and the token cost confirmed, you can create a fine-tuning job.
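A sketch with the v1 Python SDK; the file ID below is a placeholder for the one returned by the upload step:

```python
from openai import OpenAI

client = OpenAI()

job = client.fine_tuning.jobs.create(
    training_file="file-abc123",        # placeholder: the uploaded file's ID
    model="gpt-4o-mini-2024-07-18",
    # hyperparameters such as the number of epochs can optionally be set here
)
print(job.id)  # record the job ID to track its status
```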
More detailed parameter configurations for this step can be found in the official API documentation.
These two steps can also be completed quickly in the web UI. After submitting the job, you can track its progress and loss curve there in real time.
4 Using the Model
You can check the status of the fine-tuning job with the code below. Once the job succeeds, the `fine_tuned_model` field will be filled with the model's name; note this name down to call the model.
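A sketch with the v1 Python SDK; the job ID below is a placeholder:

```python
from openai import OpenAI

client = OpenAI()

job = client.fine_tuning.jobs.retrieve("ftjob-abc123")  # placeholder job ID
print(job.status)            # e.g. "running" or "succeeded"
print(job.fine_tuned_model)  # populated once the job succeeds
```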
The calling method is the same as the official models; you only need to change the model name. For example:
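A sketch of a standard chat completion call; the fine-tuned model name shown is a placeholder for the one returned by your job:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="ft:gpt-4o-mini-2024-07-18:my-org::abc123",  # placeholder fine-tuned model name
    messages=[
        {"role": "system", "content": "You are a blog writer specializing in SEO-friendly posts."},
        {"role": "user", "content": "What is the Difference Between a Mural and a Mosaic?"},
    ],
)
print(response.choices[0].message.content)
```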
5 Evaluating the Results
During training, there are two indicators to refer to: loss and token accuracy. The official explanation is as follows:
Validation loss and validation token accuracy are calculated in two different ways - on a small batch of data during each step and on the entire validation set at the end of each epoch. The overall validation loss and overall validation token accuracy indicators are the most accurate indicators for tracking the overall performance of the model. These statistics are intended to provide a sanity check to ensure training is proceeding smoothly (loss should decrease, token accuracy should increase).
However, indicators are just references; the actual effect still needs to be evaluated by yourself. The fine-tuned model has at least the following improvements:
- Article length increases by 20%
- Article structure is closer to the training data
- No more formatting errors (e.g., stray Markdown syntax, injected CSS, etc.)
Here is an example of an article generated with the title “What is the Difference Between a Mural and a Mosaic?”: