Part 6/9:
- Saving prompts and generated outputs systematically with timestamped filenames to facilitate data management.
This method makes it practical to build large, varied datasets without writing each example by hand.
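As a minimal sketch of the timestamped-filename idea, the helper below writes a prompt and its generated output to two files that share a name. The function name `save_pair`, the `prompts`/`completions` folder layout, and the use of a nanosecond timestamp are illustrative assumptions, not details taken from the video.

```python
import time
from pathlib import Path


def save_pair(prompt: str, completion: str, out_dir: str = "data"):
    """Write a prompt and its generated output to matching timestamped files.

    Hypothetical helper: the folder layout and naming scheme are assumptions.
    """
    prompts_dir = Path(out_dir) / "prompts"
    completions_dir = Path(out_dir) / "completions"
    prompts_dir.mkdir(parents=True, exist_ok=True)
    completions_dir.mkdir(parents=True, exist_ok=True)

    # A nanosecond timestamp makes name collisions unlikely and keeps
    # files sortable by creation order.
    name = f"{time.time_ns()}.txt"
    prompt_path = prompts_dir / name
    completion_path = completions_dir / name
    prompt_path.write_text(prompt, encoding="utf-8")
    completion_path.write_text(completion, encoding="utf-8")
    return prompt_path, completion_path
```

Because both files share the same name, a prompt can later be matched to its output by filename alone.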
Fine-Tuning Data Formatting
Once the synthetic prompts and responses are generated, the next step involves formatting the data for fine-tuning:
- Structuring each record with a clear prompt and corresponding completion.
- Employing string formatting to embed variables (e.g., genre, location) into prompts dynamically.
- Managing file storage and naming conventions, and ensuring proper data splits for training.
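The formatting steps above can be sketched roughly as follows. This is an illustration under stated assumptions: the template text, the JSONL output format, the 10% validation split, and the function names are all hypothetical choices, not the exact ones used in the video.

```python
import json
import random

# Hypothetical template; the placeholders mirror the genre/location example.
PROMPT_TEMPLATE = "Write a {genre} story set in {location}.\n\nSTORY:"


def build_record(genre: str, location: str, completion: str) -> dict:
    """Fill the template and pair it with its completion."""
    prompt = PROMPT_TEMPLATE.format(genre=genre, location=location)
    # The leading space on the completion is a common convention for
    # prompt/completion data; it is an assumption here.
    return {"prompt": prompt, "completion": " " + completion.strip()}


def split_records(records: list, val_fraction: float = 0.1, seed: int = 0):
    """Shuffle deterministically, then hold out a validation slice."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def write_jsonl(records: list, path: str) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

A typical use would be to call `build_record` for every saved prompt/output pair, pass the list to `split_records`, and write the two resulting lists to separate training and validation files with `write_jsonl`.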
Shapiro emphasizes testing generated examples before training begins, to confirm that the prompts actually yield the desired outputs.
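One way to run such a check is sketched below. Both `generate` (a stand-in for whatever model call is used) and `is_acceptable` (a stand-in for whatever quality criterion applies) are hypothetical callables supplied by the caller; nothing here reflects a specific API from the video.

```python
from typing import Callable, Iterable, List, Tuple


def spot_check(
    prompts: Iterable[str],
    generate: Callable[[str], str],
    is_acceptable: Callable[[str], bool],
) -> List[Tuple[str, str]]:
    """Run each prompt through `generate` and collect the failing outputs."""
    failures = []
    for prompt in prompts:
        output = generate(prompt)
        if not is_acceptable(output):
            # Keep the prompt alongside its output so it can be revised.
            failures.append((prompt, output))
    return failures
```

An empty result on a small sample suggests the prompts are ready; any returned pairs point to prompts worth revising before training data is built from them.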