Part 5/9:
He also places a UUID (Universally Unique Identifier) at the beginning of each prompt. According to Shapiro, the random string primes GPT-3 with more internal entropy, in effect "confusing" the model, so that it produces more diverse outputs when combined with a high temperature setting. The more varied set of examples enriches the training data.
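The UUID prefix can be sketched in a few lines of Python. This is a minimal illustration, not Shapiro's actual script: the function name `build_prompt`, the newline separator, and the example instruction are all assumptions made for the sketch.

```python
import uuid


def build_prompt(instruction: str) -> str:
    """Return the instruction prefixed with a freshly generated random UUID.

    Hypothetical helper: the name and the newline separator are assumptions,
    not details taken from the original script.
    """
    # uuid4() produces a random UUID, so every call yields a different prefix.
    return f"{uuid.uuid4()}\n{instruction}"


# The same instruction produces two different prompts because the prefix changes.
prompt_a = build_prompt("Write a short story premise.")
prompt_b = build_prompt("Write a short story premise.")
print(prompt_a)
print(prompt_b)
```

Each resulting prompt would then be sent to the model with a high temperature setting, as described above.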
Automation of Data Synthesis
Shapiro describes a Python script that automates the data generation process:
Combining different lists of genres, modifiers, locations, and periods to generate diverse prompt combinations.
Using nested loops for systematic variation: for example, four genres, four modifiers, four locations, and four periods yield 4 × 4 × 4 × 4 = 256 unique prompts.
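The combination step might look roughly like the following sketch. The list contents, the `TEMPLATE` wording, and the function name `build_prompts` are placeholders invented for illustration; only the structure (four lists combined with nested loops) comes from the description above.

```python
# Placeholder values: the actual lists used in the script are not given here.
genres = ["science fiction", "fantasy", "mystery", "romance"]
modifiers = ["dark", "humorous", "hopeful", "surreal"]
locations = ["a desert", "a large city", "a remote island", "a mountain village"]
periods = ["the distant past", "the 1920s", "the present day", "the far future"]

# Hypothetical prompt wording, used only to show how the fields are combined.
TEMPLATE = (
    "Write a story premise.\n"
    "Genre: {genre}\n"
    "Tone: {modifier}\n"
    "Location: {location}\n"
    "Period: {period}"
)


def build_prompts() -> list[str]:
    """Build one prompt for every combination of the four lists."""
    prompts = []
    for genre in genres:
        for modifier in modifiers:
            for location in locations:
                for period in periods:
                    prompts.append(
                        TEMPLATE.format(
                            genre=genre,
                            modifier=modifier,
                            location=location,
                            period=period,
                        )
                    )
    return prompts


prompts = build_prompts()
print(len(prompts))  # 4 * 4 * 4 * 4 = 256
```

The same combinations could also be produced with `itertools.product`, which avoids the deep nesting; the explicit loops are kept here because they match the description.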