Part 6/13:
Handling Data: Embeddings and Validation
A recurring challenge in Shapiro’s work is managing the format of data payloads, particularly the embeddings used in semantic search and similarity matching. He implements validation routines that check each payload for the required fields: 'content', 'model', 'microservice', and 'type' (or 'category'). Error handling around these checks catches missing or malformed data early, before it reaches later stages.
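The validation step described above might look roughly like the following. This is a minimal sketch, not Shapiro's actual code: the names `validate_payload` and `PayloadError` are hypothetical, and the rule that every required value must be a non-empty string is an assumption made for illustration.

```python
# Field names come from the summary; everything else here is illustrative.
REQUIRED_FIELDS = ("content", "model", "microservice")
TYPE_FIELDS = ("type", "category")  # either one is accepted


class PayloadError(ValueError):
    """Raised when a payload is missing fields or has malformed values."""


def validate_payload(payload):
    """Return the payload unchanged if it is valid, else raise PayloadError."""
    if not isinstance(payload, dict):
        raise PayloadError(f"payload must be a dict, got {type(payload).__name__}")

    # Collect every missing field so the caller sees all problems at once.
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if not any(f in payload for f in TYPE_FIELDS):
        missing.append("type (or category)")
    if missing:
        raise PayloadError("missing fields: " + ", ".join(missing))

    # Assumption: required values should be non-empty strings.
    for field in REQUIRED_FIELDS:
        value = payload[field]
        if not isinstance(value, str) or not value.strip():
            raise PayloadError(f"field {field!r} must be a non-empty string")

    return payload
```

Raising a single exception type that lists every missing field keeps the failure close to the source of the bad data, which matches the "catch it early" intent described in the text.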
He discusses storing embeddings as JSON so they stay easy to inspect during initial experiments, while acknowledging that production use will call for more efficient storage: first a relational or NoSQL database, and eventually a specialized vector store such as Milvus, Pinecone, or Elasticsearch with vector capabilities.
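To illustrate why JSON is convenient at the experimental stage, here is a small sketch that writes one embedding record to a human-readable file and reads it back. The function names, the file layout, and the `vector` key are assumptions for illustration and do not come from the source.

```python
import json
from pathlib import Path


def save_embedding(path, content, vector):
    """Write one record as indented JSON so it can be inspected by hand."""
    record = {
        "content": content,
        # Convert to plain floats so values such as NumPy scalars serialize.
        "vector": [float(x) for x in vector],
    }
    Path(path).write_text(json.dumps(record, indent=2), encoding="utf-8")


def load_embedding(path):
    """Read a record previously written by save_embedding."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

The trade-off is size and speed: a text file per record is easy to open and diff, but it does not support similarity search, which is the job of the vector stores mentioned above.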