Part 8/11:
Embedding Data Using Sentence Transformers
A significant portion of the process involved converting raw text data into high-dimensional vectors suitable for semantic search. The approach included:
Creating a list of sample sentences (e.g., logs, startup descriptions)
Using sentence-transformers (specifically the all-mpnet-base-v2 model) to generate embeddings
Attempting to upload these vectors into the collection
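The steps above can be sketched roughly as follows. This is a minimal illustration, not the exact code used: the function names `embed_sentences` and `to_records` and the record layout (`id`, `text`, `vector`) are assumptions made for the example, and the upload call is left out because it depends on the vector store's own client.

```python
from typing import Any, Dict, List, Sequence

MODEL_NAME = "all-mpnet-base-v2"  # model named in the text


def embed_sentences(sentences: Sequence[str], batch_size: int = 32):
    """Return one embedding per sentence as a 2-D NumPy array."""
    # Imported lazily so the rest of the module loads without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_NAME)
    # encode() batches internally; this model produces 768-dimensional vectors.
    return model.encode(list(sentences), batch_size=batch_size)


def to_records(sentences: Sequence[str], vectors: Sequence[Sequence[float]]) -> List[Dict[str, Any]]:
    """Pair each sentence with its vector in a hypothetical upload-ready layout."""
    if len(sentences) != len(vectors):
        raise ValueError("sentences and vectors must have the same length")
    return [
        # Plain Python floats keep the record serializable (e.g. to JSON).
        {"id": i, "text": text, "vector": [float(x) for x in vector]}
        for i, (text, vector) in enumerate(zip(sentences, vectors))
    ]
```

With the library installed, `to_records(sentences, embed_sentences(sentences))` would produce one record per sentence, ready to be passed to whatever insert or upsert method the collection's client provides.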
The main challenge was embedding large datasets efficiently. Running the model on a CPU was feasible, but GPU acceleration was considered for faster performance, particularly on large datasets. Moving to a GPU involved installing the right version of torch and configuring the environment accordingly.
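One way to handle the CPU/GPU choice is to detect the device at runtime and fall back to the CPU when no GPU is visible. The sketch below assumes an NVIDIA GPU with a CUDA-enabled torch build; the helper names `pick_device` and `load_model` are illustrative and do not come from the original walkthrough.

```python
def pick_device() -> str:
    """Return "cuda" if torch is installed and sees a CUDA GPU, else "cpu"."""
    try:
        import torch
    except ImportError:
        # Without torch there is no GPU path to use.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def load_model(model_name: str = "all-mpnet-base-v2"):
    """Load the embedding model on the device chosen by pick_device()."""
    # Imported lazily so pick_device() can be used on its own.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name, device=pick_device())
```

If `pick_device()` returns "cpu" on a machine that does have a GPU, the installed torch build is probably CPU-only or mismatched with the local driver, which is the kind of environment issue described above.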