RE: LeoThread 2025-11-05 15-48

Part 8/11:

Embedding Data Using Sentence Transformers

A significant portion of the process involved converting raw text data into high-dimensional vectors suitable for semantic search. The approach included:

Creating a list of sample sentences (e.g., logs, startup descriptions)
Using sentence-transformers (specifically the all-mpnet-base-v2 model) to generate embeddings
Attempting to upload these vectors into the collection
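A minimal sketch of these steps, assuming the sentence-transformers package is installed. The sample sentences, the helper names (`load_model`, `embed_sentences`, `to_records`), the choice to normalize embeddings, and the record layout are all hypothetical. The post does not say which vector store held the collection, so the sketch stops at building generic id/vector/payload records instead of calling any particular upload API.

```python
from typing import Any, Dict, List, Sequence

# Model named in the post; all-mpnet-base-v2 produces 768-dimensional vectors.
MODEL_NAME = "sentence-transformers/all-mpnet-base-v2"

# Hypothetical sample sentences in the spirit of "logs, startup descriptions".
SAMPLE_SENTENCES = [
    "ERROR payment service timed out after 30s",
    "A startup building open-source tools for semantic search",
    "User login succeeded from a new device",
]


def load_model(device: str = "cpu"):
    """Load the embedding model (weights are downloaded on first use)."""
    # Imported here so the pure helpers below work without the package installed.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(MODEL_NAME, device=device)


def embed_sentences(model, sentences: Sequence[str], batch_size: int = 32) -> List[List[float]]:
    """Encode sentences into one vector per sentence, returned as plain Python lists."""
    vectors = model.encode(
        list(sentences),
        batch_size=batch_size,
        normalize_embeddings=True,  # assumption: unit-length vectors, so dot product equals cosine similarity
        show_progress_bar=False,
    )
    return [[float(x) for x in v] for v in vectors]


def to_records(sentences: Sequence[str], vectors: Sequence[Sequence[float]]) -> List[Dict[str, Any]]:
    """Pair each vector with an id and its source text (hypothetical record layout)."""
    if len(sentences) != len(vectors):
        raise ValueError("need exactly one vector per sentence")
    return [
        {"id": i, "vector": list(vec), "payload": {"text": text}}
        for i, (text, vec) in enumerate(zip(sentences, vectors))
    ]
```

Usage: `records = to_records(SAMPLE_SENTENCES, embed_sentences(load_model(), SAMPLE_SENTENCES))` yields three records, each holding a 768-element vector, ready to be mapped onto whatever upsert call the chosen vector store provides.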

The main challenge was embedding large datasets efficiently. Running the model on a CPU worked, but GPU acceleration was considered for faster performance, particularly on large datasets. Moving to the GPU meant installing a CUDA-enabled build of torch (PyTorch) compatible with the machine's GPU drivers, then configuring the environment to use it.
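A hedged sketch of the CPU/GPU switch and of embedding a large dataset in chunks. It assumes only that torch may or may not be installed: `torch.cuda.is_available()` returns False when no usable GPU is visible, and `torch.version.cuda` is None on a CPU-only build, which is a quick way to check whether the installed wheel is the right one. The helper names `torch_info`, `pick_device` and `iter_batches` are illustrative, not taken from the post.

```python
from typing import Dict, Iterator, List, Sequence


def torch_info() -> Dict[str, object]:
    """Report the installed torch build; 'cuda_build' is None for CPU-only wheels."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda_build": None, "gpu_available": False}
    return {
        "torch": str(torch.__version__),
        "cuda_build": torch.version.cuda,  # CUDA version the wheel was built against, or None
        "gpu_available": torch.cuda.is_available(),
    }


def pick_device() -> str:
    """Use the GPU when the torch build can see one, otherwise fall back to the CPU."""
    return "cuda" if torch_info()["gpu_available"] else "cpu"


def iter_batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks so a large dataset is embedded piece by piece."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

With a SentenceTransformer created with `device=pick_device()`, each chunk from `iter_batches(texts, 1000)` can be passed to `model.encode(...)` and written to the collection before the next chunk is embedded, which keeps memory use bounded. If `torch_info()` reports `cuda_build` as None on a machine with a GPU, the CPU-only wheel is installed; the install selector on pytorch.org gives the pip command for a CUDA-enabled build.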