RE: LeoThread 2025-11-05 15-48

in LeoFinance, 21 days ago

Part 8/11:

Embedding Data Using Sentence Transformers

A significant portion of the process involved converting raw text data into high-dimensional vectors suitable for semantic search. The approach included:

  • Creating a list of sample sentences (e.g., logs, startup descriptions)

  • Using sentence-transformers (specifically the all-mpnet-base-v2 model) to generate embeddings

  • Attempting to upload these vectors into the collection
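The steps above can be sketched roughly as follows. This is a minimal illustration, not the code used in the original walkthrough: the helper names `load_encoder` and `build_records`, the batch size, and the record layout (`id`, `text`, `vector`) are all assumptions, and the upload step is left out because the thread does not say which vector store the collection belongs to.

```python
from typing import Callable, Dict, List, Sequence

# An encoder maps a batch of sentences to one vector per sentence.
Encoder = Callable[[Sequence[str]], List[List[float]]]


def load_encoder(model_name: str = "all-mpnet-base-v2", device: str = "cpu") -> Encoder:
    """Wrap a sentence-transformers model as a plain callable (hypothetical helper)."""
    # Imported lazily so build_records() can be used without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name, device=device)

    def encode(batch: Sequence[str]) -> List[List[float]]:
        # encode() returns a NumPy array here; convert it to plain Python lists.
        return model.encode(list(batch), convert_to_numpy=True).tolist()

    return encode


def build_records(
    sentences: Sequence[str], encode: Encoder, batch_size: int = 32
) -> List[Dict[str, object]]:
    """Embed sentences batch by batch and pair each vector with its source text."""
    records: List[Dict[str, object]] = []
    for start in range(0, len(sentences), batch_size):
        batch = list(sentences[start:start + batch_size])
        vectors = encode(batch)
        for offset, (text, vector) in enumerate(zip(batch, vectors)):
            # Assumed record layout; adapt it to whatever the collection expects.
            records.append({"id": start + offset, "text": text, "vector": vector})
    return records


# Example usage (downloads the model on first run):
#   encode = load_encoder()
#   records = build_records(["disk usage at 91%", "a startup building solar drones"], encode)
```

Keeping the encoder as a separate callable means the batching logic can be tried with a stand-in function before the real model is downloaded.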

The challenge was embedding large datasets efficiently. Running the embeddings on a CPU was feasible, but GPU acceleration was considered for faster performance, particularly on large datasets. Switching to GPU involved installing a version of torch that matches the GPU setup and configuring the environment accordingly.
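One way to handle the CPU/GPU choice is to check at runtime whether torch can see a CUDA device and fall back to the CPU otherwise. The sketch below assumes an NVIDIA/CUDA setup, which the thread does not confirm; `pick_device` is a hypothetical helper name.

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda" when a CUDA-capable torch build sees a GPU, else "cpu"."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch
    except ImportError:
        # torch is not installed at all, so only the CPU path is available.
        return "cpu"
    # False when torch is a CPU-only build or no GPU/driver is visible.
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The returned string can be passed as the `device` argument when constructing a `SentenceTransformer`. If this keeps returning "cpu" on a machine with a GPU, the installed torch build is probably CPU-only.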