Part 4/13:
A cornerstone of the project is the use of semantic vector embeddings—numerical representations that capture the meaning of text. Rather than relying on keyword matching, this approach lets the engine recognize conceptually similar papers even when they share no terms in common.
Shapiro employed Google's Universal Sentence Encoder v5, a lightweight, free NLP model that generates a 512-dimensional embedding for each paper's title and abstract. These vectors encode semantic nuance, enabling near-instantaneous comparison of scientific papers in the embedding space.
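A minimal sketch of how such embeddings can be produced and compared, assuming the publicly available TensorFlow Hub distribution of the Universal Sentence Encoder; the sample texts and the cosine-similarity helper are illustrative and not taken from the project itself.

```python
import numpy as np
import tensorflow_hub as hub

# Load the Universal Sentence Encoder (v5, large variant) from TensorFlow Hub.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder-large/5")

# Hypothetical paper titles/abstracts used only to demonstrate the idea.
texts = [
    "CRISPR-Cas9 gene editing in human embryos",
    "Targeted genome modification using Cas9 nuclease",
    "A survey of deep reinforcement learning methods",
]

# Each text is mapped to a 512-dimensional vector.
vectors = embed(texts).numpy()  # shape: (3, 512)

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Conceptually related papers score higher even without shared keywords.
print(cosine_similarity(vectors[0], vectors[1]))  # relatively high
print(cosine_similarity(vectors[0], vectors[2]))  # relatively low
```

Because every paper is reduced to a fixed-length vector, finding related work becomes a fast nearest-neighbor lookup in the embedding space rather than a text search.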