Recommendation System for Articles
Let’s consider a scenario where a news aggregation platform aims to recommend articles similar to what a user is currently reading, enhancing user engagement by suggesting relevant content.
How It Works:
- Preprocessing and Indexing: Articles in the platform’s database are processed to extract textual features, often converted into high-dimensional vectors using LDA or transformer based embeddings like text-ada-embedding-002. These vectors are then indexed using HNSW, an algorithm suitable for high-dimensional spaces due to its hierarchical structure that facilitates efficient navigation and search.
- Retrieval Time: When a user reads an article, the system generates a feature vector for this article and queries the HNSW index to find vectors (and thus articles) that are close in the high-dimensional space. Cosine similarity can be used to evaluate the similarity between the query article’s vector and those in the index, identifying articles with similar content.
- Outcome: The system recommends a list of articles ranked by their relevance to the currently viewed article. Thanks to the efficient indexing and similarity search, these recommendations are generated quickly, even from a vast database of articles, providing the user with a seamless experience.
Now let us walkthrough a scenario where Manhattan Distance will be preferred over Cosine Similarity.
Ride-Sharing App Matchmaking
Let’s consider a scenario where a ride-sharing application needs to match passengers with nearby drivers efficiently. The system must quickly find the closest available drivers to a passenger’s location to minimize wait times and optimize routes.
How It Works:
- Preprocessing and Indexing: Drivers’ current locations are constantly being updated and stored as points in a 2D space representing a map. These points can be indexed using a tree based spatial indexing techniques or data structures optimized for geospatial data, such as R-trees.
- Retrieval Time: When a passenger requests a ride, the application uses the passenger’s current location as a query point. Manhattan distance (L1 norm) is particularly suitable for urban environments, where movement is constrained by a grid-like structure of streets and avenues, mimicking the actual paths a car would take along city blocks.
- Outcome: The system quickly identifies the nearest available drivers using the indexed data and Manhattan distance calculations, considering the urban grid’s constraints. This process
ensures a swift matchmaking process, improving the user experience by reducing wait times.