
Node Details
- Name: embeddingsFilterRetriever
- Type: EmbeddingsFilterRetriever
- Version: 1.0
- Category: Retrievers
- Base Classes: EmbeddingsFilterRetriever, BaseRetriever
Description
This node implements a document compressor that uses embeddings to drop documents unrelated to the query. It combines a base retriever (typically a vector store retriever) with an embeddings filter to refine the retrieval process.Input Parameters
-
Vector Store Retriever (baseRetriever)
- Type: VectorStoreRetriever
- Description: The base retriever to use for initial document retrieval.
-
Embeddings (embeddings)
- Type: Embeddings
- Description: The embeddings model to use for encoding queries and documents.
-
Query (query)
- Type: string
- Optional: Yes
- Description: Specific query to retrieve documents. If not provided, the user’s question will be used.
-
Similarity Threshold (similarityThreshold)
- Type: number
- Default: 0.8
- Optional: Yes
- Description: Threshold for determining when two documents are similar enough to be considered redundant.
-
K (k)
- Type: number
- Default: 20
- Optional: Yes
- Description: The number of relevant documents to return. Can be set to undefined, in which case similarity_threshold must be specified.
Outputs
-
Embeddings Filter Retriever (retriever)
- Type: EmbeddingsFilterRetriever, BaseRetriever
- Description: The configured retriever object.
-
Document (document)
- Type: Document, json
- Description: Array of document objects containing metadata and pageContent.
-
Text (text)
- Type: string, json
- Description: Concatenated string from pageContent of retrieved documents.
Functionality
The Embeddings Filter Retriever works by:- Using the base retriever to fetch an initial set of documents.
- Applying an embeddings filter to refine the results based on similarity to the query.
- Returning either the retriever object, the filtered documents, or the concatenated text of the documents based on the specified output.
Use Cases
- Improving relevance in document retrieval tasks.
- Reducing noise in retrieved documents for more focused language model inputs.
- Enhancing question-answering systems by providing more relevant context.
Notes
- Either ‘k’ or ‘similarity_threshold’ must be specified for proper functioning.
- The node uses the ContextualCompressionRetriever and EmbeddingsFilter from the LangChain library.
- It handles escape characters in the output text when returning concatenated document content.