Embeddings Filter Retriever

Node Details

Name: embeddingsFilterRetriever
Type: EmbeddingsFilterRetriever
Version: 1.0
Category: Retrievers
Base Classes: EmbeddingsFilterRetriever, BaseRetriever

Description

This node implements a document compressor that uses embeddings to drop documents unrelated to the query. It combines a base retriever (typically a vector store retriever) with an embeddings filter to refine the retrieval process.

Input Parameters

Vector Store Retriever (baseRetriever)
- Type: VectorStoreRetriever
- Description: The base retriever to use for initial document retrieval.
Embeddings (embeddings)
- Type: Embeddings
- Description: The embeddings model to use for encoding queries and documents.
Query (query)
- Type: string
- Optional: Yes
- Description: Specific query to retrieve documents. If not provided, the user’s question will be used.
Similarity Threshold (similarityThreshold)
- Type: number
- Default: 0.8
- Optional: Yes
- Description: Minimum similarity between a document and the query for the document to be kept. Must be specified if k is not set.
K (k)
- Type: number
- Default: 20
- Optional: Yes
- Description: The maximum number of relevant documents to return. Can be set to undefined, in which case similarityThreshold must be specified.
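
The relationship between k and similarityThreshold can be illustrated with a small validation step. This is a minimal sketch, not the node's actual code: the FilterOptions shape and the validateFilterOptions helper are hypothetical names introduced here for illustration.

```typescript
// Hypothetical option shape mirroring the node's two filter parameters.
interface FilterOptions {
    k?: number
    similarityThreshold?: number
}

// Throws when neither limit is set; otherwise returns the options unchanged.
function validateFilterOptions(options: FilterOptions): FilterOptions {
    if (options.k === undefined && options.similarityThreshold === undefined) {
        throw new Error('Specify k, similarityThreshold, or both')
    }
    return options
}

// The documented defaults (k = 20, similarityThreshold = 0.8) pass the check.
const defaults = validateFilterOptions({ k: 20, similarityThreshold: 0.8 })
```

Leaving both values unset would give the filter no criterion for dropping documents, which is why at least one of them is required.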

Outputs

Embeddings Filter Retriever (retriever)
- Type: EmbeddingsFilterRetriever, BaseRetriever
- Description: The configured retriever object.
Document (document)
- Type: Document, json
- Description: Array of document objects containing metadata and pageContent.
Text (text)
- Type: string, json
- Description: Concatenated string from pageContent of retrieved documents.
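
As a rough illustration of how the Document and Text outputs relate, the sketch below joins the pageContent of each document into one string. The RetrievedDocument type, the documentsToText helper, and the newline separator are assumptions made for this example; the node's escape-character handling is not reproduced here.

```typescript
// Assumed minimal document shape: the two fields named in the Document output.
interface RetrievedDocument {
    pageContent: string
    metadata: Record<string, unknown>
}

// Joins the pageContent of every document into a single string.
// The newline separator is an assumption, not a documented behavior.
function documentsToText(docs: RetrievedDocument[], separator: string = '\n'): string {
    return docs.map((doc) => doc.pageContent).join(separator)
}

const sampleDocs: RetrievedDocument[] = [
    { pageContent: 'First passage.', metadata: { source: 'a.txt' } },
    { pageContent: 'Second passage.', metadata: { source: 'b.txt' } }
]
const sampleText = documentsToText(sampleDocs)
```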

Functionality

The Embeddings Filter Retriever works by:

1. Using the base retriever to fetch an initial set of documents.
2. Applying an embeddings filter that keeps only the documents sufficiently similar to the query.
3. Returning the retriever object, the filtered documents, or the concatenated document text, depending on the selected output.
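
The filtering step can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the LangChain EmbeddingsFilter implementation: vectors are assumed to be precomputed (the real node calls the configured embeddings model), cosine similarity is assumed as the similarity measure, and the names ScoredCandidate, cosineSimilarity, and filterByEmbedding are hypothetical. Whether a score exactly equal to the threshold is kept is also an assumption.

```typescript
interface RetrievedDocument {
    pageContent: string
    metadata: Record<string, unknown>
}

// A document paired with its precomputed embedding vector (assumed input shape).
interface ScoredCandidate {
    doc: RetrievedDocument
    vector: number[]
}

// Cosine similarity of two vectors; returns 0 if either has zero length.
function cosineSimilarity(a: number[], b: number[]): number {
    let dot = 0
    let normA = 0
    let normB = 0
    for (let i = 0; i < a.length; i++) {
        const x = a[i] ?? 0
        const y = b[i] ?? 0
        dot += x * y
        normA += x * x
        normB += y * y
    }
    if (normA === 0 || normB === 0) return 0
    return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Ranks candidates by similarity to the query, keeps at most k of them,
// then drops any whose score falls below similarityThreshold.
function filterByEmbedding(
    queryVector: number[],
    candidates: ScoredCandidate[],
    options: { k?: number; similarityThreshold?: number }
): RetrievedDocument[] {
    let scored = candidates
        .map((c) => ({ doc: c.doc, score: cosineSimilarity(queryVector, c.vector) }))
        .sort((x, y) => y.score - x.score)
    if (options.k !== undefined) {
        scored = scored.slice(0, options.k)
    }
    if (options.similarityThreshold !== undefined) {
        const threshold = options.similarityThreshold
        scored = scored.filter((s) => s.score >= threshold)
    }
    return scored.map((s) => s.doc)
}
```

With a query vector of [1, 0], a candidate embedded as [1, 0] passes a threshold of 0.8, while candidates embedded as [1, 1] or [0, 1] are dropped because their similarity to the query is lower.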

Use Cases

- Improving relevance in document retrieval tasks.
- Reducing noise in retrieved documents for more focused language model inputs.
- Enhancing question-answering systems by providing more relevant context.

Notes

- At least one of k and similarityThreshold must be specified for the filter to work.
- The node uses the ContextualCompressionRetriever and EmbeddingsFilter from the LangChain library.
- When returning concatenated document content, the node handles escape characters in the output text.
