Class BatchEmbeddingPipeline
Processes documents through a chunking → embedding → storage pipeline. Handles batching to stay within embedding provider rate limits.
public sealed class BatchEmbeddingPipeline
Inheritance
object
BatchEmbeddingPipeline
Constructors
BatchEmbeddingPipeline(IEmbeddingProvider, IVectorStore, int, ILogger?)
public BatchEmbeddingPipeline(IEmbeddingProvider embedder, IVectorStore store, int batchSize = 100, ILogger? logger = null)
Parameters
embedder (IEmbeddingProvider): Embedding provider for vectorization.
store (IVectorStore): Vector store for persistence.
batchSize (int): Maximum entries per embedding batch. Default 100.
logger (ILogger?): Optional logger.
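To make the role of batchSize concrete, here is a minimal sketch of grouping chunk texts so that no embedding request carries more than batchSize entries. ToBatches is a hypothetical helper written for this illustration, not a member of BatchEmbeddingPipeline, and the pipeline's internal batching may work differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of BatchEmbeddingPipeline): splits chunk texts
// into groups of at most batchSize, one group per embedding request.
static List<string[]> ToBatches(IEnumerable<string> chunks, int batchSize)
{
    if (batchSize <= 0)
        throw new ArgumentOutOfRangeException(nameof(batchSize));
    // Enumerable.Chunk (.NET 6+) yields arrays of up to batchSize items;
    // the final array holds whatever remains.
    return chunks.Chunk(batchSize).ToList();
}

var chunks = Enumerable.Range(0, 250).Select(i => $"chunk-{i}");
var batches = ToBatches(chunks, batchSize: 100);
Console.WriteLine(string.Join(", ", batches.Select(b => b.Length))); // 100, 100, 50
```

A smaller batchSize means more requests with fewer entries each, which may help when a provider caps the number of inputs per request.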
Methods
IndexDocumentAsync(string, string, string?, string?, int, int, CancellationToken)
Indexes a document by chunking, embedding, and storing. Returns the number of chunks stored.
public Task<int> IndexDocumentAsync(string documentId, string content, string? source = null, string? category = null, int maxChunkChars = 1500, int overlapChars = 200, CancellationToken ct = default)
Parameters
documentId (string)
content (string)
source (string?): Default null.
category (string?): Default null.
maxChunkChars (int): Default 1500.
overlapChars (int): Default 200.
ct (CancellationToken)
Returns
Task<int>: The number of chunks stored.
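The maxChunkChars and overlapChars parameters suggest a sliding-window style of chunking. The sketch below shows one plausible reading, where each chunk holds at most maxChunkChars characters and consecutive chunks share overlapChars characters. ChunkText is a hypothetical function for illustration only; the actual chunker may split differently, for example on sentence or paragraph boundaries.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of fixed-size character chunking with overlap.
// This is an assumption about the technique, not the pipeline's own code.
static List<string> ChunkText(string content, int maxChunkChars = 1500, int overlapChars = 200)
{
    if (maxChunkChars <= 0)
        throw new ArgumentOutOfRangeException(nameof(maxChunkChars));
    if (overlapChars < 0 || overlapChars >= maxChunkChars)
        throw new ArgumentOutOfRangeException(nameof(overlapChars));

    var chunks = new List<string>();
    int step = maxChunkChars - overlapChars; // how far each window advances
    for (int start = 0; start < content.Length; start += step)
    {
        int length = Math.Min(maxChunkChars, content.Length - start);
        chunks.Add(content.Substring(start, length));
        if (start + length >= content.Length) break; // this window reached the end
    }
    return chunks;
}

var text = new string('a', 4000);
var parts = ChunkText(text);
Console.WriteLine(parts.Count); // 3
```

With the defaults, each window advances 1300 characters, so a 4000-character input yields windows starting at 0, 1300, and 2600. The overlap keeps text near a chunk boundary present in both neighboring chunks.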
IndexDocumentsAsync(IReadOnlyList<(string Id, string Content, string? Source, string? Category)>, CancellationToken)
Indexes multiple documents in parallel batches.
public Task<int> IndexDocumentsAsync(IReadOnlyList<(string Id, string Content, string? Source, string? Category)> documents, CancellationToken ct = default)
Parameters
documents (IReadOnlyList<(string Id, string Content, string? Source, string? Category)>)
ct (CancellationToken)
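As a sketch of what the documents argument could look like, the snippet below builds a list of named tuples matching the signature. The sample IDs, text, source, and category values are made up for illustration, and the commented-out call assumes a pipeline instance that was already constructed with working IEmbeddingProvider and IVectorStore implementations.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Sample data invented for this example.
IReadOnlyList<(string Id, string Content, string? Source, string? Category)> documents =
    new List<(string Id, string Content, string? Source, string? Category)>
    {
        ("doc-1", "First document text.", "handbook.md", "policy"),
        ("doc-2", "Second document text.", null, null), // Source and Category may be null
    };

// With a constructed pipeline (embedder and store implementations assumed):
// int result = await pipeline.IndexDocumentsAsync(documents);

Console.WriteLine(documents.Count); // 2
```

List<T> implements IReadOnlyList<T>, so an ordinary list can be passed without conversion.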