Class LlamaInferenceEngine
Namespace: JD.AI.Core.LocalModels
Assembly: JD.AI.Core.dll
Wraps LLamaSharp to provide an implementation of Semantic Kernel's Microsoft.SemanticKernel.ChatCompletion.IChatCompletionService interface. Manages model loading and unloading, and supports streaming inference.
public sealed class LlamaInferenceEngine : IChatCompletionService, IAIService, IDisposable
Inheritance: object → LlamaInferenceEngine
Implements: IChatCompletionService, IAIService, IDisposable
Constructors
LlamaInferenceEngine(ModelMetadata, LocalModelOptions?, ILogger?)
public LlamaInferenceEngine(ModelMetadata modelInfo, LocalModelOptions? options = null, ILogger? logger = null)
Parameters
modelInfo (ModelMetadata)
options (LocalModelOptions)
logger (ILogger)
Properties
Attributes
Gets the AI service attributes.
public IReadOnlyDictionary<string, object?> Attributes { get; }
Property Value
- IReadOnlyDictionary<string, object?>
IsLoaded
Whether the model is currently loaded.
public bool IsLoaded { get; }
Property Value
- bool
Methods
Dispose()
Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.
public void Dispose()
GetChatMessageContentsAsync(ChatHistory, PromptExecutionSettings?, Kernel?, CancellationToken)
Gets multiple chat content choices for the prompt and settings.
public Task<IReadOnlyList<ChatMessageContent>> GetChatMessageContentsAsync(ChatHistory chatHistory, PromptExecutionSettings? executionSettings = null, Kernel? kernel = null, CancellationToken cancellationToken = default)
Parameters
chatHistory (ChatHistory): The chat history context.
executionSettings (PromptExecutionSettings): The AI execution settings (optional).
kernel (Kernel): The Microsoft.SemanticKernel.Kernel containing services, plugins, and other state for use throughout the operation.
cancellationToken (CancellationToken): The CancellationToken to monitor for cancellation requests. The default is None.
Returns
- Task<IReadOnlyList<ChatMessageContent>>
A list of the chat results generated by the model.
Remarks
Use this method when the settings request more than one choice.
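A minimal sketch of a non-streaming call, using only the members documented on this page together with Semantic Kernel's ChatHistory. This page does not say how a ModelMetadata instance is obtained, so the hypothetical helper AskAsync takes one as a parameter. The `using JD.AI.Core.LocalModels;` directive assumes ModelMetadata lives in the same namespace as the engine.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using JD.AI.Core.LocalModels;            // assumption: ModelMetadata is in this namespace
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

public static class NonStreamingExample
{
    // Hypothetical helper for illustration; not part of JD.AI.Core.
    public static async Task<string?> AskAsync(
        ModelMetadata modelInfo,
        string prompt,
        CancellationToken cancellationToken = default)
    {
        // The engine implements IDisposable, so scope it with a using declaration.
        using var engine = new LlamaInferenceEngine(modelInfo);

        // Load() is documented as required before the first inference.
        engine.Load();

        var history = new ChatHistory();
        history.AddUserMessage(prompt);

        IReadOnlyList<ChatMessageContent> results =
            await engine.GetChatMessageContentsAsync(history, cancellationToken: cancellationToken);

        // Return the text of the first choice, if any was produced.
        return results.Count > 0 ? results[0].Content : null;
    }
}
```

Creating and loading an engine per call keeps the sketch short. Code that issues many prompts would more likely keep one loaded instance alive and reuse it.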
GetStreamingChatMessageContentsAsync(ChatHistory, PromptExecutionSettings?, Kernel?, CancellationToken)
Get streaming chat contents for the chat history provided using the specified settings.
public IAsyncEnumerable<StreamingChatMessageContent> GetStreamingChatMessageContentsAsync(ChatHistory chatHistory, PromptExecutionSettings? executionSettings = null, Kernel? kernel = null, CancellationToken cancellationToken = default)
Parameters
chatHistory (ChatHistory): The chat history to complete.
executionSettings (PromptExecutionSettings): The AI execution settings (optional).
kernel (Kernel): The Microsoft.SemanticKernel.Kernel containing services, plugins, and other state for use throughout the operation.
cancellationToken (CancellationToken): The CancellationToken to monitor for cancellation requests. The default is None.
Returns
- IAsyncEnumerable<StreamingChatMessageContent>
An asynchronous stream of incremental completion updates generated by the model.
Exceptions
- NotSupportedException
Thrown if the specified type is not the same or the cast fails.
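A minimal sketch of consuming the streaming method with `await foreach`. The helper StreamToConsoleAsync is hypothetical, and it assumes the caller passes an engine on which Load() has already been called. Each update's Content may be null, which both Console.Write and StringBuilder.Append tolerate.

```csharp
using System;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using JD.AI.Core.LocalModels;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

public static class StreamingExample
{
    // Hypothetical helper for illustration; not part of JD.AI.Core.
    // Assumes engine.Load() has already been called by the caller.
    public static async Task<string> StreamToConsoleAsync(
        LlamaInferenceEngine engine,
        string prompt,
        CancellationToken cancellationToken = default)
    {
        var history = new ChatHistory();
        history.AddUserMessage(prompt);

        var full = new StringBuilder();

        // Each item is one incremental update; print it as it arrives
        // and accumulate the pieces into the complete reply.
        await foreach (StreamingChatMessageContent chunk in
            engine.GetStreamingChatMessageContentsAsync(history, cancellationToken: cancellationToken))
        {
            Console.Write(chunk.Content);
            full.Append(chunk.Content);
        }

        return full.ToString();
    }
}
```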
Load()
Loads the model into memory. Call this before the first inference.
public void Load()
Unload()
Unloads the model and frees its memory.
public void Unload()
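A minimal sketch of the load/unload lifecycle using Load(), Unload(), IsLoaded, and Dispose(). The helper Run is hypothetical, and the ModelMetadata instance is assumed to be supplied by the caller. This page does not state whether Load() is safe to call twice or whether Dispose() unloads the model, so the sketch checks IsLoaded first and calls Unload() explicitly.

```csharp
using System;
using JD.AI.Core.LocalModels;            // assumption: ModelMetadata is in this namespace

public static class LifecycleExample
{
    // Hypothetical helper for illustration; not part of JD.AI.Core.
    public static void Run(ModelMetadata modelInfo)
    {
        // Dispose() runs automatically when engine goes out of scope.
        using var engine = new LlamaInferenceEngine(modelInfo);

        // Guard with IsLoaded, since repeated Load() behavior is not documented here.
        if (!engine.IsLoaded)
        {
            engine.Load();
        }
        Console.WriteLine($"IsLoaded after Load(): {engine.IsLoaded}");

        // ... run inference here ...

        // Release model memory explicitly once inference is finished.
        engine.Unload();
        Console.WriteLine($"IsLoaded after Unload(): {engine.IsLoaded}");
    }
}
```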