Class LlamaInferenceEngine

Namespace: JD.AI.Core.LocalModels

Assembly: JD.AI.Core.dll

Wraps LLamaSharp to implement Semantic Kernel's Microsoft.SemanticKernel.ChatCompletion.IChatCompletionService. Manages model loading and unloading as well as streaming inference.

public sealed class LlamaInferenceEngine : IChatCompletionService, IAIService, IDisposable

Inheritance: object → LlamaInferenceEngine

Implements: IChatCompletionService, IAIService, IDisposable

Inherited Members: object.Equals(object), object.Equals(object, object), object.GetHashCode(), object.GetType(), object.ReferenceEquals(object, object), object.ToString()
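Because the class implements IChatCompletionService, an instance can presumably be registered with a Semantic Kernel like any other chat completion service. The following is a minimal sketch rather than documented usage: the class name LlamaEngineKernelExample and the method BuildKernel are hypothetical, and the engine is taken as a parameter because the shape of ModelMetadata is not described on this page.

```csharp
using JD.AI.Core.LocalModels;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical helper type, not part of JD.AI.Core.
public static class LlamaEngineKernelExample
{
    // Registers an already constructed engine as the kernel's chat completion service.
    // Assumption: the caller builds the engine and calls Load() before the first inference.
    public static Kernel BuildKernel(LlamaInferenceEngine engine)
    {
        IKernelBuilder builder = Kernel.CreateBuilder();

        // Register the existing instance so the kernel resolves this engine
        // whenever an IChatCompletionService is requested.
        builder.Services.AddSingleton<IChatCompletionService>(engine);

        return builder.Build();
    }
}
```

The registered service can then be resolved with kernel.GetRequiredService<IChatCompletionService>(). Registering the instance rather than a factory keeps ownership of the engine, and therefore the decision of when to unload or dispose it, with the caller.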

Constructors

LlamaInferenceEngine(ModelMetadata, LocalModelOptions?, ILogger?)

public LlamaInferenceEngine(ModelMetadata modelInfo, LocalModelOptions? options = null, ILogger? logger = null)

Parameters

modelInfo ModelMetadata
options LocalModelOptions
logger ILogger

Properties

Attributes

Gets the AI service attributes.

public IReadOnlyDictionary<string, object?> Attributes { get; }

Property Value

IReadOnlyDictionary<string, object>

IsLoaded

Gets a value indicating whether the model is currently loaded.

public bool IsLoaded { get; }

Property Value

bool

Methods

Dispose()

Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.

public void Dispose()

GetChatMessageContentsAsync(ChatHistory, PromptExecutionSettings?, Kernel?, CancellationToken)

Gets multiple chat content choices for the prompt and settings.

public Task<IReadOnlyList<ChatMessageContent>> GetChatMessageContentsAsync(ChatHistory chatHistory, PromptExecutionSettings? executionSettings = null, Kernel? kernel = null, CancellationToken cancellationToken = default)

Parameters

chatHistory ChatHistory: The chat history context.
executionSettings PromptExecutionSettings: The AI execution settings (optional).
kernel Kernel: The Microsoft.SemanticKernel.Kernel containing services, plugins, and other state for use throughout the operation.
cancellationToken CancellationToken: The CancellationToken to monitor for cancellation requests. The default is None.

Returns

Task<IReadOnlyList<ChatMessageContent>>: A list of the chat results generated by the model.

Remarks

Use this method when the settings request more than one choice.

GetStreamingChatMessageContentsAsync(ChatHistory, PromptExecutionSettings?, Kernel?, CancellationToken)

Get streaming chat contents for the chat history provided using the specified settings.

public IAsyncEnumerable<StreamingChatMessageContent> GetStreamingChatMessageContentsAsync(ChatHistory chatHistory, PromptExecutionSettings? executionSettings = null, Kernel? kernel = null, CancellationToken cancellationToken = default)

Parameters

chatHistory ChatHistory: The chat history to complete.
executionSettings PromptExecutionSettings: The AI execution settings (optional).
kernel Kernel: The Microsoft.SemanticKernel.Kernel containing services, plugins, and other state for use throughout the operation.
cancellationToken CancellationToken: The CancellationToken to monitor for cancellation requests. The default is None.

Returns

IAsyncEnumerable<StreamingChatMessageContent>: A stream of incremental completion updates generated by the model.

Exceptions

NotSupportedException: Thrown if the specified type is not the same or fails to cast.
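As an illustration of consuming the streaming method, the sketch below accumulates the updates into a single reply. It is a hedged example, not the library's prescribed pattern: LlamaEngineStreamingExample and StreamReplyAsync are hypothetical names, and the code assumes the engine has already been loaded by the caller.

```csharp
using System;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using JD.AI.Core.LocalModels;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical helper type, not part of JD.AI.Core.
public static class LlamaEngineStreamingExample
{
    // Streams one reply for the given history, echoing each chunk to the console
    // and returning the concatenated text.
    // Assumption: engine.Load() has already been called.
    public static async Task<string> StreamReplyAsync(
        LlamaInferenceEngine engine,
        ChatHistory history,
        CancellationToken cancellationToken = default)
    {
        var reply = new StringBuilder();

        await foreach (var update in engine.GetStreamingChatMessageContentsAsync(
            history, cancellationToken: cancellationToken))
        {
            // Content may be null or empty for some updates, so skip those.
            if (!string.IsNullOrEmpty(update.Content))
            {
                Console.Write(update.Content);
                reply.Append(update.Content);
            }
        }

        // Record the full reply so a follow-up turn sees it as context.
        history.AddAssistantMessage(reply.ToString());
        return reply.ToString();
    }
}
```

Appending the assembled reply to the history is a design choice for multi-turn chats. Omit that line if the caller manages the history itself.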

Load()

Loads the model into memory. Call this before the first inference.

public void Load()

Unload()

Unloads the model and frees its memory.

public void Unload()
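The Load(), IsLoaded, and Unload() members suggest an explicit lifecycle around inference. The sketch below shows one way this might look for a single non-streaming request. LlamaEngineLifecycleExample and AskOnceAsync are hypothetical names. Whether Load() is safe to call twice is not stated on this page, so the code checks IsLoaded first as a precaution.

```csharp
using System.Threading;
using System.Threading.Tasks;
using JD.AI.Core.LocalModels;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical helper type, not part of JD.AI.Core.
public static class LlamaEngineLifecycleExample
{
    // Loads the model if needed, asks a single question, and unloads again
    // only if this method performed the load.
    public static async Task<string?> AskOnceAsync(
        LlamaInferenceEngine engine,
        string prompt,
        CancellationToken cancellationToken = default)
    {
        // Assumption: calling Load() on an already loaded engine is not known
        // to be safe, so guard with IsLoaded.
        bool loadedHere = false;
        if (!engine.IsLoaded)
        {
            engine.Load();
            loadedHere = true;
        }

        try
        {
            var history = new ChatHistory();
            history.AddUserMessage(prompt);

            var replies = await engine.GetChatMessageContentsAsync(
                history, cancellationToken: cancellationToken);

            // Return the first choice, or null if the model produced none.
            return replies.Count > 0 ? replies[0].Content : null;
        }
        finally
        {
            // Free memory only when this method was the one that loaded the model.
            if (loadedHere)
            {
                engine.Unload();
            }
        }
    }
}
```

Loading a model is typically expensive, so a long-running application would more likely load once and keep the engine alive across requests. The load-per-call shape here is only meant to make the lifecycle visible.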
