
Class LlamaInferenceEngine

Namespace
JD.AI.Core.LocalModels
Assembly
JD.AI.Core.dll

Wraps LLamaSharp to implement Semantic Kernel's chat completion interface, Microsoft.SemanticKernel.ChatCompletion.IChatCompletionService. Manages model loading and unloading, and supports streaming inference.

public sealed class LlamaInferenceEngine : IChatCompletionService, IAIService, IDisposable
Inheritance
object
LlamaInferenceEngine
Implements
IChatCompletionService
IAIService
IDisposable

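Because the class implements IChatCompletionService, it can be registered with a Semantic Kernel Kernel like any other chat service. The sketch below is a minimal illustration, not the library's documented setup; the KernelSetup class and BuildKernel method are hypothetical names, and the engine is assumed to be constructed and loaded by the caller.

```csharp
using JD.AI.Core.LocalModels;                 // LlamaInferenceEngine
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical helper: builds a Kernel whose chat completion service is the local engine.
public static class KernelSetup
{
    public static Kernel BuildKernel(LlamaInferenceEngine engine)
    {
        IKernelBuilder builder = Kernel.CreateBuilder();

        // Register the existing instance. The DI container does not dispose
        // instances registered this way, so the caller still owns the engine.
        builder.Services.AddSingleton<IChatCompletionService>(engine);

        return builder.Build();
    }
}
```

After building, kernel.GetRequiredService<IChatCompletionService>() returns the same engine instance, and the caller remains responsible for calling Dispose() on it.
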
Constructors

LlamaInferenceEngine(ModelMetadata, LocalModelOptions?, ILogger?)

public LlamaInferenceEngine(ModelMetadata modelInfo, LocalModelOptions? options = null, ILogger? logger = null)

Parameters

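modelInfo ModelMetadata

Metadata describing the local model this engine serves.

options LocalModelOptions

Optional local model settings. The default is null.

logger ILogger

Optional logger for diagnostics. The default is null.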
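
A minimal construction sketch follows. It assumes ModelMetadata lives in the same JD.AI.Core.LocalModels namespace (not stated on this page) and leaves how a ModelMetadata instance is obtained to the caller. EngineFactory and Create are hypothetical names. Per the Load() entry, construction alone does not prepare the model for inference.

```csharp
using JD.AI.Core.LocalModels;                       // assumed home of ModelMetadata as well
using Microsoft.Extensions.Logging.Abstractions;   // NullLogger

// Hypothetical factory: creates an engine with default options.
public static class EngineFactory
{
    public static LlamaInferenceEngine Create(ModelMetadata modelInfo)
    {
        // options: null is the documented default value for the parameter.
        // NullLogger.Instance discards output; pass a real ILogger to capture diagnostics.
        var engine = new LlamaInferenceEngine(modelInfo, options: null, logger: NullLogger.Instance);

        // Load() must be called before the first inference.
        engine.Load();
        return engine;
    }
}
```
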

Properties

Attributes

Gets the AI service attributes.

public IReadOnlyDictionary<string, object?> Attributes { get; }

Property Value

IReadOnlyDictionary<string, object?>
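
Which keys this engine places in Attributes is not documented here, so a generic dump is a safe way to inspect them. The sketch below (AttributeDump and Format are hypothetical names) formats any attribute dictionary as sorted key=value pairs and prints null values explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for inspecting IAIService.Attributes.
public static class AttributeDump
{
    // Formats attributes as "key=value" pairs, sorted by key for stable output.
    public static string Format(IReadOnlyDictionary<string, object?> attributes) =>
        string.Join(", ", attributes
            .OrderBy(kv => kv.Key, StringComparer.Ordinal)
            .Select(kv => $"{kv.Key}={kv.Value ?? "null"}"));
}
```

Usage: Console.WriteLine(AttributeDump.Format(engine.Attributes));
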

IsLoaded

Gets a value indicating whether the model is currently loaded.

public bool IsLoaded { get; }

Property Value

bool
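
IsLoaded makes it possible to load lazily on first use. This page does not say whether Load() is safe to call twice or from several threads, so the sketch below assumes neither and serializes the check-then-load under a lock. EngineGuards and EnsureLoaded are hypothetical names; a single static lock is used for simplicity, which also serializes loads across different engines.

```csharp
using JD.AI.Core.LocalModels;

// Hypothetical helper: loads the model only if it is not already in memory.
public static class EngineGuards
{
    private static readonly object Gate = new();

    public static void EnsureLoaded(LlamaInferenceEngine engine)
    {
        // The lock prevents two callers from both seeing IsLoaded == false
        // and calling Load() concurrently.
        lock (Gate)
        {
            if (!engine.IsLoaded)
            {
                engine.Load();
            }
        }
    }
}
```
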

Methods

Dispose()

Performs application-defined tasks associated with freeing, releasing, or resetting unmanaged resources.

public void Dispose()
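
Because the class implements IDisposable, a using declaration ensures Dispose() runs when the scope exits, even if inference throws. The sketch below assumes ModelMetadata is supplied by the caller and is in the same namespace as the engine. OneShot and AskOnceAsync are hypothetical names. GetChatMessageContentAsync is Semantic Kernel's single-result extension method on IChatCompletionService.

```csharp
using System.Threading.Tasks;
using JD.AI.Core.LocalModels;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical helper: runs one prompt and disposes the engine afterwards.
public static class OneShot
{
    public static async Task<string?> AskOnceAsync(ModelMetadata modelInfo, string prompt)
    {
        // Dispose() is called automatically when this method returns or throws.
        using var engine = new LlamaInferenceEngine(modelInfo);
        engine.Load();

        var history = new ChatHistory();
        history.AddUserMessage(prompt);

        // Extension method from ChatCompletionServiceExtensions; returns a single reply.
        ChatMessageContent reply = await engine.GetChatMessageContentAsync(history);
        return reply.Content;
    }
}
```
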

GetChatMessageContentsAsync(ChatHistory, PromptExecutionSettings?, Kernel?, CancellationToken)

Gets multiple chat content choices for the prompt and settings.

public Task<IReadOnlyList<ChatMessageContent>> GetChatMessageContentsAsync(ChatHistory chatHistory, PromptExecutionSettings? executionSettings = null, Kernel? kernel = null, CancellationToken cancellationToken = default)

Parameters

chatHistory ChatHistory

The chat history context.

executionSettings PromptExecutionSettings

The AI execution settings (optional).

kernel Kernel

The Microsoft.SemanticKernel.Kernel containing services, plugins, and other state for use throughout the operation.

cancellationToken CancellationToken

The CancellationToken to monitor for cancellation requests. The default is CancellationToken.None.

Returns

Task<IReadOnlyList<ChatMessageContent>>

List of chat results generated by the model.

Remarks

Use this method when the settings request more than one choice.
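
The sketch below calls the method with default settings and prints every returned choice. Whether this engine honors a multiple-choice request, and which PromptExecutionSettings keys it reads, is not documented on this page, so no settings are passed. ChoicesExample and PrintChoicesAsync are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using JD.AI.Core.LocalModels;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical helper: sends one user turn and prints each returned choice.
public static class ChoicesExample
{
    public static async Task<int> PrintChoicesAsync(
        LlamaInferenceEngine engine, string prompt, CancellationToken ct = default)
    {
        var history = new ChatHistory();
        history.AddSystemMessage("You are a concise assistant.");
        history.AddUserMessage(prompt);

        // executionSettings and kernel are left at their null defaults.
        IReadOnlyList<ChatMessageContent> choices =
            await engine.GetChatMessageContentsAsync(history, cancellationToken: ct);

        for (int i = 0; i < choices.Count; i++)
        {
            Console.WriteLine($"[{i}] {choices[i].Role}: {choices[i].Content}");
        }
        return choices.Count;
    }
}
```
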

GetStreamingChatMessageContentsAsync(ChatHistory, PromptExecutionSettings?, Kernel?, CancellationToken)

Get streaming chat contents for the chat history provided using the specified settings.

public IAsyncEnumerable<StreamingChatMessageContent> GetStreamingChatMessageContentsAsync(ChatHistory chatHistory, PromptExecutionSettings? executionSettings = null, Kernel? kernel = null, CancellationToken cancellationToken = default)

Parameters

chatHistory ChatHistory

The chat history to complete.

executionSettings PromptExecutionSettings

The AI execution settings (optional).

kernel Kernel

The Microsoft.SemanticKernel.Kernel containing services, plugins, and other state for use throughout the operation.

cancellationToken CancellationToken

The CancellationToken to monitor for cancellation requests. The default is CancellationToken.None.

Returns

IAsyncEnumerable<StreamingChatMessageContent>

Asynchronous stream of chat message updates generated by the model.

Exceptions

NotSupportedException

Thrown if the specified type does not match or cannot be cast.
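
Streaming results are consumed with await foreach. The sketch below (StreamCollector and CollectAsync are hypothetical names) writes each chunk as it arrives and returns the concatenated text. It accepts any IAsyncEnumerable<StreamingChatMessageContent>, so it works with this engine's stream but does not depend on it.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;

// Hypothetical helper: echoes streamed chunks and returns the full reply text.
public static class StreamCollector
{
    public static async Task<string> CollectAsync(
        IAsyncEnumerable<StreamingChatMessageContent> stream, CancellationToken ct = default)
    {
        var text = new StringBuilder();
        await foreach (StreamingChatMessageContent chunk in stream.WithCancellation(ct))
        {
            Console.Write(chunk.Content);   // Content can be null for non-text updates
            text.Append(chunk.Content);     // Append(null) is a no-op
        }
        Console.WriteLine();
        return text.ToString();
    }
}
```

A caller would pass engine.GetStreamingChatMessageContentsAsync(history) as the stream, then add the returned text with history.AddAssistantMessage(...) to continue the conversation.
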

Load()

Loads the model into memory. Call this method before the first inference.

public void Load()
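
Load() is synchronous, and loading a large model file can take a noticeable amount of time. A UI or server application may prefer to run it on the thread pool. The sketch below is one way to do that. BackgroundLoad and LoadAsync are hypothetical names, and the engine itself provides no asynchronous load method on this page.

```csharp
using System.Threading;
using System.Threading.Tasks;
using JD.AI.Core.LocalModels;

// Hypothetical helper: offloads the synchronous Load() to a thread-pool thread.
public static class BackgroundLoad
{
    // The token only prevents the work from starting;
    // it cannot interrupt a Load() call that is already running.
    public static Task LoadAsync(LlamaInferenceEngine engine, CancellationToken ct = default) =>
        Task.Run(() => engine.Load(), ct);
}
```
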

Unload()

Unloads the model and frees its memory.

public void Unload()
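
For infrequent requests, an application may prefer to unload the model between calls to free memory, at the cost of reloading it each time. The sketch below assumes Load() can be called again after Unload(), which this page does not state. AskThenRelease and AskAsync are hypothetical names.

```csharp
using System.Threading.Tasks;
using JD.AI.Core.LocalModels;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

// Hypothetical helper: loads on demand, answers one prompt, then frees model memory.
public static class AskThenRelease
{
    public static async Task<string?> AskAsync(LlamaInferenceEngine engine, string prompt)
    {
        if (!engine.IsLoaded)
        {
            engine.Load();   // assumption: reloading after Unload() is supported
        }

        try
        {
            var history = new ChatHistory();
            history.AddUserMessage(prompt);
            ChatMessageContent reply = await engine.GetChatMessageContentAsync(history);
            return reply.Content;
        }
        finally
        {
            // Release model memory even if inference throws; the next call reloads it.
            engine.Unload();
        }
    }
}
```

The trade-off is latency: every call pays the full model load time, so this pattern suits occasional use rather than interactive chat.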