Error Handling

ExperimentFramework provides built-in error handling strategies to manage failures in experimental implementations. This allows you to safely test new code while maintaining system reliability.

Error Policies

Error policies determine what happens when a condition throws an exception. The framework supports five policies:

Policy	Behavior	Use Case
Throw	Propagate the exception immediately	Development, when you want to see failures
FallbackToControl	Fall back to control on error	Production, safe rollback to stable code
TryAny	Try all conditions until one succeeds	High availability scenarios
FallbackTo	Redirect to specific fallback condition	Dedicated diagnostics/safe-mode handlers
TryInOrder	Try ordered list of fallback conditions	Fine-grained control over fallback strategy

Throw Policy (Default)

The throw policy propagates exceptions immediately without attempting fallback.

When to Use

Development and testing environments
When you need to see and diagnose failures quickly
When condition failures should stop request processing

Configuration

Throw is the default policy if no policy is specified:

.Trial<IPaymentProcessor>(t => t
    .UsingFeatureFlag("UseNewPaymentProvider")
    .AddControl<StripePayment>("false")
    .AddVariant<NewPaymentProvider>("true"))
    // No error policy specified - uses Throw by default

Or explicitly:

.OnErrorThrow()

Behavior

When a condition throws an exception:

public class NewPaymentProvider : IPaymentProcessor
{
    public async Task<PaymentResult> ChargeAsync(decimal amount)
    {
        throw new PaymentException("Service unavailable");
    }
}

The exception propagates to the caller:

try
{
    await paymentProcessor.ChargeAsync(100m);
}
catch (PaymentException ex)
{
    // Exception is thrown directly
    // No fallback attempted
}

FallbackToControl Policy

The fallback-to-control policy catches exceptions from the selected condition and falls back to the control.

When to Use

Production environments
When you want to test new implementations with automatic rollback
When the control implementation is known to be stable
When partial availability is better than complete failure

Configuration

.Trial<IPaymentProcessor>(t => t
    .UsingFeatureFlag("UseNewPaymentProvider")
    .AddControl<StripePayment>("false")
    .AddVariant<NewPaymentProvider>("true")
    .OnErrorFallbackToControl())

Behavior

When the selected condition throws:

1. Try NewPaymentProvider
   └─ Throws PaymentException

2. Catch exception

3. Try StripePayment (control)
   └─ Succeeds

4. Return result

Example:

// Feature flag is enabled, so NewPaymentProvider is selected
var result = await paymentProcessor.ChargeAsync(100m);

// If NewPaymentProvider throws, framework automatically:
// 1. Catches the exception
// 2. Switches to StripePayment (control)
// 3. Retries the operation
// 4. Returns the result from StripePayment

// Caller receives successful result and doesn't see the exception
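
As a rough mental model, the behavior above resembles a try/catch wrapped around the selected condition. The sketch below is not the framework's implementation: `FallbackSketch` and `InvokeWithFallbackAsync` are hypothetical names used only for illustration, and the sketch assumes that a failure in the control is passed to the caller unchanged.

```csharp
using System;
using System.Threading.Tasks;

public static class FallbackSketch
{
    // Hypothetical helper (not part of ExperimentFramework): runs the selected
    // condition and, if it throws, runs the control instead.
    public static async Task<T> InvokeWithFallbackAsync<T>(
        Func<Task<T>> selected,
        Func<Task<T>> control)
    {
        try
        {
            return await selected();
        }
        catch (Exception)
        {
            // The selected condition failed: repeat the same call on the control.
            // Assumption of this sketch: if the control throws too, its exception
            // propagates to the caller.
            return await control();
        }
    }
}
```

Because the call is repeated, any work the selected condition completed before throwing may happen a second time in the control.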

Logging Failed Attempts

Use the error logging decorator to track when fallback occurs:

var experiments = ExperimentFrameworkBuilder.Create()
    .AddLogger(l => l.AddErrorLogging())
    .Trial<IPaymentProcessor>(t => t
        .UsingFeatureFlag("UseNewPaymentProvider")
        .AddControl<StripePayment>("false")
        .AddVariant<NewPaymentProvider>("true")
        .OnErrorFallbackToControl());

Logged output when fallback occurs:

error: ExperimentFramework.ErrorLogging[0]
      Experiment error: IPaymentProcessor.ChargeAsync trial=true
      PaymentException: Service unavailable
         at NewPaymentProvider.ChargeAsync(Decimal amount)

Avoiding Retry Storms

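Be cautious when the control can also fail. Fallback repeats the call against the control, so a failing control adds latency and load without producing a result: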

public class StripePayment : IPaymentProcessor
{
    public async Task<PaymentResult> ChargeAsync(decimal amount)
    {
        // This can also throw
        throw new PaymentException("Stripe is down");
    }
}

In this case, both conditions fail and the exception propagates:

1. Try NewPaymentProvider -> Throws
2. Try StripePayment (control) -> Throws
3. Propagate exception to caller

TryAny Policy

The try-any policy tries all registered conditions in sequence until one succeeds.

When to Use

High availability scenarios
When you have multiple fallback options
When any successful response is acceptable
Circuit breaker patterns

Configuration

.Trial<ICache>(t => t
    .UsingConfigurationKey("Cache:Provider")
    .AddControl<InMemoryCache>("")
    .AddVariant<RedisCache>("redis")
    .AddVariant<MemcachedCache>("memcached")
    .OnErrorTryAny())

Behavior

When a condition throws, the framework tries the next available condition:

1. Try RedisCache (selected by configuration)
   └─ Throws ConnectionException

2. Try MemcachedCache
   └─ Throws ConnectionException

3. Try InMemoryCache (control)
   └─ Succeeds

4. Return result

Condition Order

Conditions are attempted in this order:

1. Selected condition (based on selection mode)
2. Other non-control conditions (order unspecified)
3. Control (always last)
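
The ordering rules above can be sketched as a small function over condition keys. This is an illustration only: `TryAnyOrderSketch` and `AttemptOrder` are hypothetical names, the sketch keeps registration order for the non-control conditions even though the framework leaves that order unspecified, and it assumes the control is not attempted twice when it is itself the selected condition.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TryAnyOrderSketch
{
    // Hypothetical helper (not part of ExperimentFramework): returns condition
    // keys in the order they would be attempted under the rules listed above.
    public static List<string> AttemptOrder(
        string selectedKey,
        string controlKey,
        IEnumerable<string> allKeys)
    {
        // 1. The selected condition is always tried first.
        var order = new List<string> { selectedKey };

        // 2. Remaining non-control conditions. Assumption: registration order
        //    is kept here; the framework does not promise any particular order.
        order.AddRange(allKeys.Where(k => k != selectedKey && k != controlKey));

        // 3. The control goes last, unless it was already tried as the selection.
        if (controlKey != selectedKey)
        {
            order.Add(controlKey);
        }

        return order;
    }
}
```

With the configuration shown earlier (control key `""`, variants `"redis"` and `"memcached"`), `AttemptOrder("redis", "", new[] { "", "redis", "memcached" })` yields `"redis"`, `"memcached"`, `""`.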

Example Scenario

Caching with multiple fallback options:

public interface ICache
{
    Task<T> GetAsync<T>(string key);
    Task SetAsync<T>(string key, T value);
}

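public class RedisCache : ICache
{
    public async Task<T> GetAsync<T>(string key)
    {
        // Redis is down
        throw new ConnectionException("Redis unavailable");
    }

    // Writes fail the same way while Redis is down
    public Task SetAsync<T>(string key, T value) =>
        Task.FromException(new ConnectionException("Redis unavailable"));
}

public class MemcachedCache : ICache
{
    public async Task<T> GetAsync<T>(string key)
    {
        // Memcached is also down
        throw new ConnectionException("Memcached unavailable");
    }

    // Writes fail the same way while Memcached is down
    public Task SetAsync<T>(string key, T value) =>
        Task.FromException(new ConnectionException("Memcached unavailable"));
}

public class InMemoryCache : ICache
{
    private readonly ConcurrentDictionary<string, object> _cache = new();

    public Task<T> GetAsync<T>(string key)
    {
        // Always succeeds (no external dependencies)
        if (_cache.TryGetValue(key, out var value))
        {
            return Task.FromResult((T)value);
        }
        return Task.FromResult(default(T));
    }

    public Task SetAsync<T>(string key, T value)
    {
        // Stores the value in process memory; overwrites any existing entry
        _cache[key] = value;
        return Task.CompletedTask;
    }
}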

Usage:

// Configuration specifies Redis
// Redis fails, Memcached fails, InMemory succeeds
var value = await cache.GetAsync<string>("user:123");

// Caller receives the result from InMemoryCache
// No exception is thrown

When All Conditions Fail

If all conditions throw exceptions, the last exception is propagated:

public class InMemoryCache : ICache
{
    public Task<T> GetAsync<T>(string key)
    {
        // Even the fallback fails
        throw new OutOfMemoryException();
    }
}

Result:

1. Try RedisCache -> Throws ConnectionException
2. Try MemcachedCache -> Throws ConnectionException
3. Try InMemoryCache -> Throws OutOfMemoryException
4. Propagate OutOfMemoryException to caller
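
The "last exception wins" rule can be illustrated with a plain loop over attempts. This is a sketch under stated assumptions, not the framework's code: `TryAnySketch` and `InvokeFirstSuccessfulAsync` are hypothetical names, and each attempt is modeled as a simple delegate rather than a resolved condition.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class TryAnySketch
{
    // Hypothetical helper (not part of ExperimentFramework): invokes each
    // attempt in order and returns the first successful result. If every
    // attempt throws, the exception from the last attempt reaches the caller.
    public static async Task<T> InvokeFirstSuccessfulAsync<T>(
        IReadOnlyList<Func<Task<T>>> attempts)
    {
        if (attempts.Count == 0)
        {
            throw new ArgumentException("At least one attempt is required.", nameof(attempts));
        }

        for (var i = 0; i < attempts.Count; i++)
        {
            try
            {
                return await attempts[i]();
            }
            catch (Exception) when (i < attempts.Count - 1)
            {
                // Not the last attempt: discard the failure and try the next one.
                // On the last attempt the filter is false, so the exception
                // propagates with its original stack trace.
            }
        }

        // Never reached at runtime: the last attempt either returned or threw.
        throw new InvalidOperationException("No attempt completed.");
    }
}
```

Earlier exceptions are discarded in this sketch, which is one reason to enable error logging: without it, only the final failure is visible to the caller.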

FallbackTo Policy

The fallback-to policy redirects to a specific fallback condition (e.g., a no-op diagnostics handler) when the selected condition fails.

When to Use

When you need a dedicated safe-mode or diagnostics handler
When you want fine-grained control over which condition handles failures
When the fallback condition should differ from the control
Circuit breaker patterns with specific fallback logic

Configuration

.Trial<IDiagnosticsHandler>(t => t
    .UsingFeatureFlag("UsePrimaryDiagnostics")
    .AddControl<PrimaryDiagnosticsHandler>("true")
    .AddVariant<SecondaryDiagnosticsHandler>("false")
    .AddVariant<NoopDiagnosticsHandler>("noop")
    .OnErrorFallbackTo("noop"))

Behavior

When the selected condition throws, the framework redirects to the specified fallback condition:

1. Try PrimaryDiagnosticsHandler (selected by feature flag)
   └─ Throws TimeoutException

2. Catch exception

3. Try NoopDiagnosticsHandler (specified fallback)
   └─ Succeeds (no-op returns immediately)

4. Return result

Example Scenario

Diagnostics handler with safe-mode fallback:

public class PrimaryDiagnosticsHandler : IDiagnosticsHandler
{
    public async Task CollectDiagnosticsAsync()
    {
        // May timeout connecting to diagnostics service
        throw new TimeoutException("Diagnostics service unavailable");
    }
}

public class NoopDiagnosticsHandler : IDiagnosticsHandler
{
    public Task CollectDiagnosticsAsync()
    {
        // No-op: always succeeds, does nothing
        return Task.CompletedTask;
    }
}

Usage:

// Primary diagnostics fails, falls back to noop
await diagnosticsHandler.CollectDiagnosticsAsync();

// Application continues without diagnostics
// No exception is thrown

When Fallback Condition Also Fails

If the specified fallback condition also throws, the exception propagates:

1. Try PrimaryDiagnosticsHandler -> Throws TimeoutException
2. Try NoopDiagnosticsHandler -> Throws InvalidOperationException
3. Propagate InvalidOperationException to caller

TryInOrder Policy

The try-in-order policy tries an ordered list of fallback conditions in exact sequence until one succeeds.

When to Use

When you need fine-grained control over fallback priority
Multi-tier caching strategies (cloud → local → memory → static)
When fallback order matters for performance or cost
When you have specific degradation paths

Configuration

.Trial<IDataService>(t => t
    .UsingFeatureFlag("UseCloudDatabase")
    .AddControl<CloudDatabaseImpl>("true")
    .AddVariant<LocalCacheImpl>("cache")
    .AddVariant<InMemoryCacheImpl>("memory")
    .AddVariant<StaticDataImpl>("static")
    .OnErrorTryInOrder("cache", "memory", "static"))

Behavior

When a condition throws, the framework tries the fallback conditions in exact order:

1. Try CloudDatabaseImpl (selected by feature flag)
   └─ Throws ConnectionException

2. Try LocalCacheImpl (first fallback)
   └─ Throws IOException

3. Try InMemoryCacheImpl (second fallback)
   └─ Succeeds

4. Return result

The framework stops at the first successful condition and doesn't try remaining fallbacks.

Example Scenario

Multi-tier data service with degradation strategy:

public interface IDataService
{
    Task<CustomerData> GetCustomerDataAsync(int customerId);
}

public class CloudDatabaseImpl : IDataService
{
    public async Task<CustomerData> GetCustomerDataAsync(int customerId)
    {
        // Cloud database might be unavailable
        throw new ConnectionException("Cloud database unreachable");
    }
}

public class LocalCacheImpl : IDataService
{
    public async Task<CustomerData> GetCustomerDataAsync(int customerId)
    {
        // Local cache might be corrupted
        throw new IOException("Cache file corrupted");
    }
}

public class InMemoryCacheImpl : IDataService
{
    private readonly ConcurrentDictionary<int, CustomerData> _cache = new();

    public Task<CustomerData> GetCustomerDataAsync(int customerId)
    {
        // In-memory cache succeeds with cached data
        if (_cache.TryGetValue(customerId, out var data))
            return Task.FromResult(data);

        return Task.FromResult(new CustomerData { Id = customerId, Name = "Unknown" });
    }
}

public class StaticDataImpl : IDataService
{
    public Task<CustomerData> GetCustomerDataAsync(int customerId)
    {
        // Static fallback returns placeholder data
        return Task.FromResult(new CustomerData
        {
            Id = customerId,
            Name = "Loading...",
            IsPlaceholder = true
        });
    }
}

Usage:

// Configuration: Cloud → LocalCache → InMemory → Static
var customerData = await dataService.GetCustomerDataAsync(123);

// Framework tries:
// 1. CloudDatabase (fails - connection error)
// 2. LocalCache (fails - corrupted)
// 3. InMemoryCache (succeeds - returns cached data)
// 4. StaticData (not tried - InMemory succeeded)

// Caller receives result from InMemoryCache

Condition Order Rules

The framework tries conditions in this exact order:

1. Selected condition (based on selection mode), tried first
2. Ordered fallback keys, tried in the order you specify

Fallback keys that match the selected condition are skipped, which prevents duplicate attempts.

Example:

.OnErrorTryInOrder("cache", "memory", "static")

If the selected condition is "memory", the order becomes:

1. Try "memory" (selected)
2. Try "cache" (first fallback key)
3. Try "static" (third fallback key; "memory" is skipped because it was already tried)

The selected condition is never retried even if it appears in the fallback list.
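
The ordering rule can be expressed as a short function over condition keys. This is a minimal sketch for illustration: `TryInOrderSketch` and `AttemptOrder` are hypothetical names and do not come from the framework.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TryInOrderSketch
{
    // Hypothetical helper (not part of ExperimentFramework): the selected key
    // comes first, followed by the fallback keys in the order given, skipping
    // any fallback key equal to the selected key.
    public static List<string> AttemptOrder(string selectedKey, params string[] fallbackKeys)
    {
        var order = new List<string> { selectedKey };
        order.AddRange(fallbackKeys.Where(k => k != selectedKey));
        return order;
    }
}
```

For example, `AttemptOrder("memory", "cache", "memory", "static")` yields `"memory"`, `"cache"`, `"static"`, matching the sequence listed above.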

When All Conditions Fail

If all conditions in the ordered sequence throw exceptions, the last exception propagates:

1. Try CloudDatabaseImpl -> Throws ConnectionException
2. Try LocalCacheImpl -> Throws IOException
3. Try InMemoryCacheImpl -> Throws InvalidOperationException
4. Try StaticDataImpl -> Throws NotImplementedException
5. Propagate NotImplementedException to caller

Performance Considerations

Each fallback attempt incurs the cost of:

Service resolution from DI container
Decorator pipeline execution
Method invocation overhead

For performance-critical paths, consider:

Keeping the fallback list short (2-3 conditions max)
Using implementations that fail fast (for example, with short timeouts) rather than hanging
Monitoring fallback rates to identify problematic conditions

Error Logging Decorator

The error logging decorator logs exceptions before they propagate or trigger fallback.

Configuration

var experiments = ExperimentFrameworkBuilder.Create()
    .AddLogger(l => l.AddErrorLogging())
    .Trial<IPaymentProcessor>(t => t
        .UsingFeatureFlag("UseNewPaymentProvider")
        .AddControl<StripePayment>("false")
        .AddVariant<NewPaymentProvider>("true")
        .OnErrorFallbackToControl());

Logged Information

When an exception occurs:

error: ExperimentFramework.ErrorLogging[0]
      Experiment error: IPaymentProcessor.ChargeAsync trial=true
      System.InvalidOperationException: Payment gateway timeout
         at NewPaymentProvider.ChargeAsync(Decimal amount)
         at ExperimentFramework.ExperimentProxy.InvokeAsync(...)

The log includes:

Service interface name
Method name
Condition key that failed
Full exception with stack trace

Choosing an Error Policy

Use this decision tree:

Is this production?
├─ No (Development/Testing):
│   └─ Use Throw (see failures immediately)
└─ Yes (Production):
    └─ Do you have a stable control implementation?
        ├─ Yes:
        │   └─ Use FallbackToControl
        └─ No:
            └─ Do you have multiple fallback options?
                ├─ Yes: Use TryAny
                └─ No: Use Throw (and handle in application code)

Best Practices

1. Always Have a Stable Control

The control should be your most reliable implementation:

// Good: Stable implementation as control
.AddControl<ProvenPaymentProvider>("default")
.AddVariant<NewExperimentalProvider>("experimental")

// Bad: Experimental implementation as control
.AddControl<ExperimentalProvider>("default")
.AddVariant<ProvenProvider>("proven")

2. Use Error Logging

Always enable error logging in production to track fallback occurrences:

var experiments = ExperimentFrameworkBuilder.Create()
    .AddLogger(l => l
        .AddBenchmarks()
        .AddErrorLogging())  // Track when failures occur
    .Trial<IPaymentProcessor>(t => t
        .UsingFeatureFlag("UseNewPaymentProvider")
        .AddControl<StripePayment>("false")
        .AddVariant<NewPaymentProvider>("true")
        .OnErrorFallbackToControl());

3. Monitor Fallback Rates

Track how often fallback occurs to identify problematic conditions:

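public class MetricsDecorator : IExperimentDecorator
{
    private readonly IMetrics _metrics;

    // The metrics sink is injected; without this the field would stay null
    public MetricsDecorator(IMetrics metrics)
    {
        _metrics = metrics;
    }

    public async ValueTask<object?> InvokeAsync(
        InvocationContext context,
        Func<ValueTask<object?>> next)
    {
        try
        {
            return await next();
        }
        catch (Exception)
        {
            // Record the failure, then rethrow so the exception is not swallowed here
            _metrics.Increment($"experiment.fallback.{context.ServiceType.Name}");
            throw;
        }
    }
}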

4. Avoid Side Effects in Failing Conditions

Ensure conditions don't perform irreversible operations before failing:

// Bad: Side effect before failure
public async Task ProcessPaymentAsync(Payment payment)
{
    await _database.SavePaymentAttemptAsync(payment);  // Side effect
    throw new InvalidOperationException("Payment failed");
}

// Good: Validate before side effects
public async Task ProcessPaymentAsync(Payment payment)
{
    ValidatePayment(payment);  // Throws if invalid
    await _database.SavePaymentAttemptAsync(payment);  // Only if valid
    await ProcessPaymentInternalAsync(payment);
}

5. Consider Idempotency

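When using a policy that repeats the call on another condition (FallbackToControl, TryAny, FallbackTo, or TryInOrder), ensure operations are idempotent, because a condition may have partially completed before it threw: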

public async Task SendEmailAsync(Email email)
{
    // Use idempotency key to prevent duplicate sends
    var idempotencyKey = $"email:{email.Id}";

    if (await _cache.GetAsync<bool>(idempotencyKey))
    {
        return; // Already sent
    }

    await _emailProvider.SendAsync(email);
    await _cache.SetAsync(idempotencyKey, true, TimeSpan.FromHours(24));
}

Combining with Telemetry

Error policies can be combined with telemetry to make failures and fallback paths observable:

var experiments = ExperimentFrameworkBuilder.Create()
    .AddLogger(l => l
        .AddBenchmarks()
        .AddErrorLogging())
    .Trial<IPaymentProcessor>(t => t
        .UsingFeatureFlag("UseNewPaymentProvider")
        .AddControl<StripePayment>("false")
        .AddVariant<NewPaymentProvider>("true")
        .OnErrorFallbackToControl());

services.AddExperimentFramework(experiments);
services.AddOpenTelemetryExperimentTracking();

This provides:

Error logs when conditions fail
Timing metrics for successful and failed attempts
Distributed traces showing fallback paths
Telemetry tags indicating which condition was attempted and which succeeded

Next Steps

Telemetry - Add observability to track experiment behavior
Advanced Topics - Implement custom error handling logic
Samples - See complete examples of error handling patterns
