Testing Guide

Comprehensive testing strategies and coverage for Open Bedrock Server.

Testing Philosophy

Open Bedrock Server follows a comprehensive testing strategy that ensures reliability, maintainability, and confidence in deployments. Our testing approach includes the following categories, each of which maps to a pytest marker (see the sketch after this list):

  • Unit Tests - Test individual components in isolation
  • Integration Tests - Test component interactions
  • End-to-End Tests - Test complete user workflows
  • Performance Tests - Validate system performance under load
  • Contract Tests - Ensure API compatibility
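
Each category is selected with a pytest marker (registered in the pytest.ini shown under Test Configuration). A minimal sketch of how a test opts into a category; the file name and test bodies here are placeholders:

# Example: applying test category markers (hypothetical file)
import pytest

@pytest.mark.unit
def test_prompt_formatting_is_isolated():
    """Unit tests exercise a single component with no external calls"""
    assert "hello".upper() == "HELLO"

@pytest.mark.integration
@pytest.mark.slow
def test_components_work_together():
    """Markers can be combined, e.g. to flag slow integration tests"""
    assert True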

Test Structure

tests/
├── unit/                    # Unit tests
│   ├── test_services/
│   ├── test_adapters/
│   ├── test_strategies/
│   ├── test_models/
│   └── test_utils/
├── integration/             # Integration tests
│   ├── test_api/
│   ├── test_cli/
│   └── test_providers/
├── e2e/                     # End-to-end tests
├── performance/             # Performance tests
├── fixtures/                # Test data and fixtures
├── mocks/                   # Mock implementations
└── conftest.py              # Pytest configuration

Unit Testing

Service Layer Tests

Testing LLM Services:

# tests/unit/test_services/test_openai_service.py
import pytest
from unittest.mock import AsyncMock, patch, MagicMock
from src.open_bedrock_server.services.openai_service import OpenAIService
from src.open_bedrock_server.core.models import Message, ChatCompletionResponse

@pytest.fixture
def openai_service():
    with patch('src.open_bedrock_server.services.openai_service.AsyncOpenAI') as mock_client:
        service = OpenAIService(api_key="test-key")
        service.client = mock_client.return_value
        return service

@pytest.mark.asyncio
async def test_chat_completion_success(openai_service):
    """Test successful chat completion"""
    # Arrange
    mock_response = MagicMock()
    mock_response.id = "chatcmpl-123"
    mock_response.choices = [MagicMock()]
    mock_response.choices[0].message.content = "Hello! How can I help you?"
    mock_response.choices[0].finish_reason = "stop"
    mock_response.created = 1677652288
    mock_response.model = "gpt-4o-mini"
    mock_response.usage.prompt_tokens = 10
    mock_response.usage.completion_tokens = 15
    mock_response.usage.total_tokens = 25
    
    openai_service.client.chat.completions.create = AsyncMock(return_value=mock_response)
    
    messages = [Message(role="user", content="Hello")]
    
    # Act
    response = await openai_service.chat_completion(messages, "gpt-4o-mini")
    
    # Assert
    assert isinstance(response, ChatCompletionResponse)
    assert response.id == "chatcmpl-123"
    assert response.choices[0].message.content == "Hello! How can I help you?"
    assert response.usage.total_tokens == 25

@pytest.mark.asyncio
async def test_chat_completion_with_tools(openai_service):
    """Test chat completion with tool calls"""
    # Setup mock response with tool calls
    mock_response = MagicMock()
    mock_response.choices[0].message.tool_calls = [MagicMock()]
    mock_response.choices[0].message.tool_calls[0].id = "call_123"
    mock_response.choices[0].message.tool_calls[0].function.name = "get_weather"
    mock_response.choices[0].message.tool_calls[0].function.arguments = '{"location": "London"}'
    
    openai_service.client.chat.completions.create = AsyncMock(return_value=mock_response)
    
    messages = [Message(role="user", content="What's the weather in London?")]
    tools = [{"type": "function", "function": {"name": "get_weather"}}]
    
    response = await openai_service.chat_completion(messages, "gpt-4o-mini", tools=tools)
    
    assert response.choices[0].message.tool_calls is not None
    assert len(response.choices[0].message.tool_calls) == 1
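
The tests above cover the success path. A hedged sketch of an error-path test in the same module (so the openai_service fixture and imports are already in scope); it assumes provider exceptions propagate out of chat_completion, which depends on the service's error-handling policy:

@pytest.mark.asyncio
async def test_chat_completion_propagates_provider_error(openai_service):
    """Provider failures should surface to the caller rather than be swallowed"""
    # Simulate the underlying OpenAI client raising on the API call
    openai_service.client.chat.completions.create = AsyncMock(
        side_effect=RuntimeError("simulated provider outage")
    )

    messages = [Message(role="user", content="Hello")]

    with pytest.raises(RuntimeError):
        await openai_service.chat_completion(messages, "gpt-4o-mini")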

Adapter Layer Tests

Testing Format Conversion:

# tests/unit/test_adapters/test_bedrock_adapter.py
import pytest
from src.open_bedrock_server.adapters.bedrock_adapter import BedrockAdapter
from src.open_bedrock_server.strategies.claude_strategy import ClaudeStrategy
from src.open_bedrock_server.core.models import ChatCompletionRequest, Message

@pytest.fixture
def bedrock_adapter():
    return BedrockAdapter(model_id="anthropic.claude-3-haiku-20240307-v1:0")

def test_claude_strategy_selection(bedrock_adapter):
    """Test that Claude strategy is selected for Claude models"""
    assert isinstance(bedrock_adapter.strategy, ClaudeStrategy)

def test_convert_to_bedrock_request(bedrock_adapter):
    """Test conversion from standard format to Bedrock format"""
    request = ChatCompletionRequest(
        model="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[
            Message(role="system", content="You are a helpful assistant."),
            Message(role="user", content="Hello!")
        ],
        max_tokens=100,
        temperature=0.7
    )
    
    bedrock_request = bedrock_adapter.strategy.convert_to_bedrock_request(request)
    
    assert bedrock_request["anthropic_version"] == "bedrock-2023-05-31"
    assert bedrock_request["max_tokens"] == 100
    assert bedrock_request["temperature"] == 0.7
    assert bedrock_request["system"] == "You are a helpful assistant."
    assert len(bedrock_request["messages"]) == 1
    assert bedrock_request["messages"][0]["role"] == "user"

Strategy Layer Tests

Testing Bedrock Strategies:

# tests/unit/test_strategies/test_claude_strategy.py
import pytest
from src.open_bedrock_server.strategies.claude_strategy import ClaudeStrategy
from src.open_bedrock_server.core.models import ChatCompletionRequest, Message

@pytest.fixture
def claude_strategy():
    return ClaudeStrategy()

def test_convert_multimodal_content(claude_strategy):
    """Test conversion of multimodal content"""
    request = ChatCompletionRequest(
        model="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[
            Message(
                role="user",
                content=[
                    {"type": "text", "text": "What's in this image?"},
                    {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
                ]
            )
        ]
    )
    
    bedrock_request = claude_strategy.convert_to_bedrock_request(request)
    
    assert len(bedrock_request["messages"][0]["content"]) == 2
    assert bedrock_request["messages"][0]["content"][0]["type"] == "text"
    assert bedrock_request["messages"][0]["content"][1]["type"] == "image"

def test_convert_tool_calls(claude_strategy):
    """Test conversion of tool calls"""
    bedrock_response = {
        "id": "msg_123",
        "content": [
            {"type": "text", "text": "I'll check the weather for you."},
            {
                "type": "tool_use",
                "id": "toolu_123",
                "name": "get_weather",
                "input": {"location": "London"}
            }
        ],
        "stop_reason": "tool_use",
        "usage": {"input_tokens": 25, "output_tokens": 45}
    }
    
    request = ChatCompletionRequest(model="claude-3-haiku", messages=[])
    response = claude_strategy.convert_from_bedrock_response(bedrock_response, request)
    
    assert response.choices[0].message.tool_calls is not None
    assert len(response.choices[0].message.tool_calls) == 1
    assert response.choices[0].message.tool_calls[0].function.name == "get_weather"

Model Tests

Testing Pydantic Models:

# tests/unit/test_models/test_chat_models.py
import pytest
from pydantic import ValidationError
from src.open_bedrock_server.core.models import (
    ChatCompletionRequest, Message, Tool, Function
)

def test_chat_completion_request_validation():
    """Test ChatCompletionRequest validation"""
    # Valid request
    request = ChatCompletionRequest(
        model="gpt-4o-mini",
        messages=[Message(role="user", content="Hello")]
    )
    assert request.model == "gpt-4o-mini"
    assert len(request.messages) == 1
    
    # Invalid request - missing required fields
    with pytest.raises(ValidationError):
        ChatCompletionRequest(model="gpt-4o-mini")  # Missing messages

def test_message_with_tool_calls():
    """Test Message model with tool calls"""
    message = Message(
        role="assistant",
        content="I'll help you with that.",
        tool_calls=[{
            "id": "call_123",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"location": "London"}'}
        }]
    )
    
    assert message.role == "assistant"
    assert len(message.tool_calls) == 1
    assert message.tool_calls[0].function.name == "get_weather"

def test_tool_definition():
    """Test Tool model validation"""
    tool = Tool(
        type="function",
        function=Function(
            name="get_weather",
            description="Get current weather",
            parameters={
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"]
            }
        )
    )
    
    assert tool.function.name == "get_weather"
    assert "location" in tool.function.parameters["properties"]

Integration Testing

API Integration Tests

Testing FastAPI Endpoints:

# tests/integration/test_api/test_chat_completions.py
import pytest
from fastapi.testclient import TestClient
from unittest.mock import patch, AsyncMock
from src.open_bedrock_server.api.app import app

@pytest.fixture
def client():
    return TestClient(app)

@pytest.fixture
def mock_openai_service():
    with patch('src.open_bedrock_server.services.llm_service_factory.LLMServiceFactory.get_service') as mock:
        service = AsyncMock()
        mock.return_value = service
        yield service

def test_chat_completions_openai_format(client, mock_openai_service):
    """Test chat completions with OpenAI format"""
    # Mock service response
    mock_response = {
        "id": "chatcmpl-123",
        "object": "chat.completion",
        "created": 1677652288,
        "model": "gpt-4o-mini",
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help you?"},
            "finish_reason": "stop"
        }],
        "usage": {"prompt_tokens": 10, "completion_tokens": 15, "total_tokens": 25}
    }
    mock_openai_service.chat_completion.return_value = mock_response
    
    # Make request
    response = client.post(
        "/v1/chat/completions",
        headers={"Authorization": "Bearer test-api-key"},
        json={
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Hello"}]
        }
    )
    
    assert response.status_code == 200
    data = response.json()
    assert data["id"] == "chatcmpl-123"
    assert data["choices"][0]["message"]["content"] == "Hello! How can I help you?"

def test_chat_completions_bedrock_claude_format(client, mock_openai_service):
    """Test chat completions with Bedrock Claude format input"""
    mock_response = {
        "id": "msg_123",
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": "Hello! How can I help you?"}],
        "model": "anthropic.claude-3-haiku-20240307-v1:0",
        "stop_reason": "end_turn",
        "usage": {"input_tokens": 10, "output_tokens": 15}
    }
    mock_openai_service.chat_completion.return_value = mock_response
    
    response = client.post(
        "/v1/chat/completions?target_format=bedrock_claude",
        headers={"Authorization": "Bearer test-api-key"},
        json={
            "anthropic_version": "bedrock-2023-05-31",
            "model": "anthropic.claude-3-haiku-20240307-v1:0",
            "max_tokens": 1000,
            "messages": [{"role": "user", "content": "Hello"}]
        }
    )
    
    assert response.status_code == 200
    data = response.json()
    assert data["type"] == "message"
    assert data["content"][0]["text"] == "Hello! How can I help you?"

def test_authentication_required(client):
    """Test that authentication is required"""
    response = client.post(
        "/v1/chat/completions",
        json={"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}
    )
    
    assert response.status_code == 401

def test_invalid_model(client):
    """Test handling of invalid model"""
    response = client.post(
        "/v1/chat/completions",
        headers={"Authorization": "Bearer test-api-key"},
        json={
            "model": "invalid-model",
            "messages": [{"role": "user", "content": "Hello"}]
        }
    )
    
    assert response.status_code == 404
    assert "not found" in response.json()["detail"].lower()

CLI Integration Tests

Testing CLI Commands:

# tests/integration/test_cli/test_commands.py
import pytest
from click.testing import CliRunner
from unittest.mock import patch, MagicMock
from src.open_bedrock_server.cli.main import cli

@pytest.fixture
def runner():
    return CliRunner()

def test_config_set_command(runner):
    """Test config set command"""
    with patch('builtins.input', side_effect=['sk-test-key', 'test-api-key', '', '', 'us-east-1']):
        result = runner.invoke(cli, ['config', 'set'])
        
        assert result.exit_code == 0
        assert "Configuration saved" in result.output

def test_config_show_command(runner):
    """Test config show command"""
    with patch.dict('os.environ', {
        'OPENAI_API_KEY': 'sk-test-key',
        'API_KEY': 'test-api-key',
        'AWS_REGION': 'us-east-1'
    }):
        result = runner.invoke(cli, ['config', 'show'])
        
        assert result.exit_code == 0
        assert "OPENAI_API_KEY" in result.output
        assert "sk-***" in result.output  # Masked value

def test_models_command(runner):
    """Test models list command"""
    with patch('requests.get') as mock_get:
        mock_response = MagicMock()
        mock_response.json.return_value = {
            "object": "list",
            "data": [
                {"id": "gpt-4o-mini", "object": "model", "owned_by": "openai"}
            ]
        }
        mock_response.status_code = 200
        mock_get.return_value = mock_response
        
        result = runner.invoke(cli, ['models'])
        
        assert result.exit_code == 0
        assert "gpt-4o-mini" in result.output

@patch('src.open_bedrock_server.cli.chat.InteractiveChatSession')
def test_chat_command(mock_chat_session, runner):
    """Test interactive chat command"""
    mock_session = MagicMock()
    mock_chat_session.return_value = mock_session
    
    result = runner.invoke(cli, ['chat', '--model', 'gpt-4o-mini'])
    
    assert result.exit_code == 0
    mock_chat_session.assert_called_once()
    mock_session.start.assert_called_once()
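
A smoke test of the CLI entry point catches wiring regressions cheaply; Click generates --help output automatically, so this sketch assumes only the subcommand names used above:

def test_cli_help_lists_commands(runner):
    """--help should exit cleanly and list the registered subcommands"""
    result = runner.invoke(cli, ['--help'])

    assert result.exit_code == 0
    for command in ('config', 'models', 'chat'):
        assert command in result.output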

End-to-End Testing

Complete Workflow Tests

Testing Full User Workflows:

# tests/e2e/test_complete_workflows.py
import pytest
import asyncio
from fastapi.testclient import TestClient
from src.open_bedrock_server.api.app import app

@pytest.fixture
def client():
    return TestClient(app)

@pytest.mark.e2e
def test_openai_to_bedrock_conversion_workflow(client):
    """Test complete workflow: OpenAI input → Bedrock output"""
    # This would be a real integration test with actual API calls
    # or against a test environment
    
    response = client.post(
        "/v1/chat/completions?target_format=bedrock_claude",
        headers={"Authorization": "Bearer test-api-key"},
        json={
            "model": "gpt-4o-mini",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "What is the capital of France?"}
            ],
            "max_tokens": 100,
            "temperature": 0.7
        }
    )
    
    assert response.status_code == 200
    data = response.json()
    
    # Verify Bedrock Claude format
    assert data["type"] == "message"
    assert data["role"] == "assistant"
    assert "content" in data
    assert isinstance(data["content"], list)
    assert data["content"][0]["type"] == "text"

@pytest.mark.e2e
def test_streaming_workflow(client):
    """Test streaming response workflow"""
    response = client.post(
        "/v1/chat/completions",
        headers={"Authorization": "Bearer test-api-key"},
        json={
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": "Tell me a short story"}],
            "stream": True
        }
    )
    
    assert response.status_code == 200
    assert response.headers["content-type"] == "text/plain; charset=utf-8"
    
    # Verify streaming format
    chunks = response.content.decode().split('\n')
    data_chunks = [chunk for chunk in chunks if chunk.startswith('data: ')]
    
    assert len(data_chunks) > 0
    assert any('data: [DONE]' in chunk for chunk in chunks)
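
A hedged sketch of a model-discovery workflow, assuming the server exposes an OpenAI-compatible /v1/models endpoint that returns the list format shown in the CLI tests:

@pytest.mark.e2e
def test_model_listing_workflow(client):
    """Clients should be able to discover available models before chatting"""
    response = client.get(
        "/v1/models",
        headers={"Authorization": "Bearer test-api-key"}
    )

    assert response.status_code == 200
    data = response.json()
    assert data["object"] == "list"
    assert len(data["data"]) > 0
    assert all("id" in model for model in data["data"])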

Performance Testing

Load Testing

Testing System Performance:

# tests/performance/test_load.py
import pytest
import asyncio
import aiohttp
import time
from concurrent.futures import ThreadPoolExecutor

@pytest.mark.performance
async def test_concurrent_requests():
    """Test system performance under concurrent load"""
    async def make_request(session, request_id):
        async with session.post(
            "http://localhost:8000/v1/chat/completions",
            headers={"Authorization": "Bearer test-api-key"},
            json={
                "model": "gpt-4o-mini",
                "messages": [{"role": "user", "content": f"Request {request_id}"}],
                "max_tokens": 50
            }
        ) as response:
            return await response.json()
    
    start_time = time.time()
    
    async with aiohttp.ClientSession() as session:
        tasks = [make_request(session, i) for i in range(50)]
        responses = await asyncio.gather(*tasks)
    
    end_time = time.time()
    duration = end_time - start_time
    
    # Verify all requests succeeded
    assert len(responses) == 50
    assert all('choices' in response for response in responses)
    
    # Performance assertions
    assert duration < 30  # Should complete within 30 seconds
    print(f"50 concurrent requests completed in {duration:.2f} seconds")

@pytest.mark.performance
def test_memory_usage():
    """Test memory usage under load"""
    import psutil
    import os
    
    process = psutil.Process(os.getpid())
    initial_memory = process.memory_info().rss / 1024 / 1024  # MB
    
    # Simulate load
    # ... perform operations ...
    
    final_memory = process.memory_info().rss / 1024 / 1024  # MB
    memory_increase = final_memory - initial_memory
    
    # Memory should not increase significantly
    assert memory_increase < 100  # Less than 100MB increase
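
Throughput and memory figures are easier to interpret alongside latency; a sketch in the same module that tracks p95 latency over sequential requests, assuming the same locally running server on port 8000 as the load test above (statistics.quantiles requires Python 3.8+):

@pytest.mark.performance
async def test_latency_percentiles():
    """Track p95 latency so regressions are caught before they reach users"""
    import statistics

    latencies = []
    async with aiohttp.ClientSession() as session:
        for i in range(20):
            start = time.time()
            async with session.post(
                "http://localhost:8000/v1/chat/completions",
                headers={"Authorization": "Bearer test-api-key"},
                json={
                    "model": "gpt-4o-mini",
                    "messages": [{"role": "user", "content": f"Ping {i}"}],
                    "max_tokens": 10
                }
            ) as response:
                await response.json()
            latencies.append(time.time() - start)

    p95 = statistics.quantiles(latencies, n=20)[18]  # 95th percentile cut point
    assert p95 < 5.0  # threshold is deployment-specific; tune as needed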

Test Configuration

Pytest Configuration

conftest.py:

# tests/conftest.py
import pytest
import os
import asyncio
from unittest.mock import patch
from fastapi.testclient import TestClient

# Test environment setup
@pytest.fixture(autouse=True)
def setup_test_env():
    """Setup test environment variables"""
    test_env = {
        "OPENAI_API_KEY": "sk-test-openai-key",
        "API_KEY": "test-api-key",
        "AWS_REGION": "us-east-1",
        "LOG_LEVEL": "DEBUG",
        "ENVIRONMENT": "test"
    }
    
    with patch.dict(os.environ, test_env):
        yield

# Async test support
@pytest.fixture(scope="session")
def event_loop():
    """Create an instance of the default event loop for the test session."""
    loop = asyncio.get_event_loop_policy().new_event_loop()
    yield loop
    loop.close()

# Mock fixtures
@pytest.fixture
def mock_openai_client():
    """Mock OpenAI client for testing"""
    with patch('openai.AsyncOpenAI') as mock:
        yield mock

@pytest.fixture
def mock_bedrock_client():
    """Mock Bedrock client for testing"""
    with patch('boto3.client') as mock:
        yield mock

# Test data fixtures
@pytest.fixture
def sample_messages():
    """Sample messages for testing"""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
        {"role": "assistant", "content": "Hello! How can I help you?"},
        {"role": "user", "content": "What's the weather like?"}
    ]

@pytest.fixture
def sample_tools():
    """Sample tools for testing"""
    return [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather for a location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string", "description": "City name"}
                    },
                    "required": ["location"]
                }
            }
        }
    ]

Test Markers

pytest.ini:

[pytest]
markers =
    unit: Unit tests
    integration: Integration tests
    e2e: End-to-end tests
    performance: Performance tests
    slow: Slow running tests
    
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*

# Async support
asyncio_mode = auto

# Coverage
addopts = 
    --cov=src
    --cov-report=html
    --cov-report=term-missing
    --cov-fail-under=80

Running Tests

Test Execution Commands

# Run all tests
pytest

# Run specific test types
pytest -m unit
pytest -m integration
pytest -m e2e
pytest -m performance

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test file
pytest tests/unit/test_services/test_openai_service.py

# Run with verbose output
pytest -v

# Run tests in parallel
pytest -n auto

# Run only failed tests from last run
pytest --lf

# Run tests matching pattern
pytest -k "test_chat_completion"

Continuous Integration

GitHub Actions Workflow:

name: Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.8", "3.9", "3.10", "3.11"]
    
    steps:
    - uses: actions/checkout@v3
    
    - name: Set up Python ${{ matrix.python-version }}
      uses: actions/setup-python@v3
      with:
        python-version: ${{ matrix.python-version }}
    
    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install uv
        uv pip install -e ".[dev]"
    
    - name: Run unit tests
      run: pytest -m unit --cov=src --cov-report=xml
    
    - name: Run integration tests
      run: pytest -m integration
    
    - name: Upload coverage to Codecov
      uses: codecov/codecov-action@v3
      with:
        file: ./coverage.xml

Test Coverage Goals

  • Unit Tests: 90%+ coverage
  • Integration Tests: Cover all API endpoints and CLI commands
  • E2E Tests: Cover major user workflows
  • Performance Tests: Validate system performance under load

Best Practices

  1. Test Isolation: Each test should be independent
  2. Mock External Dependencies: Use mocks for API calls
  3. Test Data: Use fixtures for consistent test data
  4. Descriptive Names: Test names should describe what they test
  5. Arrange-Act-Assert: Follow AAA pattern in tests
  6. Edge Cases: Test error conditions and edge cases
  7. Performance: Include performance regression tests

This comprehensive testing strategy ensures Open Bedrock Server maintains high quality and reliability across all components and use cases.

