Testing

Conventions for the test suite, coverage thresholds, and how to run a single test or the whole thing.

The suite runs on bun test (no Jest, no Vitest). Tests use the bun:test API and live entirely under __tests__/.

Run

# All tests
bun test

# A single file
bun test __tests__/homeassistant/light-count.test.ts

# A pattern
bun test --test-name-pattern "control_light"

bunfig.toml auto-preloads test/setup.ts for every test, which sets HASS_TOKEN and JWT_SECRET to placeholder values so the config validation doesn’t trip on missing env vars.
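
As a rough illustration of what such a preload can look like, the sketch below fills in placeholders only for variables that are not already set. The helper name applyPlaceholderEnv and the placeholder strings are hypothetical; the project’s actual test/setup.ts may be written differently.

```typescript
// Hypothetical sketch of a preload helper; not the project's actual test/setup.ts.
type Env = Record<string, string | undefined>;

// Placeholder values only. Never put real credentials here.
const PLACEHOLDERS: Record<string, string> = {
  HASS_TOKEN: "test-hass-token",
  JWT_SECRET: "test-jwt-secret",
};

function applyPlaceholderEnv(env: Env): Env {
  for (const [name, value] of Object.entries(PLACEHOLDERS)) {
    // Keep any value that was exported explicitly before the run.
    if (env[name] === undefined) {
      env[name] = value;
    }
  }
  return env;
}

// In a preload file this would run once at import time:
// applyPlaceholderEnv(process.env);
```

Leaving existing values alone means a developer can still point a local run at specific settings by exporting them before calling bun test.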

Layout

__tests__/
├── homeassistant/         # mirrors src/tools/homeassistant/
│   └── lights.test.ts
├── tools/                 # mirrors src/tools/
│   └── search-entities.test.ts
├── integration/           # end-to-end tests
│   └── mcp-roundtrip.test.ts
├── mcp/
├── security/
├── speech/
└── ...

The convention: mirror the src/ structure. A tool at src/tools/homeassistant/foo.tool.ts has its test at __tests__/homeassistant/foo.test.ts or __tests__/tools/homeassistant/foo.test.ts; match whichever layout the directory you’re contributing to already uses.

Patterns

Mocking the HA client

The Home Assistant (HA) client talks over a WebSocket to a real HA instance, which isn’t available in CI, so most tests mock it. The standard pattern:

import { describe, expect, it, mock } from "bun:test";

const mockHassClient = {
  getStates: async () => [{ entity_id: "light.living_room", state: "on" }],
  callService: async () => ({}),
  // ... the methods your tool actually uses
};

const makeContext = () => ({
  hassClient: mockHassClient as any,
  logger: {
    info: () => {},
    warn: () => {},
    error: () => {},
    debug: () => {},
  } as any,
  requestId: "test",
});

as any is the common escape hatch — the ToolContext type is wide and you usually only need a few fields.
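
When a test needs to assert on what the tool sent to Home Assistant, a small factory that records calls keeps the mock fresh for each test. This is only a sketch: makeRecordingHassClient, the EntityState shape, and the (domain, service, data) signature of callService are assumptions made for illustration, so match them to the methods the real client exposes.

```typescript
// Hypothetical shapes; adjust to the methods your tool actually uses.
type EntityState = { entity_id: string; state: string };
type ServiceCall = {
  domain: string;
  service: string;
  data: Record<string, unknown>;
};

function makeRecordingHassClient(
  states: EntityState[] = [{ entity_id: "light.living_room", state: "on" }],
) {
  const calls: ServiceCall[] = [];
  return {
    calls, // every callService invocation, in call order
    getStates: async () => states,
    callService: async (
      domain: string,
      service: string,
      data: Record<string, unknown> = {},
    ) => {
      calls.push({ domain, service, data });
      return {};
    },
  };
}
```

In a test, pass the result as hassClient in the context and inspect its calls array after tool.execute(...) resolves. Because every call to the factory returns an independent object, no recorded state carries over between tests.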

Testing the tool directly

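// Assumes ControlLightTool is imported from its module under src/tools/homeassistant/
// and that makeContext() is the helper defined above.
it("turns the light on at the requested brightness", async () => {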
  const tool = new ControlLightTool();
  const result = await tool.execute(
    { entity_id: "light.living_room", action: "turn_on", brightness: 200 },
    makeContext(),
  );
  expect(result.state).toBe("on");
});

No HTTP, no server, no transport. The tool’s contract is: given a valid input and a context, produce a result. Test the contract.

Testing the transport

For tests that exercise the full HTTP or WebSocket surface, see __tests__/integration/. The pattern is to boot the server in-process, hit it with fetch or a WebSocket client, and assert on the response.

These tests are slower and more brittle. Use them sparingly — prefer tool-level tests for logic, integration tests for plumbing.
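
The boot, request, assert shape can be sketched with Node’s built-in http module, which Bun also implements. The /health route and the startTestServer helper are stand-ins invented for this sketch; the project’s real server factory and routes are not shown on this page.

```typescript
import { createServer } from "node:http";
import type { AddressInfo } from "node:net";

// Stand-in for the project's real server factory (hypothetical).
function startTestServer(): Promise<{ url: string; close: () => Promise<void> }> {
  const server = createServer((req, res) => {
    if (req.url === "/health") {
      // "Connection: close" stops keep-alive sockets from delaying shutdown.
      res.writeHead(200, { "Content-Type": "application/json", Connection: "close" });
      res.end(JSON.stringify({ status: "ok" }));
      return;
    }
    res.writeHead(404, { Connection: "close" });
    res.end();
  });

  return new Promise((resolve) => {
    // Port 0 asks the OS for a free port, so parallel runs do not collide.
    server.listen(0, "127.0.0.1", () => {
      const { port } = server.address() as AddressInfo;
      resolve({
        url: `http://127.0.0.1:${port}`,
        close: () => new Promise<void>((done) => server.close(() => done())),
      });
    });
  });
}

// Boots the server, performs one request, and always shuts the server down.
async function healthRoundTrip(): Promise<{ status: number; body: unknown }> {
  const server = await startTestServer();
  try {
    const response = await fetch(`${server.url}/health`);
    return { status: response.status, body: await response.json() };
  } finally {
    await server.close();
  }
}
```

Closing the server in a finally block matters more here than in tool-level tests: a leaked listener keeps the process alive and can make later tests fail for unrelated reasons.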

Coverage

bunfig.toml declares the coverage thresholds:

[test]
coverage = true
coverageThreshold = { statements = 0.8, lines = 0.8, functions = 0.8, branches = 0.7 }

To see the report:

bun test --coverage

Open coverage/index.html in a browser for the line-by-line view.

The thresholds are enforced in CI (see .github/workflows/). A PR that drops coverage below the thresholds fails the build. If you’re removing a feature, also remove its tests (otherwise coverage is artificially high and the next person gets a rude surprise). If you’re adding a feature, add a test.
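
To make the pass/fail rule concrete: each metric is compared with its own threshold, and one metric below its floor is enough to fail. The function below only illustrates that rule with made-up numbers; it is not how Bun or the CI workflow implements the check.

```typescript
// Illustration only: mirrors the thresholds quoted above, not Bun's internals.
type Metric = "statements" | "lines" | "functions" | "branches";
type Coverage = Record<Metric, number>;

const THRESHOLDS: Coverage = {
  statements: 0.8,
  lines: 0.8,
  functions: 0.8,
  branches: 0.7,
};

// Returns the metrics that fall below their threshold; an empty list means "pass".
function failingMetrics(actual: Coverage, required: Coverage = THRESHOLDS): Metric[] {
  return (Object.keys(required) as Metric[]).filter(
    (metric) => actual[metric] < required[metric],
  );
}

// Made-up numbers: lines at 79% fails the check even though the other metrics pass.
const failures = failingMetrics({
  statements: 0.85,
  lines: 0.79,
  functions: 0.9,
  branches: 0.7,
});
console.log(failures); // the only failing metric is "lines"
```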

Conventions

One describe per file, named after the unit under test (describe("ControlLightTool", ...)).
One it per behavior, with a sentence-style name (it("returns 0 when no lights are on", ...)).
Use expect().toBe() for primitives, expect().toEqual() for objects/arrays.
For async assertions, use await expect(...).resolves.toBe(...) or await expect(...).rejects.toThrow(...).
Don’t use done callbacks. bun:test handles promises natively.
Clean up with afterEach(() => mock.restore()) if you used mock().

Mocking `fetch` / `ws`

The HA client uses fetch for some calls and ws for the WebSocket. To mock them:

import { afterEach, it, mock } from "bun:test";

// Keep a reference to the real fetch so it can be restored after every test.
const originalFetch = globalThis.fetch;
afterEach(() => {
  globalThis.fetch = originalFetch;
});

it("...", async () => {
  globalThis.fetch = mock(() =>
    Promise.resolve(new Response(JSON.stringify({ ok: true }))),
  ) as any;
  // ...
});
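
If several tests stub fetch, pairing the stub with its own restore function avoids repeating the bookkeeping. A minimal sketch, assuming a runtime with global fetch and Response (Bun, or Node 18 and later); stubFetch is a hypothetical helper, not part of bun:test.

```typescript
// Hypothetical helper: replaces globalThis.fetch with a stub that always
// answers with the given JSON body, and returns a function that undoes it.
function stubFetch(body: unknown, status = 200): () => void {
  const original = globalThis.fetch;

  globalThis.fetch = (async () =>
    new Response(JSON.stringify(body), {
      status,
      headers: { "Content-Type": "application/json" },
    })) as unknown as typeof fetch;

  // Calling the returned function puts the previous fetch back.
  return () => {
    globalThis.fetch = original;
  };
}
```

In a test file, call the returned function from afterEach, or from a finally block inside the test, so a failing assertion cannot leave the stub installed.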

For WebSocket, prefer mocking the HassClient directly — the ws library doesn’t have a clean mock surface.

Disabling tests temporarily

If you need to skip a test during development (e.g. it’s flaky in CI), use it.skip(...) or describe.skip(...). Don’t comment it out — skip is visible in the test report.

Common pitfalls

bunfig.toml is read once. If you change the preloaded test/setup.ts, restart bun test.
test/setup.ts sets placeholder env vars. Don’t write a test that asserts on a specific token value from the preload; if a test needs a particular value, set it in that test’s own setup.
Don’t share state between tests. Each test should construct its own tool instance and context. The HA client mock is cheap to recreate.
The coverage report lags by one test run. If you just added a test and the coverage report is unchanged, run bun test --coverage again.

See also

Adding a Tool: the full contribution flow.
Architecture > Tool System: what the tests are testing.
