Harness Runtime

Harness Runtime is the distributed nervous system that orchestrates observation, auto-correction and governance across all Vectora layers. It is NOT a validation module, but the intelligence that enables Gemini to watch its own tool execution and adjust reasoning in real-time.

CRITICAL REDEFINITION: Harness Runtime is NOT a folder /harness in code, NOT a protocol, NOT a single module. It is an ARCHITECTURAL PATTERN that permeates system prompts, tool schemas, state management, and configuration. It emerges from the INTERACTION of observation hooks, immutable state threads, recovery strategies, and distributed governance.

What Harness Really Is

Harness has evolved from a validation module to a distributed system pattern:

old_view:
  harness: "Safety wrapper that validates before/after tool execution"

new_view:
  harness: "Distributed nervous system that enables real-time observation, self-correction and governance"
  harness_reality: "Not implemented as a single code module, but as a PATTERN across 5 distributed layers"

transformation: "From (guardian_check → run_tool → validate_output) to (context_pipeline → streaming_execution → recovery_ladder → termination_conditions → state_threading)"

The 5 Distributed Layers of Harness

Harness emerges from the orchestration of 5 distributed layers:

1. Context Pipeline (Preparation)

What it does: Prepares the environment and context before execution

Validates permissions via Guardian blocklist
Checks preconditions (API keys, namespaces)
Injects metrics baseline into system prompt
Loads recovery strategies from YAML config

Code Pattern:

harness:
  layer_1_context_pipeline:
    pre_execution_checks:
      - validate_guardian
      - check_preconditions
      - load_baseline_metrics
      - inject_recovery_ladder
    timeout_ms: 500
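
The following Go sketch shows one way the pre-execution checks could run in order under the shared 500 ms budget. It is illustrative only: the Check type, RunContextPipeline, and the stand-in checks pass and missingKey are hypothetical names, not part of Vectora's codebase.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Check is one pre-execution step; names mirror pre_execution_checks above.
// The type itself is a hypothetical shape for this sketch.
type Check struct {
	Name string
	Run  func(ctx context.Context) error
}

// DefaultTimeout mirrors timeout_ms: 500 from the config above.
const DefaultTimeout = 500 * time.Millisecond

// RunContextPipeline runs checks in order under one shared deadline and
// stops at the first failure or once the deadline has passed.
func RunContextPipeline(timeout time.Duration, checks []Check) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	for _, c := range checks {
		if err := ctx.Err(); err != nil {
			return fmt.Errorf("%s skipped: %w", c.Name, err)
		}
		if err := c.Run(ctx); err != nil {
			return fmt.Errorf("%s failed: %w", c.Name, err)
		}
	}
	return nil
}

// pass and missingKey are stand-in checks for the demo.
func pass(context.Context) error       { return nil }
func missingKey(context.Context) error { return errors.New("missing API key") }

func main() {
	checks := []Check{
		{Name: "validate_guardian", Run: pass},
		{Name: "check_preconditions", Run: missingKey},
		{Name: "load_baseline_metrics", Run: pass},
		{Name: "inject_recovery_ladder", Run: pass},
	}
	// The second check fails, so the last two never run.
	fmt.Println(RunContextPipeline(DefaultTimeout, checks))
}
```

Failing fast keeps a denied permission or a missing API key from reaching tool execution at all.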

2. Streaming Execution (Observation)

What it does: Runs tool with observation hooks at every step

Captures output in 4KB chunks with metadata
Watches latency, token count, error patterns
Injects metrics DURING execution (not after)
Enables Gemini to “see” what’s happening

Code Pattern:

harness:
  layer_2_streaming_execution:
    observation_points:
      search_context: ["query_received", "embedding_generated", "vector_search_completed", "reranking_started"]
      bash_terminal: ["command_sent", "output_chunk_received", "exit_code_captured"]
      voyage_rerank: ["chunk_n_processed", "confidence_scores_updated"]
    metrics_injection_frequency_ms: 100
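
As a rough illustration of chunked observation, the Go sketch below splits tool output into 4 KB pieces and hands each one to an observer together with simple metadata. The Observation struct and StreamChunks function are assumptions made for this example; the real hook signature may differ.

```go
package main

import (
	"fmt"
	"time"
)

// ChunkSize mirrors the 4KB chunks described above.
const ChunkSize = 4096

// Observation is a hypothetical event shape: one chunk of output plus metadata.
type Observation struct {
	Point   string        // observation point name
	Seq     int           // zero-based chunk index
	Data    []byte        // at most ChunkSize bytes
	Elapsed time.Duration // time since streaming started
}

// StreamChunks splits output into ChunkSize pieces, calls observe for each
// piece, and returns the number of chunks emitted.
func StreamChunks(output []byte, observe func(Observation)) int {
	start := time.Now()
	n := 0
	for off := 0; off < len(output); off += ChunkSize {
		end := off + ChunkSize
		if end > len(output) {
			end = len(output)
		}
		observe(Observation{
			Point:   "output_chunk_received",
			Seq:     n,
			Data:    output[off:end],
			Elapsed: time.Since(start),
		})
		n++
	}
	return n
}

func main() {
	out := make([]byte, 10000) // stand-in for real tool output
	count := StreamChunks(out, func(o Observation) {
		fmt.Printf("%s #%d: %d bytes\n", o.Point, o.Seq, len(o.Data))
	})
	fmt.Println("chunks:", count)
}
```

Because the observer is called per chunk, metrics can be surfaced while the tool is still running rather than after it exits.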

3. Recovery Ladder (Resilience)

What it does: Ordered strategies to recover from failures, each with a cost

Attempt 1: Retry immediately (cost: latency)
Attempt 2: Refine query and retry (cost: reranking latency)
Attempt 3: Use local reranker (cost: lower precision)
Attempt 4: Fall back to basic search (cost: speed, not quality)
Attempt 5: Return null with explanation (cost: no response)

Code Pattern:

harness:
  layer_3_recovery_ladder:
    strategies:
      - name: "retry_immediate"
        max_attempts: 1
        cost: "latency:10ms"
        enabled: true
      - name: "refine_and_retry"
        max_attempts: 1
        cost: "latency:200ms"
        enabled: true
      - name: "local_reranker"
        max_attempts: 1
        cost: "precision:-0.15"
        enabled: true
      - name: "basic_search"
        max_attempts: 1
        cost: "latency:50ms"
        enabled: true
      - name: "graceful_degradation"
        returns_null: true
        enabled: true
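
The ladder can be sketched in Go as an ordered slice that is walked until one strategy reaches the precision target. This is a minimal sketch under stated assumptions: Strategy, Climb, ErrDegraded, and the fixed helper are hypothetical names, and each attempt is simulated with a fixed precision value.

```go
package main

import (
	"errors"
	"fmt"
)

// Strategy is one rung of the ladder; names and costs mirror the config above.
type Strategy struct {
	Name    string
	Cost    string
	Enabled bool
	// Try returns the precision achieved, or an error if the attempt failed.
	Try func() (float64, error)
}

// ErrDegraded stands in for the graceful_degradation rung: nothing met the target.
var ErrDegraded = errors.New("graceful_degradation: no result met the precision target")

// Climb tries enabled strategies in order. It returns the name of the first
// strategy that reaches target, plus the names of all rungs attempted.
func Climb(ladder []Strategy, target float64) (string, []string, error) {
	var tried []string
	for _, s := range ladder {
		if !s.Enabled {
			continue
		}
		tried = append(tried, s.Name)
		p, err := s.Try()
		if err == nil && p >= target {
			return s.Name, tried, nil
		}
	}
	return "", tried, ErrDegraded
}

// fixed builds a Try function that always reports the given precision.
func fixed(p float64) func() (float64, error) {
	return func() (float64, error) { return p, nil }
}

func main() {
	ladder := []Strategy{
		{Name: "retry_immediate", Cost: "latency:10ms", Enabled: true, Try: fixed(0.58)},
		{Name: "refine_and_retry", Cost: "latency:200ms", Enabled: true, Try: fixed(0.74)},
		{Name: "local_reranker", Cost: "precision:-0.15", Enabled: true, Try: fixed(0.60)},
		{Name: "basic_search", Cost: "latency:50ms", Enabled: true, Try: fixed(0.50)},
	}
	winner, tried, err := Climb(ladder, 0.65)
	fmt.Println(winner, tried, err)
}
```

Ordering the rungs from cheapest to most costly means the expensive fallbacks only run when the cheaper ones have already failed.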

4. Termination Conditions (Control)

What it does: Decides WHEN to stop trying and move on

Success: precision >= 0.65 OR confidence >= 0.80
Timeout: execution time >= 2000ms
Resource exhausted: recovery attempts >= 5
User preference: --fast flag forces immediate return

Code Pattern:

harness:
  layer_4_termination_conditions:
    success_criteria:
      - precision >= 0.65
      - confidence >= 0.80
    timeout_ms: 2000
    max_recovery_attempts: 5
    user_preferences:
      --fast: "return after attempt 1"
      --thorough: "run all recovery attempts"
      --safe: "require precision >= 0.75"
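
The stop decision can be expressed as a single function. The Go sketch below is one possible reading of the conditions above, not the actual implementation: Limits, Progress, and ShouldStop are hypothetical names, the check order (success, timeout, --fast, recovery budget) is an assumption, and the budget is treated as exhausted once attempts reach max_recovery_attempts.

```go
package main

import "fmt"

// Limits mirrors layer_4_termination_conditions above.
type Limits struct {
	MinPrecision  float64
	MinConfidence float64
	TimeoutMs     int
	MaxRecovery   int
	Fast          bool // models the --fast preference: return after attempt 1
}

// Progress is a hypothetical snapshot of the current run.
type Progress struct {
	Precision  float64
	Confidence float64
	ElapsedMs  int
	Attempts   int
}

// ShouldStop reports whether execution should stop, and why.
func ShouldStop(l Limits, p Progress) (bool, string) {
	switch {
	case p.Precision >= l.MinPrecision || p.Confidence >= l.MinConfidence:
		return true, "success"
	case p.ElapsedMs >= l.TimeoutMs:
		return true, "timeout"
	case l.Fast && p.Attempts >= 1:
		return true, "fast_return"
	case p.Attempts >= l.MaxRecovery:
		return true, "recovery_exhausted"
	default:
		return false, "continue"
	}
}

func main() {
	limits := Limits{MinPrecision: 0.65, MinConfidence: 0.80, TimeoutMs: 2000, MaxRecovery: 5}
	fmt.Println(ShouldStop(limits, Progress{Precision: 0.72, Confidence: 0.87, ElapsedMs: 165}))
	fmt.Println(ShouldStop(limits, Progress{Precision: 0.58, Confidence: 0.60, ElapsedMs: 900, Attempts: 2}))
}
```

Returning a reason alongside the boolean lets the caller record why a run ended, which fits the audit trail described under State Threading.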

5. State Threading (Persistence)

What it does: Maintains immutable typed state across the entire execution

AgentState: {iteration, precision, confidence, recovery_used, audit_trail}
Audit trail: every decision logged with timestamp, metrics, reason
Enables Gemini to “remember” what happened and why
Enables humans to understand the full reasoning chain

Code Pattern:

harness:
  layer_5_state_threading:
    state_type:
      iteration: int
      precision: float
      confidence: float
      recovery_ladder_step: string
      metrics_snapshot:
        latency_ms: int
        tokens_used: int
        safety_events: int
      audit_trail: [event]
    immutable: true
    persistence: mongodb
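
Immutable threading can be approximated in Go with copy-on-update: each step returns a new state value and leaves the previous one untouched. The sketch below is illustrative only. The field names follow state_type above, while the Event shape and the Next method are assumptions. Go does not enforce immutability, so this relies on callers never writing to a state they have handed on. Persistence to MongoDB is out of scope here.

```go
package main

import (
	"fmt"
	"time"
)

// Event is one audit-trail entry (hypothetical shape).
type Event struct {
	At     time.Time
	Step   string
	Reason string
}

// AgentState follows the state_type fields above.
type AgentState struct {
	Iteration    int
	Precision    float64
	Confidence   float64
	RecoveryStep string
	AuditTrail   []Event
}

// Next returns a new state for the following iteration. The audit trail is
// copied before appending, so the receiver keeps its own, shorter trail.
func (s AgentState) Next(precision, confidence float64, step, reason string) AgentState {
	trail := make([]Event, len(s.AuditTrail), len(s.AuditTrail)+1)
	copy(trail, s.AuditTrail)
	trail = append(trail, Event{At: time.Now(), Step: step, Reason: reason})
	return AgentState{
		Iteration:    s.Iteration + 1,
		Precision:    precision,
		Confidence:   confidence,
		RecoveryStep: step,
		AuditTrail:   trail,
	}
}

func main() {
	s0 := AgentState{}
	s1 := s0.Next(0.58, 0.70, "retry_immediate", "precision below 0.65")
	s2 := s1.Next(0.74, 0.85, "refine_and_retry", "precision met target")
	fmt.Println(s0.Iteration, len(s0.AuditTrail))
	fmt.Println(s1.Iteration, len(s1.AuditTrail))
	fmt.Println(s2.Iteration, s2.RecoveryStep, len(s2.AuditTrail))
}
```

Keeping every earlier state intact is what makes the full reasoning chain replayable from the audit trail.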

Tool Observation

Harness injects observation points into EACH TOOL where Gemini can “watch” execution and adjust reasoning in real-time.

Per-Tool Observation Points

search_context:
  observation_points:
    - "vector_search_completed: top_k_results_received"
    - "reranking_completed: precision_calculated"
    - "precision_threshold_check: if precision < 0.65 then suggest_retry"

voyage_rerank:
  observation_points:
    - "chunk_0_processed: confidence_score_assigned"
    - "chunk_n_processed: running_confidence_average_calculated"
    - "reranking_complete: final_precision_known"

bash_terminal:
  observation_points:
    - "command_sent: shell_type_identified"
    - "output_chunk_received: error_pattern_detected"
    - "exit_code_captured: success_or_failure_known"

Example: Gemini Watches search_context

1. Gemini calls search_context("validate JWT tokens")
2. At "reranking_completed": Gemini receives {precision: 0.72, top_5: [...]}
3. Gemini evaluates: "0.72 >= 0.65: good enough, proceed with response"
4. But if precision were 0.58: "0.58 < 0.65: refine and retry with a different query"
5. Gemini uses recovery ladder: attempts query refinement before giving up

Metrics & SLAs

Harness captures metrics at EVERY observation point and defines SLA thresholds for automatic behavior adjustment.

Core Metrics & Thresholds

metrics:
  retrieval_precision:
    description: "Quality of retrieved chunks (0-1)"
    target: ">= 0.65"
    action_if_below: "trigger recovery ladder"

  tool_accuracy:
    description: "Success rate of tool execution (0-1)"
    target: ">= 0.95"
    action_if_below: "increase retry attempts"

  confidence_score:
    description: "Gemini's confidence in answer (0-1)"
    target: ">= 0.80"
    action_if_below: "add disclaimer to response"

  latency_p95:
    description: "95th percentile execution time"
    target: "< 2000ms"
    action_if_above: "use local reranker or fallback"

  token_efficiency:
    description: "Useful tokens / total tokens (0-1)"
    target: ">= 0.85"
    action_if_below: "refine prompt or truncate context"
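
To make the threshold-to-action mapping concrete, the Go sketch below checks a set of observed values against the targets listed above and returns the actions to take. The SLA type, DefaultSLAs, and Actions are hypothetical names for this example. Treating latency_p95 as violated at or above 2000 ms follows from its "< 2000ms" target.

```go
package main

import "fmt"

// SLA describes one threshold from the metrics listed above.
type SLA struct {
	Metric      string
	Target      float64
	HigherIsBad bool // true for latency: act when value >= Target
	Action      string
}

// DefaultSLAs copies the targets and actions from the metrics listed above.
var DefaultSLAs = []SLA{
	{Metric: "retrieval_precision", Target: 0.65, Action: "trigger recovery ladder"},
	{Metric: "tool_accuracy", Target: 0.95, Action: "increase retry attempts"},
	{Metric: "confidence_score", Target: 0.80, Action: "add disclaimer to response"},
	{Metric: "latency_p95", Target: 2000, HigherIsBad: true, Action: "use local reranker or fallback"},
	{Metric: "token_efficiency", Target: 0.85, Action: "refine prompt or truncate context"},
}

// Actions returns "metric: action" for every observed metric that misses its
// target, in the order the SLAs are listed. Unobserved metrics are skipped.
func Actions(slas []SLA, observed map[string]float64) []string {
	var out []string
	for _, s := range slas {
		v, ok := observed[s.Metric]
		if !ok {
			continue
		}
		violated := v < s.Target
		if s.HigherIsBad {
			violated = v >= s.Target
		}
		if violated {
			out = append(out, s.Metric+": "+s.Action)
		}
	}
	return out
}

func main() {
	observed := map[string]float64{
		"retrieval_precision": 0.58,
		"confidence_score":    0.87,
		"latency_p95":         2100,
	}
	for _, a := range Actions(DefaultSLAs, observed) {
		fmt.Println(a)
	}
}
```

Iterating over the slice rather than the map keeps the order of returned actions deterministic.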

Execution Example

Query: "How is JWT validation done in Go?"
├─ Context Pipeline:
│ ├─ Guardian: User has access
│ ├─ Preconditions: API keys loaded
│ └─ Baseline: {precision_target: 0.65, confidence_target: 0.80}
│
├─ Streaming Execution:
│ ├─ Embedding generated: latency=45ms
│ ├─ Vector search completed: results=1200, latency=120ms
│ ├─ Reranking started: top_100 input
│ └─ Reranking completed:
│    ├─ precision=0.72 (target 0.65)
│    ├─ confidence=0.87 (target 0.80)
│    └─ latency_p95=165ms (target 2000ms)
│
├─ Termination Conditions:
│ ├─ Success criteria met: precision >= 0.65
│ └─ Decision: RETURN top_5 to Gemini
│
└─ State Threading:
   ├─ Iteration: 1
   ├─ Precision: 0.72
   ├─ Recovery used: none
   └─ Audit trail: [{timestamp, step, metrics}]

Configuring Harness

Harness configuration is distributed across system prompt, tool schemas, and YAML config:

harness:
  enabled: true

  # Layer 1: Context Pipeline
  context_pipeline:
    guardian_validation: true
    precondition_checks: true
    baseline_metrics_injection: true

  # Layer 2: Streaming Execution
  streaming_execution:
    observation_enabled: true
    metrics_injection_frequency_ms: 100
    output_chunk_size_bytes: 4096

  # Layer 3: Recovery Ladder
  recovery_ladder:
    strategies:
      - retry_immediate: { max_attempts: 1 }
      - refine_and_retry: { max_attempts: 1 }
      - local_reranker: { max_attempts: 1 }
      - basic_search: { max_attempts: 1 }
      - graceful_degradation: { enabled: true }

  # Layer 4: Termination Conditions
  termination_conditions:
    success_thresholds:
      precision: 0.65
      confidence: 0.80
    timeouts:
      max_execution_ms: 2000
      max_recovery_attempts: 5

  # Layer 5: State Threading
  state_threading:
    immutable_state: true
    audit_trail_enabled: true
    persistence_backend: mongodb

Testing Harness

Test that observation, auto-correction and governance work together:

Scenario: Harness auto-corrects low precision
  Given Gemini asks for "JWT validation patterns"
  When search_context returns precision=0.58
  Then Harness triggers recovery_ladder
  And Gemini refines query with "JWT authentication middleware"
  And search_context returns precision=0.74
  Then Gemini proceeds with high-confidence response

Scenario: Harness respects termination conditions
  Given recovery_ladder attempts = 5
  When precision still < 0.65
  Then Harness stops recovery
  And returns graceful_degradation response

Scenario: Harness maintains state threading
  Given execution iteration = 3
  When Gemini retrieves AgentState
  Then state includes {iteration, precision, recovery_used, audit_trail}
  And audit_trail has 15 entries (5 per iteration)
  And state is immutable

File Structure (Harness is a Pattern, Not a Folder)

Harness is NOT a folder named /harness. It is a PATTERN distributed across:

Vectora/
├─ internal/
│ ├─ llm/
│ │ └─ gemini.go ← System prompt with observation hooks
│ ├─ storage/
│ │ └─ state.go ← AgentState immutable threading
│ ├─ tools/
│ │ ├─ search_context.go ← Observation points in each tool
│ │ └─ analyze_deps.go
│ └─ server/
│    └─ handler.go ← Recovery ladder logic
│
├─ config/
│ ├─ harness.yaml ← Recovery strategies, metrics, thresholds
│ └─ guardian.yaml ← Permission blocklist
│
└─ docs/
   └─ concepts/
      └─ harness-runtime.md ← This file

Key insight: Harness “lives” in the INTERACTION between these components, not in a single folder.

FAQ

Is Harness Runtime a module I import in my code?

No. Harness is a PATTERN, not a library. You DON’T import harness. Instead, you:

Inject observation hooks into your tool implementations
Define recovery strategies in YAML config
Thread immutable state through your agent execution
Let Gemini’s system prompt enable “watching” behavior

Harness emerges from the interaction of these patterns, not from a single codebase component.

Why is Harness called a "nervous system"?

Like a nervous system, Harness:

Observes: Watches every tool execution via observation points
Communicates: Sends metrics to Gemini in real-time
Responds: Triggers recovery strategies when metrics fall below SLAs
Learns: Updates state and audit trail for continuous improvement
Protects: Guardian checks prevent dangerous operations before execution

It’s distributed, reactive, and permeates every layer of Vectora.

Can I disable Harness?

Yes, by setting harness.enabled: false in config. But this is NOT recommended for production because:

Gemini can’t “watch” tool execution
No auto-correction on precision < 0.65
Recovery ladder disabled
State threading disabled
Audit trail lost

Disable Harness for development and testing only.

Questions about Harness? GitHub Discussions · Consult the system prompt

External Linking

Concept	Resource	Link
Gemini API	Google AI Studio & Gemini API Documentation	ai.google.dev/docs
JWT	RFC 7519: JSON Web Token Standard	datatracker.ietf.org/doc/html/rfc7519
Anthropic Claude	Claude API Documentation	docs.anthropic.com/
Anthropic Cookbook	Recipes and patterns for using Claude	github.com/anthropics/anthropic-cookbook
MongoDB Atlas	Atlas Vector Search Documentation	www.mongodb.com/docs/atlas/atlas-vector-search/
OpenTelemetry	Observability framework for distributed systems	opentelemetry.io/docs/

Vectora v0.1.0 · GitHub · License (MIT) · Contributors

Part of the Vectora AI Agent ecosystem. Built with ADK, Claude, and Go.
