Skip to content

Reference

PT | EN

Vectora is a modular layered architecture that combines embedding (Voyage), search (Qdrant), reranking (Voyage), and reasoning (Gemini) to provide intelligent and governed context.

┌─────────────────────────────────────────────────────┐
│ IDEs (Claude Code, Cursor, VS Code) │
└────────────────────┬────────────────────────────────┘
                     │ MCP Protocol
┌─────────────────────────────────────────────────────┐
│ Vectora MCP Server │
│ (search_context, analyze_dependencies, etc) │
└────────┬──────────────────────────────────────────┬─┘
         │ │
         ▼ ▼
┌──────────────────────┐ ┌──────────────────────┐
│ Context Engine │ │ Harness Runtime │
│ - Embedding (V4) │ │ - Pre-execution │
│ - Search (HNSW) │ │ - Validation │
│ - Reranking (V2.5) │ │ - Metrics │
└────────┬─────────────┘ └──────────┬───────────┘
         │ │
         ▼ ▼
┌──────────────────────┐ ┌──────────────────────┐
│ Guardian Blocklist │ │ RBAC System │
│ - Path isolation │ │ - 5 roles │
│ - Trust folder │ │ - 15 permissions │
│ - Pattern matching │ │ - User management │
└────────┬─────────────┘ └──────────┬───────────┘
         │ │
         ▼──────────────┬───────────────────▼
            ┌──────────────────────┐
            │ Qdrant Vector DB │
            │ - Collections │
            │ - HNSW Index │
            │ - Namespaces │
            │ - Metadata Filters │
            └──────────────────────┘
            ┌──────────────────────┐
            │ File Storage │
            │ - Trust folder │
            │ - Vector index │
            │ - Cache (.vectora) │
            └──────────────────────┘

Layers

Vectora’s architecture is organized into four main layers that ensure everything from the user interface to secure data persistence.

1. Integration Layer (IDEs)

Where the user interacts with Vectora:

  • Claude Code: Native MCP
  • Cursor: Native MCP
  • VS Code: Own extension
  • ChatGPT: Custom GPT Plugin
  • CLI: Direct commands

2. MCP Server Layer

This layer acts as the communication brain, following the Model Context Protocol open standard to ensure full interoperability.

// MCP Tool example
tool: {
  name: "search_context",
  inputSchema: { /* JSON Schema */ }
}

Converts requests into Context Engine calls.

3. Core Logic Layer

Here resides the system’s intelligence, where context is processed, validated, and governed before reaching the tool executor.

Context Engine

Orchestrates intelligent search:

  1. Embedding: Text → vector (Voyage 4, 1536D)
  2. Search: HNSW search in Qdrant (top-100)
  3. Reranking: Refines top-100 → top-10 (Voyage Rerank 2.5)
  4. Compaction: Reduces size while maintaining context (head/tail)
  5. Validation: Harness validates output

Harness Runtime

Protection and validation:

  • Pre-execution: Guardian checks, rate limit, preconditions
  • Execution: Wrapped tool call with timeout/retry
  • Post-execution: Validation, metrics, comparison mode

Guardian Blocklist

Hard-coded security:

  • Trust Folder: /absolute/path/to/src is the perimeter
  • Path Isolation: Directory traversal blocked
  • Pattern Matching: Regex rules for blocking
  • Audit Logging: All attempts recorded

RBAC (Role-Based Access Control)

5 hierarchical levels:

Owner
  ├─ Edit namespace, manage users
  ├─ Admin
  │ ├─ Configure server, manage keys
  │ ├─ Editor
  │ │ ├─ Index, search, analyze
  │ │ ├─ Viewer
  │ │ │ └─ Search only
  │ │ └─ Guest
  │ │ └─ Limited search (rate limited)

15 granular permissions: search, index, delete, configure, etc.

4. Storage Layer

The storage layer ensures that vector indices and metadata are persisted securely and efficiently at a local or distributed level.

Qdrant (Vector Database)

  • Collections: One per namespace
  • HNSW: Hierarchical Navigable Small World
  • Metadata Filtering: Pre-filtering by namespace
  • Quantization: Dimensionality reduction (4x faster)
collection: "your-namespace"
vectors:
  size: 1536
  distance: cosine
  hnsw:
    m: 16
    ef_construct: 200
    ef_search: 150

Local Storage

  • Indexing State: .vectora/ (cache)
  • Configuration: vectora.config.yaml
  • Credentials: ~/.vectora/credentials.enc (encrypted)
  • AGENTS.md: Agent memory (json-in-frontmatter)

Data Flow

Data flow in Vectora is optimized for ultra-low latency, ensuring that context is retrieved and validated in milliseconds.

Semantic Search

1. User Query
   "How to do authentication?"
2. Embedding (Voyage 4)
   [0.12, 0.45, ..., 0.67] (1536D)
3. Vector Search (HNSW/Qdrant)
   Top-100 chunks by cosine similarity
4. Reranking (Voyage Rerank 2.5)
   Refines to top-10 (semantic relevance)
5. Compaction
   Head (first lines) + Tail (last lines)
   Maintains context, reduces tokens
6. Validation (Harness)
   - Output schema
   - Security checks
   - Metrics captured
7. Response
   {chunks: [...], precision: 0.87}
8. To IDE
   Claude/Cursor/VS Code receive chunks

Rate Limiting & SLA

Request → Guardian (check blocklist) →
Rate Limiter (60 req/min free tier) →
Timeout (30s default) →
Retry (3 attempts) →
Circuit Breaker (fail-open after 5 errors)

Key Components

ComponentFunctionProvider
EmbeddingConvert text→vectorVoyage 4
Vector StoreStore/search vectorsQdrant
RerankingRefine relevanceVoyage Rerank 2.5
LLMReasoning + analysisGemini 3 Flash
AuthToken validationJWT + RBAC
NamespaceLogical isolationQdrant collections
Trust FolderPath isolationGuardian

System Configuration

# vectora.config.yaml
project:
  name: "Your Project"
  namespace: "your-namespace"
  trust_folder: "./src"

providers:
  embedding:
    name: "voyage"
    model: "voyage-4"
  reranker:
    name: "voyage"
    model: "voyage-rerank-2.5"
  llm:
    name: "gemini"
    model: "gemini-3-flash"

context_engine:
  strategy: "semantic"
  max_depth: 3
  timeout_ms: 2000

harness:
  enabled: true
  pre_execution:
    validate_guardian: true
    rate_limit_per_minute: 60
  post_execution:
    validate_output: true
    capture_metrics: true

guardian:
  rules:
    - pattern: "^(src|docs)/"
      action: "allow"
    - pattern: "\.env.*"
      action: "block"

rbac:
  roles:
    - owner
    - admin
    - editor
    - viewer
    - guest

Performance Targets

MetricTargetTypical
Search Latency<500ms~234ms
Embedding<200ms~120ms
Reranking<100ms~50ms
Retrieval Precision≥ 0.65~0.78
Tool Accuracy≥ 0.95~0.98
Security Events00
Availability99.9%99.95%

Scalability

Horizontal

  • Multiple Qdrant clusters: For physical isolation
  • Load balancing: Between MCP servers
  • Read replicas: For large-scale search

Vertical

  • Quantization: Reduces size by 4x
  • Compaction: Reduces output by 50%
  • Caching: Local result in .vectora/

Security

Defense in Depth

1. Trust Folder (path isolation)
2. Guardian Blocklist (pattern matching)
3. RBAC (user-level permissions)
4. Harness (pre/post execution validation)
5. Audit Logging (who did what)
6. Encryption (API keys, tokens)

Data Privacy

  • BYOK (Bring Your Own Key): You control the keys
  • Local processing: Embeddings are calculated locally
  • No data sync: Code never leaves your server
  • Audit trail: Complete and immutable

Next: Plans - Free


Part of the Vectora ecosystem · Open Source (MIT) · Contributors