MongoDB Atlas

The Challenge: Unified Storage for Agents

When building agentic systems, developers often face “data fragmentation”:

Vectors in a specialized provider
Metadata in Postgres or another relational database
Session state in Redis or volatile storage
Audit logs in separate systems

This fragmentation introduces latency, inconsistency between systems, and significant operational complexity. Vectora resolves this by consolidating everything into MongoDB Atlas.

What is MongoDB Atlas

MongoDB Atlas is a fully managed, multi-cloud data platform. It started as a NoSQL document database and has since added Vector Search, which is integrated natively into the document model.

In Vectora, Atlas is not just a database; it is the infrastructure that sustains context governance, allowing vectors, metadata, operational state, and audit logs to coexist in the same ecosystem with guaranteed consistency.

Technical Specifications (Kaffyn Managed Level)

Feature	Detail
Database Type	Multi-cluster Document + Integrated Vector Search
Vector Indexing	HNSW (Hierarchical Navigable Small World)
Scalability	Automatic sharding with dynamic balancing
Availability	99.99% with Replica Sets across multiple zones
Encryption	AES-256 at rest and TLS 1.3 in transit
Backup	Continuous snapshots with configurable retention
Isolation	Logical namespaces with mandatory filtering

Why MongoDB Atlas for Vectora

The choice of MongoDB Atlas as the unified backend rests on three architectural pillars:

1. Metadata-Vector Atomicity

In Atlas, the embedding vector and the code metadata reside in the same BSON document. This eliminates consistency issues such as “orphan vectors” or outdated metadata. When a file is modified, the embedding and its metadata are updated in one single-document write, which MongoDB applies atomically.
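As a rough illustration of the single-document update described above, the sketch below builds the filter and update documents for one chunk. The field names file_path, chunk_index, ast_metadata, and updated_at are assumptions made for the example; only embedding_vector and namespace_id appear in this page.

```python
from datetime import datetime, timezone

EMBEDDING_DIM = 1024  # dimension stated for the embedding_vector field


def build_chunk_upsert(namespace_id, file_path, chunk_index, embedding, ast_metadata):
    """Return (filter, update) documents for a single-document upsert.

    Embedding and metadata live in the same "$set", so one write
    replaces both together.
    """
    if len(embedding) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dimensions, got {len(embedding)}")
    # Hypothetical chunk identity: namespace + file path + chunk position.
    filter_doc = {
        "namespace_id": namespace_id,
        "file_path": file_path,
        "chunk_index": chunk_index,
    }
    update_doc = {
        "$set": {
            "embedding_vector": list(embedding),
            "ast_metadata": ast_metadata,
            "updated_at": datetime.now(timezone.utc),
        }
    }
    return filter_doc, update_doc


# With pymongo, the pair would be applied as one write:
#   collection.update_one(filter_doc, update_doc, upsert=True)
```

Because the vector and its metadata are fields of one document, there is no window in which one is updated and the other is not.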

2. Namespace Filtering with Native Performance

Vectora isolates data from different projects and users through logical namespaces. MongoDB Atlas applies metadata filters as a pre-filter inside the vector query itself, so candidates outside the filter are never considered. Semantic searches therefore do not return data from unauthorized namespaces, and multi-tenant isolation is enforced at the index level.
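A filtered vector query of this kind might be assembled as in the sketch below, which uses the Atlas $vectorSearch aggregation stage and its filter option. The index name, the file_path field, the visibility values, and the default limits are assumptions for illustration, not Vectora's actual settings.

```python
def build_vector_search_pipeline(query_vector, namespace_id, allowed_visibility,
                                 limit=10, num_candidates=200,
                                 index_name="documents_vector_index"):
    """Build an aggregation pipeline whose vector search is pre-filtered
    by namespace and visibility."""
    if not namespace_id:
        raise ValueError("namespace_id is mandatory for every query")
    if num_candidates < limit:
        raise ValueError("numCandidates must be at least as large as limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,  # hypothetical index name
                "path": "embedding_vector",
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
                # Pre-filter: only documents matching both conditions
                # are eligible as nearest-neighbor candidates.
                "filter": {
                    "$and": [
                        {"namespace_id": {"$eq": namespace_id}},
                        {"visibility": {"$in": list(allowed_visibility)}},
                    ]
                },
            }
        },
        {
            "$project": {
                "_id": 0,
                "file_path": 1,  # hypothetical metadata field
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


# With pymongo: results = collection.aggregate(pipeline)
```

Refusing to build a pipeline without a namespace is one way to make the filter mandatory in application code rather than relying on each caller to remember it.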

3. Integrated State and Memory Layer

Unlike solutions specialized only in vectors, Atlas efficiently stores both semantic embeddings and structural data like session history, agent operational state, and persistent memory. This allows Vectora to retrieve historical context and semantic vectors in a single connection, reducing latency and complexity.

Collection Structure in Vectora

The backend is organized into collections optimized to support the Harness Runtime and the MCP operation flow:

documents Collection

Stores processed code chunks, AST metadata, file paths, and embeddings generated by Voyage 4.

embedding_vector field: 1024-dimension vector (Voyage 4)
HNSW index configured with optimized efConstruction and maxConnections
Mandatory filters by namespace_id and visibility on all queries
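The index behind these three points could look roughly like the following Atlas Vector Search index definition. The cosine similarity setting is an assumption, since this page does not state which metric Vectora uses, and the HNSW tuning parameters are left out because their exact option names are not given here.

```python
ALLOWED_SIMILARITY = {"cosine", "euclidean", "dotProduct"}


def build_vector_index_definition(num_dimensions=1024, similarity="cosine"):
    """Return an Atlas Vector Search index definition for the documents
    collection: one vector field plus the two filter fields."""
    if similarity not in ALLOWED_SIMILARITY:
        raise ValueError(f"unsupported similarity: {similarity}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": "embedding_vector",
                "numDimensions": num_dimensions,
                "similarity": similarity,  # assumed metric
            },
            # Fields must be indexed as "filter" to be usable in the
            # filter option of a vector query.
            {"type": "filter", "path": "namespace_id"},
            {"type": "filter", "path": "visibility"},
        ]
    }
```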

sessions Collection

Stores the history of interactions from the main agent via MCP, decisions made by the Context Engine, and the current state of the execution plan.

Session key by userId + namespace
Configurable TTL for automatic cleanup of inactive sessions
At-rest encryption for sensitive session data
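A session document keyed by userId and namespace, with a TTL index for cleanup, might be sketched as follows. The fields history, plan_state, and last_active_at, and the seven-day expiry, are hypothetical; the page only states that the key combines userId and namespace and that the TTL is configurable.

```python
from datetime import datetime, timezone


def session_key(user_id, namespace):
    """Compound key: one session per (userId, namespace) pair."""
    return {"userId": user_id, "namespace": namespace}


def new_session_doc(user_id, namespace, now=None):
    """Build an empty session document using the compound key as _id."""
    now = now or datetime.now(timezone.utc)
    return {
        "_id": session_key(user_id, namespace),
        "history": [],          # hypothetical: interaction history
        "plan_state": None,     # hypothetical: current execution plan
        "last_active_at": now,  # hypothetical: date field the TTL index watches
    }


# TTL index spec: documents expire this many seconds after last_active_at.
TTL_INDEX = {
    "keys": [("last_active_at", 1)],
    "expireAfterSeconds": 7 * 24 * 3600,  # assumed default of 7 days
}

# With pymongo:
#   sessions.create_index(TTL_INDEX["keys"],
#                         expireAfterSeconds=TTL_INDEX["expireAfterSeconds"])
```

Updating last_active_at on each interaction would keep an active session alive, while inactive ones are removed by the server's TTL process.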

audit_logs Collection

Immutable records of each tool execution: who ran it, when, which tool, and the result of the operation (metadata only, never code content).

Append-only structure for forensic integrity
Indexing by timestamp and userId for efficient audit queries
Configurable retention based on plan and compliance policies
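One way to picture an audit entry is the sketch below: an immutable record that is converted to a document and only ever inserted. The exact field names and the result values are assumptions; the page specifies only that user, time, tool, and result are captured and that timestamp and userId are indexed.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class AuditRecord:
    user_id: str
    tool: str
    result: str  # hypothetical values such as "ok" or "denied"
    timestamp: datetime


def to_document(record):
    """Convert a record to the document that would be inserted.
    Only metadata is included, never code content."""
    return {
        "userId": record.user_id,
        "tool": record.tool,
        "result": record.result,
        "timestamp": record.timestamp,
    }


# Index keys matching "indexing by timestamp and userId".
AUDIT_INDEX_KEYS = [("timestamp", -1), ("userId", 1)]

# With pymongo, append-only means the application only ever calls:
#   audit_logs.insert_one(to_document(record))
```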

How Vectora Optimizes Atlas

Dynamic HNSW Index Configuration

Vectora dynamically adjusts the efConstruction and maxConnections parameters of the HNSW index based on the volume and distribution of user data. Smaller codebases receive configurations optimized for low latency; larger codebases receive configurations that prioritize recall.
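The selection logic might resemble the sketch below, which picks parameters from the number of indexed chunks. Every threshold and parameter value here is invented for illustration; the page does not publish the tiers Vectora uses, and distribution-based tuning is not modeled.

```python
def choose_hnsw_params(chunk_count):
    """Pick HNSW build parameters from the size of the codebase.

    Larger efConstruction and maxConnections values generally improve
    recall at the cost of build time and memory.
    All thresholds and values are hypothetical.
    """
    if chunk_count < 0:
        raise ValueError("chunk_count must be non-negative")
    if chunk_count < 50_000:
        # Small codebase: lighter graph, favoring low latency.
        return {"efConstruction": 100, "maxConnections": 16}
    if chunk_count < 1_000_000:
        return {"efConstruction": 200, "maxConnections": 32}
    # Large codebase: denser graph, favoring recall.
    return {"efConstruction": 400, "maxConnections": 64}
```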

Pre-Indexing Semantic Compaction

Before saving documents to Atlas, Vectora applies compaction algorithms that remove syntactic noise and preserve only semantically relevant content. This reduces storage volume and improves vector search efficiency without compromising the quality of the retrieved context.
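The compaction algorithms themselves are not described on this page. As a deliberately simple stand-in, the sketch below removes one kind of syntactic noise, trailing whitespace and blank lines, while keeping indentation intact. It is not Vectora's algorithm, only an illustration of the idea.

```python
def compact_chunk(source):
    """Drop blank lines and trailing whitespace from a code chunk.

    Leading indentation is preserved because it carries structure
    in many languages.
    """
    lines = (line.rstrip() for line in source.splitlines())
    return "\n".join(line for line in lines if line.strip())


def compaction_ratio(source):
    """Fraction of characters removed by compact_chunk (0.0 for empty input)."""
    if not source:
        return 0.0
    return 1 - len(compact_chunk(source)) / len(source)
```

For example, compact_chunk("def f():  \n\n\n    return 1\n") returns "def f():\n    return 1".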

Transparent Embedding Fallback

In scenarios where the primary provider (Voyage 4) is unavailable, Vectora automatically routes to gemini-embedding-2, maintaining the same vector dimension (1024) for compatibility with existing indices. The fallback is transparent to the main agent and requires no reindexing.
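The routing described above can be sketched as a wrapper around two embedding callables. The callables stand in for the real provider clients, whose APIs are not shown here; the provider label in the return value is an addition made for this example.

```python
from typing import Callable, List, Sequence, Tuple

EXPECTED_DIM = 1024  # dimension stated for both providers

Embedder = Callable[[str], Sequence[float]]


def embed_with_fallback(text: str, primary: Embedder,
                        fallback: Embedder) -> Tuple[List[float], str]:
    """Try the primary embedder; on any failure, use the fallback.

    Returns the vector and a label naming which provider produced it.
    """
    try:
        vector, provider = primary(text), "primary"
    except Exception:
        # Primary unavailable: route to the fallback provider.
        vector, provider = fallback(text), "fallback"
    if len(vector) != EXPECTED_DIM:
        # A vector of another size would not fit the existing index.
        raise ValueError(f"expected {EXPECTED_DIM} dimensions, got {len(vector)}")
    return list(vector), provider
```

The dimension check is what keeps the existing index able to accept the vector. Returning the provider label lets the caller record which model produced each embedding, so vectors from the two providers can be told apart later.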

Kaffyn Management (Zero Ops)

When you use Vectora, you don’t need to manually configure instances in the MongoDB console. Kaffyn provisions and manages the backend automatically:

Free Plan: Optimized shared cluster, 512MB total storage limit, 30-day retention after inactivity for vector index
Pro Plan: High-performance dedicated or serverless cluster, 10GB limit, priority backups, and 90-day retention after cancellation
Team Plan: Clusters with VPC peering, 50GB limit, granular RBAC, and 180-day retention after cancellation
Security: Each user and team receives isolated credentials, encrypted namespaces, and access policies validated at runtime

Backend FAQ

Q: Is my code data sent to the cloud?
A: Yes, embeddings (numerical vectors) and structural metadata (AST, paths, timestamps) are stored in the MongoDB Atlas managed by Kaffyn. Raw code content is processed locally by the Guardian to ensure that secrets and sensitive files are never indexed or transmitted.

Q: Can I use my own MongoDB Atlas cluster?
A: Yes, in the Enterprise plan or via the backend.custom_connection_string configuration in vectora.config.yaml. This option requires manual configuration of collections, indices, and security policies.

Q: What happens if the backend becomes unavailable?
A: Vectora implements local fallback for basic filesystem operations and caching of recent embeddings. The Harness Runtime detects unavailability and degrades gracefully, maintaining essential functionality while notifying the user.

Q: How is isolation between namespaces guaranteed?
A: All Atlas queries include mandatory filters by namespace_id and visibility. RBAC at the application layer validates permissions before any query, and the Guardian blocks unauthorized access even if application validation fails.

Q: Can I export my data from Atlas?
A: Yes. The vectora export command allows exporting metadata, embeddings (as base64), and audit logs in a portable format. Exporting is available at any time, regardless of plan or subscription status.
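To make the base64 representation of exported embeddings concrete, here is a minimal round-trip sketch. It assumes the vector is packed as little-endian 32-bit floats before encoding; the actual byte layout used by vectora export is not specified on this page.

```python
import base64
import struct


def encode_embedding(vector):
    """Pack floats as little-endian float32 and base64-encode them.
    The byte layout is an assumption, not the documented export format."""
    raw = struct.pack(f"<{len(vector)}f", *vector)
    return base64.b64encode(raw).decode("ascii")


def decode_embedding(encoded):
    """Inverse of encode_embedding: base64 text back to a list of floats."""
    raw = base64.b64decode(encoded)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))
```

Under this layout a 1024-dimension vector occupies 4096 bytes before encoding. Values that are not exactly representable as float32 come back rounded to float32 precision.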

External Links

Concept	Resource	Link
MongoDB Atlas	Atlas Vector Search Documentation	www.mongodb.com/docs/atlas/atlas-vector-search/
MCP	Model Context Protocol Specification	modelcontextprotocol.io/specification
MCP Go SDK	Go SDK for MCP (mark3labs)	github.com/mark3labs/mcp-go
Voyage Embeddings	Voyage Embeddings Documentation	docs.voyageai.com/docs/embeddings
Voyage Reranker	Voyage Reranker API	docs.voyageai.com/docs/reranker
HNSW	Efficient and robust approximate nearest neighbor search	arxiv.org/abs/1603.09320

Phrase to remember: “MongoDB Atlas is where Vectora stores structured knowledge. The intelligence is in the runtime; the memory is in Atlas; governance is in the application.”

Part of the Vectora ecosystem · Open Source (MIT) · Contributors
