Vector Search
Vector search is the core mechanism that allows Vectora to retrieve semantically relevant context in complex codebases. Unlike keyword-based text searches, vector search operates in semantic space, capturing functional similarity between code concepts.
Vector Search Fundamentals
How It Works
- Embedding: Code snippets are transformed into high-dimensional numerical vectors using the voyage-4 model.
- Indexing: Vectors are stored in MongoDB Atlas with an HNSW index for Approximate Nearest Neighbor (ANN) search.
- Query: A query is converted into an embedding and compared against the index to find the most similar vectors.
- Filtering: Results are filtered by namespace, visibility, and structural metadata before being returned to the agent.
graph LR
A[Source Code] --> B[AST Parser + Chunking]
B --> C[Embedding via Voyage 4]
C --> D[HNSW Index in MongoDB Atlas]
E[User Query] --> F[Query Embedding]
F --> D
D --> G[Filtering by Namespace + Metadata]
G --> H[Structured Context for LLM]
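To make the comparison step concrete, here is a minimal sketch of exact (brute-force) nearest-neighbor search with cosine similarity. It is not Vectora's implementation: HNSW approximates this ranking without scanning every vector, and the three-dimensional vectors and the names `Indexed`, `cosineSimilarity`, and `topK` are illustrative assumptions.

```typescript
// Exact nearest-neighbor search over a tiny in-memory set.
// An HNSW index approximates this ranking without visiting every vector.
type Indexed = { id: string; vector: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function topK(query: number[], items: Indexed[], k: number): { id: string; score: number }[] {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(query, item.vector) }))
    .sort((x, y) => y.score - x.score) // highest similarity first
    .slice(0, k);
}

// Toy 3-dimensional vectors (hypothetical values); real embeddings have 1024 dimensions.
const toyIndex: Indexed[] = [
  { id: "validateToken", vector: [0.9, 0.1, 0.0] },
  { id: "checkJWT", vector: [0.8, 0.2, 0.1] },
  { id: "renderSidebar", vector: [0.0, 0.1, 0.9] },
];

const nearest = topK([1, 0, 0], toyIndex, 2);
console.log(nearest.map((r) => r.id));
```

The two authentication-related entries rank above the unrelated one because their vectors point in nearly the same direction as the query vector.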
Why Vector Search for Code
Traditional text searches fail in software engineering scenarios because:
- Lexical similarity does not imply functional similarity: validateToken and checkJWT might be semantically equivalent but lexically distinct.
- Boilerplate generates noise: Files with similar structures but different logic appear relevant.
- Implicit dependencies are not captured: Imports, function calls, and architectural patterns require structural understanding.
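The first point can be illustrated with a small lexical-overlap measure. The sketch below splits identifiers into subtokens and computes Jaccard overlap; the helper names `subtokens` and `jaccard` are assumptions for illustration and are not part of Vectora.

```typescript
// Split an identifier on camelCase and underscore boundaries, lowercased.
function subtokens(identifier: string): Set<string> {
  const parts = identifier
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // fooBar -> foo Bar
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2") // JWTToken -> JWT Token
    .split(/[\s_]+/)
    .filter((p) => p.length > 0)
    .map((p) => p.toLowerCase());
  return new Set(parts);
}

// Jaccard overlap: shared subtokens divided by all distinct subtokens.
function jaccard(a: Set<string>, b: Set<string>): number {
  const left = Array.from(a);
  const shared = left.filter((t) => b.has(t)).length;
  const union = new Set(left.concat(Array.from(b))).size;
  return union === 0 ? 0 : shared / union;
}

const overlap = jaccard(subtokens("validateToken"), subtokens("checkJWT"));
console.log(overlap); // no shared subtokens, so the lexical overlap is 0
```

A keyword search that relies on shared tokens has nothing to match between these two names, even though both may validate a JWT.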
Specialized code embeddings, such as voyage-4, are trained on billions of snippets and capture:
- Functional similarity between implementations
- Recurring architectural patterns
- Relationships between imports and dependencies
- Semantic context from comments and docstrings
Vector Search Architecture in Vectora
Unified Backend: MongoDB Atlas
Vectora uses MongoDB Atlas as a unified backend for vectors, metadata, and operational state. This choice eliminates the need for synchronization between separate systems and ensures atomic consistency between embeddings and their associated metadata.
| Component | Implementation | Benefit |
|---|---|---|
| Vector Index | HNSW with cosine metric | ANN search with approximately logarithmic query complexity |
| Vector Storage | embedding_vector field in BSON document | Vector and metadata in the same document |
| Filtering | Native Atlas pre-filtering ($vectorSearch filter) | Restrict candidates to a namespace as part of the vector search |
| Scalability | Automatic Atlas sharding | Scale from MBs to TBs without manual reconfiguration |
Atlas Document Structure
Each indexed code chunk is stored as a MongoDB document with the following structure:
{
  "_id": "ObjectId(...)",
  "namespace_id": "auth-service",
  "file_path": "src/auth/jwt_validator.go",
  "start_line": 45,
  "end_line": 78,
  "content": "func ValidateToken(token string) error { ... }",
  "ast_metadata": {
    "function_name": "ValidateToken",
    "imports": ["github.com/golang-jwt/jwt"],
    "dependencies": ["ParseToken", "VerifySignature"]
  },
  "embedding_vector": [0.023, -0.145, ..., 0.089],
  "visibility": "private",
  "indexed_at": "2026-04-18T22:30:00Z",
  "checksum": "sha256:abc123..."
}
HNSW Index Configuration
Vectora configures HNSW indices in MongoDB Atlas with parameters optimized for codebases:
# Default vector index configuration
vector_index:
  name: "vector_search_index"
  path: "embedding_vector"
  dimensions: 1024 # Voyage 4
  similarity: "cosine"
  type: "vector"
  hnsw_config:
    m: 16 # Number of connections per node
    ef_construction: 200 # Accuracy in index construction
    ef_search: 100 # Accuracy in search (configurable per query)
Adjustable parameters based on codebase size:
| Parameter | Low Value | High Value | Impact |
|---|---|---|---|
| m | 8 | 32 | More connections = higher accuracy, more memory |
| ef_construction | 100 | 400 | More candidates in construction = more precise index |
| ef_search | 50 | 200 | More candidates in search = higher recall, higher latency |
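For orientation, the sketch below shows how the path, dimension, and similarity settings above might map onto an Atlas Vector Search index definition. The function name `buildVectorIndexDefinition` is hypothetical. The example also declares namespace_id and visibility as filter fields, since Atlas only accepts pre-filters on fields indexed with type "filter". The HNSW graph parameters (m, ef_construction) are left out because the source does not show how Vectora passes them to Atlas.

```typescript
type VectorField = {
  type: "vector";
  path: string;
  numDimensions: number;
  similarity: "cosine" | "euclidean" | "dotProduct";
};
type FilterField = { type: "filter"; path: string };

// Builds an index model in the shape Atlas Vector Search expects.
function buildVectorIndexDefinition(path: string, numDimensions: number, filterPaths: string[]) {
  const fields: (VectorField | FilterField)[] = [
    { type: "vector", path, numDimensions, similarity: "cosine" },
    // Fields used in $vectorSearch filters must be declared here
    ...filterPaths.map((p): FilterField => ({ type: "filter", path: p })),
  ];
  return { name: "vector_search_index", type: "vectorSearch" as const, definition: { fields } };
}

const indexModel = buildVectorIndexDefinition("embedding_vector", 1024, ["namespace_id", "visibility"]);
console.log(JSON.stringify(indexModel, null, 2));
```

With a recent version of the official Node.js driver, an object like this can typically be passed to collection.createSearchIndex(); check the driver version in use before relying on the type field.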
Indexing Pipeline
AST-Guided Chunking
Before generating embeddings, Vectora parses the code using tree-sitter to identify coherent semantic units:
- Functions and methods
- Classes and structs
- Conditional logic blocks
- Imports and type declarations
Each chunk is limited to 512 tokens for compatibility with the embedding model, preserving syntactic boundaries whenever possible.
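The token budget can be pictured with a simplified packing routine. This is only a sketch under stated assumptions: tokens are approximated by whitespace-separated words (real tokenizers count differently), the units are assumed to come from an AST pass that has already run, and `packUnits` and `estimateTokens` are hypothetical names.

```typescript
type Unit = { name: string; text: string };
type Chunk = { names: string[]; text: string; tokens: number };

// Rough token estimate: whitespace-separated words.
function estimateTokens(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

// Greedily groups adjacent units without ever cutting inside a unit.
// A unit larger than the budget is kept whole here; a real chunker
// would split it further along inner syntactic boundaries.
function packUnits(units: Unit[], maxTokens: number): Chunk[] {
  const chunks: Chunk[] = [];
  let current: Chunk | null = null;
  for (const unit of units) {
    const tokens = estimateTokens(unit.text);
    if (current && current.tokens + tokens <= maxTokens) {
      current.names.push(unit.name);
      current.text += "\n" + unit.text;
      current.tokens += tokens;
    } else {
      if (current) chunks.push(current);
      current = { names: [unit.name], text: unit.text, tokens };
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const packed = packUnits(
  [
    { name: "a", text: "a1 a2 a3" },
    { name: "b", text: "b1 b2" },
    { name: "c", text: "c1 c2 c3 c4" },
  ],
  5,
);
console.log(packed.map((c) => c.names));
```

With a budget of 5, units a and b share a chunk and c starts a new one, so no unit is split across chunks.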
// packages/core/src/indexer/chunker.ts
import Parser from "tree-sitter"; // Assumes the Node tree-sitter bindings
// getLanguage and recursiveChunk are project helpers not shown here
export function chunkCodeByAST(content: string, language: string): CodeChunk[] {
  const parser = new Parser();
  parser.setLanguage(getLanguage(language));
  const tree = parser.parse(content);
  return recursiveChunk(tree.rootNode, {
    maxTokens: 512,
    preserveBoundaries: true, // Do not cut in the middle of a function
    includeImports: true, // Append import list to the chunk
    minSize: 32, // Ignore very small chunks
  });
}
Embedding Generation with Voyage 4
Each chunk is sent to the Voyage AI API for embedding generation:
// packages/core/src/providers/voyage.ts
export async function generateEmbedding(chunk: CodeChunk): Promise<number[]> {
  const response = await voyageClient.embed({
    input: chunk.content,
    model: "voyage-4",
    encoding_format: "float",
    input_type: "document", // Tells Voyage the input is a document to be retrieved, not a query
  });
  return response.data[0].embedding;
}
The voyage-4 model was chosen for its:
- Fixed dimension of 1024, compatible with HNSW indices
- Specialized training on code, capturing functional similarity
- Long context support, allowing chunks with more structure
- Stable API with integrated retry logic and rate limiting
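The source does not show how the retry logic is implemented. As a rough illustration, retrying a failed embedding call with exponential backoff might look like the sketch below; `withRetry` and its parameters are hypothetical and not part of the Voyage client.

```typescript
// Retries an async operation, doubling the wait after each failed attempt.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        const delay = baseDelayMs * 2 ** (attempt - 1); // 200 ms, 400 ms, ...
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // All attempts failed: surface the last error to the caller
  throw lastError;
}
```

A call such as withRetry(() => generateEmbedding(chunk)) would then tolerate transient failures. A production version would usually retry only on rate-limit or server errors rather than on every exception.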
Atomic Insertion into Atlas
Vector and metadata are inserted into MongoDB Atlas in a single atomic operation:
// packages/core/src/backend/atlas-writer.ts
export async function insertChunkWithVector(chunk: CodeChunk, embedding: number[]): Promise<void> {
  await mongodb.collection("documents").insertOne({
    namespace_id: chunk.namespace,
    file_path: chunk.filePath,
    content: chunk.content,
    ast_metadata: chunk.astMetadata,
    embedding_vector: embedding,
    visibility: chunk.visibility,
    indexed_at: new Date(),
    checksum: chunk.checksum,
  });
  // Atlas updates the vector index automatically; the update is asynchronous,
  // so a new chunk may take a moment to become searchable
}
Vector Query with Namespace Filtering
Query Flow
When a main agent requests context via MCP:
- The query is converted into an embedding using voyage-4.
- A vector search is executed in Atlas with mandatory filters for namespace_id and visibility.
- Results are re-ranked by similarity score and limited to the configured top_k.
- Structural metadata (AST, imports) is attached to enrich the returned context.
// packages/core/src/context/vector-search.ts
export async function semanticSearch(
  query: string,
  namespace: string,
  options: SearchOptions,
): Promise<SearchResult[]> {
  // 1. Query embedding
  // Note: generateEmbedding sends input_type "document"; Voyage recommends
  // input_type "query" when embedding search queries
  const queryEmbedding = await generateEmbedding({
    content: query,
  } as CodeChunk);
  // 2. Vector search with mandatory filters
  const results = await mongodb
    .collection("documents")
    .aggregate([
      {
        $vectorSearch: {
          index: "vector_search_index",
          path: "embedding_vector",
          queryVector: queryEmbedding,
          numCandidates: options.ef_search || 100,
          limit: options.top_k || 10,
          // Both fields must be declared as filter fields in the index definition
          filter: {
            namespace_id: namespace,
            visibility: { $in: ["private", "team", "public"] },
          },
        },
      },
      {
        $project: {
          score: { $meta: "vectorSearchScore" },
          file_path: 1,
          content: 1,
          ast_metadata: 1,
          start_line: 1,
          end_line: 1,
        },
      },
    ])
    .toArray();
  // 3. Enrich with structural metadata
  return results.map((r) => enrichWithAST(r));
}
Namespace Isolation
All vector queries include mandatory filters for namespace_id. This ensures that:
- Data from different projects never mix
- private namespaces remain isolated even in multi-tenant clusters
- public namespaces can be mounted in multiple workspaces without data duplication
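One way to make the filter impossible to omit is to build it in a single place and reject incomplete input. The sketch below is an assumption about how such a guard could look; `buildScopeFilter` is a hypothetical helper, not a Vectora API.

```typescript
type Visibility = "private" | "team" | "public";

interface ScopeFilter {
  namespace_id: string;
  visibility: { $in: Visibility[] };
}

// Builds the mandatory filter and fails fast instead of issuing an unscoped query.
function buildScopeFilter(namespaceId: string, allowed: Visibility[]): ScopeFilter {
  if (namespaceId.trim() === "") throw new Error("namespace_id is required");
  if (allowed.length === 0) throw new Error("at least one visibility level is required");
  return { namespace_id: namespaceId, visibility: { $in: [...allowed] } };
}

const scope = buildScopeFilter("auth-service", ["private", "team"]);
console.log(JSON.stringify(scope));
```

Routing every $vectorSearch call through a helper like this means a missing namespace raises an error rather than silently searching across projects.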
# Example of filter automatically applied
filter:
  namespace_id: "auth-service"
  visibility: { $in: ["private", "team"] }
Performance Optimizations
Query Embedding Cache
Frequent queries are cached to avoid repeated calls to the Voyage API:
// packages/core/src/cache/query-embeddings.ts
import { createHash } from "node:crypto";
export class QueryEmbeddingCache {
  // Initialized eagerly so get/set never run against an undefined map
  private cache = new Map<string, { embedding: number[]; timestamp: number }>();
  private readonly TTL_MS = 24 * 60 * 60 * 1000; // 24 hours
  async getOrGenerate(query: string): Promise<number[]> {
    // Hash the query so cache keys have a fixed length
    const key = createHash("sha256").update(query).digest("hex");
    const cached = this.cache.get(key);
    if (cached && Date.now() - cached.timestamp < this.TTL_MS) {
      return cached.embedding;
    }
    const embedding = await generateEmbedding({ content: query } as CodeChunk);
    this.cache.set(key, { embedding, timestamp: Date.now() });
    return embedding;
  }
}
Batch Insertion for Mass Indexing
During initial ingestion or re-indexing, chunks are processed in batches to maximize throughput:
// packages/core/src/indexer/batch-ingest.ts
export async function batchIngest(chunks: CodeChunk[], batchSize: number = 32): Promise<void> {
  for (let i = 0; i < chunks.length; i += batchSize) {
    const batch = chunks.slice(i, i + batchSize);
    // Generate embeddings in parallel
    const embeddings = await Promise.all(batch.map((chunk) => generateEmbedding(chunk)));
    // Insert into Atlas in bulk, using the same field names as insertChunkWithVector
    // so that the namespace_id and visibility filters match these documents
    await mongodb.collection("documents").insertMany(
      batch.map((chunk, idx) => ({
        namespace_id: chunk.namespace,
        file_path: chunk.filePath,
        content: chunk.content,
        ast_metadata: chunk.astMetadata,
        embedding_vector: embeddings[idx],
        visibility: chunk.visibility,
        indexed_at: new Date(),
        checksum: chunk.checksum,
      })),
    );
  }
}
Dynamic ef_search Adjustment
The ef_search parameter controls the trade-off between accuracy and latency. Vectora adjusts it dynamically based on the query context:
- General navigation queries: ef_search=50 (low latency)
- Critical refactoring queries: ef_search=150 (high accuracy)
- Multiple-hop queries: ef_search=200 (maximum recall)
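// packages/core/src/context/search-config.ts
export function getEfSearchForQuery(query: QueryContext): number {
  if (query.intent === "refactor" || query.intent === "security_audit") {
    return 150;
  }
  if (query.multiHop) {
    return 200;
  }
  // "navigation" is an assumed intent label for the low-latency tier listed above
  if (query.intent === "navigation") {
    return 50;
  }
  return 100; // default, matching ef_search in the index configuration
}
Integration with Context Engine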
Vector search is just one source of context. The Context Engine decides when to use vector search, filesystem search, or a hybrid combination:
graph TD
A[Main Agent Query] --> B{Context Engine}
B -->|Semantic query| C[Vector Search + Voyage 4]
B -->|Structural query| D[AST Search + Filesystem]
B -->|Hybrid query| E[Vector + Structural Combination]
C --> F[Namespace Filtering + Reranking]
D --> F
E --> F
F --> G[Structured Context for LLM]
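The source does not say how the hybrid path merges vector and structural results. One common technique for combining ranked lists is reciprocal rank fusion (RRF), sketched below as an illustration only. The function name and the result identifiers are hypothetical, and k = 60 is the conventional default rather than a Vectora setting.

```typescript
// Reciprocal rank fusion: each list contributes 1 / (k + rank) per item,
// so items ranked highly in several lists rise to the top.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, position) => {
      const rank = position + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .map(([id]) => id);
}

const vectorHits = ["a", "b", "c"]; // hypothetical ids from vector search
const structuralHits = ["b", "d", "a"]; // hypothetical ids from AST/filesystem search
const fused = reciprocalRankFusion([vectorHits, structuralHits]);
console.log(fused);
```

RRF uses only rank positions, which avoids having to normalize cosine scores against whatever scoring the structural search produces.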
Optional Reranking
For critical queries, vector search results can undergo reranking with voyage-rerank-2.5 for higher accuracy:
// packages/core/src/context/reranker.ts
export async function rerankResults(query: string, results: SearchResult[]): Promise<SearchResult[]> {
  const documents = results.map((r) => r.content);
  const reranked = await voyageClient.rerank({
    query,
    documents,
    model: "voyage-rerank-2.5",
    top_k: results.length,
  });
  return reranked.results
    .sort((a, b) => b.relevance_score - a.relevance_score)
    .map((r) => results[r.index]);
}
FAQ
Q: What is the dimension of the vectors generated by Voyage 4?
A: 1024 dimensions. This fixed dimension allows for efficient HNSW indices and compatibility between queries and documents.
Q: How is namespace isolation guaranteed in vector search?
A: All MongoDB Atlas queries include mandatory filters for namespace_id and visibility. RBAC at the application layer validates permissions before any query.
Q: Can I adjust vector search accuracy?
A: Yes. The ef_search parameter controls the trade-off between recall and latency. Higher values increase accuracy but also increase latency.
Q: What happens if the Voyage API is unavailable?
A: Vectora automatically routes to gemini-embedding-2 as a fallback, maintaining the same vector dimension for compatibility with existing indices.
Q: How are embeddings updated when code changes?
A: The file watcher detects modifications, recalculates embeddings for affected chunks, and updates documents in Atlas atomically. Unmodified chunks remain unchanged.
Q: Does vector search work for documentation and comments?
A: Yes. The voyage-4 model is trained on code and technical documentation, capturing semantic similarity between comments, docstrings, and implementations.
Phrase to remember: “Embedding transforms code into a vector. HNSW finds similar ones. Namespace filters the scope. Context Engine orchestrates the result.”
External Linking
| Concept | Resource | Link |
|---|---|---|
| MongoDB Atlas | Atlas Vector Search Documentation | www.mongodb.com/docs/atlas/atlas-vector-search/ |
| Voyage Embeddings | Voyage Embeddings Documentation | docs.voyageai.com/docs/embeddings |
| Voyage Reranker | Voyage Reranker API | docs.voyageai.com/docs/reranker |
| AST Parsing | Tree-sitter Official Documentation | tree-sitter.github.io/tree-sitter/ |
| HNSW | Efficient and robust approximate nearest neighbor search | arxiv.org/abs/1603.09320 |
| JWT | RFC 7519: JSON Web Token Standard | datatracker.ietf.org/doc/html/rfc7519 |
Part of the Vectora ecosystem · Open Source (MIT) · Contributors