Local Reranker
Local Reranker provides intelligent retrieval without vector database dependency, combining deterministic search with semantic reranking. Ideal for mutable data, dynamic files, and scenarios where embedding cost is prohibitive.
The Problem: Why Embeddings Are Expensive
Embeddings are essential for semantic search, but have significant overhead with mutable data. Every file change requires re-embedding, re-indexing, and vector DB synchronization. For evolving documentation, Git repositories, or dynamic logs, this cost multiplies.
| Scenario | Embedding Cost | Sync Time | Viability |
|---|---|---|---|
| Static data | One-time (~$0.10) | 1x | [x] Excellent |
| 10 data changes | 10x (~$1) | 10x | [!] Marginal |
| 100 data changes | 100x (~$10) | 100x | [ ] Infeasible |
| Real-time logs | N/A | Continuous | [ ] Impossible |
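The table's multiples are easy to sanity-check with a throwaway helper (the ~$0.10 per indexing pass is the table's assumption, not published pricing):

```typescript
// Re-embedding spend after `changes` data changes, assuming each change
// forces a full re-index at a flat cost per pass (~$0.10 in the table).
function reembedCost(changes: number, costPerPass = 0.1): number {
  return changes * costPerPass;
}
```

10 changes ≈ $1, 100 changes ≈ $10 — the linear growth is what makes frequently changing data so expensive to keep embedded.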
Local Reranker solves this by inverting the paradigm: search first (deterministic, free), rank second (semantic, cheap).
The Solution: Late Binding of Relevance
Instead of upfront embeddings, Local Reranker uses late binding — semantic relevance is calculated only for candidate documents found by deterministic search (keyword, AST, regex).
Flow:
- Query Understanding — Gemini 3 Flash decomposes query into intent + entities
- search_local() — Deterministic search returns 50-100 candidates (fast, free)
- Reranking — Voyage Rerank 2.5 sorts top 10-20 by relevance (minimal cost)
- Response — Top-3 delivered to user
Result: most of the quality of embedding-based retrieval (roughly 85-95% recall) at about 3-10% of the cost.
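The four stages can be wired together as a small pipeline. This is a sketch, not the project's actual API: stage implementations are injected so the flow can run against stubs, and the stage names are placeholders for the components described below.

```typescript
interface Stage<I, O> {
  (input: I): Promise<O>;
}

interface Pipeline {
  analyze: Stage<string, { keywords: string[] }>;        // Query Understanding
  search: Stage<{ keywords: string[] }, string[]>;       // 50-100 candidates
  rerank: (query: string, candidates: string[]) => Promise<string[]>;
}

// Search first (deterministic, free), rank second (semantic, cheap),
// then keep only the top-3 for the response.
async function retrieve(query: string, p: Pipeline): Promise<string[]> {
  const analysis = await p.analyze(query);
  const candidates = await p.search(analysis);
  const ranked = await p.rerank(query, candidates);
  return ranked.slice(0, 3);
}
```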
Detailed Architecture
Query Understanding with Gemini 3 Flash
```typescript
interface QueryAnalysis {
  intent: "search" | "navigate" | "explain" | "compare";
  entities: string[];
  keywords: string[];
  context: "code" | "docs" | "logs" | "config";
  urgency: "immediate" | "thorough";
}

// `gemini` is an initialized Gemini client (model instance) in scope.
async function analyzeQuery(query: string): Promise<QueryAnalysis> {
  const prompt = `Analyze this query for a code documentation system:
Query: "${query}"
Return JSON: { intent, entities, keywords, context, urgency }`;
  const response = await gemini.generateContent(prompt);
  return JSON.parse(response.text());
}
```

Gemini breaks down “How do I validate JWTs with custom claims?” into:
- intent: “explain”
- entities: [“JWT”, “validation”, “custom claims”]
- keywords: [“validate”, “JWT”, “claims”]
- context: “code”
- urgency: “immediate”
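For offline tests or when the Gemini call is unavailable, a purely heuristic stand-in can produce the same QueryAnalysis shape. This fallback is an illustration, not part of the system above; the stopword list and the default field values are arbitrary.

```typescript
// Minimal heuristic stand-in for analyzeQuery: no LLM, just stopword
// stripping. Defaults to a generic search-over-code analysis.
const STOPWORDS = new Set(["how", "do", "i", "a", "the", "with", "to", "is"]);

function analyzeQueryOffline(query: string) {
  const keywords = query
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, " ")
    .split(/\s+/)
    .filter((w) => w && !STOPWORDS.has(w));
  return {
    intent: "search" as const,
    entities: keywords,
    keywords,
    context: "code" as const,
    urgency: "immediate" as const,
  };
}
```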
search_local() — Deterministic Search
```typescript
interface SearchCandidate {
  file: string;
  lineStart: number;
  lineEnd: number;
  snippet: string;
  score: number; // 0-100, based on match quality
}
```
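searchLocal below leans on a mergeResults helper that the snippet leaves undefined. One plausible sketch deduplicates by file span and keeps the best score (the Candidate alias mirrors SearchCandidate above; the dedup key and ordering are assumptions):

```typescript
// Works over the SearchCandidate shape defined above.
type Candidate = {
  file: string;
  lineStart: number;
  lineEnd: number;
  snippet: string;
  score: number;
};

// Deduplicate candidates that point at the same file span, keeping the
// highest score; sort descending so slice(0, 100) takes the best hits.
function mergeResults(candidates: Candidate[]): Candidate[] {
  const best = new Map<string, Candidate>();
  for (const c of candidates) {
    const key = `${c.file}:${c.lineStart}-${c.lineEnd}`;
    const prev = best.get(key);
    if (!prev || c.score > prev.score) best.set(key, c);
  }
  return [...best.values()].sort((a, b) => b.score - a.score);
}
```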
```typescript
async function searchLocal(
  query: QueryAnalysis,
  namespace: string,
): Promise<SearchCandidate[]> {
  // 1. Keyword match
  const keywordMatches = await searchKeywords(query.keywords, namespace);

  // 2. AST match (for code)
  let astMatches: SearchCandidate[] = [];
  if (query.context === "code") {
    astMatches = await searchAST(query.entities, namespace);
  }

  // 3. Regex match (flexible)
  const regexMatches = await searchRegex(query.keywords, namespace);

  // Merge and deduplicate (same file span => keep the best-scoring hit)
  const merged = mergeResults([...keywordMatches, ...astMatches, ...regexMatches]);

  // Top 50-100 candidates
  return merged.slice(0, 100);
}
```

Reranking with Voyage Rerank 2.5
```typescript
interface RerankedResult {
  file: string;
  snippet: string;
  relevanceScore: number; // 0-1, confidence
  explanation: string;
}

async function rerank(
  query: string,
  candidates: SearchCandidate[],
): Promise<RerankedResult[]> {
  // Voyage AI reranker client (e.g. via the LlamaIndex integration)
  const reranker = new VoyageReranker({
    modelName: "rerank-2.5-v2",
    apiKey: process.env.VOYAGE_API_KEY,
  });

  const reranked = await reranker.rerank(
    query,
    candidates.map((c) => c.snippet),
    { topK: 10 },
  );

  return reranked.map((result) => ({
    file: candidates[result.index].file,
    snippet: candidates[result.index].snippet,
    relevanceScore: result.score,
    explanation: `Matched: ${query
      .split(" ")
      .filter((w) => candidates[result.index].snippet.toLowerCase().includes(w.toLowerCase()))
      .join(", ")}`,
  }));
}
```

Technical Comparison
| Aspect | Embeddings (VectorDB) | Local Reranker | Hybrid |
|---|---|---|---|
| Latency | 50-100ms | 10-30ms | 30-50ms (best UX) |
| Cost per query | $0.001-0.01 | $0.0001-0.0005 | $0.0005-0.005 |
| Quality (recall) | 98% | 85-90% | 96%+ |
| Setup time | 1-2 weeks | 1-2 hours | 2-3 hours |
| Infrastructure | Vector DB + LLM | LLM + Reranker | VectorDB + Reranker |
| Mutable data | [!] Complex | [x] Native | [x] Excellent |
| Scalability (10K docs) | [x] Excellent | [!] Marginal | [x] Excellent |
Hybrid Strategy: Tiered Retrieval
For maximum flexibility, use both:
```mermaid
graph LR
    A["Query"] --> B["Query Understanding"]
    B --> C{"Mutable data?"}
    C -->|Yes| D["Local Reranker"]
    C -->|No, static| E["Vector Search"]
    D --> F["Rerank"]
    E --> F
    F --> G["Top-3 Results"]
```
Logic:
- If data is mutable (files, logs, dynamic DB) → Local Reranker
- If data is static (published docs, trained models) → Vector Search
- If both exist → Try Local Reranker first, fallback to Vector Search
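A sketch of that routing logic in code (the Retriever signature, the isMutable flag, and the 0.7 default are illustrative; the threshold mirrors min_confidence in the configuration shown later):

```typescript
type Retriever = (
  query: string,
) => Promise<{ results: string[]; confidence: number }>;

// Route mutable data to Local Reranker first; fall back to Vector Search
// when confidence is low. Static data goes straight to Vector Search.
async function route(
  query: string,
  isMutable: boolean,
  localReranker: Retriever,
  vectorSearch: Retriever,
  minConfidence = 0.7,
): Promise<string[]> {
  if (!isMutable) return (await vectorSearch(query)).results;
  const local = await localReranker(query);
  if (local.confidence >= minConfidence) return local.results;
  return (await vectorSearch(query)).results; // low-confidence fallback
}
```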
Advanced Optimizations
Progressive Retrieval
Retrieve in stages, stopping when confidence is sufficient:
```typescript
// Search helpers appear here with simplified, query-string signatures.
async function progressiveRetrieve(query: string) {
  // Stage 1: Top keywords (10ms)
  const stage1 = await searchKeywords(query, 20);
  if (calculateConfidence(stage1) > 0.8) return stage1; // confident enough, stop

  // Stage 2: AST + Regex (20ms)
  const stage2 = await Promise.all([searchAST(query), searchRegex(query)]);
  const merged = mergeResults([...stage1, ...stage2.flat()]);
  if (calculateConfidence(merged) > 0.9) return merged;

  // Stage 3: Rerank top 50 (30ms)
  return rerank(query, merged.slice(0, 50));
}
```

Reduces average latency from 30ms to 12ms (60% faster).
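progressiveRetrieve assumes a calculateConfidence helper. One simple heuristic, shown here over the candidates' 0-100 match scores, blends the top score's strength with its margin over the runner-up; the 0.7/0.3 weights are arbitrary:

```typescript
// Confidence heuristic: a strong top score plus clear separation from the
// runner-up suggests the deterministic stage already found the answer.
function calculateConfidence(scores: number[]): number {
  if (scores.length === 0) return 0;
  const sorted = [...scores].sort((a, b) => b - a);
  const top = sorted[0] / 100; // absolute strength
  const margin = sorted.length > 1 ? (sorted[0] - sorted[1]) / 100 : 1; // separation
  return Math.min(1, 0.7 * top + 0.3 * margin);
}
```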
Lazy Embedding
Embed only top-10 candidates, not entire library:
```typescript
// `batchEmbed` and `cosineSimilarity` are assumed embedding helpers.
async function lazyEmbed(queryEmbedding: number[], candidates: SearchCandidate[]) {
  // Only the top 10 candidates receive a full embedding
  const topK = 10;
  const toEmbed = candidates.slice(0, topK);
  const embeddings = await batchEmbed(toEmbed.map((c) => c.snippet));

  // Recalculate relevance against the query embedding
  const semanticScores = embeddings.map((e) => cosineSimilarity(queryEmbedding, e));

  // Hybrid score: 60% deterministic + 40% semantic
  // (normalize the 0-100 deterministic score before mixing)
  return toEmbed.map((c, idx) => ({
    ...c,
    finalScore: 0.6 * (c.score / 100) + 0.4 * semanticScores[idx],
  }));
}
```

Cost Analysis
For 1,000 queries/day on technical documentation:
Full VectorDB Embeddings:
- Embedding: 1K queries × $0.01 = $10/day
- Vector DB: ~$50/month (Pinecone Pro)
- Total: ~$350/month
Local Reranker:
- Reranking: 1K queries × $0.0003 = $0.30/day
- LLM (Gemini): 1K queries × $0.0001 = $0.10/day
- Total: ~$12/month
Savings: 97% cost reduction.
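The monthly totals can be reproduced directly (30-day month assumed; the per-query prices are the estimates above, not published rates):

```typescript
// Monthly cost at 1,000 queries/day over a 30-day month.
const QUERIES_PER_DAY = 1_000;
const DAYS = 30;

const vectorDb = QUERIES_PER_DAY * 0.01 * DAYS + 50; // $10/day embeddings + $50 DB
const localReranker = QUERIES_PER_DAY * (0.0003 + 0.0001) * DAYS; // rerank + Gemini

const savings = 1 - localReranker / vectorDb; // ≈ 0.97
```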
Limitations and Mitigations
| Limitation | Root Cause | Mitigation |
|---|---|---|
| ~85% recall (vs 98% with a VectorDB) | Deterministic search is imprecise | [x] Hybrid with VectorDB when recall is critical |
| Performance degrades beyond 50K docs | O(n) keyword matching | [x] Secondary indexing (SQLite/PostgreSQL) |
| No pure semantic search | Late binding is always deterministic | [x] Lazy embedding for top-K |
| Weak in specialized domains | Generic LLM unfamiliar with domain terms | [x] Fine-tune Query Understanding with examples |
Configuration
```yaml
# vectora.yaml
retrieval:
  strategy: "hybrid" # hybrid, local_only, vector_only

  local:
    enabled: true
    max_candidates: 100 # Top-K before rerank
    rerank_top_k: 10 # Final results
    progressive: true # Confidence-based early exit
    confidence_threshold: 0.8 # Stop if conf > 80%

    search_methods:
      keywords:
        weight: 0.5
        enabled: true
      ast:
        weight: 0.3
        enabled: true
        languages: [typescript, python, go]
      regex:
        weight: 0.2
        enabled: true

  vector:
    enabled: true
    fallback_on_low_confidence: true
    min_confidence: 0.7

  reranker:
    model: "voyage-rerank-2.5-v2"
    api_key: ${VOYAGE_API_KEY}
    timeout_ms: 5000
```

Frequently Asked Questions
Q: Should I use Local Reranker or Vector Search?
A: Local Reranker for mutable/dynamic data (97% savings). Vector Search for static data (maximum recall). Hybrid for both.
Q: What’s the real latency?
A: Local Reranker: 10-30ms. Vector Search: 50-100ms. Hybrid: 30-50ms (best trade-off).
Q: Can I use it with confidential data?
A: Mostly. Deterministic search (keyword, AST, regex) runs locally and no corpus-wide embeddings are sent out; only the shortlisted snippets are sent to the reranker API.
Q: Does it work across programming languages?
A: Yes. Query Understanding works with any language. AST matching supports TypeScript, Python, Go (extensible).
Q: What if deterministic search fails?
A: Hybrid strategy with Vector DB fallback. Configure fallback_on_low_confidence: true.
Next Steps
- Integrate — Add Local Reranker to your namespace in 1-2 hours
- Test — Compare latency and cost vs your current solution
- Optimize — Adjust confidence_threshold and max_candidates for your use case
- Scale — When critical recall is needed, enable the Hybrid Strategy with VectorDB
- Monitor — Track metrics via vectora analytics (confidence, latency, cost)
Want to explore Local Reranker? Open a Discussion