Local Reranker

Local Reranker provides intelligent retrieval without vector database dependency, combining deterministic search with semantic reranking. Ideal for mutable data, dynamic files, and scenarios where embedding cost is prohibitive.

The Problem: Why Embeddings Are Expensive

Embeddings are essential for semantic search, but have significant overhead with mutable data. Every file change requires re-embedding, re-indexing, and vector DB synchronization. For evolving documentation, Git repositories, or dynamic logs, this cost multiplies.

| Scenario | Embedding Cost | Sync Time | Viability |
|---|---|---|---|
| Static data | Single (~$0.10) | 1x | [x] Excellent |
| 10 data changes | 10x (~$1) | 10x | [!] Marginal |
| 100 data changes | 100x (~$10) | 100x | [ ] Unfeasible |
| Real-time logs | N/A | Continuous | [ ] Impossible |

Local Reranker solves this by inverting the paradigm: search first (deterministic, free), rank second (semantic, cheap).

The Solution: Late Binding of Relevance

Instead of upfront embeddings, Local Reranker uses late binding — semantic relevance is calculated only for candidate documents found by deterministic search (keyword, AST, regex).

Flow:

  1. Query Understanding — Gemini 3 Flash decomposes query into intent + entities
  2. search_local() — Deterministic search returns 50-100 candidates (fast, free)
  3. Reranking — Voyage Rerank 2.5 sorts top 10-20 by relevance (minimal cost)
  4. Response — Top-3 delivered to user

Result: 95% of embedding quality, 10% of the cost.
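The four numbered steps can be wired together as below. This is a minimal sketch with the stage functions injected as dependencies; the concrete `analyzeQuery`, `searchLocal`, and `rerank` implementations are detailed in the sections that follow, and the injected signatures here are our assumption.

```typescript
interface Candidate { file: string; snippet: string; score: number }
interface Ranked { file: string; snippet: string; relevanceScore: number }

// Hypothetical end-to-end pipeline: understand → search → rerank → top-3.
// The three stages are passed in so the flow itself stays testable.
async function retrieve(
  query: string,
  namespace: string,
  deps: {
    analyzeQuery: (q: string) => Promise<unknown>;
    searchLocal: (analysis: unknown, ns: string) => Promise<Candidate[]>;
    rerank: (q: string, candidates: Candidate[]) => Promise<Ranked[]>;
  },
): Promise<Ranked[]> {
  const analysis = await deps.analyzeQuery(query);                 // 1. Query Understanding
  const candidates = await deps.searchLocal(analysis, namespace);  // 2. Deterministic search
  const ranked = await deps.rerank(query, candidates);             // 3. Semantic rerank
  return ranked.slice(0, 3);                                       // 4. Top-3 to the user
}
```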

Detailed Architecture

Query Understanding with Gemini 3 Flash

interface QueryAnalysis {
  intent: "search" | "navigate" | "explain" | "compare";
  entities: string[];
  keywords: string[];
  context: "code" | "docs" | "logs" | "config";
  urgency: "immediate" | "thorough";
}

async function analyzeQuery(query: string): Promise<QueryAnalysis> {
  const prompt = `Analyze this query for a code documentation system:
    Query: "${query}"

    Return JSON: { intent, entities, keywords, context, urgency }`;

  const response = await gemini.generateContent(prompt);
  return JSON.parse(response.text());
}

Gemini breaks down “How do I validate JWTs with custom claims?” into:

  • intent: “explain”
  • entities: [“JWT”, “validation”, “custom claims”]
  • keywords: [“validate”, “JWT”, “claims”]
  • context: “code”
  • urgency: “immediate”
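In practice, LLMs sometimes wrap JSON in markdown fences or surrounding prose, so `JSON.parse(response.text())` can fail on otherwise valid answers. A defensive extraction sketch (the helper name and policy are ours, not part of any SDK):

```typescript
// Hypothetical helper: pull the first JSON object out of an LLM response,
// tolerating markdown code fences and extra prose around the object.
function extractJson<T>(raw: string): T {
  // Strip ```json / ``` fences if present.
  const unfenced = raw.replace(/`{3}(?:json)?/g, "").trim();
  // Fall back to the outermost { ... } span.
  const start = unfenced.indexOf("{");
  const end = unfenced.lastIndexOf("}");
  if (start === -1 || end === -1 || end < start) {
    throw new Error("No JSON object found in model output");
  }
  return JSON.parse(unfenced.slice(start, end + 1)) as T;
}
```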

search_local() — Deterministic Search

interface SearchCandidate {
  file: string;
  lineStart: number;
  lineEnd: number;
  snippet: string;
  score: number; // 0-100, based on match quality
}

async function searchLocal(query: QueryAnalysis, namespace: string): Promise<SearchCandidate[]> {
  // 1. Keyword match
  const keywordMatches = await searchKeywords(query.keywords, namespace);

  // 2. AST match (for code)
  let astMatches: SearchCandidate[] = [];
  if (query.context === "code") {
    astMatches = await searchAST(query.entities, namespace);
  }

  // 3. Regex match (flexible)
  const regexMatches = await searchRegex(query.keywords, namespace);

  // Merge and deduplicate
  const merged = mergeResults([...keywordMatches, ...astMatches, ...regexMatches]);

  // Top 50-100 candidates
  return merged.slice(0, 100);
}
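`mergeResults` is used above but not defined. One plausible implementation deduplicates hits by file and start line, keeps the best score per location, and sorts best-first so `slice(0, 100)` yields the strongest candidates; the exact merge policy is an assumption:

```typescript
interface SearchCandidate {
  file: string;
  lineStart: number;
  lineEnd: number;
  snippet: string;
  score: number; // 0-100, based on match quality
}

// Deduplicate by (file, lineStart), keeping the highest-scoring hit per
// location, then sort descending by score.
function mergeResults(candidates: SearchCandidate[]): SearchCandidate[] {
  const best = new Map<string, SearchCandidate>();
  for (const c of candidates) {
    const key = `${c.file}:${c.lineStart}`;
    const existing = best.get(key);
    if (!existing || c.score > existing.score) best.set(key, c);
  }
  return [...best.values()].sort((a, b) => b.score - a.score);
}
```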

Reranking with Voyage Rerank 2.5

interface RerankedResult {
  file: string;
  snippet: string;
  relevanceScore: number; // 0-1, confidence
  explanation: string;
}

async function rerank(query: string, candidates: SearchCandidate[]): Promise<RerankedResult[]> {
  // Voyage AI reranker (e.g. via a LlamaIndex integration)
  const reranker = new VoyageReranker({
    modelName: "rerank-2.5-v2",
    apiKey: process.env.VOYAGE_API_KEY,
  });

  const reranked = await reranker.rerank(
    query,
    candidates.map((c) => c.snippet),
    { topK: 10 },
  );

  return reranked.map((result) => ({
    file: candidates[result.index].file,
    snippet: candidates[result.index].snippet,
    relevanceScore: result.score,
    explanation: `Matched: ${query
      .split(" ")
      .filter((w) => candidates[result.index].snippet.toLowerCase().includes(w.toLowerCase()))
      .join(", ")}`,
  }));
}

Technical Comparison

| Aspect | Embeddings (VectorDB) | Local Reranker | Hybrid |
|---|---|---|---|
| Latency | 50-100ms | 10-30ms | 30-50ms (best UX) |
| Cost per query | $0.001-0.01 | $0.0001-0.0005 | $0.0005-0.005 |
| Quality (recall) | 98% | 85-90% | 96%+ |
| Setup time | 1-2 weeks | 1-2 hours | 2-3 hours |
| Infrastructure | Vector DB + LLM | LLM + Reranker | VectorDB + Reranker |
| Mutable data | [!] Complex | [x] Native | [x] Excellent |
| Scalability (10K docs) | [x] Excellent | [!] Marginal | [x] Excellent |

Hybrid Strategy: Tiered Retrieval

For maximum flexibility, use both:

    graph LR
    A["Query"] --> B["Query Understanding"]
    B --> C{Metadata?}
    C -->|Yes, mutable| D["Local Reranker"]
    C -->|No, static| E["Vector Search"]
    D --> F["Rerank"]
    E --> F
    F --> G["Top-3 Results"]
  

Logic:

  • If data is mutable (files, logs, dynamic DB) → Local Reranker
  • If data is static (published docs, trained models) → Vector Search
  • If both exist → Try Local Reranker first, fallback to Vector Search
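This routing can be sketched as a small dispatcher. `isMutable` and the two retrieval paths are placeholders injected for illustration; the fallback mirrors the `fallback_on_low_confidence` and `min_confidence` settings shown in the Configuration section:

```typescript
interface Result { file: string; snippet: string; relevanceScore: number }

// Hypothetical tiered dispatcher: mutable data goes to the Local Reranker,
// static data to vector search; weak local results fall back to vectors.
async function tieredRetrieve(
  query: string,
  namespace: string,
  deps: {
    isMutable: (ns: string) => boolean;
    localRerank: (q: string, ns: string) => Promise<Result[]>;
    vectorSearch: (q: string, ns: string) => Promise<Result[]>;
  },
  minConfidence = 0.7, // mirrors vector.min_confidence in the config
): Promise<Result[]> {
  if (deps.isMutable(namespace)) {
    const local = await deps.localRerank(query, namespace);
    if (local.length > 0 && local[0].relevanceScore >= minConfidence) {
      return local.slice(0, 3);
    }
    // fallback_on_low_confidence behaviour
    return (await deps.vectorSearch(query, namespace)).slice(0, 3);
  }
  return (await deps.vectorSearch(query, namespace)).slice(0, 3);
}
```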

Advanced Optimizations

Progressive Retrieval

Retrieve in stages, stopping when confidence is sufficient:

async function progressiveRetrieve(query: string) {
  // Stage 1: Top keywords (10ms)
  const stage1 = await searchKeywords(query, 20);
  const confidence1 = calculateConfidence(stage1);

  if (confidence1 > 0.8) return stage1; // 80% confidence, return early

  // Stage 2: AST + Regex (20ms)
  const [astResults, regexResults] = await Promise.all([searchAST(query), searchRegex(query)]);
  const merged = mergeResults([...stage1, ...astResults, ...regexResults]);
  const confidence2 = calculateConfidence(merged);

  if (confidence2 > 0.9) return merged;

  // Stage 3: Rerank top 50 (30ms)
  return await rerank(query, merged.slice(0, 50));
}

Reduces average latency from 30ms to 12ms (a 60% reduction).
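`calculateConfidence` is not defined above. A simple heuristic is to normalize the top deterministic score and blend in how many candidates agree on a strong match; the exact weighting below is an assumption, not a spec:

```typescript
interface Scored { score: number } // deterministic match score, 0-100

// Heuristic confidence in [0, 1]: the top score dominates, with a small bonus
// when several candidates score highly (agreement across search methods).
function calculateConfidence(candidates: Scored[]): number {
  if (candidates.length === 0) return 0;
  const sorted = [...candidates].sort((a, b) => b.score - a.score);
  const top = sorted[0].score / 100;
  const strong = sorted.filter((c) => c.score >= 70).length;
  const agreement = Math.min(strong / 5, 1); // saturates at 5 strong hits
  return Math.min(0.8 * top + 0.2 * agreement, 1);
}
```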

Lazy Embedding

Embed only top-10 candidates, not entire library:

async function lazyEmbed(candidates: SearchCandidate[], queryEmbedding: number[]) {
  // Only the top 10 candidates receive a full embedding
  const topK = 10;
  const toEmbed = candidates.slice(0, topK);

  const embeddings = await batchEmbed(toEmbed.map((c) => c.snippet));

  // Recalculate relevance with embeddings
  const semanticScores = await cosineSimilarity(queryEmbedding, embeddings);

  // Hybrid score: 60% deterministic + 40% semantic
  // (deterministic score is 0-100, so normalize it to the 0-1 cosine scale)
  return toEmbed.map((c, idx) => ({
    ...c,
    finalScore: 0.6 * (c.score / 100) + 0.4 * semanticScores[idx],
  }));
}
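`cosineSimilarity` above compares one query embedding against each candidate embedding. A plain implementation of that helper looks like the following (the batched one-vs-many signature is our assumption):

```typescript
// Cosine similarity of one query embedding against each candidate embedding.
// Returns one score in [-1, 1] per candidate; zero vectors score 0.
function cosineSimilarity(query: number[], candidates: number[][]): number[] {
  const dot = (a: number[], b: number[]) => a.reduce((s, v, i) => s + v * b[i], 0);
  const norm = (a: number[]) => Math.sqrt(dot(a, a));
  const qn = norm(query);
  return candidates.map((c) => {
    const cn = norm(c);
    return qn === 0 || cn === 0 ? 0 : dot(query, c) / (qn * cn);
  });
}
```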

Cost Analysis

For 1,000 queries/day on technical documentation:

Full VectorDB Embeddings:

  • Embedding: 1K queries × $0.01 = $10/day
  • Vector DB: ~$50/month (Pinecone Pro)
  • Total: ~$350/month

Local Reranker:

  • Reranking: 1K queries × $0.0003 = $0.30/day
  • LLM (Gemini): 1K queries × $0.0001 = $0.10/day
  • Total: ~$12/month

Savings: 97% cost reduction.
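The arithmetic above can be checked with a short script; the prices are the illustrative figures from this section, not quoted vendor rates:

```typescript
// Monthly cost model: per-query cost × queries/day × 30, plus fixed infra.
function monthlyCost(perQueryUsd: number, queriesPerDay: number, fixedMonthlyUsd = 0): number {
  return perQueryUsd * queriesPerDay * 30 + fixedMonthlyUsd;
}

const vectorDb = monthlyCost(0.01, 1000, 50);              // embeddings + Pinecone
const localReranker = monthlyCost(0.0003 + 0.0001, 1000);  // reranking + Gemini
const savings = 1 - localReranker / vectorDb;

console.log(vectorDb.toFixed(0), localReranker.toFixed(0), (savings * 100).toFixed(0) + "%");
// → "350 12 97%"
```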

Limitations and Mitigations

| Limitation | Root Cause | Mitigation |
|---|---|---|
| ~85% recall (vs 98% with VectorDB) | Deterministic search is imprecise | [x] Hybrid with VectorDB for critical recall |
| Performance degrades at 50K+ docs | O(n) keyword matching | [x] Secondary indexing (SQLite/PostgreSQL) |
| No pure semantic search | Late binding is always deterministic | [x] Lazy embedding for top-K |
| Fails in specialized domains | Generic LLM unfamiliar with the domain | [x] Fine-tune Query Understanding with examples |

Configuration

# vectora.yaml
retrieval:
  strategy: "hybrid" # hybrid, local_only, vector_only

  local:
    enabled: true
    max_candidates: 100 # Top-K before rerank
    rerank_top_k: 10 # Final results
    progressive: true # Confidence-based early exit
    confidence_threshold: 0.8 # Stop if conf > 80%

    search_methods:
      keywords:
        weight: 0.5
        enabled: true
      ast:
        weight: 0.3
        enabled: true
        languages: [typescript, python, go]
      regex:
        weight: 0.2
        enabled: true

  vector:
    enabled: true
    fallback_on_low_confidence: true
    min_confidence: 0.7

  reranker:
    model: "voyage-rerank-2.5-v2"
    api_key: ${VOYAGE_API_KEY}
    timeout_ms: 5000
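The `${VOYAGE_API_KEY}` placeholder implies environment-variable substitution when the config is loaded. A minimal sketch of that expansion step (the full YAML loader is out of scope here, and the helper name is ours):

```typescript
// Expand ${VAR} placeholders in raw config text using an env-style lookup,
// failing fast when a referenced variable is missing.
function expandEnv(text: string, env: Record<string, string | undefined>): string {
  return text.replace(/\$\{([A-Z0-9_]+)\}/g, (_match, name: string) => {
    const value = env[name];
    if (value === undefined) throw new Error(`Missing environment variable: ${name}`);
    return value;
  });
}
```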

Frequently Asked Questions

Q: Should I use Local Reranker or Vector Search? A: Local Reranker for mutable/dynamic data (97% savings). Vector Search for static data (maximum recall). Hybrid for both.

Q: What’s the real latency? A: Local Reranker: 10-30ms. Vector Search: 50-100ms. Hybrid: 30-50ms (best trade-off).

Q: Can I use it with confidential data? A: Yes. Documents are never sent to external embedding services. The deterministic search (keyword, AST, regex) runs entirely locally.

Q: Does it work across programming languages? A: Yes. Query Understanding works with any language. AST matching supports TypeScript, Python, Go (extensible).

Q: What if deterministic search fails? A: Hybrid strategy with Vector DB fallback. Configure fallback_on_low_confidence: true.

Next Steps

  1. Integrate — Add Local Reranker to your namespace in 1-2 hours
  2. Test — Compare latency and cost vs your current solution
  3. Optimize — Adjust confidence_threshold and max_candidates for your use case
  4. Scale — When critical recall needed, enable Hybrid Strategy with VectorDB
  5. Monitor — Track metrics via vectora analytics (confidence, latency, cost)

Want to explore Local Reranker? Open a Discussion