Gemini API
Overview
OWN APP: Vectora offers deep integration with Gemini 3 Flash via its proprietary REST API. Use Gemini as the “reasoning engine” for code analysis, code review, and code generation with Vectora context.
Important
This page covers Vectora’s own Gemini API integration, as opposed to the MCP Protocol or the Extension. Choose based on your workflow: code analysis, code review, or auto documentation.
Basic Configuration
Get Gemini API Key
- Go to Google AI Studio
- Click “Create API Key”
- Copy the key
Configure in Vectora
# Option A: Interactive CLI
vectora config set --key GEMINI_API_KEY
# Option B: Via .env
echo "GEMINI_API_KEY=AIza..." >> .env
# Option C: Via config.yaml
# vectora.config.yaml
providers:
  llm:
    name: "gemini"
    model: "gemini-3-flash"
    api_key: "${GEMINI_API_KEY}"
Verify
vectora config list
# Should show GEMINI_API_KEY: ••••••••••
Available Models
| Model | Tokens | Latency | Cost | Use Case |
|---|---|---|---|---|
| gemini-3-flash | 1M | <500ms | Free* | Default, fast analysis |
| gemini-2-flash | 1M | <500ms | $$ | Legacy (deprecated) |
| gemini-pro | 32K | <1s | $$ | Deep analysis |
| gemini-vision | 4K | <2s | $$ | Visual analysis (screenshots) |
*Free tier: 60 req/min, 1.5M tokens/month
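To stay within the 60 req/min free-tier limit, requests can be gated client-side before they ever hit the API. The sketch below is illustrative only; the `RateLimiter` class is our assumption and not part of Vectora or the Gemini SDK.

```typescript
// Minimal sliding-window rate limiter for the 60 req/min free tier.
// Illustrative sketch -- not part of Vectora's API.
class RateLimiter {
  private timestamps: number[] = [];

  constructor(
    private maxRequests: number, // e.g. 60 for the free tier
    private windowMs: number,    // e.g. 60_000 (one minute)
  ) {}

  // Returns true if a request may be sent now, false if it should wait.
  tryAcquire(now: number = Date.now()): boolean {
    // Drop timestamps that have fallen out of the rolling window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.maxRequests) return false;
    this.timestamps.push(now);
    return true;
  }
}

// Usage: gate Gemini calls at 60 requests per rolling minute.
const limiter = new RateLimiter(60, 60_000);
```

When `tryAcquire` returns false, back off and retry rather than burning a request against the quota.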
Selecting Model
# vectora.config.yaml
providers:
  llm:
    model: "gemini-3-flash"
    # Fallback if primary fails
    fallback_model: "gemini-pro"
    fallback_on:
      - "rate_limit_exceeded"
      - "timeout"
Real-World Workflows
The workflows below illustrate how Gemini 3 Flash can be used with Vectora context for deep code analysis, auto documentation, and performance optimization.
Workflow 1: Automatic Code Review (Architecture)
Scenario: Review new cache module before merge
vectora review \
--file src/cache/redis.ts \
--context "cache patterns in project" \
--criteria "security, performance, testability" \
--model gemini-pro \
--output review.md
Gemini Output (with Vectora context):
# Code Review: src/cache/redis.ts
## Security
Redis PASSWORD in .env (not hardcoded)
TTL implemented (prevents stale cache)
No rate limiting for invalidation
Compared with: src/cache/memory.ts (better implemented)
## Performance
Cache hit rate 89% (project has 3 similar implementations)
Avoids N+1 queries (pattern followed in order.service.ts)
## Testability
Missing RedisClient mocks
See example at: src/__tests__/cache.mock.ts:23
## Recommendations
1. Add event-based invalidation
(See patterns in src/events/cache-invalidator.ts)
2. Implement circuit breaker
(Similar to: src/resilience/circuit-breaker.ts)
3. Add fallback tests when Redis fails
Test example: src/__tests__/cache-fallback.test.ts
## Alignment
This code follows 95% of project patterns.
Ready to merge after feedback above.
Workflow 2: Automatic Documentation (API)
Scenario: Generate docs for new endpoints
vectora generate-docs \
--from "src/routes/api.ts" \
--type "openapi" \
--llm gemini-3-flash \
--output docs/api.openapi.yaml
Example output:
openapi: 3.0.0
info:
  title: Vectora API
  version: 1.0.0
paths:
  /search:
    post:
      summary: "Semantic search (generated by Gemini + context)"
      description: |
        Search by semantic similarity using embeddings.
        Implementation: src/routes/search.ts
        Uses: Context Engine with Voyage 4
        Latency: ~230ms
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/SearchRequest"
      responses:
        "200":
          description: Success
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/SearchResponse"
        "429":
          description: "Rate limited (see: src/middleware/rate-limit.ts)"
Workflow 3: Performance Analysis (Investigation)
Scenario: API is slow, need to identify bottlenecks
vectora analyze-performance \
--query "Search is taking 2s. What's the problem?" \
--include-logs "vectora.log" \
--model gemini-pro \
--output perf-report.mdGemini Analysis:
# Performance Analysis Report
## Problem Identified
Search taking 2000ms when SLA is 300ms.
## Root Cause
Reranking via Voyage Rerank 2.5 taking 1.2s.
**Evidence in code:**
- src/context-engine.ts:145
`const reranked = await reranker.rank({ topK: 100, maxTokens: 2048 })`
- Problem: maxTokens too high
- Comparison: another project uses maxTokens=512 (src/search/optimized.ts:23)
Result: 180ms vs your 1200ms
## Solutions
1. **Quick win** (implement today)
Reduce maxTokens from 2048 → 512
Expected result: 1.8s → 0.8s
2. **Medium-term** (next sprint)
Implement embedding cache
See pattern at: src/cache/embedding-cache.ts
3. **Long-term**
Use Gemini 3 Flash as reranker instead of Voyage
Trade-off: faster but less precise
Test: src/__tests__/reranker-comparison.ts
## Impact
- Current: p95=2000ms, p99=3000ms
- After fix #1: p95=800ms, p99=1500ms
- After fix #2: p95=400ms, p99=600ms
## Recommendation
Apply fix #1 today (a 10-minute change), measure, then decide on #2.
Workflow 4: Type Generation (TypeScript)
Scenario: API returns complex JSON, need types
vectora generate-types \
--from "src/responses/" \
--target "typescript" \
--model gemini-3-flash \
--output "src/types/generated.ts"
Output (following common patterns in the project):
// src/types/generated.ts
// Auto-generated by Gemini via Vectora Context Engine
export interface SearchResponse {
  chunks: SearchChunk[];
  metadata: {
    retrieval_precision: number;
    latency_ms: number;
    total_searched: number;
  };
  // Pattern seen in src/types/search.ts
}

export interface SearchChunk {
  file: string;
  line_start: number;
  line_end: number;
  content: string;
  precision: number; // 0.0 - 1.0
  // Compatible with: src/__tests__/search.mock.ts
}

// Generated types 98% compatible with existing codebase
// Follows conventions from src/types/*
// Includes guardrail types (src/security/guardian.types.ts)
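Generated interfaces exist only at compile time, so API responses are still untyped JSON at runtime. A runtime guard can validate a payload before casting it to the generated shape. This is a hedged sketch: the `isSearchChunk` guard is our addition, not something Vectora generates; the interface is copied from the output above.

```typescript
// Runtime check for the generated SearchChunk shape.
// Sketch only: Vectora does not emit these guards itself.
interface SearchChunk {
  file: string;
  line_start: number;
  line_end: number;
  content: string;
  precision: number; // 0.0 - 1.0
}

function isSearchChunk(value: unknown): value is SearchChunk {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.file === "string" &&
    typeof v.line_start === "number" &&
    typeof v.line_end === "number" &&
    typeof v.content === "string" &&
    typeof v.precision === "number" &&
    v.precision >= 0 &&
    v.precision <= 1
  );
}
```

Validated chunks can then be used with full type safety instead of blind `as` casts.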
Advanced Configuration
Custom System Prompt
providers:
  llm:
    model: "gemini-3-flash"
    system_prompt: |
      You are an expert in TypeScript code.
      Always cite line numbers when responding.
      Keep explanations concise.
      Focus on security and performance.
Temperature & Parameters
providers:
  llm:
    model: "gemini-3-flash"
    temperature: 0.7   # 0 = deterministic, 1 = creative
    top_p: 0.95        # Nucleus sampling
    max_tokens: 2048
    stop_sequences:
      - "\n---\n"
Streaming
For real-time responses:
vectora analyze \
  --query "Explain the authentication module" \
  --model gemini-pro \
  --stream  # Output in real-time
Context Engine Integration
Automatic Context Passing
Vectora automatically passes Context Engine chunks to Gemini:
Your prompt: "Why is this test failing?"
Context sent to Gemini:
{
  "chunks": [
    {"file": "spec/auth.test.ts", "content": "..."},
    {"file": "src/auth/jwt.ts", "content": "..."},
    {"file": "src/auth/guards.ts", "content": "..."}
  ],
  "precision": 0.78,
  "retrieval_time_ms": 234
}
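A payload like the one above has to be flattened into a single prompt before the model call. The sketch below shows one plausible way to do that; `buildPrompt` and its output format are assumptions for illustration, not Vectora internals.

```typescript
// Sketch: flatten retrieved context chunks plus the user question
// into one prompt string. The format is illustrative, not Vectora's.
interface ContextChunk {
  file: string;
  content: string;
}

function buildPrompt(chunks: ContextChunk[], question: string): string {
  const contextBlock = chunks
    .map((c) => `// File: ${c.file}\n${c.content}`)
    .join("\n\n");
  return `Context from the codebase:\n\n${contextBlock}\n\nQuestion: ${question}`;
}
```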
Gemini reads context + your prompt → response
Reranking with Gemini
Optionally use Gemini as reranker instead of Voyage:
providers:
  reranker:
    name: "gemini"  # Instead of "voyage"
    model: "gemini-3-flash"
    # Gemini evaluates relevance of each chunk
    # More accurate, but slower
Safety & Guardrails
Built-in Guardrails
Gemini has protections against:
- Generating malicious code
- Leaking sensitive data
- Offensive content
Custom Safety Rules
providers:
  llm:
    safety:
      block_on_sensitive_patterns:
        - "api_key"
        - "password"
        - "secret"
      block_keywords:
        - "exploit"
        - "malware"
Audit Logging
VECTORA_LOG_LEVEL=debug vectora analyze "..."
# Logs include:
# [Gemini Request] tokens=500, prompt_hash=abc123
# [Gemini Response] tokens=200, safety_rating=safe
Pricing & Quotas
Free Tier
- Rate: 60 requests/minute
- Tokens: 1.5M tokens/month
- Models: gemini-3-flash only
Pro Tier
- Rate: 2000 req/min
- Tokens: Unlimited
- Models: All
Upgrade at: Google Cloud Console
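The free-tier numbers above translate into a simple client-side budget check. This helper is a sketch of our own, not part of Vectora; only the 1.5M-token monthly limit comes from the tier description.

```typescript
// Sketch: estimate remaining free-tier token quota for the month.
// The 1_500_000 limit matches the free tier described above;
// the helper itself is illustrative, not a Vectora API.
const FREE_TIER_MONTHLY_TOKENS = 1_500_000;

function remainingQuota(tokensUsed: number): { remaining: number; usedPct: number } {
  const remaining = Math.max(0, FREE_TIER_MONTHLY_TOKENS - tokensUsed);
  const usedPct = Math.min(100, (tokensUsed / FREE_TIER_MONTHLY_TOKENS) * 100);
  return { remaining, usedPct };
}
```

For example, the 1,234,567 tokens reported by `vectora cost-report` would leave 265,433 tokens (about 82% of the quota used).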
Monitoring Costs
vectora cost-report --period month
# Output:
# Gemini API usage:
# gemini-3-flash: $0.00 (within free tier)
# Tokens: 1,234,567 / 1,500,000
Troubleshooting
“API key invalid”
# Check if key is configured
echo $GEMINI_API_KEY
# If empty, configure
export GEMINI_API_KEY="your-value"
# Test if valid
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash:generateContent" \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -d '{"contents": [{"parts": [{"text": "test"}]}]}'
“Quota exceeded”
Cause: Reached free tier limit.
Solution:
- Upgrade to Pro plan
- Or wait for monthly reset (1st of month)
- Or use local fallback model:
providers:
  llm:
    name: "gemini"
    fallback_model: "local-mistral"  # Ollama local
“Model not found”
Check available models:
vectora models list
# Shows: gemini-3-flash, gemini-pro, ...
Comparison: Gemini vs Alternatives
| Model | Cost | Context |
|---|---|---|
| Gemini 3 Flash | Free | 1M tokens |
| Claude 3 Opus | $$ | 200K tokens |
| GPT-4 | $$ | 128K tokens |
| Llama 2 (local) | Free | 4K tokens |
Recommendation: Gemini is best for RAG (fast + efficient).
Next: MCP Tools Reference
Part of Vectora ecosystem · Open Source (MIT)