Gemini API
Overview
OWN APP: Vectora offers deep integration with Gemini 3 Flash via its proprietary REST API. Use Gemini as the “reasoning engine” for code analysis, code review, and code generation with Vectora context.
Important
This page covers Vectora’s own Gemini API integration, as opposed to the MCP Protocol or the Extension. Choose based on your workflow: code analysis, code review, or auto documentation.
Basic Configuration
Get Gemini API Key
- Go to Google AI Studio
- Click “Create API Key”
- Copy the key
Configure in Vectora
# Option A: Interactive CLI
vectora config set --key GEMINI_API_KEY
# Option B: Via .env
echo "GEMINI_API_KEY=AIza..." >> .env
# Option C: Via config.yaml
# vectora.config.yaml
providers:
  llm:
    name: "gemini"
    model: "gemini-3-flash"
    api_key: "${GEMINI_API_KEY}"
Verify
vectora config list
# Should show GEMINI_API_KEY: ••••••••••
Available Models
| Model | Tokens | Latency | Cost | Use Case |
|---|---|---|---|---|
| gemini-3-flash | 1M | <500ms | Free* | Default, fast analysis |
| gemini-2-flash | 1M | <500ms | $$ | Legacy (deprecated) |
| gemini-pro | 32K | <1s | $$ | Deep analysis |
| gemini-vision | 4K | <2s | $$ | Visual analysis (screenshots) |
*Free tier: 60 req/min, 1.5M tokens/month
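To stay within the 60 req/min free-tier limit, requests can be gated client-side before they ever hit the API. The sketch below is illustrative only; the `RateLimiter` class is our assumption and not part of Vectora or the Gemini SDK.

```typescript
// Minimal sliding-window rate limiter for the 60 req/min free tier.
// Illustrative sketch -- not part of Vectora's API.
class RateLimiter {
  private timestamps: number[] = [];

  constructor(
    private maxRequests: number, // e.g. 60 for the free tier
    private windowMs: number,    // e.g. 60_000 (one minute)
  ) {}

  // Returns true if a request may be sent now, false if it should wait.
  tryAcquire(now: number = Date.now()): boolean {
    // Drop timestamps that have fallen out of the rolling window.
    this.timestamps = this.timestamps.filter((t) => now - t < this.windowMs);
    if (this.timestamps.length >= this.maxRequests) return false;
    this.timestamps.push(now);
    return true;
  }
}

// Usage: gate Gemini calls at 60 requests per rolling minute.
const limiter = new RateLimiter(60, 60_000);
```

When `tryAcquire` returns false, back off and retry rather than burning a request against the quota.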
Selecting Model
# vectora.config.yaml
providers:
  llm:
    model: "gemini-3-flash"
    # Fallback if primary fails
    fallback_model: "gemini-pro"
    fallback_on:
      - "rate_limit_exceeded"
      - "timeout"
Real-World Workflows
The workflows below illustrate how Gemini 3 Flash can be used with Vectora context for deep code analysis, auto documentation, and performance optimization.
Workflow 1: Automatic Code Review (Architecture)
Scenario: Review new cache module before merge
vectora review \
--file src/cache/redis.ts \
--context "cache patterns in project" \
--criteria "security, performance, testability" \
--model gemini-pro \
--output review.md
Gemini Output (with Vectora context):
# Code Review: src/cache/redis.ts
## Security
Redis PASSWORD in .env (not hardcoded)
TTL implemented (prevents stale cache)
No rate limiting for invalidation
Compared with: src/cache/memory.ts (better implemented)
## Performance
Cache hit rate 89% (project has 3 similar implementations)
Avoids N+1 queries (pattern followed in order.service.ts)
## Testability
Missing RedisClient mocks
See example at: src/__tests__/cache.mock.ts:23
## Recommendations
1. Add event-based invalidation
(See patterns in src/events/cache-invalidator.ts)
2. Implement circuit breaker
(Similar to: src/resilience/circuit-breaker.ts)
3. Add fallback tests when Redis fails
Test example: src/__tests__/cache-fallback.test.ts
## Alignment
This code follows 95% of project patterns.
Ready to merge after feedback above.
Workflow 2: Automatic Documentation (API)
Scenario: Generate docs for new endpoints
vectora generate-docs \
--from "src/routes/api.ts" \
--type "openapi" \
--llm gemini-3-flash \
--output docs/api.openapi.yaml
Example output:
openapi: 3.0.0
info:
  title: Vectora API
  version: 1.0.0
paths:
  /search:
    post:
      summary: "Semantic search (generated by Gemini + context)"
      description: |
        Search by semantic similarity using embeddings.
        Implementation: src/routes/search.ts
        Uses: Context Engine with Voyage 4
        Latency: ~230ms
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/SearchRequest"
      responses:
        "200":
          description: Success
          content:
            application/json:
              schema:
                $ref: "#/components/schemas/SearchResponse"
        "429":
          description: "Rate limited (see: src/middleware/rate-limit.ts)"
Workflow 3: Performance Analysis (Investigation)
Scenario: API is slow, need to identify bottlenecks
vectora analyze-performance \
--query "Search is taking 2s. What's the problem?" \
--include-logs "vectora.log" \
--model gemini-pro \
--output perf-report.mdGemini Analysis:
# Performance Analysis Report
## Problem Identified
Search taking 2000ms when SLA is 300ms.
## Root Cause
Reranking via Voyage Rerank 2.5 taking 1.2s.
**Evidence in code:**
- src/context-engine.ts:145
`const reranked = await reranker.rank({ topK: 100, maxTokens: 2048 })`
- Problem: maxTokens too high
- Comparison: another project uses maxTokens=512 (src/search/optimized.ts:23)
Result: 180ms vs your 1200ms
## Solutions
1. **Quick win** (implement today)
Reduce maxTokens from 2048 → 512
Expected result: 1.8s → 0.8s
2. **Medium-term** (next sprint)
Implement embedding cache
See pattern at: src/cache/embedding-cache.ts
3. **Long-term**
Use Gemini 3 Flash as reranker instead of Voyage
Trade-off: faster but less precise
Test: src/__tests__/reranker-comparison.ts
## Impact
- Current: p95=2000ms, p99=3000ms
- After fix #1: p95=800ms, p99=1500ms
- After fix #2: p95=400ms, p99=600ms
## Recommendation
Apply fix #1 today (a 10-minute change), measure, then decide on #2.
Workflow 4: Type Generation (TypeScript)
Scenario: API returns complex JSON, need types
vectora generate-types \
--from "src/responses/" \
--target "typescript" \
--model gemini-3-flash \
--output "src/types/generated.ts"
Output (following common patterns in the project):
// src/types/generated.ts
// Auto-generated by Gemini via Vectora Context Engine
export interface SearchResponse {
  chunks: SearchChunk[];
  metadata: {
    retrieval_precision: number;
    latency_ms: number;
    total_searched: number;
  };
  // Pattern seen in src/types/search.ts
}

export interface SearchChunk {
  file: string;
  line_start: number;
  line_end: number;
  content: string;
  precision: number; // 0.0 - 1.0
  // Compatible with: src/__tests__/search.mock.ts
}

// Generated types 98% compatible with existing codebase
// Follows conventions from src/types/*
// Includes guardrail types (src/security/guardian.types.ts)
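Generated interfaces exist only at compile time, so API responses are still untyped JSON at runtime. A runtime guard can validate a payload before casting it to the generated shape. This is a hedged sketch: the `isSearchChunk` guard is our addition, not something Vectora generates; the interface is copied from the output above.

```typescript
// Runtime check for the generated SearchChunk shape.
// Sketch only: Vectora does not emit these guards itself.
interface SearchChunk {
  file: string;
  line_start: number;
  line_end: number;
  content: string;
  precision: number; // 0.0 - 1.0
}

function isSearchChunk(value: unknown): value is SearchChunk {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.file === "string" &&
    typeof v.line_start === "number" &&
    typeof v.line_end === "number" &&
    typeof v.content === "string" &&
    typeof v.precision === "number" &&
    v.precision >= 0 &&
    v.precision <= 1
  );
}
```

Validated chunks can then be used with full type safety instead of blind `as` casts.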
Advanced Configuration
Custom System Prompt
providers:
  llm:
    model: "gemini-3-flash"
    system_prompt: |
      You are an expert in TypeScript code.
      Always cite line numbers when responding.
      Keep explanations concise.
      Focus on security and performance.
Temperature & Parameters
providers:
  llm:
    model: "gemini-3-flash"
    temperature: 0.7   # 0 = deterministic, 1 = creative
    top_p: 0.95        # Nucleus sampling
    max_tokens: 2048
    stop_sequences:
      - "\n---\n"
Streaming
For real-time responses:
vectora analyze \
  --query "Explain the authentication module" \
  --model gemini-pro \
  --stream  # Output in real-time
Context Engine Integration
Automatic Context Passing
Vectora automatically passes Context Engine chunks to Gemini:
Your prompt: "Why is this test failing?"
Context sent to Gemini:
{
  "chunks": [
    {"file": "spec/auth.test.ts", "content": "..."},
    {"file": "src/auth/jwt.ts", "content": "..."},
    {"file": "src/auth/guards.ts", "content": "..."}
  ],
  "precision": 0.78,
  "retrieval_time_ms": 234
}
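A payload like the one above has to be flattened into a single prompt before the model call. The sketch below shows one plausible way to do that; `buildPrompt` and its output format are assumptions for illustration, not Vectora internals.

```typescript
// Sketch: flatten retrieved context chunks plus the user question
// into one prompt string. The format is illustrative, not Vectora's.
interface ContextChunk {
  file: string;
  content: string;
}

function buildPrompt(chunks: ContextChunk[], question: string): string {
  const contextBlock = chunks
    .map((c) => `// File: ${c.file}\n${c.content}`)
    .join("\n\n");
  return `Context from the codebase:\n\n${contextBlock}\n\nQuestion: ${question}`;
}
```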
Gemini reads context + your prompt → response
Reranking with Gemini
Optionally use Gemini as reranker instead of Voyage:
providers:
  reranker:
    name: "gemini"  # Instead of "voyage"
    model: "gemini-3-flash"
    # Gemini evaluates relevance of each chunk
    # More accurate, but slower
Safety & Guardrails
Built-in Guardrails
Gemini has protections against:
- Generating malicious code
- Leaking sensitive data
- Offensive content
Custom Safety Rules
providers:
  llm:
    safety:
      block_on_sensitive_patterns:
        - "api_key"
        - "password"
        - "secret"
      block_keywords:
        - "exploit"
        - "malware"
Audit Logging
VECTORA_LOG_LEVEL=debug vectora analyze "..."
# Logs include:
# [Gemini Request] tokens=500, prompt_hash=abc123
# [Gemini Response] tokens=200, safety_rating=safe
Pricing & Quotas
Free Tier
- Rate: 60 requests/minute
- Tokens: 1.5M tokens/month
- Models: gemini-3-flash only
Pro Tier
- Rate: 2000 req/min
- Tokens: Unlimited
- Models: All
Upgrade at: Google Cloud Console
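The free-tier numbers above translate into a simple client-side budget check. This helper is a sketch of our own, not part of Vectora; only the 1.5M-token monthly limit comes from the tier description.

```typescript
// Sketch: estimate remaining free-tier token quota for the month.
// The 1_500_000 limit matches the free tier described above;
// the helper itself is illustrative, not a Vectora API.
const FREE_TIER_MONTHLY_TOKENS = 1_500_000;

function remainingQuota(tokensUsed: number): { remaining: number; usedPct: number } {
  const remaining = Math.max(0, FREE_TIER_MONTHLY_TOKENS - tokensUsed);
  const usedPct = Math.min(100, (tokensUsed / FREE_TIER_MONTHLY_TOKENS) * 100);
  return { remaining, usedPct };
}
```

For example, the 1,234,567 tokens reported by `vectora cost-report` would leave 265,433 tokens (about 82% of the quota used).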
Monitoring Costs
vectora cost-report --period month
# Output:
# Gemini API usage:
# gemini-3-flash: $0.00 (within free tier)
# Tokens: 1,234,567 / 1,500,000
Troubleshooting
“API key invalid”
# Check if key is configured
echo $GEMINI_API_KEY
# If empty, configure
export GEMINI_API_KEY="your-value"
# Test if valid
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash:generateContent" \
  -H "Content-Type: application/json" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -d '{"contents": [{"parts": [{"text": "test"}]}]}'
“Quota exceeded”
Cause: Reached free tier limit.
Solution:
- Upgrade to Pro plan
- Or wait for monthly reset (1st of month)
- Or use local fallback model:
providers:
  llm:
    name: "gemini"
    fallback_model: "local-mistral"  # Ollama local
“Model not found”
Check available models:
vectora models list
# Shows: gemini-3-flash, gemini-pro, ...
Comparison: Gemini vs Alternatives
| Model | Cost | Context |
|---|---|---|
| Gemini 3 Flash | Free | 1M tokens |
| Claude 3 Opus | $$ | 200K tokens |
| GPT-4 | $$ | 128K tokens |
| Llama 2 (local) | Free | 4K tokens |
Recommendation: Gemini is best for RAG (fast + efficient).
Next: MCP Tools Reference
Part of Vectora ecosystem · Open Source (MIT)