# Search API, Hybrid Retrieval, Cross-Encoder Reranking
Public Search API with four-path hybrid retrieval combining vector similarity, entity matching, cluster context, and actor profiles. Three reranking modes (fast/balanced/thorough) let you tune quality vs latency.
## Search API
Perform semantic code search across your workspace's neural memory through a production-ready REST API. The `/v1/search` endpoint accepts natural language queries and returns ranked results with full metadata, entity extraction, and contextual information.
**What's included:**

- `POST /v1/search` endpoint with Bearer token or session authentication
- Pagination via `limit` and `offset` parameters (1-100 results per page)
- Optional filters by source, type, and date range
- Response includes latency breakdown for performance monitoring
- Activity tracking for search analytics

**Example:**
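A minimal request sketch using `fetch`. The endpoint path, Bearer authentication, and the `limit`/`offset` parameters come from the docs above; the base URL and the request body field names are assumptions for illustration.

```typescript
const BASE_URL = "https://api.example.com"; // placeholder host, not the real one

async function search(query: string, apiKey: string) {
  const res = await fetch(`${BASE_URL}/v1/search`, {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`, // Bearer token auth (session auth also supported)
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      query,       // natural language query
      limit: 20,   // 1-100 results per page
      offset: 0,   // pagination offset
    }),
  });
  if (!res.ok) throw new Error(`Search failed: ${res.status}`);
  return res.json();
}
```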
## Four-Path Hybrid Retrieval
Our approach to semantic code search combines four parallel retrieval strategies for comprehensive coverage. Unlike simple vector search, hybrid retrieval captures both semantic similarity and structured patterns.
**What's included** (a sketch of the parallel fan-out follows this list):

- **Vector Path:** Pinecone semantic search using query embeddings
- **Entity Path:** Pattern extraction for @mentions, #issues, API endpoints, and project references
- **Cluster Path:** Topic-based context from workspace knowledge clusters
- **Actor Path:** Contributor relevance based on expertise domains and activity
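Conceptually, the four paths fan out over the same query in parallel. The sketch below assumes hypothetical per-path functions and a simplified candidate shape; none of these names are part of the API.

```typescript
// A retrieved candidate; this shape is an assumption for illustration only.
interface Candidate {
  id: string;
  score: number; // path-specific relevance score
  path: "vector" | "entity" | "cluster" | "actor";
}

type PathSearch = (query: string) => Promise<Candidate[]>;

// Fan the query out to all four retrieval paths in parallel and collect candidates.
async function hybridRetrieve(
  query: string,
  paths: { vector: PathSearch; entity: PathSearch; cluster: PathSearch; actor: PathSearch },
): Promise<Candidate[][]> {
  return Promise.all([
    paths.vector(query),  // Pinecone semantic similarity over query embeddings
    paths.entity(query),  // @mentions, #issues, API endpoints, project references
    paths.cluster(query), // workspace knowledge clusters
    paths.actor(query),   // contributor expertise domains and activity
  ]);
}
```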
**Score fusion strategy** (see the sketch after this list):

- Vector results form the base, carrying their semantic similarity scores
- Entity matches boost results already retrieved by the vector path by +0.2
- Entity-only matches (not present in the vector results) enter at 0.85 × confidence score
- Results are merged and sorted by combined score
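A minimal sketch of that fusion step. The +0.2 boost, the 0.85 × confidence weighting, and the merge-and-sort come from the list above; the candidate shapes are assumptions.

```typescript
interface VectorHit { id: string; score: number }        // semantic similarity from the vector path
interface EntityMatch { id: string; confidence: number } // pattern-extracted entity match

// Merge vector hits and entity matches into one ranked list.
function fuseScores(
  vectorHits: VectorHit[],
  entityMatches: EntityMatch[],
): Array<{ id: string; score: number }> {
  // Vector results form the base.
  const fused = new Map<string, number>(
    vectorHits.map((hit): [string, number] => [hit.id, hit.score]),
  );

  for (const match of entityMatches) {
    const existing = fused.get(match.id);
    if (existing !== undefined) {
      fused.set(match.id, existing + 0.2);           // boost results confirmed by an entity match
    } else {
      fused.set(match.id, 0.85 * match.confidence);  // entity-only matches enter at 0.85 × confidence
    }
  }

  // Merge and sort by combined score.
  return Array.from(fused.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```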
The entity search uses regex-based extraction rather than traditional BM25 keyword search, optimized for developer-centric patterns like `@engineer`, `#123`, and `/api/endpoint`.
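For illustration, a pattern extractor along those lines might look like the sketch below; the regexes and entity categories are assumptions beyond the @mention, #issue, and API-endpoint examples named above.

```typescript
// Hypothetical entity extractor; the patterns are illustrative, not the service's own.
interface ExtractedEntity {
  kind: "mention" | "issue" | "endpoint";
  value: string;
}

function extractEntities(query: string): ExtractedEntity[] {
  const patterns: Array<[ExtractedEntity["kind"], RegExp]> = [
    ["mention", /@[\w-]+/g],      // @engineer
    ["issue", /#\d+/g],           // #123
    ["endpoint", /\/[\w\/-]+/g],  // /api/endpoint
  ];

  return patterns.flatMap(([kind, re]) =>
    Array.from(query.matchAll(re)).map((m) => ({ kind, value: m[0] })),
  );
}

// extractEntities("who broke /api/search after @alice merged #123?")
// -> [ { kind: "mention", value: "@alice" },
//      { kind: "issue", value: "#123" },
//      { kind: "endpoint", value: "/api/search" } ]
```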
## Cross-Encoder Reranking
Three reranking modes let you tune the quality/latency tradeoff. Pass the `mode` parameter to select your strategy.

**Modes:**

| Mode | Provider | Latency | Use Case |
|---|---|---|---|
| `fast` | Passthrough | ~0ms | Real-time autocomplete, high-volume queries |
| `balanced` | Cohere rerank-v3.5 | ~130ms | Default for most search use cases |
| `thorough` | Claude Haiku 4.5 | ~300-500ms | When precision matters most |
**How it works:**

- Search over-fetches candidates (2× the requested limit) for reranking
- The reranker scores each candidate against the query
- `balanced` mode uses Cohere's cross-encoder model via API
- `thorough` mode uses LLM semantic scoring, blending 60% LLM weight with 40% vector weight (see the sketch after this list)
- Results are filtered by a relevance threshold (0.4 for `thorough` mode)
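A minimal sketch of the `thorough`-mode blend. The 60/40 weighting and the 0.4 threshold come from the list above; the candidate shape and score ranges are assumptions.

```typescript
interface RerankCandidate {
  id: string;
  vectorScore: number; // similarity score from retrieval, assumed in [0, 1]
  llmScore: number;    // semantic relevance from the LLM judge, assumed in [0, 1]
}

const LLM_WEIGHT = 0.6;
const VECTOR_WEIGHT = 0.4;
const THOROUGH_THRESHOLD = 0.4;

// Blend LLM and vector scores, then drop candidates below the relevance threshold.
function rerankThorough(candidates: RerankCandidate[]): Array<{ id: string; score: number }> {
  return candidates
    .map((c) => ({
      id: c.id,
      score: LLM_WEIGHT * c.llmScore + VECTOR_WEIGHT * c.vectorScore,
    }))
    .filter((c) => c.score >= THOROUGH_THRESHOLD)
    .sort((a, b) => b.score - a.score);
}
```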
**Example with mode selection:**
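A sketch of the same request with a reranking mode selected; as before, the base URL and the body field names other than `mode`, `limit`, and `offset` are assumptions.

```typescript
// Same call as the earlier sketch, this time selecting the `thorough` reranking mode.
const apiKey = "YOUR_API_KEY"; // placeholder credential

const res = await fetch("https://api.example.com/v1/search", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    query: "where do we validate webhook signatures?",
    mode: "thorough", // one of: fast | balanced | thorough
    limit: 10,
  }),
});
const results = await res.json();
```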
## Response Structure
The API returns structured results with optional context and detailed latency metrics.
**Example response:**
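The exact payload isn't reproduced here, so the shape below is a hypothetical sketch of what the description above implies (ranked results, extracted entities, optional context, pagination, and a latency breakdown). Every field name is an assumption, not the API's contract.

```typescript
// Hypothetical response shape; field names are illustrative only.
interface SearchResponse {
  results: Array<{
    id: string;
    score: number;                      // combined score after fusion and reranking
    source: string;                     // origin of the memory item
    type: string;
    snippet: string;
    entities: string[];                 // extracted @mentions, #issues, endpoints
    context?: Record<string, unknown>;  // optional cluster / actor context
  }>;
  pagination: { limit: number; offset: number; total: number };
  latency: {                            // per-stage latency breakdown, in milliseconds
    retrievalMs: number;
    rerankMs: number;
    totalMs: number;
  };
}
```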