Changelog

0-3

Search API, Hybrid Retrieval, Cross-Encoder Reranking

Public Search API with four-path hybrid retrieval combining vector similarity, entity matching, cluster context, and actor profiles. Three reranking modes (fast/balanced/thorough) let you tune quality vs latency.

Public Search API with hybrid retrieval and cross-encoder reranking for semantic code search


Search API

Perform semantic code search across your workspace's neural memory through a production-ready REST API. The /v1/search endpoint accepts natural language queries and returns ranked results with full metadata, entity extraction, and contextual information.

What's included:

  • POST /v1/search endpoint with Bearer token or session authentication

  • Pagination via limit and offset parameters (1-100 results per page)

  • Optional filters by source, type, and date range

  • Response includes latency breakdown for performance monitoring

  • Activity tracking for search analytics

Example:

bash
curl -X POST https://lightfast.ai/v1/search \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Workspace-ID: $WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "authentication flow for OAuth providers",
    "limit": 10,
    "mode": "balanced"
  }'

Four-Path Hybrid Retrieval

Our approach to semantic code search combines four parallel retrieval strategies for comprehensive coverage. Unlike simple vector search, hybrid retrieval captures both semantic similarity and structured patterns.

What's included:

  • Vector Path: Pinecone semantic search using query embeddings

  • Entity Path: Pattern extraction for @mentions, #issues, API endpoints, and project references

  • Cluster Path: Topic-based context from workspace knowledge clusters

  • Actor Path: Contributor relevance based on expertise domains and activity

Score fusion strategy:

  • Vector results form the base with semantic similarity scores

  • Entity matches boost confirmed results by +0.2

  • New entity-only matches receive 0.85 × confidence score

  • Results merged and sorted by combined score

The entity search uses regex-based extraction rather than traditional BM25 keyword search, optimized for developer-centric patterns like @engineer, #123, and /api/endpoint.
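
The fusion strategy above can be sketched as follows; the `fuseScores` name and the input shapes are illustrative, not the actual implementation:

```typescript
type Scored = { id: string; score: number };

// Hypothetical fusion of vector results with entity matches, per the
// strategy above: entity-confirmed results get +0.2; entity-only
// matches enter at 0.85 × confidence.
function fuseScores(
  vectorResults: Scored[],
  entityMatches: Map<string, number> // result id → entity confidence
): Scored[] {
  const fused = new Map<string, number>();

  // Vector results form the base with semantic similarity scores;
  // entity matches boost confirmed results by +0.2.
  for (const { id, score } of vectorResults) {
    fused.set(id, entityMatches.has(id) ? score + 0.2 : score);
  }

  // New entity-only matches receive 0.85 × confidence.
  for (const [id, confidence] of entityMatches) {
    if (!fused.has(id)) fused.set(id, 0.85 * confidence);
  }

  // Merge and sort by combined score.
  return Array.from(fused.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```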


Cross-Encoder Reranking

Three reranking modes let you tune the quality/latency tradeoff. Pass the mode parameter to select your strategy.

Modes:

| Mode | Provider | Latency | Use Case |
|---|---|---|---|
| fast | Passthrough | ~0ms | Real-time autocomplete, high-volume queries |
| balanced | Cohere rerank-v3.5 | ~130ms | Default for most search use cases |
| thorough | Claude Haiku 4.5 | ~300-500ms | When precision matters most |

How it works:

  • Search over-fetches candidates (2× limit) for reranking

  • Reranker scores each candidate against the query

  • balanced mode uses Cohere's cross-encoder model via API

  • thorough mode uses LLM semantic scoring with 60% LLM weight + 40% vector weight

  • Results filtered by relevance threshold (0.4 for thorough mode)
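
The thorough-mode blend above can be sketched as follows; the `Candidate` shape and function name are hypothetical:

```typescript
type Candidate = { id: string; vectorScore: number; llmScore: number };

// Hypothetical "thorough" reranking step: 60% LLM semantic score
// + 40% vector score, then the 0.4 relevance threshold.
function rerankThorough(candidates: Candidate[]): { id: string; score: number }[] {
  return candidates
    .map(({ id, vectorScore, llmScore }) => ({
      id,
      score: 0.6 * llmScore + 0.4 * vectorScore, // 60/40 blend
    }))
    .filter((r) => r.score >= 0.4) // relevance threshold for thorough mode
    .sort((a, b) => b.score - a.score);
}
```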

Example with mode selection:

typescript
const response = await fetch('/v1/search', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'X-Workspace-ID': workspaceId, // ensure workspaceId is defined as a string
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    query: 'database migration patterns',
    limit: 10,
    mode: 'thorough',
    includeContext: true // explicit flag for reranking and context
  })
});

const json = await response.json();
const { data, context, latency } = json;

// latency.rerank shows reranking time
// context.clusters shows relevant topics
// context.relevantActors shows domain experts

Response Structure

The API returns structured results with optional context and detailed latency metrics.

Example response:

json
{
  "data": [
    {
      "id": "obs_abc123",
      "title": "Implement OAuth2 PKCE flow",
      "url": "https://github.com/org/repo/pull/42",
      "snippet": "Added PKCE support for OAuth2...",
      "score": 0.89,
      "source": "github",
      "type": "pull_request",
      "entities": ["@auth-team", "#security"]
    }
  ],
  "context": {
    "clusters": [
      {
        "topic": "Authentication",
        "keywords": ["oauth", "jwt", "session"]
      }
    ],
    "relevantActors": [
      {
        "displayName": "Alice",
        "expertiseDomains": ["auth", "security"]
      }
    ]
  },
  "latency": {
    "total": 245,
    "embedding": 45,
    "retrieval": 82,
    "entitySearch": 12,
    "rerank": 98,
    "maxParallel": 82
  }
}

0-2

Entity Extraction, Observation Clusters, Multi-View Embeddings

Neural Memory now automatically extracts entities from your development activity, groups related observations into topic clusters, and generates specialized embeddings for better search relevance. These features work together to make your team's knowledge more discoverable.

Entity Extraction, Observation Clusters, Multi-View Embeddings


Entity Extraction

Neural Memory now automatically identifies and tracks meaningful references in your development activity. The hybrid extraction pipeline combines fast regex patterns with LLM-powered semantic extraction to capture entities that would otherwise be missed.

What's included:

  • Seven entity categories: engineers (@mentions), projects (#issues, ENG-123), API endpoints, environment variables, file paths, external services, and generic references

  • Dual extraction paths: Regex patterns run inline during capture (0.70-0.95 confidence); LLM extraction runs async for content >200 characters

  • Automatic deduplication: Entities are tracked by workspace with occurrence counts and "last seen" timestamps

  • Search integration: Entity mentions boost search results via the four-path retrieval system

Example entities extracted:

text
@sarah-dev → engineer (0.90 confidence)
#authentication → project (0.95 confidence)
POST /api/users → endpoint (0.95 confidence)
DATABASE_URL → config (0.85 confidence)
src/lib/auth.ts → definition (0.80 confidence)
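
A minimal sketch of the inline regex path, using patterns consistent with the examples above; the production patterns and confidence values may differ:

```typescript
// Illustrative patterns for the five example categories above.
const entityPatterns: { category: string; pattern: RegExp; confidence: number }[] = [
  { category: "engineer",   pattern: /@[\w-]+/g,                                 confidence: 0.9 },
  { category: "project",    pattern: /#[\w-]+/g,                                 confidence: 0.95 },
  { category: "endpoint",   pattern: /\b(GET|POST|PUT|PATCH|DELETE) \/[\w\/-]+/g, confidence: 0.95 },
  { category: "config",     pattern: /\b[A-Z][A-Z0-9]*(_[A-Z0-9]+)+\b/g,         confidence: 0.85 },
  { category: "definition", pattern: /\b[\w.\/-]+\.(ts|tsx|js|py|go|rs)\b/g,     confidence: 0.8 },
];

function extractEntities(text: string) {
  const found: { value: string; category: string; confidence: number }[] = [];
  for (const { category, pattern, confidence } of entityPatterns) {
    for (const match of text.matchAll(pattern)) {
      found.push({ value: match[0], category, confidence });
    }
  }
  return found;
}
```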

Limitations:

  • LLM extraction requires content >200 characters

  • Confidence threshold of 0.65 filters low-confidence extractions

  • Patterns optimized for English text

  • API endpoint detection requires HTTP verb prefix (GET, POST, etc.)


Observation Clusters

Related development events are now automatically grouped into topic clusters. Each observation is assigned to the most semantically similar cluster—or creates a new topic group if no good match exists.

What's included:

  • Four-signal affinity scoring: Embedding similarity (40pts), entity overlap (30pts), actor overlap (20pts), temporal proximity (10pts)

  • 60-point threshold: Observations scoring 60+ join existing clusters; below that creates new clusters

  • Cluster context in search: Topic labels and keywords are returned as context in search results

  • Automatic tracking: Primary entities, actors, observation counts, and temporal bounds

Affinity calculation:

typescript
// Maximum score: 100 points (each signal normalized to 0-1)
const affinityScore =
  embeddingSimilarity * 40  // semantic relatedness
  + entityOverlap * 30      // shared @mentions, #issues
  + actorMatch * 20         // same contributor
  + temporalProximity * 10  // recent activity (decays over 10 hours)

Current status:

Observation Clusters is in beta. Cluster assignment and search context are fully operational. LLM-generated cluster summaries are not yet available—observations are grouped but the summary generation pipeline requires a schema migration (Phase 5) to link observations to their assigned clusters.

Why we built it this way: We chose a multi-signal approach over pure embedding similarity because development context matters. A PR from the same author about the same feature should cluster together even if the semantic content differs slightly.


Multi-View Embeddings

Every observation now generates three specialized embedding vectors, each optimized for different query types. This improves search relevance by matching the right content perspective to your search intent.

The three views:

| View | Text | Purpose |
|---|---|---|
| Title | Event headline (≤120 chars) | Broad topic discovery |
| Content | Full body text | Detailed, specific queries |
| Summary | Title + first 1000 chars | Balanced retrieval |
What's included:

  • Cohere embed-english-v3.0: 1024-dimensional vectors with input type optimization

  • Batch generation: All three embeddings generated in a single API call

  • Smart deduplication: Search queries all views; results deduplicated by max score

  • Cluster assignment: Uses content embedding for best semantic matching

Search behavior:

typescript
// All views are queried in parallel
const results = await pinecone.query({
  vector: queryEmbedding,
  filter: { layer: "observations" },
  topK: 50
});

// Deduplicate by observation, keeping max score
// If title matches at 0.85 and content at 0.72,
// the observation appears once with score 0.85
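
The deduplication described in those comments can be sketched as follows; the `Match` shape is hypothetical:

```typescript
type Match = { observationId: string; view: "title" | "content" | "summary"; score: number };

// Collapse per-view matches to one result per observation, keeping the max score.
function dedupeByMaxScore(matches: Match[]): { observationId: string; score: number }[] {
  const best = new Map<string, number>();
  for (const { observationId, score } of matches) {
    best.set(observationId, Math.max(best.get(observationId) ?? -Infinity, score));
  }
  return Array.from(best.entries())
    .map(([observationId, score]) => ({ observationId, score }))
    .sort((a, b) => b.score - a.score);
}
```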

Limitations:

  • Cohere provider only (no OpenAI or custom models)

  • English language only

  • Fixed 1024 dimensions (no dimension reduction for cost optimization)

  • 3x vector storage per observation


0-1

Observation Pipeline, Semantic Classification, Webhook Architecture

Neural Memory foundation layer: real-time observation capture from GitHub and Vercel webhooks, AI-powered semantic classification with 14 engineering categories, and production-ready webhook architecture with signature verification and replay protection.

Real-time event capture, AI classification, and production-ready webhook infrastructure


Observation Pipeline

The neural observation pipeline captures engineering activity from your connected sources and transforms it into searchable memory. Events flow through significance scoring, AI classification, entity extraction, and multi-view embedding generation before storage.

What's included:

  • Significance scoring filters low-value events (threshold: 40/100). High-value events like releases (75), deployment failures (70), and PR merges (60) pass through automatically. Routine commits (30) and trivial changes are filtered.

  • Multi-view embeddings generate three vectors per observation: title-only for headline searches, full content for detailed queries, and a balanced summary view. All three are stored in Pinecone with pre-computed observation IDs for direct lookup.

  • Entity extraction identifies API endpoints, file paths, issue references, @mentions, and environment variables from event content. Entities are deduplicated and tracked with occurrence counts.

  • Cluster assignment groups related observations using embedding similarity (40 points), entity overlap (30 points), actor overlap (20 points), and temporal proximity (10 points). Threshold: 60/100 to join an existing cluster.

Example: Significance Scoring

typescript
// Event weights (base scores)
const weights = {
  "release.published": 75,
  "deployment.error": 70,
  "pull-request.merged": 60,
  "pull-request.opened": 50,
  "issue.opened": 45,
  "deployment.succeeded": 40,
  push: 30,
}

// Content signals (added to the base score for the event type)
let score = weights[eventType] // eventType and title come from the incoming event
if (title.match(/breaking|security|CVE/i)) score += 20
if (title.match(/hotfix|emergency/i)) score += 15
if (title.match(/chore|deps|bump/i)) score -= 10

Limitations:

  • Significance threshold (40) is global; per-workspace configuration planned

  • Entity extraction limited to 50 entities per observation

  • Cluster lookback window is 7 days


Semantic Classification

Every observation is classified into one of 14 engineering categories using Claude Haiku. Classification drives cluster organization, topic extraction, and future retrieval filtering.

Categories:

| Category | Description |
|---|---|
| bug_fix | Bug fixes, patches, error corrections |
| feature | New features, additions, implementations |
| refactor | Code restructuring, cleanup |
| documentation | Docs, README, comments |
| testing | Tests, specs, coverage |
| infrastructure | CI/CD, pipelines, Docker |
| security | Security fixes, auth changes |
| performance | Optimizations, speed improvements |
| incident | Outages, emergencies, hotfixes |
| decision | ADRs, architecture decisions |
| discussion | RFCs, proposals, design discussions |
| release | Version releases, changelogs |
| deployment | Deployments, shipping to production |
| other | Doesn't fit other categories |

How it works:

  1. Claude Haiku receives event details (source, type, title, body truncated to 1000 chars)

  2. Returns primary category, up to 3 secondary categories, up to 5 topics, and confidence score

  3. Temperature 0.2 keeps classification near-deterministic across runs

Fallback:

If the LLM fails (timeout, rate limit), regex patterns classify events by matching keywords:

typescript
// Fallback patterns (first match wins)
const patterns = {
  bug_fix: /\bfix(es|ed|ing)?\b/i,
  feature: /\bfeat(ure)?[:\s]/i,
  security: /\bsecurity\b|CVE-\d+/i, // ... 10 more patterns
}
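
The first-match-wins loop can be sketched as below; this is self-contained, so only the three documented patterns are reproduced:

```typescript
// Fallback patterns, as documented (10 more exist in the real pipeline).
const fallbackPatterns: Record<string, RegExp> = {
  bug_fix: /\bfix(es|ed|ing)?\b/i,
  feature: /\bfeat(ure)?[:\s]/i,
  security: /\bsecurity\b|CVE-\d+/i,
};

// Hypothetical classifier: return the first category whose pattern matches.
function classifyFallback(title: string): string {
  for (const [category, pattern] of Object.entries(fallbackPatterns)) {
    if (pattern.test(title)) return category; // first match wins
  }
  return "other"; // no pattern matched
}
```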

Limitations:

  • Classification results (category, confidence) are not stored in database; only topics array persists

  • No accuracy metrics tracked in production

  • Confidence threshold (0.6) defined but not enforced


Webhook Architecture

Production-ready webhook infrastructure receives events from GitHub and Vercel with cryptographic verification, replay protection, and complete audit trails.

Supported Events:

| Source | Events |
|---|---|
| GitHub | push (default branch), pull_request (opened/closed/reopened/ready_for_review), issues (opened/closed/reopened), release (published), discussion (created/answered) |
| Vercel | deployment.created, deployment.succeeded, deployment.ready, deployment.error, deployment.canceled |

Security measures:

  • Signature verification: HMAC SHA-256 (GitHub) and SHA-1 (Vercel) with timing-safe comparison

  • Replay protection: 5-minute timestamp validation window with 60-second clock skew tolerance

  • Audit trail: Raw JSON payloads stored permanently in workspace_webhook_payloads table
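
A sketch of the GitHub-style verification step using Node's crypto primitives; GitHub delivers the digest in the `X-Hub-Signature-256` header as `sha256=<hex>`, and the function name here is illustrative:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// HMAC SHA-256 over the raw request body, compared with a
// constant-time equality check to avoid timing side channels.
function verifyGithubSignature(rawBody: string, signatureHeader: string, secret: string): boolean {
  const expected = "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so guard first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Replay protection would then reject verified payloads whose timestamp falls outside the 5-minute window noted above.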

Processing architecture:

text
Webhook → Signature Verify → Timestamp Validate → Store Raw Payload
    ↓
Transform to SourceEvent → Emit Inngest Event
    ↓
Observation Capture Workflow (async)
    ├── Duplicate check
    ├── Event filter (source config)
    ├── Significance gate
    ├── Classification (parallel)
    ├── Embedding generation (parallel)
    ├── Entity extraction (parallel)
    ├── Cluster assignment
    ├── Pinecone upsert
    └── Database insert

Example: SourceEvent structure

typescript
interface SourceEvent {
  source: 'github' | 'vercel';
  sourceType: string;   // e.g. 'pull-request.opened', 'deployment.succeeded'
  sourceId: string;     // e.g. 'pr:lightfastai/lightfast#123'
  title: string;        // e.g. 'feat: add semantic search'
  body: string;         // full PR description, issue body, etc.
  actor: {
    id: string;         // e.g. '12345678'
    name: string;
    email?: string;
    avatarUrl?: string;
  };
  occurredAt: string;   // ISO 8601, e.g. '2025-12-17T10:30:00Z'
  references: Array<{ type: string; id: string }>; // e.g. { type: 'commit', id: 'abc123' }
  metadata: Record<string, unknown>; // source-specific
}

Cross-source correlation:

Vercel deployments are linked to GitHub users via commit SHA. When a GitHub push arrives with the same commit, the Vercel observation's actor is updated with the numeric GitHub user ID.
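
That correlation step can be sketched as below, with illustrative shapes:

```typescript
type Observation = { commitSha: string; actor: { id: string | null; name: string } };

// Hypothetical backfill: when a GitHub push arrives with a known commit SHA,
// update the matching Vercel observation's actor with the GitHub user ID.
function correlateActors(
  deployments: Observation[],
  pushes: Map<string, string> // commit SHA → numeric GitHub user ID
): Observation[] {
  return deployments.map((obs) => {
    const githubUserId = pushes.get(obs.commitSha);
    return githubUserId ? { ...obs, actor: { ...obs.actor, id: githubUserId } } : obs;
  });
}
```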

Limitations:

  • Only GitHub and Vercel sources implemented (Linear, Sentry, PagerDuty planned)

  • No circuit breaker for failing transformers

  • No rate limiting at webhook endpoint level

  • No manual reprocessing UI


Why We Built It This Way

The observation pipeline uses a significance scoring gate before AI classification to minimize LLM costs. Only events scoring 40+ undergo classification and embedding generation. This keeps costs predictable while ensuring high-value events like security patches and releases are always captured.

Multi-view embeddings (title, content, summary) optimize retrieval for different query types. When searching for "authentication bug", the title embedding finds headline matches, while the content embedding surfaces detailed discussions. The summary view balances both for general queries.

Raw webhook payload storage enables replay and debugging. When something goes wrong, you can inspect the exact JSON received, re-trigger processing, or audit what happened.


