Changelog

0-3

Search API, Hybrid Retrieval, Cross-Encoder Reranking

Public Search API with four-path hybrid retrieval combining vector similarity, entity matching, cluster context, and actor profiles. Three reranking modes (fast/balanced/thorough) let you tune quality vs latency.

Public Search API with hybrid retrieval and cross-encoder reranking for semantic code search


Search API

Perform semantic code search across your workspace's neural memory through a production-ready REST API. The /v1/search endpoint accepts natural language queries and returns ranked results with full metadata, entity extraction, and contextual information.

What's included:

  • POST /v1/search endpoint with Bearer token or session authentication

  • Pagination via limit and offset parameters (1-100 results per page)

  • Optional filters by source, type, and date range

  • Response includes latency breakdown for performance monitoring

  • Activity tracking for search analytics

Example:

bash
curl -X POST https://lightfast.ai/v1/search \
  -H "Authorization: Bearer $API_KEY" \
  -H "X-Workspace-ID: $WORKSPACE_ID" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "authentication flow for OAuth providers",
    "limit": 10,
    "mode": "balanced"
  }'

Four-Path Hybrid Retrieval

Our approach to semantic code search combines four parallel retrieval strategies for comprehensive coverage. Unlike simple vector search, hybrid retrieval captures both semantic similarity and structured patterns.

What's included:

  • Vector Path: Pinecone semantic search using query embeddings

  • Entity Path: Pattern extraction for @mentions, #issues, API endpoints, and project references

  • Cluster Path: Topic-based context from workspace knowledge clusters

  • Actor Path: Contributor relevance based on expertise domains and activity

Score fusion strategy:

  • Vector results form the base with semantic similarity scores

  • Entity matches boost confirmed results by +0.2

  • New entity-only matches receive 0.85 × confidence score

  • Results merged and sorted by combined score

The entity search uses regex-based extraction rather than traditional BM25 keyword search, optimized for developer-centric patterns like @engineer, #123, and /api/endpoint.
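
The fusion strategy above can be sketched as follows; the `fuseScores` name and the input shapes are illustrative, not the actual implementation:

```typescript
type Scored = { id: string; score: number };

// Hypothetical fusion of vector results with entity matches, per the
// strategy above: entity-confirmed results get +0.2; entity-only
// matches enter at 0.85 × confidence.
function fuseScores(
  vectorResults: Scored[],
  entityMatches: Map<string, number> // result id → entity confidence
): Scored[] {
  const fused = new Map<string, number>();

  // Vector results form the base with semantic similarity scores;
  // entity matches boost confirmed results by +0.2.
  for (const { id, score } of vectorResults) {
    fused.set(id, entityMatches.has(id) ? score + 0.2 : score);
  }

  // New entity-only matches receive 0.85 × confidence.
  for (const [id, confidence] of entityMatches) {
    if (!fused.has(id)) fused.set(id, 0.85 * confidence);
  }

  // Merge and sort by combined score.
  return Array.from(fused.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```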


Cross-Encoder Reranking

Three reranking modes let you tune the quality/latency tradeoff. Pass the mode parameter to select your strategy.

Modes:

| Mode | Provider | Latency | Use Case |
|---|---|---|---|
| fast | Passthrough | ~0ms | Real-time autocomplete, high-volume queries |
| balanced | Cohere rerank-v3.5 | ~130ms | Default for most search use cases |
| thorough | Claude Haiku 4.5 | ~300-500ms | When precision matters most |

How it works:

  • Search over-fetches candidates (2× limit) for reranking

  • Reranker scores each candidate against the query

  • balanced mode uses Cohere's cross-encoder model via API

  • thorough mode uses LLM semantic scoring with 60% LLM weight + 40% vector weight

  • Results filtered by relevance threshold (0.4 for thorough mode)
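
The thorough-mode blend above can be sketched as follows; the `Candidate` shape and function name are hypothetical:

```typescript
type Candidate = { id: string; vectorScore: number; llmScore: number };

// Hypothetical "thorough" reranking step: 60% LLM semantic score
// + 40% vector score, then the 0.4 relevance threshold.
function rerankThorough(candidates: Candidate[]): { id: string; score: number }[] {
  return candidates
    .map(({ id, vectorScore, llmScore }) => ({
      id,
      score: 0.6 * llmScore + 0.4 * vectorScore, // 60/40 blend
    }))
    .filter((r) => r.score >= 0.4) // relevance threshold for thorough mode
    .sort((a, b) => b.score - a.score);
}
```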

Example with mode selection:

typescript
const response = await fetch('/v1/search', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'X-Workspace-ID': workspaceId, // ensure workspaceId is defined as a string
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    query: 'database migration patterns',
    limit: 10,
    mode: 'thorough',
    includeContext: true // explicit flag for reranking and context
  })
});

const json = await response.json();
const { data, context, latency } = json;

// latency.rerank shows reranking time
// context.clusters shows relevant topics
// context.relevantActors shows domain experts

Response Structure

The API returns structured results with optional context and detailed latency metrics.

Example response:

json
{
  "data": [
    {
      "id": "obs_abc123",
      "title": "Implement OAuth2 PKCE flow",
      "url": "https://github.com/org/repo/pull/42",
      "snippet": "Added PKCE support for OAuth2...",
      "score": 0.89,
      "source": "github",
      "type": "pull_request",
      "entities": ["@auth-team", "#security"]
    }
  ],
  "context": {
    "clusters": [
      {
        "topic": "Authentication",
        "keywords": ["oauth", "jwt", "session"]
      }
    ],
    "relevantActors": [
      {
        "displayName": "Alice",
        "expertiseDomains": ["auth", "security"]
      }
    ]
  },
  "latency": {
    "total": 245,
    "embedding": 45,
    "retrieval": 82,
    "entitySearch": 12,
    "rerank": 98,
    "maxParallel": 82
  }
}

0-2

Entity Extraction, Observation Clusters, Multi-View Embeddings

Neural Memory now automatically extracts entities from your development activity, groups related observations into topic clusters, and generates specialized embeddings for better search relevance. These features work together to make your team's knowledge more discoverable.

Entity Extraction, Observation Clusters, Multi-View Embeddings


Entity Extraction

Neural Memory now automatically identifies and tracks meaningful references in your development activity. The hybrid extraction pipeline combines fast regex patterns with LLM-powered semantic extraction to capture entities that would otherwise be missed.

What's included:

  • Seven entity categories: engineers (@mentions), projects (#issues, ENG-123), API endpoints, environment variables, file paths, external services, and generic references

  • Dual extraction paths: Regex patterns run inline during capture (0.70-0.95 confidence); LLM extraction runs async for content >200 characters

  • Automatic deduplication: Entities are tracked by workspace with occurrence counts and "last seen" timestamps

  • Search integration: Entity mentions boost search results via the four-path retrieval system

Example entities extracted:

text
@sarah-dev → engineer (0.90 confidence)
#authentication → project (0.95 confidence)
POST /api/users → endpoint (0.95 confidence)
DATABASE_URL → config (0.85 confidence)
src/lib/auth.ts → definition (0.80 confidence)
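
A minimal sketch of the inline regex path, using patterns consistent with the examples above; the production patterns and confidence values may differ:

```typescript
// Illustrative patterns for the five example categories above.
const entityPatterns: { category: string; pattern: RegExp; confidence: number }[] = [
  { category: "engineer",   pattern: /@[\w-]+/g,                                 confidence: 0.9 },
  { category: "project",    pattern: /#[\w-]+/g,                                 confidence: 0.95 },
  { category: "endpoint",   pattern: /\b(GET|POST|PUT|PATCH|DELETE) \/[\w\/-]+/g, confidence: 0.95 },
  { category: "config",     pattern: /\b[A-Z][A-Z0-9]*(_[A-Z0-9]+)+\b/g,         confidence: 0.85 },
  { category: "definition", pattern: /\b[\w.\/-]+\.(ts|tsx|js|py|go|rs)\b/g,     confidence: 0.8 },
];

function extractEntities(text: string) {
  const found: { value: string; category: string; confidence: number }[] = [];
  for (const { category, pattern, confidence } of entityPatterns) {
    for (const match of text.matchAll(pattern)) {
      found.push({ value: match[0], category, confidence });
    }
  }
  return found;
}
```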

Limitations:

  • LLM extraction requires content >200 characters

  • Confidence threshold of 0.65 filters low-confidence extractions

  • Patterns optimized for English text

  • API endpoint detection requires HTTP verb prefix (GET, POST, etc.)


Observation Clusters

Related development events are now automatically grouped into topic clusters. Each observation is assigned to the most semantically similar cluster—or creates a new topic group if no good match exists.

What's included:

  • Four-signal affinity scoring: Embedding similarity (40pts), entity overlap (30pts), actor overlap (20pts), temporal proximity (10pts)

  • 60-point threshold: Observations scoring 60+ join existing clusters; below that creates new clusters

  • Cluster context in search: Topic labels and keywords are returned as context in search results

  • Automatic tracking: Primary entities, actors, observation counts, and temporal bounds

Affinity calculation:

typescript
// Maximum score: 100 points (each signal normalized to 0-1)
const affinityScore =
  embeddingSimilarity * 40  // semantic relatedness
  + entityOverlap * 30      // shared @mentions, #issues
  + actorMatch * 20         // same contributor
  + temporalProximity * 10  // recent activity (decays over 10 hours)

Current status:

Observation Clusters is in beta. Cluster assignment and search context are fully operational. LLM-generated cluster summaries are not yet available—observations are grouped but the summary generation pipeline requires a schema migration (Phase 5) to link observations to their assigned clusters.

Why we built it this way: We chose a multi-signal approach over pure embedding similarity because development context matters. A PR from the same author about the same feature should cluster together even if the semantic content differs slightly.


Multi-View Embeddings

Every observation now generates three specialized embedding vectors, each optimized for different query types. This improves search relevance by matching the right content perspective to your search intent.

The three views:

| View | Text | Purpose |
|---|---|---|
| Title | Event headline (≤120 chars) | Broad topic discovery |
| Content | Full body text | Detailed, specific queries |
| Summary | Title + first 1000 chars | Balanced retrieval |
What's included:

  • Cohere embed-english-v3.0: 1024-dimensional vectors with input type optimization

  • Batch generation: All three embeddings generated in a single API call

  • Smart deduplication: Search queries all views; results deduplicated by max score

  • Cluster assignment: Uses content embedding for best semantic matching

Search behavior:

typescript
// All views are queried in parallel
const results = await pinecone.query({
  vector: queryEmbedding,
  filter: { layer: "observations" },
  topK: 50
});

// Deduplicate by observation, keeping max score
// If title matches at 0.85 and content at 0.72,
// the observation appears once with score 0.85
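
The deduplication described in those comments can be sketched as follows; the `Match` shape is hypothetical:

```typescript
type Match = { observationId: string; view: "title" | "content" | "summary"; score: number };

// Collapse per-view matches to one result per observation, keeping the max score.
function dedupeByMaxScore(matches: Match[]): { observationId: string; score: number }[] {
  const best = new Map<string, number>();
  for (const { observationId, score } of matches) {
    best.set(observationId, Math.max(best.get(observationId) ?? -Infinity, score));
  }
  return Array.from(best.entries())
    .map(([observationId, score]) => ({ observationId, score }))
    .sort((a, b) => b.score - a.score);
}
```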

Limitations:

  • Cohere provider only (no OpenAI or custom models)

  • English language only

  • Fixed 1024 dimensions (no dimension reduction for cost optimization)

  • 3x vector storage per observation


0-1

Observation Pipeline, Semantic Classification, Webhook Architecture

Neural Memory foundation layer: real-time observation capture from GitHub and Vercel webhooks, AI-powered semantic classification with 14 engineering categories, and production-ready webhook architecture with signature verification and replay protection.

Real-time event capture, AI classification, and production-ready webhook infrastructure


Observation Pipeline

The neural observation pipeline captures engineering activity from your connected sources and transforms it into searchable memory. Events flow through significance scoring, AI classification, entity extraction, and multi-view embedding generation before storage.

What's included:

  • Significance scoring filters low-value events (threshold: 40/100). High-value events like releases (75), deployment failures (70), and PR merges (60) pass through automatically. Routine commits (30) and trivial changes are filtered.

  • Multi-view embeddings generate three vectors per observation: title-only for headline searches, full content for detailed queries, and a balanced summary view. All three are stored in Pinecone with pre-computed observation IDs for direct lookup.

  • Entity extraction identifies API endpoints, file paths, issue references, @mentions, and environment variables from event content. Entities are deduplicated and tracked with occurrence counts.

  • Cluster assignment groups related observations using embedding similarity (40 points), entity overlap (30 points), actor overlap (20 points), and temporal proximity (10 points). Threshold: 60/100 to join an existing cluster.

Example: Significance Scoring

typescript
// Event weights (base scores)
const weights = {
  "release.published": 75,
  "deployment.error": 70,
  "pull-request.merged": 60,
  "pull-request.opened": 50,
  "issue.opened": 45,
  "deployment.succeeded": 40,
  push: 30,
}

// Content signals (added to the base score for the event type)
let score = weights[eventType] // eventType and title come from the incoming event
if (title.match(/breaking|security|CVE/i)) score += 20
if (title.match(/hotfix|emergency/i)) score += 15
if (title.match(/chore|deps|bump/i)) score -= 10

Limitations:

  • Significance threshold (40) is global; per-workspace configuration planned

  • Entity extraction limited to 50 entities per observation

  • Cluster lookback window is 7 days


Semantic Classification

Every observation is classified into one of 14 engineering categories using Claude Haiku. Classification drives cluster organization, topic extraction, and future retrieval filtering.

Categories:

| Category | Description |
|---|---|
| bug_fix | Bug fixes, patches, error corrections |
| feature | New features, additions, implementations |
| refactor | Code restructuring, cleanup |
| documentation | Docs, README, comments |
| testing | Tests, specs, coverage |
| infrastructure | CI/CD, pipelines, Docker |
| security | Security fixes, auth changes |
| performance | Optimizations, speed improvements |
| incident | Outages, emergencies, hotfixes |
| decision | ADRs, architecture decisions |
| discussion | RFCs, proposals, design discussions |
| release | Version releases, changelogs |
| deployment | Deployments, shipping to production |
| other | Doesn't fit other categories |

How it works:

  1. Claude Haiku receives event details (source, type, title, body truncated to 1000 chars)

  2. Returns primary category, up to 3 secondary categories, up to 5 topics, and confidence score

  3. Temperature 0.2 keeps classification near-deterministic across runs

Fallback:

If the LLM fails (timeout, rate limit), regex patterns classify events by matching keywords:

typescript
// Fallback patterns (first match wins)
const patterns = {
  bug_fix: /\bfix(es|ed|ing)?\b/i,
  feature: /\bfeat(ure)?[:\s]/i,
  security: /\bsecurity\b|CVE-\d+/i, // ... 10 more patterns
}
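
The first-match-wins loop can be sketched as below; this is self-contained, so only the three documented patterns are reproduced:

```typescript
// Fallback patterns, as documented (10 more exist in the real pipeline).
const fallbackPatterns: Record<string, RegExp> = {
  bug_fix: /\bfix(es|ed|ing)?\b/i,
  feature: /\bfeat(ure)?[:\s]/i,
  security: /\bsecurity\b|CVE-\d+/i,
};

// Hypothetical classifier: return the first category whose pattern matches.
function classifyFallback(title: string): string {
  for (const [category, pattern] of Object.entries(fallbackPatterns)) {
    if (pattern.test(title)) return category; // first match wins
  }
  return "other"; // no pattern matched
}
```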

Limitations:

  • Classification results (category, confidence) are not stored in database; only topics array persists

  • No accuracy metrics tracked in production

  • Confidence threshold (0.6) defined but not enforced


Webhook Architecture

Production-ready webhook infrastructure receives events from GitHub and Vercel with cryptographic verification, replay protection, and complete audit trails.

Supported Events:

| Source | Events |
|---|---|
| GitHub | push (default branch), pull_request (opened/closed/reopened/ready_for_review), issues (opened/closed/reopened), release (published), discussion (created/answered) |
| Vercel | deployment.created, deployment.succeeded, deployment.ready, deployment.error, deployment.canceled |

Security measures:

  • Signature verification: HMAC SHA-256 (GitHub) and SHA-1 (Vercel) with timing-safe comparison

  • Replay protection: 5-minute timestamp validation window with 60-second clock skew tolerance

  • Audit trail: Raw JSON payloads stored permanently in workspace_webhook_payloads table
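
A sketch of the GitHub-style verification step using Node's crypto primitives; GitHub delivers the digest in the `X-Hub-Signature-256` header as `sha256=<hex>`, and the function name here is illustrative:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// HMAC SHA-256 over the raw request body, compared with a
// constant-time equality check to avoid timing side channels.
function verifyGithubSignature(rawBody: string, signatureHeader: string, secret: string): boolean {
  const expected = "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHeader);
  // timingSafeEqual throws on length mismatch, so guard first.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Replay protection would then reject verified payloads whose timestamp falls outside the 5-minute window noted above.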

Processing architecture:

text
Webhook → Signature Verify → Timestamp Validate → Store Raw Payload
    ↓
Transform to SourceEvent → Emit Inngest Event
    ↓
Observation Capture Workflow (async)
    ├── Duplicate check
    ├── Event filter (source config)
    ├── Significance gate
    ├── Classification (parallel)
    ├── Embedding generation (parallel)
    ├── Entity extraction (parallel)
    ├── Cluster assignment
    ├── Pinecone upsert
    └── Database insert

Example: SourceEvent structure

typescript
interface SourceEvent {
  source: 'github' | 'vercel';
  sourceType: string;   // e.g. 'pull-request.opened', 'deployment.succeeded'
  sourceId: string;     // e.g. 'pr:lightfastai/lightfast#123'
  title: string;        // e.g. 'feat: add semantic search'
  body: string;         // full PR description, issue body, etc.
  actor: {
    id: string;         // e.g. '12345678'
    name: string;
    email?: string;
    avatarUrl?: string;
  };
  occurredAt: string;   // ISO 8601, e.g. '2025-12-17T10:30:00Z'
  references: Array<{ type: string; id: string }>; // e.g. { type: 'commit', id: 'abc123' }
  metadata: Record<string, unknown>; // source-specific
}

Cross-source correlation:

Vercel deployments are linked to GitHub users via commit SHA. When a GitHub push arrives with the same commit, the Vercel observation's actor is updated with the numeric GitHub user ID.
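
That correlation step can be sketched as below, with illustrative shapes:

```typescript
type Observation = { commitSha: string; actor: { id: string | null; name: string } };

// Hypothetical backfill: when a GitHub push arrives with a known commit SHA,
// update the matching Vercel observation's actor with the GitHub user ID.
function correlateActors(
  deployments: Observation[],
  pushes: Map<string, string> // commit SHA → numeric GitHub user ID
): Observation[] {
  return deployments.map((obs) => {
    const githubUserId = pushes.get(obs.commitSha);
    return githubUserId ? { ...obs, actor: { ...obs.actor, id: githubUserId } } : obs;
  });
}
```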

Limitations:

  • Only GitHub and Vercel sources implemented (Linear, Sentry, PagerDuty planned)

  • No circuit breaker for failing transformers

  • No rate limiting at webhook endpoint level

  • No manual reprocessing UI


Why We Built It This Way

The observation pipeline uses a significance scoring gate before AI classification to minimize LLM costs. Only events scoring 40+ undergo classification and embedding generation. This keeps costs predictable while ensuring high-value events like security patches and releases are always captured.

Multi-view embeddings (title, content, summary) optimize retrieval for different query types. When searching for "authentication bug", the title embedding finds headline matches, while the content embedding surfaces detailed discussions. The summary view balances both for general queries.

Raw webhook payload storage enables replay and debugging. When something goes wrong, you can inspect the exact JSON received, re-trigger processing, or audit what happened.


