Citations & Sources
How Lightfast ensures every answer is verifiable with links to source documents
Every answer includes verifiable sources. This is a core principle of Lightfast—no summarization without evidence, no claims without citations. When you use the /v1/answer endpoint, responses always include references to the original documents where information was found.
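For orientation, a request might look like the sketch below. The base URL, header names, and request body fields here are illustrative assumptions, not the documented contract; see the API reference for the exact shape.

```ts
// Hypothetical sketch of calling /v1/answer. The base URL, auth header,
// and request fields are assumptions for illustration only.
const response = await fetch("https://api.lightfast.example/v1/answer", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    // Assumed bearer-token auth; check the API reference for the real scheme.
    Authorization: `Bearer ${process.env.LIGHTFAST_API_KEY}`,
  },
  body: JSON.stringify({ query: "How do we handle rate limiting?" }),
});

const { content, citations, rationale } = await response.json();
```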
Citation Structure
Responses include three key components:
1. Content — The Answer
The synthesized answer to your query, written in clear language:
"Authentication uses OAuth 2.0 with JWT tokens. The system supports refresh tokens
with a 7-day expiry and access tokens valid for 1 hour. The implementation follows
RFC 6749 and includes PKCE for additional security."
2. Citations — Source Documents
Source documents with metadata and snippets:
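For example, a single citation entry might look like the sketch below; the field names are assumptions drawn from the metadata described later on this page.

```ts
// Hypothetical citation entry (field names are illustrative, not the schema).
const citation = {
  sourceType: "pr",                  // PR, issue, doc, discussion, or commit
  title: "PR #123: OAuth implementation",
  url: "https://github.com/acme/app/pull/123", // hypothetical link
  snippet:
    "Refresh tokens expire after 7 days; access tokens are valid for 1 hour.",
  relevanceScore: 0.94,              // 0.0 to 1.0, higher is more relevant
  timestamp: "2025-01-15T09:30:00Z", // created or last modified
};
```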
3. Rationale — Why These Sources
Explanation of why these sources were selected:
"These sources were chosen because PR #123 introduced the OAuth implementation with
specific token expiry details, and the Authentication Guide documents the security
rationale for including PKCE."
Verification Workflow
Citations enable you to verify every claim:
- Review the answer: Read the synthesized response
- Check citations: Examine the source snippets and relevance scores
- Verify claims: Click through to original documents to see full context
- Assess recency: Check timestamps to ensure information is current
- Trace decisions: Follow links to related PRs, issues, or discussions
This transparency builds trust and allows you to dive deeper when needed.
Citations respect your permissions. If a source document becomes inaccessible (repo made private, access revoked), Lightfast automatically excludes it from results. You only see citations for content you currently have permission to view.
Citation Metadata
Each citation includes rich metadata to help you assess relevance and trustworthiness:
Relevance Score
A value from 0.0 to 1.0 indicating how well the source matches your query:
- 0.9 - 1.0: Highly relevant, directly answers the query
- 0.8 - 0.9: Very relevant, contains substantial information
- 0.7 - 0.8: Relevant, provides supporting context
- < 0.7: Marginally relevant, included for completeness
Higher scores appear first in the citation list.
Timestamp
When the source was created or last modified:
- Shows recency of information
- Helps identify if content might be outdated
- Useful for understanding evolution of decisions
Snippet
Key excerpt from the source that contains relevant information:
- Pre-highlighted text saves you time
- Shows exactly where the answer came from
- Context for why the source was selected
Source Type
What kind of document the citation references:
- PR: Pull request with code changes and discussion
- Issue: Bug report, feature request, or tracking item
- Doc: Documentation file (README, guide, RFC)
- Discussion: GitHub Discussion thread
- Commit: Individual code commit
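Putting the metadata fields above together, a citation can be sketched as the following type; this interface is an illustration for orientation, not the authoritative schema.

```ts
// Illustrative citation shape consolidating the metadata described above.
type SourceType = "pr" | "issue" | "doc" | "discussion" | "commit";

interface Citation {
  sourceType: SourceType; // kind of document referenced
  title: string;          // e.g. PR title or document heading
  url: string;            // link back to the original document
  snippet: string;        // key excerpt containing the relevant information
  relevanceScore: number; // 0.0 to 1.0; higher scores are listed first
  timestamp: string;      // ISO 8601 creation or last-modified time
}
```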
Multi-Source Answers
Complex questions often require multiple sources:
Query: "Why did we migrate from MongoDB to PostgreSQL and what were the performance impacts?"
Citation strategy:
- Decision RFC (relevance: 0.95) — Why the migration happened
- Benchmark PR (relevance: 0.92) — Performance comparison data
- Migration guide (relevance: 0.89) — Implementation details
- Post-migration analysis (relevance: 0.87) — Actual performance impacts
- Related issues (relevance: 0.78) — Problems solved and remaining challenges
Each citation contributes a different aspect of the complete answer.
Citation Quality
Lightfast ensures citation quality through:
Relevance Filtering
Only sources above a minimum relevance threshold (typically 0.65) are included. This prevents tangentially related content from cluttering citations.
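The same filtering can be applied client-side when post-processing results; the sketch below assumes the illustrative Citation shape from above and the typical 0.65 threshold.

```ts
// Drop marginal citations and list the strongest first, mirroring the
// server-side filtering described above.
const MIN_RELEVANCE = 0.65;

function rankCitations(citations: Citation[]): Citation[] {
  return citations
    .filter((c) => c.relevanceScore >= MIN_RELEVANCE)
    .sort((a, b) => b.relevanceScore - a.relevanceScore);
}
```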
Diversity
Citations represent diverse perspectives when possible:
- Multiple source types (PR + docs + discussions)
- Different time periods (historical context + recent updates)
- Various contributors (not just one person's view)
Faithfulness
The answer content must be grounded in the citations:
- No hallucinated claims
- No synthesis beyond what sources support
- Clear attribution when paraphrasing
Recency Bias
More recent sources are preferred when multiple sources cover the same information:
- Latest documentation overrides old
- Recent PRs show current implementation
- Updated discussions reflect evolved thinking
Using Citations Effectively
When to Trust Citations
High-confidence scenarios:
- Multiple citations agree (>3 sources, relevance >0.85)
- Recent timestamps (within last 6 months)
- Direct quotes or code snippets
- Authoritative sources (CODEOWNERS, official docs)
Verify when:
- Single citation with moderate relevance (0.65-0.80)
- Old timestamps (>2 years for fast-moving code)
- Conflicting information between citations
- Critical decisions or compliance questions
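These rules of thumb can be encoded as a rough client-side check. The sketch below reuses the illustrative Citation shape and the thresholds from the lists above; agreement between sources and source authority still require human judgment.

```ts
// Rough heuristic for when to double-check an answer, based on the
// rules of thumb above. Illustrative only; not part of the API.
function needsVerification(citations: Citation[]): boolean {
  const SIX_MONTHS_MS = 182 * 24 * 60 * 60 * 1000;

  const strong = citations.filter((c) => c.relevanceScore > 0.85);
  const recent = citations.filter(
    (c) => Date.now() - Date.parse(c.timestamp) < SIX_MONTHS_MS
  );

  // High confidence: more than three strong sources, at least one recent.
  if (strong.length > 3 && recent.length > 0) return false;

  // Otherwise (single moderate source, stale timestamps, etc.), verify.
  return true;
}
```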
Following Citation Chains
Citations often reference other documents:
- Start with the primary citation from your answer
- Follow links in that document to design docs or RFCs
- Trace PRs back to issues they resolve
- Read discussion threads for full context
This deep dive reveals the complete decision-making process.
API Response Format
When using /v1/answer, citations are returned in structured format:
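An illustrative response shape, combining the three components described earlier, might look like this; treat the field names as assumptions and defer to the API reference.

```ts
// Illustrative /v1/answer response shape (field names are assumptions).
interface AnswerResponse {
  content: string;       // the synthesized answer
  citations: Citation[]; // source documents, highest relevance first
  rationale: string;     // why these sources were selected
}
```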
All fields are documented in the API reference.
Examples
Example 1: Technical Question
Query: "How do we handle rate limiting?"
Citations returned:
- middleware/rate-limit.ts — Implementation code
- PR #456 "Add rate limiting middleware" — Design discussion
- docs/api.md — Public documentation
- Issue #123 "Rate limit requests" — Original requirements
Rationale: "These sources show the implementation, design decisions, public interface, and original motivation for rate limiting."
Example 2: Decision Question
Query: "Why did we choose Redis over Memcached?"
Citations returned:
- RFC #78 "Caching Strategy" — Decision rationale
- PR #234 "Implement Redis caching" — Implementation
- Discussion thread — Team debate and alternatives
- Performance benchmark — Comparison data
Rationale: "RFC #78 documents the decision with Redis vs Memcached tradeoffs. PR #234 shows implementation. Discussion captures team alignment. Benchmark provides data that informed the choice."
Example 3: Ownership Question
Query: "Who owns the payment service?"
Citations returned:
- services/payment/CODEOWNERS — Explicit ownership
- Commit history analysis — Alice: 65% of commits
- PR review patterns — Alice: primary reviewer
- Recent activity — Last 10 PRs authored or reviewed by Alice
Rationale: "Alice is listed in CODEOWNERS, authored most commits in the last 6 months, and serves as primary reviewer for payment-related changes."
Next Steps
- Search & Retrieval — How sources are found
- Quality & Evaluation — How citation quality is measured
- POST /v1/answer — Generate answers with citations
- Examples — See citation patterns in practice