Fuzzy Matching in RAG: Thresholds, Tradeoffs, and Fallback Design

Jan 10, 2025by Nick Berens

RAGFuzzy MatchingKeyword SearchThresholdsRetrieval

Fuzzy matching is a pragmatic safety net for misspellings and out-of-vocabulary terms. In a hybrid RAG system, it’s usually a fallback—not the primary retrieval mode—but when it’s needed, tuning its thresholds matters.

Where Fuzzy Matching Fits

Primary: semantic retrieval (vector similarity)
Secondary: keyword retrieval with fuzzy matching for recall in typo-heavy queries
Combined: comprehensive fallbacks that try both semantic and keyword paths

The keyword fallback below reads enable_fuzzy_matching from SearchRetrievalSettings and exposes it in the runtime plan.

# backend/core/query_router.py (excerpt)
async def _keyword_matching_fallback(self, question: str, chat_history: Optional[List[Dict]]) -> Dict[str, Any]:
    try:
        from .settings_manager import get_settings_manager
        sr_settings = get_settings_manager().get_search_retrieval_settings()
        fuzzy_enabled = bool(getattr(sr_settings, "enable_fuzzy_matching", True))
    except Exception:
        fuzzy_enabled = True

    return {
        "strategy": "keyword_matching",
        "query_type": "keyword",
        "search_method": "keyword_only",
        "use_fuzzy_matching": fuzzy_enabled,
        "confidence": 0.4,
        "fallback_applied": True,
    }

Threshold Surfaces

There are two threshold concepts commonly used:

similarity_threshold: drives when semantic results are considered “good enough” to show
fuzzy_threshold: governs how permissive the fuzzy matcher should be (lower → more tolerant)

These live in QueryRoutingSettings and can be persisted via the admin.

# backend/core/settings_schemas.py (excerpt)
@dataclass
class QueryRoutingSettings:
    enable_smart_routing: bool = True
    similarity_threshold: float = 0.5
    fuzzy_threshold: float = 0.6
    enable_fuzzy_matching: bool = True
    max_search_results: int = 10

Calibration Strategy

Start fuzzy off for expert/internal users; enable it when you observe typo-driven misses.
Treat fuzzy_threshold as a domain knob: lower for casual queries, higher for precise/internal corpora.
Log hits where semantic fails but fuzzy succeeds; review delta to justify operational cost.
If fuzzy is enabled, cap result count tightly and post-rank by semantic similarity to reduce noise.

Operational Notes

Apply fuzzy only inside the keyword fallback, not the main path.
Bound latency with timeouts and small max_search_results for the keyword leg.
Cache negative results briefly to avoid repeated fuzzy work on the same misspelled term.