Fuzzy Matching in RAG: Thresholds, Tradeoffs, and Fallback Design
Fuzzy matching is a pragmatic safety net for misspellings and out-of-vocabulary terms. In a hybrid RAG system, it’s usually a fallback—not the primary retrieval mode—but when it’s needed, tuning its thresholds matters.
Where Fuzzy Matching Fits
- Primary: semantic retrieval (vector similarity)
- Secondary: keyword retrieval with fuzzy matching for recall in typo-heavy queries
- Combined: comprehensive fallbacks that try both semantic and keyword paths
The keyword fallback below reads enable_fuzzy_matching from SearchRetrievalSettings and exposes it in the runtime plan as use_fuzzy_matching.
    # backend/core/query_router.py (excerpt)
    from typing import Any, Dict, List, Optional

    async def _keyword_matching_fallback(
        self, question: str, chat_history: Optional[List[Dict]]
    ) -> Dict[str, Any]:
        # Read the fuzzy flag from settings; fall back to enabled if settings can't be loaded.
        try:
            from .settings_manager import get_settings_manager
            sr_settings = get_settings_manager().get_search_retrieval_settings()
            fuzzy_enabled = bool(getattr(sr_settings, "enable_fuzzy_matching", True))
        except Exception:
            fuzzy_enabled = True
        # Expose the flag in the runtime plan.
        return {
            "strategy": "keyword_matching",
            "query_type": "keyword",
            "search_method": "keyword_only",
            "use_fuzzy_matching": fuzzy_enabled,
            "confidence": 0.4,
            "fallback_applied": True,
        }
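As a rough illustration of how a keyword leg might consume this plan, here is a minimal sketch. The keyword_search helper, the token-level matching rule, and the 0.6 default cutoff are assumptions made for the example, not part of the router excerpt, and Python's standard-library difflib stands in for whatever fuzzy matcher the real system uses.

```python
from difflib import SequenceMatcher
from typing import Any, Dict, List


def keyword_search(
    query: str,
    documents: List[str],
    plan: Dict[str, Any],
    fuzzy_threshold: float = 0.6,  # assumed default for this sketch
) -> List[str]:
    """Return documents in which every query token matches some document token."""
    use_fuzzy = bool(plan.get("use_fuzzy_matching", False))

    def token_matches(q: str, d: str) -> bool:
        if q == d:
            return True
        if not use_fuzzy:
            return False
        # Ratio is in [0, 1]; a lower threshold tolerates more edits.
        return SequenceMatcher(None, q, d).ratio() >= fuzzy_threshold

    query_tokens = query.lower().split()
    if not query_tokens:
        return []

    hits = []
    for doc in documents:
        doc_tokens = doc.lower().split()
        if all(any(token_matches(q, d) for d in doc_tokens) for q in query_tokens):
            hits.append(doc)
    return hits


docs = ["Kubernetes deployment guide", "Billing and invoices"]
exact_plan = {"use_fuzzy_matching": False}
fuzzy_plan = {"use_fuzzy_matching": True}
```

With the misspelled query "kubernets deployment", the exact plan finds nothing, while the fuzzy plan recovers the Kubernetes document.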
Threshold Surfaces
There are two threshold concepts commonly used:
- similarity_threshold: drives when semantic results are considered “good enough” to show
- fuzzy_threshold: governs how permissive the fuzzy matcher should be (lower → more tolerant)
These live in QueryRoutingSettings and can be persisted via the admin.
    # backend/core/settings_schemas.py (excerpt)
    from dataclasses import dataclass

    @dataclass
    class QueryRoutingSettings:
        enable_smart_routing: bool = True
        similarity_threshold: float = 0.5  # cutoff for semantic results to be shown
        fuzzy_threshold: float = 0.6  # lower = more tolerant fuzzy matching
        enable_fuzzy_matching: bool = True
        max_search_results: int = 10
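The effect of the two thresholds can be illustrated with a small sketch. It assumes fuzzy_threshold is compared against a 0-to-1 similarity ratio (here from the standard-library difflib.SequenceMatcher, which may not be the matcher the real system uses) and that similarity_threshold is a minimum score for semantic hits. The helper names fuzzy_match and passes_similarity_gate are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def fuzzy_match(term: str, candidate: str, fuzzy_threshold: float) -> bool:
    """True when the character-level similarity ratio reaches the threshold."""
    ratio = SequenceMatcher(None, term.lower(), candidate.lower()).ratio()
    return ratio >= fuzzy_threshold


def passes_similarity_gate(scores: List[float], similarity_threshold: float) -> List[float]:
    """Keep only semantic scores considered good enough to show."""
    return [s for s in scores if s >= similarity_threshold]


# A transposition typo passes a permissive threshold but not a strict one.
permissive = fuzzy_match("retreival", "retrieval", fuzzy_threshold=0.6)
strict = fuzzy_match("retreival", "retrieval", fuzzy_threshold=0.95)
unrelated = fuzzy_match("invoice", "retrieval", fuzzy_threshold=0.6)
```

Under these assumptions, lowering fuzzy_threshold admits more typos but also more near-miss terms, which is why the two knobs are usually tuned separately.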
Calibration Strategy
- Start fuzzy off for expert/internal users; enable it when you observe typo-driven misses.
- Treat fuzzy_threshold as a domain knob: lower for casual queries, higher for precise/internal corpora.
- Log hits where semantic fails but fuzzy succeeds; review delta to justify operational cost.
- If fuzzy is enabled, cap result count tightly and post-rank by semantic similarity to reduce noise.
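The logging and post-ranking points above might look roughly like the following sketch. Everything here is illustrative: record_fuzzy_rescue and cap_and_rerank are hypothetical helpers, and toy_overlap_score is a token-overlap stand-in for a real embedding similarity, used only so the example runs on its own.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("fuzzy_calibration")


def record_fuzzy_rescue(query: str, semantic_hits: List[str], fuzzy_hits: List[str]) -> bool:
    """Log the cases where semantic retrieval found nothing but fuzzy did."""
    rescued = not semantic_hits and bool(fuzzy_hits)
    if rescued:
        logger.info("fuzzy rescue: query=%r hits=%d", query, len(fuzzy_hits))
    return rescued


def toy_overlap_score(query: str, doc: str) -> float:
    """Stand-in for semantic similarity: Jaccard overlap of lowercase tokens."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def cap_and_rerank(
    query: str,
    fuzzy_hits: List[str],
    semantic_score: Callable[[str, str], float],
    max_results: int = 5,
) -> List[str]:
    """Order fuzzy hits by semantic score, then keep only the top few."""
    ranked = sorted(fuzzy_hits, key=lambda doc: semantic_score(query, doc), reverse=True)
    return ranked[:max_results]


hits = ["password policy", "reset password steps", "billing"]
top = cap_and_rerank("reset password", hits, toy_overlap_score, max_results=2)
```

Counting how often record_fuzzy_rescue returns True over a sample of real queries gives a simple number to weigh against the extra latency of the fuzzy leg.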
Operational Notes
- Apply fuzzy only inside the keyword fallback, not the main path.
- Bound latency with timeouts and small max_search_results for the keyword leg.
- Cache negative results briefly to avoid repeated fuzzy work on the same misspelled term.
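A minimal sketch of the timeout and negative-cache notes follows, assuming an async search callable. NegativeCache, bounded_fuzzy_search, the 30-second TTL, and the 0.5-second timeout are all hypothetical choices for illustration; only asyncio.wait_for and time.monotonic are standard-library calls.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List


class NegativeCache:
    """Remembers terms that recently produced no results, for a short TTL."""

    def __init__(self, ttl_seconds: float = 30.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def mark_miss(self, term: str) -> None:
        self._expires_at[term.lower()] = self._clock() + self._ttl

    def is_known_miss(self, term: str) -> bool:
        key = term.lower()
        expires = self._expires_at.get(key)
        if expires is None:
            return False
        if self._clock() >= expires:
            del self._expires_at[key]  # entry expired; allow a fresh attempt
            return False
        return True


async def bounded_fuzzy_search(
    term: str,
    search: Callable[[str], Awaitable[List[str]]],
    cache: NegativeCache,
    timeout_seconds: float = 0.5,
) -> List[str]:
    """Run the fuzzy search with a time budget, skipping known misses."""
    if cache.is_known_miss(term):
        return []
    try:
        results = await asyncio.wait_for(search(term), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return []  # a timeout is not a confirmed miss, so it is not cached
    if not results:
        cache.mark_miss(term)
    return results
```

Keeping the TTL short means a term that starts matching after a reindex is not hidden for long, and leaving timeouts out of the cache avoids treating a slow backend as a true miss.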