Smart Query Routing in RAG: Intent, Confidence, and Fallbacks
RAG systems benefit from a router that interprets intent, scores confidence, and chooses a suitable strategy (semantic, keyword, or comprehensive) with sensible fallbacks. This post walks through a concrete implementation and the configuration surfaces you can tune in production.
Router Responsibilities
- Intent detection and query-type classification
- Confidence scoring and thresholding
- Strategy selection and fallback orchestration
- Respect runtime settings (feature flags, thresholds, timeouts)
The following snippets are from backend/core/query_router.py and backend/core/settings_manager.py.
Confidence-Driven Routing
The router computes a confidence score and applies a fallback when it falls below a configured threshold.
# backend/core/query_router.py
async def route_query_with_confidence(self, question: str, chat_history: Optional[List[Dict]] = None) -> Dict[str, Any]:
    start_time = time.time()
    routing_settings = get_settings_manager().get_routing_settings()
    try:
        routing_result = await self._smart_route_with_confidence(question, chat_history, routing_settings)
        if routing_result["confidence"] < routing_settings.confidence_threshold:
            logger.info(
                f"Low confidence ({routing_result['confidence']:.2f}) for query: '{question[:50]}...', applying fallback"
            )
            routing_result = await self._apply_fallback_strategy(question, chat_history, routing_settings)
        processing_time = time.time() - start_time
        routing_result["processing_time"] = processing_time
        routing_result["settings_applied"] = True
        return routing_result
    except Exception as e:
        logger.error(f"Error in enhanced query routing: {e}")
        return await self._emergency_fallback(question, chat_history)
This routing path surfaces a confidence field, processing_time, and a settings_applied marker for downstream analytics.
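The threshold check at the heart of that method can be isolated into a small, synchronous sketch. The RoutingSettings class, apply_confidence_gate, and keyword_fallback names below are hypothetical stand-ins for illustration, not part of the router shown above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class RoutingSettings:
    # Hypothetical stand-in for the object returned by get_routing_settings().
    confidence_threshold: float = 0.75


def apply_confidence_gate(
    primary: Dict[str, Any],
    settings: RoutingSettings,
    fallback: Callable[[], Dict[str, Any]],
) -> Dict[str, Any]:
    """Keep the primary result unless its confidence is below the threshold."""
    if primary["confidence"] < settings.confidence_threshold:
        # Low confidence: discard the primary result and use the fallback.
        result = dict(fallback())
        result["fallback_applied"] = True
        return result
    result = dict(primary)
    result.setdefault("fallback_applied", False)
    return result


def keyword_fallback() -> Dict[str, Any]:
    # Hypothetical fallback; mirrors the keyword path's fixed 0.4 confidence.
    return {"strategy": "keyword_matching", "confidence": 0.4}


settings = RoutingSettings()
confident = apply_confidence_gate(
    {"strategy": "semantic", "confidence": 0.9}, settings, keyword_fallback
)
uncertain = apply_confidence_gate(
    {"strategy": "semantic", "confidence": 0.5}, settings, keyword_fallback
)
print(confident["strategy"], uncertain["strategy"])
```

Because the comparison is a strict less-than, a score exactly equal to the threshold keeps the primary route.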
Intent Analysis + Query Type
The router first categorizes the query to assign a coarse strategy.
# backend/core/query_router.py
def route_query(self, question: str) -> Tuple[QueryType, Optional[str]]:
    settings_manager = get_settings_manager()
    if not settings_manager.is_feature_enabled("enable_smart_routing"):
        return QueryType.AI_TEXT_RESPONSE, None
    search_term = self._check_specific_image_search(question)
    if search_term:
        return QueryType.SPECIFIC_IMAGE_SEARCH, search_term
    if self._check_all_images_pattern(question):
        return QueryType.ALL_IMAGES, "all"
    search_term = self._check_show_me_pattern(question)
    if search_term:
        return QueryType.SHOW_ME_PATTERN, search_term
    search_term = self._check_general_image_pattern(question)
    if search_term:
        return QueryType.GENERAL_IMAGE_PATTERN, search_term
    return QueryType.AI_TEXT_RESPONSE, None
This example comes from an illustration-search use case, but the same first-match-wins pattern applies when choosing among RAG search, keyword search, image lookup, and a default response.
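The post does not show the pattern checkers themselves, so here is a minimal sketch of how two of them might look, assuming simple regular expressions. The patterns, the reduced QueryType enum, and the classify function are all hypothetical; the real _check_* methods may work differently.

```python
import re
from enum import Enum
from typing import Optional, Tuple


class QueryType(Enum):
    # Reduced, hypothetical version of the router's QueryType.
    ALL_IMAGES = "all_images"
    SHOW_ME_PATTERN = "show_me_pattern"
    AI_TEXT_RESPONSE = "ai_text_response"


# Hypothetical patterns, chosen only to illustrate the cascade.
ALL_IMAGES_RE = re.compile(
    r"\b(all|every)\s+(images?|pictures?|illustrations?)\b", re.IGNORECASE
)
SHOW_ME_RE = re.compile(
    r"^\s*show\s+me\s+(?:an?\s+|the\s+)?(.+?)\s*[.?!]*$", re.IGNORECASE
)


def classify(question: str) -> Tuple[QueryType, Optional[str]]:
    """Return the first matching query type and any extracted search term."""
    # More specific checks run first, so "show me all images" is not
    # captured by the broader "show me ..." pattern.
    if ALL_IMAGES_RE.search(question):
        return QueryType.ALL_IMAGES, "all"
    match = SHOW_ME_RE.match(question)
    if match:
        return QueryType.SHOW_ME_PATTERN, match.group(1)
    # Nothing matched: fall through to the default text response.
    return QueryType.AI_TEXT_RESPONSE, None


for q in ("Show me all images", "Show me the water cycle?", "What is RAG?"):
    print(q, "->", classify(q))
```

Ordering is the main design choice here: each check short-circuits the rest, so broader patterns belong later in the cascade.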
Fallback Strategies
Fallbacks are configurable, including a keyword-only path that honors fuzzy matching flags.
# backend/core/query_router.py
async def _keyword_matching_fallback(self, question: str, chat_history: Optional[List[Dict]]) -> Dict[str, Any]:
    try:
        from .settings_manager import get_settings_manager
        sr_settings = get_settings_manager().get_search_retrieval_settings()
        fuzzy_enabled = bool(getattr(sr_settings, "enable_fuzzy_matching", True))
    except Exception:
        fuzzy_enabled = True
    return {
        "strategy": "keyword_matching",
        "query_type": "keyword",
        "search_method": "keyword_only",
        "use_fuzzy_matching": fuzzy_enabled,
        "confidence": 0.4,
        "fallback_applied": True,
    }
Additional fallbacks include semantic_similarity and comprehensive_search, along with an emergency_fallback for hard errors.
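One way to select among these fallbacks by name is a dispatch table keyed on the configured fallback_strategy. The sketch below is an assumption about how _apply_fallback_strategy could be organized, not the actual implementation; the handler bodies are placeholders and every function name is hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

FallbackHandler = Callable[[str], Awaitable[Dict[str, Any]]]


def _make_handler(name: str) -> FallbackHandler:
    # Placeholder handlers: a real one would run the corresponding search.
    async def handler(question: str) -> Dict[str, Any]:
        return {"strategy": name, "fallback_applied": True}

    return handler


async def emergency_fallback(question: str) -> Dict[str, Any]:
    # Last resort for unknown strategy names or handler failures.
    return {"strategy": "emergency_fallback", "fallback_applied": True}


FALLBACKS: Dict[str, FallbackHandler] = {
    name: _make_handler(name)
    for name in ("keyword_matching", "semantic_similarity", "comprehensive_search")
}


async def apply_fallback(question: str, strategy_name: str) -> Dict[str, Any]:
    """Run the configured fallback, degrading to the emergency path on error."""
    handler = FALLBACKS.get(strategy_name)
    if handler is None:
        return await emergency_fallback(question)
    try:
        return await handler(question)
    except Exception:
        return await emergency_fallback(question)


result = asyncio.run(apply_fallback("what is RAG?", "semantic_similarity"))
print(result["strategy"])
```

A misspelled strategy name in configuration then degrades to the emergency path instead of raising inside the request.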
Tunable Settings
Routing behavior is controlled by QueryRoutingSettings, loaded via the SettingsManager.
# backend/core/settings_schemas.py (excerpt)
@dataclass
class QueryRoutingSettings:
    enable_smart_routing: bool = True
    confidence_threshold: float = 0.75
    fallback_strategy: str = "comprehensive_search"
    enable_query_caching: bool = True
    query_cache_ttl_seconds: int = 300
    enable_parallel_processing: bool = True
    max_retries: int = 3
    # Search configuration
    enable_fuzzy_matching: bool = True
    similarity_threshold: float = 0.5
    fuzzy_threshold: float = 0.6
    max_search_results: int = 10
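When these values come from an admin UI or a config file, it can help to validate overrides before they reach the router. The following sketch uses a subset of the fields from the excerpt above and assumes confidence scores lie in [0, 1]; the __post_init__ checks and the settings_from_overrides helper are hypothetical additions, not something the SettingsManager is known to do.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class QueryRoutingSettings:
    # Subset of the fields from the excerpt, with the same defaults.
    enable_smart_routing: bool = True
    confidence_threshold: float = 0.75
    fallback_strategy: str = "comprehensive_search"
    max_retries: int = 3

    def __post_init__(self) -> None:
        # Assumption: confidence scores are normalized to [0, 1].
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be within [0, 1]")
        if self.max_retries < 0:
            raise ValueError("max_retries must be non-negative")


def settings_from_overrides(overrides: Dict[str, Any]) -> QueryRoutingSettings:
    """Build settings from a partial dict, rejecting unknown keys."""
    known = {f.name for f in fields(QueryRoutingSettings)}
    unknown = set(overrides) - known
    if unknown:
        raise KeyError(f"Unknown routing settings: {sorted(unknown)}")
    return QueryRoutingSettings(**overrides)


tuned = settings_from_overrides({"confidence_threshold": 0.8})
print(tuned)
```

Rejecting unknown keys catches typos such as confidence_treshold, which would otherwise leave the default silently in place.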
Practical Guidance
- Start with confidence_threshold ≈ 0.7–0.8; review logs for borderline cases and adjust.
- Prefer comprehensive_search fallback in early deployments; migrate to more selective fallbacks once signals are strong.
- Keep max_retries ≤ 3 with exponential backoff to bound latency.
- Expose enable_smart_routing as a feature flag to A/B routing vs. direct semantic search.
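The retry advice can be sketched as a small helper with capped exponential backoff and jitter. The retry_with_backoff function and its base_delay and max_delay parameters are hypothetical; the post does not show how the router itself consumes max_retries.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
) -> T:
    """Call operation, retrying up to max_retries times after failures."""
    attempt = 0
    while True:
        try:
            return await operation()
        except Exception:
            if attempt >= max_retries:
                # Retries exhausted: surface the last error to the caller.
                raise
            # Delay doubles each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Add up to 50% random jitter to avoid synchronized retries.
            await asyncio.sleep(delay + random.uniform(0, delay / 2))
            attempt += 1
```

With the defaults, the waits before jitter are 0.1 s, 0.2 s, and 0.4 s, so retries add at most 0.7 s of sleep (about 1.05 s with maximum jitter) on top of the time spent in the operation itself.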
Observability
The router logs include hash-based identifiers and timing. For production, feed routing_result, confidence, and fallback_applied into your analytics to build win-rate dashboards per strategy and per domain segment.
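As a starting point for such a dashboard, logged routing results can be aggregated per strategy. This sketch assumes each event is a dict carrying the strategy, confidence, and fallback_applied fields from the routing result, plus a hypothetical success flag that you would derive from user feedback or an offline evaluation; the router itself does not emit it.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable


def summarize_by_strategy(
    events: Iterable[Dict[str, Any]],
) -> Dict[str, Dict[str, float]]:
    """Compute count, win rate, fallback rate, and mean confidence per strategy."""
    totals: Dict[str, Dict[str, float]] = defaultdict(
        lambda: {"count": 0, "wins": 0, "fallbacks": 0, "confidence_sum": 0.0}
    )
    for event in events:
        bucket = totals[event["strategy"]]
        bucket["count"] += 1
        # "success" is a hypothetical field; missing values count as losses.
        bucket["wins"] += int(bool(event.get("success", False)))
        bucket["fallbacks"] += int(bool(event.get("fallback_applied", False)))
        bucket["confidence_sum"] += float(event["confidence"])
    return {
        strategy: {
            "count": b["count"],
            "win_rate": b["wins"] / b["count"],
            "fallback_rate": b["fallbacks"] / b["count"],
            "mean_confidence": b["confidence_sum"] / b["count"],
        }
        for strategy, b in totals.items()
    }


events = [
    {"strategy": "semantic", "confidence": 0.9, "success": True},
    {"strategy": "semantic", "confidence": 0.7, "success": False},
    {"strategy": "keyword_matching", "confidence": 0.4,
     "fallback_applied": True, "success": True},
]
summary = summarize_by_strategy(events)
print(summary["semantic"]["win_rate"])
```

Grouping by a domain-segment key as well as the strategy would give the per-segment view; that is a straightforward extension of the same loop.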