Smart Query Routing in RAG: Intent, Confidence, and Fallbacks

by Nick Berens
Tags: RAG, Routing, Query Understanding, Confidence Scoring, Fallbacks, Settings

RAG systems benefit from a router that interprets intent, scores its own confidence, and picks a suitable retrieval strategy (semantic, keyword, or comprehensive) with sensible fallbacks. This post walks through a concrete implementation and the settings you can tune in production.

Router Responsibilities

  • Intent detection and query-type classification
  • Confidence scoring and thresholding
  • Strategy selection and fallback orchestration
  • Respect runtime settings (feature flags, thresholds, timeouts)

The following snippets are from backend/core/query_router.py and backend/core/settings_schemas.py; settings are read at runtime through backend/core/settings_manager.py.

Confidence-Driven Routing

The router computes a confidence score and applies a fallback when it falls below a configured threshold.

# backend/core/query_router.py
import time
from typing import Any, Dict, List, Optional

async def route_query_with_confidence(self, question: str, chat_history: Optional[List[Dict]] = None) -> Dict[str, Any]:
    start_time = time.time()
    routing_settings = get_settings_manager().get_routing_settings()

    try:
        routing_result = await self._smart_route_with_confidence(question, chat_history, routing_settings)

        # Below the configured threshold, the fallback result replaces the primary one.
        if routing_result["confidence"] < routing_settings.confidence_threshold:
            logger.info(
                f"Low confidence ({routing_result['confidence']:.2f}) for query: '{question[:50]}...', applying fallback"
            )
            routing_result = await self._apply_fallback_strategy(question, chat_history, routing_settings)

        # Metadata for downstream analytics.
        processing_time = time.time() - start_time
        routing_result["processing_time"] = processing_time
        routing_result["settings_applied"] = True
        return routing_result
    except Exception as e:
        # Any unexpected error is logged and routed through the emergency fallback.
        logger.error(f"Error in enhanced query routing: {e}")
        return await self._emergency_fallback(question, chat_history)

This routing path surfaces a confidence field, processing_time, and a settings_applied marker for downstream analytics.
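To make the control flow easier to experiment with, here is a minimal standalone sketch of the same threshold-then-fallback idea. The names RoutingSettings, primary_route, and fallback_route are hypothetical stand-ins, and the word-count scorer is a toy assumption, not the scoring used in the real router.

```python
import asyncio
import time
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class RoutingSettings:
    # Hypothetical stand-in for the routing settings object used in the post.
    confidence_threshold: float = 0.75


async def primary_route(question: str) -> Dict[str, Any]:
    # Toy scorer (assumption): very short queries are treated as ambiguous.
    confidence = 0.9 if len(question.split()) >= 4 else 0.3
    return {"strategy": "semantic_similarity", "confidence": confidence}


async def fallback_route(question: str) -> Dict[str, Any]:
    # Stub fallback that marks its result so analytics can tell the paths apart.
    return {"strategy": "comprehensive_search", "confidence": 0.5, "fallback_applied": True}


async def route(question: str, settings: RoutingSettings) -> Dict[str, Any]:
    start = time.perf_counter()
    result = await primary_route(question)
    # Swap in the fallback result when the primary route is not confident enough.
    if result["confidence"] < settings.confidence_threshold:
        result = await fallback_route(question)
    result.setdefault("fallback_applied", False)
    result["processing_time"] = time.perf_counter() - start
    return result


if __name__ == "__main__":
    print(asyncio.run(route("pricing?", RoutingSettings())))
```

Setting fallback_applied explicitly to False on the primary path is a design choice in this sketch: it keeps every result the same shape, which simplifies aggregation later.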

Intent Analysis + Query Type

The router first categorizes the query to assign a coarse strategy.

# backend/core/query_router.py
from typing import Optional, Tuple

def route_query(self, question: str) -> Tuple[QueryType, Optional[str]]:
    settings_manager = get_settings_manager()
    # With smart routing disabled, skip all pattern checks and answer with plain text.
    if not settings_manager.is_feature_enabled("enable_smart_routing"):
        return QueryType.AI_TEXT_RESPONSE, None

    search_term = self._check_specific_image_search(question)
    if search_term:
        return QueryType.SPECIFIC_IMAGE_SEARCH, search_term

    if self._check_all_images_pattern(question):
        return QueryType.ALL_IMAGES, "all"

    search_term = self._check_show_me_pattern(question)
    if search_term:
        return QueryType.SHOW_ME_PATTERN, search_term

    search_term = self._check_general_image_pattern(question)
    if search_term:
        return QueryType.GENERAL_IMAGE_PATTERN, search_term

    return QueryType.AI_TEXT_RESPONSE, None

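This example comes from an illustration-search use case, but the same first-match-wins pattern applies when choosing among semantic search, keyword search, image search, and a default text response. Because the first matching check returns immediately, the order of the checks encodes their priority.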
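The pattern checkers themselves are not shown in the post. As a rough illustration of what such helpers could look like, the sketch below uses regular expressions; both expressions and both function names are assumptions made for this example, not the actual implementation.

```python
import re
from typing import Optional

# Hypothetical patterns: the post does not show the real expressions.
_SHOW_ME = re.compile(
    r"^\s*show\s+me\s+(?:the\s+|some\s+|an?\s+)?(?P<term>.+?)\s*[.?!]*\s*$",
    re.IGNORECASE,
)
_ALL_IMAGES = re.compile(
    r"\b(?:all|every)\s+(?:the\s+)?(?:images?|illustrations?|pictures?)\b",
    re.IGNORECASE,
)


def check_show_me_pattern(question: str) -> Optional[str]:
    """Return the lowercased search term from a 'show me ...' query, else None."""
    match = _SHOW_ME.match(question)
    return match.group("term").lower() if match else None


def check_all_images_pattern(question: str) -> bool:
    """Return True when the query asks for every image or illustration."""
    return _ALL_IMAGES.search(question) is not None


if __name__ == "__main__":
    print(check_show_me_pattern("Show me the robot sketches?"))
    print(check_all_images_pattern("List all the illustrations"))
```

A query such as "show me all images" matches both helpers, which is why the order of checks in the router matters: whichever check runs first decides the query type.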

Fallback Strategies

Fallbacks are configurable. One of them is a keyword-only path that honors the enable_fuzzy_matching flag and reports a fixed confidence of 0.4.

# backend/core/query_router.py
async def _keyword_matching_fallback(self, question: str, chat_history: Optional[List[Dict]]) -> Dict[str, Any]:
    try:
        from .settings_manager import get_settings_manager
        sr_settings = get_settings_manager().get_search_retrieval_settings()
        fuzzy_enabled = bool(getattr(sr_settings, "enable_fuzzy_matching", True))
    except Exception:
        # If settings cannot be loaded, default to fuzzy matching enabled.
        fuzzy_enabled = True

    return {
        "strategy": "keyword_matching",
        "query_type": "keyword",
        "search_method": "keyword_only",
        "use_fuzzy_matching": fuzzy_enabled,
        "confidence": 0.4,
        "fallback_applied": True,
    }

Additional fallbacks include semantic_similarity and comprehensive_search, along with an emergency_fallback for hard errors.
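One way to wire a configured fallback name to its handler is a small dispatch table. The sketch below assumes that approach; the handler bodies, the FALLBACKS mapping, and every confidence value other than the 0.4 shown above are placeholders, since the post does not show how _apply_fallback_strategy is implemented.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

FallbackHandler = Callable[[str], Awaitable[Dict[str, Any]]]


async def keyword_matching(question: str) -> Dict[str, Any]:
    return {"strategy": "keyword_matching", "confidence": 0.4, "fallback_applied": True}


async def semantic_similarity(question: str) -> Dict[str, Any]:
    # Placeholder confidence (assumption).
    return {"strategy": "semantic_similarity", "confidence": 0.5, "fallback_applied": True}


async def comprehensive_search(question: str) -> Dict[str, Any]:
    # Placeholder confidence (assumption).
    return {"strategy": "comprehensive_search", "confidence": 0.5, "fallback_applied": True}


async def emergency_fallback(question: str) -> Dict[str, Any]:
    # Must not raise: this is the last line of defense.
    return {"strategy": "emergency_fallback", "confidence": 0.0, "fallback_applied": True}


# Hypothetical mapping from the fallback_strategy setting to a handler.
FALLBACKS: Dict[str, FallbackHandler] = {
    "keyword_matching": keyword_matching,
    "semantic_similarity": semantic_similarity,
    "comprehensive_search": comprehensive_search,
}


async def apply_fallback(question: str, strategy_name: str) -> Dict[str, Any]:
    # Unknown names and handler errors both end up in the emergency path.
    handler = FALLBACKS.get(strategy_name, emergency_fallback)
    try:
        return await handler(question)
    except Exception:
        return await emergency_fallback(question)


if __name__ == "__main__":
    print(asyncio.run(apply_fallback("what is RAG", "keyword_matching")))
```

Routing unknown strategy names to the emergency handler, rather than raising, means a typo in configuration degrades the answer instead of failing the request.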

Tunable Settings

Routing behavior is controlled by QueryRoutingSettings loaded via the SettingsManager.

# backend/core/settings_schemas.py (excerpt)
@dataclass
class QueryRoutingSettings:
    enable_smart_routing: bool = True
    confidence_threshold: float = 0.75
    fallback_strategy: str = "comprehensive_search"
    enable_query_caching: bool = True
    query_cache_ttl_seconds: int = 300
    enable_parallel_processing: bool = True
    max_retries: int = 3
    # Search configuration
    enable_fuzzy_matching: bool = True
    similarity_threshold: float = 0.5
    fuzzy_threshold: float = 0.6
    max_search_results: int = 10
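Because these values are tuned in production, it can help to validate overrides before they reach the router. The sketch below repeats a subset of the fields from the excerpt and adds range checks plus a hypothetical load_routing_settings helper; neither the checks nor the helper are part of the original schema.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping


@dataclass
class QueryRoutingSettings:
    # Subset of the fields from the excerpt, with the same defaults.
    enable_smart_routing: bool = True
    confidence_threshold: float = 0.75
    fallback_strategy: str = "comprehensive_search"
    max_retries: int = 3

    def __post_init__(self) -> None:
        # Validation added for this sketch (assumption, not in the original schema).
        if not 0.0 <= self.confidence_threshold <= 1.0:
            raise ValueError("confidence_threshold must be within [0, 1]")
        if self.max_retries < 0:
            raise ValueError("max_retries must be non-negative")


def load_routing_settings(overrides: Mapping[str, Any]) -> QueryRoutingSettings:
    """Build settings from overrides, rejecting misspelled or unknown keys."""
    known = {f.name for f in fields(QueryRoutingSettings)}
    unknown = set(overrides) - known
    if unknown:
        raise KeyError(f"Unknown routing settings: {sorted(unknown)}")
    return QueryRoutingSettings(**overrides)


if __name__ == "__main__":
    print(load_routing_settings({"confidence_threshold": 0.7}))
```

Rejecting unknown keys catches a misspelled override early; without that check the typo would be ignored and the default would stay in effect unnoticed.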

Practical Guidance

  • Start with confidence_threshold ≈ 0.7–0.8; review logs for borderline cases and adjust.
  • Prefer comprehensive_search fallback in early deployments; migrate to more selective fallbacks once signals are strong.
  • Keep max_retries ≤ 3 with exponential backoff to bound latency.
  • Expose enable_smart_routing as a feature flag so you can A/B test smart routing against direct semantic search.
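The retry guidance above can be sketched as a small async helper. The name retry_with_backoff and the base_delay and max_delay parameters are assumptions for illustration; the post does not show how the router performs retries.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    max_retries: int = 3,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
) -> T:
    """Run operation, retrying up to max_retries times with exponential backoff."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    for attempt in range(max_retries + 1):
        try:
            return await operation()
        except Exception:
            if attempt == max_retries:
                raise  # Retries exhausted: surface the last error to the caller.
            # Delay doubles per attempt and is capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Jitter in [0.5, 1.0] of the delay spreads out simultaneous retries.
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

With these defaults the helper sleeps for at most 0.1 + 0.2 + 0.4 = 0.7 seconds in total before giving up, which is what makes the added latency bounded.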

Observability

The router logs include hash-based identifiers and timing. For production, feed routing_result, confidence, and fallback_applied into your analytics to build win-rate dashboards per strategy and per domain segment.
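As a starting point for such a pipeline, the sketch below turns a routing result into a flat analytics event keyed by a hashed query identifier, then aggregates events per strategy. The function names, the 12-character SHA-256 prefix, and the chosen metrics are assumptions; a real win-rate metric would also need an outcome signal (for example, user feedback) that the post does not describe.

```python
import hashlib
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def routing_event(question: str, routing_result: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a routing result into an event that omits the raw query text."""
    query_id = hashlib.sha256(question.encode("utf-8")).hexdigest()[:12]
    return {
        "query_id": query_id,
        "strategy": routing_result.get("strategy", "unknown"),
        "confidence": float(routing_result.get("confidence", 0.0)),
        "fallback_applied": bool(routing_result.get("fallback_applied", False)),
        "processing_time": float(routing_result.get("processing_time", 0.0)),
    }


def summarize_by_strategy(events: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Compute count, mean confidence, and fallback rate for each strategy."""
    buckets: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for event in events:
        buckets[event["strategy"]].append(event)
    return {
        strategy: {
            "count": len(rows),
            "mean_confidence": sum(r["confidence"] for r in rows) / len(rows),
            "fallback_rate": sum(r["fallback_applied"] for r in rows) / len(rows),
        }
        for strategy, rows in buckets.items()
    }
```

Hashing the query gives a stable identifier for joining events without storing user text in the analytics store.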