1/17/2025

Innovating AI-based search with Rerankers: Understanding context and reasoning

To minimize hallucinations in AI and improve accuracy, retrieval-augmented generation (RAG) relies on advanced retrieval methods. Allganize's "Alli" Reranker enhances RAG by combining Cross-Encoder precision with efficient multi-stage retrieval, balancing speed and contextual accuracy. Experiments show 14% accuracy gains with practical latency, revolutionizing context-driven information retrieval.

To reduce hallucinations in generative AI and provide more accurate answers, the performance of RAG (Retrieval-Augmented Generation) is critical. Significant technical advancements have been made to enhance the retrieval of the most accurate information from documents. Among these, Allganize's Alli has introduced a more efficient Reranking strategy while maintaining precision.

Allganize's RAG team, including Mr. Hanjoon Jo and Team Leader Junghoon Lee, explains their Reranker, which is designed to understand context and reason effectively.

1. Overview

1.1 Background

Advancements in Information Retrieval Technology
Information retrieval technology has continuously evolved in response to the exponential growth of data and changing user demands. In its early stages, retrieval relied on simple keyword matching, which revealed significant limitations in both efficiency and precision.

The Emergence of BM25
BM25 introduced a method for ranking search results by weighting terms with statistical signals such as term frequency and inverse document frequency (TF-IDF). While it improved search efficiency, its reliance on word-frequency statistics meant it could not understand the contextual meaning of queries or documents.

The Rise of Semantic Search
With advancements in deep learning, vector-based retrieval models such as Bi-Encoders enabled semantic search, going beyond simple keyword matching. Bi-Encoders transform both queries and documents into vectors to evaluate their similarity, providing efficiency in large-scale data processing. However, Bi-Encoders struggled to capture detailed interactions between queries and documents.

Introduction of Advanced Deep Learning Models
To address these limitations, sophisticated models like Cross-Encoders were developed. Cross-Encoders take both queries and documents as a single input, allowing for detailed evaluation of contextual interactions, thereby offering the highest level of precision. Consequently, the paradigm of information retrieval has shifted from merely "finding" data to "understanding" the contextual relationship between queries and documents.

1.2 Motivation

The Need for Reranking Strategies
While Cross-Encoders significantly improve search quality, their high computational cost makes it impractical to apply them to all retrieved documents. To address this challenge, Reranking strategies have been developed to balance efficiency and precision. Reranking optimizes search precision while conserving computational resources, making it an essential component in modern search system design.

2. Comparison and Analysis of Search Models

2.1 BM25: Traditional Statistical Keyword-Based Search Model

Overview
BM25 (Best Matching 25) is a classical information retrieval model based on statistical principles. It ranks documents according to their relevance to a query by evaluating the frequency and distribution of query terms within the documents. BM25 is an enhanced version of TF-IDF, offering refinements in term weighting and document length normalization.

How BM25 Works

BM25 is a keyword-matching statistical model that calculates the similarity between queries and documents based on the importance of words. It uses term frequency (TF) and inverse document frequency (IDF) to compute relevance scores. The model also adjusts scores based on document length to prevent excessively long documents from receiving disproportionately high scores.
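To make the scoring concrete, here is a minimal sketch of the Okapi BM25 formula in Python. It assumes a pre-tokenized corpus, and the parameter values k1 = 1.5 and b = 0.75 are common defaults rather than settings used by Alli.

```python
import math
from collections import Counter

def bm25_scores(query_terms, corpus, k1=1.5, b=0.75):
    """Score every document in `corpus` (lists of tokens) against `query_terms`."""
    N = len(corpus)
    avgdl = sum(len(doc) for doc in corpus) / N
    # Document frequency: how many documents contain each query term.
    df = {t: sum(1 for doc in corpus if t in doc) for t in query_terms}
    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            # Term-frequency saturation plus document-length normalization.
            denom = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[t] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [["schedule", "change", "cost", "contract"], ["invoice", "payment", "terms"]]
print(bm25_scores(["schedule", "cost"], docs))
```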

Strengths

Limitations

2.2 Bi-Encoder: Deep Learning-Based Semantic Search

How Bi-Encoder Works

The Bi-Encoder independently converts both the query and document into vector representations, then calculates the similarity between the two vectors. It leverages language models (e.g., BERT, RoBERTa) to create embeddings that reflect the contextual meaning of words. Bi-Encoders support parallel processing, making them efficient even for large-scale datasets.
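As a rough illustration, the snippet below uses the sentence-transformers library to encode a query and documents separately and compare them with cosine similarity. The model name is only an example and is not the embedding model used in Alli.

```python
from sentence_transformers import SentenceTransformer, util

# Example bi-encoder model; any sentence-embedding model could be substituted.
model = SentenceTransformer("all-MiniLM-L6-v2")

query = "Who bears the additional costs when the schedule changes?"
documents = [
    "The requesting party bears any costs arising from schedule changes.",
    "Invoices are payable within 30 days of receipt.",
]

# Query and documents are encoded independently, so document vectors can be
# computed offline and indexed for fast large-scale retrieval.
query_emb = model.encode(query, convert_to_tensor=True)
doc_embs = model.encode(documents, convert_to_tensor=True)

print(util.cos_sim(query_emb, doc_embs))  # similarity of the query to each document
```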

Strengths

Limitations

2.3 Hybrid Search: Combining BM25 and Bi-Encoder

How Hybrid Search Works

Hybrid Search combines the scores from BM25 and Bi-Encoder to generate search results. The combination formula is typically:

Hybrid Score = α * BM25 Score + (1 - α) * Bi-Encoder Score

This approach leverages BM25’s keyword matching ability and Bi-Encoder’s contextual semantic analysis simultaneously. The parameter α controls the weighting of the two models, allowing for customization based on specific domain requirements.
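A minimal sketch of the combination is shown below. Because BM25 and Bi-Encoder scores live on different scales, some normalization is needed before mixing them; min-max normalization is an assumption here, as the post does not specify the method.

```python
def min_max(scores):
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) if hi > lo else 0.0 for s in scores]

def hybrid_scores(bm25, bi_encoder, alpha=0.5):
    """Hybrid Score = alpha * BM25 + (1 - alpha) * Bi-Encoder, after normalization."""
    bm25_n, dense_n = min_max(bm25), min_max(bi_encoder)
    return [alpha * b + (1 - alpha) * d for b, d in zip(bm25_n, dense_n)]

# Example: alpha = 0.4 weights semantic similarity slightly above keyword matching.
print(hybrid_scores([12.3, 4.1, 7.8], [0.82, 0.40, 0.65], alpha=0.4))
```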

Strengths

Limitations

2.4 Cross-Encoder: Advanced Deep Learning Search Model

How Cross-Encoder Works

The Cross-Encoder combines the query and document into a single input to evaluate the interaction between the two texts. It assesses the relationships between individual words in the query and document, providing high-precision results. By using large language models like BERT, the Cross-Encoder maximizes the understanding of contextual meaning and semantic relationships.
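The sketch below scores query-document pairs jointly with a cross-encoder from the sentence-transformers library. The model name is illustrative only; it is not necessarily the model behind Alli's Reranker.

```python
from sentence_transformers import CrossEncoder

# Example cross-encoder reranking model; any joint query-document scorer works here.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "Who bears the additional costs when the schedule changes?"
documents = [
    "The requesting party bears any costs arising from schedule changes.",
    "Invoices are payable within 30 days of receipt.",
]

# Each (query, document) pair is processed together in a single forward pass,
# so the model can attend across both texts -- precise but expensive.
scores = model.predict([(query, doc) for doc in documents])
print(scores)
```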

Strengths

Limitations

2.5 Summary of Model Comparisons

2.6 Summary and Conclusion

BM25, Bi-Encoder, Hybrid Search, and Cross-Encoder each offer unique search strategies with distinct strengths and limitations:

Each model is suited to specific environments and objectives. Selecting the optimal combination of these models based on the use case and constraints (e.g., speed, precision, scalability) is essential for designing an efficient and accurate search system.

3. Reranking Strategies

3.1 Limitations of Cross-Encoder and the Need for Reranking

Strengths of Cross-Encoder

The Cross-Encoder combines the query and document into a single input, enabling it to evaluate detailed interactions between the two.

  1. Highest Precision:
    • Cross-Encoder provides the most accurate evaluation of the contextual relationship between the query and document.
    • For example, it can distinguish between subtle differences in meaning, such as identifying documents that precisely align with the intent behind a query.
  2. Contextual Understanding:
    • It captures the semantic connections between the query and document better than models like BM25 or Bi-Encoders.
    • This ability ensures that documents relevant to the query's deeper intent are prioritized.

Limitations of Cross-Encoder

  1. High Computational Cost:
    • Cross-Encoder requires a separate joint computation for every query-document pair, which is computationally intensive.
    • Applying it to a large dataset is inefficient and resource-consuming.
  2. Real-Time Processing Challenges:
    • It struggles to return results quickly in real-time environments, especially when dealing with vast document collections.
  3. Bottleneck Issues:
    • Directly applying Cross-Encoder to an entire dataset creates a bottleneck, exhausting computational resources and slowing down the retrieval process.

Need for Reranking

Reranking strategies were introduced to leverage the strengths of Cross-Encoders while overcoming their limitations.

3.2 How Reranking Works

Reranking operates as a multi-stage process designed to enhance search precision while maintaining computational efficiency. The process includes the following steps (a code sketch of the full pipeline follows the list):

1. Initial Candidate Generation

2. Reordering with Cross-Encoder

3. Final Result Delivery
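Putting the stages together, here is a minimal sketch of the flow. It assumes a hybrid_scorer function that returns one relevance score per document (for example, built from the hybrid formula in Section 2.3) and a sentence-transformers CrossEncoder; the candidate pool sizes (top_k=50, top_n=5) are illustrative and not Alli's production settings.

```python
from sentence_transformers import CrossEncoder

def rerank(query, documents, hybrid_scorer, cross_encoder, top_k=50, top_n=5):
    """Two-stage retrieval: cheap hybrid search first, then cross-encoder reranking."""
    # Stage 1: initial candidate generation with the fast hybrid scorer.
    candidate_scores = hybrid_scorer(query, documents)
    ranked = sorted(zip(documents, candidate_scores), key=lambda p: p[1], reverse=True)
    candidates = [doc for doc, _ in ranked[:top_k]]

    # Stage 2: reorder only the small candidate set with the expensive Cross-Encoder.
    ce_scores = cross_encoder.predict([(query, doc) for doc in candidates])
    reranked = sorted(zip(candidates, ce_scores), key=lambda p: p[1], reverse=True)

    # Stage 3: deliver the final top-n results to the user or the generation step.
    return [doc for doc, _ in reranked[:top_n]]
```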

3.3 How Reranking Addresses Cross-Encoder's Limitations

Reranking effectively mitigates the limitations of Cross-Encoders by introducing a multi-stage approach that balances efficiency and precision. Here’s how it resolves these challenges:

1. Reduction of Computational Costs

2. Preservation of Precision

3. Balance Between Efficiency and Precision

4. Experiment: Analyzing the Effectiveness of Reranking

4.1 Experiment Setup

1. Dataset

2. Model Configuration

3. Evaluation Metrics

4. Hardware Environment

4.2 Experiment Results

Effectiveness of Reranker

Balance Between Latency and Accuracy

Average Additional Processing Time:

4.3 Performance Improvement Summary by Customer Dataset

The table below highlights the performance improvements achieved by the Reranker compared to the Baseline (Hybrid Search) for each customer dataset. Metrics include Top-3, Top-5, and Top-10 Accuracy, demonstrating the precision gains across various domains.

4.4 Conclusion

Performance Improvement

Efficiency and Accuracy

Exceptional Cases

5. Reranker Operation Example

5.1 Handling Multiple Conditions and Contextual Interactions

Features:

Example Question:
"The contract states that schedule changes are possible. In this case, who should bear the additional costs?"

This type of query involves multiple conditions (schedule changes, additional costs, and responsibility) and requires a nuanced understanding of their interrelationships, which the Reranker model is designed to handle effectively.

For this question, the Reranker must accurately identify the correlation between "schedule changes" and "additional cost burdens" and then locate the relevant contractual provisions.

5.2 Identifying Specific Provisions or Detailed Items

Features

Limitations of BM25 + Bi-encoder:

Example Question

The term "Article 12" is likely to lose its importance when using keyword-based or standalone embedding search methods. However, applying a Reranker ensures it is appropriately prioritized during retrieval.

5.3 Handling Ambiguous or Implicit Expressions

Features

Limitations of BM25 + Bi-encoder:

Example Question

The phrase "I think my account has been hacked" should be accurately interpreted as "data breach" or "unauthorized access," enabling the retrieval of relevant documents and providing appropriate resources.

5.4 Integrating Information Across Multiple Documents

Features

Limitations of BM25 + Bi-encoder:

Example Question

When laws and case precedents related to "refusal of information disclosure" are found in separate documents, this system should link them together to retrieve and present the most relevant information.

5.5 Intent-Centered Search

Features

Limitations of BM25 + Bi-encoder:

Example Question

Traditional search methods may overlook the user's intent in "Do I need to handle this myself?" and instead retrieve results about administrative procedures at the local government office. The Reranker, however, prioritizes documents that address the user's own obligations and provides relevant information accordingly.

5.6 Inference-Based Search

Features

Limitations of BM25 + Bi-encoder:

Example Question

If a document includes the phrase "within the first 12 weeks of pregnancy," the system needs to infer the connection between "12 weeks" and "two months before the due date" to retrieve the relevant document.

5.7 Handling Fragmented Information

Features

Limitations of BM25 + Bi-encoder:

Example Question

Traditional methods might only retrieve content related to "responsibility." The Reranker, however, searches for both "additional work requests" and "responsibility," combining these aspects for more accurate results.

6. Conclusion

The Reranking strategy has proven to be an effective method to maximize search result precision while addressing the high computational cost of Cross-Encoders. By using Hybrid Search for initial candidate selection and then evaluating finer interactions with Cross-Encoder models, this approach achieves a balance between accuracy and efficiency.

Experimental Results:
The Reranking strategy improved Top-3, Top-5, and Top-10 accuracy across various domains, with particularly significant performance gains in public, legal, and energy sectors. These findings demonstrate that Reranking plays a crucial role in enhancing information retrieval systems.