SFR-RAG: How Open AI Can Beat OpenAI
Advancing Contextual Understanding in Large Language Models
Retrieval-Augmented Generation (RAG) has emerged as a critical paradigm for enhancing the capabilities of large language models (LLMs). The recently introduced SFR-RAG model, developed by researchers at Salesforce AI Research, represents a promising direction we’ve seen evolving this year: small(er) models closing the performance gap between proprietary and open AI.
This article will explore the key features of SFR-RAG, its performance on various benchmarks, and, step by step, how it improves RAG performance.
Key Features of SFR-RAG
SFR-RAG is a 9-billion-parameter language model specifically designed to excel in RAG applications. Its primary goals are to faithfully and comprehensively understand the provided context and user question, avoid hallucination, handle challenging scenarios, perform complex reasoning, and produce reliable citations. Let's break down the key aspects of SFR-RAG and how it achieves these objectives.
Novel Chat Template
Traditional LLMs typically use three roles in their conversational structure: System, User, and Assistant. SFR-RAG expands on this by adding two new roles: Thought and Observation.
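To make the structure concrete, here is a minimal sketch of what a five-role conversation might look like. The role tags and the tool-call syntax in the Thought turn are illustrative placeholders I'm assuming for demonstration, not SFR-RAG's actual special tokens, which are defined by its released chat template.

```python
# Minimal sketch of the five-role structure. The role tags are
# illustrative placeholders, not SFR-RAG's actual special tokens.
conversation = [
    {"role": "system", "content": "Answer using only the retrieved passages and cite them."},
    {"role": "user", "content": "Who founded Salesforce?"},
    # Internal reasoning and (hypothetical) tool-call syntax live in Thought turns.
    {"role": "thought", "content": 'I need context. Calling search("Salesforce founder").'},
    # Retrieved results come back as an Observation turn, not as model output.
    {"role": "observation", "content": "[Doc 1] Salesforce was founded in 1999 by Marc Benioff and Parker Harris."},
    {"role": "assistant", "content": "Salesforce was founded by Marc Benioff and Parker Harris [Doc 1]."},
]

def render(turns):
    """Flatten the turns into a single prompt string using the placeholder tags."""
    return "".join(f"<|{t['role']}|>\n{t['content']}\n" for t in turns)

print(render(conversation))
```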
This comes with the following benefits:
a) Role Clarification: By introducing separate roles for Thought and Observation, SFR-RAG creates a clearer structure for different types of information. This helps the model distinguish between its internal reasoning process (Thought) and external information (Observation).
b) Easier Masking During Training: The new template allows for more precise control over which parts of the conversation contribute to the training loss. Specifically, System, User, and Observation turns can be masked out, while Thought and Assistant turns are included in the fine-tuning process (see the masking sketch after this list).
c) Enhanced Security: The separation of roles facilitates better instruction hierarchy enforcement. This makes the model more resistant to potential jailbreaks or malicious instructions injected through User or Observation turns.
d) Improved Developer Control: The new template streamlines the process of building reliable and secure RAG applications. Developers can more easily control which parts of the internal processing to display or hide from end-users.
e) Consistent Function Calling: By designating a specific role (Thought) for internal reasoning and tool-use syntax, SFR-RAG avoids the need to parse custom keywords out of the Assistant output, leading to more reliable function calling.
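Here is a hedged sketch of what the role-based loss masking from point b) could look like, assuming the turns have already been tokenized. The ignore index of -100 follows the PyTorch cross-entropy convention; none of this code comes from the SFR-RAG release.

```python
# Hedged sketch of role-based loss masking (not SFR-RAG's actual code).
# Tokens from System, User, and Observation turns receive the ignore
# index, so only Thought and Assistant tokens contribute to the loss.
IGNORE_INDEX = -100  # PyTorch's default ignore_index for cross-entropy
TRAINED_ROLES = {"thought", "assistant"}

def build_labels(tokenized_turns):
    """tokenized_turns: list of (role, token_ids) pairs -> (input_ids, labels)."""
    input_ids, labels = [], []
    for role, ids in tokenized_turns:
        input_ids.extend(ids)
        if role in TRAINED_ROLES:
            labels.extend(ids)  # the model is trained to generate these tokens
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # masked out of the loss
    return input_ids, labels
```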
Comprehensive Fine-tuning Process
The model underwent an extensive fine-tuning process designed to enhance its contextual understanding and generation abilities. This process focused on several key capabilities:
Extracting Relevant Information: SFR-RAG is trained to efficiently extract pertinent information from long contexts. This is crucial for RAG applications where the model needs to sift through large amounts of retrieved data.
Recognizing Information Gaps: The model is trained to identify when the relevant information is missing from the provided context. This helps prevent hallucination by teaching the model to abstain from answering when it lacks sufficient information (a sketch of such a training example follows this list).
Handling Conflicting Information: SFR-RAG is equipped to recognize and deal with potentially conflicting information in contextual passages. This is essential for real-world applications where retrieved information may be inconsistent or contradictory.
Resilience to Distractions: The fine-tuning process includes exposure to distracting, counter-intuitive, or out-of-distribution content. This helps the model maintain focus on relevant information even in the presence of noise.
Diverse Instruction Following: By using extensive instruction-following data that mimics real-world retrieval question answering applications, SFR-RAG is trained to handle a wide variety of tasks and query types.
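As a concrete illustration of the abstention behavior described above, here is a hedged sketch of how an unanswerable training example might be constructed. The field names and refusal wording are assumptions for illustration, not the paper's actual data format.

```python
# Hedged sketch of constructing an "unanswerable" training example;
# field names and refusal wording are illustrative assumptions.
def make_unanswerable_example(question, passages):
    """Pair a question with passages that do NOT contain its answer,
    targeting a refusal so the model learns to recognize the gap."""
    context = "\n\n".join(f"[Doc {i + 1}] {p}" for i, p in enumerate(passages))
    return {
        "context": context,
        "question": question,
        "target": "The provided context does not contain this information.",
    }

example = make_unanswerable_example(
    "What year was the company's second headquarters opened?",
    ["Salesforce is a cloud software company headquartered in San Francisco."],
)
```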
Model Performance
To evaluate SFR-RAG's performance, the researchers introduced ContextualBench, a comprehensive evaluation suite comprising seven popular contextual question-answering tasks. This standardized benchmark allows for consistent comparison across different models and studies.
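For a sense of what such an evaluation involves, here is a minimal sketch of a contextual question-answering scoring loop. The normalization and exact-match metric are common QA conventions I'm assuming here, not necessarily ContextualBench's exact methodology.

```python
# Minimal contextual-QA evaluation sketch. The normalization and
# exact-match metric are common QA conventions, assumed here rather
# than taken from ContextualBench itself.
import re
import string

def normalize(text):
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return re.sub(r"\s+", " ", text).strip()

def exact_match(prediction, gold_answers):
    return float(normalize(prediction) in {normalize(a) for a in gold_answers})

def evaluate(answer_fn, dataset):
    """answer_fn(context, question) -> str; dataset: iterable of dicts
    with 'context', 'question', and a list of acceptable 'answers'."""
    scores = [
        exact_match(answer_fn(ex["context"], ex["question"]), ex["answers"])
        for ex in dataset
    ]
    return sum(scores) / len(scores)
```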