Putting the Techniques to the Test with FinanceBench Dataset
To assess how effective fine-tuning and iterative reasoning are, the researchers ran experiments on the FinanceBench dataset, an open-source subset of a comprehensive collection of 10,000 financial-analysis questions about publicly traded companies. The questions are based on the companies' public filings with the U.S. Securities and Exchange Commission (SEC).
The experiments compared several Q&A system configurations, including generic retrieval-augmented generation (RAG), fine-tuned RAG, and RAG with OODA (Observe, Orient, Decide, Act) reasoning. Each system was evaluated with a mix of automated and human-evaluated metrics, including retrieval quality and answer correctness.
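The section does not say which retrieval-quality metric was used. As an illustration only, here is a minimal sketch of recall@k, one common way to score a retriever: for each question, it measures what fraction of the passages known to be relevant show up in the top k results. The passage ids and the `recall_at_k` / `mean_recall_at_k` helpers are hypothetical.

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant passage ids that appear in the top-k retrieved ids."""
    if not relevant:
        raise ValueError("relevant must contain at least one passage id")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs: Sequence[tuple[Sequence[str], set[str]]], k: int) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per question."""
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)


# Usage: one of the two relevant passages is in the top 3, so recall@3 is 0.5.
print(recall_at_k(["p3", "p7", "p1"], {"p1", "p9"}, k=3))  # 0.5
```

Averaging per-question recall over the whole question set produces a single number, which makes it straightforward to compare a generic retriever against a fine-tuned one.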
Key Findings: Fine-Tuning and Iterative Reasoning Deliver Impressive Results
The results showed that fine-tuning significantly improved both retrieval accuracy and answer quality. Notably, fine-tuning the embedding model used in RAG's retrieval step yielded larger accuracy gains than fine-tuning the generative model.
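The section does not specify the training objective used to fine-tune the embedding model. A widely used choice for retrieval embeddings, and an assumption here, is a contrastive loss with in-batch negatives (InfoNCE): each question is pulled toward its matching passage and pushed away from the other passages in the same batch. The sketch below only computes the loss value in plain Python. A real fine-tuning run would minimize it with gradients in a deep-learning framework, and the temperature of 0.05 is an assumed default.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def in_batch_contrastive_loss(
    queries: Sequence[Vector], passages: Sequence[Vector], temperature: float = 0.05
) -> float:
    """Mean InfoNCE loss: passages[i] is the positive for queries[i],
    and every other passage in the batch acts as a negative."""
    if len(queries) != len(passages):
        raise ValueError("each query needs exactly one positive passage")
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in passages]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log of the softmax probability assigned to the positive passage.
        total += log_denom - logits[i]
    return total / len(queries)
```

If the question embeddings already point at their own passages, the loss is close to zero. If the passages are swapped, the loss is large, and minimizing it reshapes the embedding space so that domain-specific questions land near the filings that answer them.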
Integrating iterative reasoning through the OODA loop produced the largest improvements of all. The configuration that combined generic RAG with OODA reasoning outperformed even the fully fine-tuned RAG, highlighting the critical role of iterative reasoning in Q&A systems.
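To make the loop concrete, here is a minimal sketch of OODA-style iterative retrieval. It is not the researchers' implementation. The toy keyword retriever, the `OodaState` fields, and the idea of listing the facts a question needs up front are all hypothetical simplifications. Each round observes (retrieves), orients (merges new evidence with earlier evidence), decides (stop, or target a missing fact), and acts (issues the follow-up query). Composing the final answer from the evidence would be left to the generative model.

```python
from dataclasses import dataclass, field


@dataclass
class OodaState:
    question: str
    needed: set[str]  # hypothetical: facts the answer depends on
    evidence: dict[str, str] = field(default_factory=dict)
    queries: list[str] = field(default_factory=list)


def retrieve(corpus: dict[str, str], query: str) -> dict[str, str]:
    """Toy keyword retriever: returns passages whose key appears in the query."""
    return {key: text for key, text in corpus.items() if key in query}


def ooda_gather(state: OodaState, corpus: dict[str, str], max_rounds: int = 5) -> OodaState:
    next_query = state.question
    for _ in range(max_rounds):
        # Observe: retrieve passages for the current query.
        state.queries.append(next_query)
        found = retrieve(corpus, next_query)
        # Orient: merge the new passages with evidence from earlier rounds.
        state.evidence.update(found)
        missing = state.needed - set(state.evidence)
        # Decide: stop once every needed fact is covered.
        if not missing:
            break
        # Act: issue a follow-up query aimed at one missing fact.
        next_query = sorted(missing)[0]
    return state
```

A single-pass RAG system stops after the first retrieval. The loop instead keeps querying until the evidence covers every part of the question, which is what lets it combine information spread across several filings.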
Understanding and Applying What We Learned
The AI Alliance aims to empower the AI community with a structured analysis of these techniques and their contributions to Q&A performance. The analysis yields the following best practices for developing domain-specific Q&A systems:
- Prioritize Fine-Tuning of Embedding Models: This technique delivers better performance and greater resource efficiency than fine-tuning generative models.
- Employ Iterative Reasoning Mechanisms: Use OODA reasoning or other iterative methods to strengthen the Q&A system's ability to combine information from multiple sources and keep the combined information consistent.
- Map Out a Structured Technical Design Space: Identify the components that have the greatest impact on Q&A system performance. Then build a structured design space that captures the possible configurations, and choose among them based on quantitative results.
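A design space like the one described in the last practice can be sketched as a product of component choices. The three axes below mirror the components compared in this section (embedding fine-tuning, generator fine-tuning, and OODA reasoning). The option names, the `QAConfig` class, and the `evaluate` callable are hypothetical: `evaluate` stands in for whatever quantitative benchmark you run, such as answer correctness on held-out questions.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable


@dataclass(frozen=True)
class QAConfig:
    embedding: str  # e.g. "generic" or "fine-tuned"
    generator: str  # e.g. "generic" or "fine-tuned"
    reasoning: str  # e.g. "none" or "ooda"


# Hypothetical axes; a real study would list the components it found to matter.
DESIGN_SPACE: dict[str, list[str]] = {
    "embedding": ["generic", "fine-tuned"],
    "generator": ["generic", "fine-tuned"],
    "reasoning": ["none", "ooda"],
}


def enumerate_configs(space: dict[str, list[str]]) -> list[QAConfig]:
    """Every combination of options, one QAConfig per point in the design space."""
    keys = list(space)
    return [QAConfig(**dict(zip(keys, values))) for values in product(*space.values())]


def rank_configs(
    configs: list[QAConfig], evaluate: Callable[[QAConfig], float]
) -> list[tuple[QAConfig, float]]:
    """Score each configuration with a user-supplied benchmark, best first."""
    scored = [(cfg, evaluate(cfg)) for cfg in configs]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Enumerating the space explicitly (8 configurations here) makes it clear which combinations have been measured. Ranking them with one shared metric turns "which component matters most" into a question the numbers can answer.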
The Power of Open Innovation and Collaboration: A Future of Precise Answers and Progress