Aryan Kargwal for Tune AI

Doing Multihop on HotPotQA Using Qwen 2.5 72B

When dealing with complex question-answering tasks, a single-hop retrieval approach is often not enough: questions frequently require synthesizing information from multiple sources. That’s where multi-hop question answering (QA) comes into play, demanding more capable tools for retrieval and reasoning. In this post, I’ll describe how I built a multi-hop QA pipeline using DSPy, ColBERT, TuneAPI, and Qwen 2.5 72B to handle multi-step reasoning over a knowledge base.

Understanding the Key Tools

Before diving into the code, let’s first break down the key tools and libraries that power this pipeline:

1. DSPy (Declarative Self-improving Python)

DSPy is a framework for programming language models declaratively rather than hand-crafting prompts. It lets us define a clear, modular flow, built from signatures and composable modules, for complex information-retrieval tasks like retrieval-augmented generation and multi-hop question answering, and it integrates language models and retrievers cleanly.
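To make that concrete, here is a minimal sketch of DSPy's declarative style (my illustration, not code from this pipeline; the BasicQA name is hypothetical): a task is declared as a signature, and a predictor module is built from it.

import dspy

class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

# Build a predictor from the signature; it runs against whatever language
# model dspy.settings.configure(...) has registered (see step 3 below).
generate_answer = dspy.Predict(BasicQA)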

2. ColBERT (Contextualized Late Interaction over BERT)

ColBERT is an efficient retrieval model built on BERT, designed to search large corpora quickly. Rather than collapsing each text into a single vector, it encodes queries and documents into per-token embeddings and scores relevance with a lightweight "late interaction" step between them. For multi-hop QA, ColBERT surfaces the most pertinent passages to feed each hop.
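To get a feel for the retriever on its own, here is a hedged sketch that queries the same hosted wiki17_abstracts index we configure later in step 3. The sample query is mine, and the record keys (such as long_text) can vary across DSPy versions:

import dspy

retriever = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')

# Fetch the top-3 passages for a sample query and print a preview of each.
for passage in retriever("Who developed the Qwen family of models?", k=3):
    print(passage['long_text'][:120])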

3. TuneAPI (API Proxy for LLMs)

TuneAPI acts as a proxy API to interact with LLMs such as Qwen. This lets us access the powerful inference capabilities of LLMs and customize how they process inputs and generate responses.
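Before we wrap it in a class in step 2, a raw call to the proxy looks roughly like this; the payload mirrors the familiar OpenAI-style chat-completions format, and the API key is a placeholder:

import requests

response = requests.post(
    "https://proxy.tune.app/chat/completions",
    headers={"Authorization": "your_api_key_here", "Content-Type": "application/json"},
    json={
        "model": "qwen/qwen-2.5-72b",
        "messages": [{"role": "user", "content": "What is multi-hop QA?"}],
        "max_tokens": 100,
    },
)
print(response.json()["choices"][0]["message"]["content"])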

4. Qwen 2.5 72B (Alibaba’s Large Language Model)

Qwen 2.5 72B is a state-of-the-art large language model developed by Alibaba Cloud. It is a text model (the vision-language variants live in the separate Qwen-VL line), and its strength in natural language reasoning makes it a great choice for multi-hop QA tasks where nuanced reasoning over text is required.

5. HotPotQA (Dataset for Multi-Hop QA)

HotPotQA is a dataset designed specifically for multi-hop question answering: each question requires combining facts from multiple documents to arrive at an accurate answer, which makes it ideal for training and evaluating multi-hop QA systems. A typical example is "Which magazine was started first, Arthur's Magazine or First for Women?", which can only be answered by comparing facts drawn from two different articles.


Setting Up the MultiHopQA Pipeline

The goal here is to build an end-to-end pipeline that can retrieve relevant documents using ColBERT, pass the retrieved contexts to Qwen 2.5 72B for reasoning, and finally output the predicted answer.

Code Walkthrough

Let’s break down the process into manageable steps. Here’s the code for building the pipeline:

1. Importing Required Libraries

import requests
from dsp import LM
from dspy.datasets import HotPotQA
import dspy
from dsp.utils import deduplicate

We start by importing what we need: dspy (plus the LM base class from its dsp core) for structuring the pipeline, requests for talking to the TuneAPI, the HotPotQA dataset loader, and a deduplicate helper for merging passages across hops. Note that ColBERT itself isn't imported here; we reach it through dspy.ColBERTv2 in step 3.

2. Creating a Custom Language Model Client

To use Qwen, we need a custom class to handle API requests. We interact with Qwen via the TuneAPI to submit prompts and retrieve responses.

class CustomLMClient(LM):
    def __init__(self, model, api_key):
        super().__init__(model)
        self.model = model
        self.api_key = api_key
        self.base_url = "https://proxy.tune.app/chat/completions"
        self.history = []
        self.kwargs = {}

    def basic_request(self, prompt: str, **kwargs):
        headers = {
            "Authorization": f"{self.api_key}",
            "Content-Type": "application/json"
        }
        data = {
            "model": self.model,
            "messages": [
                {"role": "system", "content": "You are TuneStudio, answer the question based on the context given to you."},
                {"role": "user", "content": prompt}
            ],
            "temperature": kwargs.get("temperature", 0.9),
            "max_tokens": kwargs.get("max_tokens", 100),
            "frequency_penalty": kwargs.get("frequency_penalty", 0.2),
            "stream": kwargs.get("stream", False)
        }
        response = requests.post(self.base_url, headers=headers, json=data)
        response.raise_for_status()
        return response.json()

    def __call__(self, prompt, only_completed=True, return_sorted=False, **kwargs):
        # dsp.LM clients are expected to be callable and to return a list of
        # completion strings; extract them from the chat response.
        response = self.basic_request(prompt, **kwargs)
        self.history.append({"prompt": prompt, "response": response})
        return [choice["message"]["content"] for choice in response["choices"]]

custom_lm = CustomLMClient(model='qwen/qwen-2.5-72b', api_key='your_api_key_here')

This class wraps the Qwen model behind the dsp.LM interface: basic_request formats the prompt and handles the API call to the TuneAPI, while __call__ (which DSPy and our pipeline invoke) extracts the completion strings from the JSON response and records the exchange in history.

3. Configuring Retrieval and Language Model

Next, we configure ColBERT and set up DSPy to use our custom language model client for inference.

colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
dspy.settings.configure(lm=custom_lm, rm=colbertv2_wiki17_abstracts)

Here, ColBERTv2 retrieves relevant Wikipedia abstracts. These abstracts will be passed to the language model for deeper reasoning.
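As a quick sanity check (my addition, with a made-up query), we can pull a few passages through DSPy's Retrieve module, which now routes to the ColBERTv2 server configured above:

retrieve = dspy.Retrieve(k=3)
passages = retrieve("Which singer released the album Alf?").passages

# Print a short preview of each retrieved abstract.
for p in passages:
    print(p[:120], "...")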

4. Loading HotPotQA Dataset

dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

We load a small subset of the HotPotQA dataset for testing. This dataset will provide multi-hop questions for the pipeline.
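To see what a record looks like, here is a quick peek (my addition): each HotPotQA example carries a question and its gold answer.

example = trainset[0]
print(example.question)  # a multi-hop question
print(example.answer)    # the gold answer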

5. Simplified Baleen for Multi-Hop Retrieval

The SimplifiedBaleen class handles the multi-hop retrieval loop. On each hop it builds a query from the context gathered so far, retrieves fresh passages, and deduplicates them into the running context; after the final hop it hands the full context to the language model to generate an answer.

class SimplifiedBaleen(dspy.Module):
    def __init__(self, lm_client, passages_per_hop=3, max_hops=2):
        super().__init__()
        # Retrieve k passages per hop; default to two hops so the second
        # query can build on what the first hop found.
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.max_hops = max_hops
        self.lm_client = lm_client

    def generate_query(self, context, question, **kwargs):
        # Fold the context gathered so far into the next retrieval query.
        query = f"{question} Context: {' '.join(context)}"
        return query

    def generate_answer(self, context, question, **kwargs):
        # Ask the LM to answer the question from the accumulated context.
        context_str = " ".join(context)
        prompt = f"Given the following information: {context_str} \n\nAnswer the question: {question}"
        response = self.lm_client(prompt, **kwargs)
        return response[0]

    def forward(self, question, **kwargs):
        context = []
        for _ in range(self.max_hops):
            query = self.generate_query(context, question, **kwargs)
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)
        answer = self.generate_answer(context, question, **kwargs)
        return dspy.Prediction(context=context, answer=answer)

This is the core of our pipeline. It:

  • Generates queries based on previously retrieved context.
  • Retrieves relevant documents using ColBERT.
  • Passes the final context to Qwen to generate the answer.

6. Running the Pipeline

We define a question and pass it through the pipeline to retrieve the answer.

my_question = "What position on the Billboard Top 100 did Alison Moyet's late summer hit achieve?"
uncompiled_baleen = SimplifiedBaleen(lm_client=custom_lm)
pred = uncompiled_baleen(my_question, temperature=0.9, max_tokens=100)

print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}")

The question is answered from passages gathered across successive hops, with Qwen 2.5 72B reasoning over the combined context. The final answer is printed alongside the (truncated) retrieved contexts.
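Note that the devset loaded in step 4 goes unused above. As a hedged sketch of a natural next step (following the pattern in DSPy's intro notebook; worth re-checking against your installed version), DSPy's Evaluate harness can score the pipeline against it with an exact-match metric:

from dspy.evaluate.evaluate import Evaluate
from dspy.evaluate import answer_exact_match

# Score the uncompiled pipeline on the 50 dev questions.
evaluate_on_hotpotqa = Evaluate(devset=devset, num_threads=1, display_progress=True)
score = evaluate_on_hotpotqa(uncompiled_baleen, metric=answer_exact_match)
print(f"Exact-match score: {score}")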


Final Thoughts

This project highlights the growing importance of multi-hop question answering and how combining modern tools like ColBERT for retrieval and Qwen for reasoning can provide powerful solutions. By leveraging datasets like HotPotQA, it’s easier to experiment and fine-tune these pipelines for real-world QA systems.

Future Plans:

  • Experiment with more retrieval-augmented generation tasks.
  • Extend this pipeline to support more languages and domain-specific datasets.

For more NLP tutorials and walkthroughs, feel free to check out my YouTube Channel.
