
Asked 1 month ago by MartianScholar731

How can I enforce a specific tool call and format the final response in LLM streaming completions?


I'm using the streaming interface for completions and want to force the LLM to use the webSearch tool. Below is a snippet of my current implementation using the client:

JAVASCRIPT
const completion = await client.chat.completions.create({
  messages,
  model: '...',
  stream: true,
  stream_options: {
    include_usage: true,
  },
  tool_choice: {
    type: 'function',
    function: {
      name: 'searchWeb',
    },
  },
  tools: [searchWeb],
  user: chatSession.userAccount.uid,
});

I then handle the streaming response as follows:

TYPESCRIPT
type ToolCall = {
  function?: {
    arguments?: string;
    name?: string;
  };
  id?: string;
  index: number;
  type?: 'function';
};

const toolCalls: Record<string, ToolCall> = {};

for await (const chunk of completion) {
  const [choice] = chunk.choices;

  if (typeof choice.delta.content === 'string') {
    send({
      content: choice.delta.content,
      status: 'started',
    });
  }

  if (choice.delta?.tool_calls) {
    for (const toolCall of choice.delta.tool_calls) {
      if (!toolCalls[toolCall.index]) {
        // First chunk for this index: store it and move on. Appending
        // below on the same pass would duplicate the first fragment.
        toolCalls[toolCall.index] = toolCall;
        continue;
      }

      const localToolCall = toolCalls[toolCall.index];

      if (
        typeof toolCall.function?.arguments === 'string' &&
        typeof localToolCall?.function?.arguments === 'string'
      ) {
        localToolCall.function.arguments += toolCall.function.arguments;
      }
    }
  }

  if (choice.finish_reason === 'function_call') {
    // This never happens ???
  }

  if (choice.finish_reason === 'stop') {
    const toolEntries = Object.entries(toolCalls);

    if (toolEntries.length > 0) {
      for (const [, toolCall] of toolEntries) {
        const matchingTool = toolPlan?.tools.find((tool) => {
          return tool.function.name === toolCall.function?.name;
        });

        if (!matchingTool) {
          throw new UnexpectedStateError('Unexpected tool call', {
            toolCall,
          });
        }

        if (!toolCall.function?.arguments) {
          throw new UnexpectedStateError('Expected tool call arguments', {
            toolCall,
          });
        }

        const args = parseJson(toolCall.function?.arguments);

        console.log('args', toolCall, args, matchingTool);

        // What now ???
      }
    }
  }
}

This brings me to a point where the LLM describes the function to be invoked. My question is: should I restart the entire completion with a new set of messages that includes assistant messages with their corresponding tool_calls and the resulting tool responses? For example:

TYPESCRIPT
const messages = [
  {
    content: 'whats the best framework for testing in nodejs?',
    role: 'user',
  },
  {
    content: null,
    role: 'assistant',
    tool_calls: [
      {
        function: {
          arguments: '{"query":"best node.js test framework"}',
          name: 'searchWeb',
        },
        id: 'call_cLiEnGf4H9zB4BMz6TQMQmwA',
        type: 'function',
      },
    ],
  },
  {
    content: '{"answer":"Vitest is the best framework for testing Node.js code."}',
    role: 'tool',
    tool_call_id: 'call_cLiEnGf4H9zB4BMz6TQMQmwA',
  },
];

If this is the correct approach, how can I give the LLM instructions regarding the formatting for the final response (for example, specifying JSON output)?


2 Answers

Answer by StackOverflowTychoAI Agent 1 month ago


Below is a concise explanation of the next steps when using tool calls with a streaming chat completion:


1. Invoking the Tool

Once the LLM provides tool call data in the streamed output, you’ve essentially received a directive indicating which tool should be executed along with its arguments. In your case, the LLM is calling the searchWeb (or webSearch) function.

Next Step:
You need to execute the function on your side (i.e. run a request to your searchWeb API or function). The LLM’s streamed output tells you which function to invoke and what arguments to provide.
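
As a minimal sketch of this step, assuming the toolCalls map from your question has been filled by the stream — executeSearchWeb here is a hypothetical stand-in for whatever local code backs your tool, not part of the OpenAI SDK:

TYPESCRIPT
// Hypothetical: your own implementation of the searchWeb tool.
declare function executeSearchWeb(query: string): Promise<unknown>;

for (const toolCall of Object.values(toolCalls)) {
  // Once the stream has finished, the accumulated arguments form a
  // complete JSON string.
  const args = JSON.parse(toolCall.function!.arguments!) as { query: string };
  const toolResult = await executeSearchWeb(args.query);
}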


2. Reporting the Tool Response Back

After you obtain the output from the tool (e.g. search results), you are expected to feed that result back into the conversation. This is done by adding a new message with the role "tool" containing the tool’s output and matching it with the corresponding tool_call_id.

For example, you would add something like:

TYPESCRIPT
{ role: "tool", tool_call_id: "call_cLiEnGf4H9zB4BMz6TQMQmwA", // same as returned by the LLM content: '{"answer": "Vitest is the best framework for testing Node.js code."}' }

You then continue the conversation so that the LLM can process this output and generate its final answer.


3. Restarting or Continuing the Conversation

Yes, you must “restart” or rather continue the conversation by including additional messages. The updated conversation history would include:

  1. The initial user message.
  2. The assistant message where it issues the tool call.
  3. The tool’s output message (with role "tool" and matching tool_call_id).

Once these messages are in the conversation, if you want the LLM to generate a final response based on the tool's output (and possibly some additional instructions), you may then reissue a request to the completions API with the updated message list. This new request will let the LLM complete the conversation by referring both to the original query and the tool output.
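
A sketch of that follow-up request, assuming toolCall is the call you reconstructed from the stream and toolResult holds the output of your searchWeb execution:

TYPESCRIPT
messages.push(
  {
    role: 'assistant',
    content: null,
    tool_calls: [
      {
        id: toolCall.id!,
        type: 'function',
        function: {
          name: toolCall.function!.name!,
          arguments: toolCall.function!.arguments!,
        },
      },
    ],
  },
  {
    role: 'tool',
    tool_call_id: toolCall.id!,
    content: JSON.stringify(toolResult),
  },
);

const followUp = await client.chat.completions.create({
  messages,
  model: '...',
  stream: true,
  // No tool_choice here, so the model answers in plain text
  // instead of calling the tool again.
});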


4. Providing Final Formatting Instructions

If you need the final response to follow a particular format (for example, a JSON structure or specific content layout), you should include those instructions in the conversation’s system message or embed them in the user prompt. For example, include a system instruction such as:

JSON
{ "role": "system", "content": "When responding, provide the final answer in a JSON format with a key 'finalAnswer'." }

Or add a note in the user message to inform the LLM how to format its final response. That way, when the updated conversation is sent back (after including the tool output), the LLM understands the formatting requirements.
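
If the model you are using supports it, you can also enable JSON mode via the response_format parameter, which constrains the output to valid JSON and complements the prompt instructions. A sketch combining both (note that the word "JSON" must appear somewhere in the messages or the API rejects the request):

TYPESCRIPT
const finalCompletion = await client.chat.completions.create({
  model: '...',
  response_format: { type: 'json_object' }, // JSON mode
  messages: [
    {
      role: 'system',
      content:
        "Provide the final answer as JSON with a single key 'finalAnswer'.",
    },
    ...messages, // the updated history, including the tool response
  ],
});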


Summary

  • Force the tool call: You already set tool_choice to use searchWeb. This tells the LLM which tool it should use.
  • Handling the stream: Gather the tool call data and execute the tool on your side.
  • Reporting back: Once the tool returns its output, add a new message with role "tool" (including the matching tool_call_id and output).
  • Continuing conversation: Submit an updated message list (including the user’s original query, the assistant’s tool call, and the tool’s response) so that the LLM can generate its final message.
  • Final formatting: Provide instructions upfront (using system or user messages) on how the final answer should be formatted.

Following these steps should allow you to smoothly integrate function calls with the streaming interface and control the format of the final response.



Answer by MartianMariner647 3 months ago


Yes, you need to run multiple OpenAI queries:

  1. Query OpenAI for the function call and its parameters
  2. Execute the webSearch function
  3. Query OpenAI again with an entirely new prompt that includes the results from webSearch plus your formatting instructions. If you want e.g. a JSON response, include an example of the JSON and activate JSON mode (a sketch follows this list)
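
A minimal sketch of step 3 under these assumptions — searchResults stands in for the output of step 2, and the results are embedded directly in a fresh prompt rather than passed as a tool message:

TYPESCRIPT
const final = await client.chat.completions.create({
  model: '...',
  response_format: { type: 'json_object' }, // activates JSON mode
  messages: [
    {
      role: 'system',
      content:
        'Answer the question using the search results below. ' +
        'Respond in JSON shaped like {"answer": "..."}.\n\n' +
        `Search results: ${JSON.stringify(searchResults)}`,
    },
    {
      role: 'user',
      content: 'whats the best framework for testing in nodejs?',
    },
  ],
});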

