In a nutshell: SFT and DPO enable targeted training of tool selection in language models without requiring management of custom training infrastructure.

Amazon SageMaker AI enables engineers to improve the accuracy of AI agents in tool calling through combined Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). This reduces error rates and shortens processing times in production environments.

In practice, AI agents frequently fail because they call the wrong tool, format parameters incorrectly, or interrupt a workflow chain. Each of these failures drives up error rates, processing times, and support costs. The goal is to train small language models (SLMs) so that they select the right tool for each request and pass it well-formed parameters.
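These failure modes can be made concrete with a small checker. The sketch below is illustrative only: the tool registry, the tool names `get_weather` and `create_ticket`, and the helper `classify_tool_call` are hypothetical and do not come from the AWS source.

```python
import json

# Hypothetical tool registry: tool name -> required parameters and their types.
TOOLS = {
    "get_weather": {"city": str},
    "create_ticket": {"title": str, "priority": int},
}


def classify_tool_call(raw_call: str, expected_tool: str) -> str:
    """Classify a model's tool call as 'ok', 'malformed', 'wrong_tool', or 'bad_parameters'."""
    try:
        call = json.loads(raw_call)
    except json.JSONDecodeError:
        return "malformed"  # output is not valid JSON at all
    if not isinstance(call, dict) or not isinstance(call.get("arguments"), dict):
        return "malformed"  # valid JSON, but not the expected call structure
    name, args = call.get("name"), call["arguments"]
    if name != expected_tool or name not in TOOLS:
        return "wrong_tool"  # a different or unknown tool was selected
    schema = TOOLS[name]
    if set(args) != set(schema):
        return "bad_parameters"  # missing or unexpected parameter names
    for key, expected_type in schema.items():
        if not isinstance(args[key], expected_type):
            return "bad_parameters"  # parameter present but wrongly typed
    return "ok"
```

A checker of this kind could be run over a held-out set of requests to count how often each failure mode occurs before and after fine-tuning.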

Supervised Fine-Tuning (SFT) trains on curated, high-quality datasets containing explicit examples of how the model should interact with specific tools. This teaches the model the nuances of tool-specific commands and constraints. Direct Preference Optimization (DPO) then refines the model with preference data: each training example pairs a preferred response with a rejected one for the same prompt, a “like this, not like that” format. DPO optimizes the model directly on these pairs, without reward functions or a separate reward model, which reduces resource requirements and training time.
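To illustrate the difference between the two data formats, here is a minimal sketch. The field names `prompt`, `completion`, `chosen`, and `rejected` follow a common convention but are an assumption here, as are the example tools; the source does not specify the exact schema. The `dpo_loss` function shows the standard per-example DPO objective computed from sequence log-probabilities, with `beta` as an assumed hyperparameter value.

```python
import math

# Hypothetical SFT record: a prompt with the single correct tool call as the target.
sft_example = {
    "prompt": "What is the weather in Berlin?",
    "completion": '{"name": "get_weather", "arguments": {"city": "Berlin"}}',
}

# Hypothetical DPO record: the same prompt with a preferred and a rejected response.
dpo_example = {
    "prompt": "What is the weather in Berlin?",
    "chosen": '{"name": "get_weather", "arguments": {"city": "Berlin"}}',
    "rejected": '{"name": "create_ticket", "arguments": {"title": "Berlin"}}',
}


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss: -log(sigmoid(beta * margin)).

    The margin measures how much more the trained policy prefers the chosen
    response over the rejected one, relative to a frozen reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(x)) equals log(1 + exp(-x)); fine for moderate values of x.
    return math.log1p(math.exp(-beta * margin))
```

The loss depends only on log-probabilities from the policy and the reference model, which is why no separate reward model has to be trained.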

Amazon SageMaker AI training jobs provide a fully managed solution for this workflow. Engineers can use them to train Qwen3 1.7B or other models on distributed multi-GPU and multi-node configurations without managing the infrastructure themselves. Metrics from the training loop are sent automatically to MLflow on SageMaker AI for later analysis. After training, the fine-tuned variants can be evaluated and compared against the base model to make data-driven decisions about model quality.
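The comparison step can be sketched independently of any SageMaker or MLflow API. In the example below, the metric, the variant names, and all predictions are made-up placeholders for illustration; they are not results from the source.

```python
def tool_call_accuracy(predictions, references):
    """Fraction of requests for which the predicted tool matches the reference tool."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Placeholder evaluation data (hypothetical, not measured results).
references = ["get_weather", "create_ticket", "get_weather", "search_docs"]
variant_predictions = {
    "base": ["get_weather", "get_weather", "search_docs", "search_docs"],
    "sft": ["get_weather", "create_ticket", "search_docs", "search_docs"],
    "sft_dpo": ["get_weather", "create_ticket", "get_weather", "search_docs"],
}

# Score every variant on the same held-out requests and pick the best one.
scores = {
    name: tool_call_accuracy(preds, references)
    for name, preds in variant_predictions.items()
}
best_variant = max(scores, key=scores.get)
```

Scoring every variant on the same held-out requests keeps the comparison with the base model fair; in a real setup the resulting scores could be logged alongside the training metrics.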

To use this approach, engineers need an AWS account with appropriate IAM roles, access to SageMaker AI, and a configured development environment. SageMaker AI provisions the training infrastructure on demand and shuts it down automatically after the job completes. Combining SFT and DPO lets engineers systematically train language models for complex multi-tool interactions in production.

Source: aws.amazon.com · Published June 3, 2026
Lumi AI News — AI-assisted curation pursuant to Article 50 EU AI Act. Paraphrase and classification through Lumi News Pipeline v1.2.9.