The OpenAI-compatible API provides a unified request and response structure for different AI providers. Instead of dealing with dozens of different formats, you work with a single standard that is compatible with SDKs, libraries, and UIs.
Consider two native APIs: agent.hostman.com and a hypothetical api.somerandomai.com. Let’s see how message sending is implemented.

Even for the same task, request structures can differ significantly. For example, one API uses parent_message_id to maintain context, while another uses a nested object:
"context": {
"thread_id": "abc123",
"reply_to": "msg789"
}
Additionally, with our API, settings are configured via a separate request, whereas in the third-party API, they are included in the settings object with every message.
As a result, you would need to:
The OpenAI-compatible API solves this problem. The structure is always the same: a messages array with role and content fields, plus standard parameters (model, temperature, max_tokens). Differences remain only in the URL and model name.
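The idea that only the URL and model name change can be sketched in Python. This is a minimal illustration, not part of any SDK: the helper name build_chat_request is hypothetical, https://api.example.com is a placeholder second provider, and the assumption that the other provider also exposes the /v1/chat/completions path is ours.

```python
from typing import Any


def build_chat_request(base_url: str, model: str, user_text: str) -> dict[str, Any]:
    """Return the URL and JSON body for an OpenAI-style chat completion call.

    Hypothetical helper: it only assembles data and sends nothing.
    """
    return {
        "url": f"{base_url.rstrip('/')}/v1/chat/completions",
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": user_text}],
            "temperature": 1,
            "max_tokens": 100,
        },
    }


# Same message structure, two providers: only the base URL and model differ.
hostman = build_chat_request(
    "https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}", "gpt-4.1", "Hello!"
)
other = build_chat_request("https://api.example.com", "some-model", "Hello!")  # placeholder provider
```

The two results can be passed to the same HTTP client code, for example requests.post(r["url"], json=r["json"], headers=...).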

Benefits:
The current implementation is not fully OpenAI-compatible.
Not supported:
/v1/embeddings endpoint is unavailable
Implementation notes:
The model parameter in requests is ignored; the model defined in the agent settings is used.
Let’s look at how to use the OpenAI-compatible API.
You can find all available API methods in the documentation.
Regardless of the API type (private or public), an API token is required for requests.
Include the token in the request header:
Authorization: Bearer $TOKEN
In cURL examples, you can:
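Replace $TOKEN with your actual token in each request, or
Set it once as an environment variable:
export TOKEN=your_access_token
The $TOKEN variable will then be substituted automatically, provided the header is wrapped in double quotes; the shell does not expand variables inside single quotes.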
In Python and Node.js examples, the token is represented as {{token}}. We recommend storing it in environment variables or configuration files instead of in code to prevent leaks.
The API also requires a base URL, which can be found in the Dashboard tab of the agent’s control panel.
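One possible way to keep the token and base URL out of the code is to read both from environment variables. The sketch below assumes the token is stored in TOKEN (as in the cURL examples) and the base URL in AGENT_BASE_URL; the latter name and the function load_api_settings are hypothetical choices for this example.

```python
import os


def load_api_settings(env=os.environ):
    """Read the token and base URL from the environment and build request headers.

    AGENT_BASE_URL is a hypothetical variable name chosen for this sketch.
    """
    token = env.get("TOKEN")
    base_url = env.get("AGENT_BASE_URL")
    if not token or not base_url:
        # Fail early instead of sending an unauthenticated request
        raise RuntimeError("Set TOKEN and AGENT_BASE_URL before calling the API")
    headers = {
        "authorization": f"Bearer {token}",
        "content-type": "application/json",
    }
    return base_url.rstrip("/"), headers
```

Passing a plain dictionary as env makes the function easy to check without touching the real environment.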

Two methods are supported: Chat Completions and Text Completions. The Text Completions method is deprecated and only maintained for backward compatibility; it is not recommended.
POST /api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions
cURL:
curl --request POST \
--url https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions \
--header "authorization: Bearer $TOKEN" \
--header 'content-type: application/json' \
--data '{ "model": "gpt-4.1", "messages": [ { "role": "user", "content": "Hello!" } ], "temperature": 1, "max_tokens": 100, "stream": false }'
Python:
import requests
url = "https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions"
payload = {
"model": "gpt-4.1",
"messages": [
{
"role": "user",
"content": "Hello!"
}
],
"temperature": 1,
"max_tokens": 100,
"stream": False
}
headers = {
"authorization": "Bearer {{token}}",
"content-type": "application/json"
}
response = requests.post(url, json=payload, headers=headers)
print(response.json())
Node.js:
const request = require('request');
const options = {
method: 'POST',
url: 'https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions',
headers: {authorization: 'Bearer {{token}}', 'content-type': 'application/json'},
body: {
model: 'gpt-4.1',
messages: [{role: 'user', content: 'Hello!'}],
temperature: 1,
max_tokens: 100,
stream: false
},
json: true
};
request(options, function (error, response, body) {
if (error) throw new Error(error);
console.log(body);
});
Parameters:
model: optional, ignored (for compatibility)
messages: array of messages:
  role: sender role (user, assistant, system)
  content: message text
temperature: creativity of the response
max_tokens: response length limit
stream: stream output (true/false)
For GPT-5 models, max_tokens is replaced with max_completion_tokens, and using temperature will trigger an error.
Additional parameters may vary depending on the model. When constructing a request, refer to the parameters available in the control panel for the selected model: if a parameter is present in the panel, it is supported when accessed through the API.
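The GPT-5 rule described above (max_tokens becomes max_completion_tokens, temperature is not accepted) could be applied with a small helper before sending the request. This is a sketch under assumptions: the function name adapt_payload and the gpt5 flag are hypothetical, and the caller is expected to know which model the agent is configured with, since the model field in the request is ignored.

```python
def adapt_payload(payload: dict, gpt5: bool) -> dict:
    """Return a copy of the payload adjusted for GPT-5 models when gpt5 is True.

    The copy is shallow: nested values such as the messages list are shared.
    """
    adapted = dict(payload)
    if gpt5:
        # GPT-5 models expect max_completion_tokens instead of max_tokens
        if "max_tokens" in adapted:
            adapted["max_completion_tokens"] = adapted.pop("max_tokens")
        # Sending temperature triggers an error, so drop it
        adapted.pop("temperature", None)
    return adapted


base_payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 1,
    "max_tokens": 100,
}
gpt5_payload = adapt_payload(base_payload, gpt5=True)
```

The original dictionary is left unchanged, so the same base payload can be reused for agents running other models.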
Example Response:
{
"id": "fc8cd652-af12-4a89-8ef7-490a0526e8d3",
"object": "chat.completion",
"created": 1757601532,
"model": "gpt-4.1",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Hello! How can I help you? 😊"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 8,
"total_tokens": 18
}
}
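Reading the reply out of a response like the one above might look as follows. The helper name extract_reply is hypothetical, and the sample dictionary is a trimmed copy of the example response.

```python
def extract_reply(response: dict) -> tuple[str, int]:
    """Return the assistant's text and the total token count from a chat completion."""
    choice = response["choices"][0]  # the example response contains a single choice
    return choice["message"]["content"], response["usage"]["total_tokens"]


sample = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help you? 😊"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 10, "completion_tokens": 8, "total_tokens": 18},
}

text, total_tokens = extract_reply(sample)
```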
POST /api/v1/cloud-ai/agents/{{agent_id}}/v1/completions
cURL:
curl --request POST \
--url https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/completions \
--header "authorization: Bearer $TOKEN" \
--header 'content-type: application/json' \
--data '{ "prompt": "Hello!", "model": "gpt-4.1", "max_tokens": 100, "temperature": 0.7, "top_p": 0.9, "n": 1, "stream": false, "logprobs": null, "echo": false, "stop": [ "\n" ], "presence_penalty": 0, "frequency_penalty": 0, "best_of": 1, "user": "hostman" }'
Python:
import requests
url = "https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/completions"
payload = {
"prompt": "Hello!",
"model": "gpt-4.1",
"max_tokens": 100,
"temperature": 0.7,
"top_p": 0.9,
"n": 1,
"stream": False,
"logprobs": None,
"echo": False,
"stop": ["\n"],
"presence_penalty": 0,
"frequency_penalty": 0,
"best_of": 1,
"user": "hostman"
}
headers = {
"authorization": "Bearer {{token}}",
"content-type": "application/json"
}
response = requests.post(url, json=payload, headers=headers)
print(response.json())
Node.js:
const request = require('request');
const options = {
method: 'POST',
url: 'https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/completions',
headers: {authorization: 'Bearer {{token}}', 'content-type': 'application/json'},
body: {
prompt: 'Hello!',
model: 'gpt-4.1',
max_tokens: 100,
temperature: 0.7,
top_p: 0.9,
n: 1,
stream: false,
logprobs: null,
echo: false,
stop: ['\n'],
presence_penalty: 0,
frequency_penalty: 0,
best_of: 1,
user: 'hostman'
},
json: true
};
request(options, function (error, response, body) {
if (error) throw new Error(error);
console.log(body);
});
Parameters:
prompt: text of the request
model: ignored, included for compatibility
max_tokens: response length limit
temperature: creativity level
top_p: probability sampling
n: number of responses (ignored)
stream: stream output
Other parameters (logprobs, echo, stop, presence_penalty, frequency_penalty, best_of, user) are partially supported, mainly for compatibility.
Example Response:
{
"id": "7f967459-428c-46ad-87f0-213ff951024d",
"object": "text_completion",
"created": 1757601892,
"model": "gpt-4.1",
"choices": [
{
"text": "Hello! How can I help you? 😊",
"index": 0,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 10,
"completion_tokens": 8,
"total_tokens": 18
},
"response_id": "ea9aa124-0c51-467c-9ab6-218ab4ec65e7"
}
In Chat Completions, each message includes a role. There are three roles:
user: user request
assistant: AI response
system: system prompt
The user role is used to send regular user requests to the AI.
Example:
{
"role": "user",
"content": "What is 2+5?"
}
The assistant role indicates the AI’s reply to a previous message.
Important: In the OpenAI-compatible API, dialogue history must be included in every request. Include both user requests and assistant responses to maintain context.
Example:
curl --request POST \
--url https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions \
--header "authorization: Bearer $TOKEN" \
--header 'content-type: application/json' \
--data '{
"model": "gpt-4",
"messages": [
{ "role": "user", "content": "What is 2+5? Provide only the answer, no formatting." },
{ "role": "assistant", "content": "7" },
{ "role": "user", "content": "Now multiply the result by 2. Provide only the answer, no formatting." }
]
}'
The request passes the previous question and answer, marking messages as "role": "user" or "role": "assistant". The last user message is left without a response; this is what the model will generate a new answer for.
Example Response:
{
"index": 0,
"message": {
"role": "assistant",
"content": "14",
"refusal": null,
"annotations": []
},
"finish_reason": "stop"
}
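Because the full history has to travel with every request, it can help to keep it in one place. The class below is a minimal sketch of that idea: Conversation and fake_send are hypothetical names, and fake_send stands in for a real HTTP call so the example runs without network access.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Accumulates user and assistant messages and resends them on every turn."""

    def __init__(self, send: Callable[[list[Message]], str]):
        self.send = send  # function that takes the messages array and returns the reply text
        self.messages: list[Message] = []

    def ask(self, text: str) -> str:
        self.messages.append({"role": "user", "content": text})
        reply = self.send(list(self.messages))  # pass a copy of the full history
        self.messages.append({"role": "assistant", "content": reply})
        return reply


def fake_send(messages: list[Message]) -> str:
    # Stand-in for the API call: reports how many messages were received
    return str(len(messages))


chat = Conversation(fake_send)
chat.ask("What is 2+5?")                   # the request carries 1 message
chat.ask("Now multiply the result by 2.")  # the request carries 3 messages
```

In real use, send would post the messages array to the chat completions endpoint and return the content of the first choice.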
The system role defines the agent’s behavior: style, tone, constraints, and goals. It is usually the first message in the messages array. See the article on system prompts for details.
In the assistant role example, the same instruction is repeated in each user request:
Provide only the answer, no formatting
This duplication can be avoided by specifying the instruction once in the system prompt.
Example:
curl --request POST \
--url https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions \
--header "authorization: Bearer $TOKEN" \
--header 'content-type: application/json' \
--data '{
"model": "gpt-4",
"messages": [
{ "role": "system", "content": "When answering questions, provide only the calculation result, without any formatting." },
{ "role": "user", "content": "What is 2+5?" },
{ "role": "assistant", "content": "7" },
{ "role": "user", "content": "Now multiply the result by 2" }
]
}'
Now the model will follow the instruction even if it’s not repeated in every user message.
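When the history is assembled in code, the system prompt can be added in one step so that it always comes first and appears only once. The function name with_system_prompt is a hypothetical choice for this sketch.

```python
def with_system_prompt(system_prompt: str, history: list[dict]) -> list[dict]:
    """Return a new messages list with exactly one system message, placed first."""
    # Drop any system messages already in the history to avoid duplicates
    rest = [m for m in history if m["role"] != "system"]
    return [{"role": "system", "content": system_prompt}] + rest


history = [
    {"role": "user", "content": "What is 2+5?"},
    {"role": "assistant", "content": "7"},
    {"role": "user", "content": "Now multiply the result by 2"},
]
messages = with_system_prompt(
    "When answering questions, provide only the calculation result, without any formatting.",
    history,
)
```

The input list is not modified, so the same history can be reused with a different system prompt.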