OpenAI-compatible API

Last updated June 18, 2026

The OpenAI-compatible API provides a unified request and response structure for different AI providers. Instead of dealing with dozens of different formats, you work with a single standard that is compatible with SDKs, libraries, and UIs.

Why Unification Matters

Consider two native APIs: agent.hostman.com and a hypothetical api.somerandomai.com. Let’s see how message sending is implemented.

Even for the same task, request structures can differ significantly. For example, one API uses parent_message_id to maintain context, while another uses a nested object:

"context": {
 "thread_id": "abc123",
 "reply_to": "msg789"
}

Additionally, with our API, settings are configured via a separate request, whereas in the third-party API, they are included in the settings object with every message.

As a result, you would need to:

Write a separate client for each API
Learn each API’s field names and behavior
Test everything from scratch

The OpenAI-compatible API solves this problem. The structure is always the same: a messages array with role and content fields, plus standard parameters (model, temperature, max_tokens). Differences remain only in the URL and model name.
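As a rough illustration of that uniform structure, the sketch below builds the same request body for two endpoints. The helper name build_chat_payload, the path on api.somerandomai.com, and the model name "some-model" are hypothetical and used only for illustration.

```python
import json


def build_chat_payload(model, user_text, temperature=1.0, max_tokens=100):
    """Return a request body in the common messages/role/content shape."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


# Only the URL and the model name differ between providers.
# The second entry is a made-up placeholder, not a real service.
targets = [
    ("https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions", "gpt-4.1"),
    ("https://api.somerandomai.com/v1/chat/completions", "some-model"),
]

for url, model in targets:
    payload = build_chat_payload(model, "Hello!")
    print(url)
    print(json.dumps(payload))
```

The body is built by one function regardless of the target, which is the practical meaning of "differences remain only in the URL and model name".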

Benefits:

Fast integration with any software supporting the OpenAI API
Support for popular libraries (e.g., LangChain)
Works with ready-made UIs (e.g., Open WebUI)
Minimal effort when migrating from OpenAI

Limitations

The current implementation is not fully OpenAI-compatible.

Not supported:

Embeddings: the /v1/embeddings endpoint is unavailable
Fine-tuning: training and customization of models
Images API: image generation
Audio API: speech-to-text and text-to-speech conversion (except for audio messages in chat)
Files API: file upload and management
Assistants API: creation and management of assistants
Web search: searching and retrieving data from the internet

Implementation notes:

The model parameter in requests is ignored; the model defined in the agent settings is used instead.
Some parameters may be ignored depending on the agent’s selected model.
The prompt defined in the agent’s settings is applied to requests without explicit prompts.

Usage

Let’s look at how to use the OpenAI-compatible API.

Authentication

An API token is required for requests.

Include the token in the request header:

Authorization: Bearer $TOKEN

In cURL examples, you can:

Provide the token manually by replacing $TOKEN with your actual token in each request, or
Use an environment variable to avoid inserting it every time:

export TOKEN=your_access_token

The shell then substitutes $TOKEN automatically, provided the header is written in double quotes; inside single quotes the variable is not expanded.

In Python and Node.js examples, the token is represented as {{token}}. We recommend storing it in environment variables or configuration files instead of in code to prevent leaks.
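One possible way to keep the token out of the code in Python is to read it from the same TOKEN environment variable used in the cURL examples. This is a minimal sketch; the helper name build_headers is hypothetical.

```python
import os


def build_headers(token=None):
    """Build request headers, reading the token from the TOKEN environment variable by default."""
    token = token or os.environ.get("TOKEN")
    if not token:
        raise RuntimeError("Set the TOKEN environment variable or pass a token explicitly")
    return {
        "authorization": f"Bearer {token}",
        "content-type": "application/json",
    }


# Explicit token used here only so the example runs without any environment setup.
print(build_headers("example-token")["authorization"])  # Bearer example-token
```

In real code you would call build_headers() without arguments and pass the result as the headers argument of requests.post.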

Base URL

The API also requires a base URL, which can be found in the agent's Dashboard tab.
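The sketch below shows how the base URL and the token might be combined into a request using only the Python standard library. It assumes that the base URL copied from the dashboard ends in /v1, as the endpoint paths in the examples in this article suggest; the function name chat_request is hypothetical. The request is prepared but not sent.

```python
import json
import urllib.request


def chat_request(base_url, token, messages):
    """Prepare (but do not send) a POST request to the Chat Completions endpoint."""
    # Assumption: base_url ends in /v1, so only the method path is appended.
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({"messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: copy the real base URL from the agent's Dashboard tab.
req = chat_request(
    "https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/",
    "example-token",
    [{"role": "user", "content": "Hello!"}],
)
print(req.full_url)
```

To actually send the request, pass the prepared object to urllib.request.urlopen(req) and decode the JSON body of the response.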

Sending Messages to the Agent

Two methods are supported: Chat Completions and Text Completions. The Text Completions method is deprecated and only maintained for backward compatibility; it is not recommended.

Chat Completions Usage Example

POST /api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions

cURL:

curl --request POST \
  --url https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions \
  --header "authorization: Bearer $TOKEN" \
  --header 'content-type: application/json' \
  --data '{ "model": "gpt-4.1", "messages": [ { "role": "user", "content": "Hello!" } ], "temperature": 1, "max_tokens": 100, "stream": false }'

Python:

import requests

url = "https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions"

payload = {
    "model": "gpt-4.1",
    "messages": [
        {
            "role": "user",
            "content": "Hello!"
        }
    ],
    "temperature": 1,
    "max_tokens": 100,
    "stream": False
}
headers = {
    "authorization": "Bearer {{token}}",
    "content-type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.json())

Node.js:

const request = require('request');

const options = {
  method: 'POST',
  url: 'https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions',
  headers: {authorization: 'Bearer {{token}}', 'content-type': 'application/json'},
  body: {
    model: 'gpt-4.1',
    messages: [{role: 'user', content: 'Hello!'}],
    temperature: 1,
    max_tokens: 100,
    stream: false
  },
  json: true
};

request(options, function (error, response, body) {
  if (error) throw new Error(error);

  console.log(body);
});

Parameters:

model: optional, ignored (for compatibility)
messages: array of messages:
- role: sender role (user, assistant, system)
- content: message text
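temperature: sampling temperature; higher values produce more varied responses, lower values more predictable ones
max_tokens: maximum number of tokens in the response
stream: whether to stream the output as it is generated (true/false)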

Additional parameters may vary depending on the model. When constructing a request, refer to the parameters available in the dashboard for the selected model: if a parameter is present in the panel, it is supported when accessed through the API.
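This article does not describe what the response looks like when stream is true. The sketch below assumes the stream follows the OpenAI convention: server-sent event lines of the form "data: {json}" whose choices carry a delta object, ending with "data: [DONE]". Treat that format, the helper name collect_stream_text, and the sample lines as assumptions to verify against the real output of your agent.

```python
import json


def collect_stream_text(lines):
    """Join the text fragments from an OpenAI-style event stream (assumed format)."""
    parts = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data line
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream marker in the OpenAI convention
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                parts.append(delta["content"])
    return "".join(parts)


# Hand-written sample lines, not captured from the real API.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"index": 0, "delta": {"content": "lo!"}}]}',
    "data: [DONE]",
]
print(collect_stream_text(sample))  # Hello!
```

With the requests library, the lines could come from requests.post(..., stream=True) followed by response.iter_lines(decode_unicode=True).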

Example Response:

{
  "id": "fc8cd652-af12-4a89-8ef7-490a0526e8d3",
  "object": "chat.completion",
  "created": 1757601532,
  "model": "gpt-4.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you? 😊"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  }
}
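A short sketch of reading the fields most clients need from this response. The function name summarize_chat_response is hypothetical, and the sample body is a trimmed copy of the example above.

```python
def summarize_chat_response(body):
    """Return the assistant text, finish reason, and total token count."""
    choice = body["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": body["usage"]["total_tokens"],
    }


# Trimmed copy of the example response above.
body = {
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help you?"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 10, "completion_tokens": 8, "total_tokens": 18},
}

print(summarize_chat_response(body))
```

In the OpenAI format, a finish_reason of "length" means the reply was cut off by max_tokens, so it may be worth checking that field before using the text; whether every model behind the agent reports it the same way is not stated here.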

Text Completions Usage Example

POST /api/v1/cloud-ai/agents/{{agent_id}}/v1/completions

cURL:

curl --request POST \
  --url https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/completions \
  --header "authorization: Bearer $TOKEN" \
  --header 'content-type: application/json' \
  --data '{ "prompt": "Hello!", "model": "gpt-4.1", "max_tokens": 100, "temperature": 0.7, "top_p": 0.9, "n": 1, "stream": false, "logprobs": null, "echo": false, "stop": [ "\n" ], "presence_penalty": 0, "frequency_penalty": 0, "best_of": 1, "user": "hostman" }'

Python:

import requests

url = "https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/completions"

payload = {
    "prompt": "Hello!",
    "model": "gpt-4.1",
    "max_tokens": 100,
    "temperature": 0.7,
    "top_p": 0.9,
    "n": 1,
    "stream": False,
    "logprobs": None,
    "echo": False,
    "stop": ["\n"],
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "best_of": 1,
    "user": "hostman"
}
headers = {
    "authorization": "Bearer {{token}}",
    "content-type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.json())

Node.js:

const request = require('request');

const options = {
  method: 'POST',
  url: 'https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/completions',
  headers: {authorization: 'Bearer {{token}}', 'content-type': 'application/json'},
  body: {
    prompt: 'Hello!',
    model: 'gpt-4.1',
    max_tokens: 100,
    temperature: 0.7,
    top_p: 0.9,
    n: 1,
    stream: false,
    logprobs: null,
    echo: false,
    stop: ['\n'],
    presence_penalty: 0,
    frequency_penalty: 0,
    best_of: 1,
    user: 'hostman'
  },
  json: true
};

request(options, function (error, response, body) {
  if (error) throw new Error(error);

  console.log(body);
});

Parameters:

prompt: text of the request
model: ignored, included for compatibility
max_tokens: response length limit
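temperature: sampling temperature (higher values produce more varied output)
top_p: nucleus sampling threshold; only tokens within the top_p cumulative probability mass are considered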
n: number of responses (ignored)
stream: stream output

Other parameters (logprobs, echo, stop, presence_penalty, frequency_penalty, best_of, user) are partially supported, mainly for compatibility.

Example Response:

{
  "id": "7f967459-428c-46ad-87f0-213ff951024d",
  "object": "text_completion",
  "created": 1757601892,
  "model": "gpt-4.1",
  "choices": [
    {
      "text": "Hello! How can I help you? 😊",
      "index": 0,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  },
  "response_id": "ea9aa124-0c51-467c-9ab6-218ab4ec65e7"
}
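The two methods return the generated text in different places: choices[0].message.content for Chat Completions and choices[0].text for Text Completions. A client that has to handle both could branch on the object field, as in this sketch (the helper name extract_text is hypothetical).

```python
def extract_text(body):
    """Return the generated text from a Chat or Text Completions response body."""
    choice = body["choices"][0]
    if body.get("object") == "text_completion":
        return choice["text"]
    return choice["message"]["content"]


# Trimmed versions of the two example responses in this article.
text_body = {
    "object": "text_completion",
    "choices": [{"text": "Hello! How can I help you?", "index": 0, "finish_reason": "stop"}],
}
chat_body = {
    "object": "chat.completion",
    "choices": [{"index": 0, "message": {"role": "assistant", "content": "Hello! How can I help you?"}}],
}

print(extract_text(text_body))
print(extract_text(chat_body))
```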

Roles

In Chat Completions, each message includes a role. There are three roles:

user: user request
assistant: AI response
system: system prompt

user

The user role is used to send regular user requests to the AI.

Example:

{
  "role": "user",
  "content": "What is 2+5?"
}

assistant

This role indicates the AI’s reply to a previous message.

Example:

curl --request POST \
  --url https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions \
  --header "authorization: Bearer $TOKEN" \
  --header 'content-type: application/json' \
  --data '{
    "model": "gpt-4",
    "messages": [
      { "role": "user", "content": "What is 2+5? Provide only the answer, no formatting." },
      { "role": "assistant", "content": "7" },
      { "role": "user", "content": "Now multiply the result by 2. Provide only the answer, no formatting." }
    ]
  }'

The request passes the previous question and answer, marking messages as "role": "user" or "role": "assistant". The last user message is left without a response; this is what the model will generate a new answer for.
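In client code, this usually means keeping the conversation as a list and appending each reply before the next question. A minimal sketch, with the hypothetical helper add_turn:

```python
def add_turn(messages, assistant_reply, next_user_text):
    """Return a new history with the model's reply and the next user message appended."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_user_text},
    ]


history = [
    {"role": "user", "content": "What is 2+5? Provide only the answer, no formatting."}
]
# "7" stands for the reply received from the first request.
history = add_turn(
    history,
    "7",
    "Now multiply the result by 2. Provide only the answer, no formatting.",
)

for message in history:
    print(message["role"], "->", message["content"])
```

The resulting list is what goes into the messages field of the next request, ending with the user message that still needs an answer.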

Example Response (only the element of the choices array is shown):

{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": "14",
    "refusal": null,
    "annotations": []
  },
  "finish_reason": "stop"
}

system

The system role defines the agent’s behavior: style, tone, constraints, and goals. It is usually the first message in the messages array. See the article on system prompts for details.

In the assistant role example above, the same instruction is repeated in each user request:

Provide only the answer, no formatting

This duplication can be avoided by specifying the instruction once in the system prompt.

Example:

curl --request POST \
  --url https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions \
  --header "authorization: Bearer $TOKEN" \
  --header 'content-type: application/json' \
  --data '{
    "model": "gpt-4",
    "messages": [
      { "role": "system", "content": "When answering questions, provide only the calculation result, without any formatting." },
      { "role": "user", "content": "What is 2+5?" },
      { "role": "assistant", "content": "7" },
      { "role": "user", "content": "Now multiply the result by 2" }
    ]
  }'

Now the model will follow the instruction even if it’s not repeated in every user message.
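The same idea in Python: a small helper that places a single system message at the front of the history. The name with_system_prompt is hypothetical, and dropping any earlier system messages is a design choice of this sketch, not a requirement of the API.

```python
def with_system_prompt(system_text, messages):
    """Return messages with exactly one system message, placed first."""
    rest = [m for m in messages if m["role"] != "system"]
    return [{"role": "system", "content": system_text}] + rest


messages = with_system_prompt(
    "When answering questions, provide only the calculation result, without any formatting.",
    [
        {"role": "user", "content": "What is 2+5?"},
        {"role": "assistant", "content": "7"},
        {"role": "user", "content": "Now multiply the result by 2"},
    ],
)

for message in messages:
    print(message["role"], "->", message["content"])
```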
