
OpenAI-compatible API

Updated on 20 February 2026

The OpenAI-compatible API provides a unified request and response structure for different AI providers. Instead of dealing with dozens of different formats, you work with a single standard that is compatible with SDKs, libraries, and UIs.

Why Unification Matters

Consider two native APIs: agent.hostman.com and a hypothetical api.somerandomai.com. Let’s see how message sending is implemented.


Even for the same task, request structures can differ significantly. For example, one API uses parent_message_id to maintain context, while another uses a nested object:

"context": {
 "thread_id": "abc123",
 "reply_to": "msg789"
}

Additionally, with our API, settings are configured via a separate request, whereas in the third-party API, they are included in the settings object with every message.

As a result, you would need to:

  • Write a separate client for each API
  • Learn each API’s field names and behavior
  • Test everything from scratch

The OpenAI-compatible API solves this problem. The structure is always the same: a messages array with role and content fields, plus standard parameters (model, temperature, max_tokens). Differences remain only in the URL and model name.
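To make the contrast concrete, here is a sketch of the same payload targeting two different OpenAI-compatible services; only the URL changes (the second provider URL is hypothetical, echoing the example above):

```python
# The same request body works against any OpenAI-compatible endpoint; only
# the URL (and, where honored, the model name) differs between providers.
payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 1,
    "max_tokens": 100,
}

providers = [
    "https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1",
    "https://api.somerandomai.com/v1",  # hypothetical second provider
]

for base_url in providers:
    # The identical payload would be POSTed to each endpoint.
    print(f"POST {base_url}/chat/completions")
```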


Benefits:

  • Fast integration with any software supporting the OpenAI API
  • Support for popular libraries (e.g., LangChain)
  • Works with ready-made UIs (e.g., Open WebUI)
  • Minimal effort when migrating from OpenAI

Limitations

The current implementation is not fully OpenAI-compatible.

Not supported:

  • Embeddings: the /v1/embeddings endpoint is unavailable
  • Fine-tuning: training and customization of models
  • Images API: image generation
  • Audio API: speech-to-text and text-to-speech conversion (except for audio messages in chat)
  • Files API: file upload and management
  • Assistants API: creation and management of assistants
  • Web search: searching and retrieving data from the internet

Implementation notes:

  • The model parameter in requests is ignored; the model defined in the agent’s settings is used instead.
  • Some parameters may be ignored depending on the agent’s selected model.
  • The prompt defined in the agent’s settings is applied to requests without explicit prompts.

Usage

Let’s look at how to use the OpenAI-compatible API.

You can find all available API methods in the documentation.

Authentication

Regardless of the API type (private or public), an API token is required for requests.

Include the token in the request header:

Authorization: Bearer $TOKEN

In cURL examples, you can:

  • Provide the token manually by replacing $TOKEN with your actual token in each request, or
  • Use an environment variable to avoid inserting it every time:
export TOKEN=your_access_token

The $TOKEN variable will then be automatically substituted.

In Python and Node.js examples, the token is represented as {{token}}. We recommend storing it in environment variables or configuration files instead of in code to prevent leaks.
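For example, in Python the token can be read from an environment variable (the variable name HOSTMAN_API_TOKEN below is an arbitrary choice, not required by the API):

```python
import os

# Arbitrary variable name; set it first, e.g.:
#   export HOSTMAN_API_TOKEN=your_access_token
token = os.environ.get("HOSTMAN_API_TOKEN", "")

# The token never appears in the source code, only in the environment.
headers = {
    "authorization": f"Bearer {token}",
    "content-type": "application/json",
}
```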

Base URL

The API also requires a base URL, which can be found in the Dashboard tab of the agent’s control panel.


Sending Messages to the Agent

Two methods are supported: Chat Completions and Text Completions. The Text Completions method is deprecated and only maintained for backward compatibility; it is not recommended.

Chat Completions Usage Example

POST /api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions

cURL:

curl --request POST \
  --url https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions \
  --header "authorization: Bearer $TOKEN" \
  --header 'content-type: application/json' \
  --data '{ "model": "gpt-4.1", "messages": [ { "role": "user", "content": "Hello!" } ], "temperature": 1, "max_tokens": 100, "stream": false }'

Python:

import requests

url = "https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions"

payload = {
    "model": "gpt-4.1",
    "messages": [
        {
            "role": "user",
            "content": "Hello!"
        }
    ],
    "temperature": 1,
    "max_tokens": 100,
    "stream": False
}
headers = {
    "authorization": "Bearer {{token}}",
    "content-type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.json())

Node.js:

const url = 'https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions';

const payload = {
  model: 'gpt-4.1',
  messages: [{role: 'user', content: 'Hello!'}],
  temperature: 1,
  max_tokens: 100,
  stream: false
};

// Uses the built-in fetch available in Node.js 18+.
fetch(url, {
  method: 'POST',
  headers: {authorization: 'Bearer {{token}}', 'content-type': 'application/json'},
  body: JSON.stringify(payload)
})
  .then((response) => response.json())
  .then((body) => console.log(body))
  .catch((error) => console.error(error));

Parameters:

  • model: optional, ignored (for compatibility)
  • messages: array of messages:
    • role: sender role (user, assistant, system)
    • content: message text
  • temperature: creativity of the response
  • max_tokens: response length limit
  • stream: stream output (true/false)

For GPT-5 models, max_tokens is replaced with max_completion_tokens, and setting temperature triggers an error.

Additional parameters may vary depending on the model. When constructing a request, refer to the parameters available in the control panel for the selected model: if a parameter is present in the panel, it is supported when accessed through the API.
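When stream is true, OpenAI-compatible endpoints generally deliver the response as server-sent events: one `data: {...}` JSON chunk per text fragment, terminated by `data: [DONE]`. A minimal parsing sketch (the sample lines below are illustrative, not captured from this API):

```python
import json

def parse_sse_chunks(lines):
    """Yield content deltas from OpenAI-style SSE lines (assumes the
    standard 'data: {...}' / 'data: [DONE]' streaming format)."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        delta = chunk["choices"][0].get("delta", {})
        if "content" in delta:
            yield delta["content"]

sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print("".join(parse_sse_chunks(sample)))  # → Hello
```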

Example Response:

{
  "id": "fc8cd652-af12-4a89-8ef7-490a0526e8d3",
  "object": "chat.completion",
  "created": 1757601532,
  "model": "gpt-4.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you? 😊"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  }
}
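The reply text itself is nested inside choices. In Python, it can be pulled out of the parsed response like this:

```python
# The example response above, abbreviated to the fields used here.
response_json = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help you? 😊"},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 10, "completion_tokens": 8, "total_tokens": 18},
}

answer = response_json["choices"][0]["message"]["content"]
print(answer)  # → Hello! How can I help you? 😊
```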

Text Completions Usage Example

POST /api/v1/cloud-ai/agents/{{agent_id}}/v1/completions

cURL:

curl --request POST \
  --url https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/completions \
  --header "authorization: Bearer $TOKEN" \
  --header 'content-type: application/json' \
  --data '{ "prompt": "Hello!", "model": "gpt-4.1", "max_tokens": 100, "temperature": 0.7, "top_p": 0.9, "n": 1, "stream": false, "logprobs": null, "echo": false, "stop": [ "\n" ], "presence_penalty": 0, "frequency_penalty": 0, "best_of": 1, "user": "hostman" }'

Python:

import requests

url = "https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/completions"

payload = {
    "prompt": "Hello!",
    "model": "gpt-4.1",
    "max_tokens": 100,
    "temperature": 0.7,
    "top_p": 0.9,
    "n": 1,
    "stream": False,
    "logprobs": None,
    "echo": False,
    "stop": ["\n"],
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "best_of": 1,
    "user": "hostman"
}
headers = {
    "authorization": "Bearer {{token}}",
    "content-type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.json())

Node.js:

const url = 'https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/completions';

const payload = {
  prompt: 'Hello!',
  model: 'gpt-4.1',
  max_tokens: 100,
  temperature: 0.7,
  top_p: 0.9,
  n: 1,
  stream: false,
  logprobs: null,
  echo: false,
  stop: ['\n'],
  presence_penalty: 0,
  frequency_penalty: 0,
  best_of: 1,
  user: 'hostman'
};

// Uses the built-in fetch available in Node.js 18+.
fetch(url, {
  method: 'POST',
  headers: {authorization: 'Bearer {{token}}', 'content-type': 'application/json'},
  body: JSON.stringify(payload)
})
  .then((response) => response.json())
  .then((body) => console.log(body))
  .catch((error) => console.error(error));

Parameters:

  • prompt: text of the request
  • model: ignored, included for compatibility
  • max_tokens: response length limit
  • temperature: creativity level
  • top_p: probability sampling
  • n: number of responses (ignored)
  • stream: stream output

Other parameters (logprobs, echo, stop, presence_penalty, frequency_penalty, best_of, user) are partially supported, mainly for compatibility.

Example Response:

{
  "id": "7f967459-428c-46ad-87f0-213ff951024d",
  "object": "text_completion",
  "created": 1757601892,
  "model": "gpt-4.1",
  "choices": [
    {
      "text": "Hello! How can I help you? 😊",
      "index": 0,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 8,
    "total_tokens": 18
  },
  "response_id": "ea9aa124-0c51-467c-9ab6-218ab4ec65e7"
}
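Note that Text Completions nests the output differently from Chat Completions: the generated text sits directly in `choices[0].text` rather than under a message object:

```python
# The Text Completions example response above, abbreviated to the fields used here.
response_json = {
    "object": "text_completion",
    "choices": [
        {"text": "Hello! How can I help you? 😊", "index": 0, "finish_reason": "stop"}
    ],
}

text = response_json["choices"][0]["text"]
print(text)  # → Hello! How can I help you? 😊
```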

Roles

In Chat Completions, each message includes a role. There are three roles:

  • user: user request
  • assistant: AI response
  • system: system prompt

user

The user role is used to send regular user requests to the AI.

Example:

{
  "role": "user",
  "content": "What is 2+5?"
}

assistant

This role indicates the AI’s reply to a previous message.

Important: In the OpenAI-compatible API, dialogue history must be included in every request. Include both user requests and assistant responses to maintain context.

Example:

curl --request POST \
  --url https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions \
  --header "authorization: Bearer $TOKEN" \
  --header 'content-type: application/json' \
  --data '{
    "model": "gpt-4",
    "messages": [
      { "role": "user", "content": "What is 2+5? Provide only the answer, no formatting." },
      { "role": "assistant", "content": "7" },
      { "role": "user", "content": "Now multiply the result by 2. Provide only the answer, no formatting." }
    ]
  }'

The request passes the previous question and answer, marking messages as "role": "user" or "role": "assistant". The last user message is left without a response; this is what the model will generate a new answer for.
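A minimal Python sketch of this bookkeeping: keep the whole dialogue in a list and append both sides of each exchange before sending the next request (the answers here are hardcoded for illustration; in practice they come from the API response):

```python
# Running dialogue history; this list becomes the `messages` field of each request.
history = [
    {"role": "user", "content": "What is 2+5? Provide only the answer, no formatting."},
    {"role": "assistant", "content": "7"},
]

def add_turn(history, question, answer):
    """Record one user/assistant exchange so the next request carries full context."""
    history.append({"role": "user", "content": question})
    history.append({"role": "assistant", "content": answer})

add_turn(history, "Now multiply the result by 2. Provide only the answer, no formatting.", "14")
print(len(history))  # → 4
```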

Example Response:

{
  "index": 0,
  "message": {
    "role": "assistant",
    "content": "14",
    "refusal": null,
    "annotations": []
  },
  "finish_reason": "stop"
}

system

The system role defines the agent’s behavior: style, tone, constraints, and goals. It is usually the first message in the messages array. See the article on system prompts for details.

In the assistant role example above, you can see that the same instruction is repeated in each user request:

Provide only the answer, no formatting

This duplication can be avoided by specifying the instruction once in the system prompt.

Example:

curl --request POST \
  --url https://agent.hostman.com/api/v1/cloud-ai/agents/{{agent_id}}/v1/chat/completions \
  --header "authorization: Bearer $TOKEN" \
  --header 'content-type: application/json' \
  --data '{
    "model": "gpt-4",
    "messages": [
      { "role": "system", "content": "When answering questions, provide only the calculation result, without any formatting." },
      { "role": "user", "content": "What is 2+5?" },
      { "role": "assistant", "content": "7" },
      { "role": "user", "content": "Now multiply the result by 2" }
    ]
  }'

Now the model will follow the instruction even if it’s not repeated in every user message.
