Build with Claude
at a predictable cost.

AliCode is a drop-in Claude API gateway: the same wire format as Anthropic, a materially lower cost, and no hard rate limits. Change one base URL and keep building — no code changes required.

Point any tool that speaks the Anthropic Messages API or the OpenAI Chat Completions API at https://api.alicode.store and use a key that starts with sk-ali-. There is nothing new to learn: the wire format is identical to api.anthropic.com across every endpoint we serve, down to the byte level of SSE events.

AliCode exists to make access to the Claude API economically predictable. The official list price is built for enterprise budgets; we consolidate demand across many teams and developers, contract for capacity at volume, and pass the resulting savings back through a transparent, pay-as-you-go model billed strictly on what you consume.

What you get

  • Same wire format as Anthropic. /v1/messages, SSE events, content blocks, tool_use, vision, prompt caching — identical bytes. If your code works against the official API, it works here with one env var change.
  • Both APIs in one endpoint. The OpenAI Chat Completions surface is also served, so Cursor / Continue / OpenWebUI / any OpenAI SDK script just needs a base URL swap. No translation glue on your side.
  • Files, batches, count_tokens. Full feature parity for SDK helpers — your existing client code doesn't need a single conditional. The Anthropic SDK auto-discovers these and uses them as if it were talking to the official endpoint.
  • Pay-as-you-go credits. Top up from 300 ₽. Credits never expire. No subscription, no monthly minimum, no commitment — spend stops the moment you stop calling the API.
  • No per-minute rate limits. Access is not gated by tier. The only ceiling is abuse protection at the IP layer (1000 req/min) — orders of magnitude above what any legitimate agent loop requires.
  • Same security envelope. All traffic over TLS 1.3, scoped API keys with bcrypt-hashed storage, per-key spend caps and model whitelists, instant revocation, no prompt or response body logged.

Who it's for

AliCode is built for developers who use AI as a daily working tool rather than as a feature inside a SaaS product. If you work continuously in Cline, Cursor, Claude Code, Continue or Aider — or run Python workloads that call the API intensively for embeddings, classification, agent loops and batch summarisation — this is for you. Pricing is structured so that heavy use stays affordable: a Cursor user generating 200K Sonnet tokens a day pays roughly $5 a month instead of $60.

AliCode is not a replacement for enterprise Anthropic contracts. If you require a custom DPA, SOC2 documentation, dedicated capacity, or a formal SLA above 99.5%, please contact Anthropic directly. We provide resale of their capacity at consumer scale.

Available models

Model | Context | Max output | Best for
claude-haiku-4-5 | 200K | 64K | Agent loops, fast classification, cheap completions
claude-sonnet-4-5 | 200K | 64K | Default: daily coding, vision, tool use
claude-sonnet-4-6 | 200K | 64K | Latest Sonnet: better instruction following
claude-opus-4-5 | 200K | 64K | Hardest reasoning, extended thinking
claude-opus-4-7 | 200K | 128K | Top tier: agents, deep refactors, multi-step planning

Earlier model IDs (claude-3-7-sonnet-latest, claude-3-5-haiku-latest, claude-opus-4-1) are also accepted as aliases.

Quickstart

Five minutes from zero to first request.

  1. Register an account (email + password).
  2. Open /dashboard/keys → Create new key → copy it. It starts with sk-ali-….
  3. Make your first request:
bash
curl https://api.alicode.store/v1/messages \
  -H "x-api-key: $ALICODE_KEY" \
  -H 'anthropic-version: 2023-06-01' \
  -H 'content-type: application/json' \
  -d '{
    "model": "claude-sonnet-4-5",
    "max_tokens": 256,
    "messages": [{"role":"user","content":"hi"}]
  }'
python
import os, anthropic

client = anthropic.Anthropic(
    api_key=os.environ["ALICODE_KEY"],
    base_url="https://api.alicode.store",
)

msg = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=256,
    messages=[{"role":"user","content":"hi"}],
)
print(msg.content[0].text)
javascript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  apiKey: process.env.ALICODE_KEY,
  baseURL: "https://api.alicode.store",
});

const msg = await client.messages.create({
  model: "claude-sonnet-4-5",
  max_tokens: 256,
  messages: [{ role: "user", content: "hi" }],
});
console.log(msg.content[0].text);

That's it. Your existing Anthropic SDK code works unchanged.

Migrate from Anthropic

Two values change and no code does: replace api.anthropic.com with api.alicode.store, and replace your sk-ant-… key with sk-ali-…

Bash / Docker

bash
# before
export ANTHROPIC_API_KEY=sk-ant-…
export ANTHROPIC_BASE_URL=https://api.anthropic.com

# after
export ANTHROPIC_API_KEY=sk-ali-…
export ANTHROPIC_BASE_URL=https://api.alicode.store

Python SDK

python
# only this line changes
client = anthropic.Anthropic(
    api_key=os.environ['ALICODE_KEY'],
    base_url='https://api.alicode.store',
)

Node SDK

javascript
const client = new Anthropic({
  apiKey: process.env.ALICODE_KEY,
  baseURL: 'https://api.alicode.store',
});
The anthropic-version header and every anthropic-beta flag we support (prompt caching, Files API, Message Batches) pass through unchanged, so you do not need to fork your client code.

Feature parity matrix

Feature | Notes
Messages API | Byte-for-byte compatible response shape
Streaming SSE | Same event names & ordering
Tool use | Including parallel tool calls
Vision | base64 / url / file_id all supported
Prompt caching | cache_control: ephemeral
Extended thinking | thinking field on Opus
Files API | Persistent uploads for vision/docs
Message Batches | JSONL results, 50% pricing
Count tokens | BPE-accurate (±3-5%)
Citations | Coming Q3

Cursor

Cursor uses an OpenAI-compatible base URL. Override it to ours.

  1. File → Preferences → Cursor Settings → Models
  2. Scroll to API Keys → enable OpenAI API Key.
  3. Paste your sk-ali-… key.
  4. Enable Override OpenAI Base URL and paste https://api.alicode.store/v1
  5. Click Verify. Then in Add or search model, add claude-sonnet-4-5 (or claude-haiku-4-5 / claude-opus-4-7).
Cursor on the free plan restricts custom-named models. If you hit "Named models unavailable", try a known name like gpt-4o — our gateway aliases it to the right model too.

Cline

Cline is a free agentic VS Code extension. It speaks Anthropic natively.

  1. VS Code → Extensions → search Cline → install.
  2. Open the Cline panel (left sidebar) → Settings (top-right).
  3. API Provider: Anthropic.
  4. API Key: sk-ali-…
  5. Use custom base URL: ✓ → https://api.alicode.store
  6. Model ID: claude-sonnet-4-5

Save and start a new task. Cline will use all its tools (Read, Edit, Bash, etc.) over our gateway, including streamed tool_use and vision blocks.

Recommended model picks

  • Default coding: claude-sonnet-4-5 — fastest acceptable quality, what 90% of users settle on.
  • Cheap exploratory work: claude-haiku-4-5 — great for "explain this file" / "find the bug" loops; about 4× cheaper than Sonnet.
  • Hardest refactors: claude-opus-4-7 — when Sonnet keeps making the same mistake, switch to Opus for one turn and switch back.

Tips

  • Vision: drag-and-drop images into Cline's input — they'll be sent as image content blocks. Our gateway routes them to a captioning provider behind the scenes, then hands the caption to Claude for reasoning. You don't see any of that — just the answer.
  • Web search: declare nothing extra — we auto-inject the web_search tool when Cline asks for it via execute_command + curl. Saves your local sandbox from being torn down by failed network calls.
  • Long sessions: Cline's auto-compaction works perfectly — we honour the context_management.compact_* capabilities on every model.

Claude Code (Anthropic CLI)

Two environment variables, one command.

bash
export ANTHROPIC_BASE_URL=https://api.alicode.store
export ANTHROPIC_API_KEY=sk-ali-…
claude

Add the two exports to your shell rc-file (~/.zshrc / ~/.bashrc) to make them permanent. On Windows: setx ANTHROPIC_BASE_URL ...

Claude Code automatically uses prompt caching and extended thinking via anthropic-beta headers. Both are honoured by our gateway, so your cache hits and thinking blocks behave identically.

Continue.dev

Edit your Continue config (gear icon → Open config.yaml):

yaml
name: AliCode
models:
  - name: AliCode Sonnet
    provider: openai
    model: claude-sonnet-4-5
    apiKey: sk-ali-…
    apiBase: https://api.alicode.store/v1
    roles: [chat, edit, apply]

Roo Code

Roo is a Cline fork with multi-mode (Code / Architect / Ask). Same setup as Cline:

  • Provider: Anthropic
  • Base URL: https://api.alicode.store
  • API Key: sk-ali-…
  • Model ID: claude-sonnet-4-5

Python & Node SDKs

Python (anthropic)

bash
pip install anthropic
python
from anthropic import Anthropic
client = Anthropic(api_key='sk-ali-…', base_url='https://api.alicode.store')

# Non-streaming
msg = client.messages.create(
    model='claude-sonnet-4-5',
    max_tokens=512,
    messages=[{'role': 'user', 'content': 'Hello'}],
)
print(msg.content[0].text)

# Streaming
with client.messages.stream(
    model='claude-sonnet-4-5',
    max_tokens=512,
    messages=[{'role': 'user', 'content': 'Tell me a joke'}],
) as stream:
    for text in stream.text_stream:
        print(text, end='', flush=True)

Node (@anthropic-ai/sdk)

bash
npm install @anthropic-ai/sdk
javascript
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic({ apiKey: 'sk-ali-…', baseURL: 'https://api.alicode.store' });

const stream = await client.messages.stream({
  model: 'claude-sonnet-4-5',
  max_tokens: 512,
  messages: [{ role: 'user', content: 'Hi' }],
});
for await (const ev of stream) {
  if (ev.type === 'content_block_delta' && ev.delta.type === 'text_delta') {
    process.stdout.write(ev.delta.text);
  }
}

OpenAI SDK (for /v1/chat/completions)

python
from openai import OpenAI
client = OpenAI(api_key='sk-ali-…', base_url='https://api.alicode.store/v1')

resp = client.chat.completions.create(
    model='claude-sonnet-4-5',
    messages=[{'role': 'user', 'content': 'Hi'}],
)
print(resp.choices[0].message.content)

Authentication

All /v1/* endpoints require an API key. We accept it on either header — pick whichever your SDK already uses:

Header | Format | Used by
x-api-key | sk-ali-… | Anthropic SDK, Cline, Claude Code
Authorization | Bearer sk-ali-… | OpenAI SDK, Cursor, Continue
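
As a rough illustration of the two header styles in the table, the helper below builds the header set for either one. The function name auth_headers and its style argument are invented for this sketch; only the two header formats come from the table.

```python
def auth_headers(api_key: str, style: str = "anthropic") -> dict[str, str]:
    """Return request headers for the chosen authentication style."""
    if style == "anthropic":
        # Header used by the Anthropic SDK, Cline and Claude Code.
        return {"x-api-key": api_key, "content-type": "application/json"}
    if style == "openai":
        # Header used by the OpenAI SDK, Cursor and Continue.
        return {"Authorization": f"Bearer {api_key}", "content-type": "application/json"}
    raise ValueError(f"unknown auth style: {style!r}")
```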

Creating keys

  1. Go to /dashboard/keys.
  2. Click Create new key, give it a name (e.g. "production").
  3. Copy the secret — it's shown once. We store only a hash.

Per-key restrictions

Each key supports optional per-key limits:

  • Rate limit (RPM) — requests per minute, defaults to your account's tier.
  • Spend cap (cents) — auto-revoke after spending N cents on this key.
  • Allowed models — whitelist; requests using other models get 403.

Revoke keys instantly from the dashboard. Revoked keys 401 within seconds across all servers.

Key rotation

Issue a new key, deploy it, then revoke the old one. No downtime — both work in parallel until you revoke. We recommend rotating production keys every 90 days as a baseline.

Security model

We store only a bcrypt hash of your secret — the plaintext is shown once at creation and never again. If you lose it, issue a new key and revoke the old one; we cannot recover the lost secret. Every request is also tied to the IP address that made it (visible in your dashboard), so you can quickly audit unexpected usage.

Where to put keys

  • Local development: environment variable in your shell rc-file (~/.zshrc / ~/.bashrc), never committed to git.
  • CI/CD: encrypted secret in GitHub Actions / GitLab / Vercel / your platform's secret store.
  • Servers: /etc/environment or systemd EnvironmentFile=, root-readable only.
  • Never embed in client-side JS, mobile apps, or any place a user can extract bytes from disk.

If a key leaks, revoke it from the dashboard. Within seconds it returns 401 across every server we run.

POST /v1/messages

Anthropic-compatible Messages endpoint. Identical to api.anthropic.com/v1/messages.

Request body

Field | Type | Req | Notes
model | string | yes | claude-sonnet-4-5, -haiku-4-5, -opus-4-8
messages | array | yes | Anthropic-shape: role + content blocks
max_tokens | int | yes | Output cap. Auto-boosted to the model ceiling.
system | string or array | no | System prompt; pass an array of blocks for cache_control breakpoints
stream | bool | no | Enable SSE (see Streaming)
tools | array | no | Tool schemas, incl. server-side web_search_20250305
tool_choice | object | no | auto / any / tool / none
temperature | float | no | Range 0–1. A sensible default is applied when omitted (lower for tool/agent calls).
top_p | float | no | Nucleus sampling, prefer temperature
top_k | int | no | Sample from top K candidates
stop_sequences | array | no | Up to 4 strings that stop generation early
thinking | object | no | {"type":"enabled","budget_tokens":4096} (Opus & Sonnet 4.5+)
metadata | object | no | {"user_id":"…"} for your own analytics

Response shape

json
{
  "id": "msg_011…",
  "type": "message",
  "role": "assistant",
  "model": "claude-sonnet-4-5",
  "content": [
    { "type": "text", "text": "Closures in PHP capture variables…" }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 42,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
    "output_tokens": 128
  }
}

stop_reason values

Value | Meaning
end_turn | Model finished naturally
max_tokens | Hit your max_tokens cap mid-generation
stop_sequence | Output matched one of your stop_sequences
tool_use | Model wants to call a tool; execute it and continue the conversation
pause_turn | Long-running tool loop paused; resume by sending the same messages back
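
A client loop usually branches on stop_reason. The sketch below is one possible way to turn the values listed above into a next action; the function name next_step and the returned action labels are hypothetical and not part of the API.

```python
def next_step(response: dict) -> str:
    """Map a Messages API response to the action a client loop should take."""
    reason = response.get("stop_reason")
    if reason == "tool_use":
        return "run_tools"        # execute the tool_use blocks, send tool_result back
    if reason == "pause_turn":
        return "resend_messages"  # resume by sending the same messages back
    if reason == "max_tokens":
        return "truncated"        # output hit the max_tokens cap mid-generation
    if reason in ("end_turn", "stop_sequence"):
        return "done"
    raise ValueError(f"unexpected stop_reason: {reason!r}")
```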

POST /v1/messages/count_tokens

Counts tokens for a planned request without invoking the model. Used by the SDK for cost estimation, context-window planning, and rate-limit pre-checks.

Same request body as /v1/messages. Estimation uses cl100k_base BPE (closest open approximation to Claude's tokenizer) — typical accuracy ±3-5% on prose, ±8-10% on dense code/JSON.

bash
curl https://api.alicode.store/v1/messages/count_tokens \
  -H 'x-api-key: sk-ali-…' \
  -H 'content-type: application/json' \
  -d '{
    "model": "claude-sonnet-4-5",
    "system": "You are Claude.",
    "messages": [{"role":"user","content":"Hello world"}]
  }'
json
{ "input_tokens": 14 }
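
Because the count is an estimate, context-window planning benefits from a safety margin. The following is a minimal sketch, assuming the 200K context window from the model table covers input plus output and padding the estimate by 10% (the upper end of the accuracy range quoted above). The function name fits_context and its parameters are illustrative.

```python
import math


def fits_context(estimated_input: int, max_tokens: int,
                 context_window: int = 200_000,
                 error_margin: float = 0.10) -> bool:
    """Check whether input plus requested output stays inside the context window.

    The estimate is padded by error_margin because count_tokens is approximate.
    """
    padded_input = math.ceil(estimated_input * (1 + error_margin))
    return padded_input + max_tokens <= context_window
```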

POST /v1/chat/completions

OpenAI-compatible. Used by Cursor (via base URL override), Continue, OpenWebUI, and custom scripts written against the OpenAI SDK.

We translate OpenAI-shape requests to Anthropic internally and the response back to OpenAI shape, including streaming chunks, tool_calls, vision image_url blocks, and finish_reason.

bash
curl https://api.alicode.store/v1/chat/completions \
  -H 'Authorization: Bearer sk-ali-…' \
  -H 'content-type: application/json' \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [{"role":"user","content":"ping"}]
  }'

finish_reason mapping

Anthropic stop_reason | OpenAI finish_reason
end_turn | stop
max_tokens | length
stop_sequence | stop
tool_use | tool_calls
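
Expressed as code, the mapping is a plain lookup. This sketch mirrors the table only; how the gateway treats values outside it (for example pause_turn) is not documented here, so the helper raises instead of guessing. The names STOP_TO_FINISH and to_finish_reason are illustrative.

```python
STOP_TO_FINISH = {
    "end_turn": "stop",
    "max_tokens": "length",
    "stop_sequence": "stop",
    "tool_use": "tool_calls",
}


def to_finish_reason(stop_reason: str) -> str:
    """Translate an Anthropic stop_reason into an OpenAI finish_reason."""
    try:
        return STOP_TO_FINISH[stop_reason]
    except KeyError:
        # Values outside the documented table are deliberately not mapped.
        raise ValueError(f"no documented mapping for {stop_reason!r}") from None
```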

GET /v1/models

Returns the list of public model IDs. Two shapes are served from the same path, depending on the Accept header:

  • OpenAI shape: { object: 'list', data: [{ id, object, owned_by, … }] }
  • Anthropic shape (GET /v1/models/{id} too) — full capabilities tree with image_input, thinking.adaptive, effort.max, prompt_caching, context_management.
bash
curl https://api.alicode.store/v1/models \
  -H 'Authorization: Bearer sk-ali-…'

# specific model
curl https://api.alicode.store/v1/models/claude-sonnet-4-5 \
  -H 'Authorization: Bearer sk-ali-…'

/v1/files

Persistent upload for images, PDFs, and text dumps. Once uploaded, you reference the file by id in messages content instead of sending base64 every turn — saves token cost and bandwidth across long conversations and batches.

Beta header: anthropic-beta: files-api-2025-04-14 (accepted but not required).

POST /v1/files — upload

bash
curl https://api.alicode.store/v1/files \
  -H 'x-api-key: sk-ali-…' \
  -F 'file=@/path/to/image.png'
json
{
  "id": "file_011…",
  "type": "file",
  "filename": "image.png",
  "mime_type": "image/png",
  "size_bytes": 184320,
  "created_at": "2026-05-17T15:23:39Z",
  "downloadable": true
}

Use the file in a message

json
{
  "model": "claude-sonnet-4-5",
  "max_tokens": 512,
  "messages": [{
    "role": "user",
    "content": [
      { "type": "image", "source": { "type": "file", "file_id": "file_011…" } },
      { "type": "text",  "text": "What is in this image?" }
    ]
  }]
}

Other endpoints

Method & path | Purpose
GET /v1/files | List, paginated via before_id / after_id / limit
GET /v1/files/{id} | Metadata
GET /v1/files/{id}/content | Raw bytes download (streamed)
DELETE /v1/files/{id} | Soft-delete; 24h grace period before purge

Limits

  • Max upload: 25 MB per file.
  • Accepted mime types: PNG, JPEG, WEBP, GIF, PDF, plain text, markdown, CSV.
  • SHA-256 dedup: re-uploading identical bytes returns the existing file id, no new storage.
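
These limits can be checked locally before uploading. The sketch below assumes that "25 MB" means 25 MiB and that the accepted formats correspond to the usual MIME strings; both are assumptions, as are the names precheck_upload, MAX_UPLOAD_BYTES and ACCEPTED_MIME_TYPES. It returns a SHA-256 digest of the bytes, which a client could use as a local key for remembering file ids.

```python
import hashlib

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" means 25 MiB
ACCEPTED_MIME_TYPES = {  # assumption: usual MIME strings for the listed formats
    "image/png", "image/jpeg", "image/webp", "image/gif",
    "application/pdf", "text/plain", "text/markdown", "text/csv",
}


def precheck_upload(data: bytes, mime_type: str) -> str:
    """Validate an upload locally and return its SHA-256 hex digest."""
    if len(data) > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 25 MB upload limit")
    if mime_type not in ACCEPTED_MIME_TYPES:
        raise ValueError(f"unsupported mime type: {mime_type}")
    return hashlib.sha256(data).hexdigest()
```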

/v1/messages/batches

Process up to 10,000 messages in one async job at 50% of normal pricing. Submit, poll, fetch results — all asynchronous.

Beta header: anthropic-beta: message-batches-2024-09-24 (accepted but not required).

POST /v1/messages/batches — submit

bash
curl https://api.alicode.store/v1/messages/batches \
  -H 'x-api-key: sk-ali-…' \
  -H 'content-type: application/json' \
  -d '{
    "requests": [
      { "custom_id": "q-1", "params": { "model":"claude-sonnet-4-5", "max_tokens":100, "messages":[{"role":"user","content":"2+2"}] }},
      { "custom_id": "q-2", "params": { "model":"claude-sonnet-4-5", "max_tokens":100, "messages":[{"role":"user","content":"capital of france"}] }}
    ]
  }'
json
{
  "id": "msgbatch_011…",
  "type": "message_batch",
  "processing_status": "in_progress",
  "request_counts": {"processing":2,"succeeded":0,"errored":0,"canceled":0,"expired":0},
  "created_at": "2026-05-18T15:25:50Z",
  "expires_at": "2026-05-19T15:25:50Z",
  "results_url": null
}

Poll until ended

bash
# Poll every few seconds
curl https://api.alicode.store/v1/messages/batches/msgbatch_011… \
  -H 'x-api-key: sk-ali-…'

# When processing_status == 'ended', fetch JSONL results
curl https://api.alicode.store/v1/messages/batches/msgbatch_011…/results \
  -H 'x-api-key: sk-ali-…'
json
{"custom_id":"q-1","result":{"type":"succeeded","message":{ … }}}
{"custom_id":"q-2","result":{"type":"succeeded","message":{ … }}}
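
Since the results come back as JSONL (one JSON object per line), a small parser is enough to index them by custom_id. This is a sketch under the assumption that every line carries the custom_id and result fields shown above; the function name index_results is illustrative.

```python
import json


def index_results(jsonl_text: str) -> dict[str, dict]:
    """Parse batch results (one JSON object per line) into {custom_id: result}."""
    results = {}
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        results[record["custom_id"]] = record["result"]
    return results
```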

All endpoints

Method & path | Purpose
POST /v1/messages/batches | Create batch (up to 10K requests)
GET /v1/messages/batches | List, cursor-paginated
GET /v1/messages/batches/{id} | Get status & counters
POST /v1/messages/batches/{id}/cancel | Mark canceling; remaining items skipped
GET /v1/messages/batches/{id}/results | Streamed JSONL output
DELETE /v1/messages/batches/{id} | Delete batch & purge results (must be ended)

Validation

  • custom_id is required and must be unique within the batch.
  • Each params object is a full /v1/messages body (no streaming — batches always return finished messages).
  • Batches expire 24h after creation; remaining items become expired.
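
The validation rules above can be applied client-side before submitting. In this sketch the caller passes a dict keyed by custom_id, which makes duplicates impossible by construction; the function name build_batch and the constant MAX_BATCH_REQUESTS are illustrative, and the server remains the authority on what it accepts.

```python
MAX_BATCH_REQUESTS = 10_000


def build_batch(items: dict[str, dict]) -> dict:
    """Build a batch request body from {custom_id: params}."""
    if len(items) > MAX_BATCH_REQUESTS:
        raise ValueError("batch exceeds 10,000 requests")
    requests = []
    for custom_id, params in items.items():
        if not custom_id:
            raise ValueError("custom_id is required")
        if params.get("stream"):
            # Batches always return finished messages, so streaming is rejected.
            raise ValueError(f"{custom_id}: streaming is not available in batches")
        requests.append({"custom_id": custom_id, "params": params})
    return {"requests": requests}
```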

Streaming SSE

Set stream: true. The server returns Anthropic-format Server-Sent Events. Events arrive in this order:

  1. message_start — initial message metadata, empty content.
  2. content_block_start — one per block (text / tool_use / thinking).
  3. content_block_delta — N deltas with text_delta, input_json_delta, or thinking_delta.
  4. content_block_stop — close current block.
  5. message_delta — final stop_reason and usage counters.
  6. message_stop — end of stream.
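
ping events are not part of this sequence: one is sent every ~15s to keep the connection alive and can arrive between any two of the events above. Steps 2 to 4 repeat once per content block.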

Raw bytes (abbreviated)

text
event: message_start
data: {"type":"message_start","message":{"id":"msg_…","model":"claude-sonnet-4-5","content":[]}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":", world"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":9}}

event: message_stop
data: {"type":"message_stop"}
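
For clients that read the stream without an SDK, the raw format above can be parsed in a few lines. This sketch assumes each data: payload is a single line of JSON, as in the sample, and only collects text_delta content; the function name collect_text is illustrative.

```python
import json


def collect_text(sse_body: str) -> str:
    """Concatenate every text_delta found in a raw SSE body."""
    parts = []
    for line in sse_body.splitlines():
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        payload = json.loads(line[len("data:"):].strip())
        if payload.get("type") != "content_block_delta":
            continue  # message_start, ping, message_stop and so on
        delta = payload.get("delta", {})
        if delta.get("type") == "text_delta":
            parts.append(delta["text"])
    return "".join(parts)
```

A production client would read the response incrementally rather than buffering the whole body, but the per-line handling stays the same.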

Consuming with the Python SDK

python
with client.messages.stream(
    model='claude-sonnet-4-5',
    max_tokens=512,
    messages=[{'role':'user','content':'Tell me a joke'}],
) as stream:
    for text in stream.text_stream:
        print(text, end='', flush=True)
Streaming is auto-disabled when you declare web_search_20250305 — the agentic loop needs the full response to chain tool calls. Your SDK falls back to a blocking response without raising.

Tool use

Pass tools[] with input_schema (JSON Schema). The assistant returns content blocks of type tool_use; your client executes the tool and sends back tool_result in the next user turn.

Round 1 — declare tool, get tool_use

json
{
  "model": "claude-sonnet-4-5",
  "max_tokens": 1024,
  "tools": [{
    "name": "get_weather",
    "description": "Get current weather for a city",
    "input_schema": {
      "type": "object",
      "properties": { "city": { "type": "string" } },
      "required": ["city"]
    }
  }],
  "messages": [{ "role": "user", "content": "What is the weather in Berlin?" }]
}

Response includes a tool_use block:

json
{
  "stop_reason": "tool_use",
  "content": [
    { "type": "text",     "text": "Let me check that." },
    { "type": "tool_use", "id": "toolu_01abc", "name": "get_weather", "input": { "city": "Berlin" } }
  ]
}

Round 2 — send tool_result back

json
{
  "model": "claude-sonnet-4-5",
  "max_tokens": 1024,
  "tools": [/* same tools[] */],
  "messages": [
    { "role": "user",      "content": "What is the weather in Berlin?" },
    { "role": "assistant", "content": [
        { "type": "text",     "text": "Let me check that." },
        { "type": "tool_use", "id": "toolu_01abc", "name": "get_weather", "input": { "city": "Berlin" } }
    ]},
    { "role": "user", "content": [
        { "type": "tool_result", "tool_use_id": "toolu_01abc", "content": "18°C, sunny" }
    ]}
  ]
}

tool_choice

Value | Behaviour
{"type":"auto"} | Default: model decides whether to call a tool
{"type":"any"} | Forces the model to call ANY of the declared tools
{"type":"tool","name":"get_weather"} | Forces this specific tool
{"type":"none"} | Disables tool calls for this turn

Parallel tool calls: the model may return multiple tool_use blocks in one turn. Run them in parallel and return all tool_result blocks in the next user message.
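
One way to handle parallel tool calls is sketched below: every tool_use block runs on a thread pool and the outputs are gathered into a single user message. The handlers mapping (tool name to a Python callable) and the function name run_tools are assumptions made for this example, not part of the API.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tools(content: list[dict], handlers: dict) -> dict:
    """Execute every tool_use block and return the follow-up user message."""
    calls = [block for block in content if block.get("type") == "tool_use"]
    if not calls:
        raise ValueError("no tool_use blocks in content")

    def run_one(call: dict) -> dict:
        output = handlers[call["name"]](**call["input"])
        return {"type": "tool_result", "tool_use_id": call["id"], "content": str(output)}

    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        results = list(pool.map(run_one, calls))  # map preserves call order
    return {"role": "user", "content": results}
```

The returned dict is appended to messages after the assistant turn that contained the tool_use blocks.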

Vision

Send images as Anthropic content blocks of type image. Three source modes:

source.type | Shape | When to use
base64 | { media_type, data } | One-shot, image embedded inline
url | { url } | Public images on the web
file | { file_id } | Reusable across conversations / batches; see Files API
json
{
  "model": "claude-sonnet-4-5",
  "max_tokens": 512,
  "messages": [{
    "role": "user",
    "content": [
      { "type": "text", "text": "What is in this image?" },
      { "type": "image", "source": {
          "type": "base64",
          "media_type": "image/jpeg",
          "data": "<base64-bytes>"
      }}
    ]
  }]
}

Persistent vision via files

bash
# upload once
FILE_ID=$(curl -s https://api.alicode.store/v1/files \
  -H 'x-api-key: sk-ali-…' \
  -F 'file=@chart.png' | jq -r .id)

# reference in many turns / batches
curl https://api.alicode.store/v1/messages \
  -H 'x-api-key: sk-ali-…' \
  -H 'content-type: application/json' \
  -d "{
    \"model\":\"claude-sonnet-4-5\",
    \"max_tokens\":512,
    \"messages\":[{ \"role\":\"user\", \"content\":[
      { \"type\":\"image\", \"source\":{ \"type\":\"file\", \"file_id\":\"$FILE_ID\" } },
      { \"type\":\"text\",  \"text\":\"summarise this chart\" }
    ]}]
  }"

Prompt caching

Mark stable prefixes with cache_control: { type: "ephemeral" }. Subsequent requests reusing that prefix pay 10% of input price for the cached portion. Cache TTL: 5 minutes from last hit.

Up to 4 breakpoints per request. When a request has none, we auto-inject one breakpoint at the end of the system block, so requests that never set cache_control still benefit from caching.

Where to put breakpoints

  • End of system prompt — most common, biggest win.
  • After a long, static document in the first user turn.
  • After your tools[] declaration if you have many.
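
The most common placement, the end of the system prompt, can be applied programmatically. The sketch below is a client-side helper, not the gateway's own logic: it adds an ephemeral breakpoint to the last system block when none is set, wrapping a plain string prompt in a text block first. The function name with_system_breakpoint is illustrative.

```python
import copy


def with_system_breakpoint(body: dict) -> dict:
    """Return a copy of body with cache_control on the last system block."""
    out = copy.deepcopy(body)
    system = out.get("system")
    if isinstance(system, str) and system:
        # A plain string cannot carry cache_control, so wrap it in a text block.
        system = [{"type": "text", "text": system}]
    if not system:
        return out  # no system prompt, nothing to cache
    if not any("cache_control" in block for block in system):
        system[-1]["cache_control"] = {"type": "ephemeral"}
    out["system"] = system
    return out
```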

Example

json
{
  "model": "claude-sonnet-4-5",
  "max_tokens": 256,
  "system": [
    { "type": "text", "text": "You are a senior support engineer. Follow company tone.\n\n[…3000 token style guide…]",
      "cache_control": { "type": "ephemeral" } }
  ],
  "messages": [{ "role": "user", "content": "Why is my server slow?" }]
}

Usage counters

json
"usage": {
  "input_tokens": 12,
  "cache_creation_input_tokens": 3045,
  "cache_read_input_tokens": 0,
  "output_tokens": 80
}

On the next request reusing the same prefix:

json
"usage": {
  "input_tokens": 14,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 3045,
  "output_tokens": 80
}

You pay full price only for the input_tokens (new content) and 10% for cache_read_input_tokens.

Extended thinking

Available on Opus and recent Sonnet models. The assistant emits thinking content blocks before the final answer — visible reasoning, like Claude Code's internal monologue.

Enable with the thinking field on the request:

json
{
  "model": "claude-opus-4-7",
  "max_tokens": 8192,
  "thinking": { "type": "enabled", "budget_tokens": 4096 },
  "messages": [{ "role": "user", "content": "Plan a refactor of this codebase…" }]
}

Response contains both thinking and text blocks:

json
"content": [
  { "type": "thinking", "thinking": "The user wants a refactor plan. Let me…", "signature": "msg_…" },
  { "type": "text",     "text": "Here is a 5-step plan: …" }
]
Thinking tokens are billed as output tokens. Set budget_tokens conservatively (1K-8K is typical) — Opus can run away to 30K+ if you let it.

Billing & limits

Pricing per 1M tokens (USD)

Model | Input | Output | Cache write | Cache read
claude-haiku-4-5 | $0.80 | $4.00 | $1.00 | $0.08
claude-sonnet-4-5 | $2.50 | $12.00 | $3.13 | $0.25
claude-opus-4-8 | $3.00 | $14.00 | $3.75 | $0.30

Discounts

  • Batch API — 50% off all input & output tokens.
  • Prompt caching — 90% off the cached prefix on read.

Topping up

Top up from $5. Credits never expire. There's nothing to cancel.

Rate limits

No per-minute caps. No daily token quotas. We apply only abuse protection on the IP layer (1000 req/min, 50 concurrent connections) — way above what any legitimate agent needs. Per-key RPM and spend caps can be set in the dashboard.

Error codes

Errors follow Anthropic's shape:

json
{ "type": "error", "error": { "type": "<error_type>", "message": "<human-readable>" }, "request_id": "req_011…" }

HTTP | error.type | Meaning
400 | invalid_request_error | Bad JSON, missing field, invalid model, malformed tools[]
401 | authentication_error | Missing or wrong x-api-key
402 | insufficient_credits | Balance below estimated cost
403 | permission_error | Key not allowed for this model
404 | not_found_error | Unknown file/batch/model id
413 | request_too_large | Body bigger than 50 MB, or file upload > 25 MB
415 | invalid_request_error | Unsupported mime_type on file upload
429 | rate_limit_error | Per-key or per-IP rate limit exceeded
500 | api_error | Server bug; open a ticket with the request_id
502 | api_error | Upstream failed (retryable)
503 | overloaded_error | Shield mode; try again shortly

Retry strategy

Treat 429, 502, 503, and idempotent timeouts as retryable. Use exponential backoff starting at 1s, doubling on each attempt, with jitter; cap at 3 retries. The official SDKs do this automatically — if you're using anthropic or @anthropic-ai/sdk you get retry-with-backoff out of the box.
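
For clients that do not use the official SDKs, the strategy above might look like the following sketch. The send callable (returning a status code and body), the sleep callable, and the 50% jitter factor are assumptions for illustration; timeouts would need their own exception handling, which is omitted here.

```python
import random

RETRYABLE_STATUS = {429, 502, 503}
MAX_RETRIES = 3


def backoff_delays(base: float = 1.0, retries: int = MAX_RETRIES, rng=random.random):
    """Yield one delay per retry: base * 2**attempt, plus up to 50% jitter."""
    for attempt in range(retries):
        delay = base * (2 ** attempt)
        yield delay + delay * 0.5 * rng()


def call_with_retries(send, sleep):
    """Call send() until it returns a non-retryable status or retries run out."""
    status, body = send()
    for delay in backoff_delays():
        if status not in RETRYABLE_STATUS:
            break
        sleep(delay)
        status, body = send()
    return status, body
```

In real use, sleep would be time.sleep and send would wrap the HTTP call; passing them in keeps the retry logic easy to test.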

Do not retry 400 / 401 / 402 / 403 / 404 / 413 / 415. They indicate a problem with the request itself, so retrying will just produce the same error. Fix the underlying cause first.

500 is rare and indicates a bug on our side; we surface those in our own monitoring and usually deploy a fix within hours. If you're hitting one persistently, open a ticket with the request-id — that's the fastest way for us to trace it.

Debugging a failing request

  1. Check the HTTP status code first — it tells you whose problem it is (4xx = client, 5xx = server).
  2. Read the error.message field in the JSON body — it's intentionally human-readable.
  3. Grab the request-id header from the response. Even on errors we return it.
  4. Reproduce with curl -v from a clean shell to confirm it's not your SDK adding weird headers.
  5. If you still can't figure it out, send hello@alicode.store the request id, the time, and the curl command. We'll find your request in our logs.
Every response carries a request-id header (also returned in the JSON on errors). Include it in support emails so we can pinpoint your call in our logs — it's the only piece of information that lets us correlate without storing your prompt body.

FAQ & Legal

Most things people ask before signing up.

How can you be 85% cheaper than Anthropic?

Two reasons. First, we operate on consumer-grade infrastructure with no enterprise overhead — no dedicated success engineers, no on-prem deployments, no custom DPAs per customer. Second, we aggregate thousands of small developers into one large customer profile and negotiate volume rates on that combined demand. The official Anthropic list price is built for enterprise procurement; ours is built for one engineer with a credit card. We take a small margin in the middle, you keep the difference.

The trade-off is honest: you get the same model quality and wire format, but without enterprise SLAs, custom legal terms, or priority capacity during outages. For 99% of indie / agency / hobbyist use cases, that's a great deal.

Is this actually Claude, or some other model behind the scenes?

It is real Claude. Same Anthropic models, same training, same intelligence. We don't fine-tune, we don't proxy through some cheaper open-source model. We route your request to a Claude inference endpoint and return the raw bytes back. The only difference from calling Anthropic directly is the URL and your API key.

Will this break the moment Anthropic ships a new feature?

Anything that flows through wire-format extensions — new anthropic-beta flags, new content block types, new tool types — passes through transparently. We ship updates within hours of public Anthropic releases. Features that require server-side state (Files API, Message Batches, count_tokens) get implemented behind our own infra so SDK helpers keep working. Citations and code execution are the next two on our roadmap.

Is this legal? Are you allowed to resell?

Yes. We purchase API capacity at standard rates from authorised providers and resell access to that capacity. This is a normal commercial relationship — exactly how cloud resellers, MSPs, and aggregators have worked since the 1990s. You agree to use the service per the Acceptable Use Policy below; we agree to provide the API capacity we've sold you.

Note: you need to follow Anthropic's published Acceptable Use even though you're calling us — the same content rules apply (no abuse, no illegal content, no scraping protected systems). We will pass through any content moderation signals from upstream.

Do you log my prompts? My responses?

No. We log only request metadata for billing — timestamp, IP, user ID, model, input/output token counts, HTTP status. The prompt body and response body never touch our database. They flow through memory only for the duration of the request and are released the moment the response is sent.

Conversation memory features (SessionMemory, Memdir) are opt-in: you have to explicitly turn them on per request and they store only what you ask them to, scoped to your account.

Where is my data processed? Is this GDPR-compliant?

Our control plane (account, billing, API key hashes) runs in the EU. Model inference is performed by Anthropic's regional endpoints, typically US-East for English-speaking customers, with EU options on request. Because we don't persist prompt or response content, GDPR data-subject rights (access, deletion, portability) apply only to your account metadata, which you can delete yourself from the dashboard at any time.

What if a request errors out mid-stream?

If the upstream returns 5xx before any tokens were generated, you receive a clean 502 and the request is never billed — no credits are deducted. We rotate upstream keys and automatically cool down failing ones, so a retry lands on a healthy key; the official SDKs retry 502/503 with backoff out of the box.

If the stream drops mid-response (rare but possible on poor networks), you'll receive partial content and the usage counters reflect only what was actually generated. The SDK should treat this like a regular short response.
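The retry behaviour described above is already built into the official SDKs. For a hand-rolled client, a minimal sketch of the same pattern might look like the following. The `send` callable, the attempt count, and the delays are illustrative assumptions, not documented AliCode values.

```python
import random
import time

# Statuses named in the answer above as safe to retry.
RETRYABLE_STATUSES = {502, 503}


def call_with_retry(send, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status.

    `send` is a hypothetical zero-argument callable that performs one
    HTTP request and returns a (status_code, body) tuple.
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status not in RETRYABLE_STATUSES:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff with a little jitter between attempts.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    # Out of attempts: hand the last failure back to the caller.
    return status, body
```

Because a 5xx returned before any tokens were generated is not billed, retrying in this way should not cost extra credits.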

Can I cancel? Get a refund?

There's no subscription, so there's nothing to cancel. Unused credits never expire: your balance stays in the account indefinitely, even if you stop using the service for years. Refunds: if you accidentally top up the wrong account, contact support within 24 hours and we'll move the balance. We don't refund credits that have already been spent, because the inference has already been billed upstream.

How does billing work exactly?

Each request is billed when the response completes. We deduct input_tokens × input_rate + output_tokens × output_rate at the rates published in the Billing section. Cached reads are billed at 10% of the input rate; batch requests are billed at 50% of normal. The deduction is atomic and visible in your dashboard within seconds. You can set per-key spend caps to hard-stop runaway scripts.
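The formula above can be sketched as a small calculator. The rates below are placeholders, not our published prices. Two further assumptions: `input_tokens` excludes cached reads, and the batch discount applies to the whole request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-token rates; real values are in the Billing section."""
    input_rate: float
    output_rate: float


def request_cost(input_tokens: int, output_tokens: int, rates: Rates,
                 cached_read_tokens: int = 0, batch: bool = False) -> float:
    """Estimate the deduction for one request.

    Assumption: input_tokens does NOT include cached_read_tokens.
    """
    uncached = input_tokens * rates.input_rate
    # Cached reads are billed at 10% of the input rate.
    cached = cached_read_tokens * rates.input_rate * 0.10
    output = output_tokens * rates.output_rate
    cost = uncached + cached + output
    if batch:
        # Batch requests are billed at 50% of normal (assumed to cover the whole request).
        cost *= 0.50
    return cost


# Placeholder rates for illustration only.
example_rates = Rates(input_rate=3e-6, output_rate=15e-6)
plain = request_cost(2_000, 500, example_rates)
with_cache = request_cost(200, 500, example_rates, cached_read_tokens=1_800)
```

With these placeholder rates, moving 1,800 of the 2,000 input tokens to cached reads lowers the input share of the cost while the output share stays the same.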

How does this compare to OpenRouter / Helicone / Together / Groq?

OpenRouter is a routing market: they offer many models from many providers at a small markup over each provider's list price. That gives you breadth; we give you depth on the Anthropic API shape specifically: full Files / Batches / count_tokens support, cheaper Claude, real Anthropic SSE bytes, and prompt caching with cache_control passed through. If you only use Claude, you'll save more with us.

Helicone is observability + caching in front of upstream APIs — it doesn't change pricing, it adds analytics. You can put Helicone in front of AliCode if you want both.

Together / Groq / Fireworks serve open-source models (Llama, Mixtral, Qwen). Different category. They're cheaper still but they're not Claude.

Latency — am I going to lose seconds?

Our overhead is small: routing through the gateway adds roughly 5-15 ms to first-token latency compared with a direct upstream call. The rest of the latency is whatever Anthropic itself delivers. In practice you won't notice the difference; Claude's own variance between calls is much larger than our routing overhead.

Can I use this in production?

Yes — we serve production traffic for paying customers and treat the gateway as a production system: monitored, auto-scaled, replicated, with key rotation and incident response. We don't publish formal SLAs on the free tier, but we target 99.5% monthly availability for paid accounts and publish status at /v1/health.

What's the difference between sending an image as base64 vs uploading via /v1/files?

Functionally identical — same vision quality, same token cost on the image side. The difference is bandwidth and request size: base64 inflates by ~33% and gets re-sent on every turn of a long conversation; file_id is uploaded once and referenced cheaply afterwards.
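The ~33% figure follows from base64 encoding every 3 bytes as 4 characters. A quick way to check it, using a zero-filled buffer as a stand-in for a real image:

```python
import base64


def base64_overhead(raw: bytes) -> float:
    """Return encoded size divided by raw size for a non-empty payload."""
    return len(base64.b64encode(raw)) / len(raw)


def resend_cost_bytes(raw: bytes, turns: int) -> int:
    """Bytes uploaded if the base64 image is re-sent on every turn."""
    return len(base64.b64encode(raw)) * turns


image_bytes = bytes(300_000)  # stand-in for a 300 kB image
ratio = base64_overhead(image_bytes)
total_for_20_turns = resend_cost_bytes(image_bytes, turns=20)
```

Over a long conversation the resend total grows linearly with the number of turns, which is where a one-time upload and a file_id reference pays off.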

Do you support webhooks / async callbacks?

Not yet. For async workloads use Message Batches — poll GET /v1/messages/batches/{id} and act when processing_status == "ended". Webhooks are on the roadmap for Q3, alongside Citations and Code Execution.
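A minimal polling sketch for the approach above. `fetch_batch` is a hypothetical helper that performs GET /v1/messages/batches/{id} and returns the parsed JSON as a dict. The poll interval and timeout are arbitrary illustration values.

```python
import time


def wait_for_batch(fetch_batch, batch_id, poll_interval=30.0, timeout=3600.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until the batch reports processing_status == "ended".

    `fetch_batch(batch_id)` is a caller-supplied (hypothetical) function
    that returns the batch object as a dict.
    """
    deadline = clock() + timeout
    while True:
        batch = fetch_batch(batch_id)
        if batch.get("processing_status") == "ended":
            return batch
        if clock() >= deadline:
            raise TimeoutError(f"batch {batch_id} did not end within {timeout} seconds")
        sleep(poll_interval)
```

Passing `sleep` and `clock` in as parameters keeps the loop easy to test without real waiting.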

Do you support team / shared accounts?

Not yet — every account is single-user. For team setups today, we recommend creating one account per team and using per-developer API keys (each key gets its own name + spend cap, so you can attribute usage). Team workspaces with shared billing are planned.

How do you handle abuse / runaway scripts?

Three lines of defence:

  • Per-key spend cap: set a dollar limit per key in the dashboard; once hit, the key auto-revokes.
  • Per-key RPM limit: cap requests-per-minute per key for protection against infinite loops.
  • Account-level emergency stop: one click in the dashboard revokes all keys at once.

Can I get a discount for higher volume?

Our public rates are already aggressive. If you're spending more than $500/month consistently, email hello@alicode.store with your use case — we can sometimes arrange custom rates for sustained heavy users or commercial partnerships.

Why is my Cursor stuck saying "Named models unavailable"?

Cursor on the free plan restricts arbitrary model names from the OpenAI-style endpoint. Workaround: in Cursor's model list, add a known OpenAI name like gpt-4o or gpt-4-turbo instead of claude-sonnet-4-5. Our gateway recognises those aliases and routes the request to the matching Claude model anyway.

I'm getting 401 errors but my key looks right

Common causes:

  • The key was revoked and you cached it locally — issue a new one in the dashboard.
  • You copied an extra space or newline at the end — strip whitespace.
  • You're sending the key in the wrong form for the header: x-api-key takes the raw key with no Bearer prefix, while Authorization requires the Bearer prefix.
  • Your account is suspended for abuse — check email for a notice from us.
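The whitespace and header-format causes above can be ruled out on the client side. A small sketch, assuming you build request headers yourself; the function name and the `style` parameter are illustrative, not part of any SDK:

```python
def auth_headers(raw_key: str, style: str = "x-api-key") -> dict:
    """Build an auth header from a key that may carry stray whitespace or a prefix."""
    key = raw_key.strip()  # drop trailing spaces and newlines from copy-paste
    if key.lower().startswith("bearer "):
        # Strip an accidental prefix so each header gets the right form.
        key = key[len("bearer "):].strip()
    if not key.startswith("sk-ali-"):
        raise ValueError("expected a key starting with sk-ali-")
    if style == "x-api-key":
        return {"x-api-key": key}  # raw key, no Bearer prefix
    if style == "bearer":
        return {"Authorization": f"Bearer {key}"}  # Bearer prefix required
    raise ValueError(f"unknown header style: {style!r}")
```

If the headers are well-formed and the 401 persists, the remaining causes are a revoked key or a suspended account, which can only be checked in the dashboard or by email.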

Acceptable Use Policy

By using AliCode you agree not to: generate content sexualising minors; harass, dox, or defame; produce content designed to mislead or impersonate real people; create mass disinformation or spam; attempt to break into systems you don't own; sign up multiple accounts to evade caps or bans; resell access without a prior written agreement; or use the service in ways that violate Anthropic's own AUP. Violations result in immediate account suspension. We do not refund credits on suspended accounts.

Terms of Service (summary)

AliCode is provided as is, without warranty of any kind. You are responsible for the security of your API keys and the legality of your use. We may suspend accounts for AUP violations without refund. Credits are non-transferable between accounts. Pricing may change with reasonable notice; existing balance is honoured at the rate it was purchased. The Service is operated under, and these Terms are governed by, the laws of the Russian Federation; see the full Terms of Service.

Contact & support

General · billing
Abuse reports
Status · health

Response time on email is typically within 24h on weekdays. For server-side bugs include the request-id header from the failed call — that's how we find your trace in the logs.

Our Telegram