Build with Claude at a predictable cost.
AliCode is a drop-in Claude API gateway: the same wire format as Anthropic, a materially lower cost, and no hard rate limits. Change one base URL and keep building — no code changes required.
Point any tool that speaks the Anthropic Messages API or the OpenAI Chat Completions API at https://api.alicode.store and use a key that starts with sk-ali-. There is nothing new to learn: the wire format is identical to api.anthropic.com across every endpoint we serve, down to the byte level of SSE events.
AliCode exists to make access to the Claude API economically predictable. The official list price is built for enterprise budgets; we consolidate demand across many teams and developers, contract for capacity at volume, and pass the resulting savings back through a transparent, pay-as-you-go model billed strictly on what you consume.
What you get
- Same wire format as Anthropic. /v1/messages, SSE events, content blocks, tool_use, vision, prompt caching: identical bytes. If your code works against the official API, it works here once you swap the base URL and key.
- Both APIs in one endpoint. The OpenAI Chat Completions surface is also served, so Cursor / Continue / OpenWebUI / any OpenAI SDK script just needs a base URL swap. No translation glue on your side.
- Files, batches, count_tokens. Full feature parity for SDK helpers: your existing client code doesn't need a single conditional. The Anthropic SDK calls these endpoints exactly as it would against the official API.
- Pay-as-you-go credits. Top up from 300 ₽. Credits never expire. No subscription, no monthly minimum, no commitment — spend stops the moment you stop calling the API.
- No per-minute rate limits. Access is not gated by tier. The only ceiling is abuse protection at the IP layer (1000 req/min) — orders of magnitude above what any legitimate agent loop requires.
- Same security envelope. All traffic over TLS 1.3, scoped API keys with bcrypt-hashed storage, per-key spend caps and model whitelists, instant revocation, no prompt or response body logged.
Who it's for
AliCode is built for developers who use AI as a daily working tool rather than as a feature inside a SaaS product. If you work continuously in Cline, Cursor, Claude Code, Continue or Aider — or run Python workloads that call the API intensively for embeddings, classification, agent loops and batch summarisation — this is for you. Pricing is structured so that heavy use stays affordable: a Cursor user generating 200K Sonnet tokens a day pays roughly $5 a month instead of $60.
AliCode is not a replacement for enterprise Anthropic contracts. If you require a custom DPA, SOC2 documentation, dedicated capacity, or a formal SLA above 99.5%, please contact Anthropic directly. We provide resale of their capacity at consumer scale.
Available models
| Model | Context | Max output | Best for |
|---|---|---|---|
| claude-haiku-4-5 | 200K | 64K | Agent loops, fast classification, cheap completions |
| claude-sonnet-4-5 | 200K | 64K | Default — daily coding, vision, tool use |
| claude-sonnet-4-6 | 200K | 64K | Latest Sonnet — better instruction following |
| claude-opus-4-5 | 200K | 64K | Hardest reasoning, extended thinking |
| claude-opus-4-7 | 200K | 128K | Top tier — agents, deep refactors, multi-step planning |
Earlier model IDs (claude-3-7-sonnet-latest, claude-3-5-haiku-latest, claude-opus-4-1) are also accepted as aliases.
Quickstart
Five minutes from zero to first request.
- Register an account (email + password).
- Open /dashboard/keys → Create new key → copy it. It starts with sk-ali-….
- Make your first request:
curl https://api.alicode.store/v1/messages \
-H "x-api-key: $ALICODE_KEY" \
-H 'anthropic-version: 2023-06-01' \
-H 'content-type: application/json' \
-d '{
"model": "claude-sonnet-4-5",
"max_tokens": 256,
"messages": [{"role":"user","content":"hi"}]
}'
import os, anthropic
client = anthropic.Anthropic(
api_key=os.environ["ALICODE_KEY"],
base_url="https://api.alicode.store",
)
msg = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=256,
messages=[{"role":"user","content":"hi"}],
)
print(msg.content[0].text)
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({
apiKey: process.env.ALICODE_KEY,
baseURL: "https://api.alicode.store",
});
const msg = await client.messages.create({
model: "claude-sonnet-4-5",
max_tokens: 256,
messages: [{ role: "user", content: "hi" }],
});
console.log(msg.content[0].text);
That's it. Your existing Anthropic SDK code works unchanged.
Migrate from Anthropic
Two environment variables, no code changes. Replace api.anthropic.com with api.alicode.store, and replace your sk-ant-… key with sk-ali-….
Bash / Docker
# before
export ANTHROPIC_API_KEY=sk-ant-…
export ANTHROPIC_BASE_URL=https://api.anthropic.com
# after
export ANTHROPIC_API_KEY=sk-ali-…
export ANTHROPIC_BASE_URL=https://api.alicode.store
Python SDK
import os, anthropic
# only the client constructor changes
client = anthropic.Anthropic(
    api_key=os.environ['ALICODE_KEY'],
    base_url='https://api.alicode.store',
)
Node SDK
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic({
  apiKey: process.env.ALICODE_KEY,
  baseURL: 'https://api.alicode.store',
});
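Before flipping a deployment over, a quick pre-flight check can catch a half-migrated environment (new base URL with the old key, or the reverse). A stdlib-only sketch, assuming the two variables from the Bash example; `check_migration` is a hypothetical helper, not part of any SDK, and it makes no network call.

```python
import os

EXPECTED_BASE_URL = "https://api.alicode.store"

def check_migration(env=None):
    """Return a list of problems with the ANTHROPIC_* variables (empty list = OK).

    Hypothetical helper: it inspects only ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY.
    """
    env = os.environ if env is None else env
    problems = []
    base_url = env.get("ANTHROPIC_BASE_URL", "").rstrip("/")
    key = env.get("ANTHROPIC_API_KEY", "")
    if base_url != EXPECTED_BASE_URL:
        problems.append(f"ANTHROPIC_BASE_URL is {base_url or 'unset'!r}, expected {EXPECTED_BASE_URL}")
    if key.startswith("sk-ant-"):
        problems.append("ANTHROPIC_API_KEY is still an Anthropic key (sk-ant-…)")
    elif not key.startswith("sk-ali-"):
        problems.append("ANTHROPIC_API_KEY is missing or does not start with sk-ali-")
    return problems

if __name__ == "__main__":
    for problem in check_migration():
        print("!", problem)
```

Running it as a script in CI prints nothing when both variables point at AliCode.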
Feature parity matrix
| Feature | Anthropic | AliCode | Notes |
|---|---|---|---|
| Messages API | ✓ | ✓ | Byte-for-byte compatible response shape |
| Streaming SSE | ✓ | ✓ | Same event names & ordering |
| Tool use | ✓ | ✓ | Including parallel tool calls |
| Vision | ✓ | ✓ | base64 / url / file_id all supported |
| Prompt caching | ✓ | ✓ | cache_control: ephemeral |
| Extended thinking | ✓ | ✓ | thinking field on Opus and Sonnet 4.5+ |
| Files API | ✓ | ✓ | Persistent uploads for vision/docs |
| Message Batches | ✓ | ✓ | JSONL results, 50% pricing |
| Count tokens | ✓ | ✓ | Approximate BPE estimate (±3-5% on prose) |
| Citations | ✓ | — | Coming Q3 |
Cursor
Cursor uses an OpenAI-compatible base URL. Override it to ours.
- File → Preferences → Cursor Settings → Models
- Scroll to API Keys → enable OpenAI API Key.
- Paste your sk-ali-… key.
- Enable Override OpenAI Base URL and paste https://api.alicode.store/v1.
- Click Verify. Then in Add or search model, add claude-sonnet-4-5 (or claude-haiku-4-5 / claude-opus-4-7).
Cline
Cline is a free agentic VS Code extension. It speaks Anthropic natively.
- VS Code → Extensions → search Cline → install.
- Open the Cline panel (left sidebar) → Settings (top-right).
- API Provider: Anthropic.
- API Key: sk-ali-…
- Use custom base URL: ✓ → https://api.alicode.store
- Model ID: claude-sonnet-4-5
Save and start a new task. Cline will use all its tools (Read, Edit, Bash, etc.) over our gateway, including streamed tool_use and vision blocks.
Recommended model picks
- Default coding: claude-sonnet-4-5. Strong quality at a good speed; what 90% of users settle on.
- Cheap exploratory work: claude-haiku-4-5. Great for "explain this file" / "find the bug" loops; roughly 3× cheaper than Sonnet at the listed rates.
- Hardest refactors: claude-opus-4-7. When Sonnet keeps making the same mistake, switch to Opus for one turn, then switch back.
Tips
- Vision: drag-and-drop images into Cline's input; they are sent as image content blocks. Our gateway routes them to a captioning provider behind the scenes, then hands the caption to Claude for reasoning. You don't see any of that, just the answer.
- Web search: declare nothing extra. We auto-inject the web_search tool when Cline asks for it via execute_command + curl, which saves your local sandbox from being torn down by failed network calls.
- Long sessions: Cline's auto-compaction works as expected; we honour the context_management.compact_* capabilities on every model.
Claude Code (Anthropic CLI)
Two environment variables, one command.
export ANTHROPIC_BASE_URL=https://api.alicode.store
export ANTHROPIC_API_KEY=sk-ali-…
claude
Add the two exports to your shell rc-file (~/.zshrc / ~/.bashrc) to make them permanent. On Windows: setx ANTHROPIC_BASE_URL ...
Continue.dev
Edit your Continue config (gear icon → Open config.yaml):
name: AliCode
models:
  - name: AliCode Sonnet
    provider: openai
    model: claude-sonnet-4-5
    apiKey: sk-ali-…
    apiBase: https://api.alicode.store/v1
    roles: [chat, edit, apply]
Roo Code
Roo is a Cline fork with multi-mode (Code / Architect / Ask). Same setup as Cline:
- Provider: Anthropic
- Base URL: https://api.alicode.store
- API Key: sk-ali-…
- Model ID: claude-sonnet-4-5
Python & Node SDKs
Python (anthropic)
pip install anthropic
from anthropic import Anthropic
client = Anthropic(api_key='sk-ali-…', base_url='https://api.alicode.store')
# Non-streaming
msg = client.messages.create(
model='claude-sonnet-4-5',
max_tokens=512,
messages=[{'role': 'user', 'content': 'Hello'}],
)
print(msg.content[0].text)
# Streaming
with client.messages.stream(
model='claude-sonnet-4-5',
max_tokens=512,
messages=[{'role': 'user', 'content': 'Tell me a joke'}],
) as stream:
    for text in stream.text_stream:
        print(text, end='', flush=True)
Node (@anthropic-ai/sdk)
npm install @anthropic-ai/sdk
import Anthropic from '@anthropic-ai/sdk';
const client = new Anthropic({ apiKey: 'sk-ali-…', baseURL: 'https://api.alicode.store' });
const stream = await client.messages.stream({
model: 'claude-sonnet-4-5',
max_tokens: 512,
messages: [{ role: 'user', content: 'Hi' }],
});
for await (const ev of stream) {
if (ev.type === 'content_block_delta' && ev.delta.type === 'text_delta') {
process.stdout.write(ev.delta.text);
}
}
OpenAI SDK (for /v1/chat/completions)
from openai import OpenAI
client = OpenAI(api_key='sk-ali-…', base_url='https://api.alicode.store/v1')
resp = client.chat.completions.create(
model='claude-sonnet-4-5',
messages=[{'role': 'user', 'content': 'Hi'}],
)
print(resp.choices[0].message.content)
Authentication
All /v1/* endpoints require an API key. We accept it on either header — pick whichever your SDK already uses:
| Header | Format | Used by |
|---|---|---|
| x-api-key | sk-ali-… | Anthropic SDK, Cline, Claude Code |
| Authorization | Bearer sk-ali-… | OpenAI SDK, Cursor, Continue |
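Both headers carry the same key, so a script that talks to both surfaces can build them in one place. A minimal stdlib sketch; `auth_headers` and `build_request` are illustrative names, the `anthropic-version` value is the one used in the curl examples in this guide, and the request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.alicode.store"

def auth_headers(api_key: str, style: str = "anthropic") -> dict:
    """Return request headers for either accepted auth style."""
    if not api_key.startswith("sk-ali-"):
        raise ValueError("AliCode keys start with sk-ali-")
    if style == "anthropic":  # Anthropic SDK, Cline, Claude Code
        return {"x-api-key": api_key, "anthropic-version": "2023-06-01",
                "content-type": "application/json"}
    if style == "openai":  # OpenAI SDK, Cursor, Continue
        return {"Authorization": f"Bearer {api_key}",
                "content-type": "application/json"}
    raise ValueError(f"unknown style: {style!r}")

def build_request(path: str, body: dict, api_key: str, style: str = "anthropic"):
    """Construct (but do not send) a POST request; hand it to urllib.request.urlopen to send."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers=auth_headers(api_key, style),
        method="POST",
    )
```

Use `style="anthropic"` for /v1/messages and `style="openai"` for /v1/chat/completions; the gateway accepts either on all /v1/* endpoints.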
Creating keys
- Go to /dashboard/keys.
- Click Create new key, give it a name (e.g. "production").
- Copy the secret — it's shown once. We store only a hash.
Per-key restrictions
Each key supports optional per-key limits:
- Rate limit (RPM): requests per minute, defaults to your account's tier.
- Spend cap (cents): auto-revoke after spending N cents on this key.
- Allowed models: whitelist; requests using other models get 403.
Revoke keys instantly from the dashboard. Revoked keys start returning 401 within seconds across all servers.
Key rotation
Issue a new key, deploy it, then revoke the old one. No downtime — both work in parallel until you revoke. We recommend rotating production keys every 90 days as a baseline.
Security model
We store only a bcrypt hash of your secret — the plaintext is shown once at creation and never again. If you lose it, issue a new key and revoke the old one; we cannot recover the lost secret. Every request is also tied to the IP address that made it (visible in your dashboard), so you can quickly audit unexpected usage.
Where to put keys
- Local development: environment variable in your shell rc-file (~/.zshrc / ~/.bashrc), never committed to git.
- CI/CD: encrypted secret in GitHub Actions / GitLab / Vercel / your platform's secret store.
- Servers: /etc/environment or systemd EnvironmentFile=, root-readable only.
- Never embed in client-side JS, mobile apps, or any place a user can extract bytes from disk.
If a key leaks, revoke it from the dashboard. Within seconds it returns 401 across every server we run.
POST /v1/messages
Anthropic-compatible Messages endpoint. Identical to api.anthropic.com/v1/messages.
Request body
| Field | Type | Req | Notes |
|---|---|---|---|
| model | string | yes | claude-sonnet-4-5, claude-haiku-4-5, claude-opus-4-7 (full list under Available models) |
| messages | array | yes | Anthropic-shape: role + content blocks |
| max_tokens | int | yes | Output cap. Auto-boosted to the model ceiling. |
| system | string|array | no | System prompt; pass an array of blocks for cache_control breakpoints |
| stream | bool | no | Enable SSE (see Streaming) |
| tools | array | no | Tool schemas, incl. server-side web_search_20250305 |
| tool_choice | object | no | auto / any / tool / none |
| temperature | float | no | Range 0–1. A sensible default is applied when omitted (lower for tool/agent calls). |
| top_p | float | no | Nucleus sampling; prefer tuning temperature instead |
| top_k | int | no | Sample from top K candidates |
| stop_sequences | array | no | Up to 4 strings that stop generation early |
| thinking | object | no | {"type":"enabled","budget_tokens":4096} (Opus & Sonnet 4.5+) |
| metadata | object | no | {"user_id":"…"} for your own analytics |
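Several constraints in this table can be checked client-side before you spend a round trip. The function below is a hypothetical pre-flight validator that encodes only what the table states (required fields, temperature 0–1, at most 4 stop sequences) plus one rule from the public Anthropic API: `thinking.budget_tokens` must be lower than `max_tokens`. It is not exhaustive; the server remains the authority.

```python
def validate_messages_body(body: dict) -> list:
    """Return human-readable problems with a /v1/messages body (empty list = looks OK)."""
    errors = []
    for field in ("model", "messages", "max_tokens"):
        if field not in body:
            errors.append(f"missing required field: {field}")

    messages = body.get("messages")
    if messages is not None and (not isinstance(messages, list) or not messages):
        errors.append("messages must be a non-empty array")

    max_tokens = body.get("max_tokens")
    if max_tokens is not None and (not isinstance(max_tokens, int) or max_tokens < 1):
        errors.append("max_tokens must be a positive integer")

    temperature = body.get("temperature")
    if temperature is not None and not 0 <= temperature <= 1:
        errors.append("temperature must be in the range 0-1")

    if len(body.get("stop_sequences", [])) > 4:
        errors.append("at most 4 stop_sequences are allowed")

    thinking = body.get("thinking")
    if thinking and thinking.get("type") == "enabled":
        budget = thinking.get("budget_tokens", 0)
        if isinstance(max_tokens, int) and budget >= max_tokens:
            errors.append("thinking.budget_tokens must be lower than max_tokens")
    return errors
```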
Response shape
{
"id": "msg_011…",
"type": "message",
"role": "assistant",
"model": "claude-sonnet-4-5",
"content": [
{ "type": "text", "text": "Closures in PHP capture variables…" }
],
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 42,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 0,
"output_tokens": 128
}
}
stop_reason values
| Value | Meaning |
|---|---|
| end_turn | Model finished naturally |
| max_tokens | Hit your max_tokens cap mid-generation |
| stop_sequence | Output matched one of your stop_sequences |
| tool_use | Model wants to call a tool — execute it and continue the conversation |
| pause_turn | Long-running tool loop paused; resume by sending the conversation back with this assistant turn appended |
POST /v1/messages/count_tokens
Counts tokens for a planned request without invoking the model. Used by the SDK for cost estimation, context-window planning, and rate-limit pre-checks.
Same request body as /v1/messages. Estimation uses cl100k_base BPE, an open tokenizer used here as an approximation of Claude's; typical accuracy is ±3-5% on prose and ±8-10% on dense code/JSON.
curl https://api.alicode.store/v1/messages/count_tokens \
-H 'x-api-key: sk-ali-…' \
-H 'content-type: application/json' \
-d '{
"model": "claude-sonnet-4-5",
"system": "You are Claude.",
"messages": [{"role":"user","content":"Hello world"}]
}'
{ "input_tokens": 14 }
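Because the count is an estimate, it is safer to pad it before deciding whether a request fits. A small sketch, assuming the 200K context window from the model table, assuming the requested `max_tokens` must fit in the same window, and using a 10% safety margin chosen to cover the worst-case code/JSON error quoted above; `fits_context` is a hypothetical helper you would feed the `input_tokens` value from this endpoint.

```python
import math

CONTEXT_WINDOW = 200_000  # every model in the "Available models" table

def fits_context(estimated_input_tokens: int, max_tokens: int,
                 margin: float = 0.10, window: int = CONTEXT_WINDOW) -> bool:
    """True if the padded input estimate plus the requested output fits the window.

    margin=0.10 pads for the ±8-10% estimation error on dense code/JSON.
    """
    padded = math.ceil(estimated_input_tokens * (1 + margin))
    return padded + max_tokens <= window
```

If it returns False, trim history or lower `max_tokens` before sending rather than waiting for a 400.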
POST /v1/chat/completions
OpenAI-compatible. Used by Cursor (via base URL override), Continue, OpenWebUI, and custom scripts written against the OpenAI SDK.
We translate OpenAI-shape requests to Anthropic internally and the response back to OpenAI shape, including streaming chunks, tool_calls, vision image_url blocks, and finish_reason.
curl https://api.alicode.store/v1/chat/completions \
-H 'Authorization: Bearer sk-ali-…' \
-H 'content-type: application/json' \
-d '{
"model": "claude-sonnet-4-5",
"messages": [{"role":"user","content":"ping"}]
}'
finish_reason mapping
| Anthropic stop_reason | OpenAI finish_reason |
|---|---|
| end_turn | stop |
| max_tokens | length |
| stop_sequence | stop |
| tool_use | tool_calls |
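If you post-process responses from both endpoints in one code path, the same mapping is handy client-side. A sketch that reproduces the table; the fallback for stop reasons it doesn't list (for example `pause_turn`) is an assumption, not documented gateway behaviour.

```python
STOP_TO_FINISH = {
    "end_turn": "stop",
    "max_tokens": "length",
    "stop_sequence": "stop",
    "tool_use": "tool_calls",
}

def to_finish_reason(stop_reason: str, default: str = "stop") -> str:
    """Map an Anthropic stop_reason to an OpenAI finish_reason, per the table above.

    Values not in the table fall back to `default` (an assumption).
    """
    return STOP_TO_FINISH.get(stop_reason, default)

def is_truncated(reason: str) -> bool:
    """True when output was cut off by the token cap, in either vocabulary."""
    return reason in ("max_tokens", "length")
```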
GET /v1/models
Returns the list of public model IDs. Two shapes are served from the same path depending on the Accept header:
- OpenAI shape: { object: 'list', data: [{ id, object, owned_by, … }] }
- Anthropic shape (GET /v1/models/{id} too): full capabilities tree with image_input, thinking.adaptive, effort.max, prompt_caching, context_management.
curl https://api.alicode.store/v1/models \
-H 'Authorization: Bearer sk-ali-…'
# specific model
curl https://api.alicode.store/v1/models/claude-sonnet-4-5 \
-H 'Authorization: Bearer sk-ali-…'
/v1/files
Persistent upload for images, PDFs, and text dumps. Once uploaded, you reference the file by id in messages content instead of sending base64 every turn — saves token cost and bandwidth across long conversations and batches.
Beta header: anthropic-beta: files-api-2025-04-14 (accepted but not required).
POST /v1/files: upload
curl https://api.alicode.store/v1/files \
-H 'x-api-key: sk-ali-…' \
-F 'file=@/path/to/image.png'
{
"id": "file_011…",
"type": "file",
"filename": "image.png",
"mime_type": "image/png",
"size_bytes": 184320,
"created_at": "2026-05-17T15:23:39Z",
"downloadable": true
}
Use the file in a message
{
"model": "claude-sonnet-4-5",
"max_tokens": 512,
"messages": [{
"role": "user",
"content": [
{ "type": "image", "source": { "type": "file", "file_id": "file_011…" } },
{ "type": "text", "text": "What is in this image?" }
]
}]
}
Other endpoints
| Method & path | Purpose |
|---|---|
| GET /v1/files | List, paginated via before_id / after_id / limit |
| GET /v1/files/{id} | Metadata |
| GET /v1/files/{id}/content | Raw bytes download (streamed) |
| DELETE /v1/files/{id} | Soft-delete; 24h grace period before purge |
Limits
- Max upload: 25 MB per file.
- Accepted mime types: PNG, JPEG, WEBP, GIF, PDF, plain text, markdown, CSV.
- SHA-256 dedup: re-uploading identical bytes returns the existing file id, no new storage.
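Checking these limits locally avoids a 413 or 415 round trip, and hashing the file yourself tells you in advance whether a re-upload would be deduplicated. A stdlib sketch; the extension-to-mime table is an assumption derived from the list above (the gateway's exact matching rules aren't documented here), "25 MB" is assumed to mean 25 MiB, and `preflight_upload` is a hypothetical name.

```python
import hashlib
import os

MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumes "25 MB" means 25 MiB
ALLOWED_MIME = {
    ".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg",
    ".webp": "image/webp", ".gif": "image/gif", ".pdf": "application/pdf",
    ".txt": "text/plain", ".md": "text/markdown", ".csv": "text/csv",
}

def preflight_upload(path: str) -> dict:
    """Validate type and size, then return mime type, size and SHA-256 (the dedup key)."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_MIME:
        raise ValueError(f"unsupported file type {ext!r} (server would return 415)")
    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        raise ValueError(f"{size} bytes exceeds the 25 MB limit (server would return 413)")
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):  # hash in 1 MiB chunks
            digest.update(chunk)
    return {"mime_type": ALLOWED_MIME[ext], "size_bytes": size, "sha256": digest.hexdigest()}
```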
/v1/messages/batches
Process up to 10,000 messages in one async job at 50% of normal pricing. Submit, poll, fetch results — all asynchronous.
Beta header: anthropic-beta: message-batches-2024-09-24 (accepted but not required).
POST /v1/messages/batches: submit
curl https://api.alicode.store/v1/messages/batches \
-H 'x-api-key: sk-ali-…' \
-H 'content-type: application/json' \
-d '{
"requests": [
{ "custom_id": "q-1", "params": { "model":"claude-sonnet-4-5", "max_tokens":100, "messages":[{"role":"user","content":"2+2"}] }},
{ "custom_id": "q-2", "params": { "model":"claude-sonnet-4-5", "max_tokens":100, "messages":[{"role":"user","content":"capital of france"}] }}
]
}'
{
"id": "msgbatch_011…",
"type": "message_batch",
"processing_status": "in_progress",
"request_counts": {"processing":2,"succeeded":0,"errored":0,"canceled":0,"expired":0},
"created_at": "2026-05-18T15:25:50Z",
"expires_at": "2026-05-19T15:25:50Z",
"results_url": null
}
Poll until ended
# Poll every few seconds
curl https://api.alicode.store/v1/messages/batches/msgbatch_011… \
-H 'x-api-key: sk-ali-…'
# When processing_status == 'ended', fetch JSONL results
curl https://api.alicode.store/v1/messages/batches/msgbatch_011…/results \
-H 'x-api-key: sk-ali-…'
{"custom_id":"q-1","result":{"type":"succeeded","message":{ … }}}
{"custom_id":"q-2","result":{"type":"succeeded","message":{ … }}}
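The poll-then-fetch cycle above fits in a small loop. A sketch, assuming you supply two callables: `get_batch(batch_id)` returning the parsed status JSON and `get_results(batch_id)` returning the JSONL text (for example thin wrappers around the two curl calls). Both names are hypothetical; injecting them keeps the loop testable without network access.

```python
import json
import time

def wait_for_batch(batch_id, get_batch, get_results,
                   interval=5.0, timeout=24 * 3600, sleep=time.sleep):
    """Poll until processing_status == 'ended', then split results by outcome.

    Returns (succeeded, failed), both keyed by custom_id. `failed` keeps the raw
    result objects for errored / canceled / expired items.
    """
    deadline = time.monotonic() + timeout
    while get_batch(batch_id)["processing_status"] != "ended":
        if time.monotonic() > deadline:
            raise TimeoutError(f"batch {batch_id} did not end within {timeout}s")
        sleep(interval)

    succeeded, failed = {}, {}
    for line in get_results(batch_id).splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the JSONL stream
        item = json.loads(line)
        result = item["result"]
        if result["type"] == "succeeded":
            succeeded[item["custom_id"]] = result["message"]
        else:
            failed[item["custom_id"]] = result
    return succeeded, failed
```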
All endpoints
| Method & path | Purpose |
|---|---|
| POST /v1/messages/batches | Create batch (up to 10K requests) |
| GET /v1/messages/batches | List, cursor-paginated |
| GET /v1/messages/batches/{id} | Get status & counters |
| POST /v1/messages/batches/{id}/cancel | Mark canceling; remaining items skipped |
| GET /v1/messages/batches/{id}/results | Streamed JSONL output |
| DELETE /v1/messages/batches/{id} | Delete batch & purge results (must be ended) |
Validation
- custom_id is required and must be unique within the batch.
- Each params object is a full /v1/messages body (no streaming: batches always return finished messages).
- Batches expire 24h after creation; remaining items become expired.
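These rules can be enforced while you build the request list. A minimal sketch; `build_batch` and its `(custom_id, params)` input shape are hypothetical, and the 10,000 cap is the one stated at the top of this section.

```python
MAX_BATCH_REQUESTS = 10_000

def build_batch(items):
    """Turn (custom_id, params) pairs into a /v1/messages/batches request body.

    Enforces: custom_id present and unique, no streaming, and the 10,000-request cap.
    """
    requests, seen = [], set()
    for custom_id, params in items:
        if not custom_id:
            raise ValueError("custom_id is required")
        if custom_id in seen:
            raise ValueError(f"duplicate custom_id: {custom_id}")
        if params.get("stream"):
            raise ValueError(f"{custom_id}: batches do not support stream=true")
        seen.add(custom_id)
        requests.append({"custom_id": custom_id, "params": params})
    if len(requests) > MAX_BATCH_REQUESTS:
        raise ValueError(f"{len(requests)} requests exceeds the {MAX_BATCH_REQUESTS} cap")
    return {"requests": requests}
```

Serialise the result with `json.dumps` and POST it as the body of the submit call above.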
Streaming SSE
Set stream: true. The server returns Anthropic-format Server-Sent Events. Events arrive in this order:
- message_start: initial message metadata, empty content.
- content_block_start: opens a block (text / tool_use / thinking); start, deltas and stop repeat once per block.
- content_block_delta: N deltas with text_delta, input_json_delta, or thinking_delta.
- content_block_stop: closes the current block.
- message_delta: final stop_reason and usage counters.
- message_stop: end of stream.
- ping: keep-alive sent every ~15s; it can arrive between any of the events above.
Raw bytes (abbreviated)
event: message_start
data: {"type":"message_start","message":{"id":"msg_…","model":"claude-sonnet-4-5","content":[]}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":", world"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn"},"usage":{"output_tokens":9}}

event: message_stop
data: {"type":"message_stop"}
Consuming with the Python SDK
with client.messages.stream(
model='claude-sonnet-4-5',
max_tokens=512,
messages=[{'role':'user','content':'Tell me a joke'}],
) as stream:
    for text in stream.text_stream:
        print(text, end='', flush=True)
Tool use
Pass tools[] with input_schema (JSON Schema). The assistant returns content blocks of type tool_use; your client executes the tool and sends back tool_result in the next user turn.
Round 1 — declare tool, get tool_use
{
"model": "claude-sonnet-4-5",
"max_tokens": 1024,
"tools": [{
"name": "get_weather",
"description": "Get current weather for a city",
"input_schema": {
"type": "object",
"properties": { "city": { "type": "string" } },
"required": ["city"]
}
}],
"messages": [{ "role": "user", "content": "What is the weather in Berlin?" }]
}
Response includes a tool_use block:
{
"stop_reason": "tool_use",
"content": [
{ "type": "text", "text": "Let me check that." },
{ "type": "tool_use", "id": "toolu_01abc", "name": "get_weather", "input": { "city": "Berlin" } }
]
}
Round 2 — send tool_result back
{
"model": "claude-sonnet-4-5",
"max_tokens": 1024,
"tools": [/* same tools[] */],
"messages": [
{ "role": "user", "content": "What is the weather in Berlin?" },
{ "role": "assistant", "content": [
{ "type": "text", "text": "Let me check that." },
{ "type": "tool_use", "id": "toolu_01abc", "name": "get_weather", "input": { "city": "Berlin" } }
]},
{ "role": "user", "content": [
{ "type": "tool_result", "tool_use_id": "toolu_01abc", "content": "18°C, sunny" }
]}
]
}
tool_choice
| Value | Behaviour |
|---|---|
| {"type":"auto"} | Default — model decides whether to call a tool |
| {"type":"any"} | Forces the model to call ANY of the declared tools |
| {"type":"tool","name":"get_weather"} | Forces this specific tool |
| {"type":"none"} | Disables tool calls for this turn |
Parallel tool calls: the model may return multiple tool_use blocks in one turn. Run them in parallel and return all tool_result blocks in the next user message.
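Rounds 1 and 2 generalise to a loop: while `stop_reason` is `tool_use`, execute every `tool_use` block and answer all of them in one user message. The sketch assumes a `send(messages)` callable that POSTs the request (with the same `tools[]` each time) and returns the parsed response JSON, plus a `TOOLS` registry of plain Python functions; both are hypothetical stand-ins so the loop runs without a network. Tools run sequentially here for brevity; a thread pool would let them run in parallel as suggested above.

```python
def get_weather(city: str) -> str:
    # Hypothetical tool implementation; replace with a real lookup.
    return "18°C, sunny" if city == "Berlin" else "unknown"

TOOLS = {"get_weather": get_weather}

def run_tool_loop(send, messages, max_rounds=5):
    """Drive the tool_use / tool_result exchange until the model stops asking."""
    for _ in range(max_rounds):
        response = send(messages)
        # Echo the assistant turn back verbatim, tool_use blocks included.
        messages.append({"role": "assistant", "content": response["content"]})
        if response["stop_reason"] != "tool_use":
            return response
        results = []
        for block in response["content"]:
            if block["type"] != "tool_use":
                continue
            try:
                output = TOOLS[block["name"]](**block["input"])
                results.append({"type": "tool_result", "tool_use_id": block["id"], "content": output})
            except Exception as exc:  # report failures to the model instead of crashing
                results.append({"type": "tool_result", "tool_use_id": block["id"],
                                "content": str(exc), "is_error": True})
        messages.append({"role": "user", "content": results})
    raise RuntimeError("tool loop did not finish within max_rounds")
```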
Web search
Anthropic's server-side web_search_20250305 tool, fully emulated. Declare it in tools[] — the model decides when to search, our gateway runs the query, results land back in the model's context, you get a final answer with cited URLs.
{
"model": "claude-sonnet-4-5",
"max_tokens": 512,
"tools": [{ "type": "web_search_20250305", "name": "web_search", "max_uses": 3 }],
"messages": [{ "role": "user", "content": "What is the latest stable PHP version?" }]
}
Vision
Send images as Anthropic content blocks of type image. Three source modes:
| source.type | Shape | When to use |
|---|---|---|
| base64 | { media_type, data } | One-shot, image embedded inline |
| url | { url } | Public images on the web |
| file | { file_id } | Reusable across conversations / batches — see Files API |
{
"model": "claude-sonnet-4-5",
"max_tokens": 512,
"messages": [{
"role": "user",
"content": [
{ "type": "text", "text": "What is in this image?" },
{ "type": "image", "source": {
"type": "base64",
"media_type": "image/jpeg",
"data": "<base64-bytes>"
}}
]
}]
}
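Building the `base64` source by hand is mostly bookkeeping: read the bytes, encode them, set the right `media_type`. A stdlib sketch; `image_block` is a hypothetical helper, and its extension table assumes the common JPEG / PNG / GIF / WEBP image types.

```python
import base64
import os

IMAGE_TYPES = {".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg",
               ".gif": "image/gif", ".webp": "image/webp"}

def image_block(path: str) -> dict:
    """Return an Anthropic image content block with an inline base64 source."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in IMAGE_TYPES:
        raise ValueError(f"unsupported image type: {ext!r}")
    with open(path, "rb") as fh:
        data = base64.b64encode(fh.read()).decode("ascii")
    return {"type": "image",
            "source": {"type": "base64", "media_type": IMAGE_TYPES[ext], "data": data}}
```

The returned dict drops straight into a user turn, e.g. `"content": [{"type": "text", "text": "What is in this image?"}, image_block("photo.jpg")]`.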
Persistent vision via files
# upload once
FILE_ID=$(curl -s https://api.alicode.store/v1/files \
-H 'x-api-key: sk-ali-…' \
-F 'file=@chart.png' | jq -r .id)
# reference in many turns / batches
curl https://api.alicode.store/v1/messages \
-H 'x-api-key: sk-ali-…' \
-H 'content-type: application/json' \
-d "{
\"model\":\"claude-sonnet-4-5\",
\"max_tokens\":512,
\"messages\":[{ \"role\":\"user\", \"content\":[
{ \"type\":\"image\", \"source\":{ \"type\":\"file\", \"file_id\":\"$FILE_ID\" } },
{ \"type\":\"text\", \"text\":\"summarise this chart\" }
]}]
}"
Prompt caching
Mark stable prefixes with cache_control: { type: "ephemeral" }. Subsequent requests reusing that prefix pay 10% of input price for the cached portion. Cache TTL: 5 minutes from last hit.
Up to 4 breakpoints per request. When none is present, we auto-inject one at the end of the system block, so requests that don't set cache_control still get some caching benefit.
Where to put breakpoints
- End of system prompt — most common, biggest win.
- After a long, static document in the first user turn.
- After your tools[] declaration if you have many.
Example
{
"model": "claude-sonnet-4-5",
"max_tokens": 256,
"system": [
{ "type": "text", "text": "You are a senior support engineer. Follow company tone.\n\n[…3000 token style guide…]",
"cache_control": { "type": "ephemeral" } }
],
"messages": [{ "role": "user", "content": "Why is my server slow?" }]
}
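Because a request may carry at most 4 breakpoints, it can help to count them before sending, especially when several layers of your code each add `cache_control`. A sketch that walks the three places listed above (`tools[]`, `system` blocks, message content blocks); `count_cache_breakpoints` and `check_cache_breakpoints` are hypothetical helpers and don't model the gateway's auto-injected breakpoint.

```python
MAX_BREAKPOINTS = 4

def _marked(blocks) -> int:
    """Count blocks carrying cache_control; plain-string content carries none."""
    if not isinstance(blocks, list):
        return 0
    return sum(1 for b in blocks if isinstance(b, dict) and "cache_control" in b)

def count_cache_breakpoints(body: dict) -> int:
    total = _marked(body.get("tools", [])) + _marked(body.get("system", []))
    for message in body.get("messages", []):
        total += _marked(message.get("content"))
    return total

def check_cache_breakpoints(body: dict) -> None:
    n = count_cache_breakpoints(body)
    if n > MAX_BREAKPOINTS:
        raise ValueError(f"{n} cache_control breakpoints; at most {MAX_BREAKPOINTS} allowed")
```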
Usage counters
"usage": {
"input_tokens": 12,
"cache_creation_input_tokens": 3045,
"cache_read_input_tokens": 0,
"output_tokens": 80
}
On the next request reusing the same prefix:
"usage": {
"input_tokens": 14,
"cache_creation_input_tokens": 0,
"cache_read_input_tokens": 3045,
"output_tokens": 80
}
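You pay the full input rate only for input_tokens (new content), 10% of it for cache_read_input_tokens, and the Cache write rate from the Billing table (about 1.25× the input rate) for cache_creation_input_tokens.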
Extended thinking
Available on Opus and recent Sonnet models. The assistant emits thinking content blocks before the final answer — visible reasoning, like Claude Code's internal monologue.
Enable with the thinking field on the request:
{
"model": "claude-opus-4-7",
"max_tokens": 8192,
"thinking": { "type": "enabled", "budget_tokens": 4096 },
"messages": [{ "role": "user", "content": "Plan a refactor of this codebase…" }]
}
Response contains both thinking and text blocks:
"content": [
{ "type": "thinking", "thinking": "The user wants a refactor plan. Let me…", "signature": "msg_…" },
{ "type": "text", "text": "Here is a 5-step plan: …" }
]
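Client code that shows only the answer (or logs the reasoning separately) needs to split the two block types. A minimal sketch over the parsed response JSON; `split_thinking` is a hypothetical name. If you continue a tool-use turn, pass the assistant content back unmodified (thinking blocks and their signature included), as the public Anthropic API requires, rather than the split strings.

```python
def split_thinking(content: list) -> tuple:
    """Return (reasoning, answer) strings from a response's content blocks."""
    reasoning = [b["thinking"] for b in content if b.get("type") == "thinking"]
    answer = [b["text"] for b in content if b.get("type") == "text"]
    return "\n\n".join(reasoning), "".join(answer)
```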
Billing & limits
Pricing per 1M tokens (USD)
| Model | Input | Output | Cache write | Cache read |
|---|---|---|---|---|
| claude-haiku-4-5 | $0.80 | $4.00 | $1.00 | $0.08 |
| claude-sonnet-4-5 | $2.50 | $12.00 | $3.13 | $0.25 |
| claude-opus-4-7 | $3.00 | $14.00 | $3.75 | $0.30 |
Discounts
- Batch API — 50% off all input & output tokens.
- Prompt caching — 90% off the cached prefix on read.
Topping up
Top up from $5. Credits never expire. There's nothing to cancel.
Rate limits
No per-minute caps. No daily token quotas. We apply only abuse protection on the IP layer (1000 req/min, 50 concurrent connections) — way above what any legitimate agent needs. Per-key RPM and spend caps can be set in the dashboard.
Error codes
Errors follow Anthropic's shape:
{ "type": "error", "error": { "type": "<error_type>", "message": "<human-readable>" }, "request_id": "req_011…" }
| HTTP | error.type | Meaning |
|---|---|---|
| 400 | invalid_request_error | Bad JSON, missing field, invalid model, malformed tools[] |
| 401 | authentication_error | Missing or invalid API key (x-api-key or Authorization header) |
| 402 | insufficient_credits | Balance below estimated cost |
| 403 | permission_error | Key not allowed for this model |
| 404 | not_found_error | Unknown file/batch/model id |
| 413 | request_too_large | Body bigger than 50 MB, or file upload > 25 MB |
| 415 | invalid_request_error | Unsupported mime_type on file upload |
| 429 | rate_limit_error | Per-key or per-IP rate limit exceeded |
| 500 | api_error | Server bug — open a ticket with the request_id |
| 502 | api_error | Upstream failed (retryable) |
| 503 | overloaded_error | Shield mode — try again shortly |
Retry strategy
Treat 429, 502, 503, and timeouts on idempotent requests as retryable. Use exponential backoff starting at 1s, doubling on each attempt, with jitter; cap at 3 retries. The official SDKs do this automatically: if you're using anthropic or @anthropic-ai/sdk, you get retry-with-backoff out of the box.
Do not retry 400 / 401 / 402 / 403 / 404 / 413 / 415. They indicate a problem with the request itself, so retrying will just produce the same error. Fix the underlying cause first.
500 is rare and indicates a bug on our side; we surface those in our own monitoring and usually deploy a fix within hours. If you're hitting one persistently, open a ticket with the request-id — that's the fastest way for us to trace it.
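When calling the HTTP API without an SDK, the policy above (retry 429/502/503 and timeouts, 1s base delay doubling each attempt, at most 3 retries) can be sketched as follows. `send` is a hypothetical callable returning `(status, body)`; `sleep` and `rng` are injectable for testing. The jitter scheme (adding up to 25% of the current delay at random) is our assumption, since the text doesn't specify one.

```python
import random
import time

RETRYABLE_STATUSES = {429, 502, 503}

def call_with_retries(send, max_retries=3, base_delay=1.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() -> (status, body), retrying per the policy above.

    Retries 429/502/503 and TimeoutError (only pass an idempotent send).
    Every other status, including 4xx and 500, is returned immediately.
    """
    for attempt in range(max_retries + 1):
        try:
            status, body = send()
        except TimeoutError:
            if attempt == max_retries:
                raise
            status, body = None, None  # treat like a retryable failure
        else:
            if status not in RETRYABLE_STATUSES or attempt == max_retries:
                return status, body
        delay = base_delay * 2 ** attempt      # 1s, 2s, 4s, ...
        sleep(delay + delay * 0.25 * rng())    # plus up to 25% random jitter
```

After the final retry the last retryable response is returned rather than raised, so the caller can still read `error.message` and the request id.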
Debugging a failing request
- Check the HTTP status code first — it tells you whose problem it is (4xx = client, 5xx = server).
- Read the error.message field in the JSON body; it's intentionally human-readable.
- Grab the request-id header from the response. We return it even on errors.
- Reproduce with curl -v from a clean shell to confirm your SDK isn't adding unexpected headers.
- If you still can't figure it out, send hello@alicode.store the request id, the time, and the curl command. We'll find your request in our logs.
FAQ & Legal
Most things people ask before signing up.
How can you be 85% cheaper than Anthropic?
Two reasons. First, we operate on consumer-grade infrastructure with no enterprise overhead — no dedicated success engineers, no on-prem deployments, no custom DPAs per customer. Second, we aggregate thousands of small developers into one large customer profile and negotiate volume rates on that combined demand. The official Anthropic list price is built for enterprise procurement; ours is built for one engineer with a credit card. We take a small margin in the middle, you keep the difference.
The trade-off is honest: you get the same model quality and wire format, but without enterprise SLAs, custom legal terms, or priority capacity during outages. For 99% of indie / agency / hobbyist use cases, that's a great deal.
Is this actually Claude, or some other model behind the scenes?
It is real Claude. Same Anthropic models, same training, same intelligence. We don't fine-tune, we don't proxy through some cheaper open-source model. We route your request to a Claude inference endpoint and return the raw bytes back. The only difference from calling Anthropic directly is the URL and your API key.
Will this break the moment Anthropic ships a new feature?
Anything that flows through wire-format extensions — new anthropic-beta flags, new content block types, new tool types — passes through transparently. We ship updates within hours of public Anthropic releases. Features that require server-side state (Files API, Message Batches, count_tokens) get implemented behind our own infra so SDK helpers keep working. Citations and code execution are the next two on our roadmap.
Is this legal? Are you allowed to resell?
Yes. We purchase API capacity at standard rates from authorised providers and resell access to that capacity. This is a normal commercial relationship — exactly how cloud resellers, MSPs, and aggregators have worked since the 1990s. You agree to use the service per the Acceptable Use Policy below; we agree to provide the API capacity we've sold you.
Note: you need to follow Anthropic's published Acceptable Use even though you're calling us — the same content rules apply (no abuse, no illegal content, no scraping protected systems). We will pass through any content moderation signals from upstream.
Do you log my prompts? My responses?
No. We log only request metadata for billing — timestamp, IP, user ID, model, input/output token counts, HTTP status. The prompt body and response body never touch our database. They flow through memory only for the duration of the request and are released the moment the response is sent.
Conversation memory features (SessionMemory, Memdir) are opt-in: you have to explicitly turn them on per request and they store only what you ask them to, scoped to your account.
Where is my data processed? Is this GDPR-compliant?
Our control-plane (account, billing, API key hashes) runs in the EU. Model inference is performed by Anthropic's regional endpoints — typically US-East for English-speaking customers, with EU options on request. Because we don't persist prompt or response content, GDPR data-subject rights (access, deletion, portability) apply only to your account metadata, which you can delete yourself from the dashboard at any time.
What if a request errors out mid-stream?
If the upstream returns 5xx before any tokens were generated, you receive a clean 502 and the request is never billed — no credits are deducted. We rotate upstream keys and automatically cool down failing ones, so a retry lands on a healthy key; the official SDKs retry 502/503 with backoff out of the box.
If the stream drops mid-response (rare but possible on poor networks), you'll receive partial content and the usage counters reflect only what was actually generated. The SDK should treat this like a regular short response.
Can I cancel? Get a refund?
There's no subscription, so there's nothing to cancel. Unused credits never expire: your balance stays in the account forever, even if you stop using the service for years. Refunds: if you accidentally top up the wrong account, contact support within 24 hours and we'll move the balance. Credits that have already been spent are not refunded, because the inference has already been billed upstream.
How does billing work exactly?
Each request is billed when the response completes. We deduct input_tokens × input_rate + output_tokens × output_rate at the rates published in the Billing section. Cached reads are billed at 10% of the input rate; batch requests are billed at 50% of normal. The deduction is atomic and visible in your dashboard within seconds. You can set per-key spend caps to hard-stop runaway scripts.
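As a worked example of the formula above, here is a minimal cost estimator. It assumes cached-read tokens are counted separately from regular input tokens (as in the cache_read_input_tokens field of the Anthropic usage object) and that the 50% batch discount applies to the whole request total; cache writes are not covered because this FAQ does not describe them. Rates and request_cost are hypothetical names, and the rates used below are placeholders, not the prices in the Billing section.

```python
from dataclasses import dataclass

CACHE_READ_FACTOR = 0.10  # cached reads: 10% of the input rate
BATCH_FACTOR = 0.50       # batch requests: 50% of the normal price


@dataclass(frozen=True)
class Rates:
    input_per_mtok: float   # price per 1M input tokens
    output_per_mtok: float  # price per 1M output tokens


def request_cost(
    rates: Rates,
    input_tokens: int,
    output_tokens: int,
    cache_read_tokens: int = 0,
    batch: bool = False,
) -> float:
    """Estimate the charge for one completed request."""
    in_rate = rates.input_per_mtok / 1_000_000
    out_rate = rates.output_per_mtok / 1_000_000
    cost = (
        input_tokens * in_rate
        + cache_read_tokens * in_rate * CACHE_READ_FACTOR
        + output_tokens * out_rate
    )
    # Assumption: the batch discount applies to the whole request total.
    return cost * BATCH_FACTOR if batch else cost
```

With placeholder rates of 3.0 (input) and 15.0 (output) per million tokens, a request with 1,000 fresh input tokens, 2,000 cached-read tokens and 500 output tokens comes to 0.003 + 0.0006 + 0.0075 = 0.0111 in the same currency unit, or 0.00555 as a batch request.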
How does this compare to OpenRouter / Helicone / Together / Groq?
OpenRouter is a routing marketplace: it offers many models from many providers at a small markup over each provider's list price. OpenRouter gives you breadth; we give you depth on the Anthropic API specifically, with full Files / Batches / count_tokens support, cheaper Claude, byte-identical Anthropic SSE, and prompt caching with cache_control passed through. If you only use Claude, you'll save more with us.
Helicone is observability + caching in front of upstream APIs — it doesn't change pricing, it adds analytics. You can put Helicone in front of AliCode if you want both.
Together / Groq / Fireworks serve open-source models (Llama, Mixtral, Qwen). Different category. They're cheaper still but they're not Claude.
Latency — am I going to lose seconds?
Our overhead is small: routing through the gateway adds at most roughly 5-15 ms to first-token latency compared with a direct upstream call. The rest of the latency is whatever Anthropic itself delivers. In practice you won't notice the difference, because Claude's own call-to-call variance is much larger than our routing overhead.
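If you want to check the overhead on your own workload, time the first chunk of a streamed response both through the gateway and directly upstream, over several runs. A minimal, client-agnostic sketch: measure_latency is a hypothetical helper, and start_stream is any zero-argument function you supply that sends the request and returns an iterable of chunks (for example, text deltas from your SDK's streaming helper).

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_latency(
    start_stream: Callable[[], Iterable[object]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float]:
    """Return (time to first chunk, total time) in seconds for one streamed call.

    The timer starts before start_stream() is invoked, so connection setup and
    request sending are included. First-chunk time is None if nothing arrived.
    """
    t0 = clock()
    first: Optional[float] = None
    for _chunk in start_stream():
        if first is None:
            first = clock() - t0
    return first, clock() - t0
```

Because single calls vary a lot, compare medians over a few dozen runs rather than individual measurements.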
Can I use this in production?
Yes — we serve production traffic for paying customers and treat the gateway as a production system: monitored, auto-scaled, replicated, with key rotation and incident response. We don't publish formal SLAs on the free tier, but we target 99.5% monthly availability for paid accounts and publish status at /v1/health.
What's the difference between sending an image as base64 vs uploading via /v1/files?
Functionally identical — same vision quality, same token cost on the image side. The difference is bandwidth and request size: base64 inflates by ~33% and gets re-sent on every turn of a long conversation; file_id is uploaded once and referenced cheaply afterwards.
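A minimal sketch of the two content-block shapes. They follow the Anthropic Messages API: an inline base64 source, and a file source that references an upload by file_id (on the official API the Files API also requires a beta header, which we assume applies here too). The helper names and the file_abc123 id are hypothetical.

```python
import base64
import json


def base64_image_block(image_bytes: bytes, media_type: str = "image/png") -> dict:
    """Inline image: the encoded bytes travel with every request that includes it."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": media_type,
            "data": base64.b64encode(image_bytes).decode("ascii"),
        },
    }


def file_image_block(file_id: str) -> dict:
    """Reference to an image already uploaded via /v1/files."""
    return {"type": "image", "source": {"type": "file", "file_id": file_id}}


def payload_size(block: dict) -> int:
    """Bytes this block adds to a JSON request body."""
    return len(json.dumps(block).encode("utf-8"))
```

For a 3,000-byte image the inline block carries a 4,000-character data string on every turn that includes it, while the file reference stays well under 100 bytes.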
Do you support webhooks / async callbacks?
Not yet. For async workloads use Message Batches — poll GET /v1/messages/batches/{id} and act when processing_status == "ended". Webhooks are on the roadmap for Q3, alongside Citations and Code Execution.
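The polling loop can be sketched as follows. fetch_batch is a hypothetical function you supply that performs GET /v1/messages/batches/{id} and returns the parsed JSON; the 30-second interval and 24-hour timeout are arbitrary defaults, not documented limits.

```python
import time
from typing import Any, Callable, Mapping


def wait_for_batch(
    fetch_batch: Callable[[str], Mapping[str, Any]],
    batch_id: str,
    interval: float = 30.0,
    timeout: float = 24 * 3600,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Mapping[str, Any]:
    """Poll a Message Batch until processing_status == "ended" or the timeout expires."""
    deadline = clock() + timeout
    while True:
        batch = fetch_batch(batch_id)  # parsed JSON of GET /v1/messages/batches/{id}
        if batch["processing_status"] == "ended":
            return batch
        if clock() >= deadline:
            raise TimeoutError(f"batch {batch_id} still {batch['processing_status']}")
        sleep(interval)
```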
Do you support team / shared accounts?
Not yet — every account is single-user. For team setups today, we recommend creating one account per team and using per-developer API keys (each key gets its own name + spend cap, so you can attribute usage). Team workspaces with shared billing are planned.
How do you handle abuse / runaway scripts?
Three lines of defence:
- Per-key spend cap — set a dollar limit per key in the dashboard; once hit, the key auto-revokes.
- Per-key RPM limit — cap requests-per-minute per key for protection against infinite loops.
- Account-level emergency stop — one click in the dashboard revokes all keys at once.
Can I get a discount for higher volume?
Our public rates are already aggressive. If you're spending more than $500/month consistently, email hello@alicode.store with your use case — we can sometimes arrange custom rates for sustained heavy users or commercial partnerships.
Why is my Cursor stuck saying "Named models unavailable"?
Cursor on the free plan restricts arbitrary model names from the OpenAI-style endpoint. Workaround: in Cursor's model list, add a known OpenAI name like gpt-4o or gpt-4-turbo instead of claude-sonnet-4-5. Our gateway recognises those aliases and routes the request to the matching Claude model anyway.
I'm getting 401 errors but my key looks right
Common causes:
- The key was revoked and you cached it locally — issue a new one in the dashboard.
- You copied an extra space or newline at the end — strip whitespace.
- You're sending the Bearer prefix on the x-api-key header, or the raw key on the Authorization header (which needs the Bearer prefix).
- Your account is suspended for abuse; check your email for a notice from us.
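The two header conventions are easy to mix up. A minimal sketch that builds the right headers for each API surface and strips copy-paste whitespace; auth_headers is a hypothetical helper, and the anthropic-version value is the one the official Anthropic API expects, assumed to be accepted here unchanged since the wire format is the same.

```python
def auth_headers(api_key: str, api: str = "anthropic") -> dict:
    """Build request headers for AliCode's Anthropic- or OpenAI-style endpoint."""
    key = api_key.strip()  # removes stray spaces/newlines picked up when copying
    if not key.startswith("sk-ali-"):
        # Catches a pasted "Bearer sk-ali-..." or a key from another provider.
        raise ValueError("expected a raw key starting with sk-ali-")
    if api == "anthropic":
        # Messages API: raw key in x-api-key, no Bearer prefix.
        # anthropic-version is required by the official API; assumed accepted here.
        return {
            "x-api-key": key,
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        }
    if api == "openai":
        # Chat Completions API: key goes in Authorization with a Bearer prefix.
        return {"Authorization": f"Bearer {key}", "content-type": "application/json"}
    raise ValueError(f"unknown api: {api!r}")
```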
Acceptable Use Policy
By using AliCode you agree not to: generate content sexualising minors; harass, dox, or defame; produce content designed to mislead or impersonate real people; create mass disinformation or spam; attempt to break into systems you don't own; sign up multiple accounts to evade caps or bans; resell access without a prior written agreement; or use the service in ways that violate Anthropic's own AUP. Violations result in immediate account suspension. We do not refund credits on suspended accounts.
Terms of Service (summary)
AliCode is provided "as is", without warranty of any kind. The Service is operated in accordance with, and these Terms are governed by, the laws of the Russian Federation. You are responsible for the security of your API keys and the legality of your use. We may suspend accounts for AUP violations without refund. Credits are non-transferable between accounts. Pricing may change with reasonable notice; existing balance is honoured at the rate at which it was purchased. See the full Terms of Service for details.
Contact & support
We typically reply to email within 24 hours on weekdays. For server-side bugs, include the value of the request-id response header from the failed call; that's how we find your trace in the logs.