v0.3.1 · stable · zero telemetry

Your demo didn't fail
because of the model.
It failed because your API key
got rate-limited.

AKM sits quietly in your backend and makes sure your LLM calls don't randomly collapse — right when it matters. Smart rotation, auto-failover, zero proxy.

$ npm install ai-key-manager
★ Star on GitHub · View on npm · See the code →
🔒 Zero Telemetry — local-first
🔄 Auto-Failover — across providers
🔑 Secrets Redacted — even in logs
🧠 Smart LRU — no manual rotation

It always happens
at the worst time.

❌ Without AKM
# Your demo, live, investors watching
await openai.chat.completions.create(...)
 
Error 429: Rate limit exceeded
Retry after: 60s
 
# Silence.
# Your key is cooling.
# Demo is dead.
✓ With AKM
# Same demo. Different outcome.
await scheduler.withRetry({
  execute: async ({ apiKey, model }) => {
    return callLLM({ apiKey, model })
  }
})
 
✓ Key-1 rate-limited → rotated to Key-2
✓ Response in 320ms
✓ Demo running smoothly.

One key hits its limit → everything breaks. Fallback logic scattered everywhere. Retry logic = guesswork. Logs accidentally exposing secrets. You've been there. AKM fixes that entire layer — in one wrapper.

Built For

Anyone who can't afford
a failed request.

Whether you're shipping in 48 hours or running in production — AKM keeps your AI layer stable.

🏗️

MVP Builders

Ship fast without worrying about rate limits breaking your demo at 2am.

🏆

Hackathon Teams

Your demo in front of the judges never fails. Multiple keys, multiple providers — fully covered.

🎓

Students & Researchers

Free tier keys spread intelligently. Never lose an experiment to a 429.

🚀

Indie Hackers

Multi-provider setups managed automatically. Focus on the product, not the plumbing.

How It Works

One wrapper.
Everything handled.

You write only the generation logic. AKM handles key selection, retries, cooldowns, failover, and health tracking automatically.

01 🎯

Pick the Best Key

Greedy LRU selection picks the least-recently-used healthy key. Rate-limited keys cool in a min-heap and return to the pool automatically.
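
Conceptually, the selection step looks something like the sketch below — illustrative names (pool, cooldownHeap), not AKM's actual internals.

// Illustrative sketch — a sorted array stands in for the real min-heap.
function pickKey(pool, cooldownHeap, now = Date.now()) {
  // Return any cooled-down keys to the healthy pool first.
  while (cooldownHeap.length && cooldownHeap[0].resetAt <= now) {
    pool.push(cooldownHeap.shift().key)
  }
  // Greedy LRU: the healthy key used longest ago wins.
  pool.sort((a, b) => a.lastUsedAt - b.lastUsedAt)
  const key = pool[0] ?? null // null → every key is still cooling
  if (key) key.lastUsedAt = now
  return key
}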

02 🔄

Auto-Rotate on 429

Hit a rate limit? AKM marks that key, rotates to the next healthy one, and retries — all inside withRetry(). You never see the failure.

03 🛣️

Provider Failover

If an entire provider route breaks (404, 403, UPSTREAM_ERROR), AKM cascades to the next configured provider/model group automatically.
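
Under the hood, a cascade is essentially an ordered walk over route groups. A minimal sketch of the pattern — routes and run are made-up names, not AKM's real code:

// Minimal sketch of route-level failover.
async function cascade(routes, run) {
  let lastError
  for (const route of routes) {  // provider/model groups, in configured order
    try {
      return await run(route)    // first route that answers wins
    } catch (err) {
      lastError = err            // 404 / 403 / UPSTREAM_ERROR → try the next route
    }
  }
  throw lastError                // every configured route failed
}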

04 🧠

Route Memory

After a successful fallback, AKM remembers that route and prefers it on future calls. Smarter with every request.
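
Extending the cascade sketch above, route memory is just a preferred index that gets promoted on success — again illustrative, not AKM's code:

// Illustrative: remember the last winning route and try it first next time.
let preferredIndex = 0
async function cascadeWithMemory(routes, run) {
  const order = [routes[preferredIndex], ...routes.filter((_, i) => i !== preferredIndex)]
  let lastError
  for (const route of order) {
    try {
      const result = await run(route)
      preferredIndex = routes.indexOf(route) // future calls start here
      return result
    } catch (err) {
      lastError = err
    }
  }
  throw lastError
}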

05 🔒

Secret Redaction

API keys wrapped in SecretString. console.log, JSON, inspect — all show [REDACTED]. Zero leaks.
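
The pattern behind this is overriding every stringification hook on the wrapper. A minimal sketch of the same idea (AKM's real SecretString may differ in detail):

// Minimal sketch of the redaction pattern — AKM's real class may differ.
import util from "node:util"

class RedactedSecret {
  #secret
  constructor(secret) { this.#secret = secret }
  value() { return this.#secret }                  // the only way to the raw key
  toString() { return "[REDACTED]" }               // String(), template literals
  toJSON() { return "[REDACTED]" }                 // JSON.stringify
  [util.inspect.custom]() { return "[REDACTED]" }  // console.log, util.inspect
}

const key = new RedactedSecret(process.env.OPENROUTER_KEY_A7F3)
console.log(key)          // [REDACTED]
console.log(key.value())  // raw key — call only at the request site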

06 📂

State Persistence

Non-secret state (key IDs, cooldowns, health) persists across restarts via FileStateAdapter. Raw keys never touch disk.

Mental Model

Two layers.
Total control.

AKM thinks in two levels — providers and their keys. Each level has independent logic.

L1
Provider / Model Level
route failover

Each provider + model pair is a route. AKM tries routes in order, remembers which one works, and falls back if one breaks. You can configure OpenRouter, Google, Vercel AI Gateway, or any custom endpoint side-by-side.

openrouter / gemma-4
google / gemini-2.5-flash
vercel / claude-sonnet
L2
Key Level (per provider)
LRU + cooldown heap

Each provider has multiple keys. AKM picks the least-recently-used healthy key, tracks health scores, and moves rate-limited keys to a cooldown heap sorted by reset time. Rate limits on one key never block another.

or-a7f3 · healthy · 0ms ago
or-k2m9 · cooling · 38s left
or-q4x8 · healthy · 12s ago
Auto-Pick Mode
zero config selection

Don't specify any provider or model. AKM tries all configured groups in order, cascading on failure. The simplest way — and the most resilient.

const result = await scheduler.withRetry({
  // No provider. No model. Just the logic.
  execute: async ({ apiKey, provider, model }) => {
    return callLLM({ apiKey, provider, model, prompt })
  }
})
Features

Everything you'd have
to build yourself.

93 tests. Battle-tested. Every edge case covered.

🔁

Rate-Limit Aware Cooldowns

Detects 429, "rate limit", "quota", "exhausted" — marks the key, waits out the cooldown, retries automatically. Uses a min-heap for O(log n) scheduling.

auto-retry
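
Detection amounts to matching status codes and message patterns — roughly like this illustrative classifier (AKM's real matching may be broader):

// Illustrative classifier — AKM's real matching may be broader.
const RATE_LIMIT_HINTS = /rate limit|quota|exhausted/i

function isRateLimited(err) {
  if (err?.status === 429 || err?.response?.status === 429) return true
  return RATE_LIMIT_HINTS.test(String(err?.message ?? ""))
}
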
🛡️

SecretString Redaction

Every API key is wrapped in SecretString. It redacts itself in console.log, JSON.stringify, String(), and util.inspect. Only .value() reveals it.

security

🛣️

Provider Cascade

Route fails with UPSTREAM_ERROR, 404, or blacklist? AKM automatically tries the next configured provider/model group. No manual fallback logic needed.

failover
🧠

Health Score Tie-Breaking

Keys degrade on rate limits and recover on success. When multiple keys are equal in LRU, health score breaks the tie — always picks the most reliable key.

intelligence
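
As a comparator, that's LRU first, health second — a sketch with illustrative field names:

// Sketch: LRU order first, health score breaks the tie.
const keys = [
  { id: "or-a7f3", lastUsedAt: 0, health: 0.9 },
  { id: "or-k2m9", lastUsedAt: 0, health: 0.6 },
]
keys.sort((a, b) =>
  (a.lastUsedAt - b.lastUsedAt) ||  // least recently used wins...
  (b.health - a.health)             // ...healthiest wins among equals
)
// → or-a7f3 is picked: same LRU age, higher health score
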
🌊

Stream Support

Use withStreamRetry() for SSE/streaming. Retries only on startup failures — never mid-stream. Keys marked healthy once the stream begins.

streaming
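
A usage sketch, assuming withStreamRetry mirrors withRetry's execute shape — openSSEStream is a placeholder for your own streaming call:

// Assumed shape; openSSEStream is a placeholder, not a real export.
const stream = await scheduler.withStreamRetry({
  execute: async ({ apiKey, provider, model }) => {
    return openSSEStream({ apiKey, provider, model, prompt })
  }
})
// Failures before the first byte rotate keys and retry; once streaming, no retries.
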
🚫

AbortSignal Support

Pass an AbortSignal to cancel at any point — before acquire, during execute, or while waiting for cooldown. Throws RetryAbortedError.

control
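
A hedged usage sketch — assuming the signal is passed as a top-level option and RetryAbortedError is an exported class (check the README for the exact API):

// Hedged sketch: cancel an in-flight call after 5 seconds.
import { RetryAbortedError } from "ai-key-manager" // assumed export

const controller = new AbortController()
setTimeout(() => controller.abort(), 5_000)

try {
  await scheduler.withRetry({
    signal: controller.signal, // assumed option name
    execute: async ({ apiKey, model, signal }) => callLLM({ apiKey, model, prompt, signal })
  })
} catch (err) {
  if (!(err instanceof RetryAbortedError)) throw err
  // aborted — key state stays consistent
}
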
💾

File State Persistence

Cooldowns and health survive restarts via FileStateAdapter. Atomic write (temp + rename). Only non-secret metadata stored — never raw keys.

persistence
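
Temp-plus-rename is the standard crash-safe write pattern; a sketch of the idea, not AKM's exact code:

// Sketch of the temp + rename pattern.
import { writeFile, rename } from "node:fs/promises"

async function atomicWrite(path, data) {
  const tmp = `${path}.tmp`
  await writeFile(tmp, data)  // write everything to a temp file first...
  await rename(tmp, path)     // ...then swap it in as one atomic step
}

// Only non-secret metadata is ever written:
await atomicWrite(".llm-key-state.json",
  JSON.stringify({ "or-a7f3": { cooldownUntil: 0, health: 1 } }))
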
🔏

HMAC Key Identity

Enable identity checks so AKM detects swapped environment variables after restart. Stores only an HMAC fingerprint — never the key or secret.

optional
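
The fingerprint is a keyed hash of the key material; a sketch using node:crypto, where hmacSecret is your own secret and is never stored with the state:

// Sketch: only this digest is persisted — never the key or the HMAC secret.
import { createHmac } from "node:crypto"

function fingerprint(keyValue, hmacSecret) {
  return createHmac("sha256", hmacSecret).update(keyValue).digest("hex")
}
// On restart, recompute and compare: a mismatch means the env var was swapped.
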
Code

Three lines of setup.
One wrapper. Done.

Works with any AI SDK — LangChain, Vercel AI, raw fetch, Anthropic SDK, anything.

import { KeyScheduler, FileStateAdapter } from "ai-key-manager"

const scheduler = new KeyScheduler({
  providers: [
    {
      name: "openrouter",
      model: "google/gemma-4-26b-a4b-it:free",
      defaultCooldownMs: 60_000,
      keys: [
        { id: "or-a7f3", value: process.env.OPENROUTER_KEY_A7F3 },
        { id: "or-k2m9", value: process.env.OPENROUTER_KEY_K2M9 },
      ]
    },
    {
      name: "google",
      model: "gemini-2.5-flash",
      defaultCooldownMs: 60_000,
      keys: [
        { id: "g-b1r8", value: process.env.GOOGLE_KEY_B1R8 },
        { id: "g-c5t2", value: process.env.GOOGLE_KEY_C5T2 },
      ]
    }
  ],
  state: new FileStateAdapter(".llm-key-state.json")
})
// Auto-pick: no provider/model needed.
// AKM picks the best available group and cascades on failure.

const result = await scheduler.withRetry({
  execute: async ({ apiKey, provider, model, signal }) => {
    // Use the injected values — they change on fallback
    return generateContent({ apiKey, provider, model, prompt, signal })
  }
})
// Targeted: start with a specific route, auto-fallback if it breaks

const result = await scheduler.withRetry({
  provider: "openrouter",
  model: "google/gemma-4-26b-a4b-it:free",

  // Optional explicit fallback list
  fallbacks: [
    { provider: "google", model: "gemini-2.5-flash" },
  ],

  execute: async ({ apiKey, provider, model }) => {
    return callSDK({ apiKey, provider, model, prompt })
  }
})
import { ChatOpenAI } from "@langchain/openai"
import { KeyScheduler } from "ai-key-manager"

export async function askWithLangChain(scheduler, prompt) {
  return scheduler.withRetry({
    execute: async ({ apiKey, provider, model }) => {
      const llm = new ChatOpenAI({
        model,
        apiKey,
        configuration: {
          baseURL: "https://openrouter.ai/api/v1"
        }
      })
      return llm.invoke(prompt)
    }
  })
}
import { generateText } from "ai"
import { createOpenAICompatible } from "@ai-sdk/openai-compatible"
import { KeyScheduler } from "ai-key-manager"

export async function askWithVercelAI(scheduler, prompt) {
  return scheduler.withRetry({
    execute: async ({ apiKey, provider, model }) => {
      const gateway = createOpenAICompatible({
        name: provider,
        apiKey,
        baseURL: "https://ai-gateway.vercel.sh/v1"
      })
      const { text } = await generateText({
        model: gateway(model),
        prompt
      })
      return text
    }
  })
}
Why AKM

vs. rolling your own.

Everything you'd eventually build, tested and shipped.

Feature                            | AKM ✨               | DIY                      | AI Proxy
LRU key selection                  | ✓ Built-in           | ✗ You write it           | ~ Maybe
Rate-limit cooldown heap           | ✓ Min-heap, O(log n) | ✗ setTimeout hacks       | ~ Basic
Provider failover cascade          | ✓ Automatic          | ✗ Nested try/catch       | ~ Varies
Secret redaction in logs           | ✓ SecretString       | ✗ Oops                   | ✗ Rarely
State persistence across restarts  | ✓ FileStateAdapter   | ✗ Cold start every time  | ~ Depends
Zero network calls / telemetry     | ✓ 100% local         | ✓ If you build it right  | ✗ Your keys travel
Works with any SDK                 | ✓ SDK-agnostic       | ~ Per SDK                | ✗ Locked in
Battle-tested (93 tests)           | ✓ Shipped            | ✗ Who has time           | ~ Unknown

Stop losing demos
to rate limits.

Add AKM to your backend in 5 minutes. No proxy. No platform. No lock-in.

$ npm i ai-key-manager