Flex Inference: 50% Off LLM Calls on Gemini, OpenAI, and Bedrock
Every major AI provider now offers half-price inference if you can tolerate a few extra seconds of latency. One parameter change. Same API. Here's how it works and why.
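As a sketch of what "one parameter change" looks like: on OpenAI's API the flag is `service_tier="flex"`, and the rest of the request is unchanged. (Gemini and Bedrock spell their discounted tiers differently; the request builder below is a hypothetical helper, not a library function, and the model name is illustrative.)

```python
# Minimal sketch of a flex-tier request body for OpenAI's
# Chat Completions API. The only flex-specific piece is
# service_tier="flex" -- everything else is an ordinary request.
def build_flex_request(model: str, prompt: str) -> dict:
    return {
        "model": model,
        "service_tier": "flex",  # half-price, best-effort latency
        "messages": [{"role": "user", "content": prompt}],
    }

params = build_flex_request("o4-mini", "Summarize this log file.")
print(params["service_tier"])  # flex
```

The same dict can be passed straight to the client's `chat.completions.create(**params)`; if the flex queue is busy, requests may wait longer or time out, which is the trade you make for the discount.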
Blog
Most of the site lives in the blog archive.
Using Gemini's bounding box detection to get precise measurements when converting a screenshot to code. Plus how prompt caching and flex inference make the multi-pass approach surprisingly cheap.
Universal Commerce Protocol lets AI agents buy things. Here's how developers can monetize it and what store owners need to know.
How to configure SSH so Claude Code can run commands on remote servers.
Start Here
If you want the overall picture, start with the archive.
Tools
Small utilities and extensions.
Paste a YouTube URL. AI watches the video and writes a scroll-synced text breakdown.
Calculator for checking API costs across Claude, GPT, and Gemini, including prompt caching.
Clamp() calculator plus the tutorial explaining the math behind it.
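The cost calculator above boils down to simple per-token arithmetic; here is a minimal sketch. The prices are placeholder values for illustration, not any provider's real rates; the point is that cached input tokens bill at a steep discount, which is what makes prompt caching pay off on long, repeated prefixes.

```python
# Per-call cost from token counts and per-million-token prices.
# All rates below are hypothetical placeholders.
def call_cost(uncached_in: int, cached_in: int, out: int,
              in_price: float, cached_price: float,
              out_price: float) -> float:
    return (uncached_in * in_price
            + cached_in * cached_price
            + out * out_price) / 1_000_000

# Example: 2,000 fresh input tokens, 50,000 cached, 500 output,
# with placeholder rates of $3 / $0.30 / $15 per million tokens.
print(round(call_cost(2_000, 50_000, 500, 3.0, 0.30, 15.0), 4))  # 0.0285
```

Without caching, the same call would bill all 52,000 input tokens at the full rate, roughly five times the input cost in this example.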