<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Prompt-Caching on Yushi's Blog</title><link>https://blog.yushi91.com/blog/prompt-caching/</link><description>Recent content in Prompt-Caching on Yushi's Blog</description><generator>Hugo</generator><language>en-US</language><copyright>Copyright © 2025, Yushi Cui.</copyright><lastBuildDate>Thu, 23 Apr 2026 10:00:00 +1200</lastBuildDate><atom:link href="https://blog.yushi91.com/blog/prompt-caching/index.xml" rel="self" type="application/rss+xml"/><item><title>Prompt Caching Changed How I Think About AI Agent Costs</title><link>https://blog.yushi91.com/blog/prompt-caching-cost-savings/</link><pubDate>Thu, 23 Apr 2026 10:00:00 +1200</pubDate><guid>https://blog.yushi91.com/blog/prompt-caching-cost-savings/</guid><description>&lt;p>&lt;img src="https://blog.yushi91.com/prompt-caching.webp" alt="Prompt caching" loading="lazy">
&lt;/p>
&lt;h2 id="what-i-thought-i-knew">What I thought I knew&lt;/h2>
&lt;p>Before this week, my mental model of AI pricing was simple. You pay per token. Input tokens cost some amount, output tokens cost more, and you add them up. The system prompt counts as input. Tool definitions count as input. Every conversation starts from zero, and you pay for every token you send.&lt;/p>
&lt;p>Predictable. Also wrong, or at least not the whole picture.&lt;/p>
&lt;p>Then I read the Anthropic docs properly for once and ran into a feature called prompt caching. It changed how I think about designing agent workflows.&lt;/p></description></item></channel></rss>