Key takeaways from two Cursor field engineering sessions
The right framing
Don't optimize individual prompts — optimize task completion. Token efficiency is the outcome of solving tasks quickly and correctly, not of counting tokens per prompt.
The model sets roughly 75% of output quality, and that part is fixed. The remaining 25% is yours to control, and it all comes down to context.
The real cost isn't the plan — it's the correction loop. Skipping planning leads to blind edits, bugs to fix, and multiple back-and-forths. That wave of fix cycles costs far more tokens than a good upfront plan.
Recommended workflow
Model selection guide
| Use case | Model to reach for |
|---|---|
| Planning, architecture, complex reasoning | Frontier / thinking model — Sonnet, GPT, Opus |
| Executing a clear, detailed plan | Composer 1.5 — fast and cost-effective middle ground |
| Docs, research, codebase exploration, small features | Composer 1 — cheapest, still very capable for routine tasks |
| Routing by task complexity automatically | Auto mode — defaults to Composer 2, routes up when needed |
| Sub-agents running isolated parallel work | Composer 1, or inherit from parent for simple output tasks |
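The table above can be read as a simple lookup from task type to model. The sketch below is only an illustration of that reading: the `Task` enum, the `MODEL_FOR_TASK` mapping, and `pick_model` are hypothetical names, not part of any Cursor API, and the model labels are copied from the table.

```python
from enum import Enum


class Task(Enum):
    """Hypothetical task categories mirroring the rows of the table."""
    PLANNING = "planning"          # planning, architecture, complex reasoning
    EXECUTE_PLAN = "execute_plan"  # executing a clear, detailed plan
    ROUTINE = "routine"            # docs, research, exploration, small features
    SUBAGENT = "subagent"          # isolated parallel work
    UNKNOWN = "unknown"            # complexity not known up front


# Model labels are taken from the table; the mapping itself is an assumption.
MODEL_FOR_TASK = {
    Task.PLANNING: "frontier/thinking model",
    Task.EXECUTE_PLAN: "Composer 1.5",
    Task.ROUTINE: "Composer 1",
    Task.SUBAGENT: "Composer 1",
}


def pick_model(task: Task) -> str:
    """Return the suggested model, falling back to Auto mode when unsure."""
    return MODEL_FOR_TASK.get(task, "Auto")
```

For example, `pick_model(Task.PLANNING)` returns the frontier/thinking label, while an unclassified task falls back to `"Auto"`, which matches the table's suggestion to let Auto mode route by complexity.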
Context management
Tag exact resources with @
Don't let the agent search the entire repo. Tag the specific file, doc, or past chat. It short-circuits semantic search — fewer tokens, faster, more accurate.
Keep context below 60–65%
Exceeding the context limit triggers compaction: prior context gets compressed and quality degrades, but you still pay for all of it. Monitor the context indicator in every chat.
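As a rough illustration of the 60–65% guideline, the check below computes context usage as a fraction of the window and flags when it is time to act. The function names, the token counts, and the default threshold of 0.60 are assumptions chosen for the example; the real indicator lives in the chat UI.

```python
def context_usage(tokens_used: int, window_size: int) -> float:
    """Fraction of the context window currently in use."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return tokens_used / window_size


def should_act(tokens_used: int, window_size: int, threshold: float = 0.60) -> bool:
    """True once usage reaches the threshold (assumed 60% here)."""
    return context_usage(tokens_used, window_size) >= threshold
```

With a hypothetical 200,000-token window, 100,000 tokens used is still comfortable, while 130,000 tokens (65%) is past the point where summarizing or starting a new chat is worth considering.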
Start a new chat per task
Every prior message in a thread inflates all future requests. When switching tasks or features, open a fresh agent — don't continue an unrelated thread.
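To see why long threads get expensive, consider a simplified model in which every request resends the whole thread so far. The sketch below assumes a fixed number of tokens added per turn and ignores caching discounts, so the figures are illustrative rather than a billing estimate; `total_input_tokens` is a hypothetical helper.

```python
def total_input_tokens(turn_sizes: list[int]) -> int:
    """Total input tokens sent if each request resends the full history."""
    total = 0
    history = 0
    for size in turn_sizes:
        history += size   # the thread grows by this turn's tokens
        total += history  # and the whole thread is sent with the request
    return total


one_long_thread = total_input_tokens([1000] * 10)       # 55000
two_fresh_threads = 2 * total_input_tokens([1000] * 5)  # 30000
```

Under these assumptions, ten turns in one thread send 55,000 input tokens, while the same ten turns split across two fresh chats send 30,000. The cost grows roughly with the square of the thread length, which is why splitting unrelated work into separate chats pays off.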
Summarize manually, not automatically
Auto-summaries give equal weight to everything. Instead: "Summarize focusing on X — save as markdown." Paste the result into a new chat. Context drops dramatically while preserving what matters.
Rules, skills & sub-agents
Rules
Static context added to every request. Keep them short and essential — every rule adds tokens to every single call. Don't replicate a linter or formatter here.
Scope tip: nest rules inside /frontend or /backend dirs so they only apply to those files.
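The scoping idea can be pictured as "a rule applies to a file only if the rule's directory is an ancestor of that file". The sketch below is a simplified model of that behavior, not Cursor's actual resolution logic; `applicable_rules` and the directory-to-rule mapping are hypothetical.

```python
from pathlib import PurePosixPath


def applicable_rules(file_path: str, rule_dirs: dict[str, str]) -> list[str]:
    """Return the rules whose directory contains the given file.

    rule_dirs maps a directory (relative to the repo root, "." for the
    root itself) to a rule label. This layout is an assumption.
    """
    ancestors = PurePosixPath(file_path).parents
    return [rule for d, rule in rule_dirs.items() if PurePosixPath(d) in ancestors]


rules = {
    ".": "repo-wide conventions",
    "frontend": "frontend rules",
    "backend": "backend rules",
}
```

Here `applicable_rules("frontend/src/App.tsx", rules)` picks up the repo-wide and frontend rules but not the backend ones, so backend guidance never adds tokens to frontend requests.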
Skills
Only the name + description enter the context window upfront. The full skill body is loaded on demand when relevant. You can have many skills without bloating every prompt — this is called progressive disclosure.
The agent can update a failing skill itself. MCPs now work identically — loaded as skills, not dumped into context at startup.
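Progressive disclosure can be sketched as two steps: put only each skill's name and description into the initial context, then fetch the full body when a skill is actually needed. The `Skill` dataclass and both functions below are hypothetical illustrations of that pattern, not Cursor's implementation.

```python
from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    description: str
    body: str  # full instructions; kept out of context until requested


def initial_context(skills: list[Skill]) -> str:
    """Only name + description are included up front."""
    return "\n".join(f"{s.name}: {s.description}" for s in skills)


def load_skill(skills: list[Skill], name: str) -> str:
    """Load the full body on demand, when the skill becomes relevant."""
    for s in skills:
        if s.name == name:
            return s.body
    raise KeyError(name)
```

Because the up-front cost per skill is one short line, adding more skills grows the initial context slowly, and the larger bodies are only paid for in the conversations that use them.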
Sub-agents
Best when the how doesn't matter — only the output does. Each sub-agent has its own isolated context window; the parent only receives what you define in the output format.
Don't use them when the task is complex and you'll likely need to give feedback mid-run — you can't follow up with a sub-agent once it's started.
Setting disableModelInvocation: true on a skill means you invoke it manually rather than having the agent invoke it automatically. You can migrate existing commands to skills with /migrate to skills. To browse production examples, go to Settings → Plugins → Marketplace → "Cursor Team Kit".
Caching — a hidden cost driver
Switching models mid-conversation breaks the cache. The full thread must be re-sent to the new model. Pick your model at the start and stick with it for the duration of a task.
Don't edit rules or skills mid-conversation. Once read, they're cached in the thread. Changing them forces a re-read and breaks the cache for everything that follows.
Auto mode can silently break the cache by switching models between turns without telling you. For cache-sensitive tasks, pick a model explicitly.
Forking a chat breaks the cache — it's a brand new session. Sometimes it's worth it (cleaner context = better output), but be aware of the tradeoff.
Adding a new message does not break the cache. Prior context is unchanged. A window refresh also doesn't break it — cache is managed server-side by the LLM provider, not in your browser.
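The behaviors above are consistent with prefix caching, where a cached entry is reusable only if the model and everything before the new message are unchanged. The toy cache below illustrates that idea under stated assumptions: the keying scheme, `PrefixCache`, and the model names are hypothetical, and real provider caches differ in detail. It does not model forking or session boundaries.

```python
import hashlib


def prefix_key(model: str, messages: list[str]) -> str:
    """Hash of the model plus an ordered message prefix."""
    h = hashlib.sha256()
    h.update(model.encode())
    for m in messages:
        h.update(b"\x00")  # separator between messages
        h.update(m.encode())
    return h.hexdigest()


class PrefixCache:
    """Toy server-side cache keyed by (model, message prefix)."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def cached_prefix_len(self, model: str, messages: list[str]) -> int:
        """Return how many leading messages were already cached, then store this request."""
        hit = 0
        for i in range(len(messages), 0, -1):
            if prefix_key(model, messages[:i]) in self._seen:
                hit = i
                break
        for i in range(1, len(messages) + 1):
            self._seen.add(prefix_key(model, messages[:i]))
        return hit
```

In this model, appending a message to a three-message thread reuses all three cached messages, whereas sending the same thread to a different model, or changing the first entry (for example an edited rule), reuses none of them.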
Quick reference — do vs don't
Do
Give the agent a specific, scoped task
Use plan mode for anything non-trivial
Edit the plan before pressing Build
Tag the exact file with @ instead of letting the agent search
Start a new chat when switching tasks
Summarize manually when context gets high
Don't
Say "make the app better" on an expensive model
Switch models in the middle of a task
Edit rules or skills mid-conversation
Let context go above 65% without acting
Use a sub-agent for tasks you'll need to iterate on
Replicate linter/formatter logic inside rules