As you might guess, I find AI to be extremely good at some things and actively terrible at others. A lot of the things I read and demos I watch all have to do with reasonable code bases. Our code base is not reasonable. It's over 20 years old, coded by a multitude of different developers with different coding styles, design patterns, etc. What you would expect from a small company that has been slowly growing for decades. I also work in the medical field, so slowly evolving, old code bases are the norm.
One of the reasons the AI fails constantly is that it has no context of the entire code base. It simply can't keep that context in scope for every session. So it actively adds bloat to the system unless it's guided by a skilled developer who already knows the system. I am using memories where I can, but it has to regularly read huge chunks of code, and that uses a lot of tokens. I regularly hit my limit on a Pro account.
What are your experiences with this? Any tips or tricks to improve output and cost? Any help is most appreciated. Thanks!
I had the same issue. Take your time and describe the context. Create documentation for the project. I used BMAD for this. It created a specification for me; by checking the code, I answered a lot of questions and clarified a lot of things. And once I had this done, I now start each session with: read the documents, prepare yourself for implementation, have a good context so that you don't make any errors.
Afterwards, I do my thing: fix, test. At the end I update the documentation: update lessons learned, add new rules if necessary, update the architecture, etc.
Over time, I built up enough of a base that most fixes now take only 2-3 iterations, which is good enough for me.
I use Opus.
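To make the start/end-of-session routine concrete, here's a sketch of how I'd script it. The doc file names are my own assumptions, not anything BMAD mandates, and the stand-in content is only there so the snippet runs:

```shell
# Stand-in docs so the snippet is self-contained; replace with your real generated docs.
mkdir -p docs
echo "# Architecture overview" > docs/architecture.md
echo "# Lessons learned" > docs/lessons-learned.md

# Session start: bundle the docs into one context file to feed the agent.
cat docs/architecture.md docs/lessons-learned.md > session-context.md
echo "Read the documents above and prepare for implementation." >> session-context.md

# Session end: append what you learned so the next session starts smarter.
echo "- $(date +%F): rounding bug was caused by float arithmetic" >> docs/lessons-learned.md
```

The point is that the docs, not the chat history, are the durable memory: every session starts from the same curated context instead of whatever survived compaction.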
In my company we have tried using Claude for exactly the same task you have. The results were bad. We discovered a few interesting things, but most of the stuff was wrong: we had to dig through the code base the old way to confidently accept/reject what Claude was telling us. So we could have saved a lot of time and money simply by doing it ourselves. As an upside, we also learned about the codebase, so now people rely on us for that (which feels good too).
But to the actual question: a lot of people's gut instinct on how to solve this doesn't work. They start going down the road of "well, if I teach the AI about my legacy codebase, it will be smarter, and therefore more efficient." But all you wind up doing is consuming all of your available context with irrelevancies, and your agent gets dumber and costs more.
What you actually need to do is tackle it the same way a human would: break it down into smaller problems, where the agent is able to keep the "entire problem" within context at once. Meaning 256K tokens or less (file lengths + prompt + outputs). Then use a scratchpad file that holds notes, file references, constraints, and line numbers. That's your compaction protection. Restart the chat with the same scratchpad when you move between minor areas.
Context is your primary limited resource. Fill it only with what absolutely needs to be there, and nothing else at all.
Usually it's an iterative process; if done correctly you could end up with a much better codebase. Good luck!
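To make the scratchpad idea concrete, here's roughly what one of mine looks like. The file name, sections, and paths are just one possible layout, not a standard:

```shell
# Create a scratchpad the agent re-reads after every restart or compaction.
cat > scratchpad.md <<'EOF'
# Task: fix rounding bug in invoice totals
## Constraints
- Do not change the public API of billing/invoice.py
- All money math must use Decimal, never float
## Key files
- billing/invoice.py:120-180   (rounding logic)
- tests/test_invoice.py        (existing coverage)
## Decisions so far
- Root cause confirmed: float accumulation in the line-item sum
EOF
```

Because it's a plain file on disk, it survives chat restarts: you point every fresh session at it first, and you update it yourself as decisions get made.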
Then for execution: use plan mode. Always let it write a plan first; check it, correct it, and only then allow it to implement.
Try to break big tasks down into small substeps. As small as possible. Let it implement changes iteratively. Let it make a lot of local git commits. Both Codex and Claude Code use these commits as documentation as well.
Basically, treat it like a junior developer working under you.
Make sure appropriate tests are written for every code change.