35 comments

  • pacoWebConsult 12 hours ago
    Why would you bloat the (already crowded) context window with 27 tools instead of the 2 simplest ones: Save Memory & Search Memory? Or even just search, handling the save process through a listener on a directory of markdown memory files that Claude Code can natively edit?
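
For illustration, the two-tool idea above (save and search over a directory of markdown files) could look roughly like the sketch below. The function names `save_memory` and `search_memory`, the one-note-per-line layout, and the plain substring matching are all assumptions made for this example, not part of any existing tool.

```python
from pathlib import Path
import tempfile


def save_memory(memory_dir: Path, name: str, text: str) -> Path:
    """Append one note to <memory_dir>/<name>.md, creating the file if needed."""
    memory_dir.mkdir(parents=True, exist_ok=True)
    path = memory_dir / f"{name}.md"
    with path.open("a", encoding="utf-8") as f:
        f.write(text.rstrip() + "\n")
    return path


def search_memory(memory_dir: Path, query: str, limit: int = 5) -> list[tuple[str, str]]:
    """Return up to `limit` (filename, line) pairs whose line contains every query word."""
    words = query.lower().split()
    hits: list[tuple[str, str]] = []
    for path in sorted(memory_dir.glob("*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            lowered = line.lower()
            if words and all(w in lowered for w in words):
                hits.append((path.name, line.strip()))
                if len(hits) == limit:
                    return hits
    return hits


if __name__ == "__main__":
    # Hypothetical usage against a throwaway directory.
    demo_dir = Path(tempfile.mkdtemp())
    save_memory(demo_dir, "prefs", "- API rate limit is 1000 req/min")
    print(search_memory(demo_dir, "rate limit"))
```

Because the files are ordinary markdown, an agent that can already edit files would not need the save tool at all, which is the second variant the comment describes.
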
    • athrowaway3z 10 hours ago
      MCP's are toys for point-and-click devs that no self-respecting dev has any business using.

      Case in point; I'm mostly a Claude user, which has decent background process / BashOutput support to get a long-running process's stdout.

      I was using codex just now, and its processes support is ass.

      So I asked it, give me 5 options using cli tools to implement process support. After 3 min back and forth, I got this: https://github.com/offline-ant/shellagent-tools/blob/main/ba...

      Add a single line to AGENTS.md:

      > the `background` tool allows running programs in the background. Calling `background` outputs the help.

      Now I can go "background ./server; try thing. investigate" and it has access to the stdout.

      Stop pre-trashing your context with MCPs people.

    • fishmicrowaver 12 hours ago
      People are just ricing out AI like they rice out Linux, nvim or any other thing. It's pretty simple to get results from the tech. Use the CLI and know what you're doing.
      • j45 3 hours ago
        Fair points, share how you are learning - seems to be more than one way to the same result.
    • elfenleid 12 hours ago
      That's a great point. The reality is that context, at least in my experience, is brittle and loses precision over time. This is an always-available, persistent way for Claude to access "memories". I've been running with it for about a week now and have not felt the context getting bloated.
      • j45 3 hours ago
        I do notice building up context makes a difference. Having the context modular helps too.
    • fennecbutt 5 hours ago
      Yes, exactly this. But idiot VC funding (which YC is also somewhat engaged in I imagine) cries for MCP. Hence multi billion valuations and many million dollar salaries and bonuses being thrown around.

      It's ridiculous and ties into the overall state of the world tbh. Pretty much given up hoping that we'll become an enlightened species.

      So let's enjoy our stupid MCP and stupid disposable plastic because I don't see any way that we aren't gonna cook ourselves to extinction on this planet. :)

    • jrecyclebin 10 hours ago
      While I totally agree with you, I also can see a world where we just throw a ton of calls in the MCP and then wrap it in a subagent that has a short description listing every verb it has access to.
      • elfenleid 3 hours ago
        Absolutely. Remember these are just tools; how each of us uses them is a different story. A lot can also be gained by adding a couple of lines to CLAUDE.md on how Claude should use this memory solution, or not; it's totally up to you. You can also have a subagent responsible for project management that is in charge of managing memory, or have a coordinator. Again, a lot of testing needs to be done :)
  • dfee 1 hour ago
    The code is written by Claude, the README is written by Claude, this HN post is written by Claude.

    My God, there’s no signal. It’s all noise.

  • Merad 1 hour ago
    I built a memory tool about 6 months ago while playing with MCP; it was based on a SQLite db. My experience then was that Claude wasn't very good at using the tools. Even with instructions to be proactive about searching memory and saving new memories, it would rarely do so. Once you did press it to be sure to save memories, it would go overboard, basically saving every message in the conversation as a memory. Are you seeing more success in getting natural and seamless usage of the memory tools?

    IIRC at the time I was testing with Sonnet 3.7, I haven't tried it on the newer models.

    Repo here: https://github.com/mbcrawfo/KnowledgeBaseServer

  • bryanhogan 13 hours ago
    Why would you not use context files in form of .md? E.g. how the SpecKit project does it.
    • elfenleid 12 hours ago
      I still do, but having this allows for strategies like memory decay for older information. It also allows for much more structured searching than opening files, which are less structured.

      .md files work great for small projects. But they hit limits:

      1. Size - 100KB context.md won't fit in the window
      2. No search - Claude reads the whole file every time
      3. Manual - You decide what to save, not Claude
      4. Static - Doesn't evolve or learn

      Recall fixes this:
      - Semantic search finds relevant memories only
      - Auto-captures context during conversations
      - Handles 10k+ memories, retrieves top 5
      - Works across multiple projects

      Real example: I have 2000 memories. That's 200KB in .md form. Recall retrieves 5 relevant ones = 2KB.

      And of course, there's always the option to use both .md for docs, Recall for dynamic learning.

      Does that help?
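
To make the "retrieve only the top 5" idea concrete, here is a minimal sketch of top-k retrieval. It is not Recall's implementation: real semantic search would compare embedding vectors, whereas this example swaps in a plain bag-of-words cosine similarity so it runs with the standard library alone. The names `vectorize`, `cosine`, and `top_k` are hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text: str) -> Counter:
    """Bag-of-words stand-in for an embedding: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(memories: list[str], query: str, k: int = 5) -> list[str]:
    """Return at most k memories, most similar to the query first; zero-score items are dropped."""
    q = vectorize(query)
    scored = [(cosine(vectorize(m), q), m) for m in memories]
    scored = [(s, m) for s, m in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:k]]
```

Only the returned strings would be injected into the context, so the cost per query depends on `k` rather than on the total number of stored memories.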

      • bryanhogan 12 hours ago
        I'm not sure. You don't use a single context.md file, you use multiple and add them when relevant in context. AIs adjust these as you need, so they do "evolve". So what you try to achieve is already solved.

        These two videos on using Claude well explain what I mean:

        1. Claude Code best practices: https://youtu.be/gv0WHhKelSE

        2. Claude Code with Playwright MCP and subagents: https://youtu.be/xOO8Wt_i72s

        • elfenleid 12 hours ago
          Yeah that's a solid workflow and honestly simpler than what I built - I think Recall makes sense when you hit the scale where managing multiple .md files becomes tedious (like 50+ conversations across 10 projects), but you're right that for most people your approach works great and is way less complex.
      • BHSPitMonkey 5 hours ago
        Can't you get recency just from git blame? Editors already show you each source line's last-touch age, even in READMEs, and even though this can get obfuscated (by reformatters, file moves, etc.) it's still a decent indicator.
    • steveklabnik 12 hours ago
      Memory features are useful for the same reason that a human would use a database instead of a large .md file: it's more efficient to query for something and get exactly what you want than it is to read through a large, ultimately less structured document.

      That said, Claude now has a native memory feature as of the 2.0 release recently: https://docs.claude.com/en/docs/claude-code/memory so the parent's tool may be too late, unless it offers some kind of advantage over that. I don't know how to make that comparison, personally.

      • elfenleid 2 hours ago
        The other point here: I wanted something more in line with an LLM's natural language, something that can be queried more efficiently by just using normal language, almost like the way we think normally. We first have a thought and then we go through our memory archive.
      • ebcode 11 hours ago
        Claude’s memory function adds a note to the file(s) that it reads on startup. Whereas this tool pulls from a database of memories on-demand.
        • steveklabnik 9 hours ago
          So hilariously, I hadn't actually read those docs yet, I just knew they added the feature. It seems like the docs may not be up to date, as when I read them in response to your reply here, I was like "wait, I thought it was more sophisticated than that!"

          The answer seems to be both yes and no: see their announcement on youtube yesterday: https://www.youtube.com/watch?v=Yct0MvNtdfU&t=181s

          It's still ultimately file-based, but it can create non-Claude.md files in a directory it treats more specially. So it's less sophisticated than I expected, but more sophisticated than the previous "add this to claude.md" feature they've had for a while.

          Thanks for the nudge to take the time to actually dig into the details :)

          • koakuma-chan 6 hours ago
            It creates them on its own?
            • steveklabnik 5 hours ago
              Okay so, now that I've had time after work to play with it... it doesn't work like in the video! The video shows /memories, but it's /memory, and when I run the command, it seems to be listing out the various CLAUDE.md files, and just gives you a convenient way to edit them.

              I wonder if the feature got cut for scope, if I'm not in some sort of beta of a better feature, or what.

              How disappointing!

              • elfenleid 2 hours ago
                Let me have a look, thanks for reporting that.

                This is very much in development and I keep adding features to it. Any suggestions let me know.

                The way I use it, I add instructions to CLAUDE.md on how and when I want Claude to use Recall:

                ## Using Recall Memory Efficiently

                *IMPORTANT: Be selective with memory storage to avoid context bloat.*

                ### When to Store Memories
                - Store HIGH-LEVEL decisions, not implementation details
                - Store PROJECT PREFERENCES (coding style, architecture patterns, tech stack)
                - Store CRITICAL CONSTRAINTS (API limits, business rules, security requirements)
                - Store LEARNED PATTERNS from bugs/solutions

                ### When NOT to Store
                - Don't store code snippets (put those in files)
                - Don't store obvious facts or general knowledge
                - Don't store temporary context (only current session needs)
                - Don't duplicate what's already in documentation

                ### Memory Best Practices
                - Keep memories CONCISE (1-2 sentences ideal)
                - Use TAGS for easy filtering
                - Mark truly critical things with importance 8-10
                - Let old, less relevant memories decay naturally

                ### Examples
                GOOD: "API rate limit is 1000 req/min, prefer caching for frequently accessed data"
                BAD: "Here's the entire implementation of our caching layer: [50 lines of code]"

                GOOD: "Team prefers Tailwind CSS over styled-components for consistency"
                BAD: "Tailwind is a utility-first CSS framework that..."

                *Remember: Recall is for HIGH-SIGNAL context, not a code repository.*
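
The guidance above mentions an importance rating (8-10 for critical items) and letting old memories decay. One way those two could combine is sketched below, purely as an assumption: the exponential half-life formula, the 30-day default, and the `Memory` fields are hypothetical and not taken from Recall.

```python
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: int   # 1-10, as in the guidance above
    age_days: float   # time since the memory was stored


def decayed_score(memory: Memory, half_life_days: float = 30.0) -> float:
    """Importance halves every `half_life_days` (hypothetical decay rule)."""
    return memory.importance * 0.5 ** (memory.age_days / half_life_days)


def rank(memories: list[Memory], half_life_days: float = 30.0) -> list[Memory]:
    """Order memories by decayed score, highest first."""
    return sorted(memories, key=lambda m: decayed_score(m, half_life_days), reverse=True)
```

Under this rule a three-month-old memory with importance 9 would score lower than a fresh one with importance 3, so a longer half-life (or no decay at all) may suit constraints that should never fade.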

              • elfenleid 2 hours ago
                Hey! You're mixing up two different things:

                1. Claude Desktop's built-in `/memory` command (what you tried) - just lists CLAUDE.md files
                2. Recall MCP server (this project) - completely separate tool you need to install

                Recall doesn't work through slash commands. It's an MCP server that needs setup:

                1. Install: npm install -g @joseairosa/recall
                2. Add to claude_desktop_config.json
                3. Restart Claude Desktop
                4. Then Claude can use memory tools automatically in conversation

                Quick test after setup: "Remember: I prefer TypeScript" - Claude will store it in Redis.

                • steveklabnik 22 minutes ago
                  Sorry for the confusion, I was purely commenting about Claude. I have not tried your MCP.
      • atonse 11 hours ago
        It's had native memory in the form of per-directory CLAUDE.md files for a while though. Not just 2.0
  • datadrivenangel 11 hours ago
    How does Claude know when to try and remember?

    Often memory works too well and crowds out new things, so how are you balancing that?

    • datadrivenangel 7 hours ago
      Some of the other similar tools just arbitrarily pick the 3, 5, or 10 most relevant memory results, which seems awkward.
  • DenisM 8 hours ago
    I think everyone has concluded at this point that we need to improve models' memory capabilities, but different people take different approaches.

    My experience is that ChatGPT can engage in very thoughtful conversations, but if I ask for a summary it produces something very generic, useful to an outsider, that misses the salient points which were the most important outcomes.

    Did you notice the same problem?

    • elfenleid 3 hours ago
      I think that's a great point, really. There is no one-size-fits-all; different people will have different strategies that better suit their workflow.
  • btbuildem 3 hours ago
    A great hack/shortcut for solving this "memory" problem is to have a rolling RAG KB. You don't fill up the context, and you can use a re-ranking model to further improve accuracy.

    Aside from all that, using npm for distribution makes this a total non-starter for me.

    • elfenleid 3 hours ago
      Totally, point taken. I'll dig a bit deeper into that.
  • iambateman 12 hours ago
    I’ve started asking Claude to write tutorials that live in a _docs folder alongside my code.

    Then it can reference those tutorials for specific things.

    Interested in giving this a shot but it feels like a lot of infrastructure.

    • zzzeek 11 hours ago
      Yeah, this is what I do. You want the knowledge in md files, but currently you don't want to stuff the context with everything you know every time. I may be wrong here, but my impression is that the way "context" is special and very limited in size vs. "things the LLM is trained on" is still an unsolved problem in getting AI to act like an "assistant", AFAICT.
  • ffsm8 8 hours ago
    The memory feature I'd like to have would need built-in support from anthropic

    It'd be essentially

    1. Language server support for lookups & keeping track of the code

    2. Being able to "pin" memories to functions, classes, properties, etc. via the language server support, providing this context whenever changes are made in that function/class/property, but not keeping it, so all following changes outside of that will no longer include this context (basically, changes that touch code with pinned memories will be done by agents with additional context, and only the results are synced back, not the way to achieve them)

    3. Provide an IDE integration for this context so you can easily keep track of what's available just by moving the cursor to the point the memory is pinned at

    Sadly impossible to achieve via MCP.

  • ra 5 hours ago
    I built something similar but now use Codex instead.

    Using the VS Code extension you get dynamic context management which works really well.

    They also have a memory system built using reflexion (someone please correct me if I'm wrong) so proper evals are derived from lessons before storing.

  • daxfohl 11 hours ago
    I'm surprised Anthropic doesn't offer something like this server-side, with an API to control it. Seems like it'd be a lot more efficient than having the client manually rework the context and upload the whole thing.
    • ryan29 11 hours ago
      Who should own the context?

      Imagine having 20 years of context / memories and relying on them. Wouldn't you want to own that? I can't imagine pay-per-query for my real memories and I think that allowing that for AI assisted memory is a mistake. A person's lifetime context will be irreplaceable if high quality interfaces / tools let us find and load context from any conversation / session we've ever had with an LLM.

      On the flip side of that, something like a software project should own the context of every conversation / session used during development, right? Ideally, both parties get a copy of the context. I get a copy for my personal "lifetime context" and the project or business gets a copy for the project. However, I can't imagine businesses agreeing to that.

      If LLMs become a useful tool for assisting memory recall there's going to be fighting over who owns the context / memories and I worry that normal people will lose out to businesses. Imagine changing jobs and they wipe a bunch of your memory before you leave.

      We may even see LLM context ownership rules in employment agreements. It'll be the future version of a non-compete.

      • daxfohl 3 hours ago
        Whoever is paying for it? If you've got personal stuff you'd keep it in your own account (or maintain it independently), separate from your work account.
    • _joel 10 hours ago
      They do, new feature, not available in claude code but via API headers. https://docs.claude.com/en/docs/agents-and-tools/tool-use/me...
      • daxfohl 6 hours ago
        That's still client side though. Seems like if they made it server-side it'd require fewer round trips.
  • DenisM 10 hours ago
    Imho you would have an easier sell if you separated knowledge into tiers: 1) overall design 2) coding standards 3) reasoning that led to the design 4) components and their individual structure 5) your current issue 6) etc.

    Your project becomes progressively more valuable the further you go down the list. The overall design should be documented and curated to onboard new hires. Documenting current issues is a waste of time compared to capturing live discussion, so Recall is super useful here.

  • thund 3 hours ago
    Seems overkill when you can simply tell agents to do that automatically
  • tarun_anand 12 hours ago
    Claude introduced its own memories API. Have you had a look?
    • elfenleid 12 hours ago
      Yes I did. I worked on this a while back, before it was available, I believe. I'll have another check. Thanks for the heads up
  • the_arun 12 hours ago
    I wish there was a way to send compressed context to LLMs instead of plain text. This will reduce token size, performance & operational costs.
    • joshstrange 11 hours ago
      > This will reduce token size, performance & operational costs.

      How? The models aren't trained on compressed text tokens nor could they be if I understand it correctly. The models would have to uncompress before running the raw text through the model.

      • the_arun 11 hours ago
        That is what I am looking for. a) LLMs are trained using compressed text tokens and b) use compressed prompts. Don't know how..but that is what I was hoping for.
        • deepdarkforest 10 hours ago
          The whole point of embeddings and tokens is that they are a compressed version of text, a lower dimensionality. How low you can go depends on the performance you need; fewer dimensions = more lossy (usually). https://huggingface.co/spaces/mteb/leaderboard

          You can train your own with very heavy compression; I mean, you could even go down to each token = just 2 float numbers. It will train, but it will be terrible, because it can essentially only capture distance.

          Prompting a good LLM to summarize the context is probably funnily enough the best way of actually "compressing" context

  • jumski 10 hours ago
    Memory is hard! I'm very curious how the version history approach is working for you. Have you considered age when retrieving? Is the model supposed to manage the version history on its own? Is the semantic search used to help with that?
  • asdev 12 hours ago
    The problem is you need to prompt Claude to "Store" or "Remember"; if you don't, it will never call the MCP server. Ideally, Claude would have some mechanism to store memories without any explicit prompting, but I don't think that's currently possible.
    • elfenleid 1 hour ago
      I've been experimenting with that in the last couple of days. I added a directive to CLAUDE.md on how and when to use Recall, and Claude is automatically calling the tool for store and fetch
  • warthog 12 hours ago
    imo it would be better to carry the whole memory outside of the inference time where you could use an LLM as a judge to track the output of the chat and the prompts submitted

    it would sort of work like grammarly itself and you can use it to metaprompt

    i find all the memory tooling, even native ones on claude and chatgpt to be too intrusive

    • namanyayg 11 hours ago
      I've been building exactly this. Currently a beta feature in my existing product. Can I reach out to you for your feedback on metaprompting/grammarly aspect of it?
    • elfenleid 12 hours ago
      Totally get what you're saying! Having Claude manually call memory tools mid-conversation does feel intrusive, I agree with that, especially since you need to keep saying Yes to the tool access.

      Your approach is actually really interesting, like a background process watching the conversation and deciding what's worth remembering. More passive, less in-your-face.

      I thought about this too. The tradeoff I made:

      Your approach (judge/watcher):
      - Pro: Zero interruption to conversation flow
      - Pro: Can use cheaper model for the judge
      - Con: Claude doesn't know what's in memory when responding
      - Con: Memory happens after the fact

      Tool-based (current Recall):
      - Pro: Claude actively uses memory while thinking
      - Pro: Can retrieve relevant context mid-response
      - Con: Yeah, it's intrusive sometimes

      Honestly both have merit. You could even do both, background judge for auto-capture, tools when Claude needs to look something up.

      The Grammarly analogy is spot on. Passive monitoring vs active participation.

      Have you built something with the judge pattern? I'd be curious how well it works for deciding what's memorable vs noise.

      Maybe Recall needs a "passive mode" option where it just watches and suggests memories instead of Claude actively storing them. That's a cool idea.
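
As a rough sketch of the judge/watcher pattern discussed above: a separate pass reads the transcript after the fact and keeps only messages a judge deems memorable. In practice the judge would be a (possibly cheaper) LLM call; here a keyword heuristic stands in for it so the example runs offline. `keyword_judge`, `capture_memories`, and the cue words are all hypothetical.

```python
from typing import Callable

# A judge takes one message and decides whether it is worth remembering.
Judge = Callable[[str], bool]


def keyword_judge(message: str) -> bool:
    """Stand-in for an LLM judge: flags messages that sound like durable preferences or constraints."""
    cues = ("prefer", "always", "never", "must", "limit")
    lowered = message.lower()
    return any(cue in lowered for cue in cues)


def capture_memories(transcript: list[str], judge: Judge) -> list[str]:
    """Run outside the conversation loop; returns the messages the judge accepted."""
    return [message for message in transcript if judge(message)]
```

Since `capture_memories` only reads the transcript, it never interrupts the conversation, which matches the first "Pro" listed above; the cost is that the model answering in real time does not see what was captured.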

      • westurner 12 hours ago
        Is this the/a agent model routing problem? Which agent or subagent has context precedence?

        jj autocommits when the working copy changes, and you can manually stage against @-: https://news.ycombinator.com/item?id=44644820

        OpenCog differentiates between Experiential and Episodic memory; and various processes rewrite a hypergraph stored in RAM in AtomSpace. I don't remember how the STM/LTM limit is handled in OpenCog.

        So the MRU/MFU knapsack problem and more predictable primacy/recency bias because context length limits and context compaction?

        • westurner 11 hours ago
          OpenCogPrime:EconomicAttentionAllocation: https://wiki.opencog.org/w/OpenCogPrime:EconomicAttentionAll... :

          > Economic Attention Allocation (ECAN) was an OpenCog subsystem intended to control attentional focus during reasoning. The idea was to allocate attention as a scarce resource (thus, "economic") which would then be used to "fund" some specific train of thought. This system is no longer maintained; it is one of the OpenCog Fossils.

          (Smart contracts require funds to execute (redundantly and with consensus), and there are scarce resources).

          Now there's ProxyNode and there are StorageNode implementations, but Agent is not yet reimplemented in OpenCog?

          ProxyNode implementers: ReadThruProxy, WriteThruProxy, SequentialReadProxy, ReadWriteProxy, CachingProxy

          StorageNode > Implementations: https://wiki.opencog.org/w/StorageNode#Implementations

  • nibab 3 hours ago
    Isn’t that what agents.md or Claude.md is for?
    • elfenleid 1 hour ago
      Absolutely! But this is not a replacement of those files, this is a different (better?) way to navigate through those learnings instead of having to read whole files.
  • uncletoxa 5 hours ago
    Do you think any vector db would work better than redis?
    • elfenleid 3 hours ago
      I think that's a great point. I will experiment with different approaches. I started with Redis mostly because it's something I have experience with and it was a quick setup win, but I think supporting different backends could make sense.
  • h1fra 12 hours ago
    I'm not super familiar with context and "memory", but doesn't adding context manually or via memory end up consuming context length either way?
    • elfenleid 12 hours ago
      Yeah it still uses context but way more efficiently, instead of injecting a 50KB context.md every time, Recall searches 10k memories and only injects the top 5 relevant ones (maybe 2KB), so you can store way more total knowledge.
  • aktuel 9 hours ago
    Wouldn't the cache over time also be filled up with irrelevant and redundant information?
  • alecco 12 hours ago
    Why not just ask CC to write a prompt or Markdown file to re-start the conversation in a new chat?
    • elfenleid 12 hours ago
      Yeah people do that but it doesn't scale, after a while your "restart prompt" is 50KB and won't fit, plus you're stuck copying stuff manually instead of just asking "what did we say about Redis" and getting the relevant bits automatically.
  • gmerc 11 hours ago
    Every single persistent memory feature is a persistence vector for prompt injection.
  • jcmontx 13 hours ago
    If this delivers, it can be a 100% game changer. I will try it out and give some feedback
    • elfenleid 12 hours ago
      I've been using it for a while now, personally. I've found that I have fewer issues with context, I can easily recall (pun intended) after a context compact, etc.
  • mannyv 12 hours ago
    This is excellent for those of us who are building local AIs.
    • elfenleid 12 hours ago
      That's a great point! It also works really well for shared context between Claude instances. For example, we use it for our business model in the company: all business rules and the model are stored as memories in a central Redis that the MCP connects to. Memories are stored either per folder or globally (similar to CLAUDE.md in the home directory), but with this approach you can have an external Redis that multiple Claudes read from and write to as a shared, almost hive-like memory.
  • otterley 12 hours ago
    Does it work with Valkey as well?
    • elfenleid 12 hours ago
      Yep! Valkey should work fine.

      Recall just uses basic Redis commands - HSET, SADD, ZADD, etc. Nothing fancy.

      Valkey is Redis-compatible so all those commands work the same.

      I haven't tested it personally but there's no reason it wouldn't work. The Redis client library (ioredis) should connect to Valkey without issues.

      If you try it and hit any problems let me know! Would be good to officially support it.
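
For readers unfamiliar with the commands named above, the sketch below shows how HSET, SADD, and ZADD might fit together for one memory record. The key names (`memory:<id>`, `tag:<tag>`, `memories:by_time`) are invented for this example and are not Recall's actual schema. `InMemoryClient` is a tiny stand-in so the code runs without a server; the three calls follow the redis-py style (`hset(name, mapping=...)`, `sadd(name, *values)`, `zadd(name, mapping)`), so a real Redis or Valkey client object should be usable in its place.

```python
import time


class InMemoryClient:
    """Minimal stand-in exposing only the three calls used below."""

    def __init__(self):
        self.hashes, self.sets, self.zsets = {}, {}, {}

    def hset(self, name, mapping):
        self.hashes.setdefault(name, {}).update(mapping)

    def sadd(self, name, *values):
        self.sets.setdefault(name, set()).update(values)

    def zadd(self, name, mapping):
        self.zsets.setdefault(name, {}).update(mapping)


def store_memory(client, memory_id, text, tags, importance, now=None):
    """Write one memory using a hash for fields, sets for tags, and a sorted set for time order."""
    timestamp = time.time() if now is None else now
    # HSET: the memory's fields live in one hash (hypothetical key layout).
    client.hset(f"memory:{memory_id}", mapping={"text": text, "importance": importance, "created": timestamp})
    # SADD: one set per tag, holding the ids of memories carrying that tag.
    for tag in tags:
        client.sadd(f"tag:{tag}", memory_id)
    # ZADD: a sorted set scored by creation time, for recency queries.
    client.zadd("memories:by_time", {memory_id: timestamp})
```

Nothing here goes beyond basic hash, set, and sorted-set operations, which is consistent with the expectation above that a Redis-compatible server would behave the same.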

  • moomoo11 5 hours ago
    Throwing it out there, not sure how well it'd work but what about using OpenSearch + vector?

    AI can already form the query DSL quite nicely especially if it knows the indexes.

    I set up AI powered search this way, and it works really well with any open ended questions.

  • bananapub 12 hours ago
    how did you benchmark this against much less convoluted solutions, like "a text file"?

    how much better was this to justify all that extra complexity?

  • iamleppert 12 hours ago
    I'm not seeing how this is any different than a standard vector database MCP tool. It's not like Claude is going to know about any of the things you told it to "remember" unless you explicitly tell it to use its memory tool like shown in the demo, to remember something you've stored.
  • jMyles 12 hours ago
    Heh, I'm building the same thing this week (albeit with postgres rather than redis). I bet like 15% of the people here are.
    • _joel 10 hours ago
      Yep, me too. I've taken the reference memory MCP that Anthropic released and bolted on pgsql, along with a bunch of other features that are specific to the app I'm building, like user segmentation/isolation with RLS (the app is multiuser) and some other entity relationship tracking things.
  • aiisthefiture 8 hours ago
    With redis? Why?
    • elfenleid 3 hours ago
      No particular reason. I was working on another project that also used Redis and decided to start with it. It can be swapped for another tool; which one would you recommend?