Updates to GitHub Copilot interaction data usage policy

(github.blog)

132 points | by prefork 2 hours ago

28 comments

  • stefankuehnel 1 hour ago
    If you scroll down to "Allow GitHub to use my data for AI model training" in GitHub settings, you can enable or disable it. However, what really gets me is how they pitch it like it’s some kind of user-facing feature:

    Enabled = You will have access to the feature

    Disabled = You won't have access to the feature

    As if handing over your data for free is a perk. Kinda hilarious.

    • Rapzid 8 minutes ago
      Is that not some stock feature-flag verbiage?
    • mirekrusin 36 minutes ago
      The feature is that your coding style will be in next models!
      • rzmmm 20 minutes ago
        I wish my GPL license would transit along with my code.
    • a1o 1 hour ago
      I went to check on this and I have everything Copilot-related disabled, yet in the two bars that measure usage my Copilot Chat usage was somehow at 2%. How is this possible?

      Before anyone comes to me to sell me on AI, this is on my personal account, I have and use it in my business account (but it is a completely different user account), I just make it a point to not use it in my personal time so I can keep my skills sharp.

      • saratogacx 25 minutes ago
        If you're talking about the quota bar, that only measures your premium request usage (models with a #.#x multiplier next to the name). If you only use the free models and code completion, you won't actually consume any "usage". If you use AI code review, that consumes a single request (now). Same with the GitHub Copilot web chat: if you use a free model, it doesn't count; if you use a premium model, you get charged the usage cost.
      • hakunin 1 hour ago
        Does GitHub count it as Copilot Chat usage when you use the AI search form on their website, I wonder?
    • petcat 1 hour ago
      I guess the "perk" is that maybe their models get retrained on your data making them slightly more useful to you (and everyone else) in the future? idk
    • 7bit 52 minutes ago
      It's worded that way to create FOMO in the hopes people keep it enabled.

      Dark pattern and dick move.

  • mentalgear 1 hour ago
    > On April 24 we'll start using GitHub Copilot interaction data for AI model training unless you opt out. Review this update and manage your preferences in your GitHub account settings.

    Now "Allow GitHub to use my data for AI model training" is enabled by default.

    Turn it off here: https://github.com/settings/copilot/features

    Do they have this set on business accounts also by default? If so, this is really shady.

    • lenova 1 hour ago
      Ugh, can't believe they made this opt-in by default, and didn't even post the direct URLs to disable in their blog post.

      To add on to your (already helpful!) instructions:

      - Go to https://github.com/settings/copilot/features
      - Go to the "Privacy" section
      - Find: "Allow GitHub to use my data for AI model training"
      - Set to disabled

      • inetknght 45 minutes ago
        > can't believe they made this opt-in by default

        You can't believe Microslop is force-feeding people Copilot in yet another way?

        > and didn't even post the direct URLs to disable in their blog post

        You can't believe Microshaft didn't tell you how to not get shafted?

    • parkersweb 29 minutes ago
      Yes - not impressed at all that this is opt-in default for business users. We have a policy in place with clients that code we write for them won’t be used in AI training - so expecting us to opt out isn’t an acceptable approach for a business relationship where the expectation is security and privacy.
    • g947o 1 hour ago
      https://github.com/orgs/community/discussions/188488

      > Why are you only using data from individuals while excluding businesses and enterprises?

      > Our agreements with Business and Enterprise customers prohibit using their Copilot interaction data for model training, and we honor those commitments. Individual users on Free, Pro, and Pro+ plans have control over their data and can opt out at any time.

      • dormento 1 hour ago
        Aka "they have lawyers and you usually don't, so we think we can get away with it."
      • themafia 1 hour ago
        > and we honor those commitments.

        Ah, so when the inevitable "bug" appears, and we all learn that you've completely failed to honor anything, what will be your "commitment" then? An apology and a few free months?

        Time to start pushing for a self hosted git service again.

    • martinwoodward 1 hour ago
      Just confirming, we do not use Copilot interaction data for model training of Copilot Business or Enterprise customers.
    • archb 1 hour ago
      Interestingly, it is disabled by default for me.
      • crashingintoyou 1 hour ago
        Reading the github blog post "If you previously opted out of the setting allowing GitHub to collect this data for product improvements, your preference has been retained—your choice is preserved, and your data will not be used for training unless you opt in."
      • gpm 1 hour ago
        Me too, which is making me wonder if they're planning on silently flipping this setting on April 24th (making it impossible to opt out in advance).
        • spiderfarmer 1 hour ago
          Is it because I'm in the EU?
          • gpm 20 minutes ago
            I'm in Canada, so not only the EU at least.
          • paularmstrong 1 hour ago
            I'm in the US and it's off for me. I believe I've previously opted out of everything copilot related in the past if there was anything.
    • DavidSJ 1 hour ago
      > Do they have this set on business accounts also by default? If so, this is really shady.

      Looks like not, but would it actually have been shadier, or are we just used to individual users being fucked over?

      • hrmtst93837 1 hour ago
        If they turned it on for business orgs, that would blow up fast. The line between "helpful telemetry" and "silent corporate data mining" gets blurry once your team's repo is feeding the next Copilot.

        People are weirdly willing to shrug when it's some solo coder getting fleeced instead of a company with lawyers and procurement people in the room. If an account tier is doing all the moral cleanup, the policy is bad.

  • pred_ 47 minutes ago
    What is the legal basis of this in the EU? Ignoring the fact they could end up stealing IP, it seems like the collected information could easily contain PII, and consent would have to be

    > freely given, specific, informed and unambiguous. In order to obtain freely given consent, it must be given on a voluntary basis.

  • section_me 1 hour ago
    If I'm paying, which I am, I want to have to opt in, not opt out. Mario Rodriguez / @mariorod needs to give his head a wobble.

    What on earth are they thinking...

    • sph 1 hour ago
      > What on earth are they thinking...

      @mariorod's public README says one of his focuses is "shaping narratives and changing 'How we Work'", so there you go.

      • fmjrey 1 hour ago
        Translation: more alignment with Microsoft practices
      • section_me 1 hour ago
        "shaping narratives", sounds like they follow the methodologies of a current president
        • okanat 1 hour ago
          It looks like the literal translation of "manipulation" to Linkedin-speak.
  • diath 1 hour ago
    > This approach aligns with established industry practices

    "others are doing it too so it's ok"

    • theshrike79 1 hour ago
      Ackshually Anthropic is opt-in AND they give you discounts if you enable it
      • nodar86 7 minutes ago
        What kind of discounts? I have never heard of this
      • cma 32 minutes ago
        Anthropic puts up random prompts defaulting to enabled to trick you into accidentally enabling.
  • sph 1 hour ago
    Thanks to Github and the AI apocalypse, all my software is now stored on a private git repository on my server.

    Why would I even spend time choosing a copyleft license if any bot will use my code as training data to be used in commercial applications? I'm not planning on creating any more opensource code, and what projects of mine still have users will be left on GH for posterity.

    If you're still serious about opensource, time to move to Codeberg.

    • midasz 21 minutes ago
      I'm in my happy space selfhosting forgejo and having a runner on my own hardware
    • thesmart 50 minutes ago
      Yeah, I'm guessing that's probably because in their TOS you grant them some license work-around for running the service, which can mean anything.
  • OtherShrezzing 1 hour ago
    It’s not clear to me how GitHub would enforce the “we don’t use enterprise repos” stuff alongside “we will use free tier copilot for training”.

    A user can be a contributor to a private repository, but not have that repository owner organisation’s license to use copilot. They can still use their personal free tier copilot on that repository.

    How can enterprises be confident that their IP isn’t being absorbed into the GH models in that scenario?

  • hmate9 1 hour ago
    For what it's worth they're not trying to hide this change at all and are very upfront about it and made it quite simple to opt out.
    • matltc 1 hour ago
      They didn't even link the setting in their email. They didn't even name it specifically, just vaguely gestured toward it. Dark patterns, but that's Microslop for ya
      • hmate9 46 minutes ago
        going to github i was greeted with a banner and a link directly to the settings for changing it
  • _pdp_ 50 minutes ago
    Microsoft doing dumb things once again.

    Who in their right mind will opt into sharing their code for training? Absolutely nobody. This is just a dark pattern.

    Btw, even if disabled, I have zero confidence they are not already training on our data.

    I would also recommend sprinkling copyright notices all over the place and changing the license of every file, in case they have some sanity checks before your data gets consumed.

  • liquid_thyme 37 minutes ago
    They use data from the poor student tier, but arguably, large corporates and businesses hiring talented devs are going to create higher quality training data. Just looking at it logically, not that I like any of this...
  • pizzafeelsright 1 hour ago
    I am not certain this is that big of a deal outside of "making AI better".

    At this point, is there any magic in software development?

    If you have super-secret content, is a third party the best location?

    • thesmart 49 minutes ago
      How about "no." You may be okay giving away your individual rights, including to copyright, but I am not.
  • hoten 1 hour ago
    Why is there no cancel Copilot subscription option here? Docs say there should be...

    Mobile

    https://github.com/settings/billing/licensing

    EDIT:

    https://docs.github.com/en/copilot/how-tos/manage-your-accou...

    > If you have been granted a free access to Copilot as a verified student, teacher, or maintainer of a popular open source project, you won’t be able to cancel your plan.

    Oh. jeez.

  • OtherShrezzing 1 hour ago
    So, how does this work with source-available code, that’s still licensed as proprietary - or released under a license which requires attribution?

    If someone takes that code and pokes around on it with a free tier copilot account, GitHub will just absorb it into their model - even if it’s explicitly against that code’s license to do so?

  • Deukhoofd 1 hour ago
    So basically they want to retain everyone's full codebases?

    > The data used in this program may be shared with GitHub affiliates, which are companies in our corporate family including Microsoft

    So every Microsoft owned company will have access to all data Copilot wants to store?

  • thesmart 53 minutes ago
    I'm ready to abandon GitHub. Enshittification of the world's source infrastructure is just a matter of time.
  • TZubiri 1 hour ago
    Two issues with this:

    1- Vulnerabilities: secrets can be leaked to other users.
    2- Intellectual property can also be leaked to other users.

    Most smart clients won't opt-out, they will just cut usage entirely.

    • matltc 1 hour ago
      That's me. Frankly, looking at just uninstalling VSCode because Copilot straight-up gets in the way of so much, and they stopped even bothering with features that are not related to it (with one exception of native browser in v112, which, admittedly, is great)
  • indigodaddy 1 hour ago
    Checked and mine was already on disabled. Don't remember if I previously toggled it or not..
    • martinwoodward 1 hour ago
      If you previously opted out of the setting allowing GitHub to collect data for product improvements, your preference has been retained here. We figured if you didn't want that then you definitely wouldn't want this..
  • mt42or 1 hour ago
    Is it legal? Surely not in any EU country.
    • okanat 1 hour ago
      Does it even matter? They trained AI on obviously copyrighted and even pirated content. If this change is legally significant and a legal breach, the existence of all models and all AI businesses also is illegal.
      • 0x3f 1 hour ago
        It might or might not be legal, but it seems materially worse to screw over your direct customers than to violate the social-contracty nature of copyright law. But hey ho if you're not paying then you're the product, as ever was.
    • mentalgear 1 hour ago
      At least one instance where it was enabled in EU countries as well.
  • TZubiri 1 hour ago
    If this doesn't sound bad enough, it's possible that Copilot is already enabled. As we know, these kinds of features are pushed to users instead of being asked for.

    Maybe it's already active in our accounts and we don't realize it, so our code will be used to train the AI.

    Now we can't be sure if this will happen or not, but a company like GitHub should be staying miles away from this kind of policy. I personally wouldn't use GitHub for private corporate repositories. Only as a public web interface for public repos.

  • semiinfinitely 57 minutes ago
    I'll be moving off GitHub now
  • djmashko2 2 hours ago
    > Content from your issues, discussions, or private repositories at rest. We use the phrase “at rest” deliberately because Copilot does process code from private repositories when you are actively using Copilot. This interaction data is required to run the service and could be used for model training unless you opt out.

    Sounds like it's even likely to train on content from private repositories. This feels like a bit of an overstep to me.

  • rvz 1 hour ago
    > From April 24 onward, interaction data—specifically inputs, outputs, code snippets, and associated context—from Copilot Free, Pro, and Pro+ users will be used to train and improve our AI models unless they opt out.

    Now is the time to run off of GitHub and consider Codeberg or self hosting like I said before. [0]

    [0] https://news.ycombinator.com/item?id=22867803

    • 0x3f 1 hour ago
      Codeberg doesn't support non OSS and I'd rather just have one 'git' thing I have to know for both OSS and private work. So it's not a great option, IMO. Self-hosting also for other reasons.

      I'm not sure there are any good GitHub alternatives. I don't trust Gitlab either. Their landing page title currently starts with "Finally, AI". Eek.

  • baobabKoodaa 1 hour ago
    (oops)