lolinder 12 hours ago

The headline is either sloppy or intentionally misleading: the Copyright Office is saying that the law surrounding whether AI generated works can be copyrighted was settled in 1965 (the answer being "yes if AI assisted a human creative process, no if not, and we have to decide on a case by case basis if there was enough human input to qualify"). This has been their stance all along, but now they've provided a bit more guidance on what counts as human input, which is helpful.

What this article doesn't talk about at all is the far more controversial AI copyright debate, the one most people will think of given the headline: whether training a model is fair use. That's the one everyone is actually concerned about, and they're definitely not claiming it was settled already.

  • Salgat 11 hours ago

    The human input requirement makes sense; otherwise, couldn't you brute-force generate billions of low-resolution images that cover a vast range of situations and then use them to attack anything similar enough to meet the substantial similarity condition? You could even plug a news feed into the generator.

    • dotancohen 9 hours ago

      Somebody did this with music - they brute forced all chord progressions or something like that. In theory all new music is infringing.

      • somenameforme 8 hours ago

        Things like this often make me wish we had more 'common sense' laws and left the discretion of interpreting that notion to judges, juries, and the various systems of appeals and other courts we have, entirely with the expectation that laws would 'evolve' over time. This might sound radical, but it's actually just going back to how things used to be. Here is the First Amendment in its entirety:

        ---

        "Congress shall make no law respecting an establishment of religion, or prohibiting the free exercise thereof; or abridging the freedom of speech, or of the press; or the right of the people peaceably to assemble, and to petition the Government for a redress of grievances."

        ---

        The rest of the Bill of Rights looks similar. Nowadays it'd be thousands of pages long, trying to elaborate endlessly upon every single scenario. But the more important thing is that this idea of trying to codify every scenario still doesn't work, because you end up with a zillion loopholes in just about every single law, with some clever clown going 'ah hah! you didn't cover this!' So all you really do is end up with laws that are not only excessively fragile and subject to exploitation, but also completely indecipherable by just about anybody, certainly including the people voting on their passage.

      • AstralStorm 9 hours ago

        That's the unfortunate problem with copyright: it's an ever-uphill battle, which is why it should be short-lived.

        Similarly with patents.

        Even when used there should be a timeout, possibly per clause to avoid overly broad stuff.

        But then a few IP trolls and lawyers would have to find another job.

  • Terr_ 9 hours ago

    > whether training a model is fair use

    I want to highlight that training the model is only one part of the copyright questions going on, the other is how they are making and keeping direct long-term copies in giant training datasets.

    Imagine what would happen if a regular person bought and then immediately resold thousands of books, CDs, and movies, taking just enough time to make a copy of each one and building out their own library/movie-theater for friends and coworkers. You think the powers-that-be would let you or me get away with that?

    There is no (non-evil) reason to hold multijillion dollar corporations with professional legal advice on-tap to a lower standard than regular people.

  • sublinear 7 hours ago

    ALL OF WHAT AI HAS BEEN TRAINED ON IS HUMAN INPUT

  • bonzini 9 hours ago

    > yes if AI assisted a human creative process, no if not

    Fair enough but does that help settle the other question, which is whether weights are considered derivative works of any material used in the training?

  • cxr 12 hours ago

    There's not really much of a debate, just a bunch of clamoring and wishful thinking by rightsholders who don't understand copyright law insisting that precedent should be subordinate to mimetic outrage over LLMs.

    • throwaway17_17 11 hours ago

      In what way are ‘rightsholders’ expressing wishful thinking? I assume you are saying that there is no violation of those rights controlling various properties that have been used to train ‘AI’. You then mention precedent in a way that implies there are legal decisions that make it clear ‘AI’ training using copyrighted material does not violate the rights of those who own that material. Could you list or link to such a precedent?

      To the best of my knowledge, there is no direct precedent from any federal circuit addressing this issue and certainly no USSC opinions dealing with the issue. Additionally, any analogies drawn from precedent focused on other areas of intellectual property law are easily distinguishable. This is truly fresh legal ground and the next 10 years of jurisprudence will go a long way towards building the precedent that your comment implies already exists.

      Just to be explicit, the above, while a legal opinion, IS NOT legal advice.

      • cxr 10 hours ago

        No amount of solidarity from support groups comprising clusters of likeminded folks on internet message boards who're opposed to settled law is a substitute for an act of Congress, which is what it will take to give the position of folks opposed to contemporary GenAI any legs.

        Neither your comments to HN nor anyone else's strenuous assertions that there's anything to debate are going to change anything.

        If you want to treat LLMs as a special case—which is what you want, since there is an entire history of jurisprudence that you have to contend with here—then you need to get Congress to write legislation that says so.

        • Animats 9 hours ago

          > act of Congress

          More than that, a constitutional amendment. See Feist vs. Rural Telephone.

          The US does not have database copyright or "sweat of the brow" copyright. There has to be human originality.

          • AstralStorm 9 hours ago

            So if you collect things in an undisclosed database without archival rights you probably are violating a bajillion copyright claims, right?

            The AI itself can be construed as a special kind of a database, given that it can be queried to reproduce at least part of its training dataset with precision...

            • cxr an hour ago

              > So if you collect things in an undisclosed database without archival rights you probably are violating a bajillion copyright claims, right?

              No.

      • jpalawaga 11 hours ago

        Copyright law stipulates the conditions in which content can be reproduced, not conditions in which it can be consumed.

        Arguably the material has been learned and not copied. Maybe in some cases learned with an uncanny ability or photographic memory, but learned. (People with photographic memories also cannot reproduce content in an unlimited fashion).

        • bbarnett 10 hours ago

          Learned!

          There's nothing special about an LLM, there's no learning, and they regurgitate verbatim text too.

          May as well say curl + images in a db are learned as well, and thus I can use Mickey Mouse as I please in my PHP web page.

          • drdeca 10 hours ago

            While learned is probably not the best word to use as far as describing the legality goes, I also don’t think “copied” is the right word.

            Let’s say that the model “is influenced by” the copyrighted material. That seems hard to argue against.

            So, now that we aren’t using the word “learned”, why would we say that the way the models are influenced by the copyrighted works that appear in the “training set” (not to imply that “training” in the usual sense is happening) counts as a copyright violation?

            Or, perhaps the claim is that the outputs of the model are violating copyright?

            If the output is substantially similar to some particular copyrighted work that is in the “training set”, and could work as a substitute for that work, and if the output resembling the work is in part due to the influence that the work had on the model, I think in this case it would be a clear case of violation of copyright.

            However, if it doesn’t have substantial similarity to any particular copyrighted work that influenced it, only similarity to the style common to many of the works that influenced it (even if all by the same author), my impression is that this would not constitute copyright infringement because styles are not protected by copyright, only individual works.

            (Now, is this unjust, in the case of it copying the style of some particular author/artist? Idk, maybe? But my impression is that copyright doesn’t protect styles, and that it probably shouldn’t protect styles in general… so I guess maybe if we had a law making a special case forbidding the (deliberate?) copying of a person’s particular style via some kind of machine learning model? Idk.)

            • tsimionescu 9 hours ago

              The argument is that the LLM itself is essentially like a complex lossy archive of its entire training set. It's like an mp3 of all of the songs on Spotify, in some sense (of course, using all text on the internet instead of all songs on Spotify). This is the sense in which it is considered to be a copy of all of this.

            • galaxyLogic 9 hours ago

              Very insightful. Consider there are many "recreational imitators" who mimic how specific (famous) people speak. They are not violating copyright, they are just imitating a way of speaking.

              • echoangle 8 hours ago

                I don’t think this is a good argument because the way of speaking isn’t a copyright issue. I don’t think you have copyright on your specific way of speaking, only on specific recordings of you yourself speaking.

          • visarga 9 hours ago

            > There's nothing special about an LLM, there's no learning

            The model is 100-500x smaller than its training set. That is something hinting at learning, as direct storage is impossible.

            • toast0 9 hours ago

              Video compression ratios vs raw video are amazing too, but there's no learning and there's no doubt that the compressed form is subject to copyright.

ilaksh 10 hours ago

It says they were not able to reproduce an image with the same prompt. So they just didn't know about seeds?
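
For anyone unfamiliar with the seed point: diffusion-style image generators typically start from pseudorandom noise, so the same prompt gives a different image on each run unless the seed is fixed. Here is a minimal sketch of that idea using only Python's standard library. `fake_generate` is a hypothetical stand-in I made up for illustration, not any real model's API, and real systems can still vary across hardware or software versions even with a fixed seed.

```python
import random

def fake_generate(prompt, seed=None):
    """Hypothetical stand-in for an image generator: returns 8 'pixel' values."""
    # seed=None makes Python seed the generator from OS entropy (or the clock).
    rng = random.Random(seed)
    # The prompt is deliberately ignored; only the noise source matters here.
    return [rng.randrange(256) for _ in range(8)]

same_a = fake_generate("a cat in a hat", seed=1234)
same_b = fake_generate("a cat in a hat", seed=1234)
unseeded = fake_generate("a cat in a hat")  # will almost surely differ run to run

print(same_a == same_b)  # True: same seed, same "image"
```

So whether an output can be reproduced from the prompt alone depends on whether the tool exposes and records the seed, not only on the prompt text.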

Aloisius 11 hours ago

> "Where a human inputs their own copyrightable work and that work is perceptible in the output, they will be the author of at least that portion of the output," the guidelines said.

This policy is sensible. Most AI generated works should be uncopyrightable, except where a substantial human contribution is in the output.

Simply describing a picture and letting AI generate it shouldn't be enough for the same reason that dictating what you want to a painter isn't enough to earn you copyright over the resulting painting.

I would be wary about integrating too much AI output into works one wants to enforce copyright over without some level of documentation. The nightmare scenario is having your copyright stripped away because of evidence one used AI extensively.

  • NitpickLawyer 8 hours ago

    > Simply describing a picture and letting AI generate it shouldn't be enough

    Interesting take, and I've heard this many times. I'm curious to explore this further and see why you think that is, and where do you draw the line?

    Is it the "low effort"? Is it the "automated" stuff? Is the process of setting it up, prompting it and choosing a result not enough "creative input"? If so, why?

    Let's take a "real world" example as an analogue. Say I set up a camera on a tripod. I set it to take a picture every second, and leave it there. I come back 1hr later and go through the pictures. I select one of the sunset I like, and post it. Would I not have copyright on that picture? I wasn't there when it was taken. But I did set up the camera and select the end result. How is that different?

    Taking it back to genAI, say I build/train/finetune my own model. Would it now have enough "effort" from me that I can use those generations? Is this an effort thing or is it more? Or is it just that someone else did the work?

    What about random "art"? As in art based on random numbers. Say I write a script in python to use random math formulas to "draw" on a canvas. I let it run for a couple of hours, come back, look at the results and select one. Do I not get copyright on the resulting "art" because it was randomly generated by a script? Does it matter if the script was written by me? Would it be different if you download my script and generate the art yourself? Would you not have copyright?
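
    To make the thought experiment concrete, here is a minimal sketch of such a script. Everything in it (the `random_art` name, the formula, the ASCII "canvas") is my own illustrative assumption, not a reference to any existing tool. The only human choices are writing the formula and picking one output afterwards.

```python
import math
import random

SHADES = " .:-=+*#%@"  # darkest to brightest "ink"

def random_art(seed, width=32, height=16):
    """Draw an ASCII canvas from a formula with randomly chosen coefficients."""
    rng = random.Random(seed)
    a, b, c = (rng.uniform(0.1, 1.0) for _ in range(3))
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            # Sum of three waves, so v always lies in [-3, 3].
            v = math.sin(a * x) + math.cos(b * y) + math.sin(c * (x + y))
            # Map [-3, 3] onto an index into SHADES.
            idx = int((v + 3) / 6 * (len(SHADES) - 1))
            row.append(SHADES[idx])
        rows.append("".join(row))
    return "\n".join(rows)

# "Let it run", then curate: generate several candidates and pick one.
candidates = {seed: random_art(seed) for seed in range(5)}
chosen = candidates[3]  # the human selection step
print(chosen)
```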

    I guess what I'm trying to say is "where do we draw the line?". It's not clear to me why people say "simply prompting and selecting isn't creative enough". This distinction wasn't there before. Plenty of "art" out there based on random processes + curation. Why the sudden change?

  • njarboe 10 hours ago

    If the painter is doing a "work for hire" you should get the copyright.

    • Aloisius 10 hours ago

      They can if they buy the copyright from the painter.

      They just can't get it from the government because they are not, in any sense, the author of the creative work.

  • galaxyLogic 9 hours ago

    Right, that you cannot copyright such output is now clear(er).

    But what about the other direction, can distributing such AI generated content VIOLATE somebody else's copyright?

    If output of AI cannot be copyrighted, can it violate copyright?

sublinear 7 hours ago

I'm pretty confident the copyright office was massively overthinking it in 1965 and knocked it out of the park far beyond the watered down and ignorant arguments we hear today. It's sad really.

BeefySwain 10 hours ago

Why is a binary (compiled machine code) protected by copyright, but the raw output of an AI model is not?

  • andsoitis 9 hours ago

    Courts have ruled that compilation does not remove originality—the binary is still a transformation of an original, copyrighted work (the source code).

  • realusername 9 hours ago

    Because binaries are a transformation of the source code, which is written by a human.

    Other kinds of binaries that are fully generated by a machine, like private keys, aren't copyrightable.

Animats 9 hours ago

US copyright applications are not examined, in the sense that patents are. Issued patents are presumptively valid. Registered copyrights are not. Whether a copyright application is valid has to be determined by a court.

philippta 9 hours ago

I think the two main questions everyone needs clarified are:

1. Can I get sued by a 3rd party when using AI generated work in my project?

2. Can I sue a 3rd party when they use my AI generated work in their project?

jarsin 12 hours ago

When uploading books to Kindle Direct Publishing you have to state that you own the copyright and publishing rights.

So any book or story on Amazon that was generated substantially via prompting should now have to be removed based on this guidance from the copyright office.

  • cyberax 12 hours ago

    That's incorrect. Purely factual books (like phone directories or map atlases) are perfectly fine for publishing.

    • feoren 11 hours ago

      Purely factual books are copyrightable. It is the collection and curation of those facts that is protected. You cannot just copy someone else's 100 Amazing Facts about The Rainforest verbatim; if you publish 100 Cool Truths about The Jungle and it has those same 100 facts, you'll get sued and they'll easily win.

      • jcranmer 11 hours ago

        The EU and the UK generally have something akin to "sweat of the brow", where collections of facts that took time to collate are copyrightable.

        But in the US, Feist v Rural explicitly disavowed the sweat of the brow doctrine, and said that facts have no copyright value--a work requires a quantum of original creative spark to be copyrightable (it was discussed in the context of phone books--a phone book does still have some residual "thin copyright", but the listing of phone numbers is not copyrightable, and it is actually difficult to infringe on the thin copyright of a phone book). In the US, your example would easily be found to be not infringing, if the only similarity were reproducing the same 100 facts.

        • schoen 10 hours ago

          However, if it states the facts in exactly the same way, that could be considered infringing because of the creativity presumably involved in deciding how to state each fact.

          "Elephants are enormous mammals, usually grey in color, with significant intelligence and social habits. They are native to portions of Africa and Asia, with different elephant species found in each region. They are famous for their strong and dexterous trunks, which can also be used to communicate something like a trumpet. Humans, especially in South Asia, have long admired elephants and used them for transportation and various kinds of work; African elephants are famous for having been ridden to war by the Carthaginians, to the dismay of their Roman opponents. Today elephants are significantly threatened by various human activities, both those intentionally directed at the elephants (like killing them for meat or for the ivory derived from their tusks) and those not deliberately meant to affect them (like deforestation). While our word for elephants comes from the Greek, it was probably borrowed by the ancient Greeks from another language family."

          I just wrote this paragraph about elephants based on my own knowledge. There is no copyrightability of the substantive information here (e.g. if you learned something new about elephants, you can tell other people) but there is probably some copyrightability of the paragraph based on things like the creativity of my word choices. That distinction can sometimes create confusion in discussions about "facts", and I'm not positive that legal standards that are meant to clarify it have always given a clear and workable rule.

          • bryanrasmussen 10 hours ago

            That is not a list of facts, however. This is a list of facts:

            Largest Land Animals By Size:

                African Bush Elephant
            
                Asian Elephant
            
                African Forest Elephant
            
            ...