In the US at least this should be pretty well covered by the case law on news aggregators.
Putting aside that other products, such as OpenAI's ChatGPT and modern Google Search, have the same "AI-powered web search" functionality, I can't see how this is meaningfully different from a user doing a web search and pasting a bunch of webpages into an LLM chat box.
> But what about ad revenue?
The user could be using an ad blocker. If they're using Perplexity at all, they probably already are. There's no requirement for a user agent to render ads.
> But robots.txt!!!11
`robots.txt` is for recursive, fully automated crawling. If a request is made on behalf of a user, through direct user interaction, then it need not be followed, and IMO shouldn't be. If you really want to block a user agent, it's up to you to figure out how to serve a 403.
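For the 403 route, here's a minimal sketch of User-Agent-based blocking at the application layer. The blocklist substrings are illustrative only, and of course headers can be spoofed, which is rather the point of the surrounding thread:

```python
# Illustrative only: real crawler UA strings vary and are trivially spoofed.
BLOCKED_UA_SUBSTRINGS = ("PerplexityBot", "GPTBot", "CCBot")

def response_for(user_agent: str) -> int:
    """Return the HTTP status code to serve for a given User-Agent header."""
    if any(bot in user_agent for bot in BLOCKED_UA_SUBSTRINGS):
        return 403  # Forbidden: we choose not to serve this client
    return 200
```

In practice this kind of rule usually lives in the web server or CDN config rather than application code, and only deters crawlers that identify themselves honestly.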
> It's breaking copyright by reproducing my content!
Yes, so does the user's browser. The purpose of a user agent is to fetch and display content how the user wants. The manner in which that is done is irrelevant.
Just because it is possible -- or even easy -- to essentially steal from newspapers/other media outlets, doesn't make it right, or legal. The people behind it put in labor, financial resources, and time to create a product that, like almost every other service, has terms attached -- and those usually come with some form of monetization. Maybe it is a paywall, maybe it is advertisements -- but it is there.
Using an ad blocker, finding some loophole around a paywall, etc., are all very easy to do technically, as any reader of this site knows. That said, the media outlet doesn't have to allow it. And when its terms are violated on an industrial scale, as by Perplexity, it can understandably get upset and take legal action. That includes action against any AI (or any other technology, for that matter) that is a wrapper around plagiarism.
Sites opted in to Google originally because it fed them traffic. They most likely did not opt in to an AI rewriter that takes their work and republishes it without any compensation.
Nobody played fair, even before the LLMs, which is why we now get proof-of-work challenges everywhere.
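For readers unfamiliar with the idea: a proof-of-work challenge makes each request cost CPU time, which is negligible for one human but expensive at crawler scale. A toy version (my own illustration, not any particular anti-bot tool's scheme):

```python
import hashlib

def solve(challenge: str, difficulty: int) -> int:
    """Brute-force a nonce whose SHA-256 hash starts with `difficulty` zero hex digits."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return nonce
        nonce += 1

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Cheap server-side check that the client did the work."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)
```

The asymmetry is the whole trick: solving takes many hash attempts, verifying takes one.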
And what is that conclusion? That because ad blockers are used everywhere, it's OK for corporations to skip licensing, yank the content, and put it into a curation service, especially one without ads? That's a licensing issue. The author allowed you to view the article in exchange for monetary support (i.e., ads); they didn't allow you to reproduce and republish the work by default.
Also, calling what the browser does "reproducing"? Yes, the data may be copied in memory (though I'd call that a transfer from the server rather than reproduction), but redistribution is the main point here.
It's like saying, "part of a variable is replicated from the L2 cache into a register, so reproducing the whole file in DRAM must be authorized." If any in-memory copy counts as "reproducing that should not happen in the first place," then reproduction can't be prevented at all, short of non-Turing computers that don't use working memory.
The nasty bots make a single access from an IP, and don't use it again (for your server), and are disguised to look like a browser hit out of the blue with few identifying marks.
I suspect they're easier to sue than OpenAI, Anthropic, Meta, Google, and literally anything coming out of China.
> Japan’s copyright law allows AI developers to train models on copyrighted material without permission. This leeway is a direct result of a 2018 amendment to Japan’s Copyright Act, meant to encourage AI development in the country’s tech sector. The law does not, however, allow for wholesale reproduction of those works, or for AI developers to distribute copies in a way that will “unreasonably prejudice the interests of the copyright owner.”
Personally, regarding their news service: their news summarization is kind of misleading, with AI hallucinations in some places.
However, the transformative nature of derivative work is not only about its appearance. It also factors in whether the transformation changes the nature of the message, and whether the derivative work is in direct competition with the original work [1]. I suspect that for, e.g., news articles, there's a good case that people get information that way instead of going to the newspaper, which means the derivative work competes with the original. Also, when it comes to reporting news, there aren't many ways to make the message different without making the AI service worse.
[1]: https://en.wikipedia.org/wiki/Andy_Warhol_Foundation_for_the_Visual_Arts,_Inc._v._Goldsmith
https://www.cric.or.jp/english/clj/cl2.html#chapter2sect3sub5
It’s plenty clear to me that they’ve broken copyright law a lot. They’ve downloaded copyrighted material without permission for their own use, which we’ve been assured is Not Good for us individual people. Some of them even redistributed it by seeding torrents, which is even more Not Good.
Of relevance here: 1) Meta denies having seeded the content, and there appears to be no hard evidence that they distributed it to other users; 2) the case is ongoing, so no decision has yet been reached about whether they broke any laws; and 3) the fact that Meta is being sued for this shows that even corporations worth trillions of dollars are not immune to the consequences of breaking the law.
https://www.crunchbase.com/organization/perplexity-ai
I don’t know how to parse this. I don’t think of them as small. Though they were only founded in 2022 and may not have a huge number of employees, they have had 8 funding rounds. They’re private, so I don’t know exactly what they have raised, but some reports put the company at an $18B valuation.
https://www.bloomberg.com/news/articles/2025-07-17/ai-startup-perplexity-valued-at-18-billion-with-new-funding | https://archive.is/6DZpo
Is that small?
> The confusion of intellectual property and property rights is fair enough given the name, but intellectual property is not a property right at all. Property rights are required because property is rivalrous and exclusive: When one person is using a pair of shoes or an acre of land, other people’s access is restricted. This central feature is not present for IP: an idea can spread to an infinite number of people and the original author’s access to it remains untouched.
> There is no inherent right to stop an idea from spreading in the same way that there is an inherent right to stop someone from stealing your wallet. But there are good reasons why we want original creators to be rewarded when others use their work: Ideas are positive externalities.
> When someone comes up with a valuable idea or piece of content, the welfare maximizing thing to do is to spread it as fast as possible, since ideas are essentially costless to copy and the benefits are large.
> But coming up with valuable ideas often takes valuable inputs: research time, equipment, production fixed costs etc. So if every new idea is immediately spread without much reward to the creator, people won’t invest these resources upfront, and we’ll get fewer new ideas than we want. A classic positive externalities problem.
> Thus, we have an interest in subsidizing the creation of new ideas and content.
And so you can reframe whether or not IP rights should be assigned in this case, based on whether you believe that the welfare generated by making AI better by providing it with content is more valuable for society than the welfare generated by subsidizing copyright holders.
The proper way to decide this would be to pass a law in the legislature. But of course our system in general and tech companies in particular don’t work that way.
The argument really isn't based on rights. It's based on the fact that the rules of the game have been that people who make things get to decide what others may do with those things via licensing agreements, except for a very small set of carve-outs that everyone knew about when they made the thing. The argument is consent. The counterargument is one or all of: AI training falls under one of those carve-outs; and/or it's undefined, so it should default to whatever anyone wants; and/or we should pass laws that change the rules. Most of these are about as logical as saying that if someone invented resurrection tomorrow, murder would no longer be a crime.
These seem to be very different indeed. You only need to be able to own and give property to have inheritance.
If your property is owned by a monarch or de facto the state, and you work your lifetime to rent it from them, then you don't get inheritance.
Your statements seem to extend that further: if you rent an apartment, the property is owned by a landlord (lord is literally in the title!) and passed down according to their wishes. Similarly, if you work for Walmart for life, the company is owned and passed down by the Waltons. In these cases the property rights extend beyond life and are transferred via circumstances of birth, while the rights of labor end.
Interesting that IP rights are ended by death (or death+n years) as well. This line of reasoning suggests maybe that should apply to all property.
Isn't the AI in this case also copyrighted intellectual property that benefits its owners and not the society? As far as I know, Perplexity is a private, for-profit corporation.
I don't see how improving Perplexity's proprietary models is any more beneficial to society than YouTube blocking ad blockers.
Works are so sparse in the space of possible texts, and that space explodes so quickly, that when someone holds the exclusive right to one of these almost unrepresentably huge numbers, you lose almost nothing.
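To put rough numbers on that sparsity (my own back-of-the-envelope, not the commenter's):

```python
# Number of distinct strings of length n over a 26-letter alphabet is 26**n.
# Even at tweet length, any given work occupies one point among roughly
# 10**396 possibilities, vastly more than the ~10**80 atoms in the
# observable universe.
n = 280
possibilities = 26 ** n
digits = len(str(possibilities))  # decimal digits, i.e. ~10**(digits - 1)
```

So granting exclusivity over any one such string forecloses a vanishingly small fraction of the space.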
If someone didn't announce that they had written, let's say, Harry Potter and there was a secret law forbidding you from distributing it, that would be really bad, but it would never matter.
Copyright infringement is a pure theft of service. You took it because it was there, because someone had already spent the effort to make it, and that was the only reason you took it.
Land, physical property, etc. meanwhile, is something that isn't created only by human effort.
For this reason copyright, rather than some fake pseudo-property of lower status than physical property, is actually much more legitimate than physical property.
I don’t think it’s as clear who is at fault if I mention “he who must not be named” in a hypothetical scenario where Harry Potter was never published, and then start telling people about the manuscript I found. If I violated someone’s rights to privacy or property to get or keep the original manuscript, that’s one thing; but merely having it, even as a copy the author didn’t want me to have, is another issue. If I never published it but merely described it to others, I’m not sure whether I’m any less culpable, though it seems like I should be.
I’m not sure how much more I can explore your thought experiment, but I appreciate you for sharing it with me.
Can they, though? Isn't that why Perplexity is being sued?
I don't have an answer to your question, which seems more general and doesn't correspond to the situation described by the article anyway: here the corporations have the right to use copyrighted materials to train their model, in the same way that you are allowed to learn from the same materials. You might even learn it by heart if you want to, but copyright laws forbid you from reproducing it, and in this instance the Japanese law tries to follow the same principle for AI models.
How corporations implement their training so as to prevent their models from reproducing the material verbatim is their problem, not the copyright holder's, in exactly the same fashion that if you learn an article by heart, it's on you to make sure you won't recite it to the public.
Humans are human. A human can do human things, absent a profit motive, without it being a copyright violation. Effectively infinitely scaling, for-profit products can't "human" without it being a copyright violation. The two are very different cases, in no way comparable.
For-profit products are PRODUCTS intended to make money for companies. AIs scale far beyond any individual human.
Rules and concepts made for humans are not relevant at all to for-profit products.
Copyright usually doesn't prevent copying per se; it's the redistribution that infringes. You, as well as Perplexity, are free to scrape public sites. You'll both be sued if you redistribute the content.
There are obviously laws that differ in every region but at a philosophical level I believe in the ideal of fair use. An AI is a distinctly different "work" than these originals and much like a human's own output is informed by all the information they have taken in over their lifetime, so is the output of a model.
edit: When many people make this argument, what they are really saying is "big fucks small". This may not be what you are saying, but it seems to be the general philosophy of many who make this argument. I am sympathetic to that, which is why I believe we should have something like a 15% tax, or 2% of AI revenue, paid into a general fund. I find it impossible to litigate how much a news article should be "worth" when 400 versions of the same article were written the same day, with the value diminishing immediately after the "news" was new.
Seems to be a fairly recent trend. Wonder what changed.
The copyright industry has done all it can for us, even in the most charitable interpretation. They literally, by constitutional mandate, can't be allowed to stand in the way of progress. We're not talking Napster 2.0 here.
Copyright doesn't promote the progress of science. Rather the opposite, as it allows journals that contribute nothing to progress to charge the rest of us to access research our taxes paid for.
As for "arts," useful and otherwise, those are secured these days via unbreakable permanent DRM, which overtly violates the constitutional basis of copyright law as a time-limited bargain with the public domain. You should be at least as outraged about that as you are about AI, but evidently you're not.
Meanwhile, you'd have to have rocks in your head to argue that AI doesn't constitute scientific progress at a bare minimum.
If it is adjudicated to be a violation, well, that's the end of copyright, for better or worse. AI is more important. Don't fight to lock down information; fight for equitable access instead.
First of all, the way certain platforms get sued for certain activities while others are left alone is unfair and creates significant market distortions.
Then there is the fact that wealthy individuals have much better legal representation than non-wealthy individuals.
Then there are tax loopholes which create market asymmetries above that.
The word 'fair' doesn't even make sense anymore. We've got to start asking: fair for whom?
> Japan’s copyright law allows AI developers to train models on copyrighted material without permission. This leeway is a direct result of a 2018 amendment to Japan’s Copyright Act, meant to encourage AI development in the country’s tech sector. The law does not, however, allow for wholesale reproduction of those works, or for AI developers to distribute copies in a way that will “unreasonably prejudice the interests of the copyright owner.”
The article is almost completely lacking in details though about how the information was reproduced/distributed to the public. It could be a very cut-and-dry case where the model would serve up the entire article verbatim. Or it could be a much more nuanced case where the model will summarize portions of an article in its own words. I would need to read up on Japanese copyright law, as well as see specific examples of infringement, to be able to make any sort of conclusion.
It seems like a lot of people are very quick to jump to conclusions in the absence of any details, though, which I find frustrating.
It certainly seems legal to train. But the case is about scraping without permission. Does downloading an article from a website, probably violating some small print user agreement in the process, count as distribution or reproduction? I guess the court will decide.
https://jskfellows.stanford.edu/theft-is-not-fair-use-474e11f0d063
Therefore, their output is a derivative work and violates copyright. The 2018 amendment was driven by big capital and should be reverted. Machines can plagiarize at huge scale and should have no human rights.
I'm somewhat optimistic this problem can be solved, though, with filters and usage policies. YouTube, another platform with basically unlimited potential for copyright infringement, has managed to implement a system that is good enough at preventing infringement to keep lawsuits at bay.
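A crude version of such a filter, purely as an illustration of the idea rather than anything YouTube or an AI vendor actually runs, flags output that shares long verbatim word runs with a source text:

```python
def ngrams(text: str, n: int) -> set:
    """All length-n word sequences in the text."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(output: str, source: str, n: int = 8) -> float:
    """Fraction of the output's word n-grams appearing verbatim in the source.
    A high score suggests near-verbatim reproduction rather than a summary."""
    out = ngrams(output, n)
    if not out:
        return 0.0
    return len(out & ngrams(source, n)) / len(out)
```

A deployed system would need to handle paraphrase, punctuation, and scale (e.g., hashed shingles over an index of sources), but the thresholding idea is the same: block or rewrite output whose overlap score is too high.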
It's also not clear if that's what Yomiuri Shimbun is alleging here. In their 2023 "Opinion on the Use of News Content by Generative AI" [1] they give this example:
> Newspaper companies have long provided databases containing past newspaper pages and articles for a fee, and in recent years, they have also sold article data for AI development. If AI imports large quantities of articles, photos, images, and other data from news organizations’ digital news sites without permission, commercial AI services for third parties developing it could conflict with the existing database sales market and “unreasonably prejudice the interests of the copyright owner” (Article 30-4 of the Act). Also, even if all or part of a particular article communicates nothing further than facts and hardly constitutes a copyright, many contents deserve legal protection because of the effort and cost invested by the newspaper companies. Even if an AI collects and uses only the factual part, it does not mean it will always be legal.
So basically arguing that 2018 amendment which allows the use of copyrighted works to train AI models without permission from the copyright holder is not applicable because the use would "would unreasonably prejudice the interests of the copyright owner in light of the nature or purpose of the work or the circumstances of its exploitation". [2]
... which I think is a much more nuanced argument. I don't think we can just lump all of these cases together and say "it's infringement" or "it's fair use" without actually considering the details in each case. Or the specific laws in each country.
Quality news content has not been a thing for a long time now, so the public will not notice any change.
Give it a try on your next visit to Tokyo. I recommend arriving on the cablecar - almost feels like you're descending into Jurassic Park by helicopter (wife gets quite annoyed when I predictably start humming John Williams).
> Japan’s largest newspaper, Yomiuri Shimbun, sues AI startup Perplexity for copyright violations