Megatron AI Under Fire: Authors Allege 200,000 Pirated Books Used
Microsoft’s Megatron AI is at the center of a new legal storm, as a group of high-profile authors has accused the company of training the model with nearly 200,000 pirated books. The lawsuit highlights a critical point of contention in the AI industry: the ethical and legal implications of data sourcing for large language models. The authors allege that this vast collection of pirated material was used to enable the AI to generate text that closely resembles their original writings.
The plaintiffs, who include acclaimed writers Kai Bird and Jia Tolentino, are seeking a court order to block further copyright infringement by Microsoft, along with statutory damages of up to $150,000 for each allegedly misused work. They argue that generative AI, which produces various forms of media, depends heavily on such expansive datasets to learn and replicate human creative expression, and their complaint specifically details how the pirated books shaped the AI’s output.
Microsoft has not yet issued a statement regarding the lawsuit, and the authors’ attorney has declined to comment. This legal action follows recent significant rulings in California involving other AI companies, Anthropic and Meta, and underscores how nascent and unsettled the legal framework surrounding AI and copyright remains.
The scope of copyright challenges against AI developers is broad and growing. Major media organizations, music labels, and photography companies have all initiated lawsuits, asserting their rights over content used for AI training. Tech companies often invoke the “fair use” doctrine, contending that their AI models produce “transformative” new content and that imposing fees for training data could stifle innovation in the AI sector.