Inside AI #5: Meta Training Data Email Leaks, Are Google “The Baddies”?, Our Presentation in Brussels
Edition 5
In This Edition:
Recent News:
Authors vs. AI: Meta and OpenAI Accused of Illegally Training Their Models on Books Obtained from Well-known “Shadow Library” Websites
DeepSeek Data Leak Exposes 1 Million Sensitive Records
Google Lifts a Ban on Using Its AI for Weapons and Surveillance
Updates on Suchir
Mark Zuckerberg Defends Meta’s Recent Moves, Sets Up 2025 in Leaked All-Hands Meeting
Challenges and Shortcomings of the Paris AI Summit
Anduril in Talks to Raise Money at $28 Billion Valuation as Defense-tech Booms
Key takeaways:
We highlighted the importance of establishing a central EU AI Office Whistleblower Mailbox during our participation in the Roundtable on General-Purpose AI Governance and the EU AI Act’s Code of Practice, organized by Pour Demain in Brussels last week.
Reach out if you’d like to see the slides we presented.
Insider Currents
Carefully curated links to the latest news from the past two weeks, spotlighting voices and information emerging from inside frontier AI.
Authors vs. AI: Meta and OpenAI Accused of Illegally Training Their Models on Books Obtained from Well-known “Shadow Library” Websites
Recently unsealed emails have emerged as key evidence in a copyright lawsuit against Meta, in which book authors allege the company unlawfully used pirated books to train its AI models, according to Ars Technica.
Internal communications cited in the report suggest that Meta took deliberate steps to obscure its data collection activities. In one internal message, Meta researcher Frank Zhang indicated that the company intentionally avoided using Facebook servers to download the dataset, referring to the approach as “stealth mode” to prevent the activity from being traced back to Meta. Further supporting these claims, Meta executive Michael Clark, who oversees project management, stated in a deposition that the company adjusted its torrenting settings to minimize visibility and reduce the likelihood of detection.
A class-action lawsuit filed in U.S. federal court by the Joseph Saveri Law Firm on behalf of Sarah Silverman and other authors accuses OpenAI and Meta of unlawfully using copyrighted material to train AI language models like ChatGPT and LLaMA. In the case against OpenAI, the authors claim that disclosures suggest ChatGPT was trained on 294,000 books allegedly obtained from well-known “shadow library” websites, including Library Genesis (LibGen), Z-Library (B-ok), Sci-Hub, and Bibliotik. Meanwhile, Meta has admitted that LLaMA was trained on a dataset called ThePile, which the lawsuit claims includes “all of Bibliotik,” comprising 196,640 books.
→ Read: “Torrenting From a Corporate Laptop Doesn’t Feel Right”: Meta Emails Unsealed
→ Read: Sarah Silverman Sues OpenAI, Meta for Being “Industrial-strength Plagiarists”
DeepSeek Data Leak Exposes 1 Million Sensitive Records
Cybersecurity researchers at Wiz Research revealed that DeepSeek suffered a significant data leak, exposing over one million sensitive records. As CSO Online put it, “This leak raised serious concerns about data security and privacy, particularly as AI companies continue to aggregate and analyze vast amounts of information.”
The breach occurred due to a misconfigured database that was left publicly accessible without authentication. Wiz Research identified that the exposed data included:
Chat logs and system details
Operational metadata
API secrets
Sensitive log streams
This large database (over one million records) was “publicly accessible to anyone with an internet connection, raising significant concerns about DeepSeek’s data management practices and compliance with privacy laws,” added Forbes. If personal data belonging to EU or US residents was compromised, the breach could result in regulatory scrutiny under GDPR and CCPA.
→ Read the Forbes Article Here
→ Read the Article by Wiz Research Here
Google Lifts a Ban on Using Its AI for Weapons and Surveillance
Google has updated its AI principles, removing language that previously prohibited the development of:
Technologies that cause or are likely to cause overall harm
Weapons or other technologies whose principal purpose or implementation is to cause or directly facilitate injury to people
Technologies that gather or use information for surveillance violating internationally accepted norms
Technologies whose purpose contravenes widely accepted principles of international law and human rights
According to a report by Wired, Google executives cited in a blog post “the increasingly widespread use of AI, evolving standards, and geopolitical battles over AI” as key factors behind the change.
The decision has sparked internal debate among employees. Business Insider reported that some Google staff members expressed concerns on the company’s internal message board, Memegen, through widely shared memes:
One depicted CEO Sundar Pichai searching, “How to become a weapons contractor?”
Another referenced a well-known comedy sketch featuring a Nazi soldier, captioned: “Google lifts a ban on using its AI for weapons and surveillance. Are we the baddies?”
A third featured Sheldon from The Big Bang Theory, reacting to Google’s increasing collaboration with the Pentagon with the phrase: “Oh, that’s why.”
While these memes were among the highest-rated by employees, they reflect only a small subset of Google’s workforce, which exceeds 180,000. Others may view closer collaboration between technology firms and government agencies more favourably, Business Insider noted.
→ Read the Full Article by Wired
→ Read the Full Article by Business Insider
Parents of OpenAI Whistleblower File Lawsuit Seeking Full Police Report on Son’s Death - San Francisco Releases Autopsy Report
Following our previous reports on the OpenAI whistleblower, his parents filed a lawsuit against the San Francisco Police Department on January 31, seeking the full release of the department’s report on their son’s death, Fortune reported.
As before, we won’t go into depth here, and we refrain from speculation. Below we link both to coverage of the lawsuit and to a summary of the coroner’s and SFPD’s autopsy report, which ruled the case closed.
Beyond the media attention, the case represents a profound personal loss—the loss of a talented programmer and researcher. Family and friends described him to Fortune as a “prodigy with a firm moral compass.” His parents continue to seek clarity and closure regarding his passing, Decrypt added.
→ Read an Article by Fortune and by Decrypt covering the lawsuit
→ Read an Article Covering the Official Autopsy Record
Mark Zuckerberg Defends Meta’s Recent Moves, Sets Up 2025 in Leaked All-Hands Meeting
In a recently leaked all-hands meeting, Meta CEO Mark Zuckerberg described 2025 as a pivotal year, likening it to a “sprint” rather than a “marathon.” He emphasised that by the end of the year, the company should have a clearer strategic direction, with AI remaining a primary focus. Additionally, he addressed recent shifts in Meta’s policies, including changes to fact-checking procedures and diversity, equity, and inclusion initiatives, wrote Business Insider.
Acknowledging the ongoing issue of internal leaks, Zuckerberg remarked, “Everything I say leaks. And it sucks, right? I want to be able to talk about stuff openly, but we’re trying to build and create value—not destroy it by discussing things that inevitably get leaked,” as reported by AdWeek.
In response to the persistent issue, Meta’s Chief Information Security Officer, Guy Rosen, issued an internal memo warning of “appropriate action, including termination” for employees who leak company information. He stressed that unauthorised disclosures not only pose security risks but also harm team morale and divert focus from Meta’s core mission, as reported by AdWeek.
However, the memo itself was promptly leaked to The Verge, wrote AdWeek.
→ Read: Mark Zuckerberg Defends Meta’s Recent Moves, Sets Up 2025 in Leaked All-Hands Meeting
Challenges and Shortcomings of the Paris AI Summit
The Paris AI Summit did not yield concrete solutions but instead relied on broad statements and industry jargon, according to Transformer News, citing a leaked draft of the Statement on Inclusive and Sustainable Artificial Intelligence for People and the Planet. The statement, which the U.S. and the U.K. ultimately declined to sign, reinforced concerns among AI experts that the summit represented a “missed opportunity” to address critical AI risks and to establish meaningful global commitments on AI safety.
The statement also failed to build on commitments made at previous summits. Transformer News highlighted the following gaps:
At the Bletchley Summit, participating nations, including France, acknowledged AI’s “significant risks” and committed to developing “risk-based policies” to mitigate them.
In Seoul, companies pledged to establish safety frameworks, while governments agreed to define risk thresholds for advanced AI systems and develop proposals ahead of the AI Action Summit in France.
Despite these prior commitments, the draft statement lacks concrete steps for implementation. An anonymous AI governance expert, speaking to Transformer News, pointed to a key omission:
The statement does not address how governments should prepare for the potential near-term development of artificial general intelligence (AGI) by private companies. This, they argued, represents a significant failure of governmental responsibility to safeguard global public interests.
→ Read Transformer News’ Substack
Anduril in Talks to Raise Money at $28 Billion Valuation as Defense-tech Booms
Anduril, the defense technology startup founded by Palmer Luckey, has agreed to a term sheet to raise funding at a $28 billion valuation, according to sources familiar with the situation cited by CNBC.
Palmer Luckey, who sold Oculus to Facebook for $2 billion in 2014, has long supported Donald Trump. “I’ve backed tech-for-Trump longer than almost anyone,” Luckey told CNBC on Nov. 6 after Trump’s victory, emphasising the need for a strong military as a non-partisan issue. In December, his defense tech company, Anduril, partnered with OpenAI to deploy advanced AI for “national security missions.”
→ Read the Full Article by CNBC
Announcements & Call to Action
Updates on publications, community initiatives, and “call for topics” that seek contributions from experts addressing concerns inside Frontier AI.
OAISIS in the EU AI Act Roundtable on the CoP
On February 2nd, we took part in the Roundtable on General-Purpose AI Governance and the EU AI Act’s Code of Practice, organized by Pour Demain in Brussels.

Key messages we highlighted:
The importance and tremendous benefits of whistleblower regulation for AI labs, society, and regulators
How protections afforded through the EU Whistleblowing Directive already extend to the CoP - even though the EU AI Act itself will only fall within the directive’s scope from August 2026
The importance of establishing an EU whistleblower mailbox to directly support whistleblowers as well as national regulators
Setting up such a central mailbox & team could serve as a ‘lighthouse’ case for other regulators globally
Offerings could include psychological, financial, and independent advisory support
All of these offerings are already provided by leading EU member states to whistleblowers today - we can do this.
Pour Demain, a think tank dedicated to the responsible development and deployment of general-purpose AI, summarised the key outcomes of the discussion.
→ Read the Report by Pour Demain Here
→ Reach out to us if you want to see the slides we presented
Thank you for trusting OAISIS as your source for insights on protecting and empowering insiders who raise concerns within AI labs.
Your feedback is crucial to our mission. We invite you to share any thoughts, questions, or suggestions for future topics so that we can collaboratively deepen our understanding of the challenges and risks faced by those inside AI labs. Together, we can continue to amplify and safeguard the voices of those who courageously raise these concerns.
If you found this newsletter valuable, please consider sharing it with colleagues or peers who are equally invested in shaping a safe and ethical future for AI.
Until next time,
The OAISIS Team