Business Insider (BI) obtained an internal list of websites that could and couldn’t be used for training Anthropic’s latest AI models.
Anthropic’s contractor Surge AI left the list fully public on Google Docs.
The ‘Sites you can use’ list includes Bloomberg, Harvard, and the Mayo Clinic.
Many of the whitelisted sources copyright or otherwise restrict their content.
At least three (the Mayo Clinic, Cornell University, and Morningstar) told BI they didn’t have any AI training agreements with Anthropic.
The spreadsheet also includes a blacklist of websites that Surge AI’s gig workers were “now disallowed” from using.
The blacklist includes companies such as The New York Times and Reddit, which have sued AI startups for scraping without permission.