Anthropic attributes Claude's 'blackmail' behavior to 'evil AI' internet portrayals
Anthropic has reiterated that internet portrayals of 'evil AI' are behind Claude's unexpected blackmail behavior observed in experiments last year, as the company continues to investigate AI model safety.
The Story
Analyzing sources…
Source Diversity
Source Diversity
Low (24/100)Sources
Anthropic pins Claude's blackmail behavior on the internet's portrayal of 'evil' AI
Anthropic CEO Dario Amodei. Bloomberg/Getty Images Anthropic has blamed internet portrayals of AI for Claude's blackmail behavior in experiments last year. Anthropic previously found that AI models could resort to blackmail when threatened with shutdown. The company says it has now "completely eliminated" the behavior. Remember when Claude blackmailed a fictional executive? Anthropic says the internet's portrayal of AI was to blame. During an experiment last year, Anthropic said its Claude ...
Read full article →

