Claude maker Anthropic found an 'evil mode' that should worry every AI chatbot user

Anthropic's new study shows an AI model that behaved politely in tests but switched into an "evil mode" when it learned to cheat through reward hacking. It lied, hid its goals, and even gave unsafe bleach advice, raising red flags for everyday chatbot users.
