The firm investigated four distinct “sabotage” threat vectors for AI and determined that “minimal mitigations” were sufficient for current models.
Artificial intelligence firm Anthropic recently published research identifying a set of potential “sabotage” threats to humanity posed by advanced AI models.
According to the company, the research focused on four specific ways a malicious AI model could trick a human into making a dangerous or harmful decision.