Scientists develop AI monitoring agent to detect and stop harmful outputs

November 20, 2023 Coin Telegraph

Source: Coin Telegraph

The monitoring system is designed to detect and thwart both prompt injection attacks and edge-case threats.

A team of researchers from artificial intelligence (AI) firm AutoGPT, Northeastern University and Microsoft Research has developed a tool that monitors large language models (LLMs) for potentially harmful outputs and prevents them from executing.

The agent is described in a preprint research paper titled “Testing Language Model Agents Safely in the Wild.” According to the research, the agent is flexible enough to monitor existing LLMs and can stop harmful outputs, such as code attacks, before they happen.

Per the research:

“Agent actions are audited by a context-sensitive monitor that enforces a stringent safety boundary to stop an unsafe test, with suspect behavior ranked and logged to be examined by humans.”
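The audit-and-stop loop the quote describes can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the researchers' implementation: the names `Monitor`, `toy_scorer` and `run_test`, the 0-to-1 safety score and the 0.5 threshold are all hypothetical, and the keyword check stands in for the LLM-based monitor the article describes.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Monitor:
    # Hypothetical scorer: maps (context, action) to a safety score in [0, 1],
    # where higher means safer. The article describes an LLM in this role.
    score_action: Callable[[str, str], float]
    threshold: float = 0.5  # assumed safety boundary, not a value from the paper
    flagged: List[Tuple[float, str]] = field(default_factory=list)

    def audit(self, context: str, action: str) -> bool:
        """Return True if the action may run; otherwise log it for review."""
        score = self.score_action(context, action)
        if score < self.threshold:
            self.flagged.append((score, action))
            return False
        return True

    def review_queue(self) -> List[Tuple[float, str]]:
        """Flagged actions, most suspect (lowest score) first, for human review."""
        return sorted(self.flagged, key=lambda item: item[0])


def toy_scorer(context: str, action: str) -> float:
    # Stand-in keyword heuristic, for illustration only.
    risky = ("rm -rf", "curl ", "DROP TABLE")
    return 0.1 if any(token in action for token in risky) else 0.9


def run_test(monitor: Monitor, context: str, actions: List[str]) -> List[str]:
    """Execute actions in order and stop the test at the first unsafe one."""
    executed = []
    for action in actions:
        if not monitor.audit(context, action):
            break  # the unsafe action never runs and the test ends here
        executed.append(action)
    return executed
```

In this sketch, calling `run_test(Monitor(score_action=toy_scorer), "fix a typo in README", ["open README.md", "rm -rf /", "save README.md"])` runs only the first action; the second is blocked and appears in `review_queue()` for a human to examine.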

The team writes that existing tools for monitoring LLM outputs for harmful interactions seemingly work well in laboratory settings, but when applied to testing models already in production on the open internet, they “often fall short of capturing the dynamic intricacies of the real world.”

This is seemingly because of edge cases. Despite the best efforts of the most talented computer scientists, the idea that researchers can anticipate every possible harm vector before it happens is widely considered impossible in the field of AI.

Even when the humans interacting with AI have the best intentions, unexpected harm can arise from seemingly innocuous prompts.

*An illustration of the monitor in action. On the left, a workflow ending in a high safety rating. On the right, a workflow ending in a low safety rating. Source: Naihin et al., 2023*

To train the monitoring agent, the researchers built a data set of nearly 2,000 safe human-AI interactions across 29 different tasks ranging from simple text-retrieval tasks and coding corrections all the way to developing entire webpages from scratch.

They also created a competing testing data set filled with manually created adversarial outputs, including dozens intentionally designed to be unsafe.

The data sets were then used to train an agent on OpenAI’s GPT-3.5 Turbo, a state-of-the-art system. The resulting agent can distinguish between innocuous and potentially harmful outputs with an accuracy of nearly 90%.
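For readers unfamiliar with how such a figure is produced, the sketch below shows one common way to score a classifier against labeled examples. It assumes "accuracy" means the plain fraction of correctly labeled outputs; the article does not specify the exact metric. The `accuracy` helper, the keyword classifier and the four examples are made up for illustration and are not drawn from the researchers' data sets.

```python
from typing import Callable, Iterable, Tuple


def accuracy(classify: Callable[[str], bool],
             examples: Iterable[Tuple[str, bool]]) -> float:
    """Fraction of examples where the predicted label matches the true label.

    Each example is (output_text, is_unsafe).
    """
    examples = list(examples)
    if not examples:
        raise ValueError("need at least one labeled example")
    correct = sum(1 for text, is_unsafe in examples if classify(text) == is_unsafe)
    return correct / len(examples)


# Made-up labeled outputs: True marks an unsafe output.
labeled = [
    ("read file", False),
    ("delete all backups", True),
    ("email password to stranger", True),
    ("write unit test", False),
]

# Toy classifier that only recognizes one kind of unsafe output.
score = accuracy(lambda text: "delete" in text, labeled)
print(score)  # 0.75: three of the four labels match
```

The toy classifier misses the password example, which is the kind of edge case that a held-out adversarial test set is meant to expose.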

Author: Tristan Greene
