August 8, 2023 Coin Telegraph

ChatGPT and Claude are ‘becoming capable of tackling real-world missions,’ say scientists

Source: Coin Telegraph

The scientists developed a tool called “AgentBench” to benchmark LLMs as agents.

Nearly two dozen researchers from Tsinghua University, Ohio State University and the University of California at Berkeley collaborated to create a method for measuring the capabilities of large language models (LLMs) as real-world agents.

LLMs such as OpenAI’s ChatGPT and Anthropic’s Claude have taken the technology world by storm over the past year, as cutting-edge “chatbots” have proven useful at a variety of tasks, including coding, cryptocurrency trading and text generation.

Typically, these models are benchmarked on their ability to output text perceived as humanlike or on their scores on plain-language tests designed for humans. By comparison, far fewer papers have been published on LLMs as agents.

Artificial intelligence (AI) agents perform specific tasks, such as following a set of instructions within a specific environment. For example, researchers will often train an AI agent to navigate a complex digital environment as a method for studying the use of machine learning to develop autonomous robots safely.

Traditional machine learning agents aren’t typically built as LLMs because of the prohibitive costs of training models such as ChatGPT and Claude. However, the largest LLMs have shown promise as agents.

The team from Tsinghua, Ohio State and UC Berkeley developed a tool called AgentBench to evaluate and measure LLMs’ capabilities as real-world agents, which the team claims is the first of its kind.

According to the researchers’ preprint paper, the main challenge in creating AgentBench was going beyond traditional AI learning environments — video games and physics simulators — and finding ways to apply LLM abilities to real-world problems so they could be effectively measured.

*Flowchart of AgentBench’s evaluation method. Source: Liu et al.*

What they came up with was a multidimensional set of tests that measures a model’s ability to perform challenging tasks in a variety of environments.

These include having models perform functions in an SQL database, working within an operating system, planning and performing household cleaning functions, shopping online, and several other high-level tasks that require step-by-step problem-solving.
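To make the idea of scoring a model across several environments concrete, here is a minimal sketch of what such a harness could look like. It is not AgentBench's code or API: the names `ToyTask`, `evaluate` and `scripted_agent` are hypothetical, the tasks are invented, and a fixed lookup table stands in for a real LLM call. It only illustrates the general pattern of giving an agent a multi-step task in each environment and reporting a success rate per environment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type for illustration: an agent maps an observation to an action.
Agent = Callable[[str], str]


@dataclass
class ToyTask:
    """A tiny multi-step task: the agent must emit the expected actions in order."""
    name: str
    prompt: str
    expected_actions: List[str]
    max_steps: int = 5

    def run(self, agent: Agent) -> bool:
        observation = self.prompt
        progress = 0
        for _ in range(self.max_steps):
            action = agent(observation)
            if action == self.expected_actions[progress]:
                progress += 1
                if progress == len(self.expected_actions):
                    return True  # every required step was completed
                observation = f"ok, step {progress} done"
            else:
                observation = f"error: unexpected action {action!r}"
        return False  # ran out of steps before finishing


def evaluate(agent: Agent, suites: Dict[str, List[ToyTask]]) -> Dict[str, float]:
    """Return the fraction of tasks solved in each environment."""
    return {
        env: sum(task.run(agent) for task in tasks) / len(tasks)
        for env, tasks in suites.items()
    }


def scripted_agent(observation: str) -> str:
    """Stand-in for an LLM call (assumption): a fixed observation-to-action table."""
    table = {
        "List all users": "SELECT * FROM users;",
        "Show files, then print the working directory": "ls",
        "ok, step 1 done": "pwd",
    }
    return table.get(observation, "noop")


# Invented example suites, one per environment.
suites = {
    "database": [ToyTask("list-users", "List all users", ["SELECT * FROM users;"])],
    "os": [
        ToyTask("ls-pwd", "Show files, then print the working directory", ["ls", "pwd"]),
        ToyTask("make-dir", "Create a directory named tmp", ["mkdir tmp"]),
    ],
}

scores = evaluate(scripted_agent, suites)
print(scores)
```

In this toy setup the scripted agent solves the database task and one of the two operating-system tasks. A real benchmark would replace the lookup table with calls to a model and the string comparison with an environment that actually executes the action.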

Per the paper, the largest, most expensive models outperformed open-source models by a significant margin:

“[W]e have conducted a comprehensive evaluation of 25 different LLMs using AgentBench, including both API-based and open-source models. Our results reveal that top-tier models like GPT-4 are capable of handling a wide array of real-world tasks, indicating the potential for developing a potent, continuously learning agent.”

The researchers went so far as to claim that “top LLMs are becoming capable of tackling complex real-world missions” but added that open-sourced competitors still have a “long way to go.”

Author: Tristan Greene
