CMU Researcher Uses ChatGPT To Execute Computer Tasks

Large Language Models Can Solve Computer Tasks Using Keyboard, Mouse Actions

Tuesday, May 23, 2023 - by Aaron Aupperlee

CMU-led research shows that large language models can perform tedious or repetitive tasks by completing keyboard and mouse actions.

Research spearheaded by Carnegie Mellon University shows that AI systems such as ChatGPT and Midjourney, best known for generating text, code or images, can also handle repetitive tasks.

Stephen McAleer, a postdoctoral fellow in the Computer Science Department at CMU's School of Computer Science, worked with his colleagues to demonstrate that large language models (LLMs) such as ChatGPT can be used to perform general computer tasks by completing keyboard and mouse actions. These tasks range from file management and internet searches to email handling and form completion. This work is part of a new line of research shifting from generative AI systems to AI that takes action on computers, expanding the potential for AI applications.
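To make "completing keyboard and mouse actions" concrete, here is a minimal Python sketch. It assumes the model writes one plain-text command per line. The command vocabulary (`click x y`, `type text`, `press key`) and the names `Click`, `TypeText`, `PressKey`, `parse_action` and `parse_plan` are hypothetical illustrations, not the format McAleer and his colleagues used. A real agent would hand the parsed actions to an automation layer that actually moves the mouse and sends keystrokes.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical action types; the article does not specify the action format.
@dataclass
class Click:
    x: int
    y: int


@dataclass
class TypeText:
    text: str


@dataclass
class PressKey:
    key: str


Action = Union[Click, TypeText, PressKey]


def parse_action(line: str) -> Action:
    """Turn one line of model output into a structured keyboard or mouse action."""
    verb, _, rest = line.strip().partition(" ")
    verb = verb.lower()
    if verb == "click":
        x_str, y_str = rest.split()
        return Click(int(x_str), int(y_str))
    if verb == "type":
        return TypeText(rest)
    if verb == "press":
        return PressKey(rest.strip().lower())
    raise ValueError(f"unrecognized action: {line!r}")


def parse_plan(text: str) -> list[Action]:
    """Parse a multi-line plan into actions, skipping blank lines."""
    return [parse_action(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    plan = "click 120 45\ntype quarterly report\npress Enter"
    for action in parse_plan(plan):
        print(action)
```

Keeping parsing separate from execution means a malformed command raises an error before anything touches the screen.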

"This approach opens up the possibility of many new applications and products that can navigate the web and automate repetitive tasks," McAleer said. "The ultimate goal is to enable the agent to do anything on a computer that a human can do."

The approach improves its output through a recursive process of criticism and improvement, a method McAleer and his colleagues call RCI, short for "Recursively Criticizes and Improves." With RCI, a pretrained LLM such as ChatGPT is assigned a task, formulates a plan, executes it, and then reviews its performance to identify and correct problems.
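The plan, execute and review cycle can be sketched as a short loop. This is a minimal illustration under stated assumptions, not the researchers' implementation. `llm` stands in for any function that sends a prompt to a model such as ChatGPT and returns its reply. `execute` is a hypothetical hook that carries out a plan and reports what happened. The prompt wording and the fixed number of rounds are invented for the example.

```python
from typing import Callable

# Hypothetical interfaces: a model that answers prompts, and an executor that
# carries out a plan and returns a text description of the outcome.
LLM = Callable[[str], str]
Executor = Callable[[str], str]


def rci(task: str, llm: LLM, execute: Executor, rounds: int = 2) -> str:
    """Plan, execute, critique and improve, repeating for a fixed number of rounds."""
    plan = llm(f"Task: {task}\nWrite a step-by-step plan.")
    for _ in range(rounds):
        result = execute(plan)
        # Ask the model to review its own plan in light of what happened.
        critique = llm(
            f"Task: {task}\nPlan:\n{plan}\nResult:\n{result}\n"
            "Review the plan and list any problems."
        )
        # Ask the model to rewrite the plan so it addresses the critique.
        plan = llm(
            f"Task: {task}\nPlan:\n{plan}\nResult:\n{result}\n"
            f"Critique:\n{critique}\nWrite an improved plan."
        )
    return plan


if __name__ == "__main__":
    replies = iter(["plan v1", "step 2 is missing", "plan v2", "looks fine", "plan v3"])

    def fake_llm(prompt: str) -> str:
        return next(replies)  # scripted stand-in for a real model

    def fake_execute(plan: str) -> str:
        return f"ran {plan!r}"

    print(rci("Rename report.txt to final.txt", fake_llm, fake_execute))  # prints: plan v3
```

The model and the executor are passed in as plain functions. This keeps the loop independent of any particular API and makes it easy to test with a scripted stand-in, as the demo does.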

McAleer, along with Geunwoo Kim and Pierre Baldi from the Department of Computer Science at the University of California, Irvine, recently published a paper detailing their work.

"Current AI provides transformative opportunities and at the same time raises formidable challenges," Baldi said. "This work is on the side of the opportunities, showing how computer work can be greatly facilitated if not completely automated using large language models."

Their research demonstrated that RCI outperforms all existing methods on MiniWoB++, a popular benchmark of computer tasks. Whereas existing methods can require tens of thousands of demonstrations to learn a task, RCI typically needs only two or three, and it achieves a 94% success rate across 55 tasks.
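Learning from "two or three demonstrations" generally means few-shot prompting. Instead of training on many examples, the model sees a few worked examples in its prompt, placed ahead of the new task. The sketch below assumes that setup. The `Demo` record, the "Task:/Actions:" layout and the sample tasks are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    """One worked example: a task and the actions that solved it (hypothetical format)."""
    task: str
    actions: list[str]


def build_prompt(demos: list[Demo], task: str) -> str:
    """Place a handful of demonstrations ahead of the new task in a single prompt."""
    blocks = [f"Task: {d.task}\nActions:\n" + "\n".join(d.actions) for d in demos]
    # The new task ends with an open "Actions:" line for the model to complete.
    blocks.append(f"Task: {task}\nActions:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [
        Demo("Search the web for 'weather'", ["click 40 10", "type weather", "press enter"]),
        Demo("Submit the form", ["click 200 300"]),
    ]
    print(build_prompt(demos, "Search the web for 'news'"))
```

The only task-specific data is the short list of demonstrations, which is why adapting to a new task costs a few examples rather than a new training run.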

McAleer's interest in automating basic computer tasks emerged during his junior year of college, when an internship at a bank found him repeating the same tasks over and over. The experience led him to two realizations: he had no interest in banking, and he wanted to automate tedious tasks.

"I've been working toward that goal ever since," McAleer said.

Enabling LLMs to perform tasks humans find dull or repetitive could improve work, enhance productivity and foster prosperity. Successful models could expedite economic growth, automate scientific processes and free up time for people to do the things they love.

McAleer does advise caution, however. AI tools could become increasingly powerful, and his research shows that LLMs can self-improve. Consequently, it's essential to start having discussions about rules, regulations and safeguards to protect society from potential AI harm and ensure its benefits are universally accessible.

"Progress in AI has been moving more rapidly than I expected and it's time to start thinking about what happens if we are successful. What if autonomous agents could start making money on their own just by using computers? I can imagine scenarios where this would quickly get out of control," McAleer said, "and I'm thinking hard about how to advance this technology the right way."

More information about RCI can be found on the project's website.

For more information, contact:
Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu