
A team of software engineers, AI specialists and programmers at Tsinghua University, working with TikTok parent company ByteDance, has announced the development of a graphical user interface (GUI) agent model called UI-TARS. The group described the model in a paper posted to the arXiv preprint server.
Over the past decade, AI applications have flourished. Some of the most well-known are LLMs such as ChatGPT. But others have been under development to serve a variety of purposes. One application is assisting computer users in carrying out mundane tasks, such as finding the cheapest airline fare for a flight between two cities and then buying tickets for it. Such tasks typically involve time-consuming web browsing.
AI researchers have suggested that such tasks could be automated by smart agents. In this new study, the team in China has done just that with the development of UI-TARS, a GUI agent model that can be used locally on a personal computer or via the cloud on other devices.
The model was trained using 50 billion tokens that represented characteristics of a GUI (via screenshots), such as those found on traditional web pages. Training also involved reflection tuning, which meant the model was programmed to learn from mistakes and then to adapt, modifying how it approached different or unknown situations.
When running UI-TARS, a user is presented with two tabs. One shows the “thinking process” that the app is undergoing as it goes about its overall task. The other shows the websites, files or other GUIs that the app is working with. Thus, if it were used to book a flight, a user could see the airline websites being viewed and could then switch over to see what the app was doing with them.
At the end of the process, the user is presented with the final web page prompting confirmation of the ticket purchase. In testing their model, the team found that it outperformed other AI models such as GPT-4o and Gemini-2.0.
More information:
Yujia Qin et al, UI-TARS: Pioneering Automated GUI Interaction with Native Agents, arXiv (2025). DOI: 10.48550/arxiv.2501.12326
UI-TARS: github.com/bytedance/UI-TARS
© 2025 Science X Network
Citation:
UI-TARS GUI agent model can automate tasks such as finding and booking airline tickets (2025, January 23)
retrieved 23 January 2025
from https://techxplore.com/information/2025-01-gui-agent-automate-tasks-airline.html
This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without written permission. The content is provided for information purposes only.