Novel technique overcomes spurious correlations problem in AI

AI models often rely on “spurious correlations,” making decisions based on unimportant and potentially misleading information. Researchers have now discovered that these learned spurious correlations can be traced to a very small subset of the training data, and have demonstrated a technique that overcomes the problem. The work has been published on the arXiv preprint server.

“This technique is novel in that it can be used even when you have no idea what spurious correlations the AI is relying on,” says Jung-Eun Kim, corresponding author of a paper on the work and an assistant professor of computer science at North Carolina State University.

“If you already have a good idea of what the spurious features are, our technique is an efficient and effective way to address the problem. However, even if you are simply having performance issues, but don’t understand why, you could still use our technique to determine whether a spurious correlation exists and resolve that issue.”

Spurious correlations are generally caused by simplicity bias during AI training. Practitioners use datasets to train AI models to perform specific tasks. For example, an AI model could be trained to identify photographs of dogs. The training dataset would include pictures of dogs in which the AI is told that a dog is in the photo.

During the training process, the AI will begin identifying specific features that it can use to identify dogs. However, if many of the dogs in the photos are wearing collars, and because collars are generally less complex features of a dog than ears or fur, the AI may use collars as a simple way to identify dogs. This is how simplicity bias can cause spurious correlations.

“And if the AI uses collars as the factor it uses to identify dogs, the AI may identify cats wearing collars as dogs,” Kim says.

Conventional techniques for addressing problems caused by spurious correlations rely on practitioners being able to identify the spurious features that are causing the problem. They can then address this by modifying the datasets used to train the AI model. For example, practitioners might increase the weight given to photos in the dataset that include dogs that are not wearing collars.
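To make the conventional fix concrete, here is a minimal Python sketch of one common reweighting scheme, assuming every training image comes with a known group label such as ("dog", "no_collar"). The function names and the inverse-group-frequency rule are illustrative choices, not taken from the paper. The key limitation is visible in the inputs: the scheme only works once the spurious feature (here, the collar) has been identified and annotated.

```python
import math
from collections import Counter


def group_balanced_weights(groups):
    """Weight each sample by 1 / (size of its group), rescaled so weights average 1.

    `groups` is a list of hashable group ids, e.g. ("dog", "collar").
    Assumes these group labels are known in advance. Samples from rare
    groups (dogs without collars) end up with larger weights.
    """
    counts = Counter(groups)
    raw = [1.0 / counts[g] for g in groups]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


def weighted_log_loss(probs_true_class, weights):
    """Weighted mean of -log p(true class), so upweighted samples cost more."""
    total = sum(w * -math.log(p) for p, w in zip(probs_true_class, weights))
    return total / sum(weights)


groups = [("dog", "collar")] * 3 + [("dog", "no_collar")]
weights = group_balanced_weights(groups)
print(weights)
```

In this toy setup, the lone collarless dog receives three times the weight of each collared dog, so mistakes on it count more heavily in the training loss.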

However, in their new work, the researchers demonstrate that it is not always possible to identify the spurious features that are causing problems, which makes conventional techniques for addressing spurious correlations ineffective.

The paper, “Severing Spurious Correlations with Data Pruning,” will be presented at the International Conference on Learning Representations (ICLR), being held in Singapore from April 24–28. The first author of the paper is Varun Mulchandani, a Ph.D. student at NC State.

“Our goal with this work was to develop a technique that allows us to sever spurious correlations even when we know nothing about those spurious features,” Kim says.

The new technique relies on removing a small portion of the data used to train the AI model.

“There can be significant variation in the data samples included in training datasets,” Kim says. “Some of the samples may be very simple, while others may be very complex. And we can measure how ‘difficult’ each sample is based on how the model behaved during training.”

“Our hypothesis was that the most difficult samples in the dataset can be noisy and ambiguous, and are most likely to force a network to rely on irrelevant information that hurts a model’s performance,” Kim explains.

“By eliminating a small sliver of the training data that is difficult to understand, you are also eliminating the hard data samples that contain spurious features. This elimination overcomes the spurious correlations problem, without causing significant adverse effects.”
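The pruning idea Kim describes can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation: it assumes the per-sample loss was logged at every training epoch and uses the average of those losses as the "difficulty" score, which is one common proxy for how the model behaved during training. The helper names and the pruning fraction are hypothetical; the article does not say which difficulty measure or fraction the researchers used. Notably, no collar or other spurious-feature labels are needed.

```python
import math


def mean_training_loss(loss_history):
    """Difficulty proxy: average per-sample loss over training epochs.

    loss_history[e][i] is the loss of sample i at epoch e. This metric is
    an assumption for illustration; other training-dynamics scores could
    be swapped in.
    """
    num_epochs = len(loss_history)
    num_samples = len(loss_history[0])
    return [
        sum(epoch[i] for epoch in loss_history) / num_epochs
        for i in range(num_samples)
    ]


def prune_hardest(sample_ids, difficulty, fraction):
    """Drop the `fraction` of samples with the highest difficulty scores."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    num_to_drop = math.floor(len(sample_ids) * fraction)
    # Indices ordered from hardest (highest score) to easiest.
    order = sorted(range(len(sample_ids)), key=lambda i: difficulty[i], reverse=True)
    dropped = set(order[:num_to_drop])
    # Keep the remaining samples in their original order.
    return [s for i, s in enumerate(sample_ids) if i not in dropped]


ids = ["a", "b", "c", "d", "e"]
history = [[0.1, 2.0, 0.3, 0.2, 1.5], [0.05, 1.8, 0.2, 0.1, 1.0]]
scores = mean_training_loss(history)
print(prune_hardest(ids, scores, 0.2))  # drops "b", the highest-loss sample
```

In this sketch, the model would then be retrained on the samples that remain. Keeping the fraction small mirrors the article's point that only a small sliver of the training data needs to be removed.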

The researchers demonstrated that the new technique achieves state-of-the-art results, improving performance even when compared with previous work on models where the spurious features were identifiable.

More information:
Varun Mulchandani et al, Severing Spurious Correlations with Data Pruning, arXiv (2025). DOI: 10.48550/arxiv.2503.18258

Journal information:
arXiv


Citation:
Novel technique overcomes spurious correlations problem in AI (2025, April 18)
retrieved 18 April 2025
from https://techxplore.com/news/2025-04-technique-spurious-problem-ai.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no
part may be reproduced without the written permission. The content is provided for information purposes only.