12 points by PaulHoule 8 hours ago | 11 comments

Makes a lot of sense to me, so am I. These models are not trained to generate truthful statements, but statements that sound plausible… just like fake news.

I think this is more interesting:

> [LLMs] are more prone to flagging LLM-generated content as fake news while often misclassifying human-written fake news as genuine.

> To address this, we introduce a mitigation strategy that leverages adversarial training with LLM-paraphrased genuine news. The resulting model yielded marked improvements in detection accuracy for both human and LLM-generated news.
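To make the mitigation concrete: as I read it, the idea is to augment the *genuine* class with LLM-paraphrased copies of genuine articles, so the classifier stops using "sounds LLM-generated" as a proxy for "fake". A minimal sketch of that augmentation step (the `paraphrase` helper is a placeholder for a real LLM call, and the toy headlines are made up; the paper presumably fine-tunes an LLM rather than a TF-IDF model):

```python
# Sketch: adversarial augmentation for a fake-news classifier.
# Paraphrased genuine news keeps the "genuine" label, so surface
# LLM-style cues are decorrelated from the fake/genuine label.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def paraphrase(text: str) -> str:
    # Placeholder: a real setup would call an LLM to paraphrase here.
    return "In other words: " + text

genuine = [
    "The city council approved the budget on Tuesday.",
    "Researchers published the trial results in a peer-reviewed journal.",
]
fake = [
    "Aliens secretly run the city council, insiders reveal.",
    "Miracle cure suppressed by doctors, anonymous sources say.",
]

# Key step: LLM-paraphrased genuine articles are still labeled 0 (genuine).
texts = genuine + [paraphrase(t) for t in genuine] + fake
labels = [0] * (2 * len(genuine)) + [1] * len(fake)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
```

The point isn't the toy classifier, it's the label assignment: without the augmented rows, any stylistic fingerprint of the paraphraser leaks into the decision boundary.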


Sounds a bit like the urban myth about the tank classifier, where all the photos of the enemy tanks were taken on a cloudy day.

How is the LLM supposed to determine the truthfulness of a thing?


> How is the LLM supposed to determine the truthfulness of a thing?

I read on HN that LLMs know literally everything (electronics, computer science, physics). /s

I was curious to see specifics of what the authors consider fake news sources.

I didn't see any concrete identification in the paper itself, but following one of the references[1] I found this:

> Within our list of news sites, we differentiate between “unreliable news websites” and “reliable news websites.” Our list of unreliable news websites includes 1,142 domains labeled as “conspiracy/pseudoscience” by mediabiasfactcheck.com as well as those labeled as “unreliable news”, misinformation, or disinformation by prior work (Hanley, Kumar, and Durumeric 2023; Barret Golding 2022; Szpakowski 2020).

> Our set of “unreliable” or misinformation news websites includes websites like realjewnews.com, davidduke.com, thegatewaypundit.com, and breitbart.com. We note that despite being labeled unreliable every article from each of these websites is not necessarily misinformation.

> Our set of “reliable” news websites consists of the news websites that were labeled as belonging to the “center”, “center-left”, or “center-right” by Media Bias Fact Check as well as websites labeled as “reliable” or “mainstream” by other works (Hanley, Kumar, and Durumeric 2023; Barret Golding 2022; Szpakowski 2020). This set of “reliable news websites” includes websites like washingtonpost.com, reuters.com, apnews.com, cnn.com, and foxnews.com.

The methodological issues here, and the trajectory following from ignoring them wholesale, are considerably more interesting than the study itself, to me.

1: "From January 1, 2022, to April 1, 2023, there was a dramatic surge in synthetic articles, especially on misinformation news websites (Hanley and Du- rumeric, 2023)." which links to https://arxiv.org/pdf/2305.09820.pdf


We already know that humans can't detect fake news. So how can humans train a model to detect fake news?


I see no obvious contradiction here.

Humans can't fly, but can build machines that do.

I said train a model. This paper is about training LLMs. If you can't generate training data, then how can you train an LLM?

And if humans could build some other type of truth machine, how would they know it was working if they themselves are not truth machines?

Although the Münchhausen trilemma demonstrates that perfection is impossible, what is certainly not impossible is to create automated tests for validity which can exceed the normal functioning of our brains.

All those tests have to do is mimic system 2 thinking all the time, while we mere humans have to switch to system 1 often because 2 is slower.

We can generate and label data with the desired properties without any problem; after all, humans are the authors of both authentic and fake news. This does not hinge on the ability to differentiate the data after it is generated.

Humans are good at detecting fake news when critically analysing what they read. However, that's a cognitive expense few can spare on day-to-day information, so they miss fake news in day-to-day life. Not because they can't spot it, but because they don't have the time and energy to critically evaluate everything they come across when they're trying to engage with things for the sake of recreation.