
The side-by-side display makes it pretty easy to distinguish one from the other: just compare them and pick whichever makes the least sense as the nonsense one. That way I score 10/11. But when looking at just the left-side one, the problem suddenly becomes much harder, and I'm happy to get better than even. Bits that don't help: I'm not a native English writer, and I've seen too many real-life papers with writing crappy enough that quite a few of these look plausible, especially when they're about fields I know very little of.

Presumably the difficulty goes down a bit when you're a native English speaker with broader interests.

I like this project very much and would like to see some overall scores, and it might not hurt to offer a verified-result link to distinguish bragging from actual results (not that anybody on HN would ever brag about their score ;) ).

Overall: I'm not worried that generated papers will swamp the publications any time soon, but for spam/click farms this must be a godsend, and it will surely make it harder for search engines to separate real content from generated content.




> it will surely make it harder for search engines to separate real content from generated content.

Fortunately, SEO spam is currently nowhere near as coherent as this, and it often features phrases that are a dead giveaway ("Are you looking for X? You've come to the right place!", or a strangely thesaurised version thereof), but I am also worried about this new generation of manufactured deception.
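
To illustrate that heuristic (a toy sketch, not anything a search engine actually runs, and the phrase list is made up):

    # Toy giveaway-phrase detector in the spirit of the comment above.
    # The patterns are illustrative examples, not a real spam model.
    import re

    GIVEAWAY_PATTERNS = [
        r"are you looking for .+\? you'?ve come to the right place",
        r"look no further",
        r"in this article,? we will",
    ]

    def looks_like_seo_spam(text: str) -> bool:
        lowered = text.lower()
        return any(re.search(p, lowered) for p in GIVEAWAY_PATTERNS)

    print(looks_like_seo_spam(
        "Are you looking for widgets? You've come to the right place!"
    ))  # True

Model-generated text has no such fixed tells, which is exactly the problem.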


I tried making fake tweets using GPT-2 two years ago. When I actually interviewed people to verify my model, I got good results for people who didn't actively engage with Twitter versus people who regularly did (note that this was N ≈ 10 people and limited to GPT-2 774M).
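
For anyone who wants to try something similar: a minimal sketch using the Hugging Face transformers library, where gpt2-large is the 774M checkpoint. The prompt and sampling parameters here are illustrative, not the ones from my experiment.

    # Minimal sketch: sample tweet-length text from GPT-2 774M.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2-large")

    fakes = generator(
        "Just got back from the airport and",  # hypothetical prompt
        max_length=60,            # keep the output tweet-sized
        num_return_sequences=3,
        do_sample=True,           # sample rather than greedy-decode
        top_k=50,
        temperature=0.9,
    )
    for fake in fakes:
        print(fake["generated_text"])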

I also found that some people would refuse the test outright and believe whatever the model output was, due to my choice of subject.

Others who did a similar exercise and tried to verify their results on Reddit found a great many people who could spot the fakes quite easily.

The biggest issue would be someone using such a system to deliberately fool a targeted set of people, which is easy given how ad networks are run.


Having read your comment first (ooh, horribile dictu on HN), I decided to try playing by looking only at the left paper and deciding whether it was fake. Luckily the model seems to have picked up "last names can be units" too strongly: the 2nd fake paper was discussing a frequency of "10 Jones".


Good one :) What's your score across 10 in sequence, looking at just one side? I get about 7 out of 10 that way when I repeat it a number of times.



