Most PDF scientific articles sadly don't have good embedded metadata [0], so thi...

afandian · on Oct 26, 2015

What's "the DOI issue"?

(Also, if your favourite publisher isn't putting the metadata you want in PDFs, write to them and ask! It may make a difference...)

rossmounce · on Oct 26, 2015

The DOI issue is the one described by "aroch".

I think Quickscrape can download PDFs given a DOI: https://github.com/ContentMine/quickscrape

If publishers listened to their readers we would have had 100% Open Access ten years ago. Traditional academic publishers do not listen and don't care.

Even if I could convince, say, PLOS to do something about this, it wouldn't change much. We need all, or at least the majority, of publishers to provide good embedded metadata, not just an isolated one or two. I don't see a good mechanism for making that happen, sadly.

afandian · on Oct 26, 2015

Well, hopefully the coverage of full-text link metadata in Crossref will increase over time. Until then, best of luck to ContentMine!

Improving metadata coverage is a different issue from changing the business model. Objectively, one's easier to implement than the other.

It is indeed tricky getting ~5,000 publishers to provide optimal metadata, but it's a good thing to have an industry-standard platform for doing it.

(I work at Crossref)
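
For readers unfamiliar with the full-text link metadata mentioned above, here is a minimal sketch of how it might be consumed. It assumes a work record shaped like the JSON the Crossref REST API returns, where a `link` list (when the publisher has deposited one) holds entries with `URL` and `content-type` keys. The function name `full_text_links` and the sample record, including its DOI and URLs, are made up for illustration; no network request is made.

```python
from typing import Any, Dict, List, Optional


def full_text_links(work: Dict[str, Any],
                    content_type: Optional[str] = None) -> List[str]:
    """Collect full-text URLs from a Crossref-style work record.

    `work` is assumed to be a dict with an optional "link" list whose
    entries carry "URL" and "content-type" keys. If `content_type` is
    given, only links with that exact content type are returned.
    """
    urls = []
    for link in work.get("link", []):
        if content_type is None or link.get("content-type") == content_type:
            url = link.get("URL")
            if url:
                urls.append(url)
    return urls


# Hypothetical record: the DOI and URLs below are placeholders, not real data.
example_work = {
    "DOI": "10.5555/example",
    "link": [
        {
            "URL": "https://publisher.example/article.pdf",
            "content-type": "application/pdf",
            "intended-application": "text-mining",
        },
        {
            "URL": "https://publisher.example/article.xml",
            "content-type": "application/xml",
            "intended-application": "text-mining",
        },
    ],
}

print(full_text_links(example_work, "application/pdf"))
```

A record with no `link` field simply yields an empty list, which is the case the coverage concern in this thread is about.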

gcr · on Oct 27, 2015

Do you mean having the DOIs for all referenced papers included in the PDF metadata somehow? In my experience, authors don't know LaTeX well enough to do that. Half of them don't even read the submission guidelines that clearly say not to put page numbers on the camera-ready paper; they're never going to understand complicated metadata commands...

afandian · on Oct 27, 2015

The metadata in papers is the responsibility of publishers, not authors.