Sadly, most scientific-article PDFs don't have good embedded metadata [0]. That, together with the DOI issue, makes this tool not very useful (at least for the journals I read).
I would also have implemented this as a simpler shell script calling exiftool [1] and pdftotext [2], but hey, it's fun to have a Python-based implementation :)
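For what it's worth, the exiftool + pdftotext approach could look roughly like this if driven from Python. This is only a minimal sketch, assuming both tools are installed and on PATH. The function names (`find_doi`, `embedded_metadata`, `first_page_text`, `guess_doi`) and the DOI regex are my own illustrative choices, not taken from the project being discussed.

```python
import json
import re
import subprocess

# Loose pattern for DOI-looking strings; an assumption, not a full validator.
DOI_RE = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')


def find_doi(text):
    """Return the first DOI-looking string in text, or None."""
    match = DOI_RE.search(text)
    if match is None:
        return None
    # Drop punctuation that commonly trails a DOI in running text.
    return match.group(0).rstrip(".,;)")


def embedded_metadata(pdf_path):
    """Return the embedded metadata reported by exiftool as a dict."""
    result = subprocess.run(
        ["exiftool", "-json", pdf_path],
        capture_output=True, text=True, check=True,
    )
    # exiftool -json prints a JSON array with one object per input file.
    return json.loads(result.stdout)[0]


def first_page_text(pdf_path):
    """Return the text of page 1, extracted by pdftotext to stdout."""
    result = subprocess.run(
        ["pdftotext", "-f", "1", "-l", "1", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def guess_doi(pdf_path):
    """Try the embedded metadata first, then fall back to page-1 text."""
    for key, value in embedded_metadata(pdf_path).items():
        if key == "SourceFile":
            continue  # the file path is not metadata from the publisher
        doi = find_doi(str(value))
        if doi:
            return doi
    return find_doi(first_page_text(pdf_path))
```

Calling `guess_doi("paper.pdf")` (a hypothetical file name) returns a DOI string or `None`. Given how poor the embedded metadata usually is, the page-1 text fallback would probably do most of the work.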
If publishers listened to their readers we would have had 100% Open Access ten years ago. Traditional academic publishers do not listen and don't care.
Even if I could convince, say, PLOS to do something about this, it wouldn't change much. We need all, or at least the majority, of publishers to provide good embedded metadata, not just an isolated one or two. I don't see a good mechanism for making that happen, sadly.
Do you mean having the DOIs of all referenced papers included in the PDF metadata somehow? In my experience, authors don't know LaTeX well enough to do that. Half of them don't even read the submission guidelines that clearly say not to put page numbers on the camera-ready paper; they're never going to understand complicated metadata commands...
[0] http://rossmounce.co.uk/2012/12/31/pdf-metadata-why-so-poor/
[1] http://www.sno.phy.queensu.ca/~phil/exiftool/
[2] http://poppler.freedesktop.org/