I would be very surprised if this were true. I volunteer with standardebooks.org on getting books in the public domain transcribed, proofread, and typeset for modern e-readers. There is a swarm of books that we would like to upload but can't, due to their unclear copyright status. This includes stuff that everyone and their mother has probably bought several times already, including the works of H. P. Lovecraft, Robert E. Howard, etc. For most of these, we still follow the Mickey Mouse rule, and do not proceed until we are absolutely sure they're safe.
Case in point, I've been thinking about transcribing The Worm Ouroboros[0] for some time, as it should be in the public domain even by conservative estimates, and yet I can't formally verify its copyright status, so I haven't yet.
Alex from SE here. I wanted to transcribe Worm a few years ago too. IIRC the problem was that PG-AU had a post-1923 edition transcribed. The 1922 first edition is extremely rare. The Newberry Research Library in Chicago has a copy and I actually flipped through it once. But you can't check out those kinds of rare books. They can scan books for you but at a fairly steep cost and I didn't want to do that at the time.
Maybe the situation has changed since then! But I'd love to have that book in our catalog.
Would I be able to take photos of the book pages if I'm in Chicago without checking it out (on prem capture)? I am familiar with using a DSLR to non-destructively capture book contents, and am willing to make the time to do so. Looks like you even get 2 hours of parking for free!
Possibly; you should ask them. However, most decent OCR requires a flatbed scanner or (in the case of rare books like this) a scanning device that holds pages firmly and takes pictures from above. I imagine that bringing your DSLR and doing it by hand for 200+ pages would be extremely tedious and error-prone.
You can get decent OCR results by using thinner books to level the pages: support one side of the book as you flip, and hold it open with your hands. Source: I have OCRed approx. 200,000 pages this way for one of my websites. It is not the best, but it works surprisingly well as a poor man's just-get-it-done method. Aim for at least 300 ppi and ABBYY can figure it out. 200 ppi still works well.
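For anyone checking whether a camera setup clears the 300 ppi bar mentioned above, here is a rough sketch of the arithmetic. The sensor dimensions and the framed area are made-up example values, and the function names are hypothetical; it assumes the framed area fills the whole image and ignores lens distortion and focus.

```python
def effective_ppi(image_px, frame_in):
    """Pixels per inch achieved when an image of image_px (width, height)
    covers a physical area of frame_in (width, height) in inches.
    The lower of the two axes is the limiting resolution."""
    (px_w, px_h), (in_w, in_h) = image_px, frame_in
    return min(px_w / in_w, px_h / in_h)


def max_frame_for_ppi(image_px, target_ppi):
    """Largest physical area (width, height) in inches that can be framed
    while still reaching target_ppi."""
    px_w, px_h = image_px
    return (px_w / target_ppi, px_h / target_ppi)


# Assumption: a 24 MP camera held in portrait orientation (4000 x 6000 px).
sensor = (4000, 6000)

# Framing an 8 x 12 inch area around a single page.
print(effective_ppi(sensor, (8, 12)))

# How much area could be framed and still reach 300 ppi.
print(max_frame_for_ppi(sensor, 300))
```

If the first number comes out well above 300, there is room to frame more loosely, which makes hand-held page flipping a bit more forgiving.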
I have a portable cradle I can bring with me that will hold the camera above while I flip pages. I emailed Newberry Library, and will report back when I have more info.
It was published in 1922, and the author died in 1945. Seems like that book is PD by any measure.
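The "by any measure" reasoning can be written out as a small sketch. This is a deliberately simplified model, not legal advice: it assumes a life-plus-N-years term that runs to the end of the calendar year, and a US rule of thumb that anything published before a cutoff year (1923, the year used elsewhere in this thread) is public domain. It ignores renewals, edition differences, and foreign-publication rules, and both function names are hypothetical.

```python
def life_plus_pd_year(death_year, term_years=70):
    """First calendar year in which a work is public domain under a
    life + term_years regime. Terms run through December 31 of the
    final year, so the work is free from January 1 of the next year."""
    return death_year + term_years + 1


def published_before_cutoff(publication_year, cutoff_year=1923):
    """Rule-of-thumb check: was the work published before the cutoff year?
    The 1923 default is an assumption taken from the thread."""
    return publication_year < cutoff_year


# The Worm Ouroboros: published 1922, author died 1945.
print(life_plus_pd_year(1945))        # life + 70 regime
print(life_plus_pd_year(1945, 50))    # life + 50 regime
print(published_before_cutoff(1922))  # pre-cutoff publication check
```

Note that the publication check only applies to the edition actually being transcribed, which is why a later edition can still be a problem even when the first edition is clear.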
According to the article, it looks like Gutenberg adds a layer of "let a lawyer sign off on it", presumably to keep them from getting sued out of existence. Is that the case for Worm Ouroboros?
It doesn't seem that surprising to me. The lack of clear US copyright status was entirely because there was no easy way to exhaustively search the copyright registration/renewal records, and now there is. If there is no renewal contained in these records, it is definitively not copyrighted.
It's been almost two decades since I've worked with Project Gutenberg and Distributed Proofreaders, but back then they had pro bono counsel who would provide opinions and clear works prior to publication. Does standardebooks.org not have the same?
Project is too small, from my understanding (I'm only a contributor, not a core team member). You can see from the contribution guidelines[0]:
Ebooks that are not clearly in the U.S. public domain. If it’s not on Gutenberg, we’ll probably decline it.
So we're basically piggybacking off of the copyright verification work we assume that PG has already done. This is one of the reasons I haven't started The Worm Ouroboros yet--it's in Australia's Gutenberg archive, but not the US one.
With that said, Gutenberg's specialty is the transcription part. If you try reading one of their public domain works on a Kindle, it's often full of formatting problems and typos, since the transcripts are sometimes sourced from OCR scans. I've known people who tried starting a free book from Gutenberg, but eventually gave up and bought the same e-book off Amazon for a dollar because it at least had a working TOC. That kind of sale saddens me greatly. The end-user thinks, "Ah, it was only a dollar, I got my money's worth," but the publisher has basically paid nothing for the work, adds a few hours of digital typesetting, and then makes 100% profit on the sale.
Standard Ebooks often uses the Gutenberg raw text as a starting point and then cleans it up. We have a set of tools used for the initial cleanup process[0] that handles pagination, TOC generation, and some other basic "modernization" steps. The texts are then proofread and edited to conform to our style guide[1], which aims for maximizing readability on modern e-reader devices, as well as adding semantic meaning to any text markup. You can look at the guide for producing such a book to get a better idea of the process.[2] The end result is a free, public domain work which looks and feels like a professional production.
Fantastic comment, thank you for your efforts (and the rest of the folks at standardebooks.org). I haven't looked, but I'd ask that whenever possible, final artifacts are also uploaded to the Internet Archive.
[0] https://en.wikipedia.org/wiki/The_Worm_Ouroboros