Pirate Bay is probably not the best analogy; Anna's Archive is closer, imho [1]. Each web property's scrape run would be compressed into its own package and offered individually, maybe served via torrents like this Academic Torrents example [2].
Rough pipeline sketch: scraper engine -> validation/processing/cleanup -> object storage -> index + torrent serving.
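To make the stages concrete, here is a minimal, purely illustrative Python sketch under loud assumptions: `scrape` is a hypothetical stub returning canned records, `ObjectStore` is an in-memory dict keyed by SHA-256 standing in for real object storage (S3 or similar), each package is gzip-compressed JSON Lines, and the "torrent serving" step only computes the per-piece SHA-1 hashes that a BitTorrent v1 .torrent's `pieces` field is built from. It does not create or seed an actual torrent. All names (`scrape`, `ObjectStore`, `PIECE_SIZE`, `run_pipeline`) are made up for illustration.

```python
import gzip
import hashlib
import json
from dataclasses import dataclass, field

# Hypothetical piece size; BitTorrent v1 pieces are typically a power of two.
PIECE_SIZE = 256 * 1024


@dataclass
class ObjectStore:
    """Toy content-addressed store (stand-in for S3-style object storage)."""
    blobs: dict[str, bytes] = field(default_factory=dict)

    def put(self, data: bytes) -> str:
        # Key each blob by its SHA-256 so identical packages deduplicate.
        key = hashlib.sha256(data).hexdigest()
        self.blobs[key] = data
        return key


def scrape(site: str) -> list[dict]:
    """Hypothetical scraper stage: canned raw records for one site's run."""
    return [
        {"id": 1, "site": site, "body": "hello"},
        {"id": 1, "site": site, "body": "hello"},  # duplicate record
        {"id": 2, "site": site, "body": "   "},    # empty after cleanup
        {"id": 3, "site": site, "body": " world "},
    ]


def validate_and_clean(records: list[dict]) -> list[dict]:
    """Drop duplicate ids and empty bodies; strip surrounding whitespace."""
    seen: set = set()
    cleaned = []
    for record in records:
        body = record.get("body", "").strip()
        if not body or record["id"] in seen:
            continue
        seen.add(record["id"])
        cleaned.append({**record, "body": body})
    return cleaned


def package(records: list[dict]) -> bytes:
    """Serialize as JSON Lines and gzip it (mtime=0 keeps output reproducible)."""
    lines = "\n".join(json.dumps(r, sort_keys=True) for r in records)
    return gzip.compress(lines.encode("utf-8"), mtime=0)


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[str]:
    """SHA-1 of each fixed-size piece, as used in a v1 torrent's `pieces` field."""
    return [
        hashlib.sha1(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


def run_pipeline(sites: list[str], store: ObjectStore) -> dict[str, dict]:
    """scrape -> validate/clean -> package -> store -> index entry per site."""
    index = {}
    for site in sites:
        cleaned = validate_and_clean(scrape(site))
        blob = package(cleaned)
        key = store.put(blob)
        index[site] = {
            "object": key,
            "records": len(cleaned),
            "bytes": len(blob),
            "pieces": piece_hashes(blob),
        }
    return index


if __name__ == "__main__":
    store = ObjectStore()
    print(json.dumps(run_pipeline(["example.org"], store), indent=2))
```

Packaging one compressed blob per site run keeps each package independently downloadable, which fits the "offered individually" model; a real setup would swap in an actual scraper, durable storage, and a proper torrent builder.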
[1] https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu... ("HN Search: annas archive")
[2] https://academictorrents.com/details/9c263fc85366c1ef8f5bb9d... ("AcademicTorrents: Reddit comments/submissions 2005-06 to 2023-12 [2.52TB]")