The Pirate Bay is probably not the best analogy; imho Anna's Archive [1] is closer: individual scrape runs per web property, each compressed into a package and maybe served via torrents, like this Academic Torrents example [2].
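To make "served by torrents" a bit more concrete, here is a minimal sketch of how a single packaged scrape run could be described as a single-file torrent, following the BitTorrent metainfo format (BEP 3): a bencoded `info` dict with per-piece SHA-1 hashes, whose own SHA-1 is the info-hash. The file name and the 256 KiB piece length are illustrative assumptions, and this does not produce a complete `.torrent` (no `announce` URL or tracker/DHT setup).

```python
import hashlib


def bencode(value) -> bytes:
    """Minimal bencoder (BEP 3): ints, str/bytes, lists, dicts with sorted keys."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys must be byte strings, sorted in raw byte order.
        items = sorted(
            (k.encode("utf-8") if isinstance(k, str) else k, v)
            for k, v in value.items()
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def info_dict(name: str, data: bytes, piece_length: int = 256 * 1024) -> dict:
    """Single-file 'info' dict: concatenated 20-byte SHA-1 digests of each piece."""
    pieces = b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )
    return {"name": name, "length": len(data),
            "piece length": piece_length, "pieces": pieces}


def info_hash(info: dict) -> str:
    """The info-hash that peers use to identify the torrent."""
    return hashlib.sha1(bencode(info)).hexdigest()
```

For example, a 600,000-byte package split into 256 KiB pieces yields 3 pieces, so `info["pieces"]` is 60 bytes long; the hypothetical name `example.com-run.jsonl.gz` is only a placeholder.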

Rough pipeline sketch: scraper engine -> validation/processing/cleanup -> object storage -> index + torrent serving.
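A toy version of the stages up to the index might look like the following. Everything here is a hypothetical stand-in: the scraper is replaced by a list of raw records, "cleanup" just trims fields and drops empty or duplicate URLs, the package is gzip'd JSON Lines, the object store is an in-memory content-addressed dict (SHA-256 keys), and the index is a list of manifest entries. Torrent serving is left out; a real setup would hand each stored package to a torrent tool.

```python
import gzip
import hashlib
import json


def validate_and_clean(records):
    """Keep records with a non-empty url and body; trim whitespace; dedupe by url."""
    seen, out = set(), []
    for r in records:
        url = (r.get("url") or "").strip()
        body = (r.get("body") or "").strip()
        if not url or not body or url in seen:
            continue
        seen.add(url)
        out.append({"url": url, "body": body})
    return out


def package(records) -> bytes:
    """Compress one scrape run into a gzip'd JSON Lines blob (mtime=0 for reproducibility)."""
    lines = "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)
    return gzip.compress(lines.encode("utf-8"), mtime=0)


class ObjectStore:
    """Hypothetical in-memory, content-addressed object store."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def run_pipeline(site, run_id, raw_records, store, index):
    """Scrape output -> cleanup -> package -> object storage -> index entry."""
    cleaned = validate_and_clean(raw_records)
    blob = package(cleaned)
    key = store.put(blob)
    entry = {"site": site, "run": run_id, "key": key,
             "records": len(cleaned), "bytes": len(blob)}
    index.append(entry)
    return entry
```

Content-addressing the packages means an identical re-scrape maps to the same key, which keeps storage and any later torrent seeding deduplicated.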

[1] https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu... ("HN Search: annas archive")

[2] https://academictorrents.com/details/9c263fc85366c1ef8f5bb9d... ("AcademicTorrents: Reddit comments/submissions 2005-06 to 2023-12 [2.52TB]")