On HN, I read about the WARC format.
WARC is standardized as ISO 28500:2009, Information and documentation -- WARC file format, and was developed under the auspices of the International Internet Preservation Consortium. It was designed as an extension to ARC, in part to provide better capabilities for managing Web archives over the long term by allowing more metadata about the circumstances of archiving to be captured. WARC files are often compressed using gzip, resulting in a .warc.gz extension.
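To get a feel for what a record in such a file looks like, here is a minimal sketch using only the Python standard library. It assumes the WARC/1.0 layout (a version line, named header fields, a blank line, the payload, then two CRLFs) and the common convention of gzipping each record as its own member. The function names `build_warc_record` and `gzip_records` and the example.org URL are my own placeholders, not part of warctools or warcat, and this is nowhere near a complete implementation of the standard.

```python
import gzip
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, payload: bytes,
                      content_type: str = "text/html") -> bytes:
    """Serialize one WARC/1.0 'resource' record (hypothetical helper)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",                                    # version line
        "WARC-Type: resource",                         # kind of record
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",  # unique record id
        f"WARC-Date: {now}",                           # UTC timestamp
        f"WARC-Target-URI: {target_uri}",              # what was archived
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",             # payload size in bytes
    ]
    # Header lines end with CRLF; an empty line separates headers from payload.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Each record is terminated by two CRLFs.
    return head + payload + b"\r\n\r\n"


def gzip_records(records) -> bytes:
    """Compress each record as its own gzip member and concatenate them."""
    return b"".join(gzip.compress(record) for record in records)


if __name__ == "__main__":
    # Placeholder page content; a real archive would hold the site's files.
    record = build_warc_record("http://example.org/", b"<html>hi</html>")
    with open("example.warc.gz", "wb") as f:
        f.write(gzip_records([record]))
```

Compressing record by record, rather than the whole file at once, means a reader can seek to a single record and decompress only that one, while ordinary gzip tools can still decompress the whole file.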
I have always thought of publishing my website as an archive for posterity, but wondered what would be a good way to achieve that. My questions are: where can I store this archive (can I send it to archive.org?), and why would they consider storing it? Maybe I should make a contribution to help them with costs.
Looking into the links on that Wikipedia page seems like a good start:
- warctools 4.10.0 : Python Package Index
- chfoo/warcat: Tool and library for handling Web ARChive (WARC) files.
Both of the above tools (in Python) are good candidates to be rewritten in something like D or Rust as a nice programming exercise!