Rsync'ing Project Gutenberg, a report

February 10, 2019 by Lucian Mogosanu

From the logs:

mircea_popescu: incidentally, either spyked or lobbes what do you need to make a complete gutenberg.org copy ? it IS going away, for one thing the initiator guy died and for the other thing, with their world-famous http://btcbase.org/log/2017-03-15#1627828 there's no way they'll stay online all that long.
a111: Logged on 2017-03-15 23:50 mircea_popescu: which incidentally - has been read TODAY by more people than read ALL of marcel proust's works since the making of gutenberg.org

mircea_popescu: should prolly also salvage http://www.perseus.tufts.edu/hopper/ but that's going to be more work than a straight download & strip headers job.
asciilifeform: mircea_popescu: apparently gutenberg is rsync'able ( https://archive.is/PWeNA ) , tho i haven't tried
mircea_popescu: aha. not much work.

Thusly proceeding, I read the "Mirroring How-To" guide, which pointed me to a place called ibiblio.org that supposedly contains a full mirror of gutenberg.org. Supposedly, because on a first attempt one easily notices that their ftp doesn't contain said item, or if it does, it's hidden so well that I could not find it.

However, further down the mirroring wiki-guide there is a link to a list of mirrors. I picked a couple of them at random and found that they either timed out or didn't contain the Gutenberg mirror they purported to. Fortunately, the third choice, rsync.mirrorservice.org, worked, in that I could run:

$ rsync -av --del rsync://mirrorservice.org/gutenberg.org/ guten

and after three days or so of downloading, I have roughly 800GB of files which, at a cursory glance, seem to contain books and other assorted items, e.g. mp3 files and DVD images.
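
For anyone wanting more than a cursory glance, a rough inventory by file extension is one way to see what the mirror holds. The following is only a sketch: the function name count_by_ext is made up for illustration, and it assumes a POSIX shell with the usual find, sed, tr, sort and uniq, plus file names that contain no newlines.

```shell
# count_by_ext DIR  (hypothetical helper, not part of the original procedure)
# Prints one "COUNT EXTENSION" line per extension found under DIR,
# most common extension first. Files without a dot in their name are skipped.
count_by_ext() {
    find "$1" -type f -name '*.*' \
        | sed 's/.*\.//' \
        | tr '[:upper:]' '[:lower:]' \
        | sort | uniq -c | sort -rn
}
```

Running `count_by_ext guten | head -20` would then list the twenty most common extensions, and `du -sh guten` gives the total size on disk. On a tree this large both take a while.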

The mirror is currently resting on a private machine, but I will make it available in the coming months, after some disk acquisition and swapping which will allow me to host it at house Mogosanu. Meanwhile, I expect that for now (and probably only for the near future) the step above remains reproducible by other folks who wish to maintain their own mirror.
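
Since the transfer runs for days and mirrors are prone to timing out, those reproducing the step may want to re-run rsync until it exits cleanly. Below is a minimal sketch of such a loop. The retry function, the attempt count and the delay are my own assumptions, not something the mirroring guide prescribes. rsync only transfers what is missing or changed, so re-running it picks up where the previous attempt stopped.

```shell
# retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]  (hypothetical helper)
# Runs COMMAND until it succeeds or MAX_TRIES attempts have been used up.
# Returns 0 on success, 1 once it gives up.
retry() {
    max=$1; delay=$2; shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$max" ]; then
            echo "giving up after $n attempts" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Example invocation (assumed values: up to 10 attempts, 60 seconds apart).
# --partial keeps partially transferred files so that large items such as
# DVD images need not restart from zero.
# retry 10 60 rsync -av --del --partial rsync://mirrorservice.org/gutenberg.org/ guten
```

Adding `--dry-run` to the rsync arguments first shows what would be transferred or deleted without touching the local copy, which is worth doing because `--del` removes local files that are absent from the mirror.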

Filed under: computing.
