Convert website to static files using httrack

Use the httrack tool to archive a dynamic website.
When a CMS website is no longer updated with content, or a campaign has ended, the pages themselves often still need to be archived for reference. But it’s not always possible to keep the CMS itself updated indefinitely. That’s why it’s convenient to know how to archive a site to static HTML files.

I prefer using httrack. Karen from Lullabot developed the following workflow:

brew install httrack

httrack "http://${root_uri}" -O "$targetdir" -N "%h%p/%n/index%[page].%t" -WqQ%v --robots=0
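The httrack command expects two shell variables, root_uri and targetdir, to be set beforehand. As a minimal sketch, assuming a hypothetical site at example.com and an output directory ./example-archive (both made up for illustration), the invocation can be printed for review before anything is downloaded. As I read the httrack naming options, the -N pattern stores each page as host/path/name/index.html, -O sets the output directory, and --robots=0 tells httrack to ignore robots.txt; the remaining flags are passed through unchanged from the workflow above.

```shell
# Hypothetical values: replace with the site to archive and an output directory.
root_uri="example.com"
targetdir="./example-archive"

# Print the httrack invocation for review instead of running it.
# Remove the leading "echo" to start the mirror for real.
mirror_cmd() {
  echo httrack "http://${root_uri}" -O "$targetdir" \
    -N "%h%p/%n/index%[page].%t" -WqQ%v --robots=0
}

planned="$(mirror_cmd)"
echo "$planned"
```

Note that echo drops the quotes in the printed line; once the echo is removed, the quoting inside the function keeps each argument intact.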

find . -name "*.html" -type f -print0 | xargs -0 perl -i -pe 's/((?<![\x27"])\/index\.html|(?<=[\x27"]\/)index\.html)\b//g'

If the source site uses HTTP authentication, provide username and password as part of the URL: username:password@your.url
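One way to do that without changing the httrack command itself is to put the credentials into root_uri. This is only a sketch: the user name, password and host below are hypothetical placeholders, not values from the original workflow.

```shell
# Hypothetical credentials and host, for illustration only.
auth_user="archiver"
auth_pass="s3cret"
site_host="example.com"

# Prepend the credentials to the host so that "http://${root_uri}"
# expands to a URL of the form username:password@your.url.
root_uri="${auth_user}:${auth_pass}@${site_host}"
echo "http://${root_uri}"
```

If the password contains characters that are reserved in URLs, such as @ or :, they need to be percent-encoded first. Keep in mind that a password written on the command line may end up in your shell history.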

The find/perl command is optional, but useful if you want to preserve the original URL paths (for inbound links): it strips index.html from the links inside the archived pages.