Convert website to static files using httrack

Use the httrack tool to archive a dynamic website.

When a CMS website is no longer updated with content or a campaign has ended, the pages themselves should often be archived for reference. But it's not always possible to keep the CMS itself updated indefinitely. That's why it's convenient to know how to archive a site to static HTML files.

There is an excellent description of archiving Drupal sites on drupal.org.

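I prefer using httrack. Karen from Lullabot developed the following workflow, where root_uri is the host name of the site (without the http:// prefix) and targetdir is the directory that receives the static files: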

brew install httrack

httrack "http://${root_uri}" -O "$targetdir" -N "%h%p/%n/index%[page].%t" -WqQ%v --robots=0

find . -name "*.html" -type f -print0 | xargs -0 perl -i -pe 's/((?<![\x27"])\/index\.html|(?<=[\x27"]\/)index\.html)\b//g'

If the source site uses HTTP authentication, include the username and password in the URL passed to httrack: username:password@your.url

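The find/perl command is optional but useful if you want to preserve the original URL paths (for inbound links): it strips the trailing index.html from links in the mirrored pages, so a link to /about/index.html becomes /about again.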