Wellfire Interactive // Expertise for established Django SaaS applications

Content migration: archival sites

A fallback strategy for migrating content on large and disorganized sites

Maintaining the old content in a parallel home is not an ideal solution. There’s usually a reason you moved to a new system, new design, new information architecture. Keeping the old stuff online as-is and on a subdomain requires server resources, can create confusion for your readers or customers, and in general is a pain-in-the-ass albatross ‘round the neck.

Yet if your old content is distributed across thousands of files in an onion of different systems, you may find it more important to keep that content available than to preserve system purity at the cost of losing content your readers rely on. This is especially true if it includes plain HTML files (not controlled by a CMS) or resource files like images and PDFs. Are you really going to extract all of that content from thousands of files in dozens or hundreds of folders? Not likely.

In this case you can move the entire old site to a subdomain and set up permanent redirection rules that point to that subdomain. This buys you time to migrate the content if you wish and keeps alive all of those files that other sites link to.

To do this we’re assuming that your content, no matter how messy, follows a set of identifiable rules on at least one side of the new system/old system divide. Hopefully on the new system all media is served from one folder and all content is served on clean URLs without file extensions. Add a rewrite rule that takes every URL request ending in an HTML file extension and permanently redirects it (HTTP 301) to the new archival subdomain. If the old system used a query pattern to pull pages from the database, redirect every URL that includes that query pattern to the archival subdomain. And send image file requests that don’t include your new media folders to the archival subdomain, too.
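The three rules above can be sketched as a single decision function. This is only an illustration under stated assumptions: the archive hostname, the new media folder prefixes, and the legacy query key are all hypothetical placeholders, and your own site will have different values. The same logic could equally live in your web server’s rewrite configuration.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# All of these values are hypothetical; substitute your own.
ARCHIVE_HOST = "archive.example.com"          # assumed archival subdomain
NEW_MEDIA_PREFIXES = ("/media/", "/static/")  # assumed media folders on the new site
LEGACY_QUERY_KEY = "page_id"                  # assumed query key used by the old system

HTML_RE = re.compile(r"\.html?$", re.IGNORECASE)
IMAGE_RE = re.compile(r"\.(?:gif|jpe?g|png)$", re.IGNORECASE)


def archive_redirect(url: str) -> Optional[str]:
    """Return the archive URL to redirect to, or None to serve from the new site."""
    parts = urlsplit(url)
    path, query = parts.path, parts.query

    # Rule 1: the new site uses clean URLs, so any .htm/.html request is old content.
    is_html = bool(HTML_RE.search(path))
    # Rule 2: the old system's query pattern marks a database-driven legacy page.
    is_legacy_query = LEGACY_QUERY_KEY in parse_qs(query, keep_blank_values=True)
    # Rule 3: images outside the new media folders belong to the old site.
    is_old_image = bool(IMAGE_RE.search(path)) and not path.startswith(NEW_MEDIA_PREFIXES)

    if not (is_html or is_legacy_query or is_old_image):
        return None

    # Preserve the original path and query so inbound links keep working.
    target = f"https://{ARCHIVE_HOST}{path}"
    if query:
        target += f"?{query}"
    return target
```

Whatever calls this function should answer with a permanent (301) redirect to the returned URL, and fall through to the new site when it returns None. In a Django project that could be a small middleware returning `HttpResponsePermanentRedirect`, though doing it at the web server level avoids touching the application at all.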

Your goal should be to move that old content into the new CMS over time, so that you don’t end up with another system to maintain. Again, this should be considered a strategy of last resort. But when confronted with thousands of unaccounted-for files and pages, it certainly beats just tossing it all away.