Ask Scott: Tools for content migration planning?

I’ve lost track of where this question came from. Based on my writing workflows and where I found this long-neglected draft, I’m ~80% sure this is an expanded and clarified version of an answer I gave extemporaneously during an event Q&A.

Regardless, while content audits and migrations are not my unique area of expertise, I’ve done my share over the years. I thought it might be helpful to publish this summary of the primary tools and vocabulary I use to talk about this kind of work.

Q: What tools do you recommend for a big content migration? For example, auditing thousands of pages of content and then migrating and transforming that content to a new content management system?

What I’ve seen work well is a combination of page tables, which are documents, and a content matrix, which is a spreadsheet. Both of these can be made in any office suite you already have access to. If you haven’t already tried this basic approach, give it a try. Content migrations are hard enough as it is without complicating things!

Oh, and you’ll also need a little magic. But first, let’s [something something Matrix film reference].

What is a content matrix?

Content matrix is a fancy name for a spreadsheet that tracks an entire content migration, site rebuild, or similar project: a forward-looking content inventory, if you will. The term comes from the content management system (CMS) and developer world, where a content migration could be a purely technical endeavor that doesn’t involve any auditing or content transformation, shudder to think. But those migrations are common enough that the term has stuck even for more hands-on, content-strategy-flavored projects.

The content matrix is where your plan will live. It holds the status of every single page or similar content deliverable that can be given an ID number in your inventory system. The matrix can even contain actual data or information about content items, such as tags and keywords, old URLs and new slugs for redirects, page titles, and more. But none of that is necessary for the way I’m talking about them here. When you hear “content matrix”, you can think “migration plan”.

If you, like me, are not a “data” person, and have not historically spent a lot of time wrangling large spreadsheets, it might feel overwhelming at first to organize your entire migration plan and production tracking into one document. But I think you’ll find that there’s just a bit of a break-in period. A content matrix spreadsheet is like a new pair of Doc Martens: you’ve got to put on some thick socks and get to walking! After a few weeks, your “overwhelming” content matrix will start to soften up and take the shape of your feet, er, project. Sorry, got lost in the metaphor.

I’m encouraging you to stick out the discomfort because a huge content migration is not something that happens quickly (understatement alert). You and this matrix, this spreadsheet, are going to be friends for a long, long time. So give it a chance. Break it in. And don’t overcomplicate things from the very beginning if you don’t have to.

Look, I can’t exaggerate how little I know about using the advanced features of apps like Excel or Google Sheets, so believe me when I say that a little knowledge and some basic features will go a long way. Some data validation here, collapsed rows there, and pretty soon you’re going to feel like a content migration wizard. It’s often really, truly, all you need, even for some of the biggest, multi-site content migrations you can imagine.

If you’ve already completed a content inventory and/or audit in a spreadsheet, a lot of your work is already done! Make a new copy of the inventory, or some new sheets within the same document, and start hacking and slashing to build your content matrix on top of it.

What are page tables?

Page tables are the thing in between the design for a content type (e.g. the complete visual layout and content model specification for a “press release” content type) and a specific instance of that content type (e.g. the June 19, 2022 press release). A page table is effectively a worksheet that holds everything you have to “fill in” to complete a given piece of content. You could go further and call a blank worksheet (a blank table) for a given content type a content template. I don’t find this to be an especially useful distinction, and I end up talking about empty, in-progress, and completed worksheets alike as page tables.

As recommended in this overview of page tables from Pickle Jar, I like to use a document format, especially a collaborative one like Google Docs, to create page tables because it’s very accessible and shareable. You don’t need access to a special tool, and anyone who has mastered the art of typing into a box, leaving comments, and otherwise collaborating the way you’re likely used to collaborating already can contribute to the content.

Documents are flexible, too … if even just 1 out of every 100 items of a content type, one you may have hundreds or thousands of, has some sort of quirk or exception, that can become a headache really quickly if you’re locked into some sort of rigid database or form. With a page table, you can just, bloop!, add another row. Easy-peasy. Documents also make comparisons easy: for regulated or keystone content that you need a lot of internal eyeballs on, you can add another column to show before and after, or even alternate versions. (Pro tip: If you’re not going to be printing these page tables, and you probably won’t, consider giving yourself more room with a tabloid or other large digital “paper” size … it will still render nicely on screen, but allow room for more columns or larger text.)

You can add a column to your content matrix for “Page Table” and include a link to any work-in-progress or completed page tables for a given piece of content. You can further add a “current page” column right next to it to link to existing or previous versions of the content, which can help your writers or reviewers.

You said something about magic?

Yes! The magic that makes these two document types work together is the ID System. Every item in your migration plan should have a unique identification number. You can use whatever you like as long as it’s unique. The system I’ve used most often uses the first number to specify a section, the second number to identify a page/document within that section, and any further numbers for further divisions, e.g. sections within a complex content type like a product landing page.

Here is an example of what I’m talking about from a stub of an old in-house project. Here, the top-level IDs (e.g. 1.x) represent conceptual groupings, whereas the added incremental letters at the end of the IDs indicate an additional status or modality for a given piece of content — so something that might have to get produced separately, but is still part of the page:

  • 1.x (Technical Services General)
    • 1.0 Technical Services Overview
      • 1.0a QuickStart Brief
      • 1.0b Personal Training Brief
      • 1.0c Project Support Brief
    • 1.1 General Inquiry Form
      • 1.1a On-site Confirmation Message
  • 2.x (QuickStart Programs)
    • 2.0 QuickStart Overview
    • 2.1 For Individuals information
    • 2.2 Individuals Inquiry Form
      • 2.2a Confirmation Message
    • 2.3 For Organizations information
    • 2.4 Organizations Inquiry Form
      • 2.4a Confirmation Message
  • 3.x (Personal Training)
    • 3.0 Personal Training Overview
      • 3.0a Inquiry Form
  • 4.x (Project Support)
    • 4.0 Project Support Overview
      • 4.0a Inquiry Form
    • 4.1 Code Optimization
    • 4.2 Code Updating
    • 4.3 Code Development

Just to give you a rough idea. You’ll want to play with the system to find something that works for your content.
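
If you ever export your IDs out of the spreadsheet, a few lines of script can double-check them. Here is a minimal Python sketch, assuming IDs shaped like the example above: a section number, a page number, and an optional lowercase letter. The pattern and the function names are my own illustration, not part of any tool, and you would need to extend the pattern if you add further numeric levels.

```python
import re
from collections import Counter

# Assumed ID shape: "section.page" plus an optional lowercase letter, e.g. "2.4a".
# This pattern is hypothetical; adjust it to match your own numbering scheme.
ID_PATTERN = re.compile(r"^(\d+)\.(\d+)([a-z]?)$")


def parse_id(content_id):
    """Split an ID such as '2.4a' into (section, page, suffix)."""
    match = ID_PATTERN.match(content_id)
    if match is None:
        raise ValueError(f"Unrecognized content ID: {content_id!r}")
    section, page, suffix = match.groups()
    return int(section), int(page), suffix


def sort_ids(ids):
    """Sort IDs numerically, so '1.10' comes after '1.9' rather than before '1.2'."""
    return sorted(ids, key=parse_id)


def find_duplicates(ids):
    """Return every ID that appears more than once, in sorted order."""
    counts = Counter(ids)
    return sort_ids([content_id for content_id, n in counts.items() if n > 1])


ids = ["4.1", "1.0a", "2.4a", "1.0", "2.4", "1.1", "1.0b"]
print(sort_ids(ids))         # ['1.0', '1.0a', '1.0b', '1.1', '2.4', '2.4a', '4.1']
print(find_duplicates(ids))  # []
```

Sorting on the parsed numbers rather than the raw text matters once a section grows past nine pages, because a plain alphabetical sort would put 1.10 ahead of 1.2.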

When you have IDs assigned for pages you’re planning to include in your new or updated site in the content matrix, you can associate things elsewhere in your collaboration process by referencing those ID numbers. Each document you create has an ID number that ties back into the content matrix, which lets you specify dozens of pages and create the production infrastructure to write/produce those pages before you’ve so much as got a title in mind for them. This accelerates things greatly. Each item in your content matrix — each row in your spreadsheet — can link directly to a document with the same ID number, making it relatively easy to bounce around different sections of your site and check on progress, do QA, and so on.
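
To make the link between matrix rows and page tables concrete, here is a rough Python sketch that reads a CSV export of a content matrix and reports which IDs still lack a page table. The column names, statuses, and example.com links are placeholders I made up for illustration; only the IDs and titles echo the example list above.

```python
import csv
import io

# Hypothetical CSV export of a content matrix. Column names, statuses,
# and URLs are assumptions for illustration only.
MATRIX_CSV = """ID,Title,Status,Page Table,Current Page
1.0,Technical Services Overview,In review,https://example.com/docs/1.0,https://example.com/old/tech
1.1,General Inquiry Form,Not started,,
2.0,QuickStart Overview,Done,https://example.com/docs/2.0,https://example.com/old/quickstart
"""


def load_matrix(text):
    """Read the matrix into a list of dicts, one per spreadsheet row."""
    return list(csv.DictReader(io.StringIO(text)))


def missing_page_tables(rows):
    """Return the IDs of rows whose 'Page Table' cell is empty."""
    return [row["ID"] for row in rows if not row["Page Table"].strip()]


def status_summary(rows):
    """Count how many rows sit in each status."""
    summary = {}
    for row in rows:
        summary[row["Status"]] = summary.get(row["Status"], 0) + 1
    return summary


rows = load_matrix(MATRIX_CSV)
print(missing_page_tables(rows))  # ['1.1']
print(status_summary(rows))       # {'In review': 1, 'Not started': 1, 'Done': 1}
```

You don't need a script for any of this; a filter on the “Page Table” column in the spreadsheet does the same job. The sketch is only meant to show that the ID is the one value tying each row to its document.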

The ID system can be a little tedious at times, yes, but I’ve seen it work for sites with thousands of URLs. Most spreadsheet tools will do their best to help you continue a numbering scheme if you simply select a few in a row and drag down to expand. (Just be sure to double-check it got it right!)

Spreadsheets are cool I guess, but what about using this specific, complicated, and expensive app I found instead?

The best way to learn whether or not an app or other fancy tool will help you out is to start your migration and production planning right now, with the simplest tools you can already access. If you hit roadblocks in collaborating, structuring data, or coordinating workflows, and you think specific features beyond what’s built into your office suite can help with them, then sure, give a fancy tool a try! You’ll have a much better idea of what specific fancy apps and custom tools you might need after trying it the simple way first.