Uncategorised legacy data—save or bin?

It’s easy enough to enable, and require, document profiling for new documents created within the system, but it is a challenging and seemingly intractable problem to decide what to do for the tens of thousands—if not hundreds of thousands—of legacy documents that already exist.

Deploying a Knowledge Management (KM) or Information Governance system in an organisation that has no previous experience with such systems is difficult, even in the best of circumstances. Beyond the often-heated philosophical debate over structured versus free-form storage that occurs with every one of these projects, there’s also another universal difficulty: bringing existing work product into the new system.

What can be done to finally address this lingering and universal problem? Here are four suggestions to think about when considering your legacy data:

1. Not all documents are equal. Should they all survive?

In thinking about the treatment of legacy documents, it’s critically important to remember that not all existing documents are equally valuable to an organisation.

Generally speaking, documents recently created or accessed are likely to have greater relevance and value to an organisation than old WordPerfect files tucked away in a forgotten directory. Indeed, some older documents may exist only because, during past hardware migrations, it was easier to move them en masse than to review them and make retention and disposition decisions. But remember: the mere existence of a document does not mean that it has continuing value to the organisation.

A document migration initiative is also a perfect opportunity to validate existing electronic document repositories against an organisation’s data policies for managing and retaining hardcopy documents. Because electronic documents take up less physical space, they are often kept longer than their hardcopy equivalents would have been. Just as it is easier to push documents forward than to review them for disposition, many older documents lingering on file servers might well have been slated for destruction under established records retention schedules had they been stored in hardcopy format.

How old is too old for electronic documents? Records retention schedules aside, one methodology for assessing the business value of documents is to set a cut-off date, typically three to five years before the present: documents on the newer side are automatically accorded business value, while those on the older side are deemed to have reached end of life and are slated for destruction.

This cut-off date should be consistent with existing records management schedules or other criteria, but it shouldn’t necessarily be considered the final word. Organisations should review a sample of documents from either side of the proposed cut-off; the results are often very revealing, and sampling documents close to the boundary will confirm whether the cut-off is appropriate and makes business sense.
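
By way of illustration only, the short Python sketch below shows how such a cut-off and sampling exercise might be mechanised against documents sitting on an ordinary file share. The share path, the five-year window, and the sample size of ten are assumptions made for the example, not recommendations.

```python
import random
from datetime import datetime, timedelta
from pathlib import Path

CUTOFF = datetime.now() - timedelta(days=5 * 365)   # assumed five-year cut-off
LEGACY_ROOT = Path(r"\\fileserver\legacy_docs")     # hypothetical file share

keep, dispose = [], []
for doc in LEGACY_ROOT.rglob("*"):
    if not doc.is_file():
        continue
    modified = datetime.fromtimestamp(doc.stat().st_mtime)
    (keep if modified >= CUTOFF else dispose).append(doc)

print(f"Newer than cut-off (presumed business value): {len(keep)}")
print(f"Older than cut-off (disposition candidates):  {len(dispose)}")

# Sample a handful of documents from either side of the cut-off so a human
# reviewer can confirm the date actually separates valuable material from
# end-of-life material.
for label, bucket in (("keep", keep), ("dispose", dispose)):
    for doc in random.sample(bucket, min(10, len(bucket))):
        print(label, doc)
```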

2. Leverage existing organisational systems

The absence of a holistic document management or KM system does not mean that documents reside in an unorganised pile. Users typically create their own organisational systems for storing materials, in the form of named and nested folders.

While it may make little sense to review thousands of legacy documents individually for their relevance, it is a much more manageable task to review a list of folder names. Even at the folder level, it may be possible to identify obsolete subject matter, as well as clearly current topics, and apply triage decisions more quickly as a result.

One advantage to this approach is that folder-level analysis requires minimal additional technology investment—Microsoft Windows itself has built-in utilities to identify all the directories and sub-directories on a given network drive or file share. If the legacy documents have been stored in a SharePoint repository, the process is even simpler—SharePoint contains an ‘export to Excel’ function that will create a spreadsheet containing all folder and sub-folder names, plus basic metadata (creator, creation date, etc.) that is typically helpful in assessing the age and relevance of the folder content.
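
For file shares without SharePoint, a comparable inventory can be produced with a few lines of script. The sketch below, in Python against a hypothetical share path, walks the directory tree and writes each folder’s name, file count, and most recent modification date to a spreadsheet-friendly CSV; treat it as a starting point rather than a finished tool.

```python
import csv
import os
from datetime import datetime

ROOT = r"\\fileserver\legacy_docs"   # hypothetical network drive or file share

with open("folder_inventory.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["folder", "file_count", "last_modified"])
    for dirpath, dirnames, filenames in os.walk(ROOT):
        # Most recent modification time of any file in the folder, if present.
        mtimes = [os.path.getmtime(os.path.join(dirpath, f)) for f in filenames]
        last_modified = (
            datetime.fromtimestamp(max(mtimes)).date().isoformat() if mtimes else ""
        )
        writer.writerow([dirpath, len(filenames), last_modified])
```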

A major disadvantage of this approach, of course, is that decisions based on folder names and metadata will not catch unrelated materials that someone has stored in an otherwise well-named folder. Further, old and new folder names alike may lack enough detail to make their contents easy to determine, forcing greater reliance on folder-level metadata for relevance decisions. However, given that the only alternative is a time-consuming file-by-file analysis of every document within a folder, the efficiency of folder-based analysis may still outweigh these potential downsides.

Even if it is not a complete solution, folder analysis still identifies groups of folders that should be migrated and others that should be deleted, leaving a smaller population that requires further analysis.

3. Bring them all in—with a disclaimer

A third approach to managing legacy data is to embrace the problem by migrating all existing documents into the new system. In such an approach, older materials would become full-text searchable, but their document profile information would include prominent disclaimers such as ‘legacy’ or ‘imported’ to flag that these documents have not been validated for continuing relevance or accuracy.
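
As a rough illustration of what such a disclaimer might look like in practice, the sketch below builds a minimal profile record for each legacy file before import. The field names, the wording of the status flag, and the share path are all assumptions; a real KM or document management system will have its own profiling fields and import mechanism.

```python
from datetime import date, datetime
from pathlib import Path

LEGACY_ROOT = Path(r"\\fileserver\legacy_docs")   # hypothetical file share

def build_profile(doc: Path) -> dict:
    """Assemble a minimal import profile that flags the document as unvalidated."""
    stat = doc.stat()
    return {
        "title": doc.stem,
        "source_path": str(doc),
        "last_modified": datetime.fromtimestamp(stat.st_mtime).isoformat(),
        "status": "LEGACY IMPORT - NOT VALIDATED",   # the disclaimer field
        "imported_on": date.today().isoformat(),
    }

# Profile every legacy file prior to loading it into the new system.
profiles = [build_profile(p) for p in LEGACY_ROOT.rglob("*") if p.is_file()]
```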

Importing all materials, with or without disclaimers, has the advantage of requiring no complex planning and will appease those who are concerned about the potential loss of historical organisational documents. However, this single advantage must be balanced against multiple, powerful downsides, including the most critical: bringing in everything will noticeably reduce the power of the KM solution.

First, this approach almost always imports a significant amount of duplicative and draft material of limited value that clogs up search results. Running a search against every document ever created by any user in an organisation will return many low-value results, calling the value of the entire search into question.

Second, bringing in so many similar or duplicate documents from multiple locations will surface competing drafts without the versioning needed to establish the priority of one version over another, a key feature of modern KM solutions. (A basic check for exact duplicates is sketched after the third downside below.)

Third, importing all existing documents will take up considerable storage space, which may make the new document repository much more expensive to host and back up, especially if there are no policies to automatically age out documents under a records retention schedule.
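
One partial mitigation for the first two downsides is to identify exact duplicates before anything is imported. The Python sketch below groups files under a hypothetical legacy share by a hash of their contents; any group with more than one member is a byte-identical duplicate set. Note that this will not catch near-duplicate drafts, which typically require fuzzy matching or the deduplication features of the KM system itself.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

LEGACY_ROOT = Path(r"\\fileserver\legacy_docs")   # hypothetical file share

def file_hash(path: Path) -> str:
    """Return the SHA-256 digest of the file contents, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Group files by content hash; groups larger than one are exact duplicates.
groups = defaultdict(list)
for doc in LEGACY_ROOT.rglob("*"):
    if doc.is_file():
        groups[file_hash(doc)].append(doc)

duplicate_sets = {h: paths for h, paths in groups.items() if len(paths) > 1}
print(f"Exact duplicate sets found: {len(duplicate_sets)}")
```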

4. Leave them in place

A final approach to managing legacy documents is to set existing repositories to read-only mode and leave them where they are as permanent archives.

Typically, this approach is paired with the ability for users to import individual files, or groups of files, from these archives into the new system as needed. It requires the least immediate work from IT professionals but offers the fewest benefits to both system users and IT.

With this approach, end users will need to consult two entirely separate systems to find previous work product, and the legacy files will not be searchable to the same degree, or with the same tools, as the new files. As a practical matter, the legacy files will remain largely hidden until they become wholly irrelevant or forgotten.

For IT, keeping existing file repositories alive means maintaining the physical infrastructure on which they are stored. This can prove difficult and costly: hardware eventually fails, and migrating data from one storage system to another as platforms become obsolete adds expense on top of what has already been spent on the KM system.

In the end, this approach delays making hard decisions, but at an increasing financial cost.

Conclusion

Knowledge Management, Information Governance, or Document Management systems are excellent tools for helping an organisation manage its institutional information. However, these systems must be set up to offer genuine value to end users. And ideally, the organisation will find a way to consolidate all of its document-based (i.e., unstructured) work in this new system.

Such an approach will be more effective and cheaper to maintain over the long run than those approaches that include long-term access to any part of the pre-existing storage system, even in read-only mode.
