ISO/IEC 27038:2014 – Information technology — Security techniques — Specification for digital redaction
Digital data sometimes has to be disclosed to third parties, and occasionally to the general public, for reasons such as the release of official documents under Freedom of Information laws or for safety reasons. When it is deemed inappropriate to disclose sensitive data contained in the files (such as the names of individuals who must remain anonymous and various other confidential or proprietary information), the sensitive information must be securely removed from the files before their release. ‘Redaction’ means denying file recipients knowledge of the sensitive data in the files.
Since redaction is often used to protect highly confidential information, mistakes in the process that lead to inappropriate disclosure can have grave consequences. Redaction failures have led to identity theft, disclosure of confidential information, privacy breaches, and the compromise of the identities of undercover agents and informants, and the exposure of trade secrets could prove extremely costly for a company. At the very least, failed redactions are embarrassing for the individuals deemed responsible.
The following information risks are associated with digital redaction:
1. Making poor decisions about whether to redact data, the technical method or procedure to be used, and/or the suitability (mainly competence and diligence) of those assigned to the task;
2. Failing to correctly identify which data must be redacted (both individual data items and whole files);
3. Failing to destroy the redacted data entirely, such as:
– Using ineffective or inappropriate technical means of redaction, including crudely modifying sensitive data rather than permanently removing them (e.g., reformatting or overlaying redacted text so that it appears invisible, applying easily reversed mechanical processes, or tokenizing textual identifiers);
– Accidentally releasing sensitive data that has not been redacted in part or in whole (perhaps by releasing multiple versions of a sensitive document that have been redacted differently, from which the original could be reconstructed directly or indirectly);
– Removing or partially deleting sensitive data, while leaving residual data (including, but not limited to, the editing journal or cached copies), allowing the data to be restored from the redacted version;
– Over-relying on pixelation, blurring or similar methods of obscuring parts of images (often for personal privacy reasons), when deconvolution and more advanced image processing/transformation techniques can potentially restore enough of the original to allow identification;
– Failing to redact sensitive metadata (for instance, metadata in document properties or reviewer comments, or location data on digital images);
4. Failing to distinguish redacted from unredacted data, or to make clear to recipients which parts of the original document have been altered;
5. Redacting excessively, removing items beyond the intended scope, or handling the redaction poorly (which may require you to justify your activities and decisions at some point);
6. Inappropriately or inadvertently changing the meaning of the remaining data through contextual effects (e.g., removing some records may invalidate statistical analysis of the remainder), or causing collateral damage to the file (such as file integrity problems and erroneous formatting changes) as part of the redaction process;
7. Leaving sufficient data in the file for recipients to infer sensitive information, perhaps with the help of other available sources of information (for example, replacing names with anonymous labels in redacted files while separately revealing the relationship between labels and names; providing anonymous statistics on known small populations; revealing how many characters have been redacted, or even where, through the printed size of the redacted area; or allowing correlation, inference, and data mining techniques to glean information from redacted content);
8. Over-relying on redaction, believing that it will guarantee the confidentiality of sensitive information regardless of the technical and process failures that can and do happen; conversely, placing no reliance on redaction, believing that it cannot protect sensitive information (these are governance and assurance risks);
9. Issues relating to information security incidental to or peripheral to the redaction process, including:
– Sending redacted files, redaction instructions, or redacted content to individuals who are not authorized to receive them;
– Failing to secure critical information related to the redaction process, including original files, redaction instructions and processing, while in transit, during processing and in storage (e.g. sensitive information being intercepted in transit over the Internet);
– Accidentally disclosing unredacted versions of a file, whether through the same mechanism as the redacted version or separately;
– Making unredacted versions of the files available to the public without permission or in an inappropriate way (as with WikiLeaks);
– Deliberately or accidentally disclosing the redacted information other than by releasing the digital records (for example, by releasing the redaction instructions or being overheard discussing sensitive information);
– Destroying the integrity and/or availability of unredacted original files (e.g. by overwriting the originals with the redacted versions);
10. Using image redaction to conceal illegal or inappropriate activities (paedophilia cases provide an example in which image redaction failed);
11. Other risks (the risk analysis implied here is not exhaustive and does not necessarily reflect any particular situation).
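Risks 3, 4 and 7 can be illustrated in a few lines of Python. The following is a minimal sketch under stated assumptions, not a method taken from the standard: the function name `redact_terms` and the marker `[REDACTED]` are hypothetical, and the input is assumed to be plain text. Each sensitive term is deleted from the output rather than hidden, the marker shows recipients where content was removed, and its fixed width does not reveal how many characters were removed.

```python
import re

MARKER = "[REDACTED]"  # hypothetical fixed-width marker, not defined by the standard


def redact_terms(text, terms):
    """Return a copy of text with every occurrence of each term replaced by MARKER."""
    # Try longer terms first so a short term cannot split a longer one.
    ordered = sorted((t for t in terms if t), key=len, reverse=True)
    if not ordered:
        return text
    # re.escape treats each term as literal text, not as a regular expression.
    pattern = re.compile("|".join(re.escape(t) for t in ordered), re.IGNORECASE)
    # The same marker replaces every match, whatever its length.
    return pattern.sub(lambda _match: MARKER, text)


original = "Agent Jane Doe met Bob at 12 High Street."
released = redact_terms(original, ["Jane Doe", "Bob"])
print(released)  # Agent [REDACTED] met [REDACTED] at 12 High Street.
```

This only addresses visible text. Metadata, editing journals and cached copies inside richer file formats need separate treatment, and the original file should be kept intact rather than overwritten with the redacted output.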
Scope and objectives
The standard defines redaction as the “permanent deletion of information within a document” and documents as “records of information that can be read independently”. The definitions are important because these terms often mean different things in other contexts and in general usage. Later in the standard, however, redaction is expanded to cover not just removing confidential material but also, where applicable, indicating where content has been deleted.
In its own words, the standard “describes requirements for software redaction tools and methods for testing whether digital redaction has been performed securely but excludes the redaction of information from databases.” Although a database could arguably be treated as a unit of recorded information, and hence as a document, the standard specifically excludes the redaction of databases.
Even though this standard has a restricted scope, the risks it covers are significant and many of the associated controls are technically and procedurally complex. Like other ISO27k standards, it does not attempt to cover all the vagaries of the redaction process in great detail but provides sound if rather generic and high-level guidance.
After the usual preamble, scope and definitions, the standard covers the following topics:
– An introduction to the principles of digital data redaction and anonymization;
– Redaction requirements – a description of the redaction process;
– Redaction processes, such as printing and physically redacting the original materials, editing the documents in various ways, dealing with metadata (for example, document properties or changelogs), and enhanced redaction that takes into account the broader context as well as the specific content (for instance, the potential to guess, infer or reconstruct redacted content from other content in the redacted files, or by using other sources);
– Maintaining records and notes so that you can explain and justify redaction decisions and actions;
– Software redaction tools – essential functional requirements;
– Redaction testing – basic methods and simple procedures for checking the effectiveness of redaction; and
– An annexe with information about redacting PDF documents.
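As an illustration of the kind of basic check the redaction testing topic refers to, the sketch below searches the raw bytes of a file intended for release for strings that should have been removed. It is a hedged example rather than the standard's own test procedure: the function name `find_leaks`, the sample data and the choice of encodings are all assumptions. A byte search will not find text that is compressed or otherwise encoded inside the file (as is common in PDF streams), and it says nothing about inference risks.

```python
ENCODINGS = ("utf-8", "utf-16-le", "utf-16-be")  # assumed set, extend as needed


def find_leaks(data, sensitive_terms):
    """Return the sensitive terms that still occur somewhere in the raw bytes."""
    haystack = data.lower()  # bytes.lower() folds ASCII letters only
    leaked = []
    for term in sensitive_terms:
        # Encode the term in each candidate encoding and fold case the same way.
        needles = [term.encode(enc).lower() for enc in ENCODINGS]
        if term and any(needle in haystack for needle in needles):
            leaked.append(term)
    return leaked


# Simulated released file: the visible text is redacted, but a leftover
# metadata field still carries a name encoded as UTF-16.
released_file = (
    b"Agent [REDACTED] met [REDACTED]."
    + "Author: Jane Doe".encode("utf-16-le")
)
print(find_leaks(released_file, ["Jane Doe", "Bob"]))  # ['Jane Doe']
```

An empty result from such a check is necessary but not sufficient evidence that redaction worked. A non-empty result is a clear signal that the file should not be released.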
The title of the document uses the term ‘specification’, which implies a formal definition of requirements against which compliance may be audited and certified.
The standard’s status
In 2014, the standard was published.
ISO specification standards usually use the keyword “shall” exclusively to indicate mandatory requirements. The DIS version also uses “should” in places and provides additional guidance. This helps users to understand and apply the standard, but it makes auditing and certifying compliance harder, if indeed that was the intention of the standard in the first place.
In general, the standard says little about the governance or management of redaction (for instance, identifying the content that should be redacted, why, how, and by whom, or analyzing and dealing with risk in a given redaction situation) or about the security measures that should be employed (such as preventing the release of unredacted content without consent or explicit instructions). Another standard, or a collaborative project by members of the ISO27k Forum, might be helpful here to provide further implementation guidance.