The Four Cs of Documentation and Code

I take a similar approach to writing both software and documentation. It is driven by four words that (conveniently) really do all begin with C.

Those are, in approximate descending order of importance:

Correctness
Completeness
Clarity
Consistency

Let’s dive a little deeper into each of these.

Correctness

We have all encountered software or documentation that is incorrect. For software, the results can range from humorous to deadly. For documentation, being wrong is often worse than having no documentation at all.

If your user is getting a different result than the one promised by your documentation, they are usually going to assume the problem is something they are doing. It’s going to be a long time before they decide that they need to explicitly verify whether your documentation is correct. This is time that you have stolen from them to benefit nobody, which is clearly unacceptable!

When can this be sacrificed?

Very rarely, but it does happen.

Most commonly, you may need to simplify your description of a system to the point where it no longer gives a good impression of how it will behave in edge-cases. I’m not a fan of making this tradeoff, but it’s a legitimate decision.

When I was a techwriter, this was a major difference between my editor and me. I tended to want to give the user a mental model that was reasonably close to the implementation, whereas she didn’t consider that to be especially important. I think this disagreement was very useful, as we tended to reach good compromises which strengthened the doc.

As far as code… only you can decide if it’s ok for your program to be wrong.

Clarity

Documentation with accurate information is only useful if people can understand it.

Code that is unclear to humans does at least get its job done. However, there are few situations in which you will write professional code and be sure that nobody will ever need to look at it again.

The most common reason we sacrifice clarity is time pressure.

I have made this longer than usual because I have not had time to make it shorter.

— Blaise Pascal (except in French)
Lettres Provinciales

Compacting ideas, removing duplication, deciding what is important… these tasks take time and mental energy.

We also often lose clarity when we don’t have a full understanding of our problem domain. I know that as a techwriter, I could write true facts I was told by engineers, but it wasn’t until I got more context for what I was doing that I could start to write really clear documentation.

When can this be sacrificed?

For documentation, typically you want things to be as clear as possible. That said, if the system you are documenting is inherently confusing, you may need to balance completeness against clarity. If you find yourself doing this, often it is wise to provide an "advanced" section with all the gnarly details.

For programs, in some cases we may sacrifice clarity in pursuit of performance goals. This is less important than it once was, in part because our platforms are often built around allowing us to write the code we want without sacrificing the performance we need. However, when you are on the ragged edge of some metric, you are quite likely to start writing cursed things to meet your goals.

I have a theory that programmers who are very smart, but poor communicators, are dangerous to the long-term health of a product. You have to make sure somebody wraps up their magic in an encapsulation model that stops it from touching everything else!

Of course, if you have written code that is not easy to understand, you probably want to write a test suite for it that is.
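
As a rough illustration of that last point, here is a minimal sketch of a deliberately cryptic function paired with tests that are easy to read. The names countSetBits and testCountSetBits are hypothetical, and the bit-counting trick is just a stand-in for whatever "cursed" code you ended up with.

```cpp
#include <cassert>
#include <cstdint>

// Counts the 1 bits in x using a branch-free bit-twiddling trick.
// The arithmetic is hard to follow; the tests below state the contract plainly.
std::uint32_t countSetBits(std::uint32_t x)
{
    x = x - ((x >> 1) & 0x55555555u);                  // sums of adjacent bit pairs
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  // sums per 4-bit group
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  // sums per byte
    return (x * 0x01010101u) >> 24;                    // add the four bytes together
}

// Readable checks: each line is one fact a maintainer can verify by eye.
void testCountSetBits()
{
    assert(countSetBits(0x00000000u) == 0);   // no bits set
    assert(countSetBits(0x00000001u) == 1);   // lowest bit only
    assert(countSetBits(0x80000000u) == 1);   // highest bit only
    assert(countSetBits(0x000000FFu) == 8);   // one full byte
    assert(countSetBits(0xFFFFFFFFu) == 32);  // every bit set
}
```

Nobody has to decode the masks to learn what the function promises; the test reads like a specification.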

Completeness

Documentation that is light on details may leave the user high and dry. When it comes to command-line tools or libraries, if you don’t tell the user how something works, there is usually no way for them to figure it out on their own (unless they perform a dictionary attack on your API). GUI tools are a little safer if the GUI design is effective at teaching.

What is incomplete code?

Incomplete code is when you have functions that only work in "expected" program states. Code that does not work in expected states of your program is simply incorrect.

For instance, this trivial program is correct but incomplete:

#include <string>
#include <iostream>
#include <stdlib.h>

enum CowType
{
   eHolstein = 0,
   eJersey,
   eHighland,
   ePinzgauer
};

std::string getMoo(CowType cow)
{
    switch (cow)
    {
        case eHolstein:
        case eJersey:
            return "Mooooooo";
        case eHighland:
            return "MeeeeermooHOOH";
    }
    return "I am a cow";
}

// Prints the moo of a random type of cow
int main()
{
   CowType cow = static_cast<CowType>(rand() % 2);
   std::cout << getMoo(cow);
}

This program does exactly what it says, so it is correct. But getMoo() does not do the right thing if you pass in ePinzgauer (unless they actually do say "I am a cow"). This program is correct, but the code is incomplete.

Some people believe it would be wrong to complete this function, since code that is never used is by definition untested (never mind that you should probably be unit testing getMoo() anyway). I did an impromptu poll of ~10 software developers, and a strong majority agreed that functions should cover all valid program states, not just the ones that actually occur in the current codebase.
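
For those in the majority camp, one possible way to complete the function might look like the sketch below. Two things here are assumptions rather than facts: the Pinzgauer string is a placeholder (the real sound is not given in this article), and throwing on an unrecognized value is just one of several reasonable policies.

```cpp
#include <stdexcept>
#include <string>

enum CowType
{
    eHolstein = 0,
    eJersey,
    eHighland,
    ePinzgauer
};

// Every enumerator has its own case and there is no default label, so
// compilers that warn about unhandled enumerators will flag new cow types.
std::string getMoo(CowType cow)
{
    switch (cow)
    {
        case eHolstein:
        case eJersey:
            return "Mooooooo";
        case eHighland:
            return "MeeeeermooHOOH";
        case ePinzgauer:
            return "Moo?"; // placeholder (assumption): the real sound is unknown
    }
    // Reached only if cow holds a value that is not a named enumerator.
    throw std::invalid_argument("getMoo: unknown CowType");
}
```

With this shape, a unit test can walk every enumerator, and a cow type that was never considered fails loudly instead of claiming "I am a cow".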

YMMV on this one.

When can this be sacrificed?

If there is code or documentation that would be difficult and time-consuming to write, OR it significantly inhibits clarity, you may wisely choose to reduce completeness in service of those goals.

Naturally, if you do this, you should record what is missing and why!

/*
Does not support Pinzgauers.
It was taking me too long to find a video of one mooing online,
 so I gave up.
*/
std::string getMoo(CowType cow);

Consistency

Consistency has many benefits, most notably that it contributes to clarity. People don’t like to be reading something, and then have it change tone as if it were being written by a different person (even if it was).

Even more important is that you *always refer to a thing with the same terminology*. You may know that a honking potato and a bleating tuber are the same thing in your API, but if you use both terms in your documentation your audience is going to assume there is a reason for that. By consistently using the same terminology, you don’t leave the reader wondering if there is some new topic they need to learn.

When I was a technical writer, we had custom systems which worked very hard to enforce specific wording and phrasing. Our documentation build system was a wonder to behold!

Consistency can also enable certain types of automation. I have used regex to find and replace in a codebase, and it works a lot better when everybody agrees on how words are spelled, whether to use tabs or spaces, and what kind of line ending to use!
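
To make the automation point concrete, here is a small sketch of a whole-word rename using std::regex. The helper renameIdentifier and the names mooFor and get_moo are hypothetical, and the sketch assumes the names are plain identifiers with no regex metacharacters or `$` in them.

```cpp
#include <regex>
#include <string>

// Renames whole-word occurrences of oldName to newName in source text.
// Assumes oldName contains no regex metacharacters and newName contains
// no '$' (both true of plain C++ identifiers).
std::string renameIdentifier(const std::string& source,
                             const std::string& oldName,
                             const std::string& newName)
{
    // \b anchors the match to word boundaries, so "getMoos" is left alone.
    const std::regex wholeWord("\\b" + oldName + "\\b");
    return std::regex_replace(source, wholeWord, newName);
}
```

Given the text "getMoo(a); get_moo(b);", renaming getMoo to mooFor changes only the first call. The inconsistently spelled get_moo slips through untouched, which is exactly the kind of miss that agreed-upon spelling prevents.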

When can this be sacrificed?

This is the bottom item because in my view it is the one we should be most willing to give up.

If you have a formatting or wording standard, and that standard is hard to read in one particular case, don’t just blindly stick to it. Correctness, clarity, and completeness are all more important than consistency.

One particular case that comes to mind was a list of controls we supported for different object types. I don’t remember the exact wording, but following our standards resulted in subheadings something like:

For when ParameterA specifies a group identifier and ParameterB specifies the label of a holstein Cow object
For when ParameterA specifies a group identifier and ParameterB specifies the label of a jersey Cow object
For when ParameterA specifies a group identifier and ParameterB specifies the label of a highland Cow object

This is from memory; as I recall it was actually longer than that!

These headings hung around until somebody pointed out that they were basically unreadable, especially in the context of our function reference. After some debate, we agreed to violate the standard and replace it with something like this:

Group ID and specifying a Cow of type HOLSTEIN
Group ID and specifying a Cow of type JERSEY
Group ID and specifying a Cow of type HIGHLAND

This also highlights two excellent rules to follow about clarity when you have a list (including a set of nearly identical calls in code):

Keep the common elements between each item compact.
Make sure that the differences stand out, usually by putting them at the beginning or end (people often skip the middles of both words and sentences when they read).
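
The same two rules can be sketched in code. In this hypothetical example (Heading, groupIdHeading, and buildHeadings are invented for illustration), the shared wording lives in one helper, and each nearly identical call shows only what differs, lined up at the end.

```cpp
#include <string>
#include <vector>

enum CowType { eHolstein, eJersey, eHighland };

struct Heading
{
    CowType cow;
    std::string text;
};

// The common wording is written once, so it stays compact at every call site.
Heading groupIdHeading(CowType cow, const std::string& typeName)
{
    return {cow, "Group ID and specifying a Cow of type " + typeName};
}

// Each line differs only in its arguments, which are aligned so the
// differences are easy to scan.
std::vector<Heading> buildHeadings()
{
    return {
        groupIdHeading(eHolstein, "HOLSTEIN"),
        groupIdHeading(eJersey,   "JERSEY"),
        groupIdHeading(eHighland, "HIGHLAND"),
    };
}
```

A reader scanning buildHeadings() sees three cow types at a glance and does not have to re-read the shared sentence three times.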