4. Source-Code Documentation

Before we get into the discussion of source-code documentation, let us first define a few terms which we will use along the way.

Making changes to source code after it was initially written is called maintenance. A maintenance programmer can be the original writer after, say, a year has gone by and he’s forgotten what he was thinking when he wrote it. Or it can be someone else. Whoever it is, we will call him a maintenance programmer. We say that a body of source code is in the maintenance portion of its lifetime (or in maintenance) after it has reached its first release and while it is still being used to build updated versions of the software system it belongs to.

Let us also take, for the sake of this discussion, the “unit of software” that the maintenance programmer is going to be making changes to, and call it a module or subsystem. In the C language, a module (subsystem) would typically be a .c file and its .h counterpart, together handling one sphere of responsibility, typically containing every way one data type (or a group of data types that work together) can be manipulated and/or used in a larger system. A module’s API would consist of the collection of data and functions that are publicly available to users of that module. Usually there is documentation that goes with each element of that API, and we call the collection of that documentation its API documentation: the bare minimum users of that module will need to use it correctly.
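To make the terms concrete, here is a minimal sketch of such a module, shown as one listing for convenience. The module name (ringbuf), its functions, and its capacity are all hypothetical, chosen only for illustration. The first half is what would live in the .h file (the API and its API documentation); the second half is what would live in the .c file.

```c
/* ---- ringbuf.h (hypothetical): the public API and its documentation ---- */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RINGBUF_CAPACITY 8u

/** Fixed-capacity FIFO of bytes. Callers should treat the fields as private. */
typedef struct {
    uint8_t data[RINGBUF_CAPACITY];
    size_t  head;   /* index of the oldest stored byte */
    size_t  count;  /* number of bytes currently stored */
} ringbuf_t;

/** Make `rb` empty. Must be called before any other ringbuf function. */
void ringbuf_init(ringbuf_t *rb);

/** Append `byte`. Returns false (and stores nothing) when the buffer is full. */
bool ringbuf_push(ringbuf_t *rb, uint8_t byte);

/** Remove the oldest byte into `*out`. Returns false when the buffer is empty. */
bool ringbuf_pop(ringbuf_t *rb, uint8_t *out);

/* ---- ringbuf.c (hypothetical): the implementation ---------------------- */
void ringbuf_init(ringbuf_t *rb)
{
    assert(rb != NULL);
    rb->head = 0;
    rb->count = 0;
}

bool ringbuf_push(ringbuf_t *rb, uint8_t byte)
{
    assert(rb != NULL);
    if (rb->count == RINGBUF_CAPACITY) {
        return false;
    }
    /* The write position is `count` slots past `head`, wrapping around. */
    rb->data[(rb->head + rb->count) % RINGBUF_CAPACITY] = byte;
    rb->count++;
    return true;
}

bool ringbuf_pop(ringbuf_t *rb, uint8_t *out)
{
    assert(rb != NULL && out != NULL);
    if (rb->count == 0) {
        return false;
    }
    *out = rb->data[rb->head];
    rb->head = (rb->head + 1) % RINGBUF_CAPACITY;
    rb->count--;
    return true;
}
```

A user of this module needs only the three documented prototypes. Note that nothing in that API documentation says how the head and count fields cooperate, which is exactly the kind of knowledge a maintenance programmer would need.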

4.1. The Problem Being Solved

An alarming number of bugs in the software industry are introduced into software after it was originally written, by maintenance programmers who make changes to a module before they attain full understanding of how it is supposed to work internally.

There can be many reasons for this, the most common of which is, “it takes too long” (given the time available). This can be because the means to achieve this level of understanding

does not exist,
is inadequate,
is not findable,
is not accessible (e.g. hidden behind a paywall or private repository the programmer does not have access to), or
is too long and wordy.

This is important: to safely make changes to a module, the maintenance programmer needs to reach a very specific threshold of understanding of how that module was designed to work internally before he starts changing the code. And by “safely”, I mean he understands the original author’s design intentions thoroughly enough that he is not risking introducing new bugs due to lack of understanding. This threshold of understanding plays an important role, as you will see below.

For the sake of this discussion, we are going to call that threshold a full understanding of that module’s design.

Note carefully: for complex modules, the API documentation is not enough to achieve this understanding by itself. In fact, the writer(s) of the API documentation often intentionally do not document a module’s internal workings or design, because the vast majority of its end users will neither need nor want to know those details. It is also often the case that maintenance programmers need or want to make changes and/or improvements to its internal workings without changing the API or its documentation. My point: API documentation (on one hand) and the documentation required for a maintenance programmer to gain full understanding of a module before making changes to it (on the other hand) are two completely different things, with different purposes:

API documentation is public facing, and
the documentation required for a maintenance programmer to gain full understanding of a software module’s design before making changes to it, is internal—for maintenance programmers.

An interesting and important part of this problem is: when a developer is new (fresh out of University or wherever), he still lacks many VITAL understandings that ONLY come with experience. One of those understandings is what happens when you become responsible for source code that has no Internal Documentation with it. If it has a simple design, no big deal—you can work out how it was designed by looking at the source code.

But the more complex the design gets, the LONGER IT TAKES to adequately understand it through the source code alone. And this factor literally has no limit. In the industry, very complex source code without internal documentation has been given a name:

a “maintenance nightmare”.

This is a term used in the industry for exactly this problem: you are responsible for maintaining complex code, but you can’t tell what the original author’s design intentions were. What happens to that code? Over time (months or years), it literally gets thrown away and not used, especially if it has bugs in it. Reason: programmers cannot understand it fast enough for modifying it (or fixing it) to be more efficient than re-writing it from scratch. I have seen very complex cases that can take days or weeks of diligent, focused study to thoroughly understand. Very few programmers can afford (or are ALLOWED) that kind of time, and so bugs hide in code like that, and new bugs are easily introduced into it by maintenance programmers, all because they don’t understand the original author’s design intentions.

A classic example is Microsoft Windows. Somewhere between Windows 3.1 (1992) and Windows XP (2001), someone introduced a bug into what I am almost certain is an example of the above: complex code that is missing the documentation about its internal design intention. It caused what is called a “memory leak”: after you ran Windows aggressively for 3-5 hours (opening and closing application windows over and over), it would crash with the error message “Out of resources”, simultaneously losing all unsaved work. (The cause was that a window handle was allocated but not returned to the “list of window handle resources” when the window was closed.) The less memory (RAM) you had, the faster this would happen. But as of Windows 8, literally 20 years later, THAT BUG STILL HAD NOT BEEN FOUND, and the only thing that was saving them was that there was a lot more memory in most computers, so it took longer for it to crash. I will bet you $100 right now that that bug STILL HAS NOT BEEN FOUND even in Windows 11 (the latest version at this writing), though I have had no direct information on it since 2012. The result in the industry: people are moving to Linux... gradually; Windows is losing its market, and there are VERY FEW Windows system administrators or programmers who are not MAD AT MICROSOFT for various problems, that memory leak being one of them.

4.2. An Alternate View of the Problem

One of the areas where I have been most valuable in my career is the development of very complex subsystems (with very complex algorithms). I learned early that there are two schools of thought among developers who take on such development tasks:

1. programmers who will dive in and start coding right away, and
2. programmers who will take some more time in the design stage (which includes documenting the complex subsystem so that others can understand it, as though explaining it to someone who has never seen it before).

The documentation process allows the original author to not only think it through better, but to find and eliminate bugs in the design BEFORE they make it into the source code.

Group 1 will struggle, and 3 months later will have a buggy product that works correctly only if they are lucky. Worse yet, what they produce will often be a “maintenance nightmare” as described above, making it a long-term liability for anyone who accepts responsibility for it.

Group 2 will spend a bit of extra time (hours, a day or even several days for subsystems that are very complex) proving the design and finding and correcting flaws in the design BEFORE starting to write code. And group 2 will have a bug-free (or nearly so), well-documented, highly-maintainable version of the software in 3 weeks.

The more complex the subsystem, the more important this extra step is. (And vice-versa: the simpler the system, the less this step is needed.)

The documentation allows others (including the original author, later) to properly understand the design of the complex subsystem, including why certain design decisions were made, and the code is written directly from that documentation, with validation testing at each stage to prove correct operation.

I learned the above both THE EASY WAY and THE HARD WAY several times.

To quote a colleague of mine:

Having well-written internal documentation is crucial for explaining architecture and design decisions, providing readers with a starting point to understand the system, and helping them navigate the code more effectively.

[Documentation is] not a replacement for good code, but is an entry point [for other programmers, including oneself later, to understand the code]. It gives readers a coherent way to understand why the system looks the way it does, and then allows them to follow the code with much less friction. When documentation and code are aligned this way, they reinforce each other rather than compete.

—W-Mai

4.3. The Solution

The solution to this problem is Internal Documentation (a.k.a. maintenance documentation or source-code documentation).

4.3.1. What is Internal Documentation?

Internal Documentation is comments in the source code itself that get a maintainer oriented as quickly as possible about how a module is designed to work internally[1].

The reason this documentation is directly in the source code is that, according to [Meyer1997], this is the location where it is most likely to stay up-to-date. I can confirm from my own experience that this is true.

Donald Knuth (one of the fathers of Computer Science) calls this “Literate Programming”[2]. To roughly quote him from one of his interviews:

“When you are writing source code, you are not just teaching the computer what to do, but also teaching other programmers how it works, not only users of the API, but also future maintainers of your source code. Comments add information about what the author was thinking when the code was written, and why you did things that way—subtleties about the design that cannot be conveyed by the source code alone.”

When that job of orienting the maintenance programmer is done well, the contents of the source code itself (including these comments) are adequate to efficiently orient and educate the maintenance programmer to the extent that he can safely make changes to it.
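As an illustration of what such comments can look like, here is a minimal sketch of a small C module that carries its Internal Documentation at the top of the file. The module (hpool), its names, and its sizes are hypothetical and invented for this example. The comment records the design intention and the one invariant a maintainer must not break, neither of which is obvious from the statements alone.

```c
#include <assert.h>
#include <stddef.h>

/*
 * INTERNAL DOCUMENTATION (for maintainers; hypothetical module "hpool")
 *
 * What: a fixed pool of HPOOL_SIZE integer handles (0 .. HPOOL_SIZE-1).
 * Why:  callers need cheap, bounded IDs for short-lived objects.
 * How:  free handles form a singly linked list threaded through
 *       next_free[]; free_head is the first free handle or HPOOL_NONE.
 *
 * Invariant: every handle is either held by a caller or on the free
 * list, never both. hpool_release() is the ONLY way back onto the list,
 * so any code path that drops a handle without calling it leaks that
 * handle permanently, and the pool eventually runs dry.
 */
#define HPOOL_SIZE 4
#define HPOOL_NONE (-1)

static int next_free[HPOOL_SIZE];
static int in_use[HPOOL_SIZE];
static int free_head = HPOOL_NONE;

/* Put every handle on the free list: 0 -> 1 -> ... -> HPOOL_NONE. */
void hpool_init(void)
{
    for (int i = 0; i < HPOOL_SIZE; i++) {
        next_free[i] = (i + 1 < HPOOL_SIZE) ? i + 1 : HPOOL_NONE;
        in_use[i] = 0;
    }
    free_head = 0;
}

/* Returns a handle, or HPOOL_NONE when the pool is exhausted. */
int hpool_acquire(void)
{
    int h = free_head;
    if (h != HPOOL_NONE) {
        free_head = next_free[h];
        in_use[h] = 1;
    }
    return h;
}

/* Return handle `h` to the free list (see the invariant above). */
void hpool_release(int h)
{
    assert(h >= 0 && h < HPOOL_SIZE && in_use[h]); /* rejects double release */
    in_use[h] = 0;
    next_free[h] = free_head;
    free_head = h;
}
```

The header comment takes well under a minute to read, yet it tells a maintainer where a resource leak would come from before he touches a single line.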

4.3.2. Factors

Degree of Complexity:

As described above in An Alternate View of the Problem, the degree of complexity of a subsystem plays a major role in determining to what extent Internal Documentation is needed.

Source Code Lifetime:

In general, the longer the life of a particular body of source code, the more maintenance programmers are going to need to make changes to it. And each time they do, they are going to need to attain full understanding of that module. Thus, the longer that lifetime, the more important it is for a maintenance programmer to be able to attain that level of understanding efficiently.

On the reverse side of that coin, for source code that is only going to be sent into the field once and never looked at after that, being able to efficiently attain full understanding of that source code has little, if any, importance.

Number of Maintenance Programmers:

The more programmers who will work on a module, the more times someone will have to arrive at a full understanding of how that module is supposed to work. Thus, the ability for programmers to acquire that knowledge efficiently becomes more important when many programmers will be involved.

4.3.3. Measurement

The measure of how good Internal Documentation is for a source-code module involves only 1 thing:

How long does it take for a maintenance programmer to attain full understanding of the designer’s intentions for that module, i.e. how it is supposed to work internally?

The simplicity of the above implies that his understanding is both accurate and complete. Are all pertinent details available (that are not immediately visible in the source code)? While more complex software modules will necessarily take longer to study to understand fully, the time involved should be very short compared to the time it would take to gain a similar level of understanding by studying the source code itself. If it takes more than a few minutes of studying to attain full understanding, it’s probably too long, especially when it can likely be covered in a small number of paragraphs in the Internal Documentation.

The length of time to attain that level of understanding should probably be measured in seconds. Reason: every second counts. 1-2 minutes is pretty good for a complex module. 30 seconds is even better. 5-10 seconds is better still.

Thus very long, wordy documentation is usually unwelcome unless the details are really important. Mostly what this documentation should try to do is ORIENT the programmer, not explain what he can see in the source code. If necessary, refer to the function (or subroutine as Donald Knuth likes to call it) as needed, but don’t repeat it. (In other words, Internal Documentation should always be at least 1 level of abstraction above the source code.)

Details well covered elsewhere (such as well-known Computer Science topics) can be relied on and referred to as needed, but should not be part of this documentation, since it only clutters the documentation and makes it slower for programmers who don’t need it. (They can always look it up elsewhere when they do need it.)

Brevity is important. Not repeating what’s already in the code is important.
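The difference between repeating the code and staying one level of abstraction above it can be sketched as follows. The debounce routine, its names, and its tick count are hypothetical, chosen only to give the two comment styles something to describe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEBOUNCE_TICKS 3

typedef struct {
    int  run;     /* consecutive samples that disagree with `stable` */
    bool stable;  /* last accepted (debounced) level */
} debounce_t;

/*
 * AVOID (repeats the code, orients nobody):
 *   "If raw equals stable, set run to 0. Otherwise increment run, and
 *    if run is at least DEBOUNCE_TICKS, set stable to raw and run to 0."
 *
 * PREFER (one level above the code):
 *   Switch contacts chatter briefly after a press, so a raw sample is
 *   not trusted until it has disagreed with the accepted level for
 *   DEBOUNCE_TICKS samples in a row. Any agreeing sample restarts the
 *   count, which is what filters out the chatter.
 */
bool debounce_update(debounce_t *d, bool raw)
{
    assert(d != NULL);
    if (raw == d->stable) {
        d->run = 0;
    } else if (++d->run >= DEBOUNCE_TICKS) {
        d->stable = raw;
        d->run = 0;
    }
    return d->stable;
}
```

The second comment is barely longer than the first, but it answers the question the code cannot: why the counter is reset on every agreeing sample.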

4.3.4. The Basic Questions—Revisited

The first objective of Internal Documentation is quickly orienting future maintenance programmers. Therefore, The Basic Questions should get answered first:

1. What is it?
2. What does it do?
3. Why does it exist (i.e. what problem[s] does it solve)?
4. How does it do it (overview level or via an easy-to-read diagram)?

In the context of source code, #4 often breaks down into a couple of questions, especially for orientation purposes:

4.1. What is its architecture? (How do its parts fit together?)

4.2. What are its data flows (when not already obvious)?

Once your reader is thus oriented, he will find that reading the source code makes a lot more sense. Thus, it is often the case that this orientation is all he will need. This assumes (of course) that the architecture and data flows are simple enough that the rest of what he will need to understand is now plainly visible in the code.

Sometimes, even a carefully-worded 1-liner can do it all, if the reader can be assumed to know certain things. Other times will take a small number of paragraphs. Don’t overlook the power of a picture or diagram to convey many related details at once.
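Here is a minimal sketch of a header comment that answers The Basic Questions in a few lines, including a one-line data-flow diagram. The module (linesplit), its names, its buffer size, and the stated reason for its existence are hypothetical and serve only as an illustration. The small counting callback at the end is included just so the module can be exercised.

```c
#include <assert.h>
#include <stddef.h>

/*
 * linesplit (hypothetical module)
 *
 * What is it?       A byte-stream-to-line adapter.
 * What does it do?  Collects bytes until '\n', then passes the
 *                   completed line to a callback.
 * Why?              Input arrives in arbitrary chunks, while the
 *                   consumer wants one whole line at a time.
 * How?
 *     linesplit_feed() --> buf[] --('\n' seen)--> on_line(buf, len)
 *
 * Subtlety: a line longer than LINESPLIT_MAX is truncated; the excess
 * bytes are dropped until the next '\n'.
 */
#define LINESPLIT_MAX 16

typedef void (*linesplit_cb)(const char *line, size_t len, void *user);

typedef struct {
    char         buf[LINESPLIT_MAX + 1];  /* +1 for the terminating '\0' */
    size_t       len;
    linesplit_cb on_line;
    void        *user;
} linesplit_t;

void linesplit_init(linesplit_t *ls, linesplit_cb on_line, void *user)
{
    assert(ls != NULL && on_line != NULL);
    ls->len = 0;
    ls->on_line = on_line;
    ls->user = user;
}

void linesplit_feed(linesplit_t *ls, const char *data, size_t n)
{
    assert(ls != NULL && data != NULL);
    for (size_t i = 0; i < n; i++) {
        if (data[i] == '\n') {
            ls->buf[ls->len] = '\0';
            ls->on_line(ls->buf, ls->len, ls->user);
            ls->len = 0;
        } else if (ls->len < LINESPLIT_MAX) {
            ls->buf[ls->len++] = data[i];
        }
        /* else: line too long, drop the byte (see Subtlety above) */
    }
}

/* Example callback: counts lines and remembers the last line's length. */
typedef struct { int lines; size_t last_len; } linesplit_stats_t;

void linesplit_count_cb(const char *line, size_t len, void *user)
{
    linesplit_stats_t *st = user;
    (void)line;
    st->lines++;
    st->last_len = len;
}
```

With the four answers and the diagram in view, the loop body in linesplit_feed() reads almost at a glance, which is the whole point of orienting the reader first.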

In complex cases, there may be specific details that he will need, and these should be included: what the original programmer was thinking, and any subtleties that are not immediately visible in the source code.[2]

Obviously, the more complex the module, the more details there will need to be: witness the many “white papers” that describe the details of things like compiler front-end design, language parsers, etc. While brevity is extremely helpful, as are pictures and diagrams to convey many related details at once, the order of priority when writing internal documentation is:

clarity (the ease with which other programmers can understand your intention),
readability (the ease with which other programmers can read your comments),
brevity (the quality of using few words when speaking or writing).

Bertrand Meyer in his book [Meyer1997] gives an excellent explanation of the importance of brevity in maintenance documentation, and how to achieve it, in Chapter 26: A sense of style. Section 26.4 HEADER COMMENTS AND INDEXING CLAUSES explains thoroughly how to achieve brevity in documentation comments, and the well-justified reasoning behind each point. This material is important to brevity in maintenance documentation and is well worth the time to study it thoroughly.

In this author’s own company, [Meyer1997] Chapter 26 is firm policy, for all the reasons stated above.