.. include:: /include/substitutions.txt
.. include:: /include/custom_roles.txt
.. _source-code_documentation:

*************************
Source-Code Documentation
*************************

Before we get into the discussion of source-code documentation, let us first define
a few terms which we will use along the way.

Making changes to source code after it was initially written is called
:iu:`maintenance`.  Whoever makes those changes we will call a :iu:`maintenance
programmer`.  He can be the original writer after, say, a year has gone by and he
has forgotten what he was thinking when he wrote it, or he can be someone else.  We
say that a body of source code is in the :iu:`maintenance` portion of its lifetime
(or :iu:`in maintenance`) after it has reached its first release and while it is
still being used to build updated versions of the software system it belongs to.

Let us also take, for the sake of this discussion, the "unit of software" that the
maintenance programmer is going to be making changes to, and call it a :iu:`module`
or :iu:`subsystem`.  In the C language, a module (subsystem) would typically be a
``.c`` file and its ``.h`` counterpart, together handling one sphere of
responsibility, typically containing every way one data type (or a group of data
types that work together) can be manipulated and/or used in a larger system. A
module's :iu:`API` would consist of the collection of data and functions that are
publicly available to users of that module.  Usually there is documentation that
goes with each element of that API, and we call the collection of that documentation
its :iu:`API documentation`:  the bare minimum users of that module will need to
use it correctly.



.. _the problem being solved:

The Problem Being Solved
************************

An alarming number of bugs in the software industry are introduced into
software *after* it was originally written, by maintenance programmers who make
changes to a module *before they attain full understanding of how it is supposed to
work internally*.

There can be many reasons for this, the most common of which is, "it takes too long"
(given the time available).  This can be because the means to achieve this level
of understanding

- does not exist,
- is inadequate,
- is not findable,
- is not accessible (e.g. hidden behind a paywall or private repository the
  programmer does not have access to), or
- is too long and wordy.

This is important:  to *safely* make changes to a module, the maintenance
programmer needs to reach a very specific *threshold of understanding* of how that
module was designed to work internally before he starts changing the code.  And
by "safely", I mean he understands the original author's design intentions thoroughly
enough that he *is not risking introducing new bugs due to lack of understanding*.
This threshold of understanding plays an important role, as you will see below.

For the sake of this discussion, we are going to call that threshold a :iu:`full
understanding` of that module's design.

Note carefully:  for complex modules, the API documentation is not enough to achieve
this understanding by itself.  In fact, the writer(s) of the API documentation
often intentionally *do not document* a module's internal workings or design, because
the vast majority of its end users will neither need nor want to know those details.
It is also often the case that maintenance programmers need or want to make changes
and/or improvements to its internal workings without changing the API or its
documentation.  My point:  API documentation (on one hand) and the documentation
required for a maintenance programmer to gain :iu:`full understanding` of a module
before making changes to it (on the other hand) are 2 completely different things,
with different purposes:

- API documentation is *public facing*, and
- the documentation required for a maintenance programmer to gain
  :iu:`full understanding` of a software module's design before making changes to it
  is *internal*---for maintenance programmers.

An interesting and important part of this problem is:  when a developer is new
(fresh out of University or wherever), he still lacks many VITAL understandings that
ONLY come with experience.  One of those understandings is what happens when you
become responsible for source code that :iu:`has no Internal Documentation with it`.
If it has a simple design, no big deal---you can work out how it was designed by
looking at the source code.

.. _maintenance nightmare:

But the more complex the design gets, the LONGER IT TAKES to adequately understand
it through the source code alone.  And this factor literally has no limit.
In the industry, very complex source code without internal documentation has been
given a name:

    a "maintenance nightmare".

This is a term used in the industry for exactly this problem:  you are responsible
for maintaining complex code, but you can't tell what the original author's design
intentions were.  What happens to that code?  Over time (months or years), it
literally gets thrown away and not used, especially if it has bugs in it.  Reason:
programmers cannot understand it fast enough for modifying (or fixing) it to be more
efficient than re-writing it from scratch.  I have seen very complex cases that can
take days or weeks of diligent, focused study to thoroughly understand.  Very few
programmers can afford (or are ALLOWED) that kind of time, and so bugs hide in code
like that, and new bugs are easily introduced into it by maintenance
programmers---all because they don't understand the original author's design
intentions.

A classic example is Microsoft Windows.  Somewhere between Windows 3.1 (1992) and
Windows XP (2001), someone introduced a bug into (what I am almost certain is) an
example of the above:  complex code that is missing the documentation about its
internal design intention.  It caused what is called a "memory leak":  after you
ran Windows aggressively for 3-5 hours (opening and closing application windows
over and over), it would crash with the error message "Out of resources",
simultaneously losing all unsaved work.  (It happened because a window handle was
allocated, but not returned to the "list of window handle resources" when the window
was closed.)  The less memory (RAM) you had, the faster this would happen.  But as of
Windows 8, literally 20 years later, THAT BUG STILL HAD NOT BEEN FOUND, and the only
thing that was saving them was that there was a lot more memory in most computers, so
it :iu:`took longer for it to crash`.  I will bet you $100 right now that that bug
STILL HAS NOT BEEN FOUND even in Windows 11---the latest version at this
writing---though I have no direct information on it since 2012.  The result in the
industry:  people are moving to Linux...  gradually; Windows is losing its market,
and there are VERY FEW Windows system administrators or programmers who are
not MAD AT MICROSOFT for various problems---that memory leak being one of them.



An Alternate View of the Problem
********************************

One of the areas in which I have been most valuable in my career is the development
of *very complex subsystems* (with very complex algorithms).  I learned early that
there are 2 schools of thought among developers who take on such development tasks:

1.  programmers who will dive in and start coding right away, and

2.  programmers who will take some more time in the design stage (which includes
    documenting the complex subsystem so that others can understand it, as though
    he is explaining it to someone who has never seen it before).

The documentation process allows the original author to not only think it through
better, but to find and eliminate bugs in the design *BEFORE* they make it into the
source code.

Group 1 will struggle, and 3 months later will have a buggy product that works
correctly only if they are lucky.  Worse yet, what they produce will often be a
":ref:`maintenance nightmare <maintenance nightmare>`" as described above, making
it a long-term liability for anyone who accepts responsibility for it.

Group 2 will spend a bit of extra time (hours, a day or even several days for
subsystems that are *very complex*) proving the design and finding and correcting
flaws in the design *BEFORE* starting to write code.  And group 2 will have a
bug-free (or nearly so), well-documented, highly-maintainable version of the software
in 3 weeks.

The more complex the subsystem, the more important this extra step is.  (And
vice-versa:  the simpler the system, the less this step is needed.)

This extra step pays off in two ways:

A.  the documentation allows others (including the original author, later) to properly
    understand the design of the complex subsystem, including *why certain design
    decisions were made*, and

B.  the code is written *directly from that documentation*, with validation
    testing at each stage to prove correct operation.

I learned the above both *THE EASY WAY* and *THE HARD WAY* several times.

.. _w-mai on value of internal documentation:

To quote a colleague of mine:

    Having well-written internal documentation is crucial for explaining architecture
    and design decisions, providing readers with a starting point to understand the
    system, and helping them navigate the code more effectively.

    [Documentation is] not a replacement for good code, but is an entry point
    [for other programmers, including oneself later, to understand the code].  It
    gives readers a coherent way to understand *why* the system looks the way it
    does, and then allows them to follow the code with much less friction.  When
    documentation and code are aligned this way, they reinforce each other rather
    than compete.

    ---W-Mai



The Solution
************

The solution to :ref:`this problem <the problem being solved>` is
:ref:`Internal Documentation <internal documentation>` (a.k.a. *maintenance
documentation* or *source-code documentation*).


.. _internal documentation:

What is Internal Documentation?
===============================

Internal Documentation consists of comments in the source code itself that get a
maintainer oriented as quickly as possible about how a module is designed to work
internally\ [1]_.

The reason this documentation is directly in the source code is that, according to
:ref:`meyer1997`, this is the location where it is most likely to stay up-to-date.
I can confirm from my own experience that this is true.

Donald |nbsp| Knuth (one of the fathers of Computer Science) calls this "Literate
|nbsp| Programming"\ [2]_.  To roughly quote him from one of his interviews:

    "When you are writing source code, you are not just teaching the computer what to
    do, but also teaching other programmers how it works, not only users of the API,
    but also future maintainers of your source code.  Comments add information about
    what the author was thinking when the code was written, and *why you did things
    that way*---subtleties about the design that cannot be conveyed by the source
    code alone."

When that job of orienting the maintenance programmer is done well, the contents of
the source code itself (including these comments) are adequate to efficiently orient
and educate the maintenance programmer to the extent that he can safely make changes
to it.
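
As a minimal sketch of what this can look like in C, consider the small byte queue
below.  The module, its names (``ring_t``, ``ring_push()``, ``ring_pop()``) and its
size are all hypothetical, invented only for this illustration.  The point is the
comment block at the top:  it records a design decision (one slot is left unused on
purpose) that a maintenance programmer could easily mistake for a bug if he had only
the code to go on.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /*
     * ring.c:  fixed-size byte queue (hypothetical example module).
     *
     * INTERNAL DOCUMENTATION (for maintainers, not for API users)
     *
     * Design intent:
     *   - `head` is the next slot to write; `tail` is the next slot to read.
     *   - The ring is EMPTY when head == tail.
     *   - One slot is deliberately left unused, so the ring is FULL when
     *     (head + 1) % RING_SLOTS == tail.  This distinguishes full from
     *     empty without a separate count field that would have to be kept
     *     in step with head and tail.
     *   - Consequence:  usable capacity is RING_SLOTS - 1, not RING_SLOTS.
     *     Do not "fix" this by filling the last slot.
     */
    #define RING_SLOTS 8u

    typedef struct {
        uint8_t data[RING_SLOTS];
        size_t  head;
        size_t  tail;
    } ring_t;

    void ring_init(ring_t *r)
    {
        r->head = 0;
        r->tail = 0;
    }

    /* Returns 1 on success, 0 if the ring is full. */
    int ring_push(ring_t *r, uint8_t byte)
    {
        size_t next = (r->head + 1u) % RING_SLOTS;

        if (next == r->tail) {
            return 0;               /* Full:  see "Design intent" above. */
        }
        r->data[r->head] = byte;
        r->head = next;
        return 1;
    }

    /* Returns 1 and stores the oldest byte in *out, or 0 if the ring is empty. */
    int ring_pop(ring_t *r, uint8_t *out)
    {
        if (r->head == r->tail) {
            return 0;               /* Empty. */
        }
        *out = r->data[r->tail];
        r->tail = (r->tail + 1u) % RING_SLOTS;
        return 1;
    }

Without the "Design intent" block, the fact that an 8-slot ring accepts only 7 bytes
looks like an off-by-one error, which is exactly the kind of "fix" that introduces a
new bug.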


Factors
=======

:ub:`Degree of Complexity:`

As described above in `An Alternate View of the Problem`_, the degree of complexity of
a subsystem plays a major role in determining to what extent Internal Documentation
is needed.

:ub:`Source Code Lifetime:`

In general, the longer the life of a particular body of source code, the more
maintenance programmers are going to need to make changes to it.  And each time they
do, they are going to need to attain *full understanding* of that module.  Thus, the
longer that lifetime, the more important it is for a maintenance programmer to be
able to attain that level of understanding *efficiently*.

On the reverse side of that coin, for source code that is only going to be sent into
the field once and never looked at after that, being able to efficiently attain full
understanding of that source code has little, if any, importance.

:ub:`Number of Maintenance Programmers:`

The more programmers who will maintain a module, the more times someone will need to
arrive at a full understanding of how that module is supposed to work.  Thus,
the ability for programmers to acquire that knowledge efficiently becomes more
important when many programmers will be involved.



Measurement
===========

The measure of how good Internal Documentation is for a source-code module involves
only 1 thing:

    - How long does it take for a maintenance programmer to attain full
      understanding of the designer's intentions for that module, i.e. how
      it is supposed to work internally?

This simple measure carries an implicit requirement:  his understanding must be both
accurate and complete.  Are all pertinent details available (that are not immediately
visible in the source code)?  While more complex software modules will necessarily
take longer to study to understand fully, the time involved should be very short
compared to the time it would take to gain a similar level of understanding by
studying the source code itself.  If it takes more than a few minutes of studying to
attain full understanding, it's probably too long, especially when it can likely be
covered in a small number of paragraphs in the Internal Documentation.

The length of time to attain that level of understanding should probably be measured
in seconds.  Reason:  every second counts.  1-2 minutes is pretty good for a complex
module.  30 seconds is even better.  5-10 seconds is better still.

Thus very long, wordy documentation is usually unwelcome unless the details are really
important.  Mostly what this documentation should try to do is ORIENT the programmer,
not explain what he can see in the source code.  Where it helps, refer to a function
(or subroutine, as Donald Knuth likes to call it) by name, but don't repeat what it
does.  (In other words, Internal Documentation should always be *at least* 1 level of
abstraction above the source code.)
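
To make "1 level of abstraction above the source code" concrete, here is a small
sketch in C.  The function ``backoff_delay_ms()`` and its two constants are
hypothetical, made up for this illustration only.  The comment block contrasts a
comment that merely repeats the code with one that orients the reader.

.. code-block:: c

    #include <stdint.h>

    #define BACKOFF_START_MS   250u
    #define BACKOFF_CAP_MS    8000u

    /*
     * A comment that only repeats the code (adds nothing):
     *
     *     "Start at 250.  Double it `attempt` times.  Stop at 8000."
     *
     * A comment 1 level of abstraction above the code (orients the reader):
     *
     *     Exponential backoff between reconnect attempts.  Doubling keeps a
     *     struggling server from being hammered; the cap bounds how long the
     *     caller ever waits between retries.  The cap is checked BEFORE each
     *     doubling so the delay cannot overflow, however large `attempt` is.
     */
    uint32_t backoff_delay_ms(uint32_t attempt)
    {
        uint32_t delay = BACKOFF_START_MS;

        while (attempt-- > 0u) {
            if (delay >= BACKOFF_CAP_MS / 2u) {
                return BACKOFF_CAP_MS;
            }
            delay *= 2u;
        }
        return delay;
    }

The second comment is barely longer than the first, yet it answers the questions the
code cannot:  why the delay doubles, why there is a cap, and why the check comes
before the multiplication.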

Details well covered elsewhere (such as well-known Computer Science topics) can be
relied on and referred to as needed, but should not be part of this documentation,
since they only clutter it and slow down programmers who don't need them.  (They can
always look them up elsewhere when they *do need them*.)

Brevity is important.  Not repeating what's already in the code is important.


The Basic Questions---Revisited
===============================

The first objective of Internal Documentation is quickly orienting future maintenance
programmers.  Therefore, :ref:`the basic questions` should get answered first:

.. include:: /include/basic_questions.txt

In the context of source code, #4 often breaks down to a couple of questions,
especially for :u:`orientation` purposes:

    | 4.1.  What is its architecture?  (How do its parts fit together?)
    | 4.2.  What are its data flows (when not already obvious)?

Once your reader is thus oriented, he will find that the source code *now makes a
lot more sense*.  Thus, it is often the case that this orientation is all he will
need.  This assumes (of course) that the architecture and data flows are simple
enough that the rest of what he will need to understand is now plainly visible in
the code.
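
As one possible illustration of answering 4.1 and 4.2 in a header comment, here is a
sketch of a small C module.  Everything in it is hypothetical (the module
``overtemp.c``, its function names, and the threshold values are assumptions made
for this example).  Notice that the comment describes how the parts fit together and
what flows between them, and leaves the line-by-line details to the code.

.. code-block:: c

    #include <stdint.h>

    /*
     * overtemp.c:  over-temperature alarm (hypothetical example module).
     *
     * Architecture:  two stages behind one entry point, overtemp_feed().
     *
     *     raw sample --> [smooth] --> [latch] --> alarm flag
     *
     * Data flow:
     *   - smooth:  exponential moving average with weight 1/4, kept as an
     *     integer scaled by 4 (avg_x4) so no floating point is needed.
     *   - latch:   alarm turns ON above ALARM_ON_C and turns OFF only below
     *     ALARM_OFF_C.  The gap (hysteresis) is deliberate:  it stops the
     *     alarm from chattering when the temperature hovers at the limit.
     */
    #define ALARM_ON_C   80
    #define ALARM_OFF_C  75

    typedef struct {
        int32_t avg_x4;   /* smoothed temperature, degrees C times 4 */
        int     alarm;    /* 1 while the alarm is latched on */
    } overtemp_t;

    void overtemp_init(overtemp_t *m, int32_t start_c)
    {
        m->avg_x4 = start_c * 4;
        m->alarm  = 0;
    }

    /* Stage 1:  avg += (sample - avg) / 4, done in x4 fixed point. */
    static void smooth(overtemp_t *m, int32_t sample_c)
    {
        m->avg_x4 += sample_c - m->avg_x4 / 4;
    }

    /* Stage 2:  hysteresis latch on the smoothed value. */
    static void latch(overtemp_t *m)
    {
        int32_t avg_c = m->avg_x4 / 4;

        if (avg_c > ALARM_ON_C) {
            m->alarm = 1;
        } else if (avg_c < ALARM_OFF_C) {
            m->alarm = 0;
        }
    }

    /* Entry point:  feed one raw sample, get the current alarm state. */
    int overtemp_feed(overtemp_t *m, int32_t sample_c)
    {
        smooth(m, sample_c);
        latch(m);
        return m->alarm;
    }

A reader who has taken in the header comment already knows why there are two
thresholds and why the average is stored multiplied by 4, so the three short
functions need almost no further explanation.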

Sometimes, even a carefully-worded 1-liner can do it all, if the reader can be assumed
to know certain things.  Other times it will take a small number of paragraphs.  Don't
overlook the power of a picture or diagram to convey many related details at once.

In complex cases, there may be specific details that he will need, and these should be
included:  what the original programmer was thinking, and any subtleties that are
not immediately visible in the source code.\ [2]_

Obviously, the more complex the module, the more details there will need to be:
witness the many "white papers" there are that describe the details of things like
compiler front-end design, language parsers, etc.  While brevity is extremely
helpful, as are pictures and diagrams to convey many related details at once, the
priority when writing internal documentation is:

1.  clarity (the ease with which other programmers can understand your intention),
2.  readability (the ease with which other programmers can read your comments),
3.  brevity (the quality of using few words when speaking or writing).

In his book :ref:`meyer1997`, Bertrand Meyer gives an excellent explanation of the
importance of brevity in maintenance documentation, and how to achieve it, in
:ref:`A Sense of Style`.  Section 26.4 :ref:`HEADER COMMENTS AND INDEXING CLAUSES`
explains thoroughly how to achieve brevity in documentation comments, and the
well-justified reasoning behind each point.  This material is important to brevity in
maintenance documentation and is well worth the time to study thoroughly.

In this author's own company, :ref:`meyer1997` Chapter 26 is firm policy, for all the
reasons stated above.

.. [1]  In some cases programmers with higher levels of responsibility for code
        results, and/or end-user safety, may also need this level of understanding
        of externally-created libraries used in software they are responsible for.

.. [2]  See :ref:`Knuth on Literate Programming`.



Additional Reading
******************

- :ref:`ddd articles`

- :ref:`articles on consequences of poor code documentation`