6. Bertrand Meyer on Terseness in Code Documentation
Bertrand Meyer, retired Chair of Software Engineering at ETH Zürich [2001-2016], in 1997 wrote a famous book which has become “the bible” of Object-Oriented software engineering: [Meyer1997]. Many parts of this master-work also apply to Software Engineering in general — not just O-O Software Engineering.
In this book, he dedicates a whole chapter (26) to coding style, and within that chapter, he makes some really outstanding arguments in favor of terseness in code documentation, especially in comments that become part of the source code’s documentation, including Internal Documentation.
Below, I quote from this chapter for the sake of discussion and so that other parts of this document can refer to it. These quotes therefore, by the “Fair Use Doctrine”, do not constitute a copyright violation.
[Meyer1997] is an amazing and valuable masterpiece and is worth far more than its retail price. It therefore continues to be a highly recommended purchase, and well worth close study and practice: all 1202 pages. And no, I’m not kidding.
6.1. Chapter 26: A sense of style
“Implementing the object-oriented method requires paying attention to many details of style, which a less ambitious approach might consider trifles.
6.1.1. COSMETICS MATTERS!
“Although the rules appearing hereafter are not as fundamental as the principles of object-oriented software construction covered in earlier chapters, it would be foolish to dismiss them as just ‘cosmetics’. Good software is good in the large and in the small, in its high-level architecture and in its low-level details. True, quality in the details does not guarantee quality of the whole; but sloppiness in the details usually indicates that something more serious is wrong too. (If you cannot get the cosmetics right, why should your customers believe that you can master the truly difficult aspects?) A serious engineering process requires doing everything right: the grandiose and the mundane.
“So you should not neglect the relevance of such seemingly humble details as text layout and choice of names. True, it may seem surprising to move on, without lowering our level of attention, from the mathematical notion of sufficient completeness in formal specifications (in the chapter on abstract data types) to whether a semicolon should be preceded by a space (in the present chapter). The explanation is simply that both issues deserve our care, in the same way that when you write quality O-O software both the design and the realization will require your attention.
“We can take a cue from the notion of style in its literary sense. Although the first determinant of good writing is the author’s basic ability to tell a story and devise a coherent structure, no text is successful until everything works: every paragraph, every sentence and every word.
6.1.1.1. Applying the rules in practice
“Some of the rules of this chapter can be checked or, better yet, enforced from the start by software tools. Tools will not do everything, however, and there is no substitute for care in writing every piece of the software.
“There is often a temptation to postpone the application of the rules, writing things casually at first and thinking ‘I will clean up everything later on; I do not even know how much of this will eventually be discarded’. This is not the recommended way. Once you get used to the rules, they do not add any significant delay to the initial writing of the software; even without special tools, it is always more costly to fix the text later than to write it properly from the start. And given the pressure on software developers, there is ever a risk that you will forget or not find the time to clean things up. Then someone who is asked later to take up your work will waste more time than it would have cost you to write the proper header comments, devise the right feature names, apply the proper layout. That someone may be you.
6.1.1.2. Terseness and explicitness
“Software styles have oscillated between the terse and the verbose. In programming languages, the two extremes are perhaps APL and Cobol. The contrast between the Fortran-C-C++ line and the Algol-Pascal-Ada tradition — not just the languages themselves, but the styles they have bred — is almost as stark.
“What matters for us is clarity and, more generally, quality. Extreme forms of
terseness and verbosity can both work against these goals. Cryptic C programs are
unfortunately not limited to the famous ‘Obfuscated C’ and ‘Obfuscated C++’ contests;
but the almost equally famous DIVIDE DAYS BY 7 GIVING WEEKS of Cobol is a waste
of everyone’s attention.
“The style that follows from this chapter’s rules is a particular mix of Algol-like
explicitness (although not, it is hoped, verbosity) and telegram-style terseness. It
never begrudges keystrokes, even lines, when they truly help make the software
readable; for example, you will find rules that enjoin using clear identifiers based
on full words, not abbreviations, as it is foolish to save a few letters by calling a
feature disp (ambiguous) rather than display (clear and precise), or a class
ACCNT (unpronounceable) rather than ACCOUNT. There is no tax on keystrokes.
But at the same time when it comes to eliminating waste and unneeded redundancies the
rules below are as pitiless as the recommendations of a General Accounting Office
Special Commission on Improving Government. They limit header comments to
indispensable words, getting rid of all the non-essential ‘the’ and other such
amenities; they proscribe over-qualification of feature names (as in
account_balance in a class ACCOUNT, where balance is perfectly
sufficient); against dominant mores, they permit the grouping of related components
of a complex construct on a single line, as in from i := 1 invariant i <= n until i
= n loop; and so on.
“This combination of terseness and explicitness is what you should seek in your own texts. Do not waste space, as exaggerated size will in the end mean exaggerated complexity; but do not hesitate to use space when it is necessary to enhance clarity.
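The rule against over-qualified feature names is one of those that a tool could plausibly check. Below is a minimal Python sketch of such a check; it is my own illustration, not something taken from [Meyer1997]. The function name `overqualified` and the sample feature list are assumptions made up for this example.

```python
def overqualified(class_name: str, feature_names: list[str]) -> list[str]:
    """Feature names that needlessly repeat the enclosing class name."""
    prefix = class_name.lower() + "_"
    return [name for name in feature_names if name.lower().startswith(prefix)]


# Hypothetical feature list for a class ACCOUNT (invented for illustration).
features = ["account_balance", "balance", "deposit", "account_owner"]
flagged = overqualified("ACCOUNT", features)
print(flagged)
```

A real checker would need to parse class texts. This sketch only shows that the rule is mechanical enough to automate.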
...
“... You should also be terse in expressing algorithms, but never skimp on keystrokes at the expense of clarity.
...
6.1.2. HEADER COMMENTS AND INDEXING CLAUSES
“Although the formal elements of a class text should give as much as possible of the information about a class, they must be accompanied by informal explanations. Header comments of routines and feature clauses answer this need, together with the indexing clause of each class.
6.1.2.1. Routine header comments: an exercise in corporate downsizing
“Like those New York street signs that read ‘Don’t even think of parking here!’, the
sign at the entrance of your software department should warn ‘Don’t even think of
writing a routine without a header comment’. The header comment, coming just after
the is for a routine, expresses its purpose concisely; it will be kept by the
short and flat-short forms:
distance_to_origin: REAL is
        -- Distance to point (0, 0)
    local
        origin: POINT
    do
        !! origin
        Result := distance (origin)
    end
“Note the indentation: one step further than the start of the routine body, so that the comment stands out.
“Header comments should be informative, clear, and terse. They have a whole style of
their own, which we can learn by looking at an initially imperfect example and
improve it step by step. In a class CIRCLE we might start with
tangent_from (p: POINT): LINE is
        -- Return the tangent line to the current circle
        -- going through the point p, if the point is
        -- outside of the current circle.
    require
        outside_circle: not has (p)
    ...
“There are many things wrong here. First, the comment for a query, as here, should not start with ‘Return the...’ or ‘Compute the...’, or in general use a verbal form; this would go against the Command-Query Separation principle. Simply name what the query returns, typically using a qualified noun for a non-boolean query (we will see below what to use for a boolean query and a command). Here we get:
-- The tangent line to the current circle going through
-- the point p, if the point p is outside of the current
-- circle
“Since the comment is not a sentence but simply a qualified noun, the final period disappears. Next we can get rid of the auxiliary words, especially ‘the’, where they are not required for understandability. Telegram-like style is desirable for comments. (Remember that readers in search of literary frills can always choose Proust novels instead.)
-- Tangent line to current circle from point p,
-- if point p is outside current circle
“The next mistake is to have included, in the second line, the condition for the
routine’s applicability; the precondition, not has (p), which will be retained in
the short form where it appears just after the header comment, expresses this
condition clearly and unambiguously. There is no need to paraphrase it: this could
lead to confusion, if the informal phrasing seems to contradict the formal
precondition, or even to errors (a common oversight is a precondition of the form
x >= 0 with a comment stating ‘applicable only to positive x’, rather
than ‘non-negative’); and there is always a risk that during the software’s
evolution the precondition will be updated but not the comment. Our example becomes:
-- Tangent line to current circle from point p
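The “positive” versus “non-negative” oversight mentioned in the quotation can be made concrete. The following Python sketch is my own illustration: the function `square_root` is hypothetical, and the precondition is emulated with an explicit check because Python has no built-in `require` clause.

```python
import math


def square_root(x: float) -> float:
    # Informal paraphrase, subtly wrong: "applicable only to positive x"
    if x < 0:  # emulated formal precondition: x >= 0
        raise ValueError("precondition violated: x >= 0")
    return math.sqrt(x)


# Zero satisfies the formal precondition although it is not "positive".
zero_result = square_root(0.0)
print(zero_result)
```

A reader who trusts the comment would wrongly conclude that zero is rejected. Only the formal condition is authoritative, which is the argument for not paraphrasing it.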
“Yet another mistake is to have used the words line to refer to the result and
point to refer to the argument: this information is immediately obvious from the
declared types, LINE and POINT. With a typed notation we can rely on the
formal type declarations — which again will appear in the short form — to express
such properties; repeating them in the informal text brings nothing. So:
-- Tangent to current circle from p
“The mistakes of repeating type information and of duplicating the precondition’s
requirements point to the same general rule: in writing header comments, assume the
reader is competent in the fundamentals of the technology; do not include
information that is obvious from the immediately adjacent short form text. This does
not mean, of course, that you should never specify a type; the earlier example,
-- Distance to point (0, 0), could be ambiguous without the word point.
“When you need to refer to the current object represented by a class, use phrasing such
as current circle, current number and so on as above, rather than referring
explicitly to the entity Current [akin to this in C++]. In many cases,
however, you can avoid mentioning the current object altogether, since it is clear to
everyone who can read a class text that features apply to the current object. Here,
for example, we just need
-- Tangent from p
“At this stage — three words, starting from twenty-two, an 87% reduction that would make the toughest Wall Street exponent of corporate downsizing jealous — it seems hard to get terser and we can leave our comment alone.
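The step-by-step shrinkage can be tallied with a few lines of Python. This is only a sketch of the arithmetic, assuming that a “word” is any whitespace-separated token; the list `stages` simply restates the successive versions of the comment quoted above.

```python
stages = [
    "Return the tangent line to the current circle going through the point p, "
    "if the point is outside of the current circle.",
    "The tangent line to the current circle going through the point p, "
    "if the point p is outside of the current circle",
    "Tangent line to current circle from point p, if point p is outside current circle",
    "Tangent line to current circle from point p",
    "Tangent to current circle from p",
    "Tangent from p",
]

# Word count of each version, and the overall reduction from first to last.
counts = [len(stage.split()) for stage in stages]
reduction = 100 * (counts[0] - counts[-1]) / counts[0]
print(counts, f"{reduction:.1f}%")
```

Under this counting the versions have 22, 22, 15, 8, 6 and 3 words, and the overall reduction comes to roughly 86%, close to the 87% figure given in the quotation.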
“A few more general guidelines. We have noted the uselessness of ‘Return the ...’
in queries; other noise words and phrases to be avoided in routines of all kinds
include ‘This routine computes...’, ‘This routine returns...’; just say what
the routine does, not that it does it. Instead of
-- This routine records the last outgoing call.
write
-- Record outgoing call.
“As illustrated by this example, header comments for commands (procedures) should be in the imperative or infinitive (the same in English), in the style of marching orders. They should end with a period. For boolean-valued queries, the comment should always be in the form of a question, terminated by a question mark:
has (v: G): BOOLEAN is
        -- Does v appear in list?
    ...
“A convention governs the use of software entities — attributes, arguments — appearing in comments. In typeset texts such as the above they will appear in italics (more on font conventions below); in the source text they should always appear between an opening quote (‘backquote’) and a closing quote; the original text for the example is then:
-- Does `v' appear in list?
“Tools such as the short class abstracter will recognize this convention when
generating typeset output. Note that the two quotes should be different: `v',
not 'v'.
“Be consistent. If a function of a class has the comment Length of string, a
routine of the same class should not say Update width of string if it affects the
same property.
“All these guidelines apply to routines. Because an exported attribute [akin to a
field of an exported struct] should be externally indistinguishable from
argumentless functions — remember the Uniform Access principle — it should also
have a comment, which will appear on the line following the attribute’s declaration,
with the same indentation as for functions:
count: INTEGER
        -- Number of students in course
“For secret attributes a comment is desirable too but the rule is less strict.”
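Several of the header-comment rules quoted in this subsection are mechanical enough that a tool could check them. Here is a minimal Python sketch of such a checker; it is my own illustration and not a tool described in [Meyer1997]. The function name `check_header_comment` and the kind labels "query", "boolean_query" and "command" are assumptions invented for this example, and the checks only approximate the rules (a script cannot really judge whether a comment is in the imperative).

```python
NOISE_PREFIXES = ("return the", "compute the", "this routine")


def check_header_comment(kind: str, comment: str) -> list[str]:
    """Rule violations found in one header comment (empty if none).

    kind is "query", "boolean_query" or "command" (labels invented here).
    """
    problems = []
    text = comment.strip()
    if text.lower().startswith(NOISE_PREFIXES):
        problems.append("starts with a noise phrase")
    if kind == "query" and text.endswith("."):
        problems.append("query comment is a qualified noun: no final period")
    if kind == "boolean_query" and not text.endswith("?"):
        problems.append("boolean query comment should be a question")
    if kind == "command" and not text.endswith("."):
        problems.append("command comment should end with a period")
    return problems


print(check_header_comment("query", "Return the tangent line to the current circle."))
print(check_header_comment("query", "Tangent from p"))
```

The first call reports two problems (a noise phrase and a final period). The second reports none.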
...
6.1.2.2. Indexing clauses [module orientation for programmers]
Similar to header comments but slightly more formal are indexing clauses, appearing at the beginning of a class:
indexing
    description: "Sequential lists, in chained representation"
    names: "Sequence", "List"
    contents: GENERIC
    representation: chained
    ...
class LINKED_LIST [G] inherit
    ...
Indexing clauses proceed from the same Self-Documentation principle that has led to built-in assertions and header comments: include as much as possible of the documentation in the software itself. For properties that do not directly appear in the formal text, you may include indexing entries, all of the form
indexing_term: indexing_value, indexing_value, ...
where the indexing_term is an identifier and each indexing_value is some basic
element such as a string, an integer and so on. Entries can indicate alternative
names under which potential client authors might search for the class
(names), contents type (contents), implementation choices
(representation), revision control information, author information, and anything
else that may facilitate understanding the class and retrieving it through
keyword-based search tools — tools that support reuse and enable software developers
to find their way through a potentially rich set of reusable components.
Both the indexing terms and the indexing values are free-form, but the possible
choices should be standardized for each project. A set of standard choices has been
used throughout the Base libraries; the above example illustrates six of the most
common entry kinds. Every class must have a description entry, introducing as
index_value a string describing the role of the class, always expressed in terms
of the instances (as Sequential lists..., not ‘this class describes sequential
lists’, or ‘sequential list’, or ‘the notion of sequential list’ etc.). Most
significant class texts in this book — but not short examples illustrating a specific
point — include the description entry.
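To see how indexing entries could support keyword-based retrieval and the "every class must have a description" rule, here is a small Python sketch. It is my own illustration under stated assumptions: the `catalogue` dictionary, the class `SCRATCH`, and the functions `missing_description` and `search` are all invented for this example and do not correspond to any actual Eiffel tool.

```python
# Hypothetical catalogue: class name -> indexing entries (term -> list of values).
catalogue = {
    "LINKED_LIST": {
        "description": ["Sequential lists, in chained representation"],
        "names": ["Sequence", "List"],
        "contents": ["GENERIC"],
        "representation": ["chained"],
    },
    "SCRATCH": {"names": ["Temp"]},  # invented class lacking a description entry
}


def missing_description(cat: dict) -> list[str]:
    """Classes that lack the mandatory description entry."""
    return sorted(name for name, entries in cat.items() if "description" not in entries)


def search(cat: dict, keyword: str) -> list[str]:
    """Classes with some indexing value containing keyword (case-insensitive)."""
    key = keyword.lower()
    return sorted(
        name
        for name, entries in cat.items()
        if any(key in value.lower() for values in entries.values() for value in values)
    )


print(missing_description(catalogue))
print(search(catalogue, "sequence"))
```

Standardizing the indexing terms per project, as the text recommends, is what would make this kind of lookup dependable across a large library.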