12. Parsing
Sphinx uses docutils parsers to handle the normal reST input text. In object-oriented terms, the parser runs polymorphic state machines, switching between state objects instantiated from the appropriate State subclass based on the portion of the document being processed.
Inside the methods of these state machines, self refers to the parser state
object, and that object normally has an attribute document: the root node of the
abstract syntax tree of docutils node objects being built from the content of
the .rst file being parsed. In fact, most objects involved have an attribute
document.
Also, other types of objects (e.g. domain objects) that don’t have a document
attribute come with an env (environment) attribute, which itself has an attribute
called current_document which is a reference to the same doctree (tree of
docutils node objects).
How does it know where to insert nodes? The parsing goes from top to bottom, so
new nodes are always added “at the end”, with a node.append() call or the +=
operator. These methods are defined by docutils' Element class (the ancestor of
every node that can have children). append() adds a single node, and += accepts
either a node or a list of nodes. Either way, the nodes end up in the element's
children attribute, which is a plain Python list.
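Here is a minimal pure-Python sketch of these append semantics. MiniElement is a hypothetical stand-in written for illustration, not docutils' real Element class. It only mimics the behavior described here: append() adds one child and records the parent back-link, extend() adds several, and += accepts a single node, a list of nodes, or None (which is ignored).

```python
from __future__ import annotations

from typing import Iterable, Optional, Union


class MiniElement:
    """Hypothetical, simplified model of a docutils element with children."""

    def __init__(self, tagname: str, text: str = "") -> None:
        self.tagname = tagname
        self.text = text
        self.parent: Optional[MiniElement] = None
        self.children: list[MiniElement] = []  # a plain Python list

    def append(self, item: MiniElement) -> None:
        # Record the back-link, then add the child at the end ("top to bottom").
        item.parent = self
        self.children.append(item)

    def extend(self, items: Iterable[MiniElement]) -> None:
        for item in items:
            self.append(item)

    def __iadd__(self, other: Union[MiniElement, Iterable[MiniElement], None]) -> MiniElement:
        # A single node is appended, a list is extended, None is ignored.
        if isinstance(other, MiniElement):
            self.append(other)
        elif other is not None:
            self.extend(other)
        return self


section = MiniElement("section")
section += MiniElement("paragraph", "first")
section += [MiniElement("paragraph", "second"), MiniElement("system_message")]
section += None  # e.g. no error message was produced
print([child.tagname for child in section.children])
```

Ignoring None is what allows the text() method shown below to write self.parent += msg unconditionally, even when no error message was created.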
Under the debugger, how can you tell where it is parsing? The document attribute
has its own attribute current_line which has the line number, and when parsing
is taking place, sometimes there is an argument context which provides the text
from that line.
check_line() is a commonly used state-machine method. It matches the current
line's contents against the active state's transition patterns and calls the
method of the first transition that matches.
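The following heavily simplified toy model makes the state-machine dispatch concrete. ToyState, ToyBody, ToyText, and run() are hypothetical names, not docutils classes. Each state lists ordered (transition name, regex) pairs, and a check_line()-style method calls the handler named after the first pattern that matches. Each handler returns three things: the context for the next line, the name of the next state, and any result. Real docutils states also receive a next_state argument and read whole text blocks at once, which this sketch omits.

```python
import re


class ToyState:
    """Hypothetical base state: an ordered list of (name, regex) transitions."""

    transitions = []

    def check_line(self, context, line):
        # Try each pattern in order; the first match wins, and the method
        # with the same name as the transition handles the line.
        for name, pattern in self.transitions:
            match = re.match(pattern, line)
            if match:
                return getattr(self, name)(match, context)
        raise ValueError(f"no transition matched {line!r}")


class ToyBody(ToyState):
    transitions = [("blank", r"\s*$"), ("bullet", r"[-*+] "), ("text", r"")]

    def blank(self, match, context):
        return [], "ToyBody", None

    def bullet(self, match, context):
        return [], "ToyBody", ("bullet", match.string[match.end():])

    def text(self, match, context):
        # Possible first line of a paragraph: hand it on as context.
        return [match.string], "ToyText", None


class ToyText(ToyState):
    transitions = [("blank", r"\s*$"), ("text", r"")]

    def blank(self, match, context):
        # A blank line ends the paragraph accumulated in context.
        return [], "ToyBody", ("paragraph", " ".join(context))

    def text(self, match, context):
        # Another paragraph line: keep accumulating.
        return context + [match.string], "ToyText", None


def run(lines):
    """Drive the toy machine; returns the recognized (kind, text) items."""
    states = {"ToyBody": ToyBody(), "ToyText": ToyText()}
    state_name, context, results = "ToyBody", [], []
    for line in lines:
        context, state_name, result = states[state_name].check_line(context, line)
        if result is not None:
            results.append(result)
    if state_name == "ToyText" and context:  # input ended mid-paragraph
        results.append(("paragraph", " ".join(context)))
    return results


print(run(["- item", "", "Hello", "world", "", "Bye"]))
```

The same line ("Hello") means different things depending on which state object receives it. That is the polymorphism described at the start of this section.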
How does it know which node to append to? The state object self has an attribute
parent holding the node whose children list is being appended to.
Have a look at this example from the text() method of the Text state class (a
subclass of RSTState), which parses an identified paragraph. (Both classes are
defined in docutils\parsers\rst\states.py.)
def text(self, match, context, next_state):
    """Paragraph."""
    startline = self.state_machine.abs_line_number() - 1
    msg = None
    try:
        block = self.state_machine.get_text_block(flush_left=True)
    except statemachine.UnexpectedIndentationError as err:
        block, src, srcline = err.args
        msg = self.reporter.error('Unexpected indentation.',
                                  source=src, line=srcline)
    lines = context + list(block)
    paragraph, literalnext = self.paragraph(lines, startline)
    self.parent += paragraph
    self.parent += msg
    if literalnext:
        try:
            self.state_machine.next_line()
        except EOFError:
            pass
        self.parent += self.literal_block()
    return [], next_state, []
The context argument is a list holding the first line of the paragraph, which an
earlier state (Body) matched and passed along.
self.state_machine.abs_line_number() returns the one-based line number of the
current line, which at this point is the paragraph's second line. Thus
startline = self.state_machine.abs_line_number() - 1
sets startline to the one-based document line number of the first line of the
paragraph. Then
block = self.state_machine.get_text_block(flush_left=True)
slurps up the rest of the paragraph, beginning at the current line, into block,
a StringList object. Its attributes include the following:
data is the list of text lines.
items holds one (source, offset) tuple per line. items[0][0] is the path of the
source document. items[0][1] is the zero-based offset of the block's first line
within that document. In a plain single-file document this works out to the same
number as startline, because the block starts on the paragraph's second line.
parent is the StringList that block was sliced from, not a document node.
The block ends at the first blank line or at the end of the input, whichever
comes first. Because flush_left=True, an indented line found before then raises
UnexpectedIndentationError. Its arguments carry the partial block plus the source
and line number used for the “Unexpected indentation.” error.
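The line arithmetic and the block-reading step can be sketched on a plain list of strings. The get_text_block() function and UnexpectedIndentation exception below are hypothetical and only imitate the behavior described here: stop at the first blank line, and with flush_left=True raise on an indented line. The real StateMachine and StringList classes also track sources and offsets.

```python
class UnexpectedIndentation(Exception):
    """Hypothetical stand-in for statemachine.UnexpectedIndentationError."""

    def __init__(self, block, lineno):
        super().__init__(block, lineno)
        self.block = block    # the lines read before the indented line
        self.lineno = lineno  # one-based number of the indented line


def get_text_block(lines, start, flush_left=False):
    """Return lines[start:end], stopping at the first blank line or the end.

    With flush_left=True, an indented line ends the block early by raising
    UnexpectedIndentation with the partial block and that line's number.
    """
    end = start
    while end < len(lines):
        line = lines[end]
        if not line.strip():
            break
        if flush_left and line[0] == " ":
            raise UnexpectedIndentation(lines[start:end], end + 1)
        end += 1
    return lines[start:end]


source = [
    "Title",             # line 1
    "=====",             # line 2
    "",                  # line 3
    "First line of",     # line 4: already consumed, arrives as `context`
    "a paragraph that",  # line 5: the current line when text() runs
    "ends here.",        # line 6
    "",                  # line 7: blank line ends the block
    "Next paragraph.",   # line 8
]

current = 4                      # zero-based index of the current line
abs_line_number = current + 1    # one-based number of the current line: 5
startline = abs_line_number - 1  # one-based number of the first line: 4
context = [source[current - 1]]
block = get_text_block(source, current, flush_left=True)
print(startline, context + block)

try:
    get_text_block(["Some text", "  indented"], 0, flush_left=True)
except UnexpectedIndentation as err:
    print(err.block, err.lineno)  # ['Some text'] 2
```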
Then lines, a list made of context followed by the lines of block (that is, all
lines of the paragraph), is passed to
paragraph, literalnext = self.paragraph(lines, startline)
to be converted into paragraph, a list of nodes, and literalnext, a flag (1 or 0)
that is set when the paragraph ends with an unescaped “::”, meaning that a
literal block follows.
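Here is a self-contained sketch of how the literalnext flag and the trailing “::” can be handled. The regex and the three cases follow the reStructuredText rules for literal-block markers: “Text::” keeps one colon, “Text ::” drops both, and a paragraph that is only “::” disappears. The function name split_literal_marker is hypothetical. The real paragraph() method additionally runs inline parsing on the resulting text and builds the nodes.

```python
import re

# "::" at the very end, unless escaped by a backslash. An even run of
# backslashes before it does not count, since those escape each other.
LITERAL_MARKER = re.compile(r"(?<!\\)(\\\\)*::$")


def split_literal_marker(lines):
    """Return (paragraph_text, literalnext) for a list of paragraph lines."""
    data = "\n".join(lines).rstrip()
    if not LITERAL_MARKER.search(data):
        return data, False
    if len(data) == 2:         # the paragraph is just "::"
        return "", True
    if data[-3] in " \n":      # whitespace before "::": drop it entirely
        return data[:-3].rstrip(), True
    return data[:-1], True     # "Text::" keeps a single colon


print(split_literal_marker(["For example::"]))
print(split_literal_marker(["For example ::"]))
print(split_literal_marker(["Plain text."]))
```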
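Note that paragraph is not a flat list of text nodes. It is normally a list whose
first item is a paragraph node, whose children are the Text and inline element
nodes produced by inline parsing. Any system_message nodes reported while parsing
the inline markup follow it.
Those nodes are then appended to the current parent node (self.parent, for
example a section) in these lines:
self.parent += paragraph
self.parent += msg
msg is either None or a single system_message node holding the human-readable
“Unexpected indentation.” error. Element.__iadd__() behaves in three ways: it
appends a single node, extends the children with a list of nodes, and ignores
None. Both lines therefore work whether or not an error occurred.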
12.1. Definition Lists
Sphinx outputs HTML Definition Lists for a number of different types of document structures. It is important to understand what to expect so that custom formatting of these structures is more straightforward.
The different types of document structures are:
Plain Definition Lists (no class)
field-list (reST Field Lists)
glossary (Sphinx Glossaries)
option-list (reST Option Lists)
footnote (reST Footnotes, prior to Sphinx v8.0; after v8.0, Sphinx places footnotes into a list of
<aside> elements inside a single <aside> element)
Any of the above can also have the class “simple” added to its list of classes
when none of its <dd> elements contain nested formatting such as sub-lists,
tables, etc.
Definition lists look like this in HTML:
<dl>
<dt>term1</dt>
<dd>words of definition 1</dd>
<dt>term2</dt>
<dd>words of definition 2</dd>
...
</dl>
Because all <dd> elements are “simple”, Sphinx would add class “simple” to the
<dl> element like this: <dl class="simple">.
The default browser formatting for the above looks like this:
term1
    words of definition 1
term2
    words of definition 2
...
12.1.1. Furo Theme
The Furo theme may or may not be alone in this, but I can say that it takes the above and applies CSS formatting to it as follows:
All <dl> elements that have a class attribute (dl[class]) but DO NOT have any of
the following classes are formatted first:
option-list
field-list
footnote
glossary
simple
You can find these rules in Furo's stylesheet by searching for the regex \bdd.
They look like this (but without the comments):
/* All `dd` descendants (at any level) */
#furo-main-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) dd {
margin-left:2rem
}
/* First direct-child elements of `dd` descendants (at any level) */
dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) dd>:first-child {
margin-top:.125rem
}
/* Last direct-child elements of `dd` descendants (at any level),
* and also descendant "field-list" classes (Sphinx only applies "field-list" class to <dl> elements). */
dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) .field-list,
dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) dd>:last-child {
margin-bottom:.75rem
}
/* Direct-child <dt> elements of descendant elements with class "field-list". */
dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) .field-list>dt {
font-size:var(--font-size--small);
text-transform:uppercase
}
/* Empty <dd> elements inside descendant elements with class "field-list". */
dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) .field-list dd:empty {
margin-bottom:.5rem
}
/* <ul> elements that are direct children of <dd> elements inside descendant "field-list" elements. */
dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) .field-list dd>ul {
margin-left:-1.2rem
}
/* <p> elements that are the second child of an <li> in those <ul> elements. */
dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) .field-list dd>ul>li>p:nth-child(2) {
margin-top:0
}
/* Empty <p> elements that are the last child of such an <li> and directly follow another <p>. */
dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) .field-list dd>ul>li>p+p:last-child:empty {
margin-bottom:0;
margin-top:0
}
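/* Direct-child <dt> elements of the matching <dl> elements themselves (colored with the theme's API color variable). */
dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple)>dt {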
color:var(--color-api-overall)
}
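All of these selectors share the same dl[class]:not(...) prefix. A small Python model of that prefix can help predict which <dl> elements the rules reach. The function name matches_furo_dl_rules is hypothetical. The logic follows standard CSS semantics: [class] only requires the class attribute to be present (even if empty), and each :not(.x) excludes elements whose whitespace-separated class list contains x. For example, Sphinx renders Python API entries as <dl class="py function">, which matches. A <dl> with no class attribute at all does not match.

```python
EXCLUDED = {"option-list", "field-list", "footnote", "glossary", "simple"}


def matches_furo_dl_rules(class_attr):
    """Model of dl[class]:not(.option-list):...:not(.simple).

    class_attr is the raw value of the <dl> element's class attribute,
    or None when the attribute is absent.
    """
    if class_attr is None:  # [class] requires the attribute to exist
        return False
    # :not(.x) for every excluded x: none of them may appear in the list.
    return EXCLUDED.isdisjoint(class_attr.split())


for attr in ["py function", None, "glossary", "field-list simple"]:
    print(repr(attr), matches_furo_dl_rules(attr))
```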