.. include:: /include/substitutions.txt
.. include:: /include/external_links.txt

.. _parsing:

*******
Parsing
*******

Sphinx uses docutils parsers to deal with the normal reST input text.  In O-O terms,
it utilizes polymorphic state machines, choosing parsers instantiated from the right
parser sub-class based on the portion of the document being processed.

Inside calls to these state machines is a reference to ``self`` (the parser state
machine object), and that object normally has an attribute ``document`` which is an
accumulation of the docutils ``node`` objects being built as an abstract syntax tree
containing the content of the ``.rst`` file being parsed.  In fact, most objects
involved have an attribute ``document``.

Also, other types of objects (e.g. domain objects) that don't have a ``document``
attribute come with an ``env`` (environment) attribute, which itself has an attribute
called ``current_document`` which is a reference to the same ``doctree`` (tree of
docutils node objects).

How does it know where to insert nodes?  Parsing proceeds from top to bottom, so new
nodes always go "at the end":  a ``node.append()`` call adds them.  This method is
defined by the ``Element`` class (an ancestor high in the node inheritance hierarchy),
and it adds the node passed to it to the end of its ``children`` attribute, which is a
list object.

Under the debugger, how can you tell where it is parsing?  The ``document`` attribute
has its own attribute ``current_line`` which has the line number, and when parsing
is taking place, sometimes there is an argument ``context`` which provides the text
from that line.

``check_line()`` is a commonly-used function name that deals with the line contents
inside the state machine.

How does it know what node to append to?  The target object ``self`` has an attribute
``parent`` which carries the node whose ``children`` node list is being appended to.
Have a look at this example, the ``text()`` method of the ``Text`` state class (a
subclass of ``RSTState``), which parses an identified paragraph.  (The class is
defined in ``docutils\parsers\rst\states.py``.)

.. code-block:: python

    def text(self, match, context, next_state):
        """Paragraph."""
        startline = self.state_machine.abs_line_number() - 1
        msg = None
        try:
            block = self.state_machine.get_text_block(flush_left=True)
        except statemachine.UnexpectedIndentationError as err:
            block, src, srcline = err.args
            msg = self.reporter.error('Unexpected indentation.',
                                      source=src, line=srcline)
        lines = context + list(block)
        paragraph, literalnext = self.paragraph(lines, startline)
        self.parent += paragraph
        self.parent += msg
        if literalnext:
            try:
                self.state_machine.next_line()
            except EOFError:
                pass
            self.parent += self.literal_block()
        return [], next_state, []

The ``context`` argument contains the first line of the paragraph.
``self.state_machine.abs_line_number()`` returns the one-based number of the line the
state machine is currently on, which is the line after the one in ``context``.  Thus

.. code-block:: python

        startline = self.state_machine.abs_line_number() - 1

sets ``startline`` to the one-based document line number of the first line of the
paragraph.  Then

.. code-block:: python

            block = self.state_machine.get_text_block(flush_left=True)

slurps up the rest of the paragraph into the ``block`` object, which is a
``StringList`` object.  Its ``data`` attribute is the list of string lines, and its
``items`` attribute is a list of tuples, where ``items[0][0]`` contains the full path
to the source document and ``items[0][1]`` contains the paragraph's one-based starting
line number within that document (same as ``startline`` mentioned above).
``self.parent`` is the section node this paragraph is being added to.  The end of the
paragraph is recognized by a blank line or the end of the document, whichever comes
first.
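
The line arithmetic and block collection described above can be sketched with a toy
model.  This is only an illustration:  ``ToyStateMachine`` is a hypothetical stand-in
written for this page, not the docutils class, and it leaves out indentation checks
and the ``StringList`` bookkeeping.

.. code-block:: python

    class ToyStateMachine:
        """Simplified, hypothetical stand-in for the docutils state machine."""

        def __init__(self, input_lines, input_offset=0):
            self.input_lines = input_lines
            self.input_offset = input_offset  # where input_lines starts in the file
            self.line_offset = 0              # zero-based index of the current line

        def abs_line_number(self):
            # One-based number of the current line within the source file.
            return self.line_offset + self.input_offset + 1

        def get_text_block(self):
            # Collect lines from the current one up to the first blank line
            # (or the end of input), then stay on the last line of the block.
            start = self.line_offset
            end = start
            while end < len(self.input_lines) and self.input_lines[end].strip():
                end += 1
            self.line_offset = max(start, end - 1)
            return self.input_lines[start:end]


    source = [
        "First line of a paragraph",     # line 1
        "second line of the paragraph",  # line 2
        "third line",                    # line 3
        "",                              # line 4
        "Next paragraph",                # line 5
    ]
    machine = ToyStateMachine(source)
    context = [source[0]]    # the first line was already seen by the previous state
    machine.line_offset = 1  # the machine now sits on the paragraph's second line

    startline = machine.abs_line_number() - 1  # one-based number of the first line
    block = machine.get_text_block()           # the rest of the paragraph
    lines = context + list(block)              # the whole paragraph

In this sketch ``startline`` points at the line held in ``context``, and ``lines``
ends just before the blank line.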

Then ``lines`` is a list created from all lines of the paragraph, which is then sent to

.. code-block:: python

        paragraph, literalnext = self.paragraph(lines, startline)

to be converted to a list of nodes in ``paragraph``, along with ``literalnext``, a
Boolean value indicating whether the paragraph ends with ``::``, which announces that
a literal block follows.
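
To make the role of ``literalnext`` concrete, here is a small stand-alone sketch.  It
assumes the reST convention that a paragraph whose text ends in an unescaped ``::``
announces a literal block.  The function name ``split_literal_marker`` is
hypothetical, and the regular expression is modelled on the docutils source rather
than copied from any particular version.

.. code-block:: python

    import re

    def split_literal_marker(text):
        """Return (paragraph_text, literalnext) for one paragraph's raw text."""
        text = text.rstrip()
        # An unescaped "::" at the very end marks an upcoming literal block.
        if re.search(r'(?<!\\)(\\\\)*::$', text):
            if len(text) == 2:
                return "", True                      # the paragraph was only "::"
            if text[-3] in " \n":
                return text[:-3].rstrip(), True      # "text ::" drops the marker
            return text[:-1], True                   # "text::" keeps one colon
        return text, False

    results = [
        split_literal_marker("Example::"),
        split_literal_marker("Example: ::"),
        split_literal_marker("::"),
        split_literal_marker("Just a paragraph."),
    ]

Only the first three inputs set the flag;  the last one is an ordinary paragraph.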

Note that the ``paragraph`` variable is populated with a list holding the new
paragraph node (whose children are the text and in-line element nodes), plus any
system-message nodes, created by this call:

.. code-block:: python

        paragraph, literalnext = self.paragraph(lines, startline)

Then that node list is appended to the target (section) node in these lines:

.. code-block:: python

        self.parent += paragraph
        self.parent += msg

``paragraph`` is a list of ``Node`` objects (polymorphic), and ``msg`` is either
``None`` or a ``system_message`` node carrying the human-readable message for the
indentation error, if there was one.  Adding ``None`` with ``+=`` leaves the
``children`` list unchanged.
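
The behaviour of ``+=`` on the parent node can be illustrated with a toy class.  This
is a sketch under stated assumptions:  ``ToyNode`` is a hypothetical stand-in that
imitates how a docutils element treats a single node, a list of nodes, and ``None``;
it is not the docutils implementation.

.. code-block:: python

    class ToyNode:
        """Simplified, hypothetical stand-in for a docutils element node."""

        def __init__(self, name):
            self.name = name
            self.parent = None
            self.children = []

        def append(self, node):
            node.parent = self          # the child remembers where it lives
            self.children.append(node)  # new nodes always go at the end

        def extend(self, nodes):
            for node in nodes:
                self.append(node)

        def __iadd__(self, other):
            # Accept a single node, a list of nodes, or None (which is ignored).
            if isinstance(other, ToyNode):
                self.append(other)
            elif other is not None:
                self.extend(other)
            return self


    section = ToyNode("section")
    paragraph = [ToyNode("paragraph")]  # a node list, as in the method above
    msg = None                          # no indentation error occurred

    section += paragraph
    section += msg
    section += ToyNode("literal_block")

    names = [child.name for child in section.children]

After these statements ``names`` lists the children in the order they were added,
and the ``None`` value has contributed nothing.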


Definition Lists
================

Sphinx outputs HTML Definition Lists for a number of different types of document
structures.  It is important to understand what to expect so that custom formatting
of these structures is more straightforward.

The different types of document structures are:

- :ref:`implicit plain definition lists` (no class)
- field-list (reST :ref:`implicit field lists`)
- glossary (Sphinx :ref:`explicit glossaries`)
- option-list (reST :ref:`implicit option lists`)
- footnote (reST :ref:`explicit footnotes`, prior to Sphinx v8.0; after v8.0 Sphinx
  places footnotes into a list of ``<aside>`` elements inside a single ``<aside>``
  element)

Any of the above can also have the class "simple" added to the list of classes of the
list when all the ``<dd>`` elements have no nested formatting such as sub-lists,
tables, etc.

Definition lists look like this in HTML:

.. code-block:: html
    :caption: A Plain Definition List

    <dl>
        <dt>term1</dt>
        <dd>words of definition 1</dd>
        <dt>term2</dt>
        <dd>words of definition 2</dd>
        ...
    </dl>

Because all ``<dd>`` elements are "simple", Sphinx would add class "simple" to the
``<dl>`` element like this:  ``<dl class="simple">``.

The default browser formatting for the above looks like this:

.. raw:: html

    <dl>
        <dt>term1</dt>
        <dd>words of definition 1</dd>
        <dt>term2</dt>
        <dd>words of definition 2</dd>
        ...
    </dl>


Furo Theme
----------

The Furo theme may or may not be alone in this, but I can say that it takes the above
and applies CSS formatting to them as follows:

- All ``<dl>`` elements that carry a ``class`` attribute but DO NOT have any of the
  following classes are formatted first.

  - option-list
  - field-list
  - footnote
  - glossary
  - simple

You can find them by looking for (regex) ``\bdd``, and they look like this (but
without the comments):

.. code-block:: css

    /* All `dd` descendants (at any level) */
    #furo-main-content dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) dd {
      margin-left:2rem
    }

    /* First direct-child elements of `dd` descendants (at any level) */
    dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) dd>:first-child {
      margin-top:.125rem
    }

    /* Last direct-child elements of `dd` descendants (at any level),
     * and also descendant "field-list" classes (Sphinx only applies "field-list" class to <dl> elements). */
    dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) .field-list,
    dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) dd>:last-child {
      margin-bottom:.75rem
    }

    /* Direct-child <dt> elements of descendant elements "field-list" classes. */
    dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) .field-list>dt {
      font-size:var(--font-size--small);
      text-transform:uppercase
    }

    /* Empty <dd> elements of descendant elements "field-list" classes. */
    dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) .field-list dd:empty {
      margin-bottom:.5rem
    }

    /* <ul> elements nested in <dd> elements of descendant elements "field-list" classes. */
    dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) .field-list dd>ul {
      margin-left:-1.2rem
    }

    /* Second-child <p> elements of <li> items in <ul> lists directly inside such <dd> elements. */
    dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) .field-list dd>ul>li>p:nth-child(2) {
      margin-top:0
    }

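    /* Empty <p> elements that are the last child of such <li> items and directly follow another <p>. */
    dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple) .field-list dd>ul>li>p+p:last-child:empty {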
      margin-bottom:0;
      margin-top:0
    }

    /* Direct-child <dt> elements of the matching <dl> itself. */
    dl[class]:not(.option-list):not(.field-list):not(.footnote):not(.glossary):not(.simple)>dt {
      color:var(--color-api-overall)
    }
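
The selector prefix shared by all of the rules above can be modelled in a few lines of
Python, which may help when predicting whether a given ``<dl>`` will pick up this
formatting.  This is only a sketch:  the function name ``matches_prefix`` is
hypothetical, and the class strings passed to it are illustrative values.  Note that
``dl[class]`` requires the ``class`` attribute to be present, so a bare ``<dl>`` does
not match.

.. code-block:: python

    EXCLUDED = {"option-list", "field-list", "footnote", "glossary", "simple"}

    def matches_prefix(class_attr):
        """Return True if a <dl> with this class attribute matches the prefix.

        ``class_attr`` is the raw attribute string, or None when the
        attribute is absent.
        """
        if class_attr is None:  # dl[class] needs the attribute to exist
            return False
        # Every :not(...) must hold, so no excluded class may be present.
        return EXCLUDED.isdisjoint(class_attr.split())

    checks = {
        "py function": matches_prefix("py function"),
        "field-list simple": matches_prefix("field-list simple"),
        "glossary": matches_prefix("glossary"),
        "(no class attribute)": matches_prefix(None),
    }

Here only the first entry matches;  the others are excluded either by one of the
``:not()`` clauses or by the missing ``class`` attribute.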