.. include:: /include/substitutions.txt
.. include:: /include/external_links.txt

.. _parsing:

*******
Parsing
*******

Sphinx uses docutils parsers to process normal reST input text. In object-oriented terms, it uses polymorphic state machines, choosing a parser instantiated from the appropriate parser subclass based on the portion of the document being processed.

Inside calls to these state machines, ``self`` refers to the parser state machine object. That object normally has a ``document`` attribute, which accumulates the docutils ``node`` objects being built into an abstract syntax tree holding the content of the ``.rst`` file being parsed. In fact, most objects involved have a ``document`` attribute. Other types of objects (e.g. domain objects) that don't have a ``document`` attribute come with an ``env`` (environment) attribute, which itself has an attribute called ``current_document`` that references the same ``doctree`` (tree of docutils node objects).

How does it know where to insert nodes? Parsing goes from top to bottom, so the insertion point is always "at the end": a ``node.append()`` call adds each new node. This method is defined by the ``Element`` class (the ancestor of all nodes that can have children), and it adds the node passed to it to the end of its ``children`` attribute, which is a list object.

Under the debugger, how can you tell where it is parsing? The ``document`` attribute has its own attribute ``current_line``, which holds the line number, and while parsing is taking place there is sometimes an argument ``context`` which provides the text from that line. ``check_line()`` is a commonly used function name that deals with the line contents inside the state machine.

How does it know what node to append to? The target object ``self`` has an attribute ``parent``, which carries the node whose ``children`` list is being appended to.

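The append-at-the-end behaviour described above can be sketched with a small stand-in class. This is only an illustrative model, not docutils code: ``ToyNode`` and ``parse_paragraphs()`` are hypothetical names, and ignoring ``None`` in ``+=`` is a choice made for this sketch.

.. code-block:: python

    class ToyNode:
        """Hypothetical, simplified stand-in for a node that can have children."""

        def __init__(self, tagname, text=""):
            self.tagname = tagname
            self.text = text
            self.parent = None
            self.children = []  # child nodes, kept in document order

        def append(self, node):
            """Add one node at the end of ``children`` and record its parent."""
            node.parent = self
            self.children.append(node)

        def __iadd__(self, other):
            """Support ``parent += node`` and ``parent += [node, ...]``."""
            if isinstance(other, ToyNode):
                self.append(other)
            elif other is not None:  # in this sketch, ``None`` adds nothing
                for node in other:
                    self.append(node)
            return self


    def parse_paragraphs(source):
        """Walk the text top to bottom, appending one node per paragraph."""
        document = ToyNode("document")
        parent = document  # the node whose ``children`` list receives new nodes
        for block in source.split("\n\n"):
            text = " ".join(block.split())
            if text:
                parent += ToyNode("paragraph", text)
        return document


    doctree = parse_paragraphs("First paragraph.\n\nSecond\nparagraph.")
    print([child.text for child in doctree.children])

Because input is consumed in order, the sketch never needs an insertion index: the "current position" is simply the end of the parent's ``children`` list.
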
Have a look at this example from the RSTState class' ``text()`` method, which parses an identified paragraph. (The RSTState class is defined in ``docutils\parsers\rst\states.py``.)

.. code-block:: python

    def text(self, match, context, next_state):
        """Paragraph."""
        startline = self.state_machine.abs_line_number() - 1
        msg = None
        try:
            block = self.state_machine.get_text_block(flush_left=True)
        except statemachine.UnexpectedIndentationError as err:
            block, src, srcline = err.args
            msg = self.reporter.error('Unexpected indentation.',
                                      source=src, line=srcline)
        lines = context + list(block)
        paragraph, literalnext = self.paragraph(lines, startline)
        self.parent += paragraph
        self.parent += msg
        if literalnext:
            try:
                self.state_machine.next_line()
            except EOFError:
                pass
            self.parent += self.literal_block()
        return [], next_state, []

The ``context`` argument contains the first line of the paragraph. ``self.state_machine.abs_line_number()`` returns the one-based number of the line the state machine is currently on, which at this point is the line after the one held in ``context``, i.e. the next line to be parsed. Thus

.. code-block:: python

    startline = self.state_machine.abs_line_number() - 1

sets ``startline`` to the one-based document line number of the first line of the paragraph. Then

.. code-block:: python

    block = self.state_machine.get_text_block(flush_left=True)

slurps up the rest of the paragraph into the ``block`` object, which is a StringList object with the attributes ``data`` (a list of string lines), ``items`` (``items[0]`` is a tuple where ``items[0][0]`` contains the full path to the source document and ``items[0][1]`` contains the paragraph's one-based starting line number within that document, the same as ``startline`` mentioned above), and ``parent`` (the section node this paragraph is being added to). The end of the paragraph is recognized by a blank line or the end of the document, whichever comes first. If an indented line turns up before the paragraph ends, ``UnexpectedIndentationError`` is raised and an error message is recorded in ``msg``.

Then ``lines`` is a list created from all lines of the paragraph (the first line from ``context`` plus the lines in ``block``), which is then sent to

.. code-block:: python

    paragraph, literalnext = self.paragraph(lines, startline)

which returns ``paragraph`` and ``literalnext``. ``literalnext`` is a Boolean value indicating whether the paragraph ends with ``::``, meaning that a literal block follows. ``paragraph`` is a list of nodes: the paragraph node, which holds the text and in-line element nodes created by this call, followed by any system messages produced while parsing the in-line markup.

Then that node list is appended to the target (section) node in these lines:

.. code-block:: python

    self.parent += paragraph
    self.parent += msg

``paragraph`` is a list of ``Node`` objects (polymorphic). ``msg`` is either ``None``, when no indentation error occurred, or the single system message node returned by ``self.reporter.error()``, which carries the human-readable description of the parsing error. Adding ``None`` leaves the parent unchanged.


Definition Lists
================

Sphinx outputs HTML definition lists for a number of different types of document structures. Knowing what to expect makes custom formatting of these structures more straightforward.

The different types of document structures are:

- :ref:`implicit plain definition lists` (no class)
- field-list (reST :ref:`implicit field lists`)
- glossary (Sphinx :ref:`explicit glossaries`)
- option-list (reST :ref:`implicit option lists`)
- footnote (reST :ref:`explicit footnotes`, prior to Sphinx v8.0; after v8.0 Sphinx places footnotes into a list of ``