How to Generate a Table of Contents in Docx with Python

25 Nov 2023

libreoffice,toc,unoserver,macro

Tables of contents (TOCs) are an essential part of long documents, providing a quick overview of the document's structure and enabling easy navigation. While most Office software packages offer built-in TOC generation mechanisms, they can be difficult to call from code.

This blog post will show you how to generate a TOC in Docx with Python. We will use the following steps:

  1. Create a TOC placeholder in the Docx document.
  2. Refresh the TOC.

Create a TOC placeholder in the Docx document

The first step is to create a TOC placeholder in the Docx document. This can be done using the following Python code:

We will use the python-docx package.

pip install python-docx
from docx.oxml.ns import qn
from docx.oxml import OxmlElement

def insert_toc(d, levels="1-3"):
  """
  Insert "Table of Contents" to Document

  Parameters:
  ------------

  d: Document Object
     文档对象

  levels: string
      default "1-3"
  根据 addheading 更新目录
  """
  sdt = OxmlElement('w:sdt')
  sdtpr = OxmlElement('w:sdtPr')
  docpartobj = OxmlElement('w:docPartObj')
  docpartgallery = OxmlElement('w:docPartGallery')
  docpartgallery.set(qn('w:val'), 'Table of Contents')
  docpartunique = OxmlElement('w:docPartUnique')
  docpartunique.set(qn('w:val'), 'true')
  docpartobj.append(docpartgallery)
  docpartobj.append(docpartunique)
  sdtpr.append(docpartobj)
  sdt.append(sdtpr)

  sdtcontent = OxmlElement('w:sdtContent')

  p = OxmlElement('w:p')
  r = OxmlElement('w:r')
  t = OxmlElement('w:t')
  t.text = 'Contents'
  r.append(t)
  p.append(r)
  sdtcontent.append(p)

  fldChar = OxmlElement('w:fldChar')  # creates a new element
  fldChar.set(qn('w:fldCharType'), 'begin')  # sets attribute on element
  instrText = OxmlElement('w:instrText')
  instrText.set(qn('xml:space'), 'preserve')  # sets attribute on element
  instrText.text = f'TOC \\o "{levels}" \\h \\z \\u'   # change 1-3 depending on heading levels you need

  fldChar2 = OxmlElement('w:fldChar')
  fldChar2.set(qn('w:fldCharType'), 'separate')
  # fldChar3 = OxmlElement('w:t')
  # fldChar3.text = "Right-click to update field."
  fldChar3 = OxmlElement('w:updateFields')
  fldChar3.set(qn('w:val'), 'true')
  fldChar2.append(fldChar3)

  fldChar4 = OxmlElement('w:fldChar')
  fldChar4.set(qn('w:fldCharType'), 'end')

  p2 = OxmlElement('w:p')
  r2 = OxmlElement('w:r')
  r2.append(fldChar)
  r2.append(instrText)
  r2.append(fldChar2)
  r2.append(fldChar4)
  p2.append(r2)

  sdtcontent.append(p2)
  sdt.append(sdtcontent)
  d._element.body.insert_element_before(sdt, *('w:sectPr',))

  return d  

Github

Generate the TOC

We now have a TOC section in our document, but it's empty. How do we refresh the TOC? There are two ways to do this:

Method 1: Using a LibreOffice macro

  1. Create Macro Module
REM  *****  BASIC  *****

 Option Explicit

 Sub UpdateIndexes(path As String)
     '''Update indexes, such as for the table of contents''' 
     Dim doc As Object
     Dim args()

     doc = StarDesktop.loadComponentFromUrl(convertToUrl(path), "_default", 0, args())

     Dim i As Integer

     With doc ' Only process Writer documents
         If .supportsService("com.sun.star.text.GenericTextDocument") Then
             For i = 0 To .getDocumentIndexes().count - 1
                 .getDocumentIndexes().getByIndex(i).update()
             Next i
         End If
     End With ' ThisComponent

     doc.store()
     doc.close(True)

 End Sub ' UpdateIndexes  

Github

  1. Import the Macro Module
<!-- -->
$ mv ~/.config/libreoffice/4/user/basic ~/basic_backup
$ cp basic ~/.config/libreoffice/4/user/ -r  
  1. Run the Command
$ soffice --headless "macro:///Standard.YourModuleName.UpdateIndex(/path/to/file.odt)"

Method 2: Using Unoserver

  1. Pull the Unoserver Docker Image
docker pull chanmo/unoserver
  1. Run the Unoserver container
docker run -p 5000:5000 chanmo/unoserver
  1. Update the TOC Using HTTPie
http -f POST :5000/convert/docx file@/path/to/demo.docx -o demo.docx

The disadvantage of using a LibreOffice macro is that the server needs to have LibreOffice installed. If you don't want to install LibreOffice, you can use the Unoserver method instead.