How to Generate a Table of Contents in Docx with Python
25 Nov 2023
Tables of contents (TOCs) are an essential part of long documents, providing a quick overview of the document's structure and enabling easy navigation. While most Office software packages offer built-in TOC generation mechanisms, they can be difficult to call from code.
This blog post will show you how to generate a TOC in Docx with Python. We will use the following steps:
- Create a TOC placeholder in the Docx document.
- Refresh the TOC.
Create a TOC placeholder in the Docx document
The first step is to create a TOC placeholder in the Docx document. This can be done using the following Python code:
We will use the python-docx
package.
pip install python-docx
from docx.oxml.ns import qn
from docx.oxml import OxmlElement
def insert_toc(d, levels="1-3"):
"""
Insert "Table of Contents" to Document
Parameters:
------------
d: Document Object
文档对象
levels: string
default "1-3"
根据 addheading 更新目录
"""
sdt = OxmlElement('w:sdt')
sdtpr = OxmlElement('w:sdtPr')
docpartobj = OxmlElement('w:docPartObj')
docpartgallery = OxmlElement('w:docPartGallery')
docpartgallery.set(qn('w:val'), 'Table of Contents')
docpartunique = OxmlElement('w:docPartUnique')
docpartunique.set(qn('w:val'), 'true')
docpartobj.append(docpartgallery)
docpartobj.append(docpartunique)
sdtpr.append(docpartobj)
sdt.append(sdtpr)
sdtcontent = OxmlElement('w:sdtContent')
p = OxmlElement('w:p')
r = OxmlElement('w:r')
t = OxmlElement('w:t')
t.text = 'Contents'
r.append(t)
p.append(r)
sdtcontent.append(p)
fldChar = OxmlElement('w:fldChar') # creates a new element
fldChar.set(qn('w:fldCharType'), 'begin') # sets attribute on element
instrText = OxmlElement('w:instrText')
instrText.set(qn('xml:space'), 'preserve') # sets attribute on element
instrText.text = f'TOC \\o "{levels}" \\h \\z \\u' # change 1-3 depending on heading levels you need
fldChar2 = OxmlElement('w:fldChar')
fldChar2.set(qn('w:fldCharType'), 'separate')
# fldChar3 = OxmlElement('w:t')
# fldChar3.text = "Right-click to update field."
fldChar3 = OxmlElement('w:updateFields')
fldChar3.set(qn('w:val'), 'true')
fldChar2.append(fldChar3)
fldChar4 = OxmlElement('w:fldChar')
fldChar4.set(qn('w:fldCharType'), 'end')
p2 = OxmlElement('w:p')
r2 = OxmlElement('w:r')
r2.append(fldChar)
r2.append(instrText)
r2.append(fldChar2)
r2.append(fldChar4)
p2.append(r2)
sdtcontent.append(p2)
sdt.append(sdtcontent)
d._element.body.insert_element_before(sdt, *('w:sectPr',))
return d
Generate the TOC
We now have a TOC section in our document, but it's empty. How do we refresh the TOC? There are two ways to do this:
Method 1: Using a LibreOffice macro
- Create Macro Module
REM ***** BASIC *****
Option Explicit
Sub UpdateIndexes(path As String)
'''Update indexes, such as for the table of contents'''
Dim doc As Object
Dim args()
doc = StarDesktop.loadComponentFromUrl(convertToUrl(path), "_default", 0, args())
Dim i As Integer
With doc ' Only process Writer documents
If .supportsService("com.sun.star.text.GenericTextDocument") Then
For i = 0 To .getDocumentIndexes().count - 1
.getDocumentIndexes().getByIndex(i).update()
Next i
End If
End With ' ThisComponent
doc.store()
doc.close(True)
End Sub ' UpdateIndexes
- Import the Macro Module
<!-- -->
$ mv ~/.config/libreoffice/4/user/basic ~/basic_backup
$ cp basic ~/.config/libreoffice/4/user/ -r
- Run the Command
$ soffice --headless "macro:///Standard.YourModuleName.UpdateIndex(/path/to/file.odt)"
Method 2: Using Unoserver
- Pull the Unoserver Docker Image
docker pull chanmo/unoserver
- Run the Unoserver container
docker run -p 5000:5000 chanmo/unoserver
- Update the TOC Using HTTPie
http -f POST :5000/convert/docx file@/path/to/demo.docx -o demo.docx
The disadvantage of using a LibreOffice macro is that the server needs to have LibreOffice installed. If you don't want to install LibreOffice, you can use the Unoserver method instead.