Using Python and Google Docs to Build Books
When I started my latest fiction book, The Darkest Autumn, I wrote out the chapters as individual files. I did it in a text editor (Sublime) and saved the files to a git repo. The names of the files determined their order, chapters being named in this pattern:
the-darkest-autumn $ tree
.
├── 01_Beginnings.md
├── 02_Town_of_Ravenna.md
├── 03_Walls_of_Ravenna.md
As the book developed I thought about moving it to Scrivener. If you don't know, Scrivener is an excellent tool for writing. It encourages you to break up your work into chapters and scenes. The downside is that Scrivener is complex (I want to write, not figure out software) and Scrivener isn't designed for simultaneous collaboration. The latter issue is a very serious problem, as I like to have others review and comment on my writing as I go.
What I really wanted to do is combine the chapter breaking of Scrivener with the simplicity and collaboration of Google Docs. Preferably, I would put the book chapters into Google Docs as individual files and then send invites to my editor, wife, and my beta readers. By using Google Docs I could ensure anyone could access the work without having to create a new account and learn an unfamiliar system.
Unfortunately, at this time Google Docs has no way to combine multiple Google Docs contained in one directory into one large document for publication. To use Google Docs thhe way I want involves manually copy/pasting content from dozens of files into one master document any time you want to update a work. With even 5 to 10 documents this is time consuming and error prone (for me) to the point of being unusable. This is a problem because my fiction books have anywhere from 30 to 50 chapters.
Fortunately for me, I know how to code. By using the Python programming language, I can automate the process of combining the Google Docs into one master file which can be converted to epub, mobi (kindle), or PDF.
How I Combine Google Doc Files
First, I download all the files in the book's Google Docs directory.
This generates and downloads a zip file called something like
drive-download-20170505T230011Z-001.zip. I use unzip to open it
:
unzip drive-download-20170505T230011Z-001.zip -d the-darkest-autumn
Inside the new the-darkest-autumn folder are a bunch of MS Word-formatted files named identically to what's stored on Google Docs:
$ tree the-darkest-autumn/
the-darkest-autumn
├── 01. Beginnings.docx
├── 02. Town of Ravenna.docx
├── 03. Walls of Ravenna.docx
├── 04. Gatehouse of Ravenna.docx
├── 05. Courage.docx
├── 06. To the Upper Valley.docx
├── _afterward.docx
├── _copyright.docx
├── _dedication.docx
└── _title.docx
Now it's time to bring in the code. By leveraging the python-docx library, I combine all the Word files into one large Word files using this Python (3.6 or higher) script:
"""bookify.py: Combines multiple Word docs in a folder.
"""
import os
import sys
from glob import glob
try:
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.enum.text import WD_COLOR_INDEX
from docx.shared import Inches, Pt
except ImportError:
raise ImportError("You need to 'pip install python-docx'")
TEXT_FONT = "Trebuchet MS"
def add_matter(master_document, filename, chapter, after=False):
"""Builds """
if not os.path.exists(filename):
return master_document
if after:
master_document.add_page_break()
# Build the heading
heading = master_document.add_heading('', level=1)
heading.alignment = WD_ALIGN_PARAGRAPH.CENTER
runt = heading.add_run(chapter)
runt.font.color.theme_color = WD_COLOR_INDEX.WHITE
# Add the material
document = Document(docx=filename)
for index, paragraph in enumerate(document.paragraphs):
new_paragraph = master_document.add_paragraph()
new_paragraph.paragraph_format.alignment = paragraph.paragraph_format.alignment
new_paragraph.style = paragraph.style
# Loop through the runs of a paragraph
# A run is a style element within a paragraph (i.e. bold)
for j, run in enumerate(paragraph.runs):
# Copy over the old style
text = run.text
# Add run to new paragraph
new_run = new_paragraph.add_run(text=text)
# Update styles for run
new_run.bold = run.bold
new_run.italic = run.italic
new_run.font.size = run.font.size
new_run.font.color.theme_color = WD_COLOR_INDEX.BLACK
master_document.add_page_break()
print(f'Adding {chapter}')
return master_document
def add_chapter(master_document, filename, chapter):
"""Build chapters, i.e. where the story happens."""
# Build the chapter
document = Document(docx=filename)
# Build the heading
heading = master_document.add_heading('', level=1)
heading.alignment = WD_ALIGN_PARAGRAPH.CENTER
heading.add_run(chapter).font.color.theme_color = WD_COLOR_INDEX.BLACK
heading.paragraph_format.space_after = Pt(12)
for index, paragraph in enumerate(document.paragraphs):
new_paragraph = master_document.add_paragraph()
# Loop through the runs of a paragraph
# A run is a style element within a paragraph (i.e. bold)
for j, run in enumerate(paragraph.runs):
text = run.text
# If at start of paragraph and no tab, add one
if j == 0 and not text.startswith('\t'):
text = f"\t{text}"
# Add run to new paragraph
new_run = new_paragraph.add_run(text=text)
# Update styles for run
new_run.font.name = TEXT_FONT
new_run.bold = run.bold
new_run.italic = run.italic
# Last minute format checking
text = new_paragraph.text
master_document.add_page_break()
# Destroy the document object
del document
return master_document
def main(book):
master_document = Document()
master_document = add_matter(
master_document,
filename=f'{book}/_title.docx',
chapter='Title Page'
)
master_document = add_matter(
master_document,
filename=f'{book}/_copyright.docx',
chapter='Copyright Page'
)
master_document = add_matter(
master_document,
filename=f'{book}/_dedication.docx',
chapter='Dedication'
)
for filename in glob(f"{book}/*"):
if filename.startswith(f"{book}/_"):
print(f'skipping {filename}')
continue
# Get the chapter name
book, short = filename.split('/')
chapter = short.replace('.docx', '')
if chapter.startswith('0'):
chapter = chapter[1:]
print(f'Adding {chapter}')
master_document = add_chapter(master_document, filename, chapter)
master_document = add_matter(
master_document,
filename=f'{book}/_aboutauthor.docx',
chapter='About the Author',
after=True
)
master_document = add_matter(
master_document,
filename=f'{book}/_afterward.docx',
chapter='Afterward',
after=True
)
master_document.save(f'{book}.docx')
print('DONE!!!')
if __name__ == '__main__':
try:
book = sys.argv[1]
except IndexError:
msg = 'You need to specify a book. A book is a directory of word files.'
raise Exception(msg)
main(book)
This is what it looks like when I run the code:
$ python bookify.py the-darkest-autumn/
Adding Title Page
Adding Copyright Page
Adding Dedication
Adding 1. Beginnings
Adding 2. Town of Ravenna
Adding 3. Walls of Ravenna
Adding 4. Gatehouse of Ravenna
Adding 5. Courage
Adding 6. To the Upper Valley
skipping the-darkest-autumn/_afterward.docx
skipping the-darkest-autumn/_copyright.docx
skipping the-darkest-autumn/_dedication.docx
skipping the-darkest-autumn/_title.docx
Adding Afterward
DONE!!!
And now I've got a Word document in the same directory called the-darkest-autumn.docx.
Converting Word to EPUB
While Kindle Direct Publishing (KDP) will accept .docx files, I like to convert it to .epub using Calibre:
$ ebook-convert the-darkest-autumn.docx the-darkest-autumn.epub \
--authors "Daniel Roy Greenfeld" \
--publisher "Two Scoops Press" \
--series Ambria \
--series-index 1 \
--output-profile kindle
And now I can check out my results by using Calibre's book viewer:
$ ebook-viewer the-darkest-autumn.epub
Add the Links!
As python-docx
doesn't handle HTTP links at this time, I manually add
them to the book using Calibre's epub editor. I add links to:
- My personal author site at danielroygreenfeld.com
- The book's review page on Amazon
- The book's upcoming sequel, The River Runs Uphill.
How Well Does It Work?
For me it works wonders for my productivity. By following a "chapters as files" pattern within Google Docs I get solid collaboration power plus some (but not all) of the features of Scrivener. I can quickly regenerate the book at any time without having to struggle with Scrivener or have to add tools like Vellum to the process.
I have a secondary script that fixes quoting and tab issues, written before I realized Calibre could have done that for me.
The book I started this project for, The Darkest Autumn, is available now on Amazon. Check it out and let me know what you think of what the script generates. Or if you want to support my writing (both fiction and non-fiction), buy The Darkest Autumn on Amazon and leave an honest review.
Thinking About the Future
Right now this snippet of code generates something that looks okay, but could be improved. I plan to enhance it with better fonts and chapter headers, the goal to generate something as nice as what Draft2Digital generates.
I've considered adding the OAuth components necessary in order to allow for automated downloading. The problem is I loathe working with OAuth. Therefore I'm sticking with the manial download process.
For about a week I thought about leveraging it and my Django skills to build it as a paid subscription service and rake in the passive income. Basically turn it into a startup. After some reflection I backed off because if Google added file combination as a feature, it would destroy the business overnight.
I've also decided not to package this up on Github/PyPI. While Cookiecutter makes it trivial for me to do this kind of thing, I'm not interested in maintaining yet another open source project. However, if someone does package it up and credits me for my work, I'm happy to link to it from this blog post.
Tags: python django python python3 cookiecutter