Want 70,000 eBooks? Use this bash script

James Zaib
2 min read · Mar 3, 2023
Photo by Perfecto Capucine on Unsplash

Project Gutenberg is a great resource for eBooks, particularly classic novels that are now in the public domain.

Thanks to this handy bash script, you can have a lifetime's supply of reading material for your phone, Kobo, Kindle, or any other eReader.

You will need a Linux-based operating system, a Linux virtual private server (VPS), or the Windows Subsystem for Linux (WSL). WSL is probably the easiest option for those of you running Windows 10.

macOS should be able to run this script natively.

This script downloads the eBooks and saves them as PDFs. It relies on wkhtmltopdf for the HTML-to-PDF conversion, so install that first:

sudo apt install wkhtmltopdf -y
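Since the script runs for a long time, it is worth confirming the converter is actually on your PATH before you start. A small guard sketch (not from the original article) that you can run first:

```shell
# Check that wkhtmltopdf is installed before kicking off the main script
if command -v wkhtmltopdf >/dev/null 2>&1; then
  echo "wkhtmltopdf found: $(command -v wkhtmltopdf)"
else
  echo "wkhtmltopdf not found; install it first" >&2
fi
```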

Navigate to a folder of your choosing; this is where your downloaded eBooks will be stored.

Create the script using touch and set permissions to executable so that you can run it.

touch ./gutenbergDownloader.sh && chmod +x ./gutenbergDownloader.sh

Books are added quite frequently, so the total number of books is probably now well above 70,000.

Open gutenbergDownloader.sh with Vim:

vi ./gutenbergDownloader.sh

Copy the code block below and paste it into gutenbergDownloader.sh.

#!/bin/bash

beginBookNum=1
totalBooks=70000
urlPrefix="https://www.gutenberg.org"
targetDir='/mnt/d/ebooks/gutenberg-new/2022/05'

mkdir -p "$targetDir"
cd "$targetDir" || exit 1

for i in $(seq "$beginBookNum" "$totalBooks"); do
  echo "Processing $i"
  mkdir -p "$i"
  # Scrape the book title from the page's <h1> tag, then strip punctuation
  dirtyTitle=$(curl --silent "$urlPrefix/ebooks/$i" 2>&1 | grep -oP '<h1(?:\s[^>]*)?>\K.*?(?=</h1>)' | sed "s/'//g; s/,//g; s/\"//g; s/ /-/g; s/[.()!?]//g")
  title=$(echo "$dirtyTitle" | tr -cd '[:alnum:]._-')
  # Create a human-friendly title
  humanTitle=$(echo "$title" | sed 's/-/ /g')
  # Find the HTML5 link on the book's page
  txtUrl=$(curl --silent "$urlPrefix/ebooks/$i" 2>&1 | grep 'HTML5' | grep -o -P '(?<=a href=).*(?=type)' | tr -d '"')
  # Download the HTML5 file and convert it to PDF
  wkhtmltopdf "$urlPrefix/$txtUrl" "$i/$i.pdf"
  # Download the cover art
  wget --output-document="$title.jpg" "https://$(curl --silent "$urlPrefix/ebooks/$i" 2>&1 | grep 'cover-art' | grep -o -P '(?<=://).*(?=title)' | tr -d '"')"
  # Move the cover art into the book's folder
  mv "$title.jpg" "$i/$title.jpg"
  # Append a row to a CSV index for later import
  echo "$i,2,ebooks,2022-05-03,,,/wp-content/uploads/edd/2022/05/$i/$title.jpg,,0,,,Published,,$humanTitle,/wp-content/uploads/edd/2022/05/$i/$i.pdf,,," >> "$targetDir/index.csv"
done
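The trickiest part of the script is the grep -oP call that pulls the title out of the page's h1 tag, plus the cleanup that follows it. You can test that pipeline in isolation on a canned HTML fragment (the markup below is invented for illustration; real Project Gutenberg pages differ):

```shell
# Invented <h1> fragment standing in for a real ebook page
html='<h1 itemprop="name">Alice'\''s Adventures in Wonderland, by Lewis Carroll</h1>'

# Extract everything between <h1 ...> and </h1>
dirty=$(echo "$html" | grep -oP '<h1(?:\s[^>]*)?>\K.*?(?=</h1>)')

# Strip punctuation and replace spaces with hyphens, as the script does
clean=$(echo "$dirty" | sed "s/'//g; s/,//g; s/\"//g; s/ /-/g" | tr -cd '[:alnum:]._-')
echo "$clean"   # Alices-Adventures-in-Wonderland-by-Lewis-Carroll
```

The hyphenated form becomes the cover-art filename, and the hyphens are swapped back to spaces for the human-friendly title in the CSV.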

Each eBook will be saved in its own numbered folder, and within that folder you will find the PDF alongside its cover art.
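The last echo in the loop also builds a CSV row per book; the /wp-content/uploads/edd/ paths suggest it is meant for importing into a WordPress site, though the article does not say. With illustrative values filled in (the book number and titles below are made up), one row looks like this:

```shell
# Illustrative values; the real script derives these per download
i=11
title="Alices-Adventures-in-Wonderland"
humanTitle="Alices Adventures in Wonderland"

echo "$i,2,ebooks,2022-05-03,,,/wp-content/uploads/edd/2022/05/$i/$title.jpg,,0,,,Published,,$humanTitle,/wp-content/uploads/edd/2022/05/$i/$i.pdf,,,"
```

If you only want the PDFs, you can delete that line from the script without affecting the downloads.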
