Standard Ebooks

Producing an Ebook, Step by Step

This guide is meant to take you step-by-step through the creation of a complete Standard Ebook. While it might seem a little long, most of the text is a description of how to use various automated scripts. It can take just an hour or two for an experienced producer to produce a draft ebook for proofreading (depending on the complexity of the ebook, of course).

  1. Download and set up the Standard Ebooks production tools

    The Standard Ebooks project has a series of tools that will help you produce an ebook. You can download them from GitHub:

    cd ~/
    git clone https://github.com/standardebooks/tools.git

    Check the README.md file for instructions on how to install the few dependencies the tools require.

    We’ll assume you’ve installed the SE tools to ~/tools/.

  2. Select an ebook to produce

    The best place to look for public domain ebooks to produce is Project Gutenberg. If downloading from Gutenberg, be careful of the following:

    • There may be different versions of the same publication on Gutenberg, and the best one might not be the one with the most downloads. In particular, there could be a better translation that has fewer downloads because it was produced later, or there could be a version with better HTML markup. A great example of this phenomenon is the Gutenberg version of 20,000 Leagues Under the Seas. The most-downloaded version is an old translation widely criticized as being slapdash and inaccurate. The less popular version is a fresh, modern translation dedicated to the public domain.

    • Gutenberg usually offers both an HTML version and an epub version of the same ebook. Note that one is not always exactly the same as the other! A casual reader might assume that the HTML version is generated from the epub version, or the other way around; but for some reason the HTML and epub versions often differ in important ways, with the HTML version typically using fewer useless CSS classes, and including <em> tags that the epub version is often missing.

    Picking either the HTML or the epub version is fine as a starting point, but make sure to pick the one that appears to be the most accurate.

    For this guide, we’ll use The Strange Case of Dr. Jekyll and Mr. Hyde, by Robert Louis Stevenson. If you search for it on Gutenberg, you’ll find that there are two versions; the most popular one is a poor choice to produce, because the transcriber included the page numbers smack in the middle of the text! What a pain those’d be to remove. The less popular one is a better choice to produce, because it’s a cleaner transcription.

  3. Locate page scans of your book online

    As you produce your book, you’ll want to check your work against the actual page scans. Often the scans contain formatting that is missing from the source transcription. For example, older transcriptions sometimes throw away italics entirely, and you’d never know unless you looked at the page scans. So finding page scans is essential.

    Below are some good sources for page scans:

    Each of those sources allows you to filter results by publication date, so make sure you select 1922 and earlier to ensure they’re in the US public domain.

    If you can’t find scans of your book at the above sources, and you’re using a Project Gutenberg transcription as source material, there’s a good chance that PGDP (the sister project of Project Gutenberg that does the actual transcriptions) has a copy of the scans they used accessible in their archives. You should only use the PGDP archives as a last resort; because their scans are not searchable, verifying typos becomes extremely time-consuming.

    Please keep the following important notes in mind when searching for page scans:

    • Make sure the scans you find are published in 1922 or earlier. You must verify the copyright page in the page scans before proceeding.

    • Often you’ll find different editions, published at different times by different publishers, for the same book. It’s worth the effort to quickly browse through each different one to get an idea of the kinds of changes the different publishers introduced. Maybe one edition is better than another!

    You’ll enter a link to the page scans you used in the content.opf metadata as a <dc:source> element.

  4. Create a Standard Ebooks epub skeleton

    An epub file is just a bunch of files arranged in a particular folder structure, then all zipped up. That means editing an epub file is as easy as editing a bunch of text files within a certain folder structure, then creating a zip file out of that folder.

    You can’t just arrange files willy-nilly, though—the epub standard expects certain files in certain places. So once you’ve picked a book to produce, create the basic epub skeleton in a working directory. The create-draft tool will create a basic Standard Ebooks epub folder structure, initialize a git repository within it, and prefill a few fields in content.opf (the file that contains the ebook’s metadata).

    1. With the --gutenberg-ebook-url option

      You can pass the create-draft tool the URL for the Project Gutenberg ebook, and the tool will try to download the ebook into ./src/epub/text/body.xhtml and prefill a lot of metadata for you:

      ~/tools/create-draft --author="Robert Louis Stevenson" --title="The Strange Case of Dr. Jekyll and Mr. Hyde" --gutenberg-ebook-url="https://www.gutenberg.org/ebooks/43"
      cd robert-louis-stevenson_the-strange-case-of-dr-jekyll-and-mr-hyde/

      Because Project Gutenberg ebooks are produced in different ways by different people, create-draft has to make some guesses and it might guess wrong. Make sure to carefully review the data it prefills into ./src/epub/text/body.xhtml, ./src/epub/text/colophon.xhtml, and ./src/epub/content.opf.

      In particular, make sure that the Project Gutenberg license is stripped from ./src/epub/text/body.xhtml, and that the original transcribers in ./src/epub/text/colophon.xhtml and ./src/epub/content.opf are presented correctly.

    2. Without the --gutenberg-ebook-url option

      If you prefer to do things by hand, that’s an option too.

      ~/tools/create-draft --author="Robert Louis Stevenson" --title="The Strange Case of Dr. Jekyll and Mr. Hyde"
      cd robert-louis-stevenson_the-strange-case-of-dr-jekyll-and-mr-hyde/

      Now that we have the skeleton up, we’ll download Gutenberg’s HTML file for Jekyll directly into the text/ folder and name it body.xhtml.

      wget -O src/epub/text/body.xhtml https://www.gutenberg.org/files/43/43-h/43-h.htm

      Many Gutenberg books were produced before UTF-8 became a standard, so we may have to convert to UTF-8 before we start work. First, check the encoding of the file we just downloaded. (Mac OS users, try file -I.)

      file -bi src/epub/text/body.xhtml

      The output is text/html; charset=iso-8859-1. That’s the wrong encoding!

      We can convert that to UTF-8 with iconv:

      iconv --from-code="ISO-8859-1" --to-code="UTF-8" < src/epub/text/body.xhtml > src/epub/text/tmp
      mv src/epub/text/tmp src/epub/text/body.xhtml
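
      The check-then-convert sequence above can be folded into one guarded helper. This is only a sketch: the function name to_utf8 is hypothetical, and it relies on the assumption that a UTF-8 to UTF-8 round trip through iconv fails on input that is not valid UTF-8. A legacy-encoded file that happens to be valid UTF-8 (pure ASCII, for instance) is left untouched.

```shell
#!/bin/sh
# Hypothetical helper: convert FILE to UTF-8 in place, but only when it is
# not already valid UTF-8.
# Usage: to_utf8 SOURCE_ENCODING FILE
to_utf8() {
    src_encoding=$1
    file=$2

    # iconv exits non-zero when the input is invalid in the "from" encoding,
    # so a UTF-8 to UTF-8 round trip doubles as a validity check.
    if iconv -f UTF-8 -t UTF-8 "$file" > /dev/null 2>&1; then
        echo "$file: already valid UTF-8, left unchanged"
        return 0
    fi

    # Write to a temporary file first so that a failed conversion
    # never clobbers the original.
    tmp="$file.tmp"
    if iconv -f "$src_encoding" -t UTF-8 "$file" > "$tmp"; then
        mv "$tmp" "$file"
        echo "$file: converted from $src_encoding to UTF-8"
    else
        rm -f "$tmp"
        echo "$file: conversion failed" >&2
        return 1
    fi
}

# Example (not run here):
#   to_utf8 ISO-8859-1 src/epub/text/body.xhtml
```

      Running the helper a second time is harmless, because the file passes the validity check and is skipped.
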
  5. Do a rough cleanup of the source text and perform the first commit

    If you inspect the folder we just created, you’ll see it looks something like this:

    [Figure: a tree view of a new Standard Ebooks draft folder]

    You can learn more about what the files in a basic Standard Ebooks source folder are all about before you continue.

    Now that we’ve got the source text, we have to do some very broad cleanup before we perform our first commit:

    • Remove the header markup and everything, including any Gutenberg text and the work title, up to the beginning of the actual public domain text. We’ll add our own header markup to replace what we’ve removed later.

      Jekyll doesn’t include front matter like an epigraph or introduction; if it did, that sort of stuff would be left in, since it’s part of the main text.

    • This edition of Jekyll includes a table of contents; remove that too. Standard Ebooks uses the ToC generated by the ereader, and doesn’t include one in the readable text.

    • Remove any footer text and markup after the public domain text ends. This includes the Gutenberg license—but don’t worry, we’ll credit Gutenberg in the colophon and metadata later. If you used the --gutenberg-ebook-url option with the create-draft tool, then it may have already stripped the license for you, and included some Gutenberg metadata already.

    Now our source file looks something like this:

    <h2> STORY OF THE DOOR </h2>
    <p> Mr. Utterson the lawyer was a man of a rugged countenance that was never lighted by a smile; cold, scanty and embarrassed in discourse; backward in
    <!--snip all the way to the end...-->
    proceed to seal up my confession, I bring the life of that unhappy Henry Jekyll to an end. </p>
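
    One way to double-check the cleanup before committing is to search the file for leftover Gutenberg text. The sketch below is an illustration only: the function name find_gutenberg_leftovers and the two search phrases are assumptions, and a clean result does not replace reading the top and bottom of the file yourself.

```shell
#!/bin/sh
# Hypothetical helper: print every line that still mentions Project Gutenberg,
# prefixed with its line number.
# Returns 0 when nothing is found, 1 when leftovers remain, 2 if unreadable.
find_gutenberg_leftovers() {
    [ -r "$1" ] || { echo "cannot read $1" >&2; return 2; }
    # -n prints line numbers, -i ignores case, -E enables the | alternation.
    if grep -n -i -E 'project gutenberg|gutenberg\.org' "$1"; then
        return 1
    fi
    return 0
}

# Example (not run here):
#   find_gutenberg_leftovers src/epub/text/body.xhtml
```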

    Now that we’ve removed all the cruft from the top and bottom of the file, we’re ready for our first commit.

    Please use the following commit message for consistency with the rest of our ebooks:

    git add -A
    git commit -m "Initial commit"
  6. Split the source text at logical divisions

    The file we downloaded contains the entire work. Jekyll is a short work, but for longer works it quickly becomes impractical to have the entire text in one file. Not only is it a pain to edit, but ereaders often have trouble with extremely large files.

    The next step is to split the file at logical places; that usually means at each chapter break. For works that contain their chapters in larger “parts,” each part division should also be its own file. For example, see Treasure Island.

    To split the work, we use the split-file tool. split-file takes a single file and breaks it into a new file every time it encounters the markup <!--se:split-->. The tool automatically includes basic header and footer markup in each split file.

    Notice that in our source file, each chapter is marked with an h2 tag. We can use that to our advantage and save ourselves the trouble of adding the <!--se:split--> markup by hand:

    perl -pi -e "s/<h2/<\!--se:split--><h2/g" src/epub/text/body.xhtml

    (Note the backslash before the ! for compatibility with some shells.)
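
    Before splitting, it can be worth confirming that every heading actually received a marker. The following is a minimal sketch with hypothetical function names (count_occurrences, check_split_markers); it simply compares how often each string occurs in the file.

```shell
#!/bin/sh
# Hypothetical helper: count occurrences of a fixed string in a file.
# grep -o prints each match on its own line, so several matches on one
# line are all counted.
count_occurrences() {
    { grep -o -F -- "$1" "$2" || true; } | wc -l | tr -d ' '
}

# Hypothetical helper: succeed only if the file has as many split markers
# as it has opening h2 tags.
check_split_markers() {
    headings=$(count_occurrences '<h2' "$1")
    markers=$(count_occurrences '<!--se:split-->' "$1")
    echo "h2 headings: $headings, split markers: $markers"
    [ "$headings" -eq "$markers" ]
}

# Example (not run here):
#   check_split_markers src/epub/text/body.xhtml
```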

    Now that we’ve added our markers, we split the file. split-file puts the results in our current directory and conveniently names them by chapter number.

    ~/tools/split-file src/epub/text/body.xhtml
    mv chapter* src/epub/text/

    Once we’re happy that the source file has been split correctly, we can remove it.

    rm src/epub/text/body.xhtml
  7. Clean up the source text

    If you open up any of the chapter files we now have in the src/epub/text/ folder, you’ll notice that the code isn’t very clean. Paragraphs are split over multiple lines, indentation is all wrong, and so on.

    If you try opening a chapter in a web browser, you’ll also likely get an error if the chapter includes any named HTML entities, like &mdash;. This is because Gutenberg uses plain HTML, which defines those entities, but epub uses XHTML, which as XML only predefines &amp;, &lt;, &gt;, &quot;, and &apos;.

    We can fix all of this pretty quickly using the clean tool. clean accepts as its argument the root of a Standard Ebook directory, and with the --single-lines option it’ll remove the hard line wrapping that Gutenberg is fond of. We’re already in the root, so we pass it . (the current directory).
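
    To see which entities a set of chapters contains, a quick tally can help. This sketch uses a hypothetical function name, list_named_entities, and assumes that only the five entities XML predefines (amp, lt, gt, quot, apos) are safe to leave alone.

```shell
#!/bin/sh
# Hypothetical helper: tally named entities that XML does not predefine.
# Usage: list_named_entities FILE...
list_named_entities() {
    # First grep: extract every named entity, one per line (-h drops file names).
    # Second grep: drop whole-line matches (-x) of the five XML entities.
    grep -o -h -E '&[A-Za-z][A-Za-z0-9]*;' "$@" \
        | grep -v -x -E '&(amp|lt|gt|quot|apos);' \
        | sort | uniq -c | sort -rn
}

# Example (not run here):
#   list_named_entities src/epub/text/chapter*
```

    Numeric references such as &#8212; are deliberately not matched, since they are valid in XML.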

    ~/tools/clean --single-lines .

    Things look much better now, but we’re not perfect yet. If you open a chapter you’ll notice that the <p> and <h2> tags have a space between the tag and the text. We can clean that up with a few perl commands.

    perl -pi -e "s/<(p|h2)>\s+/<\1>/g" src/epub/text/chapter*
    perl -pi -e "s/\s+<\/(p|h2)>/<\/\1>/g" src/epub/text/chapter*

    Finally, we have to do a quick runthrough of each file by hand to cut out any lingering Gutenberg markup that doesn’t belong. In Jekyll, notice that each chapter ends with some extra empty divs and ps. These were used by the original transcriber to put spaces between the chapters, and they’re not necessary anymore, so remove them before continuing.

    Now our chapter 1 source looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: http://standardebooks.org/vocab/1.0" xml:lang="en-US">
        <head>
            <title>Chapter 1</title>
            <link href="../css/core.css" rel="stylesheet" type="text/css"/>
            <link href="../css/local.css" rel="stylesheet" type="text/css"/>
        </head>
        <body epub:type="bodymatter z3998:fiction">
            <section id="chapter-1" epub:type="chapter">
                <h2>STORY OF THE DOOR</h2>
                <p>Mr. Utterson the lawyer was a man of a rugged countenance...</p>
                <!--snip all the way to the end...-->
                <p>"With all my heart," said the lawyer. "I shake hands on that, Richard."</p>
            </section>
        </body>
    </html>

    If you look carefully, you’ll notice that the <html> tag has the xml:lang="en-US" attribute, even though our source text uses British spelling! We have to change the xml:lang attribute for the source files to match the actual language, which in this case is en-GB. Let’s do that now:

    perl -pi -e "s|en-US|en-GB|g" src/epub/text/chapter*

    Note that we don’t change the language for the metadata or front/back matter files, like content.opf, titlepage.xhtml, or colophon.xhtml. Those must always be in American spelling, so they’ll always have the en-US language tag.
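
    A quick listing of the language declared in each file makes it easy to confirm that only the chapter files changed. This is a sketch with a hypothetical function name, show_languages, and it assumes the whole <html> opening tag sits on a single line, as in the sample above.

```shell
#!/bin/sh
# Hypothetical helper: print each file name and the xml:lang value found on
# its <html> tag, separated by a tab.
# Usage: show_languages FILE...
show_languages() {
    for f in "$@"; do
        # Capture the attribute value from the first matching <html ...> line.
        lang=$(sed -n 's/.*<html[^>]*xml:lang="\([^"]*\)".*/\1/p' "$f" | head -n 1)
        printf '%s\t%s\n' "$f" "${lang:-(none found)}"
    done
}

# Example (not run here):
#   show_languages src/epub/text/*.xhtml
```

    For Jekyll, the chapter files should report en-GB, while files like titlepage.xhtml and colophon.xhtml should still report en-US.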

  8. Typogrify the source text and perform the second commit

    Now that we have a clean starting point, we can start getting the real work done. The typogrify tool can do a lot of the heavy lifting necessary to bring an ebook up to Standard Ebooks typography standards.

    Like clean, typogrify accepts as its argument the root of a Standard Ebook directory.

    ~/tools/typogrify .

    Among other things, typogrify does the following:

    • Converts straight quotes to curly quotes;

    • Adds no-break spaces where appropriate for some common abbreviations;

    • Normalizes ellipses;

    • Normalizes spacing in em-, en-, and double-em-dashes, as well as between nested quotation marks, and adds word joiners.

    You can run typogrify as many times as you want on a source directory; it should always produce the same result, regardless of what state the source directory was in when you ran it.
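
    The claim that repeated runs give the same result can be spot-checked by running a command twice and comparing checksums of the files in between. The sketch below is generic and makes no assumptions about typogrify itself; the function names tree_checksum and is_idempotent are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: one checksum that covers every file under a directory.
tree_checksum() {
    # cksum prints "checksum size name" per file; sorting makes the
    # combined result independent of the order find visits files in.
    find "$1" -type f -exec cksum {} + | sort | cksum
}

# Hypothetical helper: run COMMAND twice and succeed only if the second run
# left every file under DIR unchanged.
# Usage: is_idempotent DIR COMMAND [ARGS...]
is_idempotent() {
    target=$1
    shift
    "$@" || return 2
    first=$(tree_checksum "$target")
    "$@" || return 2
    second=$(tree_checksum "$target")
    [ "$first" = "$second" ]
}

# Example (not run here):
#   is_idempotent src/epub/text ~/tools/typogrify .
```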

    While typogrify does a lot of work for you, each ebook is totally different so there’s almost always more work to do that can only be done by hand. In Jekyll, you’ll notice that the chapter titles are in all caps. The SE standard requires chapter titles to be in title case, and the titlecase tool can do that for us.

    titlecase accepts a string as its argument, and outputs the string in title case. Many text editors allow you to configure external macros—perfect for creating a keyboard shortcut to run titlecase on selected text.

    Typography checklist

    There are many things that typogrify isn’t well suited to do automatically. Check our complete typography manual to see exactly how to format the work. Below is a brief, but incomplete, list of common issues that arise in ebooks:

    • Typography rules for coordinates. Use the prime and double prime glyphs for coordinates. These regexes help match and replace coordinates: |([0-9]+)’|\1′|g, |([0-9]+)”|\1″|g

    • Typography rules for ampersands in names. This regex helps match candidates: [a-zA-Z]\.?\s*&\s*[a-zA-Z]

    • Typography rules for text in all caps. Text in all caps is almost never correct, and should be converted to lowercase and wrapped in <em> (for spoken emphasis), <strong> (for extreme spoken emphasis), or <b> (for unsemantic small caps, like in storefront signs). This regex helps find candidates: [A-Z]{3,}

    • Sometimes typogrify doesn’t close quotation marks near em-dashes correctly. Try to find such instances with this regex: —[’”][^<\s]

    • Two-em dashes should be used for elision.

    • Commas and periods should generally be inside quotation marks, not outside. This regex helps find them: [’”][,.]
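
    Several of the regexes in this checklist can be run in one pass with grep. This is a sketch only: the function name scan_typography is hypothetical, the character classes are rewritten as alternations so that the curly quotes match regardless of locale, and every hit is a candidate to review, not an error.

```shell
#!/bin/sh
# Hypothetical helper: list candidate typography problems with line numbers.
# Usage: scan_typography FILE...
scan_typography() {
    echo "== Runs of three or more capital letters =="
    grep -n -E '[A-Z]{3,}' "$@" || true

    echo "== Closing quote followed by a comma or period =="
    grep -n -E '(’|”)[,.]' "$@" || true

    echo "== Em-dash and closing quote not followed by a space or tag =="
    grep -n -E '—(’|”)[^<[:space:]]' "$@" || true
}

# Example (not run here):
#   scan_typography src/epub/text/chapter*
```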

    The second commit

    Once you’ve run typogrify and you’ve searched the work for the common issues above, you can perform your second commit.

    git add -A
    git commit -m "Typogrify"
  9. Convert footnotes to endnotes and add a list of illustrations

    Works often include footnotes, either added by an annotator or as part of the work itself. Since ebooks don’t have a concept of a “page,” there’s no place for footnotes to go. Instead, we convert footnotes to a single endnotes file, which will provide popup references in the final epub.

    The endnotes file and the format for endnote links are standardized in the semantics manual.

    If a work has illustrations besides the cover and title pages, we include a “list of illustrations” at the end of the book, after the endnotes but before the colophon. The LoI file is also standardized in the semantics manual.

    Jekyll doesn’t have any footnotes, endnotes, or illustrations, so we skip this step.

  10. Converting British quotation to American quotation

    If the work you’re producing uses British quotation style (single quotes for dialog versus double quotes in American), we have to convert it to American style. We use American style in part because it’s easier to programmatically convert from American to British than it is to convert the other way around. Skip this step if your work is already in American style.

    Standard Ebooks has a tool called british2american that helps with the conversion. Your work must already be typogrified (the previous step in this guide) for the script to work.

    ~/tools/british2american .

    While british2american tries its best, thanks to the quirkiness of English punctuation rules it’ll invariably mess some stuff up. Proofreading is required after running the conversion.

    After you’ve run the conversion, do another commit.

    git add -A
    git commit -m "Convert from British-style quotation to American style"
  11. Add semantics

    Part of the Standard Ebooks project is adding meaningful semantics wherever possible in the text. The semanticate tool does a little of that for us—for example, for some common abbreviations—but much of it has to be done by hand.

    Adding semantics means two things:

    1. Using meaningful tags to mark up the work: <em> when conveying emphatic speech instead of <i>, <abbr> to wrap abbreviations, <section> to mark structural divisions, using the xml:lang attribute to specify the language of a word or passage, and so on.

    2. Using the epub3 semantic inflection language to add deeper meaning to tags.

      Currently we use a mix of epub3 structural semantics, z3998 structural semantics for when the epub3 vocabulary isn’t enough, and our own SE semantics for when z3998 isn’t enough.

    Use the semanticate tool to do some common cases for you:

    ~/tools/semanticate .

    semanticate tries its best to correctly add semantics, but sometimes it’s wrong. For that reason you should review the changes it made before accepting them:

    git difftool

    Beyond that, adding semantics is mostly a by-hand process. See our semantics manual for a detailed list of the kinds of semantics we expect in a Standard Ebook.

    Here’s a short list of some of the more common semantic issues you’ll encounter:

    After you’ve added semantics according to the semantics manual, do another commit.

    git add -A
    git commit -m "Semanticate"
  12. Modernize spelling and hyphenation

    Many older works use outdated spelling and hyphenation that would distract a modern reader (for example, “to-night” instead of “tonight”). modernize-spelling is a tool to automatically remove hyphens from words that used to be hyphenated compounds, but aren’t anymore in modern English spelling.

    Do run this tool on prose. Don’t run this tool on poetry.

    ~/tools/modernize-spelling .

    After you run the tool, you must check what the tool did to confirm that each removed hyphen is correct. Sometimes the tool will remove a hyphen that needs to be included for clarity, or one that changes the meaning of the word, or it may result in a word that just doesn’t seem right. Re-introducing a hyphen is OK in these cases.

    Here’s a real-world example of where modernize-spelling made the wrong choice: In The Picture of Dorian Gray chapter 11, Oscar Wilde writes:

    He possessed a gorgeous cope of crimson silk and gold-thread damask…

    modernize-spelling would remove the hyphen in gold-thread so that it reads goldthread. Goldthread is an actual word, which is why it’s in our dictionary and why the script makes the replacement, but it’s the name of a type of flower, not a golden fabric thread! In this case, modernize-spelling made an incorrect replacement, and we have to change it back.

    git provides a handy way for us to visualize these differences:

    git difftool
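
    The diff shows what changed after the fact. As a complement, a list of the hyphenated compounds in the text, taken before running modernize-spelling, gives a checklist of words to look for in that diff. The sketch below uses a hypothetical function name, list_hyphenated, and strips tags first on the assumption that no tag spans more than one line.

```shell
#!/bin/sh
# Hypothetical helper: tally hyphenated compounds in the readable text.
# Usage: list_hyphenated FILE...
list_hyphenated() {
    # Remove markup so attribute values such as xml:lang="en-GB" are ignored,
    # then extract words joined by one or more hyphens, lowercase them,
    # and count how often each one occurs.
    sed 's/<[^>]*>//g' "$@" \
        | grep -o -i -E '[a-z]+(-[a-z]+)+' \
        | tr 'A-Z' 'a-z' \
        | sort | uniq -c | sort -rn
}

# Example (not run here):
#   list_hyphenated src/epub/text/chapter*
```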

    After you’ve reviewed the changes that the tool made, do another commit. This commit is important, because it gives purists an avenue to revert modernizing changes to the original text.

    Note how we preface this commit with "[Editorial]". Any change you make to the source text that can be considered a modernization or editorial change should be prefaced like this, so that the git history can be easily searched by people looking to revert changes.

    git add -A
    git commit -m "[Editorial] Modernize hyphenation and spelling"
  13. Modernize spacing in select words

    Over time, spelling of certain common two-word phrases has evolved into a single word. For example, “someone” used to be the two-word phrase “some one,” which would read awkwardly to modern readers. This is our chance to modernize such phrases.

    Note that we use the interactive-sr tool to perform an interactive search and replace, instead of doing a global, non-interactive search and replace. This is because some phrases caught by the regular expression should not be changed, depending on context. For example, "some one" in the following snippet from Anton Chekhov’s short fiction should not be corrected:

    He wanted to think of some one part of nature as yet untouched...

    Use the following regular expression invocations to correct a certain set of such phrases:

    ~/tools/interactive-sr "/\v([Ss])ome one/\1omeone/" src/epub/text/*
    git add -A
    git commit -m "[Editorial] some one -> someone"

    ~/tools/interactive-sr "/\v(<[Aa])ny one/\1nyone/" src/epub/text/*
    git add -A
    git commit -m "[Editorial] any one -> anyone"

    ~/tools/interactive-sr "/\v([Ee])very one(\s+of)@\!/\1veryone/" src/epub/text/*
    git add -A
    git commit -m "[Editorial] every one -> everyone"

    ~/tools/interactive-sr "/\v([Ee])very thing/\1verything/" src/epub/text/*
    git add -A
    git commit -m "[Editorial] every thing -> everything"

    ~/tools/interactive-sr "/\v(<[Aa])ny thing/\1nything/" src/epub/text/*
    git add -A
    git commit -m "[Editorial] any thing -> anything"

    ~/tools/interactive-sr "/\v([Ff])or ever(>)/\1orever\2/" src/epub/text/*
    git add -A
    git commit -m "[Editorial] for ever -> forever"

    ~/tools/interactive-sr "/\v(in\s+)@<\!(<[Aa])ny way/\2nyway/" src/epub/text/*
    git add -A
    git commit -m "[Editorial] any way -> anyway"

    ~/tools/interactive-sr "/\v([Yy])our self/\1ourself/" src/epub/text/*
    git add -A
    git commit -m "[Editorial] your self -> yourself"

    ~/tools/interactive-sr "/\v([Mm])ean time/\1eantime/" src/epub/text/*
    git add -A
    git commit -m "[Editorial] mean time -> meantime"

    ~/tools/interactive-sr "/\v([Aa])ny how/\1nyhow/" src/epub/text/*
    git add -A
    git commit -m "[Editorial] any how -> anyhow"

    ~/tools/interactive-sr "/\v([Aa])ny body/\1nybody/" src/epub/text/*
    git add -A
    git commit -m "[Editorial] any body -> anybody"

    ~/tools/interactive-sr "/\v([Ee])very body/\1verybody/" src/epub/text/*
    git add -A
    git commit -m "[Editorial] every body -> everybody"
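
    Before starting an interactive session, it can help to preview how many candidates a phrase has and where they are. The following sketch uses plain grep and a hypothetical function name, preview_phrase; it is looser than the patterns above and does not reproduce their lookarounds, so it only gives a rough count.

```shell
#!/bin/sh
# Hypothetical helper: show lines containing a two-word phrase as whole words,
# ignoring case.
# Usage: preview_phrase "some one" FILE...
preview_phrase() {
    phrase=$1
    shift
    # The phrase must be bounded by a non-letter (or line start/end) on both
    # sides, so "handsome ones" does not match "some one".
    grep -n -i -E "(^|[^[:alpha:]])${phrase}([^[:alpha:]]|\$)" "$@"
}

# Example (not run here):
#   preview_phrase "some one" src/epub/text/*
```

    Note that the Chekhov sentence quoted earlier would appear in this preview too, which is exactly why each replacement still needs a human decision.
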
  14. Create the cover image

    Cover images for Standard Ebooks books have a standardized layout. The bulk of the work you’ll be doing is locating a suitable public domain painting to use. See our complete art manual for details on assembling a cover image.

    As you search for an image, keep the following in mind:

    • Cover images must be in the public domain. Thanks to quirks in copyright law, this is harder to decide for paintings than it is for published writing. In general, Wikipedia is a good starting point for deciding if a work is in the public domain, but very careful research is required to confirm that status.

    • Find the largest possible cover image you can. Since the final image is 1400 × 2100, having to resize a small image will greatly reduce the quality of the final cover.

    • The image you pick should be a “fine art” oil painting so that all Standard Ebooks have a consistent cover style. This is actually easier than you think, because it turns out most public domain artwork is from the era of fine art.

    • You must provide proof of public domain status to the SE Editor-in-Chief in the form of a page scan of the painting from a 1922-or-older book, and the Editor-in-Chief must approve your selection before you can commit it to your repository.

    • The Standard Ebooks lead has the final say on the cover image you pick, and it may be rejected for, among other things, poor public domain status research, being too low resolution, or not fitting in with the “fine art” style.

    What can we use for Jekyll? In 1885 Albert Edelfelt painted a portrait of Louis Pasteur in a laboratory. A crop of the lab equipment would be a good way to represent Dr. Jekyll’s lab.

    The cover file itself, cover.svg, is easy to edit. It automatically links to cover.jpg. All you have to do is open cover.svg with a text editor and edit the title and author. Make sure you have the League Spartan font installed on your system!

    After we’re done with the cover, we’ll have three files in ./images/:

    • cover.source.jpg is the raw image file we used for the cover. We keep it in case we want to make adjustments later. For Jekyll, this would be the raw Pasteur portrait downloaded from Wikipedia.

    • cover.jpg is the scaled cover image that cover.svg links to. This file is exactly 1400 × 2100. For Jekyll, this is a crop of cover.source.jpg that includes just the lab equipment, and resized up to our target resolution.

    • cover.svg is the completed cover image with the title and author. The build-images tool will take cover.svg, embed cover.jpg, convert the text to paths, and place the result in ./src/epub/images/ for inclusion in the final epub.
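
    Since 1400 × 2100 is a 2:3 ratio, a little arithmetic tells you the largest crop a source image can provide and whether it would need upscaling. This is a sketch with a hypothetical function name, cover_crop; how you obtain the source dimensions is up to you.

```shell
#!/bin/sh
# Hypothetical helper: given a source image's width and height in pixels,
# print the largest 2:3 (width:height) crop it can provide.
# Usage: cover_crop WIDTH HEIGHT
cover_crop() {
    width=$1
    height=$2

    if [ $((width * 3)) -ge $((height * 2)) ]; then
        # Image is wider than 2:3, so the height limits the crop.
        crop_h=$height
        crop_w=$((height * 2 / 3))
    else
        # Image is narrower than 2:3, so the width limits the crop.
        crop_w=$width
        crop_h=$((width * 3 / 2))
    fi

    if [ "$crop_w" -ge 1400 ] && [ "$crop_h" -ge 2100 ]; then
        note="large enough; scale down to 1400x2100"
    else
        note="too small; would have to be scaled up to 1400x2100"
    fi
    echo "${crop_w}x${crop_h} ($note)"
}

# Example (not run here):
#   cover_crop 3000 3000
```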

  15. Create the titlepage image, build both the cover and titlepage, and commit

    Titlepage images for Standard Ebooks books are also standardized. See our art manual for details.

    The create-draft tool already created a completed titlepage for you. If the way it arranged the lines doesn’t look great, you can always edit the titlepage to make the arrangement of words on each line more aesthetically pleasing. Don’t use a vector editing program like Inkscape to edit it. Instead, open it up in your favorite text editor and type the values in directly.

    The source images for both the cover and the titlepage are kept in ./images/. Since the source images refer to installed fonts, and since we can’t include those fonts in our final ebook without having to include a license, we have to convert that text to paths for final distribution. The build-images tool does just that.

    ~/tools/build-images .

    This tool takes both ./images/cover.svg and ./images/titlepage.svg, converts text to paths, and embeds the cover jpg. The output goes to ./src/epub/images/.

    Once we’ve built the images successfully, perform a commit.

    git add -A
    git commit -m "Add cover and titlepage images"
  16. Complete the table of contents

    The table of contents is a structured document that should let the reader easily navigate the book. In a Standard Ebook, it’s stored outside of the readable text directory with the assumption that the reading system will parse it and display a navigable representation for the user.

    For now, you can copy and paste a ToC file from a different Standard Ebook to get an idea of what they should look like. Once you’re done, commit.

    git add -A
    git commit -m "Add ToC"
  17. Complete content.opf

    content.opf is the file that contains the ebook metadata like author, title, description, and reading order. Most of it will be filling in that basic information, and including links to various resources related to the text.

    The content.opf is standardized. Please see our extensive Metadata Manual for details on how to fill out content.opf.

    As you complete the metadata, you’ll have to order the spine and the manifest in this file. Fortunately, Standard Ebooks has a tool for that too: print-manifest-and-spine. Run this on our source directory and, as you can guess, it’ll print out the <manifest> and <spine> tags for this work.

    ~/tools/print-manifest-and-spine .
    <manifest>
        <item href="css/core.css" id="core.css" media-type="text/css"/>
        <item href="css/local.css" id="local.css" media-type="text/css"/>
        <item href="images/cover.svg" id="cover.svg" media-type="image/svg+xml" properties="cover-image"/>
        <item href="images/logo.svg" id="logo.svg" media-type="image/svg+xml"/>
        <item href="images/titlepage.svg" id="titlepage.svg" media-type="image/svg+xml"/>
        <!--snip all the way to the end..-->
        <item href="text/colophon.xhtml" id="colophon.xhtml" media-type="application/xhtml+xml"/>
        <item href="text/titlepage.xhtml" id="titlepage.xhtml" media-type="application/xhtml+xml" properties="svg"/>
        <item href="text/unlicense.xhtml" id="unlicense.xhtml" media-type="application/xhtml+xml"/>
    </manifest>
    <spine>
        <itemref idref="titlepage.xhtml"/>
        <!--snip all the way to the end..-->
        <itemref idref="colophon.xhtml"/>
        <itemref idref="unlicense.xhtml"/>
    </spine>

    The manifest is already in the correct order and doesn’t need to be edited. The spine, however, will have to be reordered to be in the correct reading order. Once we’ve done that, paste it into content.opf and commit.

    git add -A
    git commit -m "Complete content.opf"
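
    After hand-editing the spine, a simple cross-check can catch typos in the idref values. This sketch uses a hypothetical function name, check_spine, and assumes each <item> and <itemref> element sits on its own line, as in the tool output shown above.

```shell
#!/bin/sh
# Hypothetical helper: report spine entries whose idref has no matching
# manifest item id. Returns 0 when every entry resolves, 1 otherwise.
# Usage: check_spine path/to/content.opf
check_spine() {
    opf=$1
    # Collect manifest ids ("<item " with a trailing space skips <itemref>).
    ids=$(sed -n 's/.*<item [^>]*id="\([^"]*\)".*/\1/p' "$opf")
    # Collect spine idrefs in document order.
    refs=$(sed -n 's/.*<itemref [^>]*idref="\([^"]*\)".*/\1/p' "$opf")

    status=0
    for ref in $refs; do
        # -x and -F require an exact, literal, whole-line match.
        if ! printf '%s\n' "$ids" | grep -q -x -F -- "$ref"; then
            echo "spine entry without a manifest item: $ref"
            status=1
        fi
    done
    return $status
}

# Example (not run here):
#   check_spine src/epub/content.opf
```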
  18. Complete the colophon

    The create-draft tool put a skeleton colophon.xhtml file in the ./src/epub/text/ folder. Now that we have the cover image and artist, we can fill out the various fields there. Make sure to credit the original transcribers of the text (generally we assume them to be whoever’s name is on the file we download from Gutenberg) and to include a link back to the Gutenberg text we used, along with a link to any scans we used (from archive.org or hathitrust.org, for example).

    You can also include your own name as the producer of this Standard Ebooks edition. Besides that, the colophon is standardized; don’t get too creative with it.

    The release and updated dates should be the same for the first release, and they should match the dates in content.opf. For now, leave them unchanged, as the prepare-release tool will automatically fill them in for you as we’ll describe later in this guide.

    git add -A
    git commit -m "Complete the colophon"
  19. Complete the imprint

    There’s also a skeleton imprint.xhtml file in the ./src/epub/text/ folder. All you’ll have to change here is the links to the transcription and page scans you used.

  20. Clean and lint before building

    Before you build the final ebook for you to proofread, it’s a good idea to check the ebook for some common problems you might run into during production.

    First, run clean one more time to both clean up the source files, and to alert you if there are XHTML parsing errors. Even though we ran the clean tool before, it’s likely that in the course of production the markup formatting drifted into a less-than-perfect state. Remember you can run clean as many times as you want; it should always produce the same output.

    If you’re using a Mac, and thus the badly-behaved Finder program, you may find that it has carelessly polluted your work directory with useless .DS_Store files. Before continuing, you should find a better file manager program, then delete all of that litter with the following command:

    find . -name ".DS_Store" -type f -delete
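    Since -delete is irreversible, it can be reassuring to preview the matches first. The following is a minimal sketch, not part of the SE tools; the function names list_ds_store and delete_ds_store are made up for illustration, and both take the directory to search as an optional argument (defaulting to the current directory).

```shell
# Dry run: print every .DS_Store file under the given directory without deleting it.
list_ds_store() {
    find "${1:-.}" -name ".DS_Store" -type f -print
}

# Delete the same set of files once the list above looks right.
delete_ds_store() {
    find "${1:-.}" -name ".DS_Store" -type f -delete
}
```

    Running list_ds_store first and delete_ds_store second uses the same match criteria as the command above, so nothing is removed that was not shown in the preview.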

    Next, run the lint tool. If your ebook has any problems, you’ll see some output listing them. If everything’s OK, then lint will complete silently.

    ~/tools/clean .
    ~/tools/lint .
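
    Because lint is silent when everything is OK, one way to make a clean run explicit is to treat any output as a failure. The sketch below assumes that lint reports problems by printing them; the helper name fails_if_noisy is hypothetical and not part of the SE tools.

```shell
# Hypothetical helper: run a command and fail if it prints anything.
# Assumption: the checked tool is silent on success and prints its problems otherwise.
fails_if_noisy() {
    noisy_output="$("$@" 2>&1)"
    if [ -n "$noisy_output" ]; then
        # Show what the tool reported, then signal failure to the caller.
        printf '%s\n' "$noisy_output"
        return 1
    fi
    return 0
}

# Possible usage, with the tools installed in ~/tools/ as elsewhere in this guide:
#   ~/tools/clean . && fails_if_noisy ~/tools/lint . && echo "Ready to build"
```

    The wrapper only inspects output, so it makes no assumptions about the exit codes lint itself returns.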
  21. Build and proofread, proofread, proofread!

    At this point we’re just about ready to build our proofreading draft! The build tool does this for us. We’ll run it with the --check flag to make sure the epub we produced is valid, and with the --kindle and --kobo flags to build files for Kindles and Kobos too. If you won’t be using a Kindle or Kobo, you can omit those flags.

    ~/tools/build --output-dir=$HOME/dist/ --kindle --kobo --check .

    If there are no errors, we’ll see five files in the brand-new ~/dist/ folder in our home directory:

    • the-strange-case-of-dr-jekyll-and-mr-hyde.epub3 is a pure epub3 file—basically just a zipped up version of our source. Unfortunately most ebook readers don’t fully support all of epub3’s capabilities yet, so we’re more interested in…

    • the-strange-case-of-dr-jekyll-and-mr-hyde.epub, the epub2 version of our ebook. If you don’t have a Kindle, this is the file you’ll be using to proofread.

    • the-strange-case-of-dr-jekyll-and-mr-hyde.kepub.epub is the Kobo version of our ebook. You can copy this to a Kobo using a USB cable.

    • the-strange-case-of-dr-jekyll-and-mr-hyde.azw3 is the Kindle version of our ebook. You can copy this to a Kindle using a USB cable.

    • thumbnail_xxxx_EBOK_portrait.jpg is a thumbnail file you can copy to your Kindle to have the cover art appear in your reader. A bug in Amazon’s software prevents the Kindle from reading cover images in side-loaded files; contact Amazon to complain.
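
    A quick way to confirm that the build produced everything listed above is to check the output folder for each file. This is a rough sketch under stated assumptions: missing_build_outputs is a hypothetical helper, it assumes every output shares the same file name stem, and because the thumbnail name contains an identifier we can’t predict, the thumbnail is matched by pattern only.

```shell
# Hypothetical helper: print the name of each expected build output that is missing.
# $1: output directory, $2: ebook file name stem (both supplied by the caller).
missing_build_outputs() {
    dist="$1"
    stem="$2"
    for ext in epub3 epub kepub.epub azw3; do
        [ -f "$dist/$stem.$ext" ] || printf '%s\n' "$stem.$ext"
    done
    # The thumbnail name includes an unknown identifier, so expand a glob instead.
    # If nothing matches, the pattern stays literal and the -f test fails.
    set -- "$dist"/thumbnail_*_EBOK_portrait.jpg
    [ -f "$1" ] || printf '%s\n' "thumbnail_*_EBOK_portrait.jpg"
}

# Possible usage after the build command above:
#   missing_build_outputs "$HOME/dist" "the-strange-case-of-dr-jekyll-and-mr-hyde"
```

    No output means all five files are present. If you omitted --kindle or --kobo, the corresponding files will be reported as missing, which is expected.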

    This is the step where you read the ebook and make adjustments to the text so that it conforms to our typography manual.

    All Standard Ebooks productions must be proofread at this stage to confirm that there are no typos, formatting errors, or typography errors. It’s extremely common for transcriptions sourced from Gutenberg to have various typos and formatting errors (like missing italics), and it’s also not uncommon for one of Standard Ebooks’ tools to make the wrong guess about things like a closing quotation mark somewhere. As you proofread, it’s extremely handy to have a print copy of the book with you. For famous books that might just be a trip to your local library. For rarer books, or for those without a library nearby, several sites provide free digital scans of public domain writing, such as archive.org and hathitrust.org.

    If you end up using scans from one of these sources, you must mention it in the ebook’s colophon and as a <dc:source> item in content.opf.
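
    To double-check which sources are recorded in the metadata, you could list the <dc:source> entries directly. This is a minimal sketch, assuming each <dc:source> element sits on its own line in content.opf; list_dc_sources is a hypothetical helper that takes the path to the file as its argument.

```shell
# Hypothetical helper: print the value of every <dc:source> element in an OPF file.
# Assumption: each element is written on a single line, one element per line.
list_dc_sources() {
    sed -n 's|.*<dc:source>\(.*\)</dc:source>.*|\1|p' "$1"
}

# Possible usage (adjust the path to wherever content.opf lives in your repository):
#   list_dc_sources path/to/content.opf
```

    Each line of output should correspond to a transcription or scan link that is also credited in the colophon.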

  22. Initial publication

    Now that we’ve proofread the work and corrected any errors we’ve found, we’re ready to release the finished ebook!

    It’s a good idea to run typogrify and clean one more time before releasing. Make sure to review the changes with git difftool before accepting them—typogrify is usually right, but not always!

    • If you’re submitting your ebook to Standard Ebooks for review:

      Don’t run prepare-release on an ebook you’re submitting for review!

      Contact the mailing list with a link to your GitHub repository to let them know you’re finished. A reviewer will review your production and work with you to fix any issues. They’ll then release the ebook for you.

    • If you’re producing this ebook for yourself, not for release at Standard Ebooks:

      Complete the initial publication by adding a release date, modification date, and final word count to content.opf and colophon.xhtml. The prepare-release tool does all of that for us.

      ~/tools/prepare-release .

      With that done, we commit again using a commit message of “Initial publication” to signify that we’re all done with production, and now expect only proofreading corrections to be committed. (This may not actually be the case in reality, but it’s still a nice milestone to have.)

      git add -A
      git commit -m "Initial publication"

    Finally, build everything again.

    ~/tools/build --output-dir=$HOME/dist/ --kindle --kobo --check .

    If the build completed successfully, congratulations! You’ve just finished producing a Standard Ebook!