Standard Ebooks

Structure and Semantics Manual

General do’s and don’ts

  • Don’t wrap source code to a certain column width. This makes it difficult to search through source code for a particular sentence, because a line break could be anywhere. Instead, use the clean tool in the Standard Ebooks toolset to format XHTML source code, and the word wrap feature in your text editor to make long paragraphs readable in the source code.

  • You have the full vocabulary of HTML5 at your disposal, so use semantically-appropriate elements whenever possible. Don’t settle for a <div> when a <blockquote> or <section> would be more descriptive.

    Incorrect: <div class="quotation">

    Correct: <blockquote>

  • Don’t style elements with inline CSS. Prefer clever CSS selectors first, then prefer CSS classes.

  • When styling with CSS classes, use semantic class names. Name classes based on what they’re styling, not based on a description of how their style looks.

    Incorrect: <div class="small-caps">

    Correct: <blockquote class="inscription">

  • Don’t use <pre> elements to format text requiring tricky spacing, like poetry. There should never be a <pre> element in a Standard Ebook. See the poetry section for patterns to use to format poetry. Anything can be formatted with CSS if you give it a little thought!

Semantic inflection

The epub spec allows for semantic inflection, which is a way of adding pertinent semantic metadata to certain elements. For example, you may want to convey that the contents of a certain <section> are actually a part of a chapter. You would do that by using the epub:type attribute:

<section epub:type="chapter">...</section>

The epub spec includes a list of supported keywords that you can use in the epub:type attribute. Many of these keywords apply to content divisions, like chapter breaks, prefaces, introductions, and so on.

An additional spec, the sexily-named z39.98-2012 Structural Semantics vocabulary, gives us a more robust vocabulary for adding semantic inflection. This vocabulary includes ways of marking fiction vs. non-fiction, letters, poetry, and so on. All Standard Ebooks include a reference to this vocabulary by default, so you should use it if the regular epub vocabulary isn’t enough.

Finally, Standard Ebooks has its own vocabulary to add even more finely grained semantics. For example, names of ships are italicized with the <i> element. But to convey that the otherwise-meaningless <i> element contains the name of a ship, we would add the Standard Ebooks semantic inflection of se:name.vessel.ship:

They set sail on the <abbr class="initialism">HMS</abbr> <i epub:type="se:name.vessel.ship">Bounty</i>.

In a perfect world, Standard Ebooks wouldn’t have to maintain its own list of semantic vocabulary. We’re actively looking for a suitable replacement—if you have a suggestion, get in touch!

Semantic inflection in Standard Ebooks

Part of the Standard Ebooks mission is to add as much semantic information to ebooks as possible. To that end, use semantic inflection liberally and in detail. Since we have so many vocabulary options to use, use them in this order of preference:

  1. The built-in epub vocabulary. If what you’re trying to mark up is here, use this first.

  2. The z3998 vocabulary. If something isn’t included in the regular epub vocabulary, check here next.

  3. The Standard Ebooks vocabulary. If neither the regular epub vocabulary nor the z3998 vocabulary has the keyword you’re looking for, check our own vocabulary. You can also suggest additions to this vocabulary.
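
As a rough illustration of that order of preference, the fragment below uses one keyword from each vocabulary. The keywords are ones that appear elsewhere in this manual; the id value, the placeholder song lines, and the way the elements are combined are assumptions made up for this sketch, not taken from a real production.

```html
<section id="chapter-1" epub:type="chapter">
	<!-- "chapter" comes from the built-in epub vocabulary, the first choice. -->
	<p>They set sail on the <abbr class="initialism">HMS</abbr> <i epub:type="se:name.vessel.ship">Bounty</i>.</p>
	<!-- "se:name.vessel.ship" comes from the Standard Ebooks vocabulary, the last choice. -->
	<blockquote epub:type="z3998:song">
		<!-- "z3998:song" comes from the z3998 vocabulary, the second choice. -->
		<p>
			<span>A placeholder first line of a song,</span>
			<br/>
			<span>And a placeholder second line.</span>
		</p>
	</blockquote>
</section>
```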

XHTML and CSS code formatting style

In XHTML

The clean tool does a good job of pretty-printing your XHTML according to our requirements, so make sure to run it often. In case you want to review the style requirements, they are:

  • Use tabs for indentation.

  • Tags whose content is phrasing content should be on a single line. So, don’t open a <p> tag, then move to the next line for the tag’s contents; put it all on the same line.

  • Attributes should be in alphabetical order.
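
A short fragment formatted along these lines might look like the sketch below. The id value and the sentences are invented for illustration, and the attribute order simply mirrors the other examples in this manual, where id precedes epub:type; treat the output of the clean tool as the authority on ordering.

```html
<section id="chapter-1" epub:type="chapter">
	<h2 epub:type="title z3998:roman">I</h2>
	<!-- Phrasing content stays on the same line as its opening and closing tags. -->
	<p>It was a dark and stormy night.</p>
	<p>The rain fell in torrents.</p>
</section>
```

Each nesting level is indented with a single tab, and no paragraph is wrapped to a column width.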

In CSS

  • Use tabs for indentation.

  • Always move to a new line for CSS properties. Even if the selector only has one property, don’t put the selector and the property on one line.

  • Where possible, properties should be in alphabetical order. (This isn’t always possible if you’re attempting to override a previous style in the same selector; in those cases that’s OK.)

Abbreviation semantic patterns

  • There are three types of abbreviations:

    An acronym is a term made up of initials and pronounced as one word: NASA, SCUBA, TASER.

    An initialism is a term made up of initials in which each initial is pronounced separately: ABC, HTML, CSS.

    A contraction is an abbreviation of a longer word: Mr., Mrs., lbs.

  • All abbreviations must be wrapped in an <abbr> element.

  • All abbreviations that include periods (for example, Latinisms) and terminate a clause must include the “eoc” (end-of-clause) class in the <abbr> element. Since a clause ending in an abbreviation omits the trailing period, it’s useful for us to know when such an abbreviation marks the end of a clause.

    Result: He wanted to meet at 6:40 p.m. I was excited to see him!

    Code: He wanted to meet at 6:40&#160;<abbr class="time eoc">p.m.</abbr> I was excited to see him!

  • Certain abbreviations should be marked up with a semantic class:

    • Acronyms

      Any acronym (defined above) that doesn’t fit in the categories below.

      <abbr class="acronym">NASA</abbr> received less funding than usual this year.
    • Initialisms

      Any initialism (defined above) that doesn’t fit in the categories below.

      There are harder languages than <abbr class="initialism">HTML</abbr>.
    • Abbreviated compass directions

      For example: N., S., S.W.

      He traveled <abbr class="compass">N. W.</abbr>, then <abbr class="compass eoc">E. S. E.</abbr>

      This regex is helpful in finding compass directions: [NESW]\.([NESW]\.)*?

    • Compounds

      Molecular compounds.

      <abbr class="compound">H<sub>2</sub>O</abbr>
    • Academic degrees

      Academic degrees, except ones that, like PhD, include a lowercase letter: BA, BD, BFA, BM, BS, DB, DD, DDS, DO, DVM, JD, LHD, LLB, LLD, LLM, MA, MBA, MD, MFA, MS, MSN.

      Judith Douglas, <abbr class="degree">DDS</abbr>.
    • Eras

      The abbreviations AD, BC, CE, BCE.

      Julius Caesar was born around 100 <abbr class="era">BC</abbr>.
    • Initialized names

      A person’s initials, either first name, last name, or both.

      <abbr class="name">J. P.</abbr> Morgan was a wealthy man.
    • State names and postal codes

      Abbreviated state names and postal codes: NY, Washington DC.

      Washington <abbr class="postal">DC</abbr>
    • Temperatures

      Abbreviated temperature scales: F, C. Also see the typography manual.

      It was 5°&#8202;<abbr class="temperature">C</abbr> last night.
    • Times

      Time-related Latinisms: a.m., p.m. Also see the typography manual.

      5&#160;<abbr class="time">p.m.</abbr>
    • Timezones

      PST, CST, EST, etc.

      5&#160;<abbr class="time">p.m.</abbr> <abbr class="timezone">PST</abbr>

The <title> tag

The <title> tag should contain an appropriate description of the local file only.

Titles for files that are an individual chapter

In most ebook productions, each chapter will be its own file. In that case, follow these rules:

  • Don’t include the book title in individual chapter <title> tags.

  • Convert chapter numbers that are in Roman numerals to decimal numbers:

    <title>Chapter 10</title>
  • If a chapter has a subtitle, add a colon after the chapter number and place the subtitle after that:

    <title>Chapter 10: A Dark and Stormy Night</title>
  • Subtitles may often contain subtags, like <i>. Because <title> can’t contain subtags, simply remove them when copying into <title>:

    <title>Chapter 8: Mobilis in Mobili</title>

Ids

Each <section> should have an id attribute corresponding to a URL-friendly version of the <section>’s name. For example:

<section id="introduction" epub:type="introduction">
	<h2 epub:type="title">Introduction</h2>
	<!--snip-->
</section>

Occasionally you might need to give other elements IDs, for example when an endnote references a specific line or paragraph in the work. In these cases, name the IDs by their tag name, then a dash, then a number representing the tag’s sequential numerical order from the beginning of the containing document.

<section id="introduction" epub:type="introduction">
	<h2 epub:type="title">Introduction</h2>
	<p>Some text...</p>
	<!--snip 10 more <p> tags-->
	<p id="p-12">Some text...</p>
</section>

Ordered/numbered and unordered lists

All <li> children of <ol> and <ul> tags must have at least one direct child block-level tag. This is usually a <p> tag. (But not necessarily; for example, a <blockquote> tag might also be appropriate.)

<ul>
	<li>
		<p><b>Miss Oranthy Bluggage</b>, the accomplished Strong-Minded Lecturer, will deliver her famous Lecture on “<b>Woman and Her Position</b>,” at Pickwick Hall, next Saturday Evening, after the usual performances.</p>
	</li>
</ul>

Blockquotes

In most prose works, we generally want to offset long quotations in the <blockquote> element. However, we want to be able to distinguish when a quotation is from a real-life source (like a quotation from a Shakespeare play), and when the quotation is fictional within the context of the work. To make this distinction, we assume that the <blockquote> inherits the z3998:fiction or z3998:non-fiction semantic inflection of its parent. Thus, if the <blockquote> contents differ from that inherited semantic inflection, we specify whether it’s z3998:fiction or z3998:non-fiction in the blockquote element itself.

For example, if a <blockquote> doesn’t have semantic inflection specified, and it’s within a <section epub:type="z3998:fiction"> parent, then the <blockquote> is also fictional within the context of the work.

In this first example from Dracula, we have a fictional character quoting a real-life source. Since Dracula is fiction but the quotation source exists in the real world, we include the z3998:non-fiction semantic inflection:

<section epub:type="chapter z3998:fiction">
	<p>One of my companions whispered to another the line from Burger’s “Lenore”:―</p>
	<blockquote epub:type="z3998:non-fiction">
		<p>“<span xml:lang="de">Denn die Todten reiten schnell</span>”⁠—<br/> (“For the dead travel fast.”)</p>
	</blockquote>
</section>

In this second example from Twenty Thousand Leagues Under the Seas, a fictional character quotes another fictional character. We still use the <blockquote> element, but since the work itself is fiction, the <blockquote> “inherits” the semantic inflection of z3998:fiction:

<section epub:type="chapter z3998:fiction">
	<p>Every morning, it was repeated under the same circumstances. It ran like this:</p>
	<blockquote xml:lang="x-nemo">
		<p>“Nautron respoc lorni virch.”</p>
	</blockquote>
</section>

Half title pages

When a work contains frontmatter like an epigraph or introduction, a half title page is required before the body matter begins.

Half title pages without subtitles

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: http://standardebooks.org/vocab/1.0" xml:lang="en-GB">
	<head>
		<title>Half Title</title>
		<link href="../css/core.css" rel="stylesheet" type="text/css"/>
		<link href="../css/local.css" rel="stylesheet" type="text/css"/>
	</head>
	<body epub:type="frontmatter">
		<section id="halftitlepage" epub:type="halftitlepage">
			<h1 epub:type="fulltitle">Don Quixote</h1>
		</section>
	</body>
</html>

Half title pages with subtitles

section[epub|type~="halftitlepage"] span[epub|type~="subtitle"]{
	display: block;
	font-size: .75em;
	font-weight: normal;
}

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: http://standardebooks.org/vocab/1.0" xml:lang="en-GB">
	<head>
		<title>Half Title</title>
		<link href="../css/core.css" rel="stylesheet" type="text/css"/>
		<link href="../css/local.css" rel="stylesheet" type="text/css"/>
	</head>
	<body epub:type="frontmatter">
		<section id="halftitlepage" epub:type="halftitlepage">
			<h1 epub:type="fulltitle">
				<span epub:type="title">The Book of Wonder</span>
				<span epub:type="subtitle">A Chronicle of Little Adventures at the Edge of the World</span>
			</h1>
		</section>
	</body>
</html>

Footnotes and endnotes

Since there’s no concept of a “page” in an ebook, the concept of “footnotes” isn’t very useful. (Where would a footnote go if there’s no bottom of the page?)

Modern ereading systems do, however, offer popup notes. Our task is to combine all footnotes present in a source text into a single endnotes file that provides popup notes to supported readers, and clearly listed notes for other readers.

Endnotes must be numbered sequentially throughout the entire text. Since many books just used “*” to denote a footnote, we have to assign each of those a number when converting to endnotes.

Linking to endnotes

In the body text, you refer to an endnote using this pattern:

<p>This is some text followed by a reference to an endnote.<a href="../text/endnotes.xhtml#note-1" id="noteref-1" epub:type="noteref">1</a></p>
  • The id attribute is always “noteref-N” where N is the number of the endnote.

  • The epub:type attribute is set to “noteref”.

  • The endnote reference goes after ending punctuation.

The endnotes file

The endnotes file is called endnotes.xhtml and looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: http://standardebooks.org/vocab/1.0" xml:lang="en-GB">
	<head>
		<title>Endnotes</title>
		<link href="../css/core.css" rel="stylesheet" type="text/css"/>
		<link href="../css/local.css" rel="stylesheet" type="text/css"/>
	</head>
	<body epub:type="backmatter z3998:fiction">
		<section id="endnotes" epub:type="rearnotes">
			<h2 epub:type="title">Endnotes</h2>
			<ol>
				<li id="note-1" epub:type="rearnote">
					<p>The first endnote goes here.</p>
					<p>Here's another line for the first endnote. <a href="../text/chapter-1.xhtml#noteref-1" epub:type="se:referrer">↩</a></p>
				</li>
				<li id="note-2" epub:type="rearnote">
					<p>The second endnote goes here. <a href="../text/chapter-1.xhtml#noteref-2" epub:type="se:referrer">↩</a></p>
				</li>
			</ol>
		</section>
	</body>
</html>

An endnote dissected

  • Each individual endnote is a <li> element containing one or more <p> elements.

  • Each <li> requires the following attributes:

    • id is set to the string “note-” followed by the sequential endnote number, beginning with 1.

    • epub:type is set to “rearnote”.

  • The href attribute points to the direct anchor reference to the endnote.

  • If an endnote contains a citation offset with a dash (for example, “—Ed.”), separate the citation from the text with a single space and enclose it in the <cite> tag:

    <li id="note-1" epub:type="rearnote">
    	<p>Here’s an endnote. <cite>—<abbr class="eoc">Ed.</abbr></cite> <a href="../text/chapter-1.xhtml#noteref-1" epub:type="se:referrer">↩</a></p>
    </li>

  • The final <p> element in an endnote contains a link back to the referring anchor. Don’t just point it to the file; make sure it points to the exact link that we came from. For example, chapter-1.xhtml#noteref-1, not chapter-1.xhtml. If the link is the last element in a longer <p> tag, it must be preceded by one space character; if it is the only child of a <p> tag (for example, if the previous text was a <blockquote>), then it can be on its own line. It must have the epub:type set to se:referrer. The text of the link is always the “↩” character.

Thought and section breaks

In printed material, thought and section breaks are typically denoted with a large space between paragraphs, or by a symbol like “* * *”. In Standard Ebooks, use the <hr/> element to denote thought and section breaks.
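
A minimal sketch of such a break is shown below; the two paragraphs are placeholders written for this illustration.

```html
<p>The last paragraph of the first scene.</p>
<hr/>
<p>The first paragraph of the next scene.</p>
```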

Section header semantic patterns

Authors format chapter headers in many different ways. Below are patterns for all of the types of chapter headers we’ve encountered so far. If nothing in this list fits the book you’re producing, contact us and we’ll work on a new standard.

General outline

  • The title of the book is an implied <h1> element. Therefore, all chapter titles are <h2> elements or lower.

  • It’s extremely rare to go down to <h3> and below, but you may do so if, for example, the chapter is part of a volume whose title would occupy the <h2> level. Otherwise, if you feel the need to use <h3>, ask yourself if the header is a structural division of the document. For example, in a work of fiction where a fictional newspaper clipping is presented, the headline would not be set in an <h3> element, because the clipping is part of the body text, not a structural division of the book.
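
The sketch below shows one way the heading levels could fall out when chapters belong to a larger division. It assumes the division is marked with the epub vocabulary’s part keyword; the ids and the part title are hypothetical. In a real production the part and its chapters may well live in separate files; they are nested here only to show the heading levels side by side.

```html
<section id="part-1" epub:type="part">
	<!-- The book title is the implied <h1>, so the part title takes <h2>. -->
	<h2 epub:type="title">A Hypothetical Part Title</h2>
	<section id="chapter-1" epub:type="chapter">
		<!-- The chapter is a structural division inside the part, so it drops to <h3>. -->
		<h3 epub:type="title z3998:roman">I</h3>
		<p>Some text...</p>
	</section>
</section>
```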

Sections without titles

<h2 epub:type="title z3998:roman">XI</h2>

Sections with titles but no chapter numbers

<h2 epub:type="title">A Daughter of Albion</h2>

Sections with titles and chapter numbers

<h2 epub:type="title">
	<span epub:type="z3998:roman">XI</span>
	<span epub:type="subtitle">Who Stole the Tarts?</span>
</h2>

span[epub|type~="subtitle"]{
	display: block;
	font-weight: normal;
}

Sections with unnumbered titles and subtitles

<h2 epub:type="title">
	<span>An Adventure</span>
	<span epub:type="subtitle">(A Driver’s Story)</span>
</h2>

span[epub|type~="subtitle"]{
	display: block;
	font-weight: normal;
}

Sections with bridgeheads

Note that we include trailing punctuation at the end of the bridgehead. If it’s not present in the source text, add it.

Since the text in the bridgehead is italicized, we include CSS to render actual <i> elements contained in the bridgehead as normal text.

<header>
	<h2 epub:type="title z3998:roman">I</h2>
	<p epub:type="bridgehead">Which treats of the character and pursuits of the famous gentleman Don Quixote of La Mancha.</p>
</header>

section > header{
	text-align: center;
}

[epub|type~="bridgehead"]{
	display: inline-block;
	font-style: italic;
	margin: 0 auto 3em auto;
	max-width: 60%;
	text-align: justify;
	text-indent: 0;
}

[epub|type~="bridgehead"] i{
	font-style: normal;
}

Sections with epigraphs

<header>
	<h2 epub:type="title z3998:roman">XXVIII</h2>
	<blockquote epub:type="epigraph">
		<p>Brief, I pray for you; for you see, ’tis a busy time with me.</p>
		<cite><i epub:type="se:name.publication.play">Much Ado About Nothing</i></cite>
	</blockquote>
</header>

section > header{
	text-align: center;
}

[epub|type~="epigraph"]{
	font-style: italic;
	hyphens: none;
	margin: 3em;
	margin-top: 0;
	text-align: left;
	text-indent: 0;
	display: inline-block;
}

[epub|type~="epigraph"] i{
	font-style: normal;
}

[epub|type~="epigraph"] cite{
	margin-top: 1em;
	font-style: normal;
	font-variant: small-caps;
}

[epub|type~="epigraph"] cite i{
	font-style: italic;
}

Letter semantic patterns

Coming soon!

Poetry, verse, and song semantic patterns

Unfortunately there’s no great way to semantically format poetry in HTML. We have to conscript unrelated elements for use in poetry.

General outline

  • A stanza is represented by a <p> element.

  • Each stanza contains <span> elements, each one representing a line in the stanza. Delimiting lines in <span> elements allows us to use CSS to automatically indent long lines that wrap across the page.

  • Each line is followed by a <br/> element, except for the last line in a stanza. Since <span> is an inline element, unstyled <span>s don’t have line breaks. Including a <br/> emulates line breaks for readers that for some crazy reason might not support CSS.

  • For indented lines, add the i1 class to the <span> element. Do not use nbsp for indentation. You can indent to multiple levels by incrementing the class to i2, i3, and so on, and including the appropriate CSS.

  • If the poem is a shorter part of a longer work, like a novel, then wrap the stanzas in a <blockquote> element.

  • If the poem is a standalone composition and part of a larger collection of poetry, wrap it in an <article> element instead. The semantics of <article> imply that the poem can be pulled out of the collection as a standalone item.

  • Give the containing element the semantic inflection of z3998:poem, z3998:verse, or z3998:song.

Complete HTML and CSS markup examples

Note that below we include CSS for the i2 class, even though it’s not used in the example; it’s included to demonstrate how to adjust the CSS for indentation levels after the first.

[epub|type~="z3998:poem"] p{
	text-align: left;
	text-indent: 0;
}

[epub|type~="z3998:poem"] p > span{
	display: block;
	text-indent: -1em;
	padding-left: 1em;
}

[epub|type~="z3998:poem"] p > span + br{
	display: none;
}

[epub|type~="z3998:poem"] p + p{
	margin-top: 1em;
}

[epub|type~="z3998:poem"] + p{
	text-indent: 0;
}

p span.i1{
	text-indent: -1em;
	padding-left: 2em;
}

p span.i2{
	text-indent: -1em;
	padding-left: 3em;
}

<blockquote epub:type="z3998:poem">
	<p>
		<span>“How doth the little crocodile</span>
		<br/>
		<span class="i1">Improve his shining tail,</span>
		<br/>
		<span>And pour the waters of the Nile</span>
		<br/>
		<span class="i1">On every golden scale!</span>
	</p>
	<p>
		<span>“How cheerfully he seems to grin,</span>
		<br/>
		<span class="i1">How neatly spread his claws,</span>
		<br/>
		<span>And welcome little fishes in</span>
		<br/>
		<span class="i1"><em>With gently smiling jaws!</em>”</span>
	</p>
</blockquote>
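
For a standalone poem in a collection, the same stanza and line markup would sit inside an <article> element rather than a <blockquote>. The fragment below is only a sketch of that variation: the id, the title, and the lines are placeholders, and giving the poem an <h2> title is an assumption based on the section header patterns in this manual.

```html
<article id="a-placeholder-poem" epub:type="z3998:poem">
	<h2 epub:type="title">A Placeholder Poem</h2>
	<p>
		<span>The first line of the stanza,</span>
		<br/>
		<span class="i1">And an indented second line.</span>
	</p>
</article>
```

Because the poetry selectors key off the z3998:poem semantic inflection rather than the element name, they should apply to the <article> without changes.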

Images

  • All <img> tags are required to have an alt attribute that uses prose to describe the image in detail; this is what screen reading software will read aloud.

    • Describe the image itself in words, which is not the same as writing a caption or describing its place in the book.

    • Alt text must be full sentences ended with periods or other appropriate punctuation. Sentence fragments, or complete sentences without ending punctuation, are not acceptable.

    For example:

    Incorrect: <img alt="The illustration for chapter 10" src="..."/>

    Incorrect: <img alt="Pierre's fruit-filled dinner" src="..."/>

    Correct: <img alt="An apple and a pear inside a bowl, resting on a table." src="..."/>

    Note that the alt text does not necessarily have to be the same as text in the image’s <figcaption> element. You can use <figcaption> to write a concise context-dependent caption.

  • Include an epub:type attribute to denote the type of image. Common values are z3998:illustration or z3998:photograph.

  • For some images, it’s helpful to invert their colors when the ereader enters night mode. This is particularly true for black-and-white line art and woodcuts. (Note black-and-white, i.e. only two colors, not grayscale!) Include the se:image.color-depth.black-on-transparent semantic in the <img> tag’s epub:type to enable color inversion in some ereaders.

    For that sort of art, save the images as PNG files with a transparent background. You can make the background transparent by using the “Color to alpha” tool available in many image editing programs, like the GIMP.

  • <img> tags that are meant to be aligned on the block level should be contained in a parent <figure> tag, with an optional <figcaption> sibling.

    • If contained in a <figure> tag, the image’s id attribute must be on the <figure> tag.

  • Some sources of illustrations may have scanned them directly from the page of an old book, resulting in yellowed, dingy-looking scans of grayscale art. In these cases, convert the image to grayscale to remove the yellow tint.
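
Putting several of these rules together, a piece of black-and-white line art might be marked up roughly as follows. The id, file name, alt text, and caption are hypothetical; only the epub:type keywords come from the rules above.

```html
<figure id="illustration-1">
	<!-- The id lives on the <figure>, and the alt text is a full sentence describing the image itself. -->
	<img alt="A black-ink woodcut of a ship under full sail on a choppy sea." src="../images/illustration-1.png" epub:type="z3998:illustration se:image.color-depth.black-on-transparent"/>
	<figcaption>The ship leaves harbor.</figcaption>
</figure>
```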

Complete HTML and CSS markup examples

/* If the image is meant to be on its own page, use this selector... */
figure.full-page{
	margin: 0;
	max-height: 100%;
	page-break-before: always;
	page-break-after: always;
	page-break-inside: avoid;
	text-align: center;
}

/* If the image is meant to be inline with the text, use this selector... */
figure{
	margin: 1em auto;
	text-align: center;
}

/* In all cases, also include the below styles */
figure img{
	display: block;
	margin: auto;
	max-width: 100%;
}

figure + p{
	text-indent: 0;
}

figcaption{
	font-size: .75em;
	font-style: italic;
}

<figure id="image-10">
	<img alt="An apple and a pear inside a bowl, resting on a table." src="../images/image-10.jpg" epub:type="z3998:photograph"/>
	<figcaption>The Monk’s Repast</figcaption>
</figure>

<figure class="full-page" id="image-11">
	<img alt="A massive whale breaching the water, with a sailor floating in the water directly within the whale’s mouth." src="../images/image-11.jpg" epub:type="z3998:illustration"/>
	<figcaption>The Whale eats Sailor Jim.</figcaption>
</figure>

List of Illustrations (the LoI)

If an ebook has any illustrations that are major structural components of the work (even just one!), then we must include an loi.xhtml file at the end of the ebook. This file lists the illustrations in the ebook, along with a short caption or description.

An illustration is a major structural component if, for example: it is an illustration of events in the book, like a full-page drawing or end-of-chapter decoration; it is essential to the plot, like a diagram of a murder scene or a map; or it is a component of the text, like photographs in a documentary narrative.

Illustrations that are not major structural components would be, for example: drawings used to represent a person’s signature, like an X mark; inline drawings representing text in alien languages; drawings used as layout elements to illustrate diagrams.

If the image has a <figcaption> element, then use that caption in the LoI. If not, use the image’s alt attribute, which should be a short prose description of the image used by screen readers.

Links to the images should go directly to their IDs, not just the top of the containing file.

The code below is the template for a basic LoI skeleton. Please copy and paste the entire thing as a starting point for your own LoI:

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" epub:prefix="z3998: http://www.daisy.org/z3998/2012/vocab/structure/, se: http://standardebooks.org/vocab/1.0" xml:lang="en-GB">
	<head>
		<title>List of Illustrations</title>
		<link href="../css/core.css" rel="stylesheet" type="text/css"/>
		<link href="../css/local.css" rel="stylesheet" type="text/css"/>
	</head>
	<body epub:type="backmatter">
		<section id="loi" epub:type="loi">
			<nav epub:type="loi">
				<h2 epub:type="title">List of Illustrations</h2>
				<ol>
					<li>
						<a href="../text/preface.xhtml#the-edge-of-the-world">The Edge of the World</a>
					</li>
					<!--snip all the way to the end-->
				</ol>
			</nav>
		</section>
	</body>
</html>
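
To illustrate how individual entries might be filled in, the sketch below lists two captioned images and one image that has only alt text. The file names chapter-3.xhtml and chapter-4.xhtml, the image-12 id, and the map description are hypothetical; the first two link texts reuse the <figcaption> wording from the image examples earlier in this manual.

```html
<ol>
	<li>
		<!-- The image has a <figcaption>, so the caption is used as the link text. -->
		<a href="../text/chapter-3.xhtml#image-10">The Monk’s Repast</a>
	</li>
	<li>
		<a href="../text/chapter-3.xhtml#image-11">The Whale eats Sailor Jim.</a>
	</li>
	<li>
		<!-- No <figcaption> on this image, so its alt text is used instead. -->
		<a href="../text/chapter-4.xhtml#image-12">A hand-drawn map of the island, with the harbor marked by an anchor.</a>
	</li>
</ol>
```

Each href ends in the id of the <figure>, so the link lands on the image itself rather than the top of the file.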