Standard Ebooks

Metadata Manual

General principles

Metadata in a Standard Ebooks epub is stored in the ./src/epub/content.opf file. The file contains some boilerplate that you won’t have to touch, and a lot of information that you will have to touch as you produce your ebook.

Follow the general structure of the content.opf file in the tools’ ./templates/ directory, and don’t rearrange the order of anything in it.

The <dc:identifier> element

The <dc:identifier> element contains the unique identifier for this ebook. That identifier is always the Standard Ebooks URL for that ebook, prefixed with url:.

<dc:identifier id="uid">url:https://standardebooks.org/ebooks/anton-chekhov/short-fiction/constance-garnett</dc:identifier>

Forming the SE identifier

The SE identifier is formed by the following algorithm. A string can be made URL-safe using the make-url-safe tool.

  1. Start with the URL-safe author of the work, as it appears on the titlepage. If there is more than one author, continue appending subsequent URL-safe authors, separated by an underscore. Do not alpha-sort the author name.

  2. Append a forward slash, then the URL-safe title of the work. Again, do not alpha-sort the title.

  3. If the work is translated, append a forward slash, then the URL-safe translator. If there is more than one translator, continue appending subsequent URL-safe translators, separated by an underscore. Do not alpha-sort translator names.

  4. If the work is illustrated, append a forward slash, then the URL-safe illustrator. If there is more than one illustrator, continue appending subsequent URL-safe illustrators, separated by an underscore. Do not alpha-sort illustrator names.

  5. Finally, do not append a trailing forward slash.
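The steps above can be sketched in Python as follows. This is a minimal illustration, not the make-url-safe tool: the hypothetical make_url_safe function only approximates it, assuming the tool lowercases the text, strips accents and apostrophes, and turns other spaces and punctuation into hyphens. For a real ebook, run the actual tool.

```python
import re
import unicodedata


def make_url_safe(text):
    """Hypothetical stand-in for the make-url-safe tool (assumed behavior)."""
    # Split accented letters into base letter + accent, then drop the accents.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c)).lower()
    # Drop straight and curly apostrophes so "Alice’s" becomes "alices".
    text = re.sub(r"['\u2019]", "", text)
    # Collapse every other run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", text).strip("-")


def se_identifier(authors, title, translators=(), illustrators=()):
    """Return the path part of the SE identifier, following steps 1 to 5."""
    def joined(names):
        # Keep title-page order (no alpha-sorting); separate names with underscores.
        return "_".join(make_url_safe(name) for name in names)

    parts = [joined(authors), make_url_safe(title)]  # steps 1 and 2
    if translators:
        parts.append(joined(translators))  # step 3
    if illustrators:
        parts.append(joined(illustrators))  # step 4
    # Step 5: "/".join never produces a trailing slash.
    return "/".join(parts)


path = se_identifier(["Anton Chekhov"], "Short Fiction", translators=["Constance Garnett"])
print(f"url:https://standardebooks.org/ebooks/{path}")
# url:https://standardebooks.org/ebooks/anton-chekhov/short-fiction/constance-garnett
```

The printed value matches the <dc:identifier> example above.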

The <dc:date>, <meta property="dcterms:modified">, and <meta property="se:revision-number"> elements

There are several elements in the metadata describing the publication date, updated date, and revision number of the ebook. Generally you don’t have to update these by hand; instead, use the prepare-release tool to update them automatically both in content.opf and in colophon.xhtml.

The ebook title

Usually titles are fairly easy to represent with the <dc:title> element.

Books without subtitles

These examples show how to mark up a simple title like The Moon Pool or Chekhov’s Short Fiction.

<dc:title id="title">The Moon Pool</dc:title>
<meta property="file-as" refines="#title">Moon Pool, The</meta>

<dc:title id="title">Short Fiction</dc:title>
<meta property="file-as" refines="#title">Short Fiction</meta>

Books with subtitles

This example shows how to mark up The Man Who Was Thursday: A Nightmare.

<dc:title id="title">The Man Who Was Thursday</dc:title>
<meta property="file-as" refines="#title">Man Who Was Thursday, The</meta>
<meta property="title-type" refines="#title">main</meta>
<dc:title id="subtitle">A Nightmare</dc:title>
<meta property="file-as" refines="#subtitle">Nightmare, A</meta>
<meta property="title-type" refines="#subtitle">subtitle</meta>
<dc:title id="fulltitle">The Man Who Was Thursday: A Nightmare</dc:title>
<meta property="file-as" refines="#fulltitle">Man Who Was Thursday, The</meta>
<meta property="title-type" refines="#fulltitle">extended</meta>
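The file-as values in these examples move a leading article to the end of the title. The Python sketch below reproduces both examples under two assumptions: only the English articles “A”, “An”, and “The” are moved, and the extended title is the main title and subtitle joined by a colon. The names file_as and title_block are hypothetical.

```python
from xml.sax.saxutils import escape

# Assumption: only these English leading articles are moved to the end.
LEADING_ARTICLES = {"A", "An", "The"}


def file_as(title):
    """Move a leading article to the end: "The Moon Pool" -> "Moon Pool, The"."""
    first, _, rest = title.partition(" ")
    return f"{rest}, {first}" if first in LEADING_ARTICLES and rest else title


def title_block(title, subtitle=""):
    """Emit title metadata in the same order as the examples above."""
    lines = [
        f'<dc:title id="title">{escape(title)}</dc:title>',
        f'<meta property="file-as" refines="#title">{escape(file_as(title))}</meta>',
    ]
    if subtitle:
        lines += [
            '<meta property="title-type" refines="#title">main</meta>',
            f'<dc:title id="subtitle">{escape(subtitle)}</dc:title>',
            f'<meta property="file-as" refines="#subtitle">{escape(file_as(subtitle))}</meta>',
            '<meta property="title-type" refines="#subtitle">subtitle</meta>',
            f'<dc:title id="fulltitle">{escape(title + ": " + subtitle)}</dc:title>',
            # As in the example, the full title files under the main title alone.
            f'<meta property="file-as" refines="#fulltitle">{escape(file_as(title))}</meta>',
            '<meta property="title-type" refines="#fulltitle">extended</meta>',
        ]
    return "\n".join(lines)


print(title_block("The Moon Pool"))
print(title_block("The Man Who Was Thursday", "A Nightmare"))
```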

Books with a more popularly known alternative title

Some books are commonly referred to by a shorter name than their actual title. For example, The Adventures of Huckleberry Finn is often simply known as Huck Finn.

<dc:title id="title-short">Huck Finn</dc:title>
<meta property="title-type" refines="#title-short">short</meta>

Ebook subjects

The <dc:subject> elements allow us to categorize the ebook. We use the Library of Congress categories assigned to the book for this purpose.

If you’re working on a book that has a Project Gutenberg transcription, you can almost always find these categorizations in the ebook’s “bibrec” tab, saving you effort.

If your ebook doesn’t have a Project Gutenberg page, then you can search the Library of Congress catalog to find the categories for your ebook.

This example shows how to mark up the subjects for A Voyage to Arcturus:

<dc:subject id="subject-1">Science fiction</dc:subject>
<dc:subject id="subject-2">Psychological fiction</dc:subject>
<dc:subject id="subject-3">Quests (Expeditions) -- Fiction</dc:subject>
<dc:subject id="subject-4">Life on other planets -- Fiction</dc:subject>
<meta property="meta-auth" refines="#subject-1">https://www.gutenberg.org/ebooks/1329</meta>
<meta property="meta-auth" refines="#subject-2">https://www.gutenberg.org/ebooks/1329</meta>
<meta property="meta-auth" refines="#subject-3">https://www.gutenberg.org/ebooks/1329</meta>
<meta property="meta-auth" refines="#subject-4">https://www.gutenberg.org/ebooks/1329</meta>
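Each subject gets a numbered ID, and each ID gets a matching meta-auth element pointing at the page the categories came from. A minimal Python sketch of that pairing, using a hypothetical subject_block helper and assuming, as in the example, that all subjects come from a single source page:

```python
from xml.sax.saxutils import escape


def subject_block(subjects, authority_url):
    """Emit numbered <dc:subject> elements, then one meta-auth per subject."""
    ids = [f"subject-{n}" for n in range(1, len(subjects) + 1)]
    lines = [f'<dc:subject id="{sid}">{escape(s)}</dc:subject>' for sid, s in zip(ids, subjects)]
    # Same layout as the example: all subjects first, then their meta-auth refinements.
    lines += [f'<meta property="meta-auth" refines="#{sid}">{escape(authority_url)}</meta>' for sid in ids]
    return "\n".join(lines)


print(subject_block(
    ["Science fiction", "Psychological fiction",
     "Quests (Expeditions) -- Fiction", "Life on other planets -- Fiction"],
    "https://www.gutenberg.org/ebooks/1329",
))
```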

SE subjects

Alongside the Library of Congress categories, we include a custom list of SE subjects in the ebook metadata. Unlike Library of Congress categories, SE subjects are purposefully broad. They’re more like the subject categories you’d find in a medium-sized bookstore, as opposed to the precise, detailed, hierarchical Library of Congress categories.

It’s your task to select appropriate SE subjects for your ebook. Usually just one or two of these categories will suffice.

If you strongly feel that your book deserves a new category, please contact us to discuss it.

Below is a list of all of the recognized SE subjects:

Required subjects for specific kinds of ebooks

Ebook descriptions

An ebook has two kinds of descriptions: a short <dc:description> element, and a much longer <meta property="se:long-description"> element.

The short description

The <dc:description> element contains a short, single-sentence summary of the ebook.

The long description

The <meta property="se:long-description"> element contains a much longer description of the ebook.

The <dc:language> element

The <dc:language> element follows the long description block. It contains the IETF language tag for the language that the work is in. Usually this is either en-US or en-GB.

The <dc:source> elements

The <dc:source> elements contain the URLs of the transcription this ebook is based on and of the page scans of the print sources used to correct or work on the transcription.

The <meta property="se:production-notes"> element

Producers can use this element to record notes about the production process. For example, a producer might note that page scans were not available, so an editorial decision was made to add commas where their absence was judged to be a transcription typo.

If there are no production notes, remove this element.

The <meta property="se:word-count"> and <meta property="se:reading-ease.flesch"> elements

These elements are automatically computed by the prepare-release tool. Don’t compute them by hand.

SE-specific metadata for the ebook

Next, Standard Ebooks includes two custom metadata items about the ebook:

  1. <meta property="se:url.encyclopedia.wikipedia"> contains the Wikipedia URL for this ebook. If there isn’t one, remove this element.

  2. <meta property="se:url.vcs.github"> contains the GitHub URL for this ebook. This is calculated by taking the string “https://github.com/standardebooks/” and appending the ebook identifier (calculated above), without “https://standardebooks.org/ebooks/”, and with forward slashes replaced by underscores.
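The GitHub URL rule in item 2 can be sketched in Python like this. The github_url helper is hypothetical; it accepts either the bare SE URL or the url:-prefixed value from <dc:identifier>.

```python
SE_PREFIX = "https://standardebooks.org/ebooks/"
GITHUB_PREFIX = "https://github.com/standardebooks/"


def github_url(identifier):
    """Derive the se:url.vcs.github value from an SE identifier."""
    # Accept the <dc:identifier> value too, which carries a "url:" prefix.
    if identifier.startswith("url:"):
        identifier = identifier[len("url:"):]
    if not identifier.startswith(SE_PREFIX):
        raise ValueError(f"not a Standard Ebooks URL: {identifier}")
    # Drop the site prefix, then turn the remaining slashes into underscores.
    return GITHUB_PREFIX + identifier[len(SE_PREFIX):].replace("/", "_")


print(github_url("url:https://standardebooks.org/ebooks/anton-chekhov/short-fiction/constance-garnett"))
# https://github.com/standardebooks/anton-chekhov_short-fiction_constance-garnett
```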

Author metadata

Next, we include the author metadata block.

The author metadata block always has the ID of “author”. If there is more than one author, the first author is “author-1”, the second “author-2”, and so on. Each block consists of the following elements, in this order:

  1. <dc:creator id="author">: the author’s name as it appears on the cover.

  2. <meta property="file-as" refines="#author">: the author’s name as filed alphabetically. Include this even if it’s identical to <dc:creator>.

  3. <meta property="se:name.person.full-name" refines="#author">: the author’s full name, with any initials or middle names expanded. If this is identical to <dc:creator>, remove this element.

  4. <meta property="alternate-script" refines="#author">: the author’s name as it appears on the cover, but transliterated into their native alphabet if applicable. For example, Anton Chekhov’s name would be contained here in the Cyrillic alphabet. Remove this element if not applicable.

  5. <meta property="se:url.encyclopedia.wikipedia" refines="#author">: the URL of the author’s Wikipedia page. Remove this element if not applicable.

  6. <meta property="se:url.authority.nacoaf" refines="#author">: the URL of the author’s Library of Congress Names Database page.

    • This is easily found by visiting the person’s Wikipedia page and looking at the very bottom in the “Authority Control” section, under “LCCN”.

    • If you can’t find it in Wikipedia, you can find it directly by visiting http://id.loc.gov/authorities/names.html.

    • Note that the canonical URL does not end in .html, even though the LoC site silently redirects to a version with .html appended when you load it. Remove this element if not applicable.

  7. <meta property="role" refines="#author" scheme="marc:relators">: the MARC relator tag for the roles the author played in creating this book. You will always have one element with the value of aut. You can have additional elements for additional values, if applicable. For example, if the author also illustrated the book, you would include an additional element with the value of ill.

An example of a complete author metadata block

<dc:creator id="author">Anton Chekhov</dc:creator>
<meta property="file-as" refines="#author">Chekhov, Anton</meta>
<meta property="se:name.person.full-name" refines="#author">Anton Pavlovich Chekhov</meta>
<meta property="alternate-script" refines="#author">Анто́н Па́влович Че́хов</meta>
<meta property="se:url.encyclopedia.wikipedia" refines="#author">https://en.wikipedia.org/wiki/Anton_Chekhov</meta>
<meta property="se:url.authority.nacoaf" refines="#author">http://id.loc.gov/authorities/names/n79130807</meta>
<meta property="role" refines="#author" scheme="marc:relators">aut</meta>
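The ordering and omission rules for the author block can be captured in a short Python sketch. The author_block and canonical_loc_url helpers are hypothetical, not part of the SE tools; the example call rebuilds the Chekhov block above, starting from an LoC URL that still carries the .html suffix.

```python
from xml.sax.saxutils import escape


def canonical_loc_url(url):
    """Strip the .html the LoC site appends when it redirects."""
    return url[: -len(".html")] if url.endswith(".html") else url


def author_block(name, file_as, *, full_name=None, native_name=None,
                 wikipedia_url=None, loc_url=None, extra_roles=(), block_id="author"):
    """Emit one author metadata block, in the order listed above."""
    ref = f'refines="#{block_id}"'

    def meta(prop, value):
        return f'<meta property="{prop}" {ref}>{escape(value)}</meta>'

    lines = [
        f'<dc:creator id="{block_id}">{escape(name)}</dc:creator>',
        meta("file-as", file_as),  # always present, even if identical to the name
    ]
    # Optional elements are left out entirely when they don't apply.
    if full_name and full_name != name:
        lines.append(meta("se:name.person.full-name", full_name))
    if native_name:
        lines.append(meta("alternate-script", native_name))
    if wikipedia_url:
        lines.append(meta("se:url.encyclopedia.wikipedia", wikipedia_url))
    if loc_url:
        lines.append(meta("se:url.authority.nacoaf", canonical_loc_url(loc_url)))
    # "aut" is always present; extra roles (such as "ill") follow it.
    for role in ("aut", *extra_roles):
        lines.append(f'<meta property="role" {ref} scheme="marc:relators">{role}</meta>')
    return "\n".join(lines)


print(author_block(
    "Anton Chekhov", "Chekhov, Anton",
    full_name="Anton Pavlovich Chekhov",
    native_name="Анто́н Па́влович Че́хов",
    wikipedia_url="https://en.wikipedia.org/wiki/Anton_Chekhov",
    loc_url="http://id.loc.gov/authorities/names/n79130807.html",
))
```

For a book with several authors, call it once per author with block_id set to “author-1”, “author-2”, and so on.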

Translator metadata

If the work is translated, the translator metadata block follows.

The translator metadata block always has the ID of “translator”. If there is more than one translator, the first translator is “translator-1”, the second “translator-2”, and so on. Each block is identical to the author metadata block, but using <dc:contributor id="translator"> instead of <dc:creator>. The MARC relator tag will be trl. Translators often annotate the work; if this is the case, also include the MARC relator tag ann.

Illustrator metadata

If the work is illustrated by a person who is not the author, the illustrator metadata block follows.

The illustrator metadata block always has the ID of “illustrator”. If there is more than one illustrator, the first illustrator is “illustrator-1”, the second “illustrator-2”, and so on. Each block is identical to the author metadata block, but using <dc:contributor id="illustrator"> instead of <dc:creator>. The MARC relator tag will be ill.

Cover artist metadata

The cover artist metadata block follows.

The cover artist metadata block always has the ID of “artist”. Each block is identical to the author metadata block, but using <dc:contributor id="artist"> instead of <dc:creator>. The MARC relator tag will be art.
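The translator, illustrator, and cover artist blocks differ from the author block mainly in their IDs and MARC relator codes. The hypothetical Python helper contributor_plan below sketches just those two rules; the rest of each block (file-as, full name, Wikipedia and LoC URLs) follows the author block’s order, with the role elements last.

```python
from xml.sax.saxutils import escape

# Relator codes from the translator, illustrator, and cover artist sections.
RELATORS = {"translator": "trl", "illustrator": "ill", "artist": "art"}


def contributor_plan(kind, names, annotated=False):
    """Return (opening <dc:contributor> element, relator codes) per contributor."""
    if kind == "artist" and len(names) != 1:
        raise ValueError('the cover artist block always has the single ID "artist"')
    # One contributor gets the bare ID; several get kind-1, kind-2, and so on.
    if len(names) == 1:
        ids = [kind]
    else:
        ids = [f"{kind}-{n}" for n in range(1, len(names) + 1)]
    plan = []
    for cid, name in zip(ids, names):
        roles = [RELATORS[kind]]
        if kind == "translator" and annotated:
            roles.append("ann")  # translators who annotated the work also get "ann"
        plan.append((f'<dc:contributor id="{cid}">{escape(name)}</dc:contributor>', roles))
    return plan


print(contributor_plan("translator", ["Constance Garnett"], annotated=True))
```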

Transcriber metadata

If you based this ebook on a transcription by someone else, like Project Gutenberg, then transcriber blocks follow. The first transcriber is “transcriber-1”, the second “transcriber-2”, and so on. Usually, transcribers only have the following two elements:

  1. <meta property="file-as" refines="#transcriber-1">

  2. <meta property="role" refines="#transcriber-1" scheme="marc:relators"> with the value of trc.
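A Python sketch of numbered transcriber blocks follows. The helper transcriber_blocks and the names in the example call are hypothetical placeholders, and the sketch assumes each block opens with a <dc:contributor> element carrying the transcriber-N ID, since the refines attributes need an element with that ID to point at.

```python
from xml.sax.saxutils import escape


def transcriber_blocks(transcribers):
    """Emit one block per (name, file-as) pair, numbered from transcriber-1."""
    lines = []
    for n, (name, file_as) in enumerate(transcribers, start=1):
        tid = f"transcriber-{n}"
        lines += [
            # Assumed opening element: refines="#transcriber-N" needs an element with this ID.
            f'<dc:contributor id="{tid}">{escape(name)}</dc:contributor>',
            f'<meta property="file-as" refines="#{tid}">{escape(file_as)}</meta>',
            f'<meta property="role" refines="#{tid}" scheme="marc:relators">trc</meta>',
        ]
    return "\n".join(lines)


# Placeholder names, for illustration only.
print(transcriber_blocks([("Jane Doe", "Doe, Jane"), ("John Roe", "Roe, John")]))
```

Unlike author and translator IDs, transcriber IDs are numbered even when there is only one transcriber.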

Producer metadata

This block is for information about you, the producer of this Standard Ebook. It contains the same type of elements as the author block, but with <dc:contributor id="producer-1">.

  1. You can include the <meta property="se:url.homepage" refines="#producer-1"> element with a link to your personal homepage. This must be a link to a personal homepage only; no products, services, or other endorsements, commercial or otherwise.

  2. Your MARC relator roles will usually be the following:

    • bkp: you are the producer of this ebook.

    • blw: you wrote the blurb (the long description).

    • cov: you selected the cover art.

    • mrk: you wrote HTML markup for this ebook.

    • pfr: you proofread the ebook.

    • tyg: you reviewed the typography of the ebook.
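The role list above maps directly onto role elements. In this Python sketch the task keywords (“blurb”, “markup”, and so on) and the producer_role_lines helper are hypothetical; only the relator codes come from the list, and bkp is always emitted because the producer block always describes the person who produced the ebook.

```python
# Relator codes from the list above, in the order the list gives them.
# The task keywords are hypothetical labels chosen for this sketch.
PRODUCER_ROLES = [
    ("blurb", "blw"),
    ("cover", "cov"),
    ("markup", "mrk"),
    ("proofreading", "pfr"),
    ("typography", "tyg"),
]


def producer_role_lines(tasks, producer_id="producer-1"):
    """Emit the role elements for the tasks a producer performed."""
    known = {task for task, _ in PRODUCER_ROLES}
    unknown = set(tasks) - known
    if unknown:
        raise ValueError(f"unrecognized tasks: {sorted(unknown)}")
    # bkp is always included: whoever fills in this block produced the ebook.
    codes = ["bkp"] + [code for task, code in PRODUCER_ROLES if task in tasks]
    return [
        f'<meta property="role" refines="#{producer_id}" scheme="marc:relators">{code}</meta>'
        for code in codes
    ]


for line in producer_role_lines({"blurb", "markup", "proofreading"}):
    print(line)
```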

The <manifest> element

The <manifest> element is a required part of the epub spec. It should usually be generated with the print-manifest-and-spine tool and copied and pasted into the content.opf file. It must be in alphabetical order, which the print-manifest-and-spine tool handles for you.
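To illustrate what alphabetical order means here, this Python sketch builds <item> elements sorted by href. It is not the print-manifest-and-spine tool: it assumes each item’s ID is its file name, uses a plain string sort, covers only a few common media types, and ignores item properties such as nav or cover-image. Use the real tool for an actual ebook.

```python
import posixpath

# Media types for a few common file extensions (not exhaustive).
MEDIA_TYPES = {
    ".css": "text/css",
    ".jpg": "image/jpeg",
    ".png": "image/png",
    ".svg": "image/svg+xml",
    ".xhtml": "application/xhtml+xml",
}


def manifest_items(hrefs):
    """Emit <item> elements sorted alphabetically by href (assumed sort key)."""
    items = []
    for href in sorted(hrefs):
        name = posixpath.basename(href)
        ext = posixpath.splitext(name)[1]
        # Assumption: the item ID is simply the file name.
        items.append(f'<item href="{href}" id="{name}" media-type="{MEDIA_TYPES[ext]}"/>')
    return "\n".join(items)


print(manifest_items(["text/chapter-1.xhtml", "css/core.css", "images/cover.svg"]))
```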

The <spine> element

The <spine> element is a required part of the epub spec that defines the reading order of the files in the ebook. You can use the print-manifest-and-spine tool to generate a draft of the spine. The tool makes a best guess at the spine order, but the guess isn’t always right, so review the output and adjust the reading order as needed.