Dots in metadata

David Mundie

05 May, 2013 02:18 PM

It appears that MultiMarkdown doesn't handle metadata keys written in dot notation (such as DC.Language) as expected. Dot notation is necessary for using the Dublin Core vocabulary.

My test case:

Calibre_Test (waterthrush): !cat
cat Test2.md
Title: Calibre Test
Author: David Mundie
Language: en
Publisher: David Mundie
DC.Language: EN
Another line of metadata: testing
DC.date.publication: 1001-01-01

# Dot Notation in Multimarkdown Metadata

This is a test
---
Calibre_Test (waterthrush): multimarkdown Test2.md
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8"/>
<title>Calibre Test</title>
<meta name="author" content="David Mundie"/>
<meta name="language" content="en"/>
<meta name="publisher" content="David Mundie
DC.Language: EN"/>
<meta name="anotherlineofmetadata" content="testing
DC.date.publication: 1001-01-01"/>
</head>
<body>

<h1 id="dotnotationinmultimarkdownmetadata">Dot Notation in Multimarkdown Metadata</h1>

<p>This is a test</p>

</body>
</html>
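The output above looks like what a parser does if it starts a new metadata entry only when the text before the colon uses the permitted key characters, and otherwise appends the line to the previous value. The Python sketch below reproduces that behavior under some assumptions. The key alphabet is the one the specification is quoted as giving later in this thread: ASCII letters, digits, spaces, hyphens, and underscores. Keys are normalized by lowercasing and dropping spaces, which is inferred from `anotherlineofmetadata` in the HTML output. `parse_metadata` and its `allow_dots` flag are hypothetical names. This is not MMD's actual PEG grammar or API.

```python
import re

# Assumption: MMD 3's key alphabet as quoted in this thread (ASCII letters,
# digits, spaces, hyphens, underscores). allow_dots=True adds "." to it.
# parse_metadata is a hypothetical helper, not part of MultiMarkdown.
def parse_metadata(text: str, allow_dots: bool = False) -> dict[str, str]:
    key_chars = r"A-Za-z0-9 _\-" + ("." if allow_dots else "")
    key_line = re.compile(rf"^([{key_chars}]+):\s*(.*)$")
    meta: dict[str, str] = {}
    last_key = None
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the metadata block
        m = key_line.match(line)
        if m:
            # Normalization inferred from the HTML output: lowercase, no spaces
            last_key = m.group(1).replace(" ", "").lower()
            meta[last_key] = m.group(2)
        elif last_key is not None:
            # Not a valid key line: treat it as a continuation of the previous value
            meta[last_key] += "\n" + line
    return meta


SAMPLE = """Title: Calibre Test
Author: David Mundie
Language: en
Publisher: David Mundie
DC.Language: EN
Another line of metadata: testing
DC.date.publication: 1001-01-01

# Dot Notation in Multimarkdown Metadata
"""

restricted = parse_metadata(SAMPLE)
print(repr(restricted["publisher"]))  # 'David Mundie\nDC.Language: EN'

dotted = parse_metadata(SAMPLE, allow_dots=True)
print(dotted["dc.language"], dotted["dc.date.publication"])  # EN 1001-01-01
```

With the restricted alphabet, the dotted lines end up inside the publisher and anotherlineofmetadata values, just as in the HTML above. Adding "." is enough to make them separate entries. Accented letters would still be rejected, because the character class is ASCII-only.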

  1. Posted by David Mundie on 05 May, 2013 06:29 PM

    Update: I see that the specification says the metadata key "can consist of letters, numbers, spaces, hyphens, or underscore characters", so if "can" means "must", then the behavior is up to spec. In that case I'd like to submit a change request to allow periods in the metadata key. I can't think of any reason why the range of acceptable characters needs to be so constrained. Being able to include Dublin Core metadata in an HTML file is not exactly an exotic use case.

    In fact, it's worse than I thought. When the spec says "letters", it apparently means "7-bit ASCII characters". Other Unicode characters, even accented Latin letters like "á" or "ó", are not allowed. I don't think that's acceptable in 2013.

  2. Posted by Fletcher (Support Staff) on 06 May, 2013 11:14 AM

    David,

    Thanks for writing. I'll look at this and see what the ramifications would be across formats and consider it for the future.

  3. Posted by David Mundie on 06 May, 2013 12:43 PM

    OK. I would be interested in knowing where the choice of character sets for metadata keys came from. Is this a LaTeX limitation? I would have expected Dublin Core to be the most common use case for metadata in HTML files.

    Much as I dislike user parameterization as a way to solve design problems, I would find "Metadata Type: Dublin Core" and "Metadata Type: LaTeX" metadata entries perfectly acceptable. Much better than doing without accented characters, anyway.

  4. Posted by Fletcher (Support Staff) on 06 May, 2013 11:21 PM

    The limitations arose from what was valid for metadata attributes in HTML. I don't believe it was a LaTeX limitation.

    I don't recall whether "." characters used to be invalid and are now allowed in HTML, or whether that was simply an over-restrictive limitation; they do appear to be valid at present.

  5. Posted by David Mundie on 07 May, 2013 01:19 AM

    Interesting. Using the W3C validator (much easier than looking in the specs), it looks like "." has been allowed in the name attribute since at least HTML 4.01, but accented characters have only been allowed since XHTML.

    I note that many Dublin Core attributes are specified as permitted values in the HTML5 metadata names list (http://wiki.whatwg.org/wiki/MetaExtensions).

  6. Posted by Fletcher (Support Staff) on 09 May, 2013 07:02 PM

    MMD 4 now supports ".". This was actually a bug in the MMD 3 PEG parser, introduced when it was translated from the Perl version.

    As to Unicode characters: how many metadata keys do you use that have non-ASCII characters in them? That would seem to me to be fairly rare.

    F-

  7. Fletcher closed this discussion on 12 May, 2013 03:18 PM.