# Market

Market is a dialect of Markdown written in portable Scheme. It is designed
to be easy to extend and understand.

It is not a [CommonMark][1] parser. For a Scheme CommonMark parser, see
[Guile CommonMark][2].

[1]: https://spec.commonmark.org/
[2]: https://github.com/OrangeShark/guile-commonmark/

## Block Syntax

Market uses [significant indentation][3] to make documents more regular and
to make parsing easier.

[3]: https://en.wikipedia.org/wiki/Off-side_rule

Market parses documents line by line, similar to CommonMark. It also parses
blocks before inline syntax. To parse blocks:

1. The beginning of each line is checked for the correct indentation
   recursively from the top of the document tree. (Unlike other Markdowns,
   *lazy lines are not allowed*.) Indentation is usually spaces, but can
   include `>`, `|`, etc.
2. The rest of the line is checked for a block start.
3. If there is no block start, then it is a paragraph line.

In practice, each block has a `continues` procedure that checks if the
current fragment of the line continues the block. If it does, then the
prefix of the line is consumed and the process continues with the active
child block of the block. After going as far as possible, the line is then
checked with a `starts` procedure (stored in each block) to see if any
new block can be started.
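
The loop above can be sketched roughly as follows. This is an illustration
in Python, not Market's Scheme code: the `Block` class, the `active` field,
`strip_quote_marker`, and `add_line` are hypothetical names, and block quotes
are the only container shown. Only the `continues` and `starts` names come
from the description above.

```python
def strip_quote_marker(rest):
    """Consume a leading `>` and one optional space; None if there is none."""
    if not rest.startswith(">"):
        return None
    return rest[2:] if rest.startswith("> ") else rest[1:]


class Block:
    """A node in the document tree (hypothetical stand-in, not Market's type)."""

    def __init__(self, kind):
        self.kind = kind        # "document", "quote", or "paragraph"
        self.children = []
        self.active = None      # the child that is still open, if any
        self.lines = []         # text lines; only paragraphs use this

    def continues(self, rest):
        """Return `rest` minus this block's prefix, or None if the block ends."""
        if self.kind == "quote":
            return strip_quote_marker(rest)
        # A paragraph has no prefix; it continues on any non-blank line
        # that does not start another block.
        return rest if rest.strip() and starts(rest) is None else None


def starts(rest):
    """Return (new_block, remainder) if `rest` opens a block, else None."""
    remainder = strip_quote_marker(rest)
    return None if remainder is None else (Block("quote"), remainder)


def add_line(doc, line):
    block, rest = doc, line
    # Step 1: follow the chain of active children while each one continues.
    while block.active is not None:
        remainder = block.active.continues(rest)
        if remainder is None:
            block.active = None     # no lazy lines: the child is closed
            break
        block, rest = block.active, remainder
    # Step 2: open every new block that the rest of the line starts.
    while block.kind != "paragraph":
        started = starts(rest)
        if started is None:
            break
        child, rest = started
        block.children.append(child)
        block.active = child
        block = child
    # Step 3: what is left is a paragraph line; blank lines add nothing.
    if block.kind == "paragraph":
        block.lines.append(rest)
    elif rest.strip():
        paragraph = Block("paragraph")
        paragraph.lines.append(rest)
        block.children.append(paragraph)
        block.active = paragraph


def parse(lines):
    doc = Block("document")
    for line in lines:
        add_line(doc, line)
    return doc


def outline(block):
    """Summarize the tree as nested tuples, for inspection."""
    if block.kind == "paragraph":
        return ("paragraph", block.lines)
    return (block.kind, [outline(child) for child in block.children])
```

Calling `outline(parse(["> one", "> two", "three"]))` puts `one` and `two`
in a single paragraph inside the quote, while `three` becomes a new top-level
paragraph, because a line without the `>` prefix closes the quote instead of
being absorbed as a lazy line.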

Todo: add no-paragraph blocks?

## Inline Syntax

(This section is not yet implemented.)

CommonMark's rules for inline syntax are very complex, and its authors
know it:

> By far the trickiest part of inline parsing is handling emphasis,
> strong emphasis, links, and images.

The inline matcher for Market keeps a list of inline delimiters, each paired
with a procedure to call after that delimiter is read. When a delimiter is
read (ambiguity is resolved by greedy match, so the longest delimiter wins),
its stored procedure is called with the rest of the line as an argument.
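
Since this section is not implemented yet, the following is only one possible
reading of that design, sketched in Python rather than Scheme. `TABLE`,
`span`, `find_closer`, and `parse_inline` are hypothetical names, the node
tags are made up, and leaving an unmatched delimiter as literal text is an
assumption of the sketch, not a decision Market has made.

```python
def longest_delimiter(text):
    """Greedy match: the longest delimiter in TABLE that prefixes `text`."""
    return max((d for d in TABLE if text.startswith(d)), key=len, default=None)


def find_closer(rest, delimiter):
    """Index of the first greedily matched token equal to `delimiter`, or -1."""
    i = 0
    while i < len(rest):
        found = longest_delimiter(rest[i:])
        if found == delimiter:
            return i
        # Skip a whole delimiter token, or one ordinary character.
        i += len(found) if found else 1
    return -1


def span(tag, delimiter):
    """Build the action stored for `delimiter`; it receives the rest of the line."""
    def action(rest):
        end = find_closer(rest, delimiter)
        if end < 0:
            # Assumption: an unmatched delimiter stays as literal text.
            return delimiter, rest
        node = (tag, parse_inline(rest[:end]))
        return node, rest[end + len(delimiter):]
    return action


# Each delimiter is a separate case with its own stored action.
TABLE = {
    "*": span("em", "*"),
    "_": span("em", "_"),
    "**": span("strong", "**"),
    "***": span("strong-em", "***"),
}


def parse_inline(text):
    """Return a list of strings and (tag, children) tuples."""
    nodes = []

    def push(node):
        # Merge adjacent pieces of plain text into one string.
        if isinstance(node, str) and nodes and isinstance(nodes[-1], str):
            nodes[-1] += node
        else:
            nodes.append(node)

    while text:
        delimiter = longest_delimiter(text)
        if delimiter is None:
            push(text[0])
            text = text[1:]
        else:
            node, text = TABLE[delimiter](text[len(delimiter):])
            push(node)
    return nodes
```

In this sketch a closer must be the same token as its opener, so `***` can
only be closed by `***`, while `_` and `*` nest freely because they are
different table entries.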

The parser treats `*`, `**`, and `***` as three separate cases. Combining
this with the greedy-match rule causes the following behavior:

1. Opening with `***` and then closing with `**` and `*` separately is not
   allowed. Example:

       ***bold and italic** just italic*

   is not allowed.
2. Contrary to the [original Markdown syntax][4], `*` and `_` can be
   combined. This allows something like the pattern above:

       _**bold and italic** just italic_

   or

       *emphasized text _with emphasis in it_ and out of it*

   [4]: https://daringfireball.net/projects/markdown/syntax#em