block structure

author: Peter McGoron 2025-01-25 15:37:19 -0500
committer: Peter McGoron 2025-01-25 15:37:19 -0500
commit: b0e41b9af1f9ce89fe6ef90686abd1dac6b86c10 (patch)
tree: a162e0445b007d9e9fdc117a702cfd58e136cd27 /README.md
1 files changed, 71 insertions, 0 deletions
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..44425e8
--- /dev/null
+++ b/README.md
@@ -0,0 +1,71 @@
+# Market
+
+Market is a dialect of Markdown written in portable Scheme. It is designed
+to be easy to extend and understand.
+
+It is not a [CommonMark][1] parser. For a Scheme CommonMark parser, see
+[Guile CommonMark][2].
+
+[1]: https://spec.commonmark.org/
+[2]: https://github.com/OrangeShark/guile-commonmark/
+
+## Block Syntax
+
+Market uses [significant indentation][3] to make documents more regular and
+to make parsing easier.
+
+[3]: https://en.wikipedia.org/wiki/Off-side_rule
+
+Market parses line by line, similar to CommonMark. It also parses blocks
+before inline syntax. To parse blocks:
+
+1. The beginning of each line is checked for the correct indentation
+   recursively from the top of the document tree. (Unlike other Markdowns,
+   *lazy lines are not allowed*.) Indentation is usually spaces, but can
+   include `>`, `|`, etc.
+2. The rest of the line is checked for a block start.
+3. If there is no block start, then it is a paragraph line.
+
+In practice, each block has a `continues` procedure that checks if the
+current fragment of the line continues the block. If it does, then the
+prefix of the line is consumed and the process continues with the active
+child block of the block. After going as far as possible, the line is then
+checked with a `starts` procedure (stored in each block) to see if any
+new block can be started.
+
+Todo: add no-paragraph blocks?
+
+## Inline Syntax
+
+(This section is not yet implemented.)
+
+CommonMark's rules for inline syntax are very complex, and its authors
+acknowledge as much:
+
+> By far the trickiest part of inline parsing is handling emphasis,
+> strong emphasis, links, and images.
+
+The inline matcher for Market keeps a list of inline syntax elements, each
+paired with the action to take after reading it. Upon reading an element
+(disambiguated using greedy match), the stored procedure is called with
+the rest of the line as an argument.
+
+The parser treats `*`, `**`, and `***` as three separate cases. Combining
+this with the greedy-match rule causes the following behavior:
+
+1. Starting with `***` and mixing `**` and `*` is not allowed. Example:
+
+       ***bold and italic** just italic*
+
+   is not allowed.
+2. Contrary to the [original Markdown syntax][4], `*` and `_` can be
+   combined. This allows for something like the above pattern:
+
+       _**bold and italic** just italic_
+
+   or
+
+       *emphasized text _with emphasis in it_ and out of it*
+
+   [4]: https://daringfireball.net/projects/markdown/syntax#em
+
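The block-parsing steps described under "Block Syntax" in the README can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the project's Scheme, and it is not Market's actual code: the `Block` class, the `kind` strings, the `parse` function, and the choice of `> ` as the only non-space prefix are all assumptions made for the example. Only the `continues` and `starts` procedure names come from the README.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Block:
    """A node in the document tree (hypothetical structure, not Market's)."""
    kind: str                                # "document", "quote" or "paragraph"
    children: List["Block"] = field(default_factory=list)
    lines: List[str] = field(default_factory=list)

    def continues(self, line: str) -> Optional[str]:
        """Return the line minus this block's prefix, or None if the block ends."""
        if self.kind == "document":
            return line                      # the root continues on every line
        if self.kind == "quote":
            if line.startswith("> "):
                return line[2:]
            return "" if line == ">" else None
        # Assumption: a paragraph continues on any non-blank fragment.
        return line if line.strip() else None

    def starts(self, line: str) -> Optional["Block"]:
        """Return a new child block if `line` opens one inside this block."""
        if self.kind == "paragraph":
            return None                      # assumption: paragraphs are leaves
        if line.startswith("> ") or line == ">":
            return Block("quote")
        return None


def parse(text: str) -> Block:
    root = Block("document")
    open_blocks = [root]                     # path from the root to the active block
    for line in text.splitlines():
        rest, depth = line, 0
        # Step 1: walk down the open blocks while each one continues,
        # consuming that block's prefix from the line.
        for block in open_blocks:
            consumed = block.continues(rest)
            if consumed is None:
                break
            rest, depth = consumed, depth + 1
        del open_blocks[depth:]              # close blocks that did not continue
        tip = open_blocks[-1]
        if tip.kind == "paragraph":
            tip.lines.append(rest)
            continue
        # Step 2: start as many new blocks as the rest of the line allows.
        while (new := tip.starts(rest)) is not None:
            tip.children.append(new)
            open_blocks.append(new)
            rest = new.continues(rest)
            tip = new
        # Step 3: with no block start left, the fragment is a paragraph line.
        if rest.strip():
            para = Block("paragraph", lines=[rest])
            tip.children.append(para)
            open_blocks.append(para)
    return root
```

Parsing the three lines `> a`, `> b`, `c` with this sketch yields a quote holding one two-line paragraph, followed by a separate top-level paragraph for `c`. Because the quote's `continues` check fails on the third line, that line cannot be absorbed into the quoted paragraph, which is the "no lazy lines" rule in miniature.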
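The greedy-match rule described under "Inline Syntax" in the README can be sketched in the same way. The README states that this part is not yet implemented, so the following Python is purely illustrative: the `INLINE_SYNTAX` table, the `make_action` helper, the action names, and the `match_inline` function are hypothetical, and the actions here only label the rest of the line instead of parsing it.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical action type: receives the rest of the line after the marker.
Action = Callable[[str], str]


def make_action(name: str) -> Action:
    """Build a placeholder action that tags the remaining text with `name`."""
    return lambda rest: f"{name}:{rest}"


# `*`, `**` and `***` are three separate entries, as the README describes.
INLINE_SYNTAX: Dict[str, Action] = {
    "*": make_action("emphasis"),
    "**": make_action("strong"),
    "***": make_action("strong-emphasis"),
    "_": make_action("emphasis"),
}


def match_inline(line: str) -> Optional[Tuple[str, str]]:
    """Find the longest marker at the start of `line` and run its action."""
    # Greedy match: try longer markers first so `***` wins over `**` and `*`.
    for marker in sorted(INLINE_SYNTAX, key=len, reverse=True):
        if line.startswith(marker):
            return marker, INLINE_SYNTAX[marker](line[len(marker):])
    return None
```

Under these assumptions, `match_inline("***bold and italic** just italic*")` selects the `***` entry as a single unit, so the later `**` and `*` cannot close it piecewise. A line that opens with `_**` is instead read as `_` followed by `**`, which is why mixing `_` and `*` still works.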