Purpose
The Programming Exercise Markup Language (PEML) is intended to be a simple, easy format for CS and IT instructors of all kinds (college, community college, high school, whatever) to describe programming assignments and activities. We want it to be so easy (and obvious) to use that instructors won't see it as a technological or notational barrier to expressing their assignments.
TL;DR: Wanna Jump In?
If you want to know more about PEML's design and motivations, read on below. Otherwise, if you just want to dive in, start with these links and come back when you want a deeper view:
- Tutorial: Get Started Writing PEML
- PEML Live! lets you edit PEML examples live in your browser
Looking for software to add PEML support to your application? Consider these:
PEML's Goals
We intend for this format to be something that authors of automated grading tools can adopt, so they can provide a very easy, low-energy onboarding path for existing instructors to get programming activities into such tools. As a result, this notation leans heavily on supporting authors and streamlining common cases, even if this may require more work on the part of tool developers--the goal is to make it super easy for authors of programming activities, not to fit into a specific auto-grader or simplify tasks for tool writers.
PEML is designed to achieve the following goals:
- Minimal learning curve
- Plain-text file representation
- Supports references to external resources
- Directory-structured organization of associated assets
- Zip file packaging of multi-file assets with description
- Programming language neutral
- Minimal technology support
For more information about PEML's goals and influences, read about:
Basic Format
The remainder of this description is split into two main parts: first, the format for describing key/value pairs (in this section), and second, the data model (on the following pages). We view these two as independent. As indicated in Why Not YAML?, we view the data described for a programming assignment as directly representable in PEML, YAML, JSON, etc. We also expect that most tools will support either YAML or JSON directly for tooling purposes, and that conversions between PEML <=> YAML or PEML <=> JSON will be easy (in fact, we already have a REST service that will do it for you!). So users who strongly prefer an alternate notation can probably freely use one. However, we strongly believe that a representation optimized for human authoring of structured text consisting primarily of many multi-line text values is warranted to make authoring easier for those who don't think/write in YAML or JSON regularly.
OK, on to the format itself.
PEML uses a plain-text representation for describing exercises. This format is designed to be easy to edit in a plain text editor. It is based on ArchieML, with a few minor modifications.
Key/Value Pairs
Like YAML, we describe a programming exercise as a series of key/value pairs. Wow, big deal.
In YAML terms, that means the top-level structure of an exercise is a mapping (a hash or dictionary).
Keys are alphanumeric identifiers (starting with a letter, and including underscores). This is more restrictive than YAML, but the more general idea of allowing any representable value to be a key has little utility here and requires more careful parsing and fancier quoting rules that only decrease writability and increase the potential learning curve ... so, PEML uses the simpler notion that is common in many programming language identifier token classes. Note that periods can be used to form dotted names to refer to nested keys, as in ArchieML.
Also as in ArchieML, each key must start at the beginning of a line and be followed by a colon (for single-valued keys; keys that map to collections will instead be either: (a) surrounded by square brackets, or (b) surrounded by curly braces, still following ArchieML).
The corresponding value follows the colon. All values are potentially multi-line values, and extend up to the beginning of the next property. Any leading/trailing white space is trimmed (including newlines), and multi-line values (i.e., those containing embedded newline(s) after trimming) are automatically terminated with a single newline. As a result, blank lines can appear immediately before any key (or before any unquoted value) for visual spacing/chunking as desired without affecting the meaning.
Like ArchieML, PEML is intended to be parsed line by line, with the first non-whitespace sequence on the line determining its role. A simple, line-oriented parsing strategy using a basic state machine should be sufficient, without requiring complex grammar-based parsing strategies.
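The line-oriented strategy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reference parser: the names KEY_LINE, clean_value, and parse_pairs are hypothetical, keys are assumed to begin in the first column, and comments, quoted values, arrays, and object blocks are deliberately left out.

```python
import re

# Assumed key shape: a letter, then letters, digits, or underscores,
# optionally repeated as dotted segments, followed by a colon.
KEY_LINE = re.compile(
    r"^([A-Za-z][A-Za-z0-9_]*(?:\.[A-Za-z][A-Za-z0-9_]*)*):(.*)$"
)


def clean_value(lines):
    """Trim surrounding white space; end multi-line values with one newline."""
    value = "\n".join(lines).strip()
    return value + "\n" if "\n" in value else value


def parse_pairs(text):
    """Collect flat key/value pairs; each value runs until the next key line."""
    pairs = {}
    key, lines = None, []
    for line in text.splitlines():
        match = KEY_LINE.match(line)
        if match:
            if key is not None:
                pairs[key] = clean_value(lines)
            key, lines = match.group(1), [match.group(2)]
        elif key is not None:
            lines.append(line)
    if key is not None:
        pairs[key] = clean_value(lines)
    return pairs


# Hypothetical sample input, for illustration only.
sample = (
    "title: Hello\n"
    "\n"
    "instructions:\n"
    "Write a program.\n"
    "It should print hi.\n"
    "\n"
    "options.text_format: markdown\n"
)
result = parse_pairs(sample)
```

In this sketch the blank lines around each key disappear during trimming, and only the two-line instructions value keeps a trailing newline.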
Comments
PEML allows single-line comments using the # character, as in YAML. The # character must be the first (non-whitespace) character on the line (i.e., only whole-line comments are supported), and the corresponding line is completely ignored for the purposes of interpreting the meaning of the PEML. Any line beginning with a # character (and any leading indentation) is interpreted as a comment line, except in quoted values.
Inspired by YAML's document start and end markers, PEML uses a specific comment line ("#---", a pound sign followed by three dashes) to signal the start of a PEML description. This marker is optional for the first PEML description in a text stream, but serves as the delimiter between exercises if multiple PEML descriptions are presented in a single file or stream. The current PEML description continues until the next occurrence of this marker (signaling the beginning of a new exercise), or the end of input.
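As a rough sketch of how a tool might apply these two rules, the Python function below splits a stream into separate descriptions at each "#---" line and drops whole-line comments. The function name split_descriptions is hypothetical, the marker is assumed to be exactly "#---" once surrounding white space is stripped, and quoted values (where # has no special meaning) are not handled here.

```python
def split_descriptions(text):
    """Split a stream on '#---' marker lines and drop whole-line comments."""
    documents = [[]]
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "#---":
            # The marker opens a new description; a marker before any
            # content (the optional leading one) opens nothing new.
            if documents[-1]:
                documents.append([])
            continue
        if stripped.startswith("#"):
            # Whole-line comment, possibly indented: ignore it entirely.
            continue
        documents[-1].append(line)
    # Discard descriptions that contain nothing but blank lines.
    return ["\n".join(doc) for doc in documents if any(l.strip() for l in doc)]


# Hypothetical two-exercise stream, for illustration only.
stream = (
    "#---\n"
    "title: One\n"
    "# a comment\n"
    "  # an indented comment\n"
    "#---\n"
    "title: Two\n"
)
descriptions = split_descriptions(stream)
```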
Quoting
On occasion, one may end up including text as part of a value that might also be recognizable as the start of a key. You can see this where the word "format:" appears in the example above, as part of the value given for the key "instructions:". In those cases, PEML uses a variant of HereDoc-style syntax, adapted to be more like triple quotes in languages like Python, Scala, R, etc.
Any key where the colon is immediately followed by three or more repetitions of the same printing character is treated as having a HereDoc-style quoted value, with the provided sequence of repeated characters serving as the delimiter. This is more flexible than triple-quoting, since triple quotes themselves may appear in program fragments for exercises using particular programming languages. This technique allows authors to choose a custom delimiter (as with HereDocs), but allows them to use repeated punctuation symbols to provide a more identifiable/scannable horizontal delimiter around the value, rather than using a custom identifier.
As with HereDocs in many programming languages, the quoted value is terminated by the first subsequent occurrence of a line containing only the delimiter character sequence.
Of course, many programming languages also use # as a comment character. In PEML, # has no special meaning inside a quoted value. As a result, we recommend HereDoc-quoting any values that contain source code from such a programming language, to prevent a program's comment lines from being interpreted as PEML comments.
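The quoting rule can be sketched in Python as follows. This is an illustration rather than a definitive implementation: HEREDOC_START and read_quoted are hypothetical names, the opening line is assumed to contain nothing after the delimiter, trailing white space on the closing line is tolerated, and the quoted body is returned verbatim (whether a real tool trims it is not settled here).

```python
import re

# Assumed opening form: key, colon, then three or more repetitions of a
# single non-space character, and nothing else on the line.
HEREDOC_START = re.compile(r"^([A-Za-z][A-Za-z0-9_.]*):((\S)\3{2,})\s*$")


def read_quoted(lines, start):
    """Return (key, value, next_index) for the quoted value at lines[start]."""
    match = HEREDOC_START.match(lines[start])
    if match is None:
        raise ValueError("line does not open a quoted value")
    key, delimiter = match.group(1), match.group(2)
    body = []
    for index in range(start + 1, len(lines)):
        # The first later line holding only the delimiter ends the value.
        if lines[index].rstrip() == delimiter:
            return key, "\n".join(body), index + 1
        # Everything else is literal text, including '#' and 'word:' lines.
        body.append(lines[index])
    raise ValueError("unterminated quoted value")


# Hypothetical sample, for illustration only.
quoted_lines = [
    "instructions:----------",
    "format: one number per line",
    "# this is part of the value, not a comment",
    "----------",
    "title: Next",
]
key, value, next_index = read_quoted(quoted_lines, 0)
```

Inside the quoted region, the line starting with "format:" is not mistaken for a key and the # line is kept as text.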
Embedding Markdown (and HTML)
Special formatting in the textual description of the exercise can be written using Markdown, which also supports embedding HTML directly in exercise descriptions. So use Markdown or HTML for adding formatting to your text. Plain, unformatted text also works, when no special formatting markup is desired. Here, we specify GitHub's flavor of Markdown.
Note: It is easy to consider adding a key for text_format:, specifying markdown as the default but allowing individual users to use other markup formats (such as reStructuredText, AsciiDoc, POD, LaTeX, etc.). In fact, this is already blocked out in the options.text_format field within the data model, although it needs more refinement.
Another Note: Actually, by using pandoc and a PEML parsing wrapper, it should be possible to create a web service that can read a PEML document using a wide variety of text markup formats and render any of them to HTML, including reStructuredText, many dialects of markdown, many wiki markup languages, Docbook, LaTeX, and even Microsoft Word docx files (!). Unfortunately, this doesn't address AsciiDoc :-(. At this point, it is plausible to consider supporting other markup formats along with the options.text_format: key if community effort can generate the necessary support for adopting tools to make use of it (i.e., adding support under the render option of our PEML REST micro-service).
Additional Structuring Features
External Resources
External resources might be referenced in three different ways in PEML.
First, for any key/value pair, the value to the right of the colon can be provided by using an external reference, rather than by providing the value directly in the PEML file. Values that are provided externally can be expressed as absolute or relative URLs using the "url(...)" construct (similar to its use in CSS).
While we strongly discourage the use of PDF assignment descriptions, any key value can be farmed out into an external file (or directory of files!). This approach might be most used for source code content stored in separate files, test data stored in separate files, code libraries, and so on.
Here, an absolute URL would specify the web location of the resource, while a relative URL would be resolved relative to the location of the PEML file containing it. As listed above under PEML's Goals, if a PEML description is packaged in a zip file so that other resources can be transferred along with it, relative URLs could be used to refer to other contents within the zip file. Similarly, PEML files stored on local disk could refer to local files stored adjacently, and PEML files stored in git repositories or other systems could use the same technique.
Second, it is likely that some embedded Markdown or HTML content (such as the instructions for the exercise) may include HTML tags that use relative or absolute URLs. This may be appropriate for referring to images, downloadable resources accessible to the student, etc. While authors can use absolute URLs in these contexts, it may be preferable in some circumstances to bundle those resources along with the PEML description. By convention, we encourage authors to place such files in a directory called public_html/ that is located alongside the PEML file in the same folder, zip file, or repository. Within Markdown or HTML keys, relative URLs that start with "public_html/..." will then be correctly resolved to these resources. By adhering to this convention, tools can immediately determine that external web-accessible resources must also be provided and also be able to systematically rewrite URLs for user presentation.
Third, it is plausible that feedback generated when processing author-provided reference tests may wish to use similar relative URLs to point to images or other resources included as part of the feedback. Again, any such resources should be placed in the public_html/ folder.
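For the first kind of reference, the url(...) construct, resolving a value against the location of the PEML file can be sketched with Python's standard urllib.parse.urljoin. The names URL_VALUE and resolve_reference, and the example.org and example.com addresses, are hypothetical; the sketch also assumes the URL appears unquoted inside the parentheses.

```python
import re
from urllib.parse import urljoin

# Assumed form: the entire value is url(...), with an unquoted URL inside.
URL_VALUE = re.compile(r"^url\((.*)\)$")


def resolve_reference(value, base_url):
    """Return the absolute URL for a url(...) value, or None for plain text."""
    match = URL_VALUE.match(value.strip())
    if match is None:
        return None
    # urljoin leaves absolute URLs unchanged and resolves relative ones
    # against the location of the PEML file itself.
    return urljoin(base_url, match.group(1).strip())


# Hypothetical location of a PEML file, for illustration only.
base = "https://example.org/exercises/ex1/exercise.peml"
relative = resolve_reference("url(src/Starter.java)", base)
absolute = resolve_reference("url(https://example.com/data.txt)", base)
inline = resolve_reference("Write a program.", base)
```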
Note: We are considering using a "convention over configuration" approach to saying that when a PEML exercise is bundled with files into a single zip, contents for nested keys can be provided implicitly by placing files under relative pathnames that mirror the key structure. Path segments that correspond to array indices can be taken from the name, title, or language key (in that preferred order) of the dictionary items inside the array, or numeric suffixes can be used as path names. This could be particularly useful for adding files in places where file-based content is desirable (src and environment keys, for example, or anywhere a .files nested key appears). Avoiding the requirement that external files co-located with the PEML description be explicitly declared inside the PEML is more desirable in these cases. We need to work up some examples, though, and a more precise description of what the implicit mapping is. A short example: a public_html/ folder located alongside the PEML file can implicitly be interpreted as the set of resources/files attached to the instructions, without requiring a redundant line inside the PEML itself saying: "public_html: url(public_html)".
Convention Over Configuration
While all of the settings and resources associated with an exercise can be directly embedded inside the PEML file itself, often it is easier to provide file-like content as, well, separate files. While you can explicitly list files as external references using the url(...) operator, it is often easier and simpler to just provide files "alongside" the PEML description as separate files themselves. To support this, PEML uses a convention for naming subdirectories to locate sets of files (which can always be overridden using an explicit url(...) expression as the value for the file set).
For example, the instructions in an exercise might need to refer to external images. The PEML format also provides a public_html key to refer to a set of files that are intended to be public web resources referenced in the instructions. Relative path names for images and links inside the instructions refer to file resources in the public_html file set. While all of these files could be explicitly listed, if no public_html key is provided in the PEML description, then by convention the subdirectory with the same name as this key, "public_html/", located alongside/adjacent to this PEML description (for example, packaged in the same zip file, or located in the same directory/folder) is assumed to contain the files in the public_html file set.
Similarly, instead of specifying a src.starter.files file set, the author can just place files in the src/starter subdirectory adjacent to the PEML source (for any key representing a file set with a name ending in ".files", that suffix is omitted from the corresponding directory name). In most cases, we envision that authors will generally provide external resources by convention, rather than explicitly specifying them. In cases where the same set of physical files will be shared across multiple exercises, explicit url(...) locations can be used to refer to shared file sets without a huge effort. Placing these in a PEML fragment and using the :include directive (discussed in the next section) may also be useful.
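The naming convention described above amounts to a small mapping from key names to directory names, sketched here in Python. The function names conventional_directory and locate_file_set are hypothetical, the parsed description is assumed to be a flat dictionary keyed by dotted names, and the exercises/ex1 and shared/assets paths are invented for illustration.

```python
from pathlib import PurePosixPath


def conventional_directory(key):
    """Map a file-set key to its conventional subdirectory name."""
    suffix = ".files"
    if key.endswith(suffix):
        # Per the convention, the ".files" suffix is omitted.
        key = key[: -len(suffix)]
    # Each remaining dotted segment becomes one path segment.
    return PurePosixPath(*key.split("."))


def locate_file_set(pairs, key, peml_dir):
    """Prefer an explicit value for the key; otherwise fall back to convention."""
    if key in pairs:
        return pairs[key]  # for example, an explicit url(...) expression
    return str(PurePosixPath(peml_dir) / conventional_directory(key))


by_convention = locate_file_set({}, "src.starter.files", "exercises/ex1")
overridden = locate_file_set(
    {"public_html": "url(shared/assets)"}, "public_html", "exercises/ex1"
)
```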
Splitting Up PEML Descriptions
In addition to allowing individual key values to be provided in external files, PEML adds an :include directive that allows parts of the PEML description to be included from another external location. While this directive is not strictly necessary, it might be used by some authors to factor out repeated key/value pairs (for the license, author, environment definitions, etc.) so they can be written once and reused across multiple PEML descriptions without repeating the content.
Another use for :include is to allow an author to separate out the definition of the test cases and test environment for an exercise so they are placed in a separate file. This might be useful so that the exercise description itself might be public/accessible, but the test cases or grading criteria applied to the exercise are managed separately and only available to some users.
String Interpolation with Variable Values
Note: it is possible that some tools may choose not to implement this feature, since it has to do with use of exercises as opposed to simply parsing PEML descriptions.
In some cases, authors may wish to write "parameterized" exercise descriptions where many instances of the exercise can be produced using different parameter values. For example, a parameterized exercise may allow for individualized or unique instances of the exercise to be programmatically generated on demand for each new user/student. To allow for tools that support such features, PEML allows for parameterized contents in instructions, tests, code, etc.
PEML uses mustache-compatible notation for string interpolation, which is also compatible with a number of templating systems. It is analogous to Ruby's #{...} string interpolation syntax, Python's string interpolation syntax, and similar to using braces in string interpolation in Perl. For any exercise, the author can use any desired number of user-defined variables, and any occurrences of {{variable-name}} in the title, instructions, src code assets, or test suites will be substituted when an instance of the exercise is needed.
Since different tools implementing PEML may use different templating implementations to achieve interpolation, extensions or variants of {{mustache}} syntax might be supported, so check your tool's documentation when in doubt.
PEML does not support escaping of literal "{{" and "}}" marking interpolated values (although PEML-supporting tools may support custom notational extensions that allow this, it isn't part of the PEML definition). PEML authors are therefore advised to ensure that if their instructions or code use {{...}} notation, they keep the variable names used for substitution in the PEML description disjoint from those appearing natively in the text. Where necessary, use the options.interpolation.delimiters key to set the delimiters to something different (similar to the mustache set delimiter feature).
The options.interpolation.enable key can be used to enable/disable interpolation if necessary (default is enabled, for tools that support this feature).
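A minimal Python sketch of this substitution follows. It is not how any particular PEML tool implements interpolation (real tools may delegate to a full mustache library); the function name interpolate and its parameters are hypothetical stand-ins, with delimiters and enable loosely mirroring the options.interpolation.delimiters and options.interpolation.enable keys. Names that are not defined as variables are left untouched, which is one way to honor the advice about keeping variable names disjoint from {{...}} text that appears natively.

```python
import re


def interpolate(text, variables, delimiters=("{{", "}}"), enable=True):
    """Replace delimited variable names in text with values from variables."""
    if not enable:
        return text
    opening, closing = (re.escape(d) for d in delimiters)
    # Assumed name shape: letters, digits, underscores, and hyphens.
    pattern = re.compile(
        opening + r"\s*([A-Za-z_][A-Za-z0-9_-]*)\s*" + closing
    )

    def replace(match):
        name = match.group(1)
        # Unknown names are assumed to be native text and are kept as-is.
        return str(variables[name]) if name in variables else match.group(0)

    return pattern.sub(replace, text)


basic = interpolate(
    "Print {{count}} lines of {{ word }}.", {"count": 3, "word": "hi"}
)
native = interpolate("Keep {{.Name}} but fill {{limit}}", {"limit": 10})
custom = interpolate("x = <%n%>", {"n": 5}, delimiters=("<%", "%>"))
disabled = interpolate("{{n}}", {"n": 1}, enable=False)
```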
Nested Structure
Beyond these basics, nested properties follow Archie's conventions for dotted keys (nested key structure), object blocks, and arrays. The main differences here compared to ArchieML are the use of multi-line values by default, the use of a HereDoc/triple-quote hybrid rather than a specific end marker with escaping of special characters when a delimiter is necessary, and support for comments.
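To illustrate how dotted keys describe nested structure, the Python sketch below expands a flat dictionary of dotted names into nested dictionaries. The names set_dotted and nest are hypothetical, and the sketch assumes no key is used both as a plain value and as a prefix of another key.

```python
def set_dotted(mapping, dotted_key, value):
    """Store value in mapping at the nested path named by dotted_key."""
    parts = dotted_key.split(".")
    node = mapping
    for part in parts[:-1]:
        # Create intermediate dictionaries on demand.
        node = node.setdefault(part, {})
    node[parts[-1]] = value
    return mapping


def nest(pairs):
    """Expand a flat {dotted_key: value} dictionary into nested dictionaries."""
    nested = {}
    for key, value in pairs.items():
        set_dotted(nested, key, value)
    return nested


# Hypothetical flat pairs, for illustration only.
flat = {
    "title": "Hello",
    "options.interpolation.enable": "true",
    "options.text_format": "markdown",
}
tree = nest(flat)
```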
As in ArchieML, an array is signified with a key enclosed in square brackets ([...]), and is terminated with a pair of empty square brackets ([]). ArchieML allows any trailing empty bracket pairs (or brace pairs) at the end of the file to be omitted, but all closing array delimiters have been included here for clarity.
As in YAML and JSON, structures can be arbitrarily nested in PEML. Array keys that start with a period (.) are used to indicate arrays nested inside other arrays (from ArchieML).
When providing arrays, remember that PEML (like ArchieML) uses repeated occurrences of the first key provided for the first array item to mark where each new item starts, so whichever key is provided first should consistently be used to start each new item in the array.
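The item-splitting rule can be sketched in Python as follows, assuming the key/value pairs inside one array block have already been read into an ordered list of tuples. The function name group_items and the sample keys and values are hypothetical.

```python
def group_items(pairs):
    """Group (key, value) tuples from one array block into item dictionaries."""
    items = []
    first_key = None
    for key, value in pairs:
        if first_key is None:
            # The first key seen in the block becomes the item separator.
            first_key = key
        if key == first_key:
            items.append({})
        items[-1][key] = value
    return items


consistent = group_items([
    ("name", "a"), ("language", "java"),
    ("name", "b"), ("language", "python"),
])

# Starting the second item with a different key merges it into the first.
inconsistent = group_items([
    ("name", "a"), ("language", "java"),
    ("language", "python"), ("name", "b"),
])
```

The second call shows why the first key matters: the out-of-order language value overwrites the first item's value instead of starting a new item.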
Further details about nested mappings and sequences (and how they are terminated) are available in the ArchieML definition.
Side by Side
The (very brief) example shown above can be directly represented in JSON (or YAML):