About PEML


The Programming Exercise Markup Language (PEML) is designed to provide an ultra-human-friendly authoring format for describing automatically graded programming assignments.


The Programming Exercise Markup Language (PEML) is intended to be a simple, easy format for CS and IT instructors of all kinds (college, community college, high school, whatever) to describe programming assignments and activities. We want it to be so easy (and obvious) to use that instructors won't see it as a technological or notational barrier to expressing their assignments.

We intend for this format to be something that authors of automated grading tools can adopt, so they can provide a very easy, low-energy onboarding path for existing instructors to get programming activities into such tools. As a result, this notation leans heavily on supporting authors and streamlining common cases, even if this may require more work on the part of tool developers--the goal is to make it super easy for authors of programming activities, not to fit into a specific auto-grader or simplify tasks for tool writers.

OK, so this is a new notation. That sucks. But there doesn't seem to be an obvious alternative that meets our goals, so ...

Also, in terms of scope of "programming activity", here we are trying to capture everything from very small programming activities, such as "here's a very short function, fill in the blank to make it work", to very large programming activities, such as "write dozens of classes spanning thousands of lines of code to implement this programming language", or whatever. Yes, that is a very broad range, but the aim is to cover the full range, rather than making simplifying assumptions that limit PEML to a narrower subset. The reason for using "Exercise" in the name of the language is to remind developers that it covers a broader range of activities than what many instructors see as pure "programming assignments".

Why not YAML? or JSON?

Actually, one of our design goals is for PEML descriptions to be directly mappable to YAML and JSON, so that people who prefer one of those notations (or tools that use those notations) can use the data model. Converting PEML to YAML or JSON should be a direct/easy tool translation that we expect will be provided as a service and/or library at some point.

But why not just use YAML (or JSON) directly, since parsers already exist?

The main reasons are:

  1. These existing formats are not writer-friendly enough for descriptions that contain large amounts of free-form text. Face it, programming activities usually require large-ish chunks of multi-line text to describe most of the interesting properties, whether you're talking about the specification for a program, or starter code to provide to a student, or reference tests to check a solution, or a sample/reference solution to provide for other instructors to look at, or whatever. In most cases, PEML is not about simple key/value pairs where values are small pieces of data, or about deeply structured nested object descriptions. It's about writer-friendly input of structured text where most of the values are multi-line text written by humans.

  2. Are not syntax-friendly enough to present a minimal-effort entry path. Of course, this isn't an obstacle for programming-oriented instructors who have already used YAML (or JSON) and are familiar with working with them. However, the sticky bits of those formats do require a small learning curve that we hope to minimize further.

    In particular, YAML's reliance on whitespace/indentation to indicate nested structure can pose an obstacle to more free-form text input. JSON's reliance on JavaScript quoting and lack of true multi-line values makes it challenging for these tasks in similar ways.

Of course, other practitioners have made the same complaints about both YAML and JSON before, so "more human-friendly" alternatives to both formats have been proposed by others (see Influences below). Naturally, we were unable to find an existing format that exactly met the needs here ... If you happen to know of one we've overlooked that would be a better fit, please say so! What we are aiming for is a format that allows instructors to copy-and-paste content from existing sources (existing assignment writeups, program solutions, program text, whatever) with minimal additional reformatting work in order to create a readable, streamlined programming exercise representation.

Design Goals

OK, with the context for this description language out of the way, here are the specific objectives we are hoping to achieve:

  1. Minimal learning curve: To paraphrase Cay Horstmann, we are aiming for a format that an average computing instructor can learn in under an hour. Further, after a little practice, we hope that an instructor who has an existing assignment can convert or map it to this representation in less than 10 minutes. We want the syntax to be super simple/direct, so that instructors can focus just on the specific properties they are describing.

  2. Plain-text file representation: PEML uses a plain-text format so that it is easy to edit with any text editor. As mentioned above, we want instructors to be able to directly copy plain-text content, including relevant code assets or instructions, right into the PEML representation. For straightforward assignments that do not require any external resources or a special execution enviroment setup, the entire description can be written in one simple text file. Only instructors with more advanced situations or requirements need to use PEML's facility to connect with external resources.

  3. Reference to external resources: While the central representation for an exercise is a text file, we recognize than instructors may run into situations where one or more supporting resources are also necessary for a given exercise (such as custom data files, a special library, existing PDF documents, etc.). PEML provides a clean way for specifying values through relative and absolute URLs. These URLs may refer to remote resources (located online, including in git repositories or other locations, docker containers, etc.), or local resources located alongside the PEML file.

  4. Directory structure: While exercises described as a single text file are easy to transport and process, sometimes it may be useful for an exercise to refer to external resources stored locally. This can be done by including relative URLs in the exercise description (see below). In this situation, we can view the subdirectory containing the PEML file as the root of an exercise description, with relative URLs referring to other file paths within that subdirectory (or subdirectory tree). This allows a single PEML file along with its associated local resources to be managed as a single local entity.

  5. Zip file packaging: Similarly, while a PEML file plus its associated local resources can be represented as a subdirectory (or subdirectory tree), they can also be packaged into a zip file. The PEML file should be in the root of the zip file's internal directory structure, named exercise.peml. Relative URLs inside the PEML file will then be interpreted within the ZIP relative to the location of the PEML file (i.e., relative to the ZIP's root). This makes it easy to zip a subdirectory representing an exercise to make it easy to transport or upload the PEML file and all of the associated local resources, and also makes it easy to unzip a packaged exercise to produce a subdirectory representation.

  6. Programming language neutral: PEML should support descriptions of programming activities in any programming language, rather than being specific to just one. Some fields/assets within one exercise description will naturally be programming-language-specific, but the notation used to describe the exercise itself should not be.

  7. Minimal technology support: Basic PEML descriptions do not require the use of any specific supporting technologies to manage build environments, execution environments, or external dependencies. This follows from the goal to use a simple text file representation (with no external resources) for basic programming activities similar to those found in many textbooks. We believe simple, low overhead assignments will be the most common case. However, we also realize that some exercise authors may prefer to use specific tools or technologies to package or compartmentalize execution environment features, supporting libraries, run-time dependencies, build environments, custom testing tools, etc. As a result, PEML is set up to allow such services to be used by exercise authors who desire it, but they are not required for mainstream (or simple) exercise descriptions. In other words, instructors who use "vanilla" assignments without any special tooling or infrastructure should be able to hop in and describe their existing exercises in PEML without having to learn about new tools or technologies.


In terms of inspiration, PEML has been influenced heavily by YAML and its relatives (including HAML), as well as many alternative notations that have been developed as alternatives for describing structured textual data.

One of the most prominent influences has been ArchieML, another format with some overlapping goals used by the New York Times to write certain types of online content. PEML reuses big chunks of ArchieML's design.

The Awesome JSON page lists a nice collection of extensions to or alternatives to JSON that address some of the shortcomings of using it as a human-authored notation, including CSON, MSON, and HOCON. Other languages like TOML and YAML variants have also influenced this design.

Finally, the data model and some of the packaging ideas have been influenced by the work of an ITiCSE 2008 working group on the topic, which produced this report: Developing a Common Format for Sharing Programming Assignments.