PEML Exercise Data Model

 

The Programming Exercise Markup Language (PEML) is designed to provide an ultra-human-friendly authoring format for describing automatically graded programming assignments.

Data Model Schema for PEML Exercises

This page presents the data model for PEML. While PEML is its own notation, the data model's structure is also described in the form of a JSON Schema for PEML:

http://cssplice.github.io/peml/schemas/PEML.json

Even though PEML uses its own notation, the data model's structure can easily be mapped into JSON or YAML, and a JSON Schema provides a program-checkable way of expressing the intended data model structure. Snippets from the schema are included below to show the definitions for each key/value field in PEML. PEML fields that have their own substructure are described separately as building blocks in the definitions of recurring model elements.

The main attributes of a PEML exercise description are broken down into four groups:

Required keys

Recommended keys

Optional keys

Keys under development

It is important to note that PEML allows the use of additional keys beyond those described here, which may be custom-supported by specific tools. The list of keys described here is intended to provide a common vocabulary that can be used by many tools for representing programming exercises, to facilitate authoring, importing, and exporting these exercises. Some keys may relate to features or content that are not supported in every tool, but the goal is to streamline the ability of instructors (or "people" in general) to get exercises into (and potentially out of) educational tools.

Required Keys

Required keys must be present in each PEML description. We keep these to a minimum. However, to promote interoperability and data management, including these required elements in every exercise helps authors keep information organized when it is imported into tools.

exercise_id required: string

Schema:

"exercise_id": { "type": "string", "minLength": 1, "pattern": "^[^\\s]+$" }

The exercise_id is a globally unique, human-written identifier created by the exercise author to uniquely identify this exercise on any system. Any non-empty sequence of non-whitespace unicode characters can be used. We imagine that authors might construct these identifiers in similar ways to programming package names or URLs. For example, the following ID includes a university and course identifier around an exercise name that is presumed to be unique within that context. By combining the context and the name, a globally unique identifier can be formed:

PEML example:

exercise_id: edu.vt.cs.cs1114.palindromes
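
As a rough illustration of the schema constraint above, the following Python sketch checks a candidate identifier. The function name is hypothetical, and Python's notion of whitespace may differ slightly from that of a JSON Schema validator, so treat this as an approximation rather than a reference check.

```python
import re

# Mirrors the schema pattern: one or more non-whitespace characters.
EXERCISE_ID_PATTERN = re.compile(r"[^\s]+")

def is_valid_exercise_id(value):
    """Return True if value is a non-empty string containing no whitespace."""
    return isinstance(value, str) and EXERCISE_ID_PATTERN.fullmatch(value) is not None
```

For example, is_valid_exercise_id("edu.vt.cs.cs1114.palindromes") is true, while an empty string or "my exercise" is rejected.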

By convention, exercises should start with the exercise_id key first, so that multiple exercises can be concatenated together in a single file but still sliced/parsed as separate exercises easily.
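
The convention above makes a naive slicing step possible. The sketch below is only an assumption about how a tool might pre-split a file: it starts a new chunk at every line beginning with "exercise_id:" and does not account for that text appearing inside a quoted multi-line value, which a real PEML parser would have to handle.

```python
def split_exercises(text):
    """Split text into chunks, starting a new chunk at each exercise_id line."""
    chunks, current, seen_id = [], [], False
    for line in text.splitlines():
        if line.startswith("exercise_id:"):
            # Close the previous exercise before starting the next one.
            if seen_id:
                chunks.append("\n".join(current))
                current = []
            seen_id = True
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Any lines that precede the first exercise_id (comments, for instance) stay attached to the first chunk.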

The purpose of the exercise_id is to serve as an external identifier that tools (and people) can use to determine whether two PEML descriptions describe "the same thing". Exercises with different ids should be considered as distinct entities, rather than as different versions of the "same thing". The version key is used to identify basic version info for long-lived exercises.

When a user externally edits an exercise representation and re-imports it into a tool, or exports an exercise from one tool to use in another context, tools can use the exercise_id to determine whether imported information is an update to an existing artifact or whether it defines a new exercise.

title required: string

Schema:

"title": { "type": "string", "minLength": 1 }

The title is a string name or title used as a human-readable label for the exercise. The intent is for this to be the "title" shown to students in various contexts, either when viewing a single exercise or when viewing lists of exercises. While there is no specific length limit, ideally titles should be no more than "one line" in size, because of the various contexts where they might be displayed. How much of the title is displayed (or truncated) when collections of exercises are shown is tool-dependent.

PEML example:

title: Palindromes (A Simple PEML Example)

author required: string or object

Schema:

"author": {
  "oneOf": [
    { "$ref": "#/definitions/email_address" },
    {
      "type": "object",
      "required": ["email"],
      "properties": {
        "email": { "$ref": "#/definitions/email_address" },
        "name": { "$ref": "#/definitions/nonempty_string" }
      }
    }
  ]
}

The author tag is used to identify the author of the exercise (or at least of the PEML exercise description). Recommended practice is to identify the author by a unique e-mail address. This field can be used to provide a single e-mail address. Alternatively, the key authors can be used to provide an array of multiple authors.

If the optional license key is provided and the license.owner is the same as the author, then the author key can be omitted.

In addition to providing an email address, the author tag can use sub-keys to specify both an email address and an author name.

PEML example:

author: edwards@cs.vt.edu

PEML example:

# Providing name and email (email is always required)
author.name: Stephen Edwards
author.email: edwards@cs.vt.edu

PEML example:

# Multiple authors, specifying email only:
[authors]
* edwards@cs.vt.edu
* ayaan@vt.edu
[]

# Multiple authors with names:
[authors]
name: Stephen Edwards
email: edwards@cs.vt.edu
name: Ayaan Kazerouni
email: ayaan@vt.edu
[]
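
Because an author can be given either as a bare e-mail string or as an object with email and name sub-keys, a tool may want a single internal shape. The Python sketch below assumes the PEML description has already been parsed into a dictionary; the function names are hypothetical.

```python
def normalize_author(value):
    """Return {'email': ..., 'name': ...} for a string or object author value."""
    if isinstance(value, str):
        return {"email": value, "name": None}
    if isinstance(value, dict) and "email" in value:
        return {"email": value["email"], "name": value.get("name")}
    raise ValueError("author must be an email string or an object with an 'email' key")

def collect_authors(exercise):
    """Gather entries from both the 'author' and 'authors' keys, if present."""
    entries = []
    if "author" in exercise:
        entries.append(exercise["author"])
    entries.extend(exercise.get("authors", []))
    return [normalize_author(entry) for entry in entries]
```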

instructions recommended: string

Schema:

"instructions": { "type": "string" }

The instructions key is where you provide the exercise's instructions for the student, describing the task to complete. This is the meat of the "assignment" or "exercise" in many cases. The value associated with this key is a string, but probably a long one. As with any key/value pairs in PEML, quoting can be used. Instructions written in Markdown (or, as a subset, vanilla HTML) are useful, and some tools may support other markup formats.

PEML example:

instructions:----------
Write your full assignment instructions here. Inline text instead of
a separate PDF resource is preferred.
...
----------

Note that some educational tools that support PEML may not use the instructions, or may expect that the instructions are provided through some means external to the tool. In these cases, the instructions field can be omitted, although in practice either instructions (describing the assignment) or suites (describing how a solution would be tested, either as a top-level key or nested inside one of the systems supported) are required.

If in a specific situation the exercise's instructions are intended to be accessible through a course management system, an instructor-provided website, or some other mechanism, a URL can be used:

PEML example:

instructions: url(https://canvas.myschool.edu/courses/12345/assignments/12345)
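
A tool consuming this field needs to tell inline text apart from a url(...) reference. The following Python sketch is one possible check, assuming the whole value is a single url(...) wrapper; the exact rules for URL values are defined elsewhere in the PEML documentation, and the function name is hypothetical.

```python
import re

# Matches a value that consists entirely of url(<target>).
URL_VALUE = re.compile(r"url\((.+)\)")

def extract_url(value):
    """Return the target inside url(...), or None if value is plain text."""
    match = URL_VALUE.fullmatch(value.strip())
    return match.group(1).strip() if match else None
```

Multi-line instruction text never matches, because the pattern does not span line breaks.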

While assignments can be farmed out into external files or web pages in this way, we strongly discourage the use of PDF assignment descriptions, because they limit the value/utility of a PEML resource. However, in many cases that may be the fastest/cleanest way for an author to get started; the author may then move toward embedding markup in PEML descriptions on future assignments as time permits.

systems recommended: object

The systems key maps to an array of nested dictionaries (objects) that describe the programming language(s) or system(s) in which the exercise can be conducted. The full definition of what is included under systems is described in the code data model.

version recommended: object

Schema:

"version": { "type": "object" }

Just as in YAML or JSON, a PEML description represents a set of key/value pairs (a dictionary, hash, or map, also called an "object" in JSON terms), where keys can map to nested structured values. In PEML, dotted names represent nesting structure. The version key maps to a nested dictionary (object) that identifies the version of this exercise that is described. Many authors may use forms of version control to manage their sources (which is recommended), so fields under version can be provided to capture access paths to an exercise description's version history.

PEML example:

version.timestamp: 2018-08-25T15:23:22.635-05:00
version.type: git
version.id: 2ab880a
version.repository.url: url(https://github.com/CSSPLICE/peml.git)
version.repository.path: url(test/peml/palindrome.peml)
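
To make the mapping from dotted names to nested structure concrete, here is a small Python sketch that expands a flat dictionary of dotted keys into nested dictionaries. It is not the PEML parser; the function name is hypothetical, and conflicting definitions (a key used both as a string and as a parent object) are not handled.

```python
def nest_dotted_keys(flat):
    """Expand {'a.b.c': value} style keys into nested dictionaries."""
    nested = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        # Walk (or create) one dictionary per parent segment.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```

Applied to the keys above, version.repository.url and version.repository.path end up side by side inside a single "repository" object nested under "version".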

Some tools may be able to deduce the type, repository, id, and location all from a single URL, such as with direct URLs to files on github.com. In such a situation, only the full location URL needs to be specified:

PEML example:

version.timestamp: 2018-08-25T15:23:22.635-05:00
version.repository: url(https://github.com/CSSPLICE/peml/blob/master/test/peml/palindrome.peml)

version.type optional: string

Schema:

"type": { "type": "string", "minLength": 1 }

The version.type captures the kind of version control system or repository format used for this PEML description's version history. Examples include: git, mercurial, CVS, Visual SourceSafe, etc.

PEML example:

version.type: git

version.id optional: string

Schema:

"id": { "type": "string", "minLength": 1 }

The version.id is intended as a way to identify the commit within the repository holding this description's contents. It could be a tag, a branch name, a version number, a commit hash, etc. Its exact meaning is dependent on the nature of the version control system being used.

PEML example:

version.id: 2ab880a

version.timestamp recommended: string

Schema:

"timestamp": {
  "type": "string",
  "minLength": 1,
  "pattern": "^(?:[1-9]\\d{3}-(?:(?:0[1-9]|1[0-2])-(?:0[1-9]|1\\d|2[0-8])|(?:0[13-9]|1[0-2])-(?:29|30)|(?:0[13578]|1[02])-31)|(?:[1-9]\\d(?:0[48]|[2468][048]|[13579][26])|(?:[2468][048]|[13579][26])00)-02-29)T(?:[01]\\d|2[0-3]):[0-5]\\d:[0-5]\\d(?:\\.\\d{1,9})?(?:Z|[+-][01]\\d:[0-5]\\d)$"
}

The version.timestamp is a human-readable timestamp indicating the time at which this version of the exercise was last modified. For lack of a better option, at the moment this should be an RFC 3339/ISO 8601 timestamp with an explicit UTC offset, or Z for UTC itself (if you know of something more user-friendly but equally unambiguous, let us know!). That format is: YYYY-MM-DDThh:mm:ss.nnn±hh:mm.

PEML example:

version.timestamp: 2018-08-25T15:23:22.635-05:00
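
As an illustration, a timestamp in this format can be read with Python's standard library. This is a minimal sketch, not part of PEML: the function name is hypothetical, and datetime.fromisoformat only accepts the trailing "Z" form from Python 3.11 onward, so older interpreters need the numeric offset shown here.

```python
from datetime import datetime, timezone

def parse_version_timestamp(text):
    """Parse a version.timestamp string and require an explicit UTC offset."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        raise ValueError("version.timestamp needs a UTC offset such as -05:00")
    return parsed

stamp = parse_version_timestamp("2018-08-25T15:23:22.635-05:00")
# Normalizing to UTC makes timestamps from different zones comparable.
print(stamp.astimezone(timezone.utc).isoformat())
# prints 2018-08-25T20:23:22.635000+00:00
```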

We expect that tool-edited exercise descriptions will likely generate this field's contents automatically. Also, tools will no doubt have to cope with the fact that authors who externally edit PEML representations might make multiple edits and re-import an exercise multiple times, while "forgetting" to manually update the timestamp. The point of the timestamp is to help authors (and tools) to distinguish between multiple edits/versions of a single exercise (with one exercise_id). However, tool developers are encouraged to keep hash fingerprints of exercise descriptions internally so that when externally edited/modified PEML descriptions are re-uploaded, they can (a) detect meaningful content changes, (b) use version.timestamp values to determine whether an import is a "newer" revision and inform users if the internally stored version is newer than what is being imported, and (c) in the case of version.timestamp "ties", prompt the author/user for more information (for example, suspecting failure to update the timestamp in a changed PEML description for an exercise that has already been imported).
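
The three checks (a) through (c) can be sketched as follows. This is only one possible arrangement, under the assumptions that a tool stores the raw PEML text alongside its timestamp and that both timestamps carry UTC offsets; the function names and return labels are hypothetical.

```python
import hashlib
from datetime import datetime

def fingerprint(peml_text):
    """Hash the raw description so content changes can be detected."""
    return hashlib.sha256(peml_text.encode("utf-8")).hexdigest()

def classify_reimport(stored_text, stored_stamp, new_text, new_stamp):
    """Compare a stored description with a re-imported one."""
    # (a) Identical fingerprints mean there is nothing to update.
    if fingerprint(stored_text) == fingerprint(new_text):
        return "unchanged"
    old = datetime.fromisoformat(stored_stamp)
    new = datetime.fromisoformat(new_stamp)
    # (b) Use the timestamps to decide which revision is newer.
    if new > old:
        return "newer"
    if new < old:
        return "older"  # the stored version is newer; inform the user
    # (c) Content changed but the timestamp did not; ask the user.
    return "tie"
```

A production tool would likely normalize whitespace or hash the parsed structure instead, so that purely cosmetic edits do not register as content changes.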

version.repository optional: object or string

The version.repository is a string or object intended to provide an access path to the repository containing this PEML description's version history. This is most likely a URL (see the discussion of URLs in Design Goals), although relative URLs that are resolved relative to the location of this PEML description can be used.

Repositories are a recurring structure that can appear in multiple places in a PEML description. For details of how a repository can be described, see the common repository substructure definition.

PEML example:

version.repository.url: https://github.com/CSSPLICE/peml.git

license recommended: object

The license key maps to a nested dictionary (object) that identifies the license that applies to use of the exercise. At a minimum, the license should include an id identifying which license governs use of the exercise. Additional information can be provided in the other optional fields if desired.

If the license is provided, both the license.id and the license.owner are required.

license.id required: string

Schema:

"id": { "type": "string", "minLength": 1 }

The license.id identifies the license used for this exercise. The id can be specified by a URL that identifies the license, or by a name (or abbreviated name) that is in common use, such as any of the license keywords used by github (an excellent source for potential license choices).

PEML example:

license.id: cc-sa-4.0

license.owner required: string or object

Schema:

"owner": {
  "anyOf": [
    { "$ref": "#person" },
    { "$ref": "#nonempty_string" }
  ]
}

The license.owner identifies the person who "owns" the exercise being described, in the sense of intellectual property. This could be an individual, a publisher, a corporation, or whoever. For individual authors, unique e-mail addresses are preferred as a method of identification, although any string that unambiguously identifies the copyright holder/licenser for this work can be used here. A separate "email" sub-key and optional "name" sub-key can be provided, as in the author tag.

PEML example:

# simple form
license.owner: edwards@cs.vt.edu

# or provide email and name
license.owner.email: edwards@cs.vt.edu
license.owner.name: Stephen Edwards
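
The rule that license.id and license.owner are both required whenever a license is given can be checked with a few lines of Python. The sketch assumes the description has been parsed into nested dictionaries; the function name and message wording are hypothetical.

```python
def license_problems(exercise):
    """Return messages for required license sub-keys that are missing."""
    license_info = exercise.get("license")
    if license_info is None:
        return []  # the license key itself is only recommended
    problems = []
    for key in ("id", "owner"):
        if not license_info.get(key):
            problems.append("license." + key + " is required when license is present")
    return problems
```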

license.book optional: string

Schema:

"book": { "type": "string", "minLength": 1 }

In a situation where an exercise is part of a textbook or another copyrighted resource, the license.book key can be used to identify the source. In such cases, the "license" for use of the exercise is presumably the same as the license for the corresponding book or containing work. For most textbooks, reuse of resources from the book presumably requires owning a copy of the book. Such exercises should normally be limited to use in situations where the textbook is required or optional for a given pool of users (e.g., students in a course that use that textbook). The value of the license.book can either be a bibliographic-style citation for the book, or a URL that identifies the book.

PEML example:

license.book:
Cay S. Horstmann, _Big Java: Early Objects, 5th Edition_,
Wiley, 2013. ISBN: 9788126554010

Tool developers are expected to be flexible and forgiving in terms of allowing for a wide variety of human-authored variations in specifying books, although tools should be free to "normalize" these to a standard representation internally (and even for presentation).

license.attribution optional: string

Schema:

"attribution": { "type": "string", "minLength": 1 }

The license.attribution, if provided, contains an acknowledgement string that the license owner wishes for users of the exercise to include when using the work. The license.attribution should be provided for licenses that require users to provide attribution (such as Creative Commons licenses that include the "BY" requirement, or other licenses that require attribution).

PEML example:

license.attribution:
"Palindromes (A Simple PEML Example)" by edwards@cs.vt.edu is licensed
under <a href="https://creativecommons.org/licenses/by/4.0/">CC BY 4.0</a>

license.acknowledgements optional: string

Schema:

"acknowledgements": { "type": "string" }

The license.acknowledgements, if provided, contains all the attributions this exercise makes for licensed use of other (separate) content. While the license.attribution contains content for the users of this exercise to include in derived works, the license.acknowledgements contains attributions acknowledging other content this exercise uses.

Both the spelling "acknowledgments" (more common in the U.S.) and "acknowledgements" (more common in Britain and elsewhere) should be supported by tools (as synonyms).
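
Treating the two spellings as synonyms can be as simple as the lookup below. This Python sketch assumes the license has been parsed into a dictionary and that the "acknowledgements" spelling wins if both are present; that precedence is our assumption, not something PEML specifies, and the function name is hypothetical.

```python
def get_acknowledgements(license_info):
    """Return the acknowledgements text under either accepted spelling."""
    for key in ("acknowledgements", "acknowledgments"):
        if key in license_info:
            return license_info[key]
    return None
```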

license.permissions recommended: string

Schema:

"permissions": { "enum": [
  "none",
  "read",
  "fork",
  "fork-with-tests",
  "contribute",
  "all"
] }

Although the license (as identified by the license.id) governs the rights that other users have with this exercise, the license.permissions field provides a tool-processable shorthand notation to capture the access permissions granted to others by the license terms. The value should be one of:

  • none: "all rights reserved" (for the author(s)).

  • read: all other users can read/practice the exercise, use it in assignments, etc., but may not create any derived exercises.

  • fork: In addition to read permissions, other users can "fork" this one--that is, use this exercise as a starting point to create a new derived exercise. Forking includes access to all aspects of the exercise except the test suites and the test environment definition. Users are expected to be aware of and obey any licensing restrictions imposed by the license associated with the original exercise.

  • fork-with-tests: Implies the same access permissions as fork but all test suites and test environment details are also included.

  • contribute: In addition to full fork-with-tests access, contribute adds the ability to edit and/or import updated versions of the original exercise.

  • all: Implies full access, which is the permissions level of the author(s) and/or license owner.
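
Since each level in the list above builds on the one before it, a tool can treat the values as an ordered scale. The Python sketch below rests on that reading; the names are hypothetical, and an unknown value raises ValueError rather than being silently accepted.

```python
# Ordered from most restrictive to least restrictive, as listed above.
PERMISSION_LEVELS = ["none", "read", "fork", "fork-with-tests", "contribute", "all"]

def permits(granted, needed):
    """Return True if the granted level includes everything in the needed level."""
    return PERMISSION_LEVELS.index(granted) >= PERMISSION_LEVELS.index(needed)
```

For example, permits("fork-with-tests", "fork") is true, while permits("read", "fork") is false.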

Optional Keys

tag optional: object

The tag key maps to a nested dictionary (object) that defines categorical/classification/metadata about the exercise. The full definition of what is included under tag is described in the tagging definitions.

src optional: object

The src key maps to a nested dictionary (object) that defines the source code assets associated with the exercise. The full definition of what is included under src is described in the code data model.

suites recommended: object

The suites key maps to a nested dictionary (object) that defines the test suites associated with the exercise. The full definition of what is included under suites is described in the code data model.

difficulty optional: integer

Schema:

"difficulty": { "type": "integer", "minimum": 0, "maximum": 100 }

The difficulty is a subjective rating of question difficulty on an integer scale from 0 (easiest) to 100 (hardest). Difficulty is relative to the presumed level of the target audience intended for the exercise. Typically, an author might use tags to indicate the topics/skills that they expect the user to be familiar with, and also use tags to indicate the topics/skills that the intended audience would be practicing through this exercise. Together, the prerequisites and the topics for the exercise communicate the author's idea of the target audience, and difficulty should be interpreted relative to that target audience.

Intuitively, the difficulty can be thought of as a rough approximation of the percentage of the target audience who might be unable to complete the exercise successfully. One would normally imagine that extreme values of 0 or 100 would not typically be used, since exercises that no one can complete (difficulty == 100) or that everyone can trivially succeed at (difficulty == 0) may have little value. Instead, a difficulty of 50 should be thought of as "average" difficulty, where an average student may have a 50/50 chance of completing the exercise successfully.

However, don't overthink it, since difficulty ratings are both subjective and relative. The author's intuitive reaction to the question of "how hard or easy is this exercise compared to an 'average' exercise for this target audience" is a better way to quickly assign a difficulty value that can still be of value to others reading the exercise description.

PEML example:

difficulty: 60
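
The schema constraint on difficulty amounts to a simple range check. In the Python sketch below, the function name is hypothetical and the value is assumed to have already been converted to an integer, since a PEML parser may hand back the text "60" rather than the number.

```python
def is_valid_difficulty(value):
    """Return True for an integer from 0 to 100 inclusive."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return 0 <= value <= 100
```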

vendor optional: object

Schema:

"vendor": { "type": "object" }

The vendor key is intended to be a nested map containing any tool-specific keys or extension properties that individual educational tools might support, but that are not intended to be portable across a wide range of tools. The vocabulary and structure for the contents within this dictionary/map do not have any restrictions on how they are modeled.

Note: This is a great place to build out keys/properties that identify grading schemes, late policies, submission constraints, options for processing pipelines, etc. It would be nice if there were examples of tool-specific encodings of these kinds of details that might be used here.

Keys Under Development

The keys in this section are still under active development and are not fully defined or implemented. Consider them as ideas for future work.

options optional: object

Schema:

"options": { "type": "object" }

The options key represents option settings that affect the interpretation of the PEML exercise description itself. These control things like what markup notation is used in text fields, whether mustache-style variable substitution is performed, support for random exercise generation, etc.

options.text_format optional: string

Schema:

"text_format": { "type": "string", "minLength": 1 }

The options.text_format ...

options.interpolation.enable optional: boolean

Schema:

"enable": { "type": "boolean" }

The options.interpolation.enable ...

options.interpolation.delimiters optional: string

Schema:

"delimiters": { "type": "string", "minLength": 1 }

The options.interpolation.delimiters ...

options.variables optional: object

Schema:

"variables": { "type": "object" }

The options.variables ...

options.generator optional: object

Schema:

"generator": { "type": "object" }

The options.generator ...

options.instances optional: array

Schema:

"instances": { "type": "array", "items": { "type": "object" }, "minItems": 1 }

The options.instances ...

origin optional: object

Schema:

"origin": { "type": "object" }

The origin key is intended to be a nested map for exercises that are derived from (or forked from) others. It is intended to contain information about the original upstream exercise that was used as the starting point for this one.

origin.derived_from optional: string

Schema:

"derived_from": { "type": "string", "minLength": 1 }

Contains the exercise_id of the exercise this one was "forked" from, used when one exercise is created as a derived work based on another existing exercise.

origin.family optional: string

Schema:

"family": { "type": "string", "minLength": 1 }

One exercise might be created from another by changing the form of the question. For example, one exercise might be a code-writing exercise that asks "implement code that solves the following problem". From that, one might create a different style of exercise, such as "here's a buggy implementation for this problem, find and fix the bug". Or yet a third style of exercise: "what output does this code produce on the following input(s)?"

At the same time, all these exercises are related in some way if the underlying task being performed by the artifact is the same, even if the skills the user is exercising are different. We can say these different styles of questions are all part of a related "family", where the relation is the underlying task being achieved by the code artifact at the heart of the question.

Different styles of questions might commonly be created by forking an existing question and creating a derived version using a different style (code-writing, multiple-choice, output prediction, bug finding, bug fixing, etc.). The purpose of this key is to identify the family this exercise belongs to using some kind of unique identifier.

Note: We could use some nice ideas here about how to identify these.