Data Model Schema for PEML Exercises
This page presents the data model for PEML. While PEML is its own notation, the data model's structure is also described in the form of a JSON Schema for PEML:
http://cssplice.github.io/peml/schemas/PEML.json
Because the data model's structure maps easily into JSON or YAML, a JSON Schema provides a program-checkable way of expressing the intended structure. Snippets from the schema are included below to show the definitions for each key/value field in PEML. PEML fields that have their own substructure are described separately as building blocks in the definitions of recurring model elements.
The main attributes of a PEML exercise description are broken down into three groups:
Required keys
- exercise_id
- title
- author (or authors) (at least one of author, authors, or license.owner is required)
Recommended keys
- instructions (at least one of instructions, suites, or systems is required)
- systems
- version
- license
Optional keys
Keys under development
It is important to note that PEML allows the use of additional keys beyond those described here, which may be custom-supported by specific tools. The list of keys described here is intended to provide a common vocabulary that can be used by many tools for representing programming exercises, to facilitate authoring, importing, and exporting these exercises. Some keys may relate to features or content that are not supported in every tool, but the goal is to streamline the ability of instructors (or "people" in general) to get exercises into (and potentially out of) educational tools.
Required Keys
Required keys must be present in each PEML description. We keep these to a minimum. However, including these required elements in every exercise promotes interoperability and data management, and helps authors keep information organized when exercises are imported into tools.
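The required-key rules listed at the top of this page can be checked mechanically. The following is a minimal sketch, assuming the PEML description has already been parsed into a nested Python dictionary; the function name `missing_required` and the wording of the messages are illustrative and not part of PEML.

```python
def missing_required(exercise: dict) -> list[str]:
    """Return a list of required-key problems; an empty list means none were found.

    Assumes `exercise` is a PEML description already parsed into nested dicts.
    """
    problems = []
    # exercise_id and title must always be present and non-empty.
    for key in ("exercise_id", "title"):
        if not exercise.get(key):
            problems.append(f"missing {key}")
    # At least one of author, authors, or license.owner is required.
    license_info = exercise.get("license")
    has_owner = isinstance(license_info, dict) and bool(license_info.get("owner"))
    if not (exercise.get("author") or exercise.get("authors") or has_owner):
        problems.append("missing author, authors, or license.owner")
    # At least one of instructions, suites, or systems is required.
    if not (exercise.get("instructions") or exercise.get("suites")
            or exercise.get("systems")):
        problems.append("missing instructions, suites, or systems")
    return problems
```

A tool might run a check like this at import time and report every problem at once, rather than stopping at the first missing key.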
exercise_id required: string
Schema:
The exercise_id
is a globally unique, human-written
identifier created by the exercise
author to uniquely identify this exercise on any system. Any non-empty
sequence of non-whitespace Unicode characters can be used. We imagine
that authors might construct these identifiers in similar ways to
programming package names or URLs. For example, the following ID includes
a university and course identifier around an exercise name that is
presumed to be unique within that context. By combining the context
and the name, a globally unique identifier can be formed:
PEML example:
By convention, exercises should start with the exercise_id
key, so that multiple exercises can be concatenated together in a
single file but still sliced/parsed as separate exercises easily.
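As a rough illustration of why this convention helps, the sketch below validates an identifier and slices a concatenated file at each exercise_id line. It assumes that a PEML key/value line begins with the key name followed by a colon; the function names are hypothetical, and the splitter is deliberately naive (it would be fooled by a multi-line value that happens to contain a line starting with `exercise_id:`).

```python
import re

# Assumption: a key/value line looks like "exercise_id: <value>" at the start of a line.
_ID_LINE = re.compile(r"^[ \t]*exercise_id[ \t]*:", re.MULTILINE)


def is_valid_exercise_id(value: str) -> bool:
    """True for any non-empty sequence of non-whitespace characters."""
    return re.fullmatch(r"\S+", value) is not None


def split_exercises(text: str) -> list[str]:
    """Slice concatenated PEML text into one chunk per exercise.

    Each chunk starts at an exercise_id line and runs up to the next one.
    Any text before the first exercise_id line is ignored.
    """
    starts = [match.start() for match in _ID_LINE.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[begin:end] for begin, end in zip(starts, ends)]
```

Each returned chunk can then be handed to a real PEML parser on its own.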
The purpose of the exercise_id
is to serve as an external
identifier that tools (and people) can use to determine whether two
PEML descriptions describe "the same thing". Exercises with different
ids should be considered as distinct entities, rather than as different
versions of the "same thing". The version
key is used to
identify basic version info for long-lived exercises.
When a user externally edits an exercise representation and re-imports
it into a tool, or exports an exercise from one tool to use in another
context, tools can use the exercise_id
to determine whether
imported information is an update to an existing artifact or whether it
defines a new exercise.
title required: string
Schema:
The title
is a string name or title used as a human-readable
label for the
exercise. The intent is for this to be the "title" shown to students
in various contexts, either when viewing a single exercise or when
viewing lists of exercises. While there is no specific length limit,
ideally titles should be no more than "one line" in size, because of
the various contexts where they might be displayed. How much of the
title is displayed (or truncated) when collections of exercises are
shown is tool-dependent.
PEML example:
author required: string or object
Schema:
The author
tag is used to identify the author of the
exercise (or at least of the PEML exercise description). Recommended
practice is to identify the author by a unique e-mail address. This
field can be used to provide a single e-mail address. Alternatively,
the key authors
can be used to provide an array of
multiple authors.
If the optional license
key is provided and the
license.owner
is the same as the author
, then
the author
key can be omitted.
In addition to providing an email address, the author
tag can use sub-keys to specify both an email address and an author
name.
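Because an author can appear as a bare string, as an object with sub-keys, or inside an authors array, a tool may want to normalize these forms into one shape. The following is a minimal sketch, assuming the description is already parsed into nested dictionaries and that a bare string is the author's e-mail address (the recommended practice, but not guaranteed); the helper names are hypothetical.

```python
def _as_person(entry) -> dict:
    """Convert a string or an {email, name} object into a uniform dict."""
    if isinstance(entry, str):
        # Assumption: a bare string is the e-mail address.
        return {"email": entry, "name": None}
    return {"email": entry.get("email"), "name": entry.get("name")}


def normalize_authors(exercise: dict) -> list[dict]:
    """Collect author information from author, authors, or license.owner."""
    raw = []
    if "author" in exercise:
        raw.append(exercise["author"])
    raw.extend(exercise.get("authors", []))
    if not raw:
        # The author key may be omitted when license.owner is the author.
        owner = (exercise.get("license") or {}).get("owner")
        if owner:
            raw.append(owner)
    return [_as_person(entry) for entry in raw]
```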
PEML example:
PEML example:
PEML example:
Recommended Keys
instructions recommended: string
Schema:
The instructions
key is where you provide the exercise's
instructions for the student, describing the task to complete. This
is the meat of the "assignment" or "exercise" in many cases. The value
associated with this key is a string, but probably a long one. As with
any key/value pairs in PEML, quoting can
be used. Instructions written
in Markdown (or, as a subset, vanilla HTML) are useful, and some
tools may support other
markup formats.
PEML example:
Note that some educational tools that support PEML may not use the
instructions, or may expect that the instructions are provided through
some means external to the tool. In these cases, the
instructions
field can be omitted, although in practice at least one of
instructions
(describing the assignment) or
suites
(describing how a solution would be tested,
either as a top-level key or nested inside one of the
systems
supported) is required.
If in a specific situation the exercise's instructions are intended to be accessible through a course management system, an instructor-provided website, or some other mechanism, a URL can be used:
PEML example:
While assignments can be farmed out into external files or web pages in this way, we strongly discourage the use of PDF assignment descriptions, because they limit the value/utility of a PEML resource. However, in many cases that may be the fastest/cleanest way for an author to get started; the author may then move toward embedding markup in PEML descriptions on future assignments as time permits.
systems recommended: object
The systems
key maps to an array of nested dictionaries
(objects) that
describe the programming language(s) or system(s) in which the exercise
can be conducted. The full
definition of what is included under systems
is described
in the code data model.
version recommended: object
Schema:
Just as in YAML or JSON, a PEML description represents a set of key/value
pairs (a dictionary, hash, or map, also called an "object" in JSON terms),
where keys can map to nested structured values. In PEML, dotted names
represent nesting structure.
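The relationship between dotted names and nesting can be sketched as a small transformation. This assumes a flat mapping of dotted key names to values as input; the function name `nest` is illustrative only, and a real PEML parser handles more cases (arrays, quoting, and so on).

```python
def nest(flat: dict) -> dict:
    """Expand dotted key names into nested dictionaries.

    For example, "version.id" becomes the "id" entry of a nested "version" dict.
    Raises an error if a name is used both as a plain value and as a parent.
    """
    nested = {}
    for dotted, value in flat.items():
        *parents, leaf = dotted.split(".")
        node = nested
        for part in parents:
            # Create intermediate dictionaries on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```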
The version
key maps to a nested dictionary (object) that
identifies the version of this exercise that is described.
Many authors may use forms of
version control to manage their sources (which is recommended), so
fields under version
can be provided to
capture access paths to an exercise description's version history.
PEML example:
Some tools may be able to deduce the type, repository, id, and location all from a single URL, such as with direct URLs to files on github.com. In such a situation, only the full location URL needs to be specified:
PEML example:
version.type optional: string
Schema:
The version.type
captures the kind of version control
system or repository format used for this PEML description's
version history. Examples include: git, mercurial, CVS, Visual
SourceSafe, etc.
PEML example:
version.id optional: string
Schema:
The version.id
is intended as a way to identify the
commit within the repository holding this description's contents.
It could be a tag, a branch name, a version number, a commit hash,
etc. Its exact meaning is dependent on the nature of the version
control system being used.
PEML example:
version.timestamp recommended: string
Schema:
The version.timestamp
is a human-readable timestamp indicating the time at which this version
of the exercise was last modified. For lack of a better option, at
the moment this should be an RFC 3339/ISO 8601 UTC timestamp (if you
know of something more user-friendly but equally unambiguous, let
us know!). That format is: YYYY-MM-DDThh:mm:ss.nnn±hh:mm.
PEML example:
We expect that tool-edited exercise descriptions will likely generate
this field's contents automatically. Also, tools will no doubt have to
cope with the fact that authors who externally edit PEML representations
might make multiple edits and re-import an exercise multiple times,
while "forgetting" to manually update the timestamp. The point of the
timestamp is to help authors (and tools) to distinguish between multiple
edits/versions of a single exercise (with one exercise_id
).
However, tool developers are encouraged to keep hash fingerprints of
exercise descriptions internally so that when externally edited/modified
PEML descriptions are re-uploaded, they can (a) detect meaningful content
changes, (b) use version.timestamp
values to determine
whether an import is a "newer" revision and inform users if the internally
stored version is newer than what is being imported, and (c) in the
case of version.timestamp
"ties", prompt the author/user
for more information (for example, suspecting failure to update the
timestamp in a changed PEML description for an exercise that has already
been imported).
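The import logic described above might look roughly like the following sketch. It assumes the tool keeps the previously imported PEML text and its version.timestamp, and that both timestamps carry an explicit UTC offset such as `+00:00`; the function names and the returned labels are hypothetical.

```python
import hashlib
from datetime import datetime


def fingerprint(peml_text: str) -> str:
    """Hash the raw description so content changes can be detected."""
    return hashlib.sha256(peml_text.encode("utf-8")).hexdigest()


def classify_import(stored_text: str, stored_ts: str,
                    incoming_text: str, incoming_ts: str) -> str:
    """Classify a re-import of an exercise with the same exercise_id.

    Returns "unchanged", "newer", "older", or "tie". A "tie" means the
    content changed but the timestamp did not, so the user should be asked.
    """
    if fingerprint(stored_text) == fingerprint(incoming_text):
        return "unchanged"
    # Assumption: both timestamps include an offset, so they compare safely.
    old = datetime.fromisoformat(stored_ts)
    new = datetime.fromisoformat(incoming_ts)
    if new > old:
        return "newer"
    if new < old:
        return "older"
    return "tie"
```

Hashing the raw text treats whitespace-only edits as changes; a tool that wants to detect only meaningful changes could hash a normalized form of the parsed description instead.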
version.repository optional: object or string
The version.repository
is a string or object
intended to provide an
access path to the repository containing this PEML description's
version history. This is most likely a URL (see the discussion of
URLs in
Design Goals), although
relative URLs that are resolved relative to the location of this
PEML description can be used.
Repositories are a recurring structure that can appear in multiple places in a PEML description. For details of how a repository can be described, see the common repository substructure definition.
PEML example:
license recommended: object
The license
key maps to a nested dictionary (object) that
identifies the license that applies to use of the exercise. At a minimum,
the license
should include an
id identifying which license governs use
of the exercise. Additional information can be provided in the other
optional fields if desired.
If the license
is provided, both the license.id
and the license.owner
are required.
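This conditional requirement can be expressed as a short check. The sketch assumes the description is parsed into nested dictionaries; `license_problems` is a hypothetical helper name.

```python
def license_problems(exercise: dict) -> list[str]:
    """Report missing license sub-keys; an absent license is not an error."""
    license_info = exercise.get("license")
    if license_info is None:
        return []
    if not isinstance(license_info, dict):
        return ["license must be a nested object"]
    # When a license is given, both id and owner must be present.
    return [
        f"license.{key} is required when license is provided"
        for key in ("id", "owner")
        if not license_info.get(key)
    ]
```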
license.id required: string
Schema:
The license.id
identifies the license used for this
exercise. The id can be specified by a URL that identifies the
license, or by a name (or abbreviated name) that is in common
use, such as any of the
license
keywords used by github (an excellent source for potential
license choices).
PEML example:
license.owner required: string or object
Schema:
The license.owner
identifies the person who "owns"
the exercise being described, in the sense of intellectual
property. This could be an individual, a publisher, a corporation,
or whoever. For individual authors, unique e-mail addresses are
preferred as a method of identification, although any string that
unambiguously identifies the copyright holder/licenser for this
work can be used here. A separate "email" sub-key and optional
"name" sub-key can be provided, as in the author
tag.
PEML example:
license.book optional: string
Schema:
In a situation where an exercise is part of a textbook or another
copyrighted resource, the license.book
key can
be used to identify the source. In such cases, the "license" for
use of the exercise is presumably the same as the license for
the corresponding book or containing work. For most textbooks,
reuse of resources from the book presumably requires owning a
copy of the book. Such exercises should normally be limited to
use in situations where the textbook is required or optional for
a given pool of users (e.g., students in a course that use that
textbook). The value of the license.book
can either
be a bibliographic-style citation for the book, or a URL
that identifies the book.
PEML example:
Tool developers are expected to be flexible and forgiving in terms of allowing for a wide variety of human-authored variations in specifying books, although tools should be free to "normalize" these to a standard representation internally (and even for presentation).
license.attribution optional: string
Schema:
The license.attribution
, if provided, contains an
acknowledgement string that the license owner wishes for users
of the exercise to include when using the work. The
license.attribution
should be provided for licenses
that require users to provide attribution (such as Creative
Commons licenses that include the "BY" requirement, or other
licenses that require attribution).
PEML example:
license.acknowledgements optional: string
Schema:
The license.acknowledgements
, if provided, contains
all the attributions this exercise makes for licensed
use of other (separate) content. While the
license.attribution
contains content for the users
of this exercise to include in derived works, the
license.acknowledgements
contains attributions
acknowledging other content this exercise uses.
Both the spelling "acknowledgments" (more common in the U.S.) and "acknowledgements" (more common in Britain and elsewhere) should be supported by tools (as synonyms).
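A tool could support both spellings with a simple lookup. This is only a sketch, assuming the license has been parsed into a dictionary; the function name and the choice to prefer the "acknowledgements" spelling when both are present are assumptions.

```python
def get_acknowledgements(license_info: dict):
    """Return the acknowledgements text under either spelling, or None."""
    for key in ("acknowledgements", "acknowledgments"):
        if key in license_info:
            return license_info[key]
    return None
```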
license.permissions recommended: string
Schema:
Although the license (as identified by the license.id)
governs the rights that other users have with this exercise, the
license.permissions
field provides a tool-processable
shorthand notation to capture the access permissions granted to
others by the license terms. The value should be one of:
- none: "all rights reserved" (for the author(s)).
- read: All other users can read/practice the exercise, use it in assignments, etc., but may not create any derived exercises.
- fork: In addition to read permissions, other users can "fork" this exercise, that is, use it as a starting point to create a new derived exercise. Forking includes access to all aspects of the exercise except the test suites and the test environment definition. Users are expected to be aware of and obey any licensing restrictions imposed by the license associated with the original exercise.
- fork-with-tests: Implies the same access permissions as fork, but all test suites and test environment details are also included.
- contribute: In addition to full fork-with-tests access, contribute adds the ability to edit and/or import updated versions of the original exercise.
- all: Implies full access, which is the permissions level of the author(s) and/or license owner.
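Since each level in this list builds on the one before it, a tool can treat the values as an ordered scale. The sketch below assumes that cumulative reading of the list; the names `LEVELS` and `allows` are illustrative.

```python
# Permission values in increasing order of access, as listed above.
LEVELS = ["none", "read", "fork", "fork-with-tests", "contribute", "all"]


def allows(granted: str, needed: str) -> bool:
    """True if the granted permission level includes the needed one.

    Raises ValueError if either value is not a known permission level.
    """
    return LEVELS.index(granted) >= LEVELS.index(needed)
```

For example, `allows("fork", "read")` is true, while `allows("read", "fork")` is false.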
Optional Keys
tag optional: object
The tag
key maps to a nested dictionary (object) that
defines categorical/classification/metadata about the exercise. The full
definition of what is included under tag
is described
in the tagging definitions.
src optional: object
The src
key maps to a nested dictionary (object) that
defines the source code assets associated with the exercise. The full
definition of what is included under src
is described
in the code data model.
suites recommended: object
The suites
key maps to a nested dictionary (object) that
defines the test suites associated with the exercise. The full
definition of what is included under suites
is described
in the code data model.
difficulty optional: string
Schema:
The difficulty
is a subjective rating of question
difficulty on an integer scale from 0 (easiest) to 100 (hardest).
Difficulty is relative to the presumed level of
the target audience intended for the exercise. Typically, an author
might use tags
to indicate the topics/skills that they
expect the user to be familiar with, and also use tags
to indicate the topics/skills that the intended audience would be
practicing through this exercise. Together, the prerequisites
and the topics for the exercise communicate the author's idea of the
target audience, and difficulty should be interpreted relative to
that target audience.
Intuitively, the difficulty
can be thought of as a rough
approximation of the percentage of the target audience who
might be unable to complete the exercise successfully. One
would normally
imagine that extreme values of 0 or 100 would not typically be used,
since exercises that no one can complete (difficulty == 100), or that
everyone can trivially succeed at (difficulty == 0), may have little
value. Instead, a difficulty
of 50 should be thought of
as "average" difficulty, where an average student may have a 50/50
chance of completing the exercise successfully.
However, don't overthink it, since difficulty ratings are both subjective and relative. The author's intuitive reaction to the question of "how hard or easy is this exercise compared to an 'average' exercise for this target audience" is a better way to quickly assign a difficulty value that can still be of value to others reading the exercise description.
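A tool reading this field might validate it along the following lines. This is a sketch that accepts either a string or an integer value, since the field is listed as a string but described as an integer scale; the function name is hypothetical.

```python
def parse_difficulty(value) -> int:
    """Convert a difficulty value to an integer between 0 and 100.

    Raises ValueError for non-integer text or out-of-range numbers.
    """
    number = int(str(value).strip())
    if not 0 <= number <= 100:
        raise ValueError(f"difficulty must be between 0 and 100, got {number}")
    return number
```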
PEML example:
vendor optional: object
Schema:
The vendor
key is intended to be a nested map containing
any tool-specific keys or extension properties that individual
educational tools might support, but that are not intended to be
portable across a wide range of tools. The vocabulary and structure for
the contents within this dictionary/map do not have any restrictions
on how they are modeled.
Note: This is a great place to build out keys/properties that identify grading schemes, late policies, submission constraints, options for processing pipelines, etc. It would be nice if there were examples of tool-specific encodings of these kinds of details that might be used here.
Keys Under Development
The keys in this section are still under active development and are not fully defined or implemented. Consider them as ideas for future work.
options optional: object
Schema:
The options
key represents option settings that affect
the interpretation of the PEML exercise description itself. These
control things like what markup notation is used in text fields,
whether mustache-style variable substitution is performed, support
for random exercise generation, etc.
options.text_format optional: string
Schema:
The options.text_format
...
options.interpolation.enable optional: boolean
Schema:
The options.interpolation.enable
...
options.interpolation.delimiters optional: string
Schema:
The options.interpolation.delimiters
...
options.variables optional: object
Schema:
The options.variables
...
options.generator optional: object
Schema:
The options.generator
...
options.instances optional: array
Schema:
The options.instances
...
origin optional: object
Schema:
The origin
key is intended to be a nested map for
exercises that are derived from (or forked from) others. It is
intended to contain information about the original upstream
exercise that was used as the starting point for this one.
origin.derived_from optional: string
Schema:
Contains the exercise_id
of the exercise this
one was "forked" from, used when one exercise is created as
a derived work based on another existing exercise.
origin.family optional: string
Schema:
One exercise might be created from another by changing the form of the question. For example, one exercise might be a code-writing exercise that asks "implement code that solves the following problem". From that, one might create a different style of exercise, such as "here's a buggy implementation for this problem, find and fix the bug". Or yet a third style of exercise: "what output does this code produce on the following input(s)?"
At the same time, all these exercises are related in some way if the underlying task being performed by the artifact is the same, even if the skills the user is exercising are different. We can say these different styles of questions are all part of a related "family", where the relation is the underlying task being achieved by the code artifact at the heart of the question.
Different styles of questions might commonly be created by forking an existing question and creating a derived version using a different style (code-writing, multiple-choice, output prediction, bug finding, bug fixing, etc.). The purpose of this key is to identify the family this exercise belongs to using some kind of unique identifier.
Note: We could use some nice ideas here about how to identify these.