TerminusDB schema language based on simple JSON syntax.
The TerminusDB schema language enables documents and their relationships to be specified using simple JSON syntax. This syntax makes it as easy as possible to specify a JSON object to automatically convert to a graph. This approach enables data to be viewed as collections of documents or as knowledge graphs of interconnected objects.
A JSON object in TerminusDB schema is composed of key-value pairs.
A key is one of two values, keyword or property, described in the table below. The full schema definition is a stream or list of these values or JSON objects.
The basic unit of specification is a class. A class definition is a schema object with the keyword `@type` set to the type value `Class`. The keyword `@id` specifies the name of the class. The example below defines a class named `Person` with a property `name` of type `xsd:string`. Search XSD definitions for more information about types.
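A minimal sketch of the class definition described above, written as a Python dictionary and serialised to JSON. The variable names are illustrative only.

```python
import json

# Schema object for a class named Person with one string property.
person_class = {
    "@type": "Class",      # keyword: marks this schema object as a class
    "@id": "Person",       # keyword: the class name
    "name": "xsd:string",  # property: its value is the range type
}

# Keys starting with "@" are keywords; everything else is a property.
keywords = sorted(k for k in person_class if k.startswith("@"))
properties = sorted(k for k in person_class if not k.startswith("@"))

print(json.dumps(person_class, indent=2))
```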
The context object is a special schema object affecting the entire schema. The context object is specified by the special `@type` value `@context`. An example:
This example does the following:
- Defines default prefixes in `@schema` and `@base` to use for the schema and data.
- Defines the prefix `xsd`, enabling vocabulary based on different URL prefixes. For example, specify `xsd:string` to denote `http://www.w3.org/2001/XMLSchema#string`.
- Documents the schema in the `@documentation` value, providing `@title`, `@authors`, and `@description`.
All properties in the context object that do not start with `@`, such as `xsd`, are URI definitions. They must be of the form shown below. Prefix and URI are defined by their respective regular expressions. That is, a prefix has an identifier starting with an alphabetic character followed by alphanumeric characters. The URI has a protocol followed by valid URI characters. Each prefix is paired with a URI.
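As a rough sketch of how a context object and its prefix definitions fit together, the Python below builds a context and expands a prefixed name. The documentation strings are placeholders, and the two regular expressions are assumptions that only approximate the prose description above; they are not the patterns TerminusDB itself uses.

```python
import re

# Illustrative context object; the documentation strings are placeholders.
context = {
    "@type": "@context",
    "@schema": "http://terminusdb.com/schema/woql#",
    "@base": "terminusdb://woql/data/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "@documentation": {
        "@title": "Example schema",
        "@authors": ["A. N. Author"],
        "@description": "Placeholder description of the schema.",
    },
}

# Assumed patterns, approximating the prose description only.
PREFIX_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")
URI_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:\S+")


def prefixes(ctx):
    """Return the prefix -> URI pairs, i.e. the keys not starting with '@'."""
    pairs = {k: v for k, v in ctx.items() if not k.startswith("@")}
    for prefix, uri in pairs.items():
        if not PREFIX_RE.fullmatch(prefix) or not URI_RE.fullmatch(uri):
            raise ValueError(f"bad prefix definition: {prefix!r} -> {uri!r}")
    return pairs


def expand(name, ctx):
    """Expand 'prefix:local' using the prefixes declared in the context."""
    prefix, sep, local = name.partition(":")
    table = prefixes(ctx)
    if sep and prefix in table:
        return table[prefix] + local
    return name


print(expand("xsd:string", context))
```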
A list of keywords used in the context object.
The `@schema` keyword specifies the default URI expansion to use for all elements of the schema. In the example below, the class name `NamedQuery` expands to `http://terminusdb.com/schema/woql#NamedQuery`.
`@base` specifies the default URI expansion used for all elements of instance data. In the previous schema definition, and given the document in the instance graph example below, the id `NamedQuery_my_query` expands to `terminusdb://woql/data/NamedQuery_my_query`.
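The two default expansions can be sketched as simple string concatenation. The helper names are hypothetical; the prefix values are the ones implied by the expansions quoted above.

```python
# Values taken from the expansions quoted in the text above.
context = {
    "@type": "@context",
    "@schema": "http://terminusdb.com/schema/woql#",
    "@base": "terminusdb://woql/data/",
}


def expand_schema_name(name, ctx):
    """Expand an unprefixed schema element, such as a class name."""
    return ctx["@schema"] + name


def expand_instance_id(doc_id, ctx):
    """Expand an unprefixed instance document id."""
    return ctx["@base"] + doc_id


print(expand_schema_name("NamedQuery", context))
print(expand_instance_id("NamedQuery_my_query", context))
```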
`@documentation` specifies documentation global to the entire schema. See the `@documentation` section in the previous context object example. The `@documentation` tag can be a single value, or it can be a list with each element having an additional `@language` tag. The `@language` tag must have an IANA language code, which is used to select appropriate descriptions when internationalising the schema.
The documentation section contains the keywords:
- `@title`: the title of the schema to display.
- `@description`: a long-form description of the purpose of the schema, the type of documents contained in the schema, and keywords useful for searching for the type of content that the schema encodes.
- `@authors`: a list of strings naming the authors involved in writing the schema.
If you would like to add arbitrary JSON structured metadata to a schema, you can place it in the `@metadata` field of the context object. This can be used to store data product-wide information in a structured format. For instance:
If you use the `@language` code, specific documentation results can appear in different circumstances depending on the user's language preferences.
An example of the `@language` tag for a context is as follows:
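A small sketch, in Python, of a context whose `@documentation` is a list tagged by language, together with one possible way a client could pick an entry. The titles, the URIs, and the fallback rule are all assumptions made for illustration; the actual selection is up to the consuming application.

```python
# Illustrative multi-language documentation; titles are placeholders.
context = {
    "@type": "@context",
    "@schema": "http://example.com/schema#",
    "@base": "http://example.com/data/",
    "@documentation": [
        {"@language": "en", "@title": "Example Schema"},
        {"@language": "de", "@title": "Beispielschema"},
    ],
}


def pick_documentation(ctx, preferred, fallback="en"):
    """Return the documentation entry matching the preferred language."""
    docs = ctx["@documentation"]
    if isinstance(docs, dict):  # single-value form, no language tag
        return docs
    by_lang = {d.get("@language"): d for d in docs}
    return by_lang.get(preferred) or by_lang.get(fallback) or docs[0]


print(pick_documentation(context, "de")["@title"])
```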
A document definition includes several properties, and the keywords, prefixed `@`, describing class behavior.
The `@type` of the object. At the schema level, this is one of: `Enum`, `Class`, `TaggedUnion`, and `Unit`.
If you would like to add arbitrary JSON structured metadata to a class, you can place it in the `@metadata` field of the class. This can be used to direct various approaches to display of the class, or associated information for backend or front-ends which may have different requirements. It is generally good practice to keep important metadata one level deeper in a JSON object so as to leave space for other kinds of metadata. For instance:
The three varieties of document are described below:
`Class` designates a standard class document. It contains the definition of several properties and keywords describing various class attributes. An example of a class, and an instance of the class:
An `Enum` is a non-standard class in which each instance is a simple URI with no additional structure. To be a member of the class, you must be one of the referent URIs. An `Enum` example with an extension `Blue` is shown below. In the database, the actual URI for an Enum is expanded with the preceding type name, so the `Blue` extension becomes `http://s#PrimaryColour/Blue`.
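The expansion can be sketched as follows. The `@value` list and the members other than `Blue` are assumptions for illustration, and `enum_uri` is a hypothetical helper, not part of any TerminusDB client.

```python
# Schema prefix implied by the expanded URI quoted above.
context = {"@type": "@context", "@schema": "http://s#"}

# Members other than Blue are assumed for illustration.
primary_colour = {
    "@type": "Enum",
    "@id": "PrimaryColour",
    "@value": ["Red", "Blue", "Yellow"],
}


def enum_uri(enum_def, value, ctx):
    """Expand an Enum value to <schema prefix><type name>/<value>."""
    if value not in enum_def["@value"]:
        raise ValueError(f"{value!r} is not a member of {enum_def['@id']}")
    return f"{ctx['@schema']}{enum_def['@id']}/{value}"


print(enum_uri(primary_colour, "Blue", context))
```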
A `TaggedUnion` specifies mutually exclusive properties. This is useful when there is a disjoint choice between options.
Examples of a schema with a TaggedUnion and a concrete TaggedUnion class extension are given below. In these examples, the `BinaryTree` class specifies a `TaggedUnion` enabling a choice between a `leaf` (with no value) or a `node` class with a value and branches.
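A hedged sketch of the `BinaryTree` idea in Python. The property names of `Node` (`value`, `left`, `right`) and the range name `sys:Unit` are assumptions, and `chosen_tag` is a hypothetical helper that only illustrates the "exactly one option" rule.

```python
# Assumed schema objects for the BinaryTree example.
binary_tree = {
    "@type": "TaggedUnion",
    "@id": "BinaryTree",
    "leaf": "sys:Unit",  # presence only, no value
    "node": "Node",
}
node = {
    "@type": "Class",
    "@id": "Node",
    "value": "xsd:integer",
    "left": "BinaryTree",
    "right": "BinaryTree",
}


def chosen_tag(union_def, doc):
    """Return the single option present in doc, or None if not exactly one."""
    options = [k for k in union_def if not k.startswith("@")]
    present = [k for k in options if k in doc]
    return present[0] if len(present) == 1 else None


# A concrete extension: one node with two leaves.
leaf = {"@type": "BinaryTree", "leaf": []}
tree = {
    "@type": "BinaryTree",
    "node": {"@type": "Node", "value": 0, "left": leaf, "right": leaf},
}

print(chosen_tag(binary_tree, tree))
```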
The `TaggedUnion` is a special case and syntactic sugar for the more general case of collections of disjoint properties. These more complex cases can be represented by inheriting from a number of `TaggedUnion`s, but they may also be given explicitly using the `@oneOf` field, together with a Class.
The value of the `@oneOf` field is a set, so it can be any number of documents, all of which have mutually disjoint properties, but which can coexist. Examples with more than one disjoint property are given below.
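The sketch below illustrates one reading of `@oneOf`: each element of the set is a group of mutually exclusive properties, and a document must choose exactly one property from every group. The class name, property names, and the checker function are hypothetical.

```python
# Hypothetical class with two groups of mutually disjoint properties.
choice_class = {
    "@type": "Class",
    "@id": "Choice",
    "@oneOf": [
        {"integer": "xsd:integer", "string": "xsd:string"},
        {"cat": "xsd:string", "dog": "xsd:string"},
    ],
}


def satisfies_one_of(class_def, doc):
    """True if doc sets exactly one property from every @oneOf group."""
    return all(
        sum(1 for prop in group if prop in doc) == 1
        for group in class_def["@oneOf"]
    )


valid = {"@type": "Choice", "integer": 1, "cat": "Tom"}
invalid = {"@type": "Choice", "integer": 1, "string": "one", "cat": "Tom"}

print(satisfies_one_of(choice_class, valid))
print(satisfies_one_of(choice_class, invalid))
```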
`@oneOf` class extensions
But not:
The `Unit` type has a single extension `[]`. This is used when only the presence of the property is interesting, but it has no interesting value. See the `BinaryTree` in the TaggedUnion class extension example above.
The `@id` key of a class defines the class name and identifier. The name uniquely defines the class, enabling the class to be updated, retrieved, and deleted. In the example below, the class is named `NamedQuery`. It does not have a fully qualified URL or prefix, so it is implicitly based on the URI given for `@schema`.
`@key` specifies the mechanism to define the `@id` of documents in the database, similar to a primary key in relational database terms. Valid key types are `Lexical`, `Hash`, `ValueHash`, and `Random`.
If the key `@base` is specified in the class, then this is prepended to the key. If this is a fully qualified URI then it is complete; otherwise, it is combined with the value of `@base` from the context.
A `Lexical` key specifies a URI name formed from a URI encoded combination of all `@fields` arguments provided, in the order provided. An example is shown below. With this key type (or key strategy) a URI is formed from the combination of `first_name` and `last_name`. If `@base` is specified in the class, this is prepended.
Given the simple document definition below, this will either generate (if `@id` is not supplied) or check that the URI `http://example.com/people/Person_Hasdrupal_Barca` is the `@id` element.
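The Lexical strategy can be approximated in a few lines of Python. This is a sketch under assumptions: the context and class `@base` values, the `_` separator between fields, and the "contains `://`" test for a fully qualified URI are chosen so that the result matches the URI quoted above; the server's exact encoding may differ.

```python
from urllib.parse import quote

# Assumed context and class definition for the Person example.
context = {
    "@type": "@context",
    "@base": "http://example.com/people/",
    "@schema": "http://example.com/people#",
}
person_class = {
    "@type": "Class",
    "@id": "Person",
    "@base": "Person_",
    "@key": {"@type": "Lexical", "@fields": ["first_name", "last_name"]},
    "first_name": "xsd:string",
    "last_name": "xsd:string",
}


def lexical_id(class_def, doc, ctx):
    """Build an id from the URI-encoded key fields, in declared order."""
    fields = class_def["@key"]["@fields"]
    key = "_".join(quote(str(doc[f]), safe="") for f in fields)
    base = class_def.get("@base", "")
    uri = base + key
    # Assumed test for "fully qualified": the class @base carries a scheme.
    if "://" not in base:
        uri = ctx["@base"] + uri
    return uri


doc = {"@type": "Person", "first_name": "Hasdrupal", "last_name": "Barca"}
print(lexical_id(person_class, doc, context))
```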
`Hash` is generated in the same way as `Lexical` except that values are first hashed using the SHA-256 hash algorithm.
Use this where:
- There are numerous items that form the key, making the URI unwieldy.
- There is no need for the URI to inform the user of the content of the object.
- There is a requirement that data about the object not be revealed by the key.
Define a `Hash` in the same way as the Lexical key strategy example in the previous section, changing the `@key` `@type` value from `Lexical` to `Hash`.
Given the simple document definition in the previous section, the `@id` `Person_5dd7004081e437b3e684075fa3132542f5cd06c1` is generated.
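To illustrate the idea only, the sketch below hashes the key fields with SHA-256 from Python's standard library. How the server serialises the fields before hashing is not stated here, so the joining rule is an assumption and the digest produced will not match ids generated by TerminusDB.

```python
import hashlib
from urllib.parse import quote


def hash_id(prefix, field_values):
    """Return prefix + SHA-256 hex digest of the joined key fields."""
    # Assumed serialisation: URI-encode each value and join with "_".
    joined = "_".join(quote(str(v), safe="") for v in field_values)
    digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return prefix + digest


person_id = hash_id("Person_", ["Hasdrupal", "Barca"])
print(person_id)
```

The same input always yields the same id, while the field values themselves no longer appear in it.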
The `ValueHash` key generates a key defined as the downward transitive closure of the directed acyclic graph from the root of the document. This means you can produce a key that is based entirely on the whole data object. Note that `ValueHash`:
- Takes no additional keywords.
- Requires objects to be directed acyclic graphs; they cannot be cyclic.
In the example below, `ValueHash` is formed only from the value of `layer:identifier`.
Use `Random` as a convenient key type when an object has no important characteristics that inform a key or does not need to be constructed such that it is reproducible. In the example below, the `@key` `@type` is defined as `Random`, meaning each new database that is added is unique regardless of label.
Use `@documentation` to add documentation to the class and the property fields or values of the class. The `@documentation` can either be an object, or a list of objects with specified languages (and at most one default unspecified). An example using multiple languages might be:
The keywords of the `@documentation` object are `@comment` and either `@properties` or `@values` for standard classes or `Enum`s respectively. Each of the `@properties` or `@values` can likewise have either a simple label, or an object with `@label` and `@comment` (as above).
For `Enum` we can write as follows:
For a standard `Class` with one default language, we can write as follows:
The `@comment` is the class description.
The `@properties` keyword is a JSON object with pairs of the form:
or with properties pointing to JSON objects, as:
`@base` specifies a prefix to prepend to the `@key`. This prefix is absolute if `@base` is a fully qualified URI; otherwise it is, in turn, prefixed by the system-wide `@base` definition. In the example below, the `@base` for the class is fully qualified after the `layer_data` prefix is expanded. This means the layer URIs have the form `terminusdb://layer/data/Layer_` followed by a random string.
The `@subdocument` key is present with the value `[]` or it is not present.
A class designated as a subdocument is considered to be completely owned by its containing document. It is not possible to update or delete a subdocument directly; this must be done through the containing document. Currently, subdocuments must have a key that is `Random` or `ValueHash` (this restriction may be relaxed in the future).
See below for examples of a subdocument declaration in a schema, and a corresponding subdocument.
The `@abstract` key is present with the value `[]` or it is not present.
An abstract class has no concrete referents. It provides a common superclass and potentially several properties shared by all of its descendants. Create useful concrete members using the `@inherits` keyword.
An example of the abstract keyword in a schema, and a concrete instance of the `Person` class, but not of the `NamedEntity` class:
`@inherits` enables classes to inherit properties (and the `@subdocument` designation) from parent classes. It does not inherit key strategies.
This inheritance tree is also available as a `subsumption` relation in the WOQL query language and provides semantics for frames in the schema API.
The range of `@inherits` can be a class or a list of classes. For example:
Or
Multiple inheritance is allowed as long as all inherited properties of the same name have the same range class. If range classes conflict, the schema check fails.
An example of inheritance of properties and an object meeting this specification:
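The rule for multiple inheritance can be sketched as a merge that records any property whose range differs between sources. The class names and the `merged_properties` helper are hypothetical, and the sketch treats a class's own properties the same way as inherited ones, which is an assumption.

```python
# Hypothetical schema: TwoHanded inherits from two parents.
schema = {
    "RightHanded": {"@type": "Class", "@id": "RightHanded",
                    "right_hand": "xsd:string"},
    "LeftHanded": {"@type": "Class", "@id": "LeftHanded",
                   "left_hand": "xsd:string"},
    "TwoHanded": {"@type": "Class", "@id": "TwoHanded",
                  "@inherits": ["RightHanded", "LeftHanded"]},
}


def as_list(value):
    """@inherits may be a single class name or a list of names."""
    return value if isinstance(value, list) else [value]


def merged_properties(name, schema, conflicts):
    """Collect own and inherited properties; append clashes to conflicts."""
    cls = schema[name]
    sources = [merged_properties(parent, schema, conflicts)
               for parent in as_list(cls.get("@inherits", []))]
    sources.append({k: v for k, v in cls.items() if not k.startswith("@")})
    props = {}
    for source in sources:
        for prop, rng in source.items():
            # setdefault returns the first range seen for this property.
            if props.setdefault(prop, rng) != rng:
                conflicts.append(prop)
    return props


conflicts = []
print(merged_properties("TwoHanded", schema, conflicts), conflicts)
```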
The `@unfoldable` key is present with the value `[]` or it is not present.
In the document API, when retrieving documents, the default behavior is for any linked document to be returned as an IRI, while subdocuments are fully unfolded and returned as a nested document. With the `@unfoldable` option set, linked documents will behave just like subdocuments, and will also be unfolded on retrieval.
The `@unfoldable` option can only be set on a class which does not directly or indirectly link to itself. This prevents a self-referencing document from being unfolded infinitely.
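The self-reference restriction amounts to a reachability check over the links between classes. The sketch below is one way to express it, assuming a hypothetical three-class schema and the `{"@type": ..., "@class": ...}` form for type families; it ignores inheritance and is not the check TerminusDB itself runs.

```python
# Hypothetical schema used only to exercise the check.
schema = {
    "Person": {"@type": "Class", "@id": "Person",
               "name": "xsd:string", "address": "Address"},
    "Address": {"@type": "Class", "@id": "Address", "@unfoldable": [],
                "street": "xsd:string", "country": "xsd:string"},
    "Employee": {"@type": "Class", "@id": "Employee",
                 "manager": {"@type": "Optional", "@class": "Employee"}},
}


def linked_classes(class_def, schema):
    """Names of schema classes that the properties of class_def point to."""
    targets = set()
    for key, rng in class_def.items():
        if key.startswith("@"):
            continue
        # A range is a name or a type family such as {"@type": "Set", ...}.
        name = rng.get("@class") if isinstance(rng, dict) else rng
        if isinstance(name, str) and name in schema:
            targets.add(name)
    return targets


def links_to_itself(name, schema):
    """True if the class reaches itself directly or indirectly."""
    seen, stack = set(), list(linked_classes(schema[name], schema))
    while stack:
        current = stack.pop()
        if current == name:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(linked_classes(schema[current], schema))
    return False


print(links_to_itself("Address", schema), links_to_itself("Employee", schema))
```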
The purpose of `@unfoldable` is to be able to treat linked (top-level) documents as subdocuments in representation. Subdocuments can only be linked by one document, its owner, whereas normal documents can be linked by any number of other documents. If the desired result is to have a document linked by several other documents, but still have it fully unfolded on retrieval like a subdocument, use this option.
The above example shows both Doug and Phil using the same address document. On retrieval of all Persons, the document API returns these documents:
The address is fully unfolded in both documents despite not being a subdocument.
All non-keywords are treated as properties of the class, with the form:
Or
A range class is a concrete base type defined as any of the xsd types (see XSD), or a class defined in the current schema, including the current class.
In the example range class below, `first_name` and `last_name` are strings, `year_of_birth` is a year, and `friend` is any number of `Person` objects, in no particular order and without duplication. Also, see below an example of a concrete set of documents with this form.
Two special JSON types exist in TerminusDB. One, called `"sys:JSON"`, is for use as a subdocument; the other, `"sys:JSONDocument"`, is used at the top level. Both allow unconstrained and untypechecked documents which can be stored or retrieved as apparently unmodified JSON, but which are still indexed and searchable using WOQL.
Ids for subdocuments of type `"sys:JSON"` are formed from a hash of the content, meaning that subdocuments are shared if their content is the same.
However, those of type `"sys:JSONDocument"` are assigned a random id, so that they can be retrieved, modified, and so on. Alternatively, they can be assigned an id by passing in an id of the form `{ "@id" : "JSONDocument/my_id_here", ...}`, making sure to use the prefix `"JSONDocument"` so as to ensure there are no id conflicts with other document types.
"sys:JSON"
We can now have a well typed `"Person"` which contains a metadata field of type `"sys:JSON"`, which is unconstrained JSON, as follows:
"sys:JSONDocument"
Using the `{ "json" : true }` option to the insert API, or using the TerminusDB CLI with the `-j` or `--json=true` flag, we can insert an arbitrary JSON document.
Using the CLI we can write:
Use type families to construct optionality or collections of values. Type families are `List`, `Set`, `Array`, and `Optional`.
Use `Optional` as a type family where a property is not required.
Supply an optional `comment` field in `CodeBlock`. Both of the following documents are valid:
OR
Use `List` to specify an ordered collection, with multiplicity, of values of a class or datatype.
An example of an object `Task` contained in a `List` of elements known as a `TaskList`. This list is retrieved in the same order that it is inserted. It is also capable of storing duplicates.
Use `Set` to specify an unordered set of values of a class or datatype.
An example of an object `Person` that can have zero or more friends. This collection has no order and is retrieved from the database in a potentially different order. Inserted duplicates do not create additional linkages, and only one of the multiple supplied values is returned.
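The difference between `List` and `Set` can be modelled with plain Python collections. This is only an analogy under assumptions: the schema fragment uses the `{"@type": ..., "@class": ...}` form, the document ids are made up, and sorting is used solely to make the unordered result printable in a stable way.

```python
# Assumed schema fragment: friend is a Set of Person.
person_class = {
    "@type": "Class",
    "@id": "Person",
    "name": "xsd:string",
    "friend": {"@type": "Set", "@class": "Person"},
}


def stored_as_list(values):
    """List semantics: order and duplicates are both preserved."""
    return list(values)


def stored_as_set(values):
    """Set semantics: duplicates collapse and order carries no meaning."""
    return sorted(set(values))


supplied = ["Person/jim", "Person/jane", "Person/jim"]  # made-up ids
print(stored_as_list(supplied))
print(stored_as_set(supplied))
```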
Use `Foreign` to specify types which are to be references to external data products. Foreign types are types which are opaque in the current data product. This allows us to give them identifiers although we don't actually store the objects locally. Foreign types have no referential integrity checking, and as they refer to opaque identifiers, the schema is checked by the data product in which they are referred.
A foreign type must be declared explicitly by giving the name of the type to be treated as foreign using the `Foreign` designation in the schema.
For instance, to add a foreign type of type Person, we can write:
The actual definition of person might be given in its home data product as:
From the command line we can see how an HR data product might interact with an Events data product.
Create the HR data product:
Add the HR schema:
Create the Events data product:
Add events, and a foreign type designation:
Add a person to HR:
Add an event referring to the person:
Recover the event:
Use `Cardinality` to specify an unordered set of values of a class or datatype in which the property has a limited number of elements as specified by the cardinality constraint properties.
The relevant properties are:
@cardinality
When specified, the number of elements for the given property must be exactly the cardinality specified. This is equivalent to specifying both `@min_cardinality` and `@max_cardinality` as the same cardinality.
An example of an object `Person` that can have exactly three friends. As with `Set`, this list has no order and is retrieved from the database in a potentially different order.
@min_cardinality
When specified, the number of elements for the given property must be at least the cardinality specified.
@max_cardinality
When specified, the number of elements for the given property must be no more than the cardinality specified.
When set to 1, this is functionally equivalent to the `Optional` constraint.
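The three cardinality properties can be read as bounds on the number of distinct values. The checker below is a hypothetical sketch of that reading, assuming the range is written as `{"@type": "Cardinality", "@class": ...}` plus the constraint keys; it is not the validation code used by TerminusDB.

```python
def cardinality_ok(range_def, values):
    """Check the number of distinct values against the declared bounds."""
    exact = range_def.get("@cardinality")
    # @cardinality acts as both the minimum and the maximum.
    low = range_def.get("@min_cardinality", exact if exact is not None else 0)
    high = range_def.get("@max_cardinality", exact)
    count = len(set(values))  # unordered; duplicates collapse as with Set
    return count >= low and (high is None or count <= high)


exactly_three = {"@type": "Cardinality", "@class": "Person", "@cardinality": 3}
at_most_one = {"@type": "Cardinality", "@class": "Person", "@max_cardinality": 1}

print(cardinality_ok(exactly_three, ["Person/a", "Person/b", "Person/c"]))
print(cardinality_ok(at_most_one, []))
```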
Use `Array` to specify an ordered collection, with multiplicity, of values of a class or datatype in which you may want random access to the data and which may be multi-dimensional. `Array` is implemented with intermediate indexed objects, with a `sys:value` and indexes placed at `sys:index`, `sys:index2`, ... `sys:indexN` for each of the array indices of the multi-dimensional array. However, when extracted as JSON they will appear merely as lists (possibly of lists), with possible null values representing gaps in the array.
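To make the indexed-object representation concrete, the sketch below flattens a nested list into one object per cell. Zero-based indices, the assignment of the outermost dimension to `sys:index`, and the sample numbers are all assumptions made for illustration.

```python
def index_cells(array, indices=()):
    """Yield one indexed object per non-null cell of a nested list."""
    for position, item in enumerate(array):
        path = indices + (position,)
        if isinstance(item, list):
            yield from index_cells(item, path)
        elif item is not None:  # null marks a gap, so no object is produced
            cell = {"sys:value": item}
            for dimension, idx in enumerate(path, start=1):
                key = "sys:index" if dimension == 1 else f"sys:index{dimension}"
                cell[key] = idx
            yield cell


# Placeholder 2D data with one gap.
coordinates = [[1.5, 2.5], [3.5, None]]
cells = list(index_cells(coordinates))
for cell in cells:
    print(cell)
```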
An example of a polygon object `GeoPolygon` that points to a 2D array of coordinates specifying a polygon encompassing the Phoenix Park.
TerminusDB is equipped with a type inference engine which allows types to be inferred under certain conditions.
The algorithm attempts to find a unique type that can successfully be ascribed to a document. In the event that no type is found, you will get an error that no type applies. If several types might apply, you will see the list of candidate types in the error. If TerminusX is able to find the unique type which applies, it will ascribe the type automatically.
Type ascription is perhaps most useful in cases in which abstract types are used as ranges of a property, but in which there are only sibling concrete types that might apply. In this case, it is easy to ensure a unique typing for the range class, which improves the flexibility of the interface.
Note also that the type being ascribed is based on the schema as it is when the document is inserted. For this reason, in some cases it may be better to tag the document explicitly with the `@type` keyword.
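The three outcomes described above (no type, several candidates, one unique type) can be illustrated with a deliberately simplified sketch. It matches on property names only and ignores value types, required properties, and inheritance, so it is a toy model under stated assumptions, not the inference algorithm TerminusDB implements. The schema and helper names are hypothetical.

```python
# Hypothetical schema with two concrete classes sharing one property.
schema = {
    "Person": {"@type": "Class", "@id": "Person",
               "name": "xsd:string", "age": "xsd:integer"},
    "Company": {"@type": "Class", "@id": "Company",
                "name": "xsd:string", "registration": "xsd:string"},
}


def candidate_types(doc, schema):
    """Concrete classes that declare every property used by doc."""
    doc_props = {k for k in doc if not k.startswith("@")}
    return sorted(
        name for name, cls in schema.items()
        if "@abstract" not in cls
        and doc_props <= {k for k in cls if not k.startswith("@")}
    )


def infer_type(doc, schema):
    """Return the explicit or unique candidate type, else raise."""
    if "@type" in doc:
        return doc["@type"]
    candidates = candidate_types(doc, schema)
    if len(candidates) != 1:
        raise ValueError(f"no unique type applies; candidates: {candidates}")
    return candidates[0]


print(infer_type({"name": "Joe", "age": 30}, schema))
```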
Given the following schema:
We can insert the following document through the document interface:
This document will be ascribed type `"Person"`, and the two linked documents will likewise be typed as `"Person"`.
In the case of certain well-defined JSON document schemata, however, such as GeoJSON, there is never a possibility of ambiguity, so type inference makes inserting documents much more convenient.
This schema provides the `"Point"` type with a singleton enum tag. This singleton enum tag will help to uniquely assign the type.
We can then insert a point document which might be written as:
| Key type | Example | Description |
|---|---|---|
| keyword | `@id` | Starts with `@`, has a value with a special meaning. |
| property | `name` | Does not start with `@`, has a value with a range type. |