Dataset Sites

Dataset Sites help data users find your data and contact you if they find any issues. They are both human and machine readable, and allow your data to appear everywhere from the OpenActive Status Page to Google Dataset Search.

Example dataset sites: GLL, Fusion Lifestyle

Overview

  • In order for data users to find your data feeds, and for your dataset to be featured on the OpenActive status page, you must deploy the Dataset Site Template (a simple mustache template) within your codebase.

  • For booking systems or bespoke websites with a single database and one set of OpenActive data feeds, a single Dataset Site is likely to be sufficient for your organisation. This can be achieved by simply hard-coding the JSON passed into the mustache template.

  • If you are a booking system with multiple databases, each of which has its own set of OpenActive data feeds, a Dataset Site is required for each customer. This can be achieved using customer configuration to drive the mustache template.

  • You need to create a GitHub issues board for each Dataset Site.

  • If you are publishing multiple dataset sites you also need to provide a Data Catalog.

  • Data publishers should be encouraged to provide links to their Dataset Site from their own website.

What is a Dataset Site?

  • A web page that can be referenced when discussing the dataset.

  • A human and machine readable licence associated with the data (the Dataset Site contains invisible metadata which allows its details to be read automatically).

  • A human and machine readable rights statement to specify how dataset users (innovators who want to build on top of/use your data) should attribute your data.

  • An accessible "single point of truth" that explains where the data can be found.

  • Links to documentation relating to the format of the data, including the specifications it follows, and the data fields it contains.

  • A place where the community can contribute with comments, and raise issues - all Dataset Sites are linked to a GitHub issues board (e.g. this one) that allows data users to raise issues in the open.

A machine-readable dataset site is essential when publishing open data, and every dataset published within the OpenActive community to date has had one. However, the specification that describes a standard OpenActive dataset site has yet to be formally defined, and has instead evolved as a de facto standard.

As such, this documentation is still based on a draft model that is designed to inform the OpenActive specification work with implementation feedback. It is mostly stable and has been largely unchanged for 2 years. However, it is still subject to change, as the Dataset API Discovery specification is yet to be formally released, and feedback is very welcome, both within the relevant OpenActive repository and on the related schema.org PR and issue.

To minimise any uplift work required to conform to the formal specification when it is released, it is recommended that you use one of the libraries below where possible. These libraries will be updated to meet the latest specification, and when used in their simplest mode (RenderSimpleDatasetSite, renderSimpleDatasetSite or TemplateRenderer.new) will only require a simple dependency update from you to do so.

Step 1: Build Dataset Sites into your system

The Dataset Site Template is very easy to use and quick to apply - it's essentially a single mustache template and associated JSON structure. It is designed to work with minimal effort with an extremely wide range of platforms and languages.

The dataset site template repository contains a mustache template for creating an OpenActive dataset site.

.NET, PHP and Ruby Libraries

Several libraries are available that make it really easy to render the dataset site template, accepting basic settings to configure your dataset site automatically.

The table below lists the available OpenActive libraries:

Library                        .NET                          PHP                        Ruby
Open Opportunity Data Feeds    OpenActive.NET                openactive/models          openactive
Dataset Site                   OpenActive.DatasetSite.NET    openactive/dataset-site    openactive-dataset_site

Other Languages

A basic example that follows the render steps below can be found here, and can be readily ported into other languages. An explanation of how this works is included below.

The Dataset Site Template is a single self-contained mustache template of an HTML page that contains embedded CSS, an embedded encoded image, and references to CDNs of Font Awesome and Google Fonts. It works across all browsers, and includes fully compliant DCAT and schema.org machine-readable metadata to ensure it is compatible with Google Dataset Search.

Steps to render the template:

  1. Construct the JSON-LD to match the format found in this example, following this documentation.

  2. Find a mustache library for your platform or language.

  3. Write code to do the following:

    • Stringify the input JSON, and place the contents of the string within the "json" property at the root of the JSON itself (i.e. serialised JSON embedded in the original deserialised object).

    • Use the resulting JSON with the mustache template to render the dataset site.

    • Keep in mind that OpenActive will be providing updates to the mustache template in the future, so it is best to write code that anticipates this.
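The render steps above can be sketched in Python as follows. This is a minimal sketch rather than the official implementation: the function names are hypothetical, the mustache library is passed in as a `render` callable so that any library can be used, and the escaping of `</` is an extra precaution that is an assumption rather than something stated in this documentation.

```python
import copy
import json
from typing import Any, Callable, Dict


def prepare_template_data(jsonld: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the JSON-LD with its own serialisation under "json"."""
    data = copy.deepcopy(jsonld)  # leave the caller's object untouched
    serialised = json.dumps(jsonld, indent=2)
    # Precaution (an assumption, not from the source): escape "</" so the
    # string cannot close a surrounding <script> tag. "<\/" is still valid
    # JSON and parses back to "</".
    data["json"] = serialised.replace("</", "<\\/")
    return data


def render_dataset_site(
    template: str,
    jsonld: Dict[str, Any],
    render: Callable[[str, Dict[str, Any]], str],
) -> str:
    """Render the template with whichever mustache library is supplied."""
    return render(template, prepare_template_data(jsonld))
```

Keeping the template as an input, rather than inlining it into your code, makes it easier to pick up future template updates by replacing a single file.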

JavaScript Prototype

The JSFiddle below demonstrates the Dataset Site Template render steps outlined above using plain JavaScript.

Please note this is only an example to demonstrate the logic and is not intended for production use. For production use, the mustache template must be copied locally and rendered server-side, both for security (to prevent XSS attacks) and because one of its primary purposes is SEO.

Step 2: Personalising the Dataset Site

The Dataset Site Template is designed to carry the customer's brand with minimal configuration.

Single database

For booking systems or bespoke websites with a single database and one set of OpenActive data feeds, a single Dataset Site is likely to be sufficient for your organisation. This can be achieved by simply hard-coding the JSON passed into the mustache template (see documentation and example), or hard-coding the settings passed to the library (see the relevant library documentation).

Note that a single Dataset Site must only be used when all feeds it includes are part of the same dataset - for example a SessionSeries feed and ScheduledSession feed that together constitute the dataset of all providers in the booking system. Where multiple feeds exist that represent distinct datasets (e.g. SessionSeries feed for Provider A, SessionSeries feed for Provider B), they must be referenced from distinct Dataset Sites, which can be constructed as per the instructions in Multiple databases below.

Multiple databases

For large booking systems with multiple databases, usually a separate database for each customer, a separate Dataset Site may be created for each database. The list below illustrates the minimal number of configurable properties that can be used to generate the whole dataset site in a way that is personalised to each customer. See the example here for how these map into the JSON data structure, for your reference - in practice the libraries supplied above take care of this mapping for you.

We suggest giving the customer a means of customising the logo and background image (e.g. by uploading an image to the cloudinary.com CDN using their widget, which is free at low volume), as these have the largest effect on the brand feel of the page.

Although the customer will likely be able to fill in most properties specific to them, there are two where they will require guidance:

  • datasetDiscussionUrl - the URL of the GitHub issues board for the dataset. If your customers are sufficiently large, you will need to create a GitHub issues board for each customer, either manually or automatically. See here for an example of Gladstone's GitHub organization containing a GitHub issues board for each customer.

  • datasetDocumentationUrl - as a booking system you should provide at least a single page on your website that explains the OpenActive feeds. Each customer may have the option of providing their own documentation for their dataset site that links to this, or just linking to your documentation directly. If you do not have your own documentation page, you can just link to "https://developer.openactive.io/".
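One way to fill in the two guided properties per customer is sketched below. It is illustrative only: the function name, the `repositoryName` key and the `github_org` argument are hypothetical, and it assumes the discussion URL should point at the issues page of the customer's repository. Only `datasetDiscussionUrl`, `datasetDocumentationUrl` and the fallback documentation link come from the text above.

```python
from typing import Dict, Optional

# Fallback suggested above for systems without their own documentation page.
DEFAULT_DOCUMENTATION_URL = "https://developer.openactive.io/"


def build_customer_settings(
    customer: Dict[str, str],
    github_org: str,
    booking_system_docs_url: Optional[str] = None,
) -> Dict[str, str]:
    """Merge customer-supplied values with defaults for the guided properties."""
    settings = dict(customer)  # copy so the stored configuration is unchanged

    # Assumption: each customer has a repository named by the hypothetical
    # "repositoryName" key, and its issues page is used for discussion.
    settings.setdefault(
        "datasetDiscussionUrl",
        f"https://github.com/{github_org}/{customer['repositoryName']}/issues",
    )

    # Prefer the customer's own page, then the booking system's page,
    # then the general OpenActive developer documentation.
    settings.setdefault(
        "datasetDocumentationUrl",
        booking_system_docs_url or DEFAULT_DOCUMENTATION_URL,
    )
    return settings
```

A customer who supplies their own `datasetDocumentationUrl` keeps it, because `setdefault` only fills in missing values.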

Step 3: GitHub Issues Board creation

The discussionUrl is the URL of the GitHub issues board for that specific dataset site.

We recommend that you create each GitHub repository (that will include a GitHub Issues Board) within your own GitHub organisation either manually or via an API call.

If you have multiple databases and customers with large data volumes, you should create one GitHub repository (that will include a GitHub Issues Board) for each customer. Single database systems need only create one GitHub repository (that will include a GitHub Issues Board).

Helpdesk integration

If you "follow" these GitHub repositories using a new GitHub account created with your support e-mail address then you will receive notifications for each query, and be able to reply via e-mail to the notifications from your support e-mail address - these replies then appear directly in GitHub. Note that any administrator accounts automatically follow newly created GitHub repositories within your organisation.

GitHub Organisation Creation

You must first create a parent GitHub organisation on the free tier:

  • For booking systems we recommend naming the parent GitHub organisation after your own organisation.

  • For agencies or in-house tech teams we recommend naming the parent GitHub organisation after your data publishing organisation.

Manual Issues Board Creation

A guide for creating a new GitHub repository for each customer can be found below.

Automatic Issues Board Creation

The GitHub API provides a mechanism to automatically create GitHub repositories. The recommended properties for a new repository are included below:

{
  "name": "AshfordLeisureTrust",
  "description": "Issues relating to open data from Ashford Leisure Trust",
  "homepage": "https://ashfordleisuretrust.leisurecloud.net/OpenActive/",
  "private": false,
  "has_issues": true,
  "has_projects": false,
  "has_wiki": false,
  "auto_init": false
}
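As a rough sketch of how these properties might be sent, the Python below builds a request for the GitHub REST API endpoint that creates a repository within an organisation (`POST /orgs/{org}/repos`). The function name, the organisation name and the token are placeholders, and deriving the repository name by stripping non-alphanumeric characters from the customer name is an assumption modelled on the example above.

```python
import json
import re
import urllib.request


def build_create_repo_request(
    org: str, token: str, customer_name: str, homepage: str
) -> urllib.request.Request:
    """Build (but do not send) a request that creates an issues repository."""
    body = {
        # Assumption: repository name is the customer name without spaces
        # or punctuation, e.g. "Ashford Leisure Trust" -> "AshfordLeisureTrust".
        "name": re.sub(r"[^A-Za-z0-9]", "", customer_name),
        "description": f"Issues relating to open data from {customer_name}",
        "homepage": homepage,
        "private": False,
        "has_issues": True,
        "has_projects": False,
        "has_wiki": False,
        "auto_init": False,
    }
    return urllib.request.Request(
        url=f"https://api.github.com/orgs/{org}/repos",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

The request can then be sent with `urllib.request.urlopen(request)`, using a token that is permitted to create repositories in the organisation.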

Step 4: Validating your Dataset Site

Use the validator to check that the JSON-LD within your Dataset Site is conformant, by using the Load URL feature in the menu to load your Dataset Site URL, while in the "Dataset Sites" mode. The validator will automatically extract the JSON-LD from your Dataset Site's HTML and validate it.
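Before submitting a URL to the validator, it can be useful to confirm locally that the rendered page actually contains parseable JSON-LD. The sketch below is not the validator's own logic; it assumes the metadata is embedded in a `<script type="application/ld+json">` element, and the class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser
from typing import Any, List


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of every JSON-LD script element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._chunks: List[str] = []
        self.documents: List[Any] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        # Script content may arrive in several pieces, so accumulate it.
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            # json.loads raises ValueError if the embedded JSON is malformed.
            self.documents.append(json.loads("".join(self._chunks)))
            self._in_jsonld = False


def extract_jsonld(html: str) -> List[Any]:
    """Return all JSON-LD documents found in the given HTML string."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.documents
```

This only checks that the JSON-LD is present and well-formed; conformance still needs to be checked with the validator.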

Step 5: Providing a Data Catalog (multiple databases only)

For booking systems with multiple databases, a Data Catalog must also be provided to allow the many Dataset Sites that are created to be easily indexed by the OpenActive Status Page and other data users.

A Data Catalog is very simply an array of the URLs of all your Dataset Sites (the dataset array), presented within a DataCatalog wrapper following a specific format. An example of a live Data Catalog from the Gladstone system can be found here, and another example here.
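The shape described above might be generated along the following lines. This is a sketch under assumptions: the `dataset` array and the `DataCatalog` type come from the text above, while the function name and the other wrapper properties shown (`@context`, `@id`, `name`) are illustrative and should be checked against the live examples and the validator.

```python
from typing import Any, Dict, List


def build_data_catalog(
    catalog_url: str, name: str, dataset_site_urls: List[str]
) -> Dict[str, Any]:
    """Wrap a list of Dataset Site URLs in a DataCatalog object."""
    return {
        # Assumed wrapper properties; confirm against a live Data Catalog.
        "@context": "https://schema.org/",
        "@type": "DataCatalog",
        "@id": catalog_url,
        "name": name,
        # De-duplicate and sort so the output is stable between runs.
        "dataset": sorted(set(dataset_site_urls)),
    }
```

Generating the catalog from the same customer configuration that drives the Dataset Sites keeps the two in step as customers are added or removed.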

Please use the validator to check that your DataCatalog is conformant, using the "Data Catalog" mode.

Step 6: Adding your Dataset Site or Data Catalog to the OpenActive Data Catalog Collection

OpenActive Data Catalogs provide a mechanism for registering OpenActive Dataset Sites so that they can be discovered and harvested by data users.

Single database

If you have created a new Dataset Site, simply create a Pull Request for the OpenActive Data Catalog for Singular Datasets and add your Dataset Site's production URL to the dataset array.

Multiple databases

If you have created a new Data Catalog that links to your Dataset Sites, simply create a Pull Request for the OpenActive Data Catalog Collection and add your Data Catalog's production URL to the hasPart array.