Harvesting opportunity data

Libraries

Although the logic to harvest and use OpenActive data is straightforward, several libraries provide helpers for consuming opportunity data feeds.

The table below lists these libraries:

| Language | Dataset Discovery | Harvesting Feeds |
| --- | --- | --- |
| JavaScript / TypeScript | @openactive/dataset-utils | N/A |
| Python | openactive-python | openactive-python |
| Ruby | openactive.rb * | openactive.rb |

* Note that the Ruby library requires updating before it can be used for dataset discovery.

Dataset discovery

As described in the data catalogue processing guidance, OpenActive datasets can be discovered automatically by "spidering" links within the canonical OpenActive Data Catalog Collection JSON-LD file.
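
The spidering step can be sketched in Python as below. This is a minimal illustration, not a reference implementation: the collection URL, the `hasPart` key (links to catalogues) and the `dataset` key (links to dataset sites) are assumptions about the catalogue JSON-LD that should be checked against the data catalogue processing guidance. `discover_dataset_urls` and `fetch_json_http` are hypothetical helper names.

```python
import json
import urllib.request

# Assumed location of the Data Catalog Collection file; verify before use.
COLLECTION_URL = "https://openactive.io/data-catalogs/data-catalog-collection.jsonld"


def fetch_json_http(url):
    """Fetch a URL and parse the response body as JSON."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def discover_dataset_urls(fetch_json, collection_url=COLLECTION_URL):
    """Follow catalogue links breadth-first and collect dataset site URLs.

    Assumes each document may list child catalogues under "hasPart"
    and dataset sites under "dataset".
    """
    seen = set()       # catalogue URLs already visited (guards against cycles)
    datasets = []      # dataset site URLs, in discovery order, de-duplicated
    queue = [collection_url]
    while queue:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        doc = fetch_json(url)
        queue.extend(doc.get("hasPart", []))
        for dataset_url in doc.get("dataset", []):
            if dataset_url not in datasets:
                datasets.append(dataset_url)
    return datasets
```

Calling `discover_dataset_urls(fetch_json_http)` would perform the live crawl. Passing the fetch function in as a parameter keeps the traversal easy to test against canned documents.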

Harvesting feeds

Considerations

Combining feed pairs
- Ensure that updates and deletes from both parent and child feeds are considered (such as SessionSeries/ScheduledSession or FacilityUse/Slot - see Types of RPDE feed for more information)
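
As a rough sketch of the point above, the Python below keeps one in-memory store per feed, applies updates and deletes to each, and joins children to parents on demand. The RPDE item fields (`state`, `id`, `data`) follow the RPDE item format. The assumption that a child references its parent by the parent's `@id` under `superEvent` (or `facilityUse` for a Slot) should be checked against the feeds being consumed. `apply_item` and `combine` are hypothetical helper names.

```python
def apply_item(store, item):
    """Apply one RPDE item to a dict keyed by RPDE id."""
    if item["state"] == "deleted":
        # Deleted items carry no data; drop whatever was stored for this id.
        store.pop(item["id"], None)
    else:
        store[item["id"]] = item["data"]


def combine(parents, children, parent_ref_key="superEvent"):
    """Return children with their parent embedded.

    Children whose parent is missing (deleted, or not yet harvested)
    are left out of the result.
    """
    parents_by_ld_id = {data["@id"]: data for data in parents.values()}
    combined = []
    for child in children.values():
        parent = parents_by_ld_id.get(child.get(parent_ref_key))
        if parent is not None:
            combined.append({**child, parent_ref_key: parent})
    return combined
```

Because the join is computed from both stores, a delete in either feed removes the combined opportunity from the output, which is the behaviour the consideration asks for.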
Harvest frequency
- To avoid wasting resources, especially as you scale feed consumption, respect the sleep and live modes: if a page contains no items, wait 8 seconds before making the next request. Due to caching, more frequent requests will simply hit a CDN and return the same response, so there is no advantage in polling faster than this.
De-serializing RPDE feeds
- Ensure that the data type used to de-serialize the modified timestamp can support signed 64-bit integers. See Common Pitfalls below for more information.
Resyncs
- Consuming an RPDE feed from the beginning is termed a "resync".
- RPDE feeds are not designed to be resynced frequently.
- The feed consumer must continue to consume updates from the end of the feed to keep its data up to date, rather than re-downloading all data on each update.
- Resyncing any individual feed more than once a week is not recommended, as it increases the load on the open data publisher's servers, which will likely result in a high number of 429 responses and could cause your IP address to be blocked.
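
The harvest frequency and resync guidance above can be sketched as a single polling loop in Python. This is an illustrative outline under stated assumptions: `harvest`, `fetch_page`, `apply_item` and `save_position` are hypothetical names, and `max_pages` exists only so the loop can be stopped in tests. Each page is assumed to be an RPDE page with `items` and `next` keys.

```python
import time

EMPTY_PAGE_WAIT_SECONDS = 8  # wait suggested above when a page has no items


def harvest(fetch_page, start_url, apply_item, save_position,
            sleep=time.sleep, max_pages=None):
    """Follow "next" links, persisting the position after every page.

    Restarting from the saved position avoids a resync from the
    beginning of the feed.
    """
    url = start_url
    pages_read = 0
    while max_pages is None or pages_read < max_pages:
        page = fetch_page(url)
        pages_read += 1
        for item in page["items"]:
            apply_item(item)
        url = page["next"]
        save_position(url)  # resume from here after a restart
        if not page["items"]:
            # End of the feed reached: back off before polling again.
            sleep(EMPTY_PAGE_WAIT_SECONDS)
    return url
```

On restart, a consumer would pass the last saved position as `start_url` rather than the first page of the feed, so a resync happens only when it is deliberately requested.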

Common Pitfalls

Storing RPDE `modified` with less than 64-bit integers

A common approach to creating modified values for an RPDE feed is to use SQL Server's timestamp/rowversion data types. This approach is suggested in the RPDE specification.

This data type has been observed to generate integers as large as 2⁶⁰.

Therefore, implementers are recommended to use data types that can store at least a signed 64-bit integer without loss of precision.

Language specific guidelines:

JavaScript / TypeScript:
- JavaScript numbers are 64-bit floating point numbers, which means that integers cannot be represented with sufficient precision beyond 2⁵³.
- Getting around this is a bit complicated, but possible. See this page for more info: Large Integers in JavaScript.
C#: Use a long (64-bit) as opposed to an int (32-bit).
Other languages: Ensure that the integer type that is being used to de-serialize the modified timestamps from RPDE feed pages has at least as much size and precision as a signed 64-bit integer.
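
To make the precision limit concrete, the Python sketch below parses a `modified` value of 2⁶⁰ + 1 and shows what happens if it passes through a 64-bit float. Python's built-in `int` is arbitrary precision and `json.loads` keeps JSON integers as `int`, so the value is only lost if it is later coerced to `float`. `fits_int64` and `survives_double` are hypothetical helpers for illustration.

```python
import json

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def fits_int64(value):
    """True if value can be stored in a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX


def survives_double(value):
    """True if value round-trips through a 64-bit float unchanged."""
    return int(float(value)) == value


# A feed page fragment whose modified value is 2**60 + 1.
page = json.loads('{"items": [{"id": "a", "modified": 1152921504606846977}]}')
modified = page["items"][0]["modified"]

print(fits_int64(modified))       # safe in a signed 64-bit integer column
print(survives_double(modified))  # not safe as a double-precision number
```

A check like `survives_double` can be useful as an assertion in a harvester's tests, to catch a layer such as a serializer or database column that silently converts `modified` to a floating point type.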

