Harvesting opportunity data
Although the logic to harvest and use OpenActive data is straightforward, there are several libraries that provide helpers to aid the consumption of opportunity data feeds.
The table below lists these libraries:
JavaScript / TypeScript
N/A
Python
Ruby
* Note that the Ruby library requires updating before it can be used for dataset discovery
As described in the data catalogue processing guidance, OpenActive datasets can be discovered automatically by "spidering" links within the canonical OpenActive Data Catalog Collection JSON-LD file.
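As a rough illustration of the spidering idea, the sketch below walks a catalogue tree and collects dataset site URLs. It assumes that each catalogue document lists nested catalogues under a hasPart property and dataset sites under a dataset property; these property names, the fetch_json callable, and the function name are assumptions made for this example and should be checked against the data catalogue processing guidance.

```python
from typing import Any, Callable, Optional


def discover_dataset_urls(
    catalog_url: str,
    fetch_json: Callable[[str], dict[str, Any]],
    seen: Optional[set[str]] = None,
) -> list[str]:
    """Recursively collect dataset site URLs reachable from a catalogue.

    fetch_json is a hypothetical callable that downloads a URL and returns
    the parsed JSON-LD document as a dict.
    """
    seen = set() if seen is None else seen
    if catalog_url in seen:
        return []  # already visited: guards against circular links
    seen.add(catalog_url)

    catalog = fetch_json(catalog_url)
    # Assumed property names: "dataset" for dataset sites,
    # "hasPart" for nested catalogues.
    urls = list(catalog.get("dataset", []))
    for child_url in catalog.get("hasPart", []):
        urls.extend(discover_dataset_urls(child_url, fetch_json, seen))
    return urls
```

Passing the fetcher in as an argument keeps the traversal separate from the HTTP client, so caching and rate limiting can be added without changing the traversal.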
Combining feed pairs
When combining a feed pair (such as SessionSeries/ScheduledSession or FacilityUse/Slot), ensure that updates and deletes from both the parent feed and the child feed are applied. See Types of RPDE feed for more information.
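One possible way to handle a feed pair is sketched below: each feed has its own store, updates and deletes are applied to both, and children are joined to parents on demand. It assumes RPDE items carry state, id and data properties, and that a child's link property (for example superEvent or facilityUse) holds the parent's @id as a string. The function names and the in-memory dict stores are illustrative only.

```python
from typing import Any

Record = dict[str, Any]


def apply_page(store: dict[str, Record], items: list[Record]) -> None:
    """Apply one page of RPDE items to a store keyed by RPDE id."""
    for item in items:
        key = str(item["id"])
        if item["state"] == "deleted":
            store.pop(key, None)  # deletes must be honoured in both feeds
        else:
            store[key] = item["data"]


def combine(
    parents: dict[str, Record], children: dict[str, Record], link: str
) -> list[Record]:
    """Embed the parent in each child whose link property matches a parent @id."""
    parents_by_at_id = {p["@id"]: p for p in parents.values() if "@id" in p}
    combined = []
    for child in children.values():
        ref = child.get(link)
        parent = parents_by_at_id.get(ref) if isinstance(ref, str) else None
        if parent is not None:
            # Children whose parent is missing or deleted are left out.
            combined.append({**child, link: parent})
    return combined
```

Because the join is computed from the current contents of both stores, deleting a parent removes its children from the combined output even though the child feed has not changed.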
Harvest frequency
To avoid wasting resources, especially as you scale feed consumption, respect sleep and live modes (i.e. if there are no items in the feed, wait 8 seconds before making the next request). Due to caching, more frequent requests will simply hit a CDN and return the same response, so there is no advantage in polling faster than this.
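The polling behaviour described above could look something like the following sketch. It assumes each RPDE page is a dict with items and next properties; fetch_page and handle_items are hypothetical callables supplied by the caller, and max_requests exists only so the loop can be stopped in tests.

```python
import time
from typing import Any, Callable, Optional

POLL_INTERVAL_SECONDS = 8  # wait used when a page contains no items


def harvest(
    start_url: str,
    fetch_page: Callable[[str], dict[str, Any]],
    handle_items: Callable[[list[dict[str, Any]]], None],
    sleep: Callable[[float], None] = time.sleep,
    max_requests: Optional[int] = None,
) -> str:
    """Follow 'next' links, sleeping whenever the feed has no new items.

    Returns the URL that would be requested next, so it can be stored.
    """
    url = start_url
    requests_made = 0
    while max_requests is None or requests_made < max_requests:
        page = fetch_page(url)
        requests_made += 1
        items = page.get("items", [])
        if items:
            handle_items(items)  # more pages may follow: request again at once
        else:
            sleep(POLL_INTERVAL_SECONDS)  # end of feed reached: back off
        url = page["next"]
    return url
```

With max_requests left as None the loop runs indefinitely, which matches a long-running harvester process.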
De-serializing RPDE Feeds:
Ensure that the data type used to de-serialize the modified timestamp can support signed 64-bit integers. More info here.
Resyncs
Consuming an RPDE feed from the beginning is termed a "resync".
RPDE feeds are not designed to be resynced frequently.
To keep its data up to date, the feed consumer must continue consuming updates from the end of the feed, rather than re-downloading all data for each update.
Resyncing any individual feed more than once each week is not recommended, as it increases the load on the open data publisher's servers, which will likely result in a high number of 429 responses and could cause your IP address to be blacklisted.
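One way to avoid unnecessary resyncs is to persist the most recent next URL, so that a restarted harvester resumes from where it stopped. The sketch below stores the position in a small JSON file; the file layout and the function names are assumptions for illustration, and a database row would serve equally well.

```python
import json
from pathlib import Path


def load_position(state_file: Path, feed_start_url: str) -> str:
    """Return the saved 'next' URL, or the feed's first page if none is saved."""
    if state_file.exists():
        return json.loads(state_file.read_text())["next"]
    return feed_start_url  # no saved position: this is the initial sync


def save_position(state_file: Path, next_url: str) -> None:
    """Record the 'next' URL of the last page that was fully processed."""
    state_file.write_text(json.dumps({"next": next_url}))
```

Saving the position only after a page's items have been stored means a crash leads to a page being re-processed rather than skipped.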
A common approach to creating modified values for an RPDE feed is to use SQL Server's timestamp/rowversion data types. This approach is suggested in the RPDE specification.
This data type has been observed to generate integers as large as 2⁶⁰.
Implementers are therefore recommended to use data types that can store at least a signed 64-bit integer without loss of precision.
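To illustrate the size requirement, the sketch below checks that a value fits a signed 64-bit integer and packs it into 8 bytes using Python's struct module. The helper names are hypothetical; the same range check applies when choosing a database column or a typed field in another language.

```python
import struct

INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def fits_signed_64_bit(modified: int) -> bool:
    """True if the value can be stored in a signed 64-bit integer."""
    return INT64_MIN <= modified <= INT64_MAX


def pack_modified(modified: int) -> bytes:
    """Pack into 8 big-endian bytes; raises struct.error if out of range."""
    return struct.pack(">q", modified)


large = 2**60  # the magnitude mentioned above
assert fits_signed_64_bit(large)
assert struct.unpack(">q", pack_modified(large))[0] == large

# A signed 32-bit field is too small for the same value.
try:
    struct.pack(">i", large)
except struct.error:
    print("2**60 does not fit in 32 bits")
```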
Language-specific guidelines:
JavaScript / TypeScript:
JavaScript numbers are 64-bit floating point numbers, which means that integers larger than 2⁵³ cannot all be represented exactly.
Getting around this is a bit complicated, but possible. See this page for more info: Large Integers in JavaScript.
C#: Use a long as opposed to an int.
Other languages: Ensure that the integer type used to de-serialize the modified timestamps from RPDE feed pages has at least as much size and precision as a signed 64-bit integer.
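The precision loss is a property of 64-bit floating point numbers rather than of JavaScript alone, so it can be reproduced in other languages. The sketch below uses Python's json module, which parses JSON integers as arbitrary-precision int by default, and compares that with forcing the same value through a 64-bit float. The sample value is made up for illustration.

```python
import json

raw = '{"modified": 1152921504606846977}'  # 2**60 + 1, an illustrative value

# Default behaviour: JSON integers become Python ints, which are exact.
exact = json.loads(raw)["modified"]

# Forcing integers through float mimics a 64-bit floating point number type.
as_double = json.loads(raw, parse_int=float)["modified"]

assert exact == 2**60 + 1
assert int(as_double) == 2**60  # the trailing +1 has been lost
assert float(2**53) == float(2**53 + 1)  # first point where doubles collide
```

Two distinct modified values that collapse to the same float can no longer be ordered reliably, which is why an exact integer type matters here.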