Harvesting opportunity data
Libraries
Although the logic to harvest and use OpenActive data is straightforward, there are several libraries that provide helpers to aid the consumption of opportunity data feeds.
The table below lists these libraries:
Language | Dataset Discovery | Harvesting Feeds |
---|---|---|
JavaScript / TypeScript | N/A | |
Python | Coming soon | Coming soon |
Ruby | | |
* Note that the Ruby library requires updating before it can be used for dataset discovery
Dataset discovery
As described in the data catalogue processing guidance, OpenActive datasets can be discovered automatically by "spidering" links within the canonical OpenActive Data Catalog Collection JSON-LD file.
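The spidering described above can be sketched as a breadth-first walk over the catalogue documents. This is a minimal sketch, not a reference implementation: it assumes each catalogue document lists child catalogues under `hasPart` and dataset site URLs under `dataset`, and the function names and the URL in the usage comment are illustrative assumptions that should be checked against the data catalogue processing guidance. The fetcher is passed in so that the traversal can be exercised without network access.

```python
import json
from typing import Callable, List
from urllib.request import urlopen


def fetch_json_over_http(url: str) -> dict:
    """Fetch a URL and parse the response body as JSON."""
    with urlopen(url) as response:
        return json.load(response)


def discover_dataset_site_urls(
    root_url: str,
    fetch_json: Callable[[str], dict] = fetch_json_over_http,
) -> List[str]:
    """Walk catalogue documents breadth-first and collect dataset site URLs.

    Assumes child catalogues are listed under "hasPart" and dataset site
    URLs under "dataset" (both property names are assumptions).
    """
    visited = set()
    found: List[str] = []
    queue = [root_url]
    while queue:
        url = queue.pop(0)
        if url in visited:
            continue  # guard against catalogues that link to each other
        visited.add(url)
        document = fetch_json(url)
        for dataset_url in document.get("dataset", []):
            if dataset_url not in found:
                found.append(dataset_url)
        queue.extend(document.get("hasPart", []))
    return found


# Usage (performs network requests; the URL is an assumed entry point):
# urls = discover_dataset_site_urls(
#     "https://openactive.io/data-catalogs/data-catalog-collection.jsonld"
# )
```

Each discovered URL points at a dataset site, which still has to be fetched and parsed separately to find the feed URLs it advertises.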
Harvesting feeds
Considerations
Combining feed pairs
Where data is published as a pair of feeds (such as SessionSeries/ScheduledSession or FacilityUse/Slot), ensure that updates and deletes from both the parent feed and the child feed are processed. See Types of RPDE feed for more information.
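As a rough illustration of handling both feeds, the sketch below applies items from either feed to one in-memory store keyed by `kind` and `id`, then joins children to parents at read time. The store layout, the function names, and the `parent_id_of` callback are hypothetical; how a child refers to its parent depends on the feed and is left to the caller.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Key = Tuple[str, Any]  # (kind, id) of an RPDE item


def apply_items(store: Dict[Key, dict], items: Iterable[dict]) -> None:
    """Apply RPDE items from a parent or child feed to the store."""
    for item in items:
        key = (item["kind"], item["id"])
        if item["state"] == "deleted":
            store.pop(key, None)  # deleted items carry no data
        else:
            store[key] = item["data"]


def children_with_parents(
    store: Dict[Key, dict],
    parent_kind: str,
    child_kind: str,
    parent_id_of: Callable[[dict], Any],  # hypothetical: extracts the parent's id
) -> List[Tuple[dict, dict]]:
    """Return (parent, child) pairs, skipping children whose parent is absent."""
    pairs = []
    for (kind, _item_id), child in store.items():
        if kind != child_kind:
            continue
        parent = store.get((parent_kind, parent_id_of(child)))
        if parent is not None:
            pairs.append((parent, child))
    return pairs
```

Joining at read time means a delete arriving on the parent feed hides its children immediately, even if the child feed has not yet published the matching deletes.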
Harvest frequency
To avoid wasting your own resources, especially as you scale feed consumption, ensure that sleep and live modes are respected: if a page contains no items, wait 8 seconds before making the next request. Due to caching, more frequent requests will simply hit a CDN and return the same response, so there is no advantage in polling faster than this.
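The polling behaviour described above might look like the following loop. It is a sketch under stated assumptions: `fetch_page`, `handle_items`, and `max_requests` are hypothetical names introduced here, and each page is assumed to be a parsed RPDE page with `items` and `next` keys. The sleep function is injectable so the loop can be tested without real delays.

```python
import time
from typing import Callable, List, Optional

EMPTY_PAGE_WAIT_SECONDS = 8  # wait stated in the guidance above


def harvest(
    start_url: str,
    fetch_page: Callable[[str], dict],           # hypothetical: returns a parsed RPDE page
    handle_items: Callable[[List[dict]], None],  # hypothetical: stores or processes items
    sleep: Callable[[float], None] = time.sleep,
    max_requests: Optional[int] = None,          # None means poll indefinitely
) -> str:
    """Follow `next` links, waiting whenever a page has no items.

    Returns the URL that should be requested next.
    """
    url = start_url
    requests_made = 0
    while max_requests is None or requests_made < max_requests:
        page = fetch_page(url)
        requests_made += 1
        items = page.get("items", [])
        if items:
            handle_items(items)
            url = page["next"]  # more data may follow, so continue immediately
        else:
            url = page.get("next", url)
            sleep(EMPTY_PAGE_WAIT_SECONDS)  # end of feed reached: back off
    return url
```

Returning the next URL lets the caller persist it and resume later from the same position.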
De-serializing RPDE Feeds:
Ensure that the data type used to de-serialize the `modified` timestamp can support signed 64-bit integers. More info here.
Resyncs
Consuming an RPDE feed from the beginning is termed a "resync".
RPDE feeds are not designed to be resynced frequently.
Instead, the feed consumer must keep consuming updates from the end of the feed to keep its data up-to-date, rather than re-downloading the whole feed each time it needs fresh data.
Resyncing any individual feed more than once each week is not recommended, as it increases the load on the open data publisher's servers, which will likely result in a high number of 429 responses and could cause your IP address to be blacklisted.
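One way to avoid unnecessary resyncs is to persist the last `next` URL and resume from it after a restart. The sketch below assumes a small JSON checkpoint file per feed; the file layout and the function names are illustrative and not part of any OpenActive library.

```python
import json
import os
from pathlib import Path


def load_checkpoint(path: str, feed_start_url: str) -> str:
    """Return the saved `next` URL, or the feed's first page if none is saved."""
    try:
        return json.loads(Path(path).read_text())["next"]
    except FileNotFoundError:
        return feed_start_url  # no checkpoint yet: this is the initial sync


def save_checkpoint(path: str, next_url: str) -> None:
    """Write the checkpoint to a temporary file, then swap it into place."""
    temporary_path = Path(str(path) + ".tmp")
    temporary_path.write_text(json.dumps({"next": next_url}))
    os.replace(temporary_path, path)  # avoids leaving a half-written checkpoint
```

Saving the checkpoint only after a page's items have been stored means a crash causes that page to be re-read rather than skipped.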
Common Pitfalls
Storing RPDE `modified` in integers smaller than 64 bits
A common approach to creating `modified` values for an RPDE feed is to use SQL Server's `timestamp`/`rowversion` data types. This approach is suggested in the RPDE specification.
This data type has been seen to generate integers up to values of 2⁶⁰.
Therefore, implementers are recommended to use data types that can store at least a signed 64-bit integer without loss of precision.
Language specific guidelines:
JavaScript / TypeScript:
JavaScript numbers are 64-bit floating point numbers, which means that integers cannot be represented with sufficient precision beyond 2⁵³.
Getting around this is a bit complicated, but possible. See this page for more info: Large Integers in JavaScript.
C#: Use a `long` as opposed to an `int`.
Other languages: Ensure that the integer type that is being used to de-serialize the `modified` timestamps from RPDE feed pages has at least as much size and precision as a signed 64-bit integer.
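To illustrate the precision issue, the sketch below parses a made-up RPDE page whose `modified` value is 2⁶⁰ + 1. Python's built-in `int` is arbitrary-precision, so `json.loads` keeps the value exact, while a round trip through a 64-bit float (the type behind JavaScript numbers) silently changes it. The sample page and the helper name are illustrative assumptions.

```python
import json

# Hypothetical feed page; 1152921504606846977 is 2**60 + 1.
raw_page = '{"items": [{"id": "a", "modified": 1152921504606846977}]}'

modified = json.loads(raw_page)["items"][0]["modified"]

# Python integers are arbitrary-precision, so the value survives parsing.
assert modified == 2**60 + 1

# A 64-bit float cannot represent every integer above 2**53,
# so converting through float loses the final digit.
assert int(float(modified)) == 2**60


def fits_signed_64_bit(value: int) -> bool:
    """Check that a value can be stored in a signed 64-bit column or field."""
    return -(2**63) <= value <= 2**63 - 1


assert fits_signed_64_bit(modified)
```

A range check like `fits_signed_64_bit` can be useful before writing `modified` to a database column or another system with fixed-width integers.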