openwpm.storage.arrow_storage module
- class openwpm.storage.arrow_storage.ArrowProvider
Bases: StructuredStorageProvider
This class implements a StructuredStorage provider that serializes records into the Apache Arrow format.
- async finalize_visit_id(visit_id: VisitId, interrupted: bool = False) → Task[None]
This method is the reason the finalize_visit_id interface returns a task. Returning a task was necessary to enable the following pattern:
    token = await structured_storage.finalize_visit_id(1)
    await structured_storage.flush_cache()
    await token
If no task were returned and the method instead simply blocked (yielded) after turning the record into a batch, there would be no way to know when it is safe to call flush_cache, as I couldn't find a way to run a coroutine until it yields and then run a different one.
With the current setup, token (a wait_on_condition coroutine) only returns once its respective event has been set.
- async flush_cache(lock: Lock | None = None) → None
We need to hack around the fact that asyncio has no reentrant lock. So either we acquire the storing_lock ourselves, or the caller must pass us the already locked storing_lock.
- async init() → None
Initializes the StorageProvider for use.
Guaranteed to be called in the process the StorageController runs in.
- async shutdown() → None
Close all open resources. After this method has been called, no further calls should be made to the object.
- async store_record(table: TableName, visit_id: VisitId, record: Dict[str, Any]) → None
Submit a record to be stored. The storing might not happen immediately.
- storing_lock: Lock
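The token pattern described under finalize_visit_id and the lock handling described under flush_cache can be illustrated with a small, self-contained sketch. This is not the ArrowProvider implementation: ToyProvider, its private attributes, the written list, and the cache limit of two batches are all hypothetical, and records are kept in plain Python lists rather than Arrow batches. It only shows how a returned task lets the caller flush before awaiting, and how passing the already held lock avoids a deadlock on a non-reentrant asyncio.Lock.

```python
import asyncio
from typing import Any, Dict, List, Optional


class ToyProvider:
    """Hypothetical stand-in used for illustration; not OpenWPM's ArrowProvider."""

    def __init__(self) -> None:
        self.storing_lock = asyncio.Lock()
        self._records: Dict[int, List[Dict[str, Any]]] = {}
        self._batches: List[List[Dict[str, Any]]] = []
        self._events: Dict[int, asyncio.Event] = {}
        self.written: List[Dict[str, Any]] = []  # stands in for durable storage

    async def store_record(self, visit_id: int, record: Dict[str, Any]) -> None:
        # Records are only buffered here; nothing is written yet.
        self._records.setdefault(visit_id, []).append(record)

    async def finalize_visit_id(self, visit_id: int) -> "asyncio.Task[None]":
        async with self.storing_lock:
            # Turn the buffered records of this visit into one "batch".
            self._batches.append(self._records.pop(visit_id, []))
            event = self._events.setdefault(visit_id, asyncio.Event())
            if len(self._batches) >= 2:  # assumed cache limit
                # asyncio.Lock is not reentrant, so hand over the held lock
                # instead of letting flush_cache try to acquire it again.
                await self.flush_cache(self.storing_lock)
        # The returned task finishes only once the batch has been flushed.
        return asyncio.create_task(self._wait_on(event))

    @staticmethod
    async def _wait_on(event: asyncio.Event) -> None:
        await event.wait()

    async def flush_cache(self, lock: Optional[asyncio.Lock] = None) -> None:
        if lock is None:
            # Nobody holds the lock for us, so acquire it ourselves.
            async with self.storing_lock:
                self._flush_locked()
        else:
            # The caller must pass the storing_lock, and it must be held.
            assert lock is self.storing_lock and lock.locked()
            self._flush_locked()

    def _flush_locked(self) -> None:
        for batch in self._batches:
            self.written.extend(batch)
        self._batches.clear()
        # Wake up every token that was waiting for this flush.
        for event in self._events.values():
            event.set()
        self._events.clear()


async def demo() -> List[Dict[str, Any]]:
    provider = ToyProvider()
    await provider.store_record(1, {"url": "https://example.com"})
    token = await provider.finalize_visit_id(1)
    await asyncio.sleep(0)  # give the token task a chance to start
    assert not token.done()  # nothing has been flushed yet
    await provider.flush_cache()
    await token  # returns because the flush set the event
    return provider.written


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because finalize_visit_id hands back a task instead of waiting internally, the caller decides when the flush happens and only afterwards awaits the token. The optional lock argument lets the same flush routine be called both from outside and from code that already holds storing_lock.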