openwpm.storage.arrow_storage module

class openwpm.storage.arrow_storage.ArrowProvider[source]

Bases: StructuredStorageProvider

This class implements a StructuredStorage provider that serializes records into the arrow format

async finalize_visit_id(visit_id: VisitId, interrupted: bool = False) Task[None][source]

This method is the reason the finalize_visit_id interface returns a task. This was necessary as we needed to enable the following pattern.

token = await structured_storage.finalize_visit_id(1)
structured_storage.flush_cache()
await token

If there was no task returned and the method would just block/yield after turning the record into a batch, there would be no way to know, when it’s safe to flush_cache as I couldn’t find a way to run a coroutine until it yields and then run a different one.

With the current setup token aka a wait_on_condition coroutine will only return once it’s respective event has been set.

async flush_cache(lock: Lock | None = None) None[source]

We need to hack around the fact that asyncio has no reentrant lock So we either grab the storing_lock ourselves or the caller needs to pass us the locked storing_lock

async init() None[source]

Initializes the StorageProvider for use

Guaranteed to be called in the process the StorageController runs in.

async shutdown() None[source]

Close all open resources After this method has been called no further calls should be made to the object

async store_record(table: TableName, visit_id: VisitId, record: Dict[str, Any]) None[source]

Submit a record to be stored The storing might not happen immediately

storing_lock: Lock
abstract async write_table(table_name: TableName, table: Table) None[source]

Write out the table to persistent storage

This should only return once it’s actually saved out