openwpm.storage.in_memory_storage module¶
This module contains storage providers that keep their results in memory. Because the tests share no resources, these classes make it easier to run tests in parallel. They also make it easier to verify results, since no round trip through a persistent storage provider is needed.
- class openwpm.storage.in_memory_storage.MemoryArrowProvider[source]¶
Bases: ArrowProvider
- class openwpm.storage.in_memory_storage.MemoryProviderHandle(queue: Queue)[source]¶
Bases: object
Call poll_queue to load all available data into the dict at self.storage.
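The polling pattern described above can be sketched as follows. This is a minimal illustration, not the OpenWPM implementation: the class name `SketchProviderHandle`, the use of a plain `queue.Queue`, and the `(table_name, records)` payload shape are all assumptions made for the example.

```python
import queue
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List


class SketchProviderHandle:
    """Hypothetical stand-in for a handle that drains a queue into a dict."""

    def __init__(self, q: "queue.Queue") -> None:
        self.queue = q
        # Maps table name -> list of records received so far.
        self.storage: DefaultDict[str, List[Dict[str, Any]]] = defaultdict(list)

    def poll_queue(self) -> None:
        """Move every item currently available on the queue into self.storage."""
        while True:
            try:
                # Assumed payload shape: (table_name, list_of_records).
                table, records = self.queue.get_nowait()
            except queue.Empty:
                break
            self.storage[table].extend(records)


q: "queue.Queue" = queue.Queue()
q.put(("http_requests", [{"visit_id": 1, "url": "https://example.com"}]))
q.put(("http_requests", [{"visit_id": 2, "url": "https://example.org"}]))

handle = SketchProviderHandle(q)
handle.poll_queue()
request_count = len(handle.storage["http_requests"])
```

Draining with `get_nowait` until `queue.Empty` means the call never blocks: it picks up whatever has arrived so far and returns, which suits a test that polls after the crawl has finished.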
- class openwpm.storage.in_memory_storage.MemoryStructuredProvider[source]¶
Bases: StructuredStorageProvider
This storage provider passes all its data to the MemoryProviderHandle in a process-safe way.
This makes it ideal for testing.
It also aims to save out data as late as possible, to ensure that storage_controller relies only on the guarantees given in the interface.
- cache1: DefaultDict[VisitId, DefaultDict[TableName, List[Dict[str, Any]]]]¶
The cache for entries before they are finalized.
- cache2: DefaultDict[TableName, List[Dict[str, Any]]]¶
The cache for entries that have been finalized but not yet flushed out to the queue.
- async finalize_visit_id(visit_id: VisitId, interrupted: bool = False) Task[None] [source]¶
This method is invoked to inform the StructuredStorageProvider that no more records for this visit_id will be submitted.
This method returns once the data is ready to be written out. If the data is written out immediately, nothing is returned. Otherwise, an awaitable is returned that resolves once the records have been saved out to persistent storage.
- async init() None [source]¶
Initializes the StorageProvider for use.
Guaranteed to be called in the process the StorageController runs in.
- lock: Lock¶
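The two-stage caching described for cache1 and cache2 can be sketched as below. This is a simplified illustration under stated assumptions, not the provider's actual code: the class `TwoStageCache`, the methods `store_record` and `flush_cache`, the `flushed` list standing in for the queue, and the choice of `asyncio.Lock` for the lock are all hypothetical.

```python
import asyncio
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List, Tuple

Record = Dict[str, Any]


class TwoStageCache:
    """Hypothetical sketch of a per-visit cache feeding a per-table cache."""

    def __init__(self) -> None:
        self.lock = asyncio.Lock()  # assumption: an asyncio lock guards both caches
        # Stage 1: records grouped by visit id, then by table, before finalization.
        self.cache1: DefaultDict[int, DefaultDict[str, List[Record]]] = defaultdict(
            lambda: defaultdict(list)
        )
        # Stage 2: finalized records grouped by table, waiting to be flushed.
        self.cache2: DefaultDict[str, List[Record]] = defaultdict(list)
        # Stand-in for the queue that a real provider would write to.
        self.flushed: List[Tuple[str, List[Record]]] = []

    async def store_record(self, table: str, visit_id: int, record: Record) -> None:
        self.cache1[visit_id][table].append(record)

    async def finalize_visit_id(self, visit_id: int, interrupted: bool = False) -> None:
        # Move everything belonging to this visit from stage 1 to stage 2.
        async with self.lock:
            for table, records in self.cache1.pop(visit_id, {}).items():
                self.cache2[table].extend(records)

    async def flush_cache(self) -> None:
        # Hand finalized records over as late as possible, then empty stage 2.
        async with self.lock:
            for table, records in self.cache2.items():
                self.flushed.append((table, list(records)))
            self.cache2.clear()


async def demo():
    cache = TwoStageCache()
    await cache.store_record("http_requests", 1, {"url": "https://example.com"})
    before = dict(cache.cache2)  # nothing is finalized yet
    await cache.finalize_visit_id(1)
    staged = {table: list(records) for table, records in cache.cache2.items()}
    await cache.flush_cache()
    return before, staged, cache.flushed, dict(cache.cache2)


before, staged, flushed, after = asyncio.run(demo())
```

Keeping unfinalized and finalized records in separate caches means a consumer sees data only after both finalization and an explicit flush, so code under test cannot accidentally depend on records appearing earlier than the interface promises.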
- class openwpm.storage.in_memory_storage.MemoryUnstructuredProvider[source]¶
Bases: UnstructuredStorageProvider
This storage provider stores all data in memory under self.storage as a dict from filename to content. Use this provider for writing tests and for small crawls where no persistence is required.
- async init() None [source]¶
Initializes the StorageProvider for use.
Guaranteed to be called in the process the StorageController runs in.
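The filename-to-content mapping described for this provider can be illustrated with a small sketch. The class `DictBlobStore` and its `store_blob` method are hypothetical names chosen for the example and are not taken from the documented interface; only the idea of a dict at `self.storage` keyed by filename comes from the description above.

```python
import asyncio
from typing import Dict


class DictBlobStore:
    """Hypothetical sketch of an in-memory store keyed by filename."""

    def __init__(self) -> None:
        # Maps filename -> raw content; nothing is written to disk.
        self.storage: Dict[str, bytes] = {}

    async def store_blob(self, filename: str, blob: bytes) -> None:
        # A later write under the same filename overwrites the earlier one.
        self.storage[filename] = blob


async def demo() -> Dict[str, bytes]:
    store = DictBlobStore()
    await store.store_blob("page_source.html", b"<html></html>")
    return store.storage


stored = asyncio.run(demo())
```

Because the content lives in an ordinary dict, a test can assert on it directly after the crawl, without opening any files.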