openwpm.storage.cloud_storage.gcp_storage module

class openwpm.storage.cloud_storage.gcp_storage.GcsStructuredProvider(project: str, bucket_name: str, base_path: str, token: str | None = None, sub_dir: str = 'visits')

Bases: ArrowProvider

This class allows you to upload Parquet files to GCS. This might not be what we want in the long term, but since GCS is GCP's equivalent of S3, it is the easiest way forward.

Inspired by the old S3Aggregator structure, the GcsStructuredProvider stores tables under base_path/visits/table_name in the given bucket by default.

Pass a different sub_dir to change this.
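The layout described above can be sketched as a small path-building function. This is only an illustration under stated assumptions: structured_table_prefix is a hypothetical helper, not part of OpenWPM, and the bucket, base path, and table names are made up.

```python
def structured_table_prefix(
    bucket_name: str, base_path: str, table_name: str, sub_dir: str = "visits"
) -> str:
    """Hypothetical helper: build the object prefix for one table.

    Mirrors the documented layout base_path/<sub_dir>/table_name inside the bucket.
    """
    return f"{bucket_name}/{base_path}/{sub_dir}/{table_name}"


# Default layout: tables land under "visits".
default_prefix = structured_table_prefix("my-bucket", "crawl-2024", "http_requests")

# A custom sub_dir replaces the "visits" path segment.
custom_prefix = structured_table_prefix(
    "my-bucket", "crawl-2024", "http_requests", sub_dir="test_visits"
)

print(default_prefix)
print(custom_prefix)
```

Only the sub_dir segment changes between the two calls; the bucket and base_path segments stay the same.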

file_system: GCSFileSystem

async init() → None

Initializes the StorageProvider for use.

Guaranteed to be called in the process the StorageController runs in.

async shutdown() → None

Close all open resources. After this method has been called, no further calls should be made to the object.

async write_table(table_name: TableName, table: Table) → None

Write out the table to persistent storage.

This should only return once the table has actually been saved.
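The call order implied by init, write_table, and shutdown can be sketched with a stand-in object. RecordingProvider is hypothetical and only records which coroutine was awaited; it does not talk to GCS, and in OpenWPM itself these calls are expected to come from the StorageController rather than from user code.

```python
import asyncio
from typing import Any, List


class RecordingProvider:
    """Hypothetical stand-in exposing the same three coroutine names."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    async def init(self) -> None:
        self.calls.append("init")

    async def write_table(self, table_name: str, table: Any) -> None:
        # A real provider would return only after the data is persisted.
        self.calls.append(f"write_table:{table_name}")

    async def shutdown(self) -> None:
        self.calls.append("shutdown")


async def run_once(provider: RecordingProvider) -> None:
    await provider.init()
    try:
        await provider.write_table("site_visits", table=None)
    finally:
        # Always release resources; the provider must not be used afterwards.
        await provider.shutdown()


provider = RecordingProvider()
asyncio.run(run_once(provider))
print(provider.calls)
```

Wrapping the writes in try/finally is one way to make sure shutdown still runs when a write raises.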

class openwpm.storage.cloud_storage.gcp_storage.GcsUnstructuredProvider(project: str, bucket_name: str, base_path: str, token: str | None = None)

Bases: UnstructuredStorageProvider

This class allows you to upload arbitrary bytes to GCS. They will be stored under bucket_name/base_path/filename.

file_name_cache: Set[str]

The set of all filenames ever uploaded, checked before each upload.

file_system: GCSFileSystem

async flush_cache() → None

Write out any cached data to the respective storage, blocking until the write completes.

async init() → None

Initializes the StorageProvider for use.

Guaranteed to be called in the process the StorageController runs in.

async shutdown() → None

Close all open resources. After this method has been called, no further calls should be made to the object.

async store_blob(filename: str, blob: bytes, overwrite: bool = False) → None

Stores the given bytes under the provided filename.
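How file_name_cache and the overwrite flag might interact can be sketched with an in-memory stand-in. InMemoryBlobStore is hypothetical: a dict replaces GCS, the names are made up, and the real provider may perform additional checks (for example against the bucket itself) that this sketch leaves out.

```python
import asyncio
from typing import Dict, Set, Tuple


class InMemoryBlobStore:
    """Hypothetical stand-in for the unstructured provider; a dict replaces GCS."""

    def __init__(self, bucket_name: str, base_path: str) -> None:
        self.prefix = f"{bucket_name}/{base_path}"
        self.file_name_cache: Set[str] = set()
        self.objects: Dict[str, bytes] = {}

    async def store_blob(
        self, filename: str, blob: bytes, overwrite: bool = False
    ) -> None:
        # Skip the upload when the name was already stored, unless overwrite is set.
        if not overwrite and filename in self.file_name_cache:
            return
        self.objects[f"{self.prefix}/{filename}"] = blob
        self.file_name_cache.add(filename)


async def demo() -> Tuple[bytes, bytes]:
    store = InMemoryBlobStore("my-bucket", "crawl-2024")
    key = "my-bucket/crawl-2024/page.html"
    await store.store_blob("page.html", b"first")
    await store.store_blob("page.html", b"second")  # skipped: name is cached
    after_duplicate = store.objects[key]
    await store.store_blob("page.html", b"third", overwrite=True)
    after_overwrite = store.objects[key]
    return after_duplicate, after_overwrite


after_duplicate, after_overwrite = asyncio.run(demo())
print(after_duplicate, after_overwrite)
```

In this sketch a repeated filename is silently ignored, so callers that need to replace content must pass overwrite=True.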