openwpm.storage.cloud_storage.gcp_storage module
- class openwpm.storage.cloud_storage.gcp_storage.GcsStructuredProvider(project: str, bucket_name: str, base_path: str, token: str | None = None, sub_dir: str = 'visits')
Bases: ArrowProvider
This class allows you to upload Parquet files to GCS. This may not be what we want in the long term, but since GCS is GCP's equivalent of S3, it is the easiest way forward.
Inspired by the old S3Aggregator structure, the GcsStructuredProvider stores data under base_path/visits/table_name in the given bucket by default.
Pass a different sub_dir to replace the visits component of that path.
- file_system: GCSFileSystem
- async init() -> None
Initializes the StorageProvider for use.
Guaranteed to be called in the process the StorageController runs in.
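The default layout described for GcsStructuredProvider can be sketched as a small path-building function. This is only an illustration of the base_path/sub_dir/table_name convention stated above; the helper name structured_table_path and the example values are hypothetical and are not part of OpenWPM.

```python
def structured_table_path(base_path: str, table_name: str, sub_dir: str = "visits") -> str:
    """Hypothetical helper: build the bucket-relative prefix for one table.

    Mirrors the documented layout base_path/sub_dir/table_name, where
    sub_dir defaults to 'visits'. Not the provider's actual code.
    """
    # Drop stray slashes so the joined path has exactly one separator per level.
    parts = [base_path.strip("/"), sub_dir.strip("/"), table_name.strip("/")]
    return "/".join(parts)


# Default layout: tables land under the 'visits' sub-directory.
print(structured_table_path("crawl-2024", "http_requests"))
# Overriding sub_dir changes only the middle component.
print(structured_table_path("crawl-2024", "http_requests", sub_dir="test_visits"))
```

With the example values, the first call yields crawl-2024/visits/http_requests and the second yields crawl-2024/test_visits/http_requests.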
- class openwpm.storage.cloud_storage.gcp_storage.GcsUnstructuredProvider(project: str, bucket_name: str, base_path: str, token: str | None = None)
Bases: UnstructuredStorageProvider
This class allows you to upload arbitrary bytes to GCS. They are stored under bucket_name/base_path/filename.
- file_name_cache: Set[str]
The set of all filenames ever uploaded. It is checked before uploading.
- file_system: GCSFileSystem
- async init() -> None
Initializes the StorageProvider for use.
Guaranteed to be called in the process the StorageController runs in.
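The role of file_name_cache in GcsUnstructuredProvider can be illustrated with an in-memory stand-in. This is a minimal sketch, assuming the cache is used to skip files that were already uploaded; the class InMemoryUnstructuredStore, its upload method, and the dict that replaces the bucket are all hypothetical and do not reproduce the provider's real implementation.

```python
from typing import Dict, Set


class InMemoryUnstructuredStore:
    """Hypothetical stand-in: a dict plays the role of the GCS bucket."""

    def __init__(self, bucket_name: str, base_path: str) -> None:
        self.bucket_name = bucket_name
        self.base_path = base_path
        # Every filename uploaded so far; consulted before each upload.
        self.file_name_cache: Set[str] = set()
        # Maps the full target path to the stored bytes.
        self.blobs: Dict[str, bytes] = {}

    def upload(self, filename: str, blob: bytes) -> bool:
        """Store blob under bucket_name/base_path/filename.

        Returns False without writing if the filename is already cached
        (an assumption about how the cache is used).
        """
        if filename in self.file_name_cache:
            return False
        target = f"{self.bucket_name}/{self.base_path}/{filename}"
        self.blobs[target] = blob
        self.file_name_cache.add(filename)
        return True


store = InMemoryUnstructuredStore("my-bucket", "crawl-2024")
store.upload("page.html", b"<html></html>")  # stored
store.upload("page.html", b"other bytes")    # skipped, name already cached
print(sorted(store.blobs))
```

Keeping the set in memory avoids a round trip to the bucket for every duplicate, at the cost of not knowing about files uploaded by other processes.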