openwpm.storage.cloud_storage.s3_storage module

class openwpm.storage.cloud_storage.s3_storage.S3StructuredProvider(bucket_name: str, base_path: str, sub_dir: str = 'visits', **kwargs: Any)

Bases: ArrowProvider

This class allows you to upload Parquet files to S3.

S3StructuredProvider will by default store into base_path/visits/table_name in the given bucket. Pass a different sub_dir to change this.

**kwargs are passed on to S3FileSystem.__init__. See https://s3fs.readthedocs.io/en/latest/api.html#s3fs.core.S3FileSystem for further information.
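
The storage layout described above can be sketched with a small helper. This is an illustration only: `structured_prefix` is a hypothetical function, not part of OpenWPM, and the slash stripping is an assumption added so the example behaves predictably. It only shows how `bucket_name`, `base_path`, `sub_dir` and a table name would combine into one prefix.

```python
def structured_prefix(
    bucket_name: str, base_path: str, table_name: str, sub_dir: str = "visits"
) -> str:
    """Hypothetical helper: build bucket_name/base_path/sub_dir/table_name."""
    parts = [bucket_name, base_path, sub_dir, table_name]
    # Assumption for this sketch: strip stray slashes so that
    # "crawl-1/" and "crawl-1" produce the same prefix.
    return "/".join(part.strip("/") for part in parts)


# Default sub_dir
print(structured_prefix("my-bucket", "crawl-1", "http_requests"))
# Custom sub_dir
print(structured_prefix("my-bucket", "crawl-1", "http_requests", sub_dir="raw"))
```

With the default `sub_dir`, the first call yields `my-bucket/crawl-1/visits/http_requests`; passing `sub_dir="raw"` replaces only the `visits` component.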

file_system: S3FileSystem

async init() → None

Initializes the StorageProvider for use

Guaranteed to be called in the process the StorageController runs in.

async write_table(table_name: TableName, table: Table) → None

Write out the table to persistent storage

This method should only return once the table has actually been saved.

class openwpm.storage.cloud_storage.s3_storage.S3UnstructuredProvider(bucket_name: str, base_path: str, **kwargs: Any)

Bases: UnstructuredStorageProvider

This class allows you to upload arbitrary bytes to S3. They will be stored under bucket_name/base_path/filename

**kwargs are passed on to S3FileSystem.__init__. See https://s3fs.readthedocs.io/en/latest/api.html#s3fs.core.S3FileSystem for further information.

file_name_cache: Set[str]

The set of all filenames ever uploaded, checked before uploading

file_system: S3FileSystem

async flush_cache() → None

Write out any cached data to the respective storage, blocking until the write has completed

async init() → None

Initializes the StorageProvider for use

Guaranteed to be called in the process the StorageController runs in.

async shutdown() → None

Close all open resources. After this method has been called, no further calls should be made to the object.

async store_blob(filename: str, blob: bytes, overwrite: bool = False) → None

Stores the given bytes under the provided filename
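
To make the interplay of `file_name_cache` and `overwrite` concrete, here is a minimal in-memory sketch. `InMemoryBlobStore` is a hypothetical stand-in written for this example, not the real S3UnstructuredProvider: it replaces S3 with a dictionary, and the exact skip rule (skip when the filename is already cached and `overwrite` is false) is an assumption based on the attribute description above.

```python
import asyncio
from typing import Dict, Set


class InMemoryBlobStore:
    """Hypothetical stand-in for illustration; it never talks to S3."""

    def __init__(self, bucket_name: str, base_path: str) -> None:
        self.prefix = f"{bucket_name}/{base_path}"
        # Mirrors the documented attribute: filenames uploaded so far.
        self.file_name_cache: Set[str] = set()
        # A plain dict plays the role of the bucket in this sketch.
        self.objects: Dict[str, bytes] = {}

    async def store_blob(
        self, filename: str, blob: bytes, overwrite: bool = False
    ) -> None:
        # Assumed rule: skip the upload if this filename was already
        # stored and the caller did not ask to overwrite it.
        if not overwrite and filename in self.file_name_cache:
            return
        # Keys follow the documented bucket_name/base_path/filename layout.
        self.objects[f"{self.prefix}/{filename}"] = blob
        self.file_name_cache.add(filename)


async def demo() -> Dict[str, bytes]:
    store = InMemoryBlobStore("my-bucket", "crawl-1")
    await store.store_blob("page.html", b"first")
    await store.store_blob("page.html", b"second")  # skipped: already cached
    await store.store_blob("page.html", b"third", overwrite=True)  # replaces
    return store.objects


objects = asyncio.run(demo())
print(objects)
```

In this sketch the second call is a no-op and the third replaces the stored bytes, so the dictionary ends with a single key, `my-bucket/crawl-1/page.html`, holding `b"third"`.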