openwpm.commands.browser_commands module

class openwpm.commands.browser_commands.BrowseCommand(url, num_links, sleep)[source]

Bases: BaseCommand

execute(webdriver, browser_params, manager_params, extension_socket)[source]

Calls get_website, then visits <num_links> links present on the page.

Note: the site_url in the site_visits table for the links visited will be the site_url of the original page and NOT the url of the links visited.
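The note above can be illustrated with a toy model. This is not OpenWPM code: `browse_records`, its arguments, and the record layout are hypothetical stand-ins, and link selection is simplified to "the first N links". The second tuple element exists only to make the point visible; the sketch merely mimics how each followed link ends up attributed to the original page's site_url.

```python
from typing import List, Tuple


def browse_records(url: str, links: List[str], num_links: int) -> List[Tuple[str, str]]:
    """Toy model: return (site_url, visited_url) pairs for one browse."""
    # The landing page itself is recorded under its own URL.
    records = [(url, url)]
    # Each followed link is still attributed to the original page.
    for link in links[:num_links]:
        records.append((url, link))
    return records


records = browse_records(
    "https://example.com",
    ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
    num_links=2,
)
for site_url, visited in records:
    print(site_url, "<-", visited)
```

When analysing crawl data, this means the followed links cannot be told apart by site_url alone.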

class openwpm.commands.browser_commands.DumpPageSourceCommand(suffix)[source]

Bases: BaseCommand

execute(webdriver, browser_params, manager_params, extension_socket)[source]

This method gets called in the Browser process.

Parameters:
  • webdriver – WebDriver is a Selenium class used to control the browser. You can simulate arbitrary interactions and extract almost all browser state with the tools that Selenium gives you.

  • browser_params – Contains the per-browser configuration, e.g. which instruments are enabled.

  • manager_params – Per-crawl parameters, e.g. where to store files.

  • extension_socket

    Communication channel to the storage provider. This allows you to send data to be persisted to storage.

    TODO: Further document this once the StorageProvider PR has landed.
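As a rough illustration of the execute signature documented above, here is a minimal sketch of a custom command. `PrintTitleCommand` is a hypothetical name, and the class deliberately does not import OpenWPM so that the example runs on its own; in a real crawl it would presumably subclass BaseCommand. The `fake_driver` object is a stand-in for a Selenium WebDriver, of which only the `title` and `current_url` properties are used.

```python
from types import SimpleNamespace


class PrintTitleCommand:
    """Hypothetical command; in OpenWPM it would subclass BaseCommand."""

    def __repr__(self) -> str:
        return "PrintTitleCommand()"

    def execute(self, webdriver, browser_params, manager_params, extension_socket):
        # `webdriver` exposes the Selenium WebDriver API; `title` and
        # `current_url` are standard Selenium properties.
        print(f"{webdriver.current_url}: {webdriver.title}")


# Stand-in for a Selenium WebDriver so the sketch runs without a browser.
fake_driver = SimpleNamespace(
    title="Example Domain",
    current_url="https://example.com/",
)

command = PrintTitleCommand()
command.execute(
    fake_driver,
    browser_params=None,
    manager_params=None,
    extension_socket=None,
)
```

The unused parameters are passed as None here only because the sketch never touches them.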

class openwpm.commands.browser_commands.FinalizeCommand(sleep)[source]

Bases: BaseCommand

This command is automatically appended to the end of a CommandSequence.

Its appearance means there won’t be any more commands for this visit_id.

execute(webdriver, browser_params, manager_params, extension_socket)[source]

Informs the extension that a visit is done
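The resulting command order for a single visit can be pictured with a toy model. This is a sketch only: `bracket_commands` is a hypothetical helper, and plain strings stand in for command objects. It assumes, as this page states, that InitializeCommand is prepended and FinalizeCommand is appended automatically.

```python
from typing import List


def bracket_commands(user_commands: List[str]) -> List[str]:
    """Toy model of the command order for a single visit."""
    # Strings stand in for command objects here.
    return ["InitializeCommand", *user_commands, "FinalizeCommand"]


order = bracket_commands(["GetCommand", "SaveScreenshotCommand"])
print(order)
```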

class openwpm.commands.browser_commands.GetCommand(url, sleep)[source]

Bases: BaseCommand

Goes to <url> using the given <webdriver> instance.

execute(webdriver: WebDriver, browser_params: BrowserParams, manager_params: ManagerParams, extension_socket: ClientSocket) → None[source]

This method gets called in the Browser process.

Parameters:
  • webdriver – WebDriver is a Selenium class used to control the browser. You can simulate arbitrary interactions and extract almost all browser state with the tools that Selenium gives you.

  • browser_params – Contains the per-browser configuration, e.g. which instruments are enabled.

  • manager_params – Per-crawl parameters, e.g. where to store files.

  • extension_socket

    Communication channel to the storage provider. This allows you to send data to be persisted to storage.

    TODO: Further document this once the StorageProvider PR has landed.

class openwpm.commands.browser_commands.InitializeCommand[source]

Bases: BaseCommand

The command is automatically prepended to the beginning of a CommandSequence.

It initializes state both in the extension and in the StorageController.

execute(webdriver, browser_params, manager_params, extension_socket)[source]

This method gets called in the Browser process.

Parameters:
  • webdriver – WebDriver is a Selenium class used to control the browser. You can simulate arbitrary interactions and extract almost all browser state with the tools that Selenium gives you.

  • browser_params – Contains the per-browser configuration, e.g. which instruments are enabled.

  • manager_params – Per-crawl parameters, e.g. where to store files.

  • extension_socket

    Communication channel to the storage provider. This allows you to send data to be persisted to storage.

    TODO: Further document this once the StorageProvider PR has landed.

class openwpm.commands.browser_commands.RecursiveDumpPageSourceCommand(suffix)[source]

Bases: BaseCommand

execute(webdriver, browser_params, manager_params, extension_socket)[source]

Dumps a compressed HTML tree for the current page visit.
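To give a feel for what a compressed HTML tree might look like, the sketch below stores a nested page/iframe structure as gzip-compressed JSON using only the standard library. The key names (`doc_url`, `source`, `iframes`), the file name, and the choice of gzip plus JSON are assumptions made for illustration; this page does not specify the actual output format.

```python
import gzip
import json
import os
import tempfile

# Hypothetical tree: a top-level document with one nested iframe.
page_tree = {
    "doc_url": "https://example.com/",
    "source": "<html><body><iframe src='/ad'></iframe></body></html>",
    "iframes": {
        "0": {
            "doc_url": "https://example.com/ad",
            "source": "<html><body>ad</body></html>",
            "iframes": {},
        }
    },
}

out_path = os.path.join(tempfile.mkdtemp(), "page_source.json.gz")

# Write the nested tree as gzip-compressed JSON.
with gzip.open(out_path, "wt", encoding="utf-8") as f:
    json.dump(page_tree, f)

# Read it back to confirm the round trip.
with gzip.open(out_path, "rt", encoding="utf-8") as f:
    restored = json.load(f)

print(restored["iframes"]["0"]["doc_url"])
```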

class openwpm.commands.browser_commands.SaveScreenshotCommand(suffix)[source]

Bases: BaseCommand

execute(webdriver, browser_params, manager_params, extension_socket)[source]

This method gets called in the Browser process.

Parameters:
  • webdriver – WebDriver is a Selenium class used to control the browser. You can simulate arbitrary interactions and extract almost all browser state with the tools that Selenium gives you.

  • browser_params – Contains the per-browser configuration, e.g. which instruments are enabled.

  • manager_params – Per-crawl parameters, e.g. where to store files.

  • extension_socket

    Communication channel to the storage provider. This allows you to send data to be persisted to storage.

    TODO: Further document this once the StorageProvider PR has landed.

class openwpm.commands.browser_commands.ScreenshotFullPageCommand(suffix)[source]

Bases: BaseCommand

execute(webdriver, browser_params, manager_params, extension_socket)[source]

This method gets called in the Browser process.

Parameters:
  • webdriver – WebDriver is a Selenium class used to control the browser. You can simulate arbitrary interactions and extract almost all browser state with the tools that Selenium gives you.

  • browser_params – Contains the per-browser configuration, e.g. which instruments are enabled.

  • manager_params – Per-crawl parameters, e.g. where to store files.

  • extension_socket

    Communication channel to the storage provider. This allows you to send data to be persisted to storage.

    TODO: Further document this once the StorageProvider PR has landed.

openwpm.commands.browser_commands.bot_mitigation(webdriver)[source]

Performs three optional commands for bot-detection mitigation when getting a site

openwpm.commands.browser_commands.close_other_windows(webdriver)[source]

Closes all open pop-up windows and tabs other than the current one.
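One way such a helper could be written against the Selenium WebDriver API is sketched below. This is not the OpenWPM implementation: `close_other_windows_sketch`, `FakeDriver`, and `FakeSwitchTo` are hypothetical names, and the fake driver only mimics the Selenium members the sketch relies on (`window_handles`, `current_window_handle`, `switch_to.window()`, and `close()`) so that the example runs without a browser.

```python
def close_other_windows_sketch(webdriver) -> None:
    """Close every window or tab except the one currently focused."""
    main_handle = webdriver.current_window_handle
    # Iterate over a copy, since closing windows changes the handle list.
    for handle in list(webdriver.window_handles):
        if handle != main_handle:
            webdriver.switch_to.window(handle)
            webdriver.close()
    # Return focus to the original window.
    webdriver.switch_to.window(main_handle)


class FakeSwitchTo:
    """Mimics the `switch_to` helper of a Selenium WebDriver."""

    def __init__(self, driver):
        self._driver = driver

    def window(self, handle):
        self._driver.current_window_handle = handle


class FakeDriver:
    """Minimal stand-in that mimics the Selenium attributes used above."""

    def __init__(self, handles):
        self.window_handles = list(handles)
        self.current_window_handle = self.window_handles[0]
        self.switch_to = FakeSwitchTo(self)

    def close(self):
        # Closing removes the currently focused window.
        self.window_handles.remove(self.current_window_handle)


driver = FakeDriver(["main", "popup-1", "popup-2"])
close_other_windows_sketch(driver)
print(driver.window_handles)
```

Switching back to the original handle at the end matters because Selenium does not restore focus automatically after a window is closed.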

openwpm.commands.browser_commands.tab_restart_browser(webdriver)[source]

Kills the current tab and creates a new one to stop traffic.