openwpm.command_sequence module

class openwpm.command_sequence.CommandSequence(url: str, reset: bool = False, blocking: bool = False, retry_number: int | None = None, site_rank: int | None = None, callback: Callable[[bool], None] | None = None)

Bases: object

A CommandSequence wraps a series of commands to be performed on a visit to one top-level site into one logical “site visit,” keyed by a visit id. An example of a CommandSequence that visits a page and saves a screenshot of it would be:

sequence = CommandSequence(url)
sequence.get()
sequence.save_screenshot()
task_manager.execute_command_sequence(sequence)

CommandSequence guarantees that a series of commands will be performed by a single browser instance.

append_command(command: BaseCommand, timeout: int = 30) -> None

browse(num_links=2, sleep=0, timeout=60)

Browses a website and visits num_links links on the page.

dump_page_source(suffix='', timeout=30)

Dumps rendered source of current page to ‘sources’ directory.

dump_profile(tar_path: Path, close_webdriver: bool = False, compress: bool = True, timeout: int = 120) -> None

Dumps the contents of the profile path to the given file (tar_path must be an absolute path).
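
As a rough illustration of what the compress flag implies for the output file, the sketch below packs a directory into a tar archive with the standard library and gzips it when compress is true. The helper names (archive_directory, list_archive, demo) and the sample directory layout are hypothetical; this is not OpenWPM's implementation.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import List


def archive_directory(src_dir: Path, tar_path: Path, compress: bool = True) -> None:
    """Hypothetical helper: pack src_dir into tar_path, gzipped if compress."""
    mode = "w:gz" if compress else "w"
    with tarfile.open(tar_path, mode) as tar:
        # Store entries under the directory's own name, not its absolute path.
        tar.add(src_dir, arcname=src_dir.name)


def list_archive(tar_path: Path) -> List[str]:
    """Return sorted member names; 'r:*' auto-detects the compression."""
    with tarfile.open(tar_path, "r:*") as tar:
        return sorted(tar.getnames())


def demo() -> List[str]:
    """Archive a throwaway directory and report what ended up in the tar."""
    with tempfile.TemporaryDirectory() as tmp:
        profile = Path(tmp) / "profile"
        profile.mkdir()
        (profile / "prefs.js").write_text("// placeholder\n")
        tar_path = Path(tmp) / "profile.tar.gz"
        archive_directory(profile, tar_path, compress=True)
        return list_archive(tar_path)
```

Calling demo() returns the names stored in the archive, which is a quick way to check that a dump contains the files you expect.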

get(sleep=0, timeout=60)

Navigates to the sequence's url.

get_commands_with_timeout() -> List[Tuple[BaseCommand, int]]

Returns a list of all commands in the command sequence, each paired with its timeout, followed by a finalize command.
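
The queueing behaviour described above can be pictured with a small stand-in class. This is a sketch only: MiniSequence and its string command names are hypothetical placeholders rather than OpenWPM classes, and the timeout attached to the finalize step (5 here) is an assumption made for illustration.

```python
from typing import List, Tuple


class MiniSequence:
    """Hypothetical stand-in that mimics how commands and timeouts are queued."""

    def __init__(self, url: str) -> None:
        self.url = url
        self._commands: List[Tuple[str, int]] = []

    def append_command(self, command: str, timeout: int = 30) -> None:
        # Each command is stored together with its own timeout.
        self._commands.append((command, timeout))

    def get(self, timeout: int = 60) -> None:
        self.append_command(f"get {self.url}", timeout)

    def save_screenshot(self, timeout: int = 30) -> None:
        self.append_command("save_screenshot", timeout)

    def get_commands_with_timeout(self) -> List[Tuple[str, int]]:
        # Build a new list so the finalize step is not stored in the queue.
        return self._commands + [("finalize", 5)]


seq = MiniSequence("http://example.com")
seq.get()
seq.save_screenshot()
queued = seq.get_commands_with_timeout()
```

Here queued holds the two user-added commands in order, followed by the finalize entry, while the sequence's own queue still contains only the two commands that were appended.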

mark_done(success: bool) -> None

recursive_dump_page_source(suffix='', timeout=30)

Dumps rendered source of current page visit to ‘sources’ dir. Unlike dump_page_source, this includes iframe sources. Archive is stored in manager_params.source_dump_path and is keyed by the current visit_id and top-level url. The source dump is a gzipped json file with the following structure:

{
    "document_url": "http://example.com",
    "source": "<html> ... </html>",
    "iframes": {
        "frame_1": {"document_url": "...",
                    "source": "...",
                    "iframes": "{ ... }"},
        "frame_2": {"document_url": "...",
                    "source": "...",
                    "iframes": "{ ... }"},
        "frame_3": "{ ... }"
    }
}
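
Because the dump is gzipped JSON, it can be inspected with the standard library alone. The sketch below assumes each entry under "iframes" is a nested object with the same three keys as the top level; the helper names and the sample file name are hypothetical, and the demo writes its own tiny sample rather than reading a real OpenWPM dump.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, Iterator, List


def load_source_dump(path: Path) -> Dict[str, Any]:
    """Read a gzipped JSON file into a dictionary."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def iter_document_urls(frame: Dict[str, Any]) -> Iterator[str]:
    """Yield this frame's document_url, then those of nested iframes, depth-first."""
    yield frame["document_url"]
    iframes = frame.get("iframes", {})
    if isinstance(iframes, dict):
        for child in iframes.values():
            # Skip anything that is not a nested frame object.
            if isinstance(child, dict):
                yield from iter_document_urls(child)


def demo() -> List[str]:
    """Write a small sample dump, read it back, and list every frame URL."""
    sample = {
        "document_url": "http://example.com",
        "source": "<html></html>",
        "iframes": {
            "frame_1": {
                "document_url": "http://a.example/",
                "source": "",
                "iframes": {
                    "frame_1": {
                        "document_url": "http://b.example/",
                        "source": "",
                        "iframes": {},
                    }
                },
            },
            "frame_2": {
                "document_url": "http://c.example/",
                "source": "",
                "iframes": {},
            },
        },
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.json.gz"
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            json.dump(sample, fh)
        return list(iter_document_urls(load_source_dump(path)))
```

The recursive walk mirrors the nesting of the dump: a frame's own URL comes first, followed by everything embedded inside it.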

save_screenshot(suffix='', timeout=30)

Save a screenshot of the current viewport.

screenshot_full_page(suffix='', timeout=30)

Save a screenshot of the entire page.

NOTE: geckodriver v0.15 only supports viewport screenshots. To screenshot the entire page, we scroll the page using JavaScript and take a viewport screenshot at each location. This method saves both the individual parts and a stitched version in the screenshot_path. We only scroll vertically, so pages that are wider than the viewport will be clipped. See: https://github.com/mozilla/geckodriver/issues/570

The screenshot produced only includes the area that was loaded at the start of the command. Sites that expand dynamically as the page is scrolled (e.g. infinite scroll) are captured only up to their original height.
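
The scroll-and-stitch approach comes down to simple arithmetic on the page and viewport heights. The sketch below computes the vertical offsets at which viewport screenshots would be taken, using a page height measured once up front, which is why content that loads later is never reached. The function name and the choice to leave the last tile partially filled (rather than overlapping the previous one) are assumptions, not OpenWPM's actual logic.

```python
from typing import List


def scroll_offsets(page_height: int, viewport_height: int) -> List[int]:
    """Return the vertical scroll positions, in pixels, for each viewport tile.

    page_height is the height measured once before scrolling begins.
    """
    if viewport_height <= 0:
        raise ValueError("viewport_height must be positive")
    # max(..., 1) guarantees at least one tile even for an empty page.
    return list(range(0, max(page_height, 1), viewport_height))
```

For a page measured at 2500 pixels with a 1000-pixel viewport, scroll_offsets(2500, 1000) gives [0, 1000, 2000], so three viewport screenshots would be stitched together.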

NOTE: In geckodriver v0.15, doing any scrolling (or having devtools open) seems to break element-only screenshots, so using this command will cause any subsequent element-only screenshots to be misaligned.