openwpm.command_sequence module
- class openwpm.command_sequence.CommandSequence(url: str, reset: bool = False, blocking: bool = False, retry_number: int | None = None, site_rank: int | None = None, callback: Callable[[bool], None] | None = None)
Bases: object
A CommandSequence wraps a series of commands to be performed on a visit to one top-level site into one logical “site visit,” keyed by a visit id. An example of a CommandSequence that visits a page and saves a screenshot of it would be:
    sequence = CommandSequence(url)
    sequence.get()
    sequence.save_screenshot()
    task_manager.execute_command_sequence(sequence)
CommandSequence guarantees that a series of commands will be performed by a single browser instance.
- append_command(command: BaseCommand, timeout: int = 30) -> None
- browse(num_links=2, sleep=0, timeout=60)
Browses a website and visits <num_links> links on the page.
- dump_page_source(suffix='', timeout=30)
Dumps the rendered source of the current page to the ‘sources’ directory.
- dump_profile(tar_path: Path, close_webdriver: bool = False, compress: bool = True, timeout: int = 120) -> None
Dumps the contents of the browser profile path to the given file. tar_path must be an absolute path.
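As a rough illustration of what writing a profile directory to an absolute tar path involves, the sketch below uses only the Python standard library. It is not OpenWPM's implementation: the helper name `archive_profile`, the sample file `prefs.js`, and the temporary directories are all hypothetical, and `compress` is only assumed to mean gzip compression, mirroring the flag in the signature above.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_profile(profile_dir: Path, tar_path: Path, compress: bool = True) -> Path:
    """Hypothetical helper: pack the contents of profile_dir into tar_path."""
    if not tar_path.is_absolute():
        raise ValueError("tar_path must be an absolute path")
    tar_path.parent.mkdir(parents=True, exist_ok=True)
    # "w:gz" writes a gzip-compressed archive, "w" an uncompressed one.
    mode = "w:gz" if compress else "w"
    with tarfile.open(tar_path, mode) as tar:
        # Add each top-level entry under its own name so that archive
        # members are stored relative to the profile root.
        for child in sorted(profile_dir.iterdir()):
            tar.add(child, arcname=child.name)
    return tar_path


# Demo with a throwaway directory standing in for a browser profile.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    profile = root / "profile"
    profile.mkdir()
    (profile / "prefs.js").write_text("// sample preference file\n")
    archive = archive_profile(profile, root / "dumps" / "profile.tar.gz")
    with tarfile.open(archive, "r:gz") as tar:
        names = sorted(tar.getnames())
```

Storing members relative to the profile root means the archive can be unpacked into any target directory without recreating the original absolute path.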
- get_commands_with_timeout() -> List[Tuple[BaseCommand, int]]
Returns a list of all commands in the command sequence, each paired with its timeout, with a finalize command appended at the end.
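To make the shape of the return value concrete, here is a toy model of the behaviour described above. It is a sketch under stated assumptions rather than OpenWPM's code: `ToySequence` is a hypothetical stand-in for CommandSequence, plain strings stand in for BaseCommand instances, and the 5-second timeout on the finalize entry is an arbitrary value chosen for the example.

```python
from typing import List, Tuple


class ToySequence:
    """Hypothetical stand-in for CommandSequence; commands are plain strings."""

    def __init__(self) -> None:
        self._commands: List[Tuple[str, int]] = []

    def append_command(self, command: str, timeout: int = 30) -> None:
        # Each command is stored together with its own timeout in seconds.
        self._commands.append((command, timeout))

    def get_commands_with_timeout(self) -> List[Tuple[str, int]]:
        # Build a new list so that repeated calls do not accumulate
        # finalize entries in the stored sequence.
        # The finalize timeout (5) is an assumption made for this sketch.
        return self._commands + [("finalize", 5)]


seq = ToySequence()
seq.append_command("get", timeout=60)
seq.append_command("save_screenshot")
pairs = seq.get_commands_with_timeout()
```

In this model the finalize entry is always last, so whatever consumes the list can treat the final pair as the end of the site visit.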
- recursive_dump_page_source(suffix='', timeout=30)
Dumps the rendered source of the current page visit to the ‘sources’ directory. Unlike dump_page_source, this includes iframe sources. The archive is stored in manager_params.source_dump_path and is keyed by the current visit_id and top-level URL. The source dump is a gzipped JSON file with the following structure:
    {
        "document_url": "http://example.com",
        "source": "<html> ... </html>",
        "iframes": {
            "frame_1": {"document_url": "...",
                        "source": "...",
                        "iframes": "{ ... }"},
            "frame_2": {"document_url": "...",
                        "source": "...",
                        "iframes": "{ ... }"},
            "frame_3": "{ ... }"
        }
    }
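Because every frame repeats the same three keys, a dump of this shape can be walked recursively. The sketch below shows one way to do that with the standard library. The helper names `parse_source_dump` and `iter_frames` are hypothetical, the sample data is invented for the demo, and it assumes each entry under "iframes" is a nested object as in the structure above.

```python
import gzip
import json
from typing import Any, Dict, Iterator, Tuple


def parse_source_dump(blob: bytes) -> Dict[str, Any]:
    """Decode the gzipped JSON bytes of a source dump into a dict."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


def iter_frames(node: Dict[str, Any], depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, document_url, source) for a frame and all nested iframes."""
    yield depth, node.get("document_url", ""), node.get("source", "")
    iframes = node.get("iframes")
    if isinstance(iframes, dict):
        for child in iframes.values():
            # Skip anything that is not a nested frame object.
            if isinstance(child, dict):
                yield from iter_frames(child, depth + 1)


# Invented sample data that follows the documented structure.
sample = {
    "document_url": "http://example.com",
    "source": "<html>top</html>",
    "iframes": {
        "frame_1": {
            "document_url": "http://example.com/a",
            "source": "<html>a</html>",
            "iframes": {},
        },
    },
}
blob = gzip.compress(json.dumps(sample).encode("utf-8"))
frames = list(iter_frames(parse_source_dump(blob)))
```

For a dump on disk, the bytes could be read with `Path(...).read_bytes()` and passed to `parse_source_dump` in the same way.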
- screenshot_full_page(suffix='', timeout=30)
Save a screenshot of the entire page.
NOTE: geckodriver v0.15 only supports viewport screenshots. To screenshot the entire page, we scroll the page using JavaScript and take a viewport screenshot at each location. This method saves both the individual parts and a stitched version in the screenshot_path. We only scroll vertically, so pages that are wider than the viewport will be clipped. See: https://github.com/mozilla/geckodriver/issues/570
The screenshot produced only includes the area that was loaded at the start of the command. Sites that dynamically expand as the page is scrolled (e.g. infinite scroll) are only captured up to their original height.
NOTE: In geckodriver v0.15, doing any scrolling (or having devtools open) seems to break element-only screenshots. As a result, using this command will cause any later element-only screenshots to be misaligned.
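The vertical scroll-and-capture approach described above can be pictured as a list of scroll positions computed once from the page height measured at the start. The sketch below is only an illustration of that arithmetic, not OpenWPM's stitching code: the function name `plan_parts` and the pixel values are hypothetical.

```python
from typing import List, Tuple


def plan_parts(page_height: int, viewport_height: int) -> List[Tuple[int, int]]:
    """Return (scroll_top, visible_height) pairs that cover the page vertically.

    page_height is the height measured once, before any scrolling, so content
    that loads later (for example on infinite-scroll pages) is not covered.
    """
    if page_height <= 0 or viewport_height <= 0:
        raise ValueError("heights must be positive")
    parts = []
    for top in range(0, page_height, viewport_height):
        # The last part may be shorter than a full viewport.
        parts.append((top, min(viewport_height, page_height - top)))
    return parts


# A 2500 px tall page captured through a 1000 px tall viewport.
parts = plan_parts(2500, 1000)
```

Each pair corresponds to one viewport screenshot. Only the vertical axis is planned, which matches the note that content wider than the viewport is clipped.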