Multi-Tab Data Extraction
Pline’s Multi-Tab Extraction feature in Browse & Capture mode allows you to capture data across multiple tabs of the same domain without triggering rate limits or anti-scraping measures.
Once a workflow is set up in one tab, it applies automatically to other tabs within the same domain. In each new tab, select the relevant data fields; there is no need to reconfigure the workflow.
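If you want to check ahead of time which of your pages fall under the same domain, a minimal Python sketch like the following can group URLs by hostname. It assumes that "same domain" means an identical hostname, which may be stricter or looser than the rule Pline actually applies. The function names and example URLs are illustrative and are not part of Pline.

```python
from urllib.parse import urlsplit


def hostname(url: str) -> str:
    """Return the lowercased hostname of a URL, or "" if it has none."""
    return (urlsplit(url).hostname or "").lower()


def group_by_hostname(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by hostname, preserving their original order."""
    groups: dict[str, list[str]] = {}
    for url in urls:
        groups.setdefault(hostname(url), []).append(url)
    return groups


if __name__ == "__main__":
    # Hypothetical tab URLs, used only for illustration.
    tabs = [
        "https://shop.example.com/products/1",
        "https://shop.example.com/products/2",
        "https://blog.example.com/post/7",
    ]
    for host, members in group_by_hostname(tabs).items():
        print(host, len(members))
```

Pages that land in the same group are candidates for a single workflow under this assumption; pages in different groups would likely need their own.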
Getting Started With Multi-Tab Data Extraction
Follow these steps to set up and use Multi-Tab Data Extraction in Pline:
Step 1: Create the Workflow
Navigate to the webpage where you want to start the data extraction.
Open Pline, click "Browse and Capture," and select your workflow.
Choose the appropriate page type, and select the required data fields.
Click "Next," enter a name for the workflow, and then click "Save Workflow."
Step 2: Run the Workflow
Click "Use workflow" to create a new dataset or append to an existing one.
The Pline logo will appear to show the workflow is active.
Preview the data and edit selectors if required.
Approve one record, navigate through other tabs within the same domain, and approve the remaining data.
Alternatively, select the auto-approve checkbox to approve records automatically.
Step 3: Stop the Workflow
Once data extraction is complete, click "Stop Collection."
A confirmation prompt will appear. Click "Stop."
Download the data as a CSV or gain full access to it in the Pline Portal.
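After downloading the CSV, a quick check can catch duplicate or incomplete records before you use the data. The sketch below uses only Python's standard `csv` module and makes no assumption about Pline's column names: it reads whatever header row the file contains. The sample data and the file name mentioned afterwards are hypothetical.

```python
import csv
import io


def load_rows(csv_text: str) -> list[dict]:
    """Parse CSV text (header row first) into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def drop_duplicates(rows: list[dict]) -> list[dict]:
    """Remove rows that are exact duplicates, keeping first occurrences."""
    seen = set()
    unique = []
    for row in rows:
        # Stringify values so the key is always hashable.
        key = tuple((name, str(value)) for name, value in row.items())
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def is_blank(value) -> bool:
    """True for a missing value or a string that is empty or whitespace."""
    return value is None or (isinstance(value, str) and not value.strip())


def rows_with_blanks(rows: list[dict]) -> list[dict]:
    """Return rows in which at least one field is blank."""
    return [row for row in rows if any(is_blank(v) for v in row.values())]


if __name__ == "__main__":
    # Hypothetical export contents, used only for illustration.
    sample = "title,price\nWidget,9.99\nWidget,9.99\nGadget,\n"
    rows = drop_duplicates(load_rows(sample))
    print(len(rows), "unique rows;", len(rows_with_blanks(rows)), "with blank fields")
```

To run the same checks on a real export, read the file first, for example `open("pline_export.csv", newline="", encoding="utf-8").read()`, where `pline_export.csv` stands in for whatever name your download has.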
Benefits of Multi-Tab Data Extraction
Multi-tab extraction streamlines your extraction process and addresses several common challenges in data scraping:
Real-time Validation: Validate data as you extract it.
Lower Risk of Blocking: Avoid being flagged or blocked by websites.
Higher Data Quality: Extract only the most relevant information.
Greater Flexibility: Adapt to changes on the fly.
Tips for Efficient Multi-Tab Extraction
Enable the auto-approve feature to approve data from all tabs automatically.
Make sure the data selectors you configure in one tab match the same fields on every other tab you capture.
Limit the number of tabs you use for better performance and to minimize the risk of timeouts or slowdowns.
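One way to keep the tab count manageable is to work through a long list of pages in small batches. The following sketch splits a list of URLs into groups of a fixed size so that you open only one group at a time. The batch size of 5 is an arbitrary assumption, not a limit documented by Pline, and the `batches` helper is purely illustrative.

```python
from typing import Iterator


def batches(urls: list[str], max_tabs: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `urls`, each with at most `max_tabs` items."""
    if max_tabs < 1:
        raise ValueError("max_tabs must be at least 1")
    for start in range(0, len(urls), max_tabs):
        yield urls[start:start + max_tabs]


if __name__ == "__main__":
    # Hypothetical page list; 5 tabs per batch is an assumed, adjustable value.
    pages = [f"https://shop.example.com/products/{n}" for n in range(1, 13)]
    for number, group in enumerate(batches(pages, max_tabs=5), start=1):
        print(f"Batch {number}: {len(group)} tabs")
```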