Inner Page Data Extraction
Pline's inner page data extraction feature gathers detailed information from grouped listing pages in one go.
Inner Page Extraction builds on Pline's Automated Data Extraction feature, enabling you to collect data from the listing page and detailed pages simultaneously without having to manually open each detailed page.
This guide builds upon the steps in the How to Build an Automation Workflow section.
Launch the Pline extension and open the Automated Data Extraction mode.
Identify and group similar data fields you want to extract, such as product names or prices.
Select a "field" containing the link to the detailed or inner page and extract it as a "Link."
In our example, clicking a shoe title opens its product details page, so the titles should be selected as links. Other fields can use whichever data type they require.
Selecting the correct link field is crucial for successful navigation to each detailed page.
Select the appropriate pagination type for your data source (Click Next in our example).
Pline will automatically display all available links for extraction. Select the correct link, and Pline will navigate to the corresponding inner page.
Capture additional data from the inner pages:
Choose the field name (e.g., "Color", "Rating").
Set the Data Field Type (e.g., Text for attributes like color or rating).
Click Save after selecting each field.
The Pline extension panel gives an overview of your automated workflow. Click "View sample data" to preview the data fields to be extracted using the workflow.
Add a workflow name and click "Save Workflow" to configure the workflow that will extract data from both the listing and detail pages simultaneously.
Then, click "Use workflow now".
To execute a workflow and store all extracted records, you will need to create a dataset.
Click "Create Dataset" to initiate the data extraction process.
Once the data extraction process is completed, you can access your dataset in the Pline Platform dashboard or download it as a CSV or JSON file for analysis.
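Once a dataset has been downloaded, it can be loaded for analysis with a few lines of Python. The sketch below is only an illustration: it assumes the JSON export is an array of flat records and the CSV export has a header row, and the field names (Title, Price, Link, Color, Rating) are hypothetical stand-ins for whatever fields your workflow defines. The actual export layout may differ.

```python
import csv
import io
import json


def load_records(text, fmt):
    """Parse exported text into a list of dicts, one per extracted record."""
    if fmt == "json":
        data = json.loads(text)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of records")
        return data
    if fmt == "csv":
        # DictReader uses the header row as the keys of each record.
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported format: {fmt}")


# Hypothetical export contents: listing fields (Title, Price, Link)
# combined with inner page fields (Color, Rating) in a single record.
sample_json = (
    '[{"Title": "Trail Runner", "Price": "59.99", '
    '"Link": "https://example.com/p/1", "Color": "Blue", "Rating": "4.5"}]'
)
sample_csv = (
    "Title,Price,Link,Color,Rating\n"
    "Trail Runner,59.99,https://example.com/p/1,Blue,4.5\n"
)

json_records = load_records(sample_json, "json")
csv_records = load_records(sample_csv, "csv")
```

For a downloaded file, read its contents first (for example with open(path, encoding="utf-8").read()) and pass the text to load_records with the matching format.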
Maximize the effectiveness of inner page extraction with these best practices:
Ensure the target website remains open until the data extraction process is complete.
Track status: "Records Collected" counts records that have been processed but not yet saved; "Records Saved" confirms that they have been securely stored.
Group similar data fields to optimize data selection.
Confirm the correct pagination type is selected to capture all necessary data.
Regularly monitor the target website for any structural changes.
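One way to notice a structural change after the fact is to check an exported dataset for fields that have suddenly gone blank. The following is a minimal sketch, not a Pline feature: it assumes records have already been loaded as a list of dicts, that a changed page layout shows up as missing or empty values, and that the field names and the 50% threshold are arbitrary illustrative choices.

```python
def empty_field_rates(records, fields):
    """Return, for each field, the share of records where it is missing or blank."""
    total = len(records)
    rates = {}
    for field in fields:
        blank = 0
        for record in records:
            value = record.get(field)
            if value is None or str(value).strip() == "":
                blank += 1
        rates[field] = blank / total if total else 0.0
    return rates


def suspicious_fields(records, fields, threshold=0.5):
    """List fields whose blank rate meets or exceeds the threshold."""
    rates = empty_field_rates(records, fields)
    return [field for field in fields if rates[field] >= threshold]


# Hypothetical records: "Color" is blank or missing in every one,
# which could hint that the inner page layout has changed.
sample_records = [
    {"Title": "Trail Runner", "Color": ""},
    {"Title": "Road Racer", "Color": None},
    {"Title": "Court Classic"},
]
sample_rates = empty_field_rates(sample_records, ["Title", "Color"])
flagged = suspicious_fields(sample_records, ["Title", "Color"])
```

A flagged field is only a hint; reopening the workflow and reselecting the affected field on the live page is the reliable way to confirm whether the site has changed.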