

1) Start a new task and enter the URL of the target web page

Once you're logged in, click the "+ Task" button under Advanced Mode to create a new task. Then enter one or more URLs.

A task is a crawler for scraping data, usually from one website, with unlimited page/URL inquiries. Crawlers in Octoparse are determined by the scraping tasks you configure: a scraping task tells Octoparse which website to open, what data to crawl, and so on.

Advanced Mode is a powerful mode that offers extended flexibility for scraping many different kinds of websites. It lets you customize each individual action needed to perform the extraction, including keyword searches, login authentication, opening dropdowns, etc.

Here we take one of our blog posts as an example. Suppose our goal is to extract the blog information from the page. Copy and paste the URL into the "Extraction URL" textbox, then click "Save URL" and Octoparse will open the web page in its built-in browser.

Now start capturing the data you need by clicking directly on the various pieces of information. Click on the title of the post, the posted date, or the content to capture it. When the data is selected properly, the selection is highlighted in green. Notice that the data you clicked on now shows in the Action Panel. Select "Extract Data" to complete the text extraction action.

You can edit the field names by clicking, or leave that until later. In Octoparse 7.X, the task name is automatically generated at the top of the configuration interface. Turn on the "Workflow" button to preview the workflow you design.
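
If you are curious what a point-and-click extraction like this boils down to, here is a rough sketch in plain Python. It is not how Octoparse works internally and it does not use any Octoparse API. The sample HTML, the class names (`post-title`, `post-date`, `post-content`) and the field names are all made up for illustration. The idea is simply that each piece of the page you pick becomes a named field in one output record.

```python
from html.parser import HTMLParser

# Made-up sample page; the markup and class names are assumptions for illustration.
SAMPLE_HTML = """
<html><body>
  <h1 class="post-title">Example post</h1>
  <span class="post-date">2018-01-01</span>
  <div class="post-content">First paragraph of the post.</div>
</body></html>
"""

# Maps a (hypothetical) CSS class on the page to an output field name,
# much like naming the fields after clicking on the data.
FIELDS = {
    "post-title": "title",
    "post-date": "posted_date",
    "post-content": "content",
}


class FieldExtractor(HTMLParser):
    """Collects the text of elements whose class matches a configured field."""

    def __init__(self, fields):
        super().__init__()
        self.fields = fields
        self.current = None   # field currently being captured, if any
        self.record = {}      # field name -> captured text

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            if cls in self.fields:
                self.current = self.fields[cls]
                self.record.setdefault(self.current, "")

    def handle_endtag(self, tag):
        # Simplification: capture stops at the first closing tag,
        # so nested markup inside a field is not fully handled.
        self.current = None

    def handle_data(self, data):
        if self.current is not None:
            self.record[self.current] += data


def extract_fields(html, fields=FIELDS):
    """Return one record (a dict) with the text of each configured field."""
    parser = FieldExtractor(fields)
    parser.feed(html)
    return {name: text.strip() for name, text in parser.record.items()}


record = extract_fields(SAMPLE_HTML)
print(record)
```

Renaming a field here just means changing a value in `FIELDS`, which is roughly what editing a field name in the tool amounts to.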
