When a Web page includes one or several HTML lists (like in the example below), use the lists view in the data section (refresh if the data doesn't appear) to extract the data and move it to 'the Catch' or export it in MS Excel, HTML, SQL or CSV format.
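The lists view does this work for you. To illustrate the idea only, here is a minimal Python sketch, not OutWit Hub's own implementation. It collects the text of each `<li>` item with the standard-library `html.parser` and writes the items out as CSV. It assumes well-formed lists with explicit `</li>` closing tags, and the helper names (`ListItemParser`, `extract_list_items`, `items_to_csv`) are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser


class ListItemParser(HTMLParser):
    """Collects the visible text of each top-level <li> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.items = []      # one cleaned string per top-level <li>
        self._depth = 0      # how many <li> elements we are currently inside
        self._buffer = []    # text fragments of the <li> being read

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            if self._depth == 0:
                self._buffer = []
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "li" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace so each item fits on one CSV row.
                self.items.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_list_items(html):
    parser = ListItemParser()
    parser.feed(html)
    parser.close()
    return parser.items


def items_to_csv(items):
    out = io.StringIO()
    writer = csv.writer(out)   # default line terminator is "\r\n"
    writer.writerow(["item"])  # header row
    for item in items:
        writer.writerow([item])
    return out.getvalue()


sample = "<ul>\n  <li>Tokyo</li>\n  <li>Mexico  City</li>\n</ul>"
print(items_to_csv(extract_list_items(sample)))
```

The sample markup is made up for the demonstration; a real page would be fetched first and may need more robust parsing.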
(Please scroll down for a more advanced mode of extraction.)
TIP: In the 'lists' view, like in most other views, right-clicking on selected rows gives you access to a wealth of features to edit and clean the data.
Example list: Largest metropolitan areas by continent
In many cases, automatic data extraction methods (tables, lists, guess) will be enough, and you will manage to extract and export the data in just a few clicks. If, however, the page is too complex, or if your needs are more specific, you can extract data manually by creating a scraper. Scrapers are saved to your personal database, and you can re-apply them to the same URL or to other URLs, for instance URLs starting with the same domain name. A scraper can even be applied to whole lists of URLs. You can also export your scrapers and share them with other users.
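As a rough illustration of re-applying one scraper to several URLs that share a prefix, here is a Python sketch under simplifying assumptions: the pages are assumed to be already downloaded into a dictionary (URL to HTML), and `apply_to_matching_urls` and `title_scraper` are hypothetical names, not part of OutWit Hub.

```python
import re
from typing import Callable, Dict, List


def apply_to_matching_urls(pages: Dict[str, str], url_prefix: str,
                           scraper: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Run `scraper` on every already-downloaded page whose URL starts with `url_prefix`."""
    return {url: scraper(html)
            for url, html in pages.items()
            if url.startswith(url_prefix)}


def title_scraper(html: str) -> List[str]:
    # Hypothetical scraper: grab the text between <h1> and </h1>.
    return re.findall(r"<h1>(.*?)</h1>", html, flags=re.DOTALL)


pages = {
    "https://www.example.com/a": "<h1>Page A</h1>",
    "https://www.example.com/b": "<h1>Page B</h1>",
    "https://www.example.org/c": "<h1>Page C</h1>",
}
print(apply_to_matching_urls(pages, "https://www.example.com/", title_scraper))
```

Only the two `example.com` pages match the prefix, so the `example.org` page is skipped.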
In our present example, if the data, as extracted in the list widget, is not structured enough for your needs, you will have to create a specific scraper for this page. The Scraper Editor is rather easy to use. Go to the scrapers view and you will see the colorized HTML source of the page:
The text in black is what is actually displayed on the page. This colorization makes it very easy to identify the data you are interested in. Building a scraper is simply telling the program what comes immediately before and/or after the data you want to extract, and/or what format that data has.
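To make the "before / after / format" idea concrete, here is a minimal Python sketch of marker-based extraction. It is an assumption-laden illustration, not how OutWit Hub works internally: `extract_between` is a hypothetical helper, it requires both markers (the editor also allows just one), and the optional format is expressed as a Python regular expression.

```python
import re
from typing import List, Optional


def extract_between(source: str, before: str, after: str,
                    fmt: Optional[str] = None) -> List[str]:
    """Return each text found between a `before` marker and the next `after` marker.

    If `fmt` (a regular expression) is given, only values that fully match it are kept.
    """
    if not before or not after:
        raise ValueError("this sketch needs both markers")
    results = []
    start = 0
    while True:
        i = source.find(before, start)
        if i == -1:
            break
        i += len(before)              # the data starts right after the marker
        j = source.find(after, i)
        if j == -1:
            break
        value = source[i:j].strip()
        if fmt is None or re.fullmatch(fmt, value):
            results.append(value)
        start = j + len(after)        # resume searching after this match
    return results


html = '<td class="city">Paris</td><td class="city">Lagos</td>'
print(extract_between(html, '<td class="city">', "</td>"))  # ['Paris', 'Lagos']
```

Passing a format such as `r"\d+"` keeps only purely numeric values between the markers.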
If it is your very first scraper, you are taken directly to the scraper editor; otherwise, you are in the scraper manager and see the list of your other scrapers.
In the latter case, hit the "New" button and type in a name for your new scraper. Once in the scraper editor, just fill in the description and marker cells (double-click on a cell to edit it). Your first version should look like this:
Hit Execute, and... that's it! You are running your first scraper.
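Conceptually, running a scraper can be thought of as applying each line's markers to the page source and lining up the results as rows. The Python sketch below mimics that under our own assumptions: the `ScraperLine` fields (`description`, `marker_before`, `marker_after`, `fmt`) are hypothetical stand-ins for the editor's cells, not OutWit Hub's actual data model, and the n-th match of each field is assumed to belong to the n-th row.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ScraperLine:
    """One line of a hypothetical scraper: a field name plus its markers."""
    description: str
    marker_before: str
    marker_after: str
    fmt: Optional[str] = None   # optional regex the extracted value must fully match


def run_scraper(lines: List[ScraperLine], source: str) -> List[Dict[str, str]]:
    columns = []
    for line in lines:
        # Non-greedy capture of whatever sits between the two literal markers.
        pattern = re.escape(line.marker_before) + r"(.*?)" + re.escape(line.marker_after)
        values = [v.strip() for v in re.findall(pattern, source, flags=re.DOTALL)]
        if line.fmt is not None:
            values = [v for v in values if re.fullmatch(line.fmt, v)]
        columns.append(values)
    names = [line.description for line in lines]
    # Pair the n-th value of each field into the n-th row; zip stops at the shortest column.
    return [dict(zip(names, row)) for row in zip(*columns)]


page = ("<tr><td>Paris</td><td>France</td></tr>"
        "<tr><td>Lagos</td><td>Nigeria</td></tr>")
scraper = [
    ScraperLine("city", "<tr><td>", "</td>"),
    ScraperLine("country", "</td><td>", "</td></tr>"),
]
for row in run_scraper(scraper, page):
    print(row)
```

Pairing by position is the simplest choice for a sketch; it breaks down if some rows lack a field, which is one reason a real tool needs more careful record grouping.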
Brava! or Bravo! For indeed you did it:
You just need to go to the scraped view, and here is your result:
OK, the present example is not all that exciting, and the figures are already out of date. It would almost be faster to copy the 15 rows by hand. But what if the data filled 20 pages and the population figures were updated tomorrow? Better still: what if the data changed every morning, like job ads, sports results or stock market indices? No problem: you would simply re-apply your new scraper.