OutWit Hub Pro - Extracting Tables (2)

Application walkthrough: using base extractors

userSpace.setWizardPrefs = function setWizardPrefs() {
    wizardKit.setWizardPref("browse.tempo.min", "2000");
    wizardKit.setWizardPref("browse.tempo.max", "3500");
    wizardKit.setWizardPref("images.ondemandonly", false);
    // witscript.setPreference("page.ignorePlugins", false);
    wizardKit.setWizardPref("page.ignoreImages", false);
    wizardKit.setWizardPref("tableMinRows", "1");
    //alert(witscript.getPreference("DOMSourceDontWarn"));
    wizardKit.setWizardPref("DOMSourceDontWarn", true);
    //alert(witscript.getPreference("DOMSourceDontWarn"));
};

Extracting One Table Among Several in a Web Page

This tutorial will take you through the process of extracting data from a particular table in a page.

While this window is showing instructions, the user interface of OutWit Hub remains operational.

You can still interact normally with the application and you can move this tutorial window around on the screen to better see the parts of the interface that you want.

userSpace.waitOK = witscript.version("4") || !/Firefox\/2\d\./.test(navigator.userAgent);
userSpace.eyeCatcherOK = true;
if (/Firefox\/[23]\./.test(navigator.userAgent)) {
    alert("OutWit wizards cannot run on your version of Firefox. Please update to the current version and try again.");
    wizard.close();
} else if (!("witscript" in window) || !witscript.version || !witscript.version("3.0.1.3")) {
    alert("This wizard is not compatible with your version of the OutWit Kernel. Please download the latest version (3.0.1.3 or higher)");
    wizard.close();
}
if (witscript.version("3")) {
    $(".owui-wizard-homelink").html("Hub Tutorials");
}
wizardKit.hideCatch();
wizardKit.hideLog();
witscript.views.page.load("http://en.wikipedia.org/wiki/Major_League_Baseball");
//userSpace.storeOriginalPrefs();
userSpace.setWizardPrefs();
witscript.logPanel.setAttribute("height", 0);

wizardKit.say(this.parentNode);
witscript.views.page.display();

wizardKit.say(this.parentNode);
wizardKit.hideCatch();
wizardKit.hideLog();
if (!(/wiki\/Major_League_Baseball/.test(witscript.toolbar.urlBar.getValue()))) {
    witscript.views.page.load("http://en.wikipedia.org/wiki/Major_League_Baseball");
}
witscript.views.page.display();
page.findBar.textbox.setValue("Rafael Palmeiro");
witscript.menutree.focus();
if (userSpace.waitOK) {
    witscript.wait(300);
}
page.findBar.textbox.setValue("");

MLB Top Teams

If you scroll down this Wikipedia page, you will find several HTML tables.

wizardKit.say(this.parentNode);
witscript.views.page.display();
page.findBar.textbox.setValue("World Series Records");
witscript.menutree.focus();
witscript.wait(300);
page.findBar.textbox.setValue("");
witscript.views.tables.bottomPanel.selectIf.textBox.setValue("");

We want to extract the table of World Series Records.

wizardKit.say(this.parentNode);
witscript.views.tables.display();
wizardKit.hideCatch();
wizardKit.hideLog();
if (!(/wiki\/Major_League_Baseball/.test(witscript.toolbar.urlBar.getValue()))) {
    witscript.views.page.load("http://en.wikipedia.org/wiki/Major_League_Baseball");
}
witscript.views.tables.exportPreview.setAttribute("width", 100);
witscript.views.tables.previewSplitter.collapseAfter();
witscript.views.tables.exportPreview.exportType.setValue("excel");
witscript.views.tables.display();
witscript.menutree.focus();

Here is the data contained in the page's HTML tables

When you select the 'tables' view in the left side panel, the program displays the content of all the page's HTML tables in the view's datasheet.

wizardKit.say(this.parentNode);
witscript.menutree.focus();
if (views.tables.datasheet.getRowCount() < 100) {
    witscript.views.tables.bottomPanel.reapplyButton.click();
    witscript.wait(500);
}
witscript.views.tables.bottomPanel.selectIf.textBox.setValue("World Series Records");
witscript.views.tables.focus();
witscript.wait(1000);
witscript.views.tables.datasheet.clickCell(views.tables.datasheet.getSelectedRowIndexes()[0], 3, "right");
witscript.wait(300);
witscript.views.tables.datasheet.contextMenu.selectMenu.selectBlock.click();

You just need to select a row within the table we are interested in, right-click on it and choose "Select Block".

The program will select the whole block of data corresponding to the desired table.

Then, to delete the rows you do not want, right-click on the selection and choose "Delete Unselected".
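
The effect of "Select Block" followed by "Delete Unselected" can be pictured with a small standalone sketch. It is plain JavaScript and does not use OutWit's API: the `rows` array, the `tableIndex` field and the two helper functions are hypothetical, the cell values are placeholders, and the sketch assumes that each datasheet row remembers which HTML table it came from.

```javascript
// Hypothetical model of the datasheet: each row records the table it came from.
// All cell values are placeholders, not data from the Wikipedia page.
const rows = [
  { tableIndex: 0, cells: ["Some other table", "x"] },
  { tableIndex: 0, cells: ["Some other table", "y"] },
  { tableIndex: 1, cells: ["World Series Records"] },
  { tableIndex: 1, cells: ["Team", "Series won"] },
  { tableIndex: 1, cells: ["Team A", "3"] },
  { tableIndex: 2, cells: ["Yet another table"] },
];

// "Select Block": indexes of the contiguous rows around rowIndex
// that belong to the same table as the clicked row.
function selectBlock(allRows, rowIndex) {
  const target = allRows[rowIndex].tableIndex;
  let start = rowIndex;
  let end = rowIndex;
  while (start > 0 && allRows[start - 1].tableIndex === target) start--;
  while (end < allRows.length - 1 && allRows[end + 1].tableIndex === target) end++;
  const selected = [];
  for (let i = start; i <= end; i++) selected.push(i);
  return selected;
}

// "Delete Unselected": keep only the rows whose index is in the selection.
function deleteUnselected(allRows, selected) {
  const keep = new Set(selected);
  return allRows.filter((_, index) => keep.has(index));
}

const selectedIndexes = selectBlock(rows, 3); // simulate a click on the fourth row
const remainingRows = deleteUnselected(rows, selectedIndexes);
console.log(remainingRows.map((row) => row.cells.join(" | ")));
```

Clicking any row of the block gives the same selection, which is why the tutorial only asks you to pick "a row within the table".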

// wizardKit.say(this.parentNode);
// witscript.menutree.focus();
// if(views.tables.datasheet.getRowCount() < 100) {
//     views.tables.bottomPanel.reapplyButton.click();
//     witscript.wait(500);
// }

wizardKit.say(this.parentNode);
witscript.menutree.focus();
witscript.views.tables.focus();
if (views.tables.datasheet.getRowCount() > 100) {
    witscript.views.tables.datasheet.contextMenu.deleteMenu.deleteUnselected.click();
}
witscript.wait(700);
if (views.tables.datasheet.getSelectedRowIndexes()[0] == 0 && views.tables.datasheet.getCell(0, 3) == "World Series Records") {
    //alert([1,views.tables.datasheet.getCell(0,3)]);
    witscript.views.tables.datasheet.select(0, 3);
    witscript.wait(500);
    witscript.views.tables.datasheet.contextMenu.selectMenu.click();
}

In the remaining data, the top line is a title that we do not want to keep, and we do not need the footnotes at the bottom either. We can simply select these rows and remove them with the Delete key.
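
If you ever need to do the same clean-up outside the Hub, the idea can be sketched in a few lines of plain JavaScript. This is only an illustration under a stated assumption: title and footnote rows fill at most one cell, while header and data rows fill several. The function names and the sample values are hypothetical placeholders.

```javascript
// Count the cells of a row that contain something other than whitespace.
function countFilledCells(cells) {
  return cells.filter((cell) => String(cell).trim() !== "").length;
}

// Drop leading and trailing rows that fill at most one cell
// (assumed here to be the title line and the footnotes).
function trimTitleAndFootnotes(block) {
  let start = 0;
  let end = block.length;
  while (start < end && countFilledCells(block[start]) <= 1) start++;
  while (end > start && countFilledCells(block[end - 1]) <= 1) end--;
  return block.slice(start, end);
}

// Placeholder block: one title row, a header, two data rows, one footnote.
const extractedBlock = [
  ["World Series Records", "", ""],
  ["Team", "Series won", "Series played"],
  ["Team A", "3", "5"],
  ["Team B", "1", "2"],
  ["* placeholder footnote", "", ""],
];

const trimmedBlock = trimTitleAndFootnotes(extractedBlock);
console.log(trimmedBlock);
```

Rows in the middle of the block are never removed, even if they are sparse, so the heuristic only touches the edges of the selection.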

wizardKit.say(this.parentNode);
witscript.menutree.focus();
witscript.views.tables.bottomPanel.selectIf.textBox.setValue("World Series Records");
// Select every row that has a non-empty value in column 3.
for (var i = 0; i < views.tables.datasheet.getRowCount(); i++) {
    if (views.tables.datasheet.getCell(i, 3)) views.tables.datasheet.select(i);
}
witscript.views.tables.focus();
if (views.tables.datasheet.getSelectedRowIndexes()[0] == 0 && views.tables.datasheet.getCell(0, 4) == "World Series Records") {
    // XXXX JC: Doesn't work. (?)
    //views.tables.datasheet.contextMenu.deleteMenu.delete.click();
    witscript.views.tables.datasheet.deleteSelectedRows();
}

The data is now ready to be exported.

wizardKit.say(this.parentNode);
witscript.views.tables.previewSplitter.uncollapse();
witscript.views.tables.exportPreview.exportType.setValue("excel");
witscript.views.tables.exportPreview.setAttribute("width", 400);
wizardKit.resize(views.tables.exportPreview, "width", 15, 2, 100, true);
//wizardKit.eyeCatcher(views.links.exportPreview.layoutRenderer,1,1,0,0);
witscript.wait(2000);
witscript.views.tables.exportPreview.exportType.setValue("csv");
//wizardKit.eyeCatcher(views.links.exportPreview.layoutRenderer,1,1,0,0);
witscript.wait(1500);
witscript.views.tables.exportPreview.exportType.setValue("html");

The export preview panel displays the extracted data as it will be exported in the format that you select in the top left menu.
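
To give a feel for how the same rows differ between export formats, here is a minimal standalone sketch that serializes a small table as CSV and as an HTML table. It is not how OutWit Hub generates its exports: the function names and sample rows are hypothetical, and the CSV quoting simply follows the common convention of wrapping fields that contain commas, quotes or line breaks in double quotes.

```javascript
// Serialize rows as CSV, quoting only the fields that need it.
function toCsv(tableRows) {
  const quote = (value) => {
    const text = String(value);
    return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
  };
  return tableRows.map((row) => row.map(quote).join(",")).join("\r\n");
}

// Escape the characters that have a special meaning in HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Serialize rows as a bare HTML table, one <tr> per row.
function toHtmlTable(tableRows) {
  const body = tableRows
    .map((row) => "<tr>" + row.map((cell) => "<td>" + escapeHtml(cell) + "</td>").join("") + "</tr>")
    .join("");
  return "<table>" + body + "</table>";
}

// Placeholder rows; the comma in the second row forces CSV quoting.
const exportRows = [
  ["Team", "Series won"],
  ["Team A, East", "3"],
];

console.log(toCsv(exportRows));
console.log(toHtmlTable(exportRows));
```

The data is identical in both outputs; only the delimiters and escaping rules change, which is what the preview panel lets you compare before saving.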

Try saving an export file on your hard disk in the format you prefer.

//alert(userSpace.WTI);
wizardKit.say(this.parentNode);
// wizardKit.restoreOriginalPrefs(); // XXX JC: This should not be here. Move to the close button (or event)
witscript.menutree.focus();
$(".owui-wizard-homelink").attr("style", "color: #DFFFF9 !important; float:left;").html("More Tutorials");

Now try on your own pages

You can now grab virtually any table from a Web page. With these functions and the many others you will find in the help center, you can feed Excel spreadsheets, databases or websites with readily usable data.

We will publish other tutorials to lead you through the main features of OutWit Hub. Stay tuned.

This is an OutWit Tutorial file.
