- Version 9.0.0.8 - release 9.0
- Feature - Data refining tools: #decode()#: several algorithms were added to this function to decode an obfuscated or encrypted string extracted by the scraper line into plain text.
- Feature - Data refining tools: Compute: allows you to perform basic operations on selected numerical cells of a column.
- Feature - Data refining tools:
Delete columns > to the right:
deletes the selected column and all
additional columns to its right.
- Feature - Export: It is now
possible to export each scraped row to a
separate file.
- Feature - Scrapers: #clickOnNodes# instructs the scraper to click on page elements matching a CSS selector.
- Feature - Scrapers: #decode()#: several algorithms were added to this function to decode an obfuscated or encrypted string extracted by the scraper line into plain text.
- Feature - Scrapers: #EARLIEST# and #LATEST# allow you to return the first/last date matching the scraper line.
- Feature - Scrapers: #enableNodes# and #disableNodes# allow you to directly change the state of page elements matching a CSS selector.
- Feature - Scrapers: #ifURLContains#, #ifURLDoesNotContain# allow you to execute a scraper line or not, depending on the URL being scraped.
- Feature - Scrapers: #ignoreIfField#
instructs the scraper to ignore this
page or record if a field has a certain
value.
- Feature - Scrapers: #lowerCase()#, #upperCase()#, #properCase()#, #sentenceCase()# alter the case of the string extracted by a scraper line (see the sketch after this version's notes).
- Feature - Scrapers: #PAGESTATUS#
replacement function returns info on the
current page (errors, title...).
- Feature - Scrapers: #pressKey#
allows the scraper to simulate a key
press in certain cases.
- Feature - Scrapers: #select# adds elements matching a CSS selector to the selection in the current page.
- Feature - Scrapers: #setValue# now
also allows to check radio buttons,
checkboxes, etc.
- Feature - Scrapers: A ^ suffix in the description (myURLFieldName^) of a scraper line destined to extract a URL returns only the "top" URL in the hierarchy (out of example.com/products/shoes and example.com/products/, only the latter is returned).
- Feature - Scrapers: Multiple
required fields (descriptions ending
with "!") can now be interpreted as AND
or OR conditions.
- Enhancement - Contact recognition & filtering: implementation of job title recognition (debug stage, mostly English for now). Better elimination of example/bogus email addresses and phone numbers. Enhancements in name, company, address and copyright field recognition. Enhancements throughout the program in first/last name splitting and in physical address splitting. Better handling of obfuscated email addresses.
- Enhancement - Data refining tools:
Clean up > Normalize All Figure:
enhanced and optimized.
- Enhancement - Improved date recognition, including dates without a year.
- Enhancement - Export: performance
enhancements in export functions.
- Enhancement - The application
window is resized at launch if it
exceeds the dimensions of the screen.
- Enhancement - Updated the list of
User Agents in the advanced preferences.
- Fix - Fixed source colorizing
problems in the case of line breaks
inside HTML tags.
- Fix - Multiple fixes and
enhancements in String Generation
functions.
- Fixes - Many more enhancements and
fixes throughout the code.
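- Note - The case functions above (#lowerCase()#, #upperCase()#, #properCase()#, #sentenceCase()#) operate on the string extracted by a scraper line. OutWit Hub's implementation is internal; the minimal Python sketch below only illustrates what such transforms conventionally do, assuming "proper case" means title case and "sentence case" means capitalizing the first letter of each sentence.

```python
# Minimal sketch of the conventional meaning of the case transforms.
# Assumes "proper case" = title case and "sentence case" = first
# letter of each sentence capitalized; the Hub's own rules may differ.
import re

def lower_case(s: str) -> str:
    return s.lower()

def upper_case(s: str) -> str:
    return s.upper()

def proper_case(s: str) -> str:
    return s.title()

def sentence_case(s: str) -> str:
    s = s.lower()
    return re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), s)

sample = "OUTWIT hub extracted THIS string. another sentence."
print(lower_case(sample))     # outwit hub extracted this string. another sentence.
print(upper_case(sample))     # OUTWIT HUB EXTRACTED THIS STRING. ANOTHER SENTENCE.
print(proper_case(sample))    # Outwit Hub Extracted This String. Another Sentence.
print(sentence_case(sample))  # Outwit hub extracted this string. Another sentence.
```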
- Version 8.0.0.57
- Feature - The Search Engine Query Builder (Tools menu and toolbar button) allows you to create complex queries for the main search engines (see the sketch after this version's notes).
- Feature - alt-shift-click on a page
in the browser forces the reapplication
of the current extractor. A useful
alternative to changing the preferences
in some AJAX pages where small
alterations to the rendered page do not
trigger the extraction.
- Enhancement - Updated list of
pre-defined User Agents in the
preferences with recent devices and
browsers.
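- Note - The Query Builder's exact output is not documented here; the sketch below is only a hypothetical illustration of the kind of query such a builder assembles, combining standard operators (quoted phrase, site:, exclusion) into a single search URL.

```python
# Hypothetical illustration of assembling an advanced search query
# and turning it into a search URL. Standard operators are shown;
# the Query Builder's actual output format may differ.
from urllib.parse import urlencode

def build_query(phrase: str, site: str = None, exclude: str = None) -> str:
    parts = [f'"{phrase}"']
    if site:
        parts.append(f"site:{site}")
    if exclude:
        parts.append(f"-{exclude}")
    return " ".join(parts)

query = build_query("outwit hub scraper", site="example.com", exclude="forum")
print(query)  # "outwit hub scraper" site:example.com -forum
print("https://www.google.com/search?" + urlencode({"q": query}))
```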
- Version 8.0.0.46 - release 8.0
- Feature - #emptyDirectory# Empties
the first directory of the queries view
matching the passed name.
- Feature - #splitField# Splits the
passed field as a post-process, using
the values in the separator and labels
columns. (Can allow consecutive splits.)
- Feature - #decodeEntities# Decodes HTML entities (like &amp; or &gt;) to their plain text equivalent. (This and the other string helpers of this release are illustrated in the sketch after this version's notes.)
- Feature - #decodeURL# Decodes URL-encoded characters (like %20) to their plain text equivalent.
- Feature - #save# Saves the string
extracted by the scraper line to a
separate text file.
- Feature - #screenshot# Saves a
screenshot of the current page into a
file using the passed file name.
- Feature - #hideNodes# Makes the nodes matching the passed CSS selector invisible.
- Feature - #scrollBy# Scrolls the
page loaded in the OutWit Hub browser by
the passed number of pixels.
- Feature - #resetPrefOnStop# Resets the passed preference to its default value at the end of the scrape process.
- Feature - #uniqueField# Makes sure that no duplicate values are extracted for the specified field(s) during the same exploration. (An alternative way to deduplicate while scraping, for cases where volumes are too large to post-process.)
- Feature - #setValue# Sets the value of the <select> or <input> HTML element matching the format column to the value passed in the replace column.
- Feature - #restartEvery# Sets the 'auto-explore on startup' flag to true and restarts the application every n pages or seconds.
- Feature - #uncheckURLInQuery#
Unchecks the 'OK' checkbox of the first
line containing the current URL in the
passed query directory.
- Feature - #uncheckItemInQuery#
Unchecks the 'OK' checkbox of the first
line containing the string extracted by
the scraper line in the passed query
directory.
- Enhancement - It is now possible to
set the field name with a variable in
the #default# directive.
- Feature - #readFromQueries# Reads
the next active string from the passed
query directory and stores its value in
the passed variable, then unchecks the
line in the query directory.
- Feature - #switchTo# Changes the
current view to the value set in the
replace column.
- Feature - #reapply# now accepts
parameters for the number of
applications and the delay between them.
- Feature - #adler32()# Used in the
replacement column, allows you to
generate a short hash from the string
extracted by the scraper line. (This can
be useful for deduplication although it
is not 100% reliable as, even if it is
unlikely, two different strings can
result in the same hash.)
- Feature - #encodeBase64()#, #decodeBase64()# Convert the string extracted by the scraper line into a base64-encoded string or decode it back into plain text.
- Feature - #decode()# Decodes the
string extracted by the scraper line
into plain text, trying several
algorithms.
- Feature - #unique()# Only returns the string extracted by the scraper line if the value is unique during the same exploration. (An alternative way to deduplicate while scraping, for cases where volumes are too large to post-process.)
- Feature - #WEEK# was added to the
time variables. Returns the week number
in the year.
- Feature - #LAST-POST-QUERY# returns the last POST query sent. #LAST-POST-QUERY#param# returns the value of the passed parameter in the last POST query sent.
- Feature - Several tools were added
to the right-click menu on datasheets:
Insert Index Column, Duplicate Column,
Indexed Duplicate Column, Copy from
Column..., Select if in...
- Feature - When scraping a self-updating AJAX page, the #reapply# directive now allows you to run the extraction n times at the frequency you choose.
- Enhancement - Faster startup preparation and end-of-process cleanup in large-volume Fast Scrapes.
- Enhancement - The contact recognition module and its dictionary were enhanced; lax recognition and the elimination of dummy email addresses were improved.
- Enhancement - Improved the dictionary of multilingual words, acronyms and roots frequently used in company names, addresses, etc., to enhance recognition.
- Fixes - Many more enhancements
and fixes throughout the code.
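- Note - Several of the string helpers in this release (#decodeEntities#, #decodeURL#, #encodeBase64()#/#decodeBase64()#, #adler32()#) correspond to standard operations. The Python sketch below shows those generic equivalents purely as an illustration; it is not OutWit Hub's internal code.

```python
# Generic equivalents of #decodeURL#, #decodeEntities#,
# #encodeBase64()# / #decodeBase64()# and #adler32()#.
import base64
import html
import zlib
from urllib.parse import unquote

extracted = "Tom%20%26%20Jerry &amp; friends"

print(unquote(extracted))        # #decodeURL#      -> Tom & Jerry &amp; friends
print(html.unescape(extracted))  # #decodeEntities# -> Tom%20%26%20Jerry & friends

encoded = base64.b64encode(extracted.encode("utf-8")).decode("ascii")
print(encoded)                                    # #encodeBase64()#
print(base64.b64decode(encoded).decode("utf-8"))  # #decodeBase64()#

# #adler32()#: a short checksum usable as a deduplication key;
# distinct strings can occasionally produce the same hash.
print(zlib.adler32(extracted.encode("utf-8")))
```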
- Version 7.0.0.56
- Feature - (Expert & Enterprise)
Added 'Duplicate Column', 'Insert Index
Column', etc., to the right-click menu
on datasheets.
- Fixes - Many minor fixes and
optimizations.
- Fix - There was a regression in 7.0.0.55 that could prevent correct scraping in Fast mode. This was fixed in 7.0.0.56.
- Version 7.0.0.36 - release 7.0
- Feature - #exportEvery#n#, #exportAndDeleteEvery#n#, #catchEvery#n#, #catchOnStop# Catch or export the extracted data whenever you want during the process.
- Feature - #abortAfter#, #abortAfterNPages#n#, #abortAfterNResults#n# Abort the current extraction after a given text is found or a certain number of pages or results has been reached.
- Feature - #decodeJSCharcodes#, #zapGremlins# to decode hexadecimal JavaScript character codes, remove unwanted control or invisible characters, correct badly encoded characters, etc.
- Feature - #clearForms#,
#clearAllHistory#,
#clearBrowsingHistory#, #clearCookie#,
#clearCookieEvery#n#, #clearCookieIf#,
#clearCookiesEvery#n#, #clearCookiesIf#,
#clearCookiesIfNot# allow you to manage
history and cookies from within the
scraper.
- Feature - Use #autoEmpty#,
#autoCatch#, #emptyOnDemand#,
#deduplicate# to set the value of the
scraped view options from a scraper.
- Feature - #keepForms#,
#removeScripts#, #removeTags#,
#allFrames#, #originalHTML#... allow you
to determine exactly how you want the
source before the scraper is applied.
- Feature - #replaceInField#fieldName# replaces a value (literal or RegExp) in a given field at the end of the process.
- Feature - #fieldGroup# Makes sure that the field indexes in the same group are incremented together even if some of the fields are empty.
- Feature - #oneRow# Makes sure that
all extracted data in the page will be
presented as a single row in the
datasheet.
- Feature - #allowCrossDomain# removes cross-domain JavaScript restrictions, which is sometimes useful to simulate clicks and other interactions with the page.
- Feature - #rename#, #unzip# give
you post-processing access to files that
you have downloaded.
- Feature - The #match()# function, used in the replacement column, allows you to search for other occurrences of a string (or matches of a RegExp) that you grab (or build) from the page itself. It allows very powerful conditional extractions.
- Feature - The syntax myFieldName<n, myFieldName>n, myFieldName= in the description column allows you to manage multiple results, duplicates, etc.
- Feature - Used in the Enterprise
edition, the #exportAndDeleteEvery#n#
scraping directive can define an SQLite
database as the destination (using a
filename with the .sqlite extension),
allowing you to process and store
extremely large volumes of data.
- Feature - You can now save (and
restore) the state of preferences in
(from) a directory of the queries view.
- Features - With the #HEADER# keyword added to the POST query format, you can add custom parameters to the header of the query. #CHARSET# defines the encoding, #TYPE# the content type, and #REFERER# the referrer (see the sketch after this version's notes).
- Features - Plus a large number of
enhancements and features which are not
listed here.
- Enhancement - Improved recognition and extraction of RSS feeds, publication dates in more locales, addition of a universal identifier (GUID)...
- Enhancement - The contact recognition module was further enhanced; lax recognition and the elimination of dummy email addresses were improved.
- Enhancement - A large dictionary of multilingual words, acronyms and roots frequently used in company names, addresses, etc. was added to enhance recognition.
- Editions - OutWit Hub still comes in three different editions (license levels): Pro, Expert and Enterprise, but we now propose a streamlined version of the Hub that can do Web explorations and run scrapers but has no editing capabilities. Don't hesitate to enquire about this on the customer support system.
- Fixes - Many enhancements and fixes
throughout the code.
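- Note - The #HEADER#, #CHARSET#, #TYPE# and #REFERER# keywords map onto ordinary HTTP request headers. The sketch below, using Python's requests library purely as a stand-in (OutWit Hub sends these from the POST query format, not from Python, and the endpoint shown is hypothetical), illustrates what they correspond to at the HTTP level.

```python
# What #TYPE#, #CHARSET# and #REFERER# amount to at the HTTP level:
# extra headers attached to the POST request.
import requests

url = "https://example.com/search"   # hypothetical endpoint
payload = {"q": "outwit", "page": "1"}

headers = {
    # #TYPE# + #CHARSET#
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    # #REFERER#
    "Referer": "https://example.com/",
    # a custom parameter added through #HEADER#
    "X-Custom-Header": "demo",
}

response = requests.post(url, data=payload, headers=headers, timeout=30)
print(response.status_code)
```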
- Version 6.0.0.72
- Feature - The text size control
from the View menu now increases or
decreases the size of the page text as
well as the extracted data.
- Fixes - Many fixes, in particular to inline editing in sorted datasheets and managers, and in the scrollToEnd function.
- Version 6.0.0.51 - Release 6.0
- Editions - New Expert Edition: OutWit Hub now comes in three different editions: Pro, Expert and Enterprise. Expert is single-user and contains all features that were reserved for the Enterprise edition until version 5.0. Enterprise now allows several users or instances to share common automators.
- Feature - (Expert & Enterprise editions) #suspend#n#, #suspendIf#n#, #suspendIfNot#n#: added a parameter to wait for n seconds before resuming when the OK button is clicked. (Useful to give the user time to interact with the page, solve a captcha, etc.)
- Feature - (Expert & Enterprise editions) #firstName(string)#, #lastName(string)#, #firstLastName(string)#, #gender(string)#: try to find the most likely first name, last name, first & last name, or gender in the passed full name string.
- Feature - Pro users can now
organize their automators (scrapers,
macros, jobs, queries), grouping them by
projects.
- Features - A large series of directives and functions was added to the Pro version: #autoEmpty#, #autoCatch#, #emptyOnDemand#, #deduplicate#, #default#, #default#fieldName#, #pauseBefore#, #checkIfURL# and #checkIfNotURL#, #encodeURL()#, #SECOND# ... #FIFTH#, #LOCALIP#.
- Features - New Directives were
added to Expert & Enterprise
editions: #scope# (outside or within
domain, all links or with a depth of 1
or 2),
#deduplicateOnStop#criterionColumnName#,
#deduplicateWithinPage#,
#scrollToEnd#cssSelector#...
- Feature - (Expert & Enterprise editions) Added a preference to create an additional Gender column when using the Insert First/Last Name function in the right-click menu. The column contains the string defined in the preference (like "Dear Mr", "Dear Ms") when the gender is recognized and a fallback value (like "Dear Customer") otherwise.
- Feature - (Expert & Enterprise editions) The words view now includes a text box where you can type or paste the words to count in the page.
- Feature - Added a preference to instruct the program to check the page header before loading it, in order to avoid errors and login dialogs that could block an automatic exploration.
- Enhancement - The Scroll to End
directive was enhanced to work in more
AJAX pages.
- Enhancement - The email recognition module now allows for diacritic characters; more dummy email addresses (user@example.com...) are eliminated; lax recognition (jackie at mysite dot com...) is much more efficient (see the sketch after this version's notes).
- Enhancement - The export module was
refactored and optimized in v6.0, fixing
bugs, enhancing data cleaning and
performance and adding features like
additional preference settings for SQL
exports, VARCHAR(xxx)....
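- Note - Lax email recognition ("jackie at mysite dot com") can be pictured as a normalization pass before ordinary email matching. The Python sketch below only illustrates that idea under this assumption; the actual module's rules (diacritics, dummy-address filtering...) are more elaborate.

```python
# Sketch of "lax" email recognition: normalize obfuscated addresses
# such as "jackie at mysite dot com" before matching. Illustration
# only; the real recognition module is far more thorough.
import re

def deobfuscate(text: str) -> str:
    text = re.sub(r"\s+(?:at|\(at\)|\[at\])\s+", "@", text, flags=re.IGNORECASE)
    text = re.sub(r"\s+(?:dot|\(dot\)|\[dot\])\s+", ".", text, flags=re.IGNORECASE)
    return text

def find_emails(text: str):
    return re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", deobfuscate(text))

print(find_emails("Contact jackie at mysite dot com or bob@example.org"))
# ['jackie@mysite.com', 'bob@example.org']
```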
- Version 5.0.1.57
- Feature - Added #checkIfURL# and #checkIfNotURL# scraping directives for extraction conditions on the current URL.
- Fix - Fixes in #abortIf#, #abortIfNot# and #abortAfter#.
- Version 5.0.1.42
- Feature - It is now possible to use
a multiple character string as the
CONCAT separator.
- Feature - Added preference to name
the fields in the queries of SQL
exports.
- Feature - Added #MaxColumns#
directive to limit the number of columns
in the extracted data.
- Fix - Fixed stalling explorations in certain cases when the server did not answer.
- Fix - #REQUESTED-URL# works in more
cases.
- Fixes - several fixes and
optimizations in contact extractions on
large lists of URLs.
- Enhancement - Enhancements and
fixes in #suspendIf# and #formatDate()#.
- Version 5.0.1.9
- Enhancement - Modified the 'Zap Gremlins' preference for scrapers so that it doesn't remove non-Latin characters (see the sketch after this version's notes).
- Fix - Fixed an export problem on columns with a % in the header.
- Fix - Made #showAlert# work even if it is the only line in the scraper.
- Fixes - Various fixes in scrapers
and macros.
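- Note - "Zapping gremlins" usually means stripping control and invisible characters; the point of the change above is that legitimate non-Latin text is kept. The sketch below illustrates that general idea only (an assumption, not the preference's actual code).

```python
# Remove control and zero-width characters while keeping legitimate
# non-Latin text. A sketch of the general idea, not the Hub's code.
import re

GREMLINS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f\u200b\u200c\u200d\ufeff]")

def zap_gremlins(text: str) -> str:
    return GREMLINS.sub("", text)

print(zap_gremlins("Tokyo\u200b \x07 東京 café"))  # Tokyo  東京 café
```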
- Version 5.0.0.294
- Feature - Added a preference to set
the list of fields in INSERT
instructions for SQL exports.
- Feature - Added preference "Proceed
If Page Contains / Does Not Contain".
- Feature - Now resolves generation patterns to their first value when typed in the address bar.
- Feature - Automatic corrections are
now performed when pasting URLs into a
directory of queries.
- Fix - Removed save dialog on
command line execution with -url
parameter and a MAU.
- Fix - Corrected header problem that
could occur when editing a cell in table
view.
- Fix - Corrected #showAlert#, which did not execute when there was no other line in the scraper.
- Enhancements - Enhancements and
fixes in contact recognition and
extraction.
- Enhancements - Enhanced queue
performance and fast scraping on very
large numbers of URLs.
- Fixes - Many performance and
security enhancements and fixes.
- Version 5.0.0.239
- Feature - Max number of retries can
now be set in the Exploration preference
panel.
- Feature - New format, added milliseconds, and some additional changes in date evaluation.
- Feature - Now forcing contact
column extraction if unhidden in column
picker (if applicable).
- Feature - Browse and fast contacts (all links) are implemented in all editions.
- Feature - Added replacement
functions #MACHINE-NAME# (set in the
preferences) and
#RANDOM-PHRASE#[adjective] [character]#
to generate random strings.
- Fix - Fixed automator selection in
tutorials that could lead to editing the
wrong scraper during the execution of
the tutorial.
- Fix - Corrected a rare problem
which could cause the application to
start in full screen mode.
- Fix - Fixes in first name
recognition (removed short ambiguous
names & corrected a recent regression).
- Fix - Fixed #DISTINCT-COUNT# which
was not creating two columns.
- Fix - Multiple fixes in contacts.
- Fix - Fix for oversized query
directories which could be truncated and
rendered unusable if RAM was not large
enough (could only happen in extreme
cases of many hundreds of thousands or
millions of items).
- Enhancement - Now corrects URLs sent to (or pasted in) queries without a protocol, adding http:// (see the sketch after this version's notes).
- Enhancement - Allowed slash
character in some phone numbers (mostly
for Belgian phone formats).
- Enhancement - Fixes and
enhancements in text cleaning and
script/style/comment removal.
- Enhancement - Made #ignoreErrors#
work even for timeouts, unreachable, no
data....
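- Note - The URL correction mentioned above (adding http:// to protocol-less URLs) is a simple normalization. A minimal sketch of the idea; the Hub may apply further corrections beyond this.

```python
# Prepend http:// when a pasted URL has no scheme.
import re

def normalize_url(url: str) -> str:
    url = url.strip()
    if not re.match(r"^[a-zA-Z][a-zA-Z0-9+.\-]*://", url):
        url = "http://" + url
    return url

print(normalize_url("example.com/products"))  # http://example.com/products
print(normalize_url("https://example.com"))   # unchanged
```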
- Version 5.0.0.127
- Feature - Added preference to
prevent link extraction in Enterprise
edition (useful when loading extremely
large documents into OutWit Hub).
- Feature - Added epub format to
document recognition in documents view.
- Enhancement - Recognizes more Next
Page links automatically.
- Enhancement - Tested and enhanced upgrade/update/downgrade functions in a large number of configurations.
- Known Issue - Not signed for Firefox 43+. Can only be installed as an add-on to Firefox 43+ if the preference named xpinstall.signatures.required is set to false.
- Version 5.0.0.107 - Release 5.0
- Feature - Directive library and help in the scraper editor right-click menu.
- Feature - Recognition of "onclick"
javascript links as Next Page links.
- Feature - First implementation of
selectors in the bottom panels.
- Feature - 'Show in source' function
from the browser. (Right-click on the
page.)
- Feature - Split directory function
in the queries view. (Right-click on a
directory.)
- Feature - Script execution timeout
preference.
- Feature - FTP upload as a new
destination for macro data exports in
Pro and Enterprise editions.
- Feature - User replacements on
source load and on export.
- Features - Refactoring, dozens of additional features, new scraper directives.
- Feature - Enterprise edition:
Scraper directives: #nextPageReferrer#,
#skipIfIn#queryDirectory#,
#deduplicateWithinPage#
- Feature - Enterprise edition: Scraper click commands to be used in the Replace column (see the sketch after this version's notes): #CLICK-ID#nodeID#,
#CLICK-SELECTOR#cssSelector#,
#CLICK-SELECTOR-FIRST-NODE#cssSelector#,
#CLICK-SELECTOR-LAST-NODE#cssSelector#,
#CLICK-SELECTOR-FIRST-LINK#cssSelector#,
#CLICK-SELECTOR-LAST-LINK#cssSelector#,
#CLICK-SELECTOR-ALL#cssSelector#,
#CLICK-SELECTOR-NEXT-NODE#cssSelector#,
#CLICK-CLASS-ALL#cssClass#,
#CLICK-CLASS#cssClass#,
#CLICK-CLASS-FIRST-NODE#cssClass#,
#CLICK-CLASS-LAST-NODE#cssClass#,
#CLICK-CLASS-FIRST-LINK#cssClass#,
#CLICK-CLASS-LAST-LINK#cssClass#,
#CLICK-CLASS-NEXT-NODE#cssClass#,
#CLICK-CLASS-NEXT-LINK#cssClass#
- Enhancement - Verification of
profile files and config consistency at
startup and correction of known possible
problems.
- Enhancement - Handles combined Fast
Dig and Browse with the 'include
selected data' option on (or in macros
with 'catchData').
- Fix - Fixed blinking scrollbars on Macintosh in the scraper manager.
- Fix - Small fixes and enhancements
throughout the code.
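- Note - The #CLICK-SELECTOR...# and #CLICK-CLASS...# commands click page nodes matched by a CSS selector or class. For readers unfamiliar with the idea, the sketch below shows the equivalent operation in a generic browser-automation context; Selenium is used purely as a stand-in and is not part of OutWit Hub, and the selectors shown are hypothetical.

```python
# Generic illustration of "click the node(s) matching a CSS selector",
# the operation behind the #CLICK-SELECTOR...# commands. OutWit Hub
# performs such clicks inside its own browser.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://example.com")  # hypothetical page

# #CLICK-SELECTOR#cssSelector#: click the first matching node
driver.find_element(By.CSS_SELECTOR, "a.next-page").click()

# #CLICK-SELECTOR-ALL#cssSelector#: click every matching node
for node in driver.find_elements(By.CSS_SELECTOR, "button.load-more"):
    node.click()

driver.quit()
```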
- Version 4.1.2.18
- Fix - Fixed the contact extractor for emails like alt=name@example.com (see the sketch below).
- Fix - Several minor corrections
throughout the code.
- Enhancement - Refactoring and
optimization in scraper engine.
- Enhancement - Enhanced the paste links function.
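- Note - The contact extractor fix for addresses like alt=name@example.com amounts to matching the email even when it is glued to surrounding attribute text. The pattern below is a generic illustration of that kind of match, not the extractor's actual rules.

```python
# Match an email address even when it is embedded in attribute text
# such as alt=name@example.com. Generic pattern for illustration only.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

snippet = '<img alt=name@example.com src="photo.jpg">'
print(EMAIL.findall(snippet))  # ['name@example.com']
```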