- Version 9.0.0.8 - release 9.0
- Feature - Data refining tools: #decode()#: several algorithms were added to this function to decode an obfuscated or encrypted string extracted by the scraper line into plain text.
- Feature - Data refining tools: Compute: allows you to perform basic operations on selected numerical cells of a column.
- Feature - Data refining tools:
Delete columns > to the right:
deletes the selected column and all
additional columns to its right.
- Feature - Export: It is now
possible to export each scraped row to a
separate file.
- Feature - Scrapers: #clickOnNodes# instructs the scraper to click on page elements matching a CSS selector.
- Feature - Scrapers: #decode()#: several algorithms were added to this function to decode an obfuscated or encrypted string extracted by the scraper line into plain text.
- Feature - Scrapers: #EARLIEST# and #LATEST# allow you to return the first/last date matching the scraper line.
- Feature - Scrapers: #enableNodes# and #disableNodes# allow you to directly change the state of page elements matching a CSS selector.
- Feature - Scrapers: #ifURLContains#, #ifURLDoesNotContain# allow you to execute a scraper line or not, depending on the URL being scraped.
- Feature - Scrapers: #ignoreIfField#
instructs the scraper to ignore this
page or record if a field has a certain
value.
- Feature - Scrapers: #lowerCase()#, #upperCase()#, #properCase()#, #sentenceCase()# alter the case of the string extracted by a scraper line (see the sketch after this version's notes).
- Feature - Scrapers: #PAGESTATUS#
replacement function returns info on the
current page (errors, title...).
- Feature - Scrapers: #pressKey#
allows the scraper to simulate a key
press in certain cases.
- Feature - Scrapers: #select# adds elements matching a CSS selector to the selection in the current page.
- Feature - Scrapers: #setValue# now
also allows to check radio buttons,
checkboxes, etc.
- Feature - Scrapers: A ^ suffix in the description (myURLFieldName^) of a scraper line destined to extract a URL returns only the "top" URL in the hierarchy (out of example.com/products/shoes and example.com/products/, only the latter is returned).
- Feature - Scrapers: Multiple
required fields (descriptions ending
with "!") can now be interpreted as AND
or OR conditions.
- Enhancement - Contact recognition & filtering: implementation of job title recognition (debug stage, mostly English for now). Better elimination of example/bogus email addresses and phone numbers. Enhancements in name, company, address and copyright field recognition. Enhancements throughout the program in first/last name splitting and in physical address splitting. Better handling of obfuscated email addresses.
- Enhancement - Data refining tools:
Clean up > Normalize All Figure:
enhanced and optimized.
- Enhancement - Improved date recognition, including dates without a year.
- Enhancement - Export: performance
enhancements in export functions.
- Enhancement - The application
window is resized at launch if it
exceeds the dimensions of the screen.
- Enhancement - Updated the list of
User Agents in the advanced preferences.
- Fix - Fixed source colorizing
problems in the case of line breaks
inside HTML tags.
- Fix - Multiple fixes and
enhancements in String Generation
functions.
- Fixes - Many more enhancements and
fixes throughout the code.
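- Note - The case functions above (#lowerCase()#, #upperCase()#, #properCase()#, #sentenceCase()#) operate on the string extracted by a scraper line. OutWit Hub's implementation is internal; the minimal Python sketch below only illustrates what such transforms conventionally do, assuming "proper case" means title case and "sentence case" means capitalizing the first letter of each sentence.

```python
# Minimal sketch of the conventional meaning of the case transforms.
# Assumes "proper case" = title case and "sentence case" = first
# letter of each sentence capitalized; the Hub's own rules may differ.
import re

def lower_case(s: str) -> str:
    return s.lower()

def upper_case(s: str) -> str:
    return s.upper()

def proper_case(s: str) -> str:
    return s.title()

def sentence_case(s: str) -> str:
    s = s.lower()
    return re.sub(r"(^|[.!?]\s+)([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), s)

sample = "OUTWIT hub extracted THIS string. another sentence."
print(lower_case(sample))     # outwit hub extracted this string. another sentence.
print(upper_case(sample))     # OUTWIT HUB EXTRACTED THIS STRING. ANOTHER SENTENCE.
print(proper_case(sample))    # Outwit Hub Extracted This String. Another Sentence.
print(sentence_case(sample))  # Outwit hub extracted this string. Another sentence.
```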
- Version 8.0.0.57
- Feature - The Search Engine Query Builder (Tools menu and toolbar button) allows you to create complex queries for the main search engines (see the sketch after this version's notes).
- Feature - alt-shift-click on a page
in the browser forces the reapplication
of the current extractor. A useful
alternative to changing the preferences
in some AJAX pages where small
alterations to the rendered page do not
trigger the extraction.
- Enhancement - Updated list of
pre-defined User Agents in the
preferences with recent devices and
browsers.
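- Note - The Query Builder's exact output is not documented here; the sketch below is only a hypothetical illustration of the kind of query such a builder assembles, combining standard operators (quoted phrase, site:, exclusion) into a single search URL.

```python
# Hypothetical illustration of assembling an advanced search query
# and turning it into a search URL. Standard operators are shown;
# the Query Builder's actual output format may differ.
from urllib.parse import urlencode

def build_query(phrase: str, site: str = None, exclude: str = None) -> str:
    parts = [f'"{phrase}"']
    if site:
        parts.append(f"site:{site}")
    if exclude:
        parts.append(f"-{exclude}")
    return " ".join(parts)

query = build_query("outwit hub scraper", site="example.com", exclude="forum")
print(query)  # "outwit hub scraper" site:example.com -forum
print("https://www.google.com/search?" + urlencode({"q": query}))
```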
- Version 8.0.0.46 - release 8.0
- Feature - #emptyDirectory# Empties
the first directory of the queries view
matching the passed name.
- Feature - #splitField# Splits the
passed field as a post-process, using
the values in the separator and labels
columns. (Can allow consecutive splits.)
- Feature - #decodeEntities# Decodes HTML entities (like &amp; or &gt;) to their plain text equivalent. (This and the other string helpers of this release are illustrated in the sketch after this version's notes.)
- Feature - #decodeURL# Decodes URL-encoded characters (like %20) to their plain text equivalent.
- Feature - #save# Saves the string
extracted by the scraper line to a
separate text file.
- Feature - #screenshot# Saves a
screenshot of the current page into a
file using the passed file name.
- Feature - #hideNodes# Makes the nodes matching the passed CSS selector invisible.
- Feature - #scrollBy# Scrolls the
page loaded in the OutWit Hub browser by
the passed number of pixels.
- Feature - #resetPrefOnStop# Resets the passed preference to its default value at the end of the scrape process.
- Feature - #uniqueField# Makes sure that no duplicate values are extracted for the specified field(s) during the same exploration. (An alternative way to deduplicate while scraping, for cases where volumes are too large to post-process.)
- Feature - #setValue# Sets the value of the <select> or <input> HTML element matching the format column to the value passed in the replace column.
- Feature - #restartEvery# Sets the 'auto-explore on startup' flag to true and restarts the application every n pages or seconds.
- Feature - #uncheckURLInQuery#
Unchecks the 'OK' checkbox of the first
line containing the current URL in the
passed query directory.
- Feature - #uncheckItemInQuery#
Unchecks the 'OK' checkbox of the first
line containing the string extracted by
the scraper line in the passed query
directory.
- Enhancement - It is now possible to
set the field name with a variable in
the #default# directive.
- Feature - #readFromQueries# Reads
the next active string from the passed
query directory and stores its value in
the passed variable, then unchecks the
line in the query directory.
- Feature - #switchTo# Changes the
current view to the value set in the
replace column.
- Feature - #reapply# now accepts
parameters for the number of
applications and the delay between them.
- Feature - #adler32()# Used in the
replacement column, allows you to
generate a short hash from the string
extracted by the scraper line. (This can
be useful for deduplication although it
is not 100% reliable as, even if it is
unlikely, two different strings can
result in the same hash.)
- Feature - #encodeBase64()#, #decodeBase64()# Convert the string extracted by the scraper line into a base64-encoded string or decode it back into plain text.
- Feature - #decode()# Decodes the
string extracted by the scraper line
into plain text, trying several
algorithms.
- Feature - #unique()# Only returns the string extracted by the scraper line if the value is unique during the same exploration. (An alternative way to deduplicate while scraping, for cases where volumes are too large to post-process.)
- Feature - #WEEK# was added to the
time variables. Returns the week number
in the year.
- Feature - #LAST-POST-QUERY# returns the last POST query sent. #LAST-POST-QUERY#param# returns the value of the passed parameter in the last POST query sent.
- Feature - Several tools were added
to the right-click menu on datasheets:
Insert Index Column, Duplicate Column,
Indexed Duplicate Column, Copy from
Column..., Select if in...
- Feature - When scraping a self-updating AJAX page, the #reapply# directive now allows you to run the extraction n times at the frequency you choose.
- Enhancement - Faster startup preparation and end-of-process cleanup in large-volume Fast Scrapes.
- Enhancement - The contact recognition module and its dictionary were enhanced; lax recognition and the elimination of dummy email addresses were improved.
- Enhancement - Improved the dictionary of multilingual words, acronyms and roots frequently used in company names, addresses, etc., to enhance recognition.
- Fixes - Many more enhancements
and fixes throughout the code.
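- Note - Several of the string helpers in this release (#decodeEntities#, #decodeURL#, #encodeBase64()#/#decodeBase64()#, #adler32()#) correspond to standard operations. The Python sketch below shows those generic equivalents purely as an illustration; it is not OutWit Hub's internal code.

```python
# Generic equivalents of #decodeURL#, #decodeEntities#,
# #encodeBase64()# / #decodeBase64()# and #adler32()#.
import base64
import html
import zlib
from urllib.parse import unquote

extracted = "Tom%20%26%20Jerry &amp; friends"

print(unquote(extracted))        # #decodeURL#      -> Tom & Jerry &amp; friends
print(html.unescape(extracted))  # #decodeEntities# -> Tom%20%26%20Jerry & friends

encoded = base64.b64encode(extracted.encode("utf-8")).decode("ascii")
print(encoded)                                    # #encodeBase64()#
print(base64.b64decode(encoded).decode("utf-8"))  # #decodeBase64()#

# #adler32()#: a short checksum usable as a deduplication key;
# distinct strings can occasionally produce the same hash.
print(zlib.adler32(extracted.encode("utf-8")))
```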
- Version 7.0.0.56
- Feature - (Expert & Enterprise)
Added 'Duplicate Column', 'Insert Index
Column', etc., to the right-click menu
on datasheets.
- Fixes - Many minor fixes and
optimizations.
- Fix - There was a regression in 7.0.0.55 that could prevent correct scraping in Fast mode. This was fixed in 7.0.0.56.
- Version 7.0.0.36 - release 7.0
- Feature - #exportEvery#n#, #exportAndDeleteEvery#n#, #catchEvery#n#, #catchOnStop# Catch or export the extracted data whenever you want during the process.
- Feature - #abortAfter#, #abortAfterNPages#n#, #abortAfterNResults#n# Abort the current extraction after a given text is found or a certain number of pages or results has been reached.
- Feature - #decodeJSCharcodes#, #zapGremlins# to decode hexadecimal JavaScript character codes, remove unwanted control or invisible characters, correct badly encoded characters, etc.
- Feature - #clearForms#,
#clearAllHistory#,
#clearBrowsingHistory#, #clearCookie#,
#clearCookieEvery#n#, #clearCookieIf#,
#clearCookiesEvery#n#, #clearCookiesIf#,
#clearCookiesIfNot# allow you to manage
history and cookies from within the
scraper.
- Feature - Use #autoEmpty#,
#autoCatch#, #emptyOnDemand#,
#deduplicate# to set the value of the
scraped view options from a scraper.
- Feature - #keepForms#,
#removeScripts#, #removeTags#,
#allFrames#, #originalHTML#... allow you
to determine exactly how you want the
source before the scraper is applied.
- Feature - #replaceInField#fieldName# replaces a value (literal or RegExp) in a given field at the end of the process.
- Feature - #fieldGroup# Makes sure that the field indexes in the same group are incremented together even if some of the fields are empty.
- Feature - #oneRow# Makes sure that
all extracted data in the page will be
presented as a single row in the
datasheet.
- Feature - #allowCrossDomain# removes cross-domain JavaScript restrictions, which is sometimes useful to simulate clicks and other interactions with the page.
- Feature - #rename#, #unzip# give
you post-processing access to files that
you have downloaded.
- Feature - The #match()# function, used in the replacement column, allows you to search for other occurrences of a string (or matches of a RegExp) that you grab (or build) from the page itself. It allows very powerful conditional extractions.
- Feature - The syntax myFieldName<n, myFieldName>n, myFieldName= in the description column allows you to manage multiple results, duplicates, etc.
- Feature - Used in the Enterprise
edition, the #exportAndDeleteEvery#n#
scraping directive can define an SQLite
database as the destination (using a
filename with the .sqlite extension),
allowing you to process and store
extremely large volumes of data.
- Feature - You can now save (and
restore) the state of preferences in
(from) a directory of the queries view.
- Features - With the #HEADER# keyword added to the POST query format, you can add custom parameters to the header of the query. #CHARSET# defines the encoding, #TYPE# the content type, and #REFERER# the referrer (see the sketch after this version's notes).
- Features - Plus a large number of
enhancements and features which are not
listed here.
- Enhancement - Improved recognition and extraction of RSS feeds, publication dates in more locales, addition of a universal identifier (GUID)...
- Enhancement - The contact recognition module was further enhanced; lax recognition and the elimination of dummy email addresses were improved.
- Enhancement - A large dictionary of multilingual words, acronyms and roots frequently used in company names, addresses, etc. was added to enhance recognition.
- Editions - OutWit Hub still comes in three different editions (license levels): Pro, Expert and Enterprise, but we now propose a streamlined version of the Hub that can do Web explorations and run scrapers but has no editing capabilities. Don't hesitate to enquire about this on the customer support system.
- Fixes - Many enhancements and fixes
throughout the code.
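- Note - The #HEADER#, #CHARSET#, #TYPE# and #REFERER# keywords map onto ordinary HTTP request headers. The sketch below, using Python's requests library purely as a stand-in (OutWit Hub sends these from the POST query format, not from Python, and the endpoint shown is hypothetical), illustrates what they correspond to at the HTTP level.

```python
# What #TYPE#, #CHARSET# and #REFERER# amount to at the HTTP level:
# extra headers attached to the POST request.
import requests

url = "https://example.com/search"   # hypothetical endpoint
payload = {"q": "outwit", "page": "1"}

headers = {
    # #TYPE# + #CHARSET#
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    # #REFERER#
    "Referer": "https://example.com/",
    # a custom parameter added through #HEADER#
    "X-Custom-Header": "demo",
}

response = requests.post(url, data=payload, headers=headers, timeout=30)
print(response.status_code)
```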
- Version 6.0.0.72
- Feature - The text size control
from the View menu now increases or
decreases the size of the page text as
well as the extracted data.
- Fixes - Many fixes, in particular to inline editing in sorted datasheets and managers, and in the scrollToEnd function.
- Version 6.0.0.51 - Release 6.0
- Editions - New Expert Edition: OutWit Hub now comes in three different editions: Pro, Expert and Enterprise. Expert is single-user and contains all features that were reserved for the Enterprise edition until version 5.0. Enterprise now allows several users or instances to share common automators.
- Feature - (Expert & Enterprise editions) #suspend#n#, #suspendIf#n#, #suspendIfNot#n#: added a parameter to wait for n seconds before resuming when the OK button is clicked. (Useful to give the user time to interact with the page, solve a captcha, etc.)
- Feature - (Expert & Enterprise editions) #firstName(string)#, #lastName(string)#, #firstLastName(string)#, #gender(string)#: try to find the most likely first name, last name, first & last name, or gender in the passed full name string.
- Feature - Pro users can now
organize their automators (scrapers,
macros, jobs, queries), grouping them by
projects.
- Features - A large series of directives and functions was added to the Pro version: #autoEmpty#, #autoCatch#, #emptyOnDemand#, #deduplicate#, #default#, #default#fieldName#, #pauseBefore#, #checkIfURL# and #checkIfNotURL#, #encodeURL()#, #SECOND# ... #FIFTH#, #LOCALIP#.
- Features - New Directives were
added to Expert & Enterprise
editions: #scope# (outside or within
domain, all links or with a depth of 1
or 2),
#deduplicateOnStop#criterionColumnName#,
#deduplicateWithinPage#,
#scrollToEnd#cssSelector#...
- Feature - (Expert & Enterprise editions) Added a preference to create an additional Gender column when using the Insert First/Last Name function in the right-click menu. The column contains the string defined in the preference (like "Dear Mr", "Dear Ms") when the gender is recognized and a fallback value (like "Dear Customer") otherwise.
- Feature - (Expert & Enterprise editions) The words view now includes a text box where you can type or paste the words to count in the page.
- Feature - Added a preference to instruct the program to check the page header before loading it, in order to avoid errors and login dialogs that could block an automatic exploration.
- Enhancement - The Scroll to End
directive was enhanced to work in more
AJAX pages.
- Enhancement - The email recognition module now allows for diacritic characters; more dummy email addresses (user@example.com...) are eliminated; lax recognition (jackie at mysite dot com...) is much more efficient (see the sketch after this version's notes).
- Enhancement - The export module was
refactored and optimized in v6.0, fixing
bugs, enhancing data cleaning and
performance and adding features like
additional preference settings for SQL
exports, VARCHAR(xxx)....
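- Note - Lax email recognition ("jackie at mysite dot com") can be pictured as a normalization pass before ordinary email matching. The Python sketch below only illustrates that idea under this assumption; the actual module's rules (diacritics, dummy-address filtering...) are more elaborate.

```python
# Sketch of "lax" email recognition: normalize obfuscated addresses
# such as "jackie at mysite dot com" before matching. Illustration
# only; the real recognition module is far more thorough.
import re

def deobfuscate(text: str) -> str:
    text = re.sub(r"\s+(?:at|\(at\)|\[at\])\s+", "@", text, flags=re.IGNORECASE)
    text = re.sub(r"\s+(?:dot|\(dot\)|\[dot\])\s+", ".", text, flags=re.IGNORECASE)
    return text

def find_emails(text: str):
    return re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", deobfuscate(text))

print(find_emails("Contact jackie at mysite dot com or bob@example.org"))
# ['jackie@mysite.com', 'bob@example.org']
```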
- Version 5.0.1.57
- Feature - Added #checkIfURL# and #checkIfNotURL# scraping directives for extraction conditions on the current URL.
- Fix - Fixes in #abortIf#, #abortIfNot# and #abortAfter#.
- Version 5.0.1.42
- Feature - It is now possible to use
a multiple character string as the
CONCAT separator.
- Feature - Added preference to name
the fields in the queries of SQL
exports.
- Feature - Added #MaxColumns#
directive to limit the number of columns
in the extracted data.
- Fix - Fixed stalling explorations in certain cases when the server did not answer.
- Fix - #REQUESTED-URL# works in more
cases.
- Fixes - several fixes and
optimizations in contact extractions on
large lists of URLs.
- Enhancement - Enhancements and
fixes in #suspendIf# and #formatDate()#.
- Version 5.0.1.9
- Enhancement - Modified the 'Zap Gremlins' preference for scrapers so that it doesn't remove non-Latin characters (see the sketch after this version's notes).
- Fix - Fixed an export problem on columns with a % in the header.
- Fix - Made #showAlert# work even if it is the only line in the scraper.
- Fixes - Various fixes in scrapers
and macros.
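- Note - "Zapping gremlins" usually means stripping control and invisible characters; the point of the change above is that legitimate non-Latin text is kept. The sketch below illustrates that general idea only (an assumption, not the preference's actual code).

```python
# Remove control and zero-width characters while keeping legitimate
# non-Latin text. A sketch of the general idea, not the Hub's code.
import re

GREMLINS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f\u200b\u200c\u200d\ufeff]")

def zap_gremlins(text: str) -> str:
    return GREMLINS.sub("", text)

print(zap_gremlins("Tokyo\u200b \x07 東京 café"))  # Tokyo  東京 café
```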
- Version 5.0.0.294
- Feature - Added a preference to set
the list of fields in INSERT
instructions for SQL exports.
- Feature - Added preference "Proceed
If Page Contains / Does Not Contain".
- Feature - Now resolves generation patterns to their first value when typed in the address bar.
- Feature - Automatic corrections are
now performed when pasting URLs into a
directory of queries.
- Fix - Removed save dialog on
command line execution with -url
parameter and a MAU.
- Fix - Corrected header problem that
could occur when editing a cell in table
view.
- Fix - Corrected #showAlert#, which did not execute when there was no other line in the scraper.
- Enhancements - Enhancements and
fixes in contact recognition and
extraction.
- Enhancements - Enhanced queue
performance and fast scraping on very
large numbers of URLs.
- Fixes - Many performance and
security enhancements and fixes.
- Version 5.0.0.239
- Feature - Max number of retries can
now be set in the Exploration preference
panel.
- Feature - New format, added milliseconds, and some additional changes in date evaluation.
- Feature - Now forcing contact
column extraction if unhidden in column
picker (if applicable).
- Feature - Browse and fast contacts (all links) are implemented in all editions.
- Feature - Added replacement
functions #MACHINE-NAME# (set in the
preferences) and
#RANDOM-PHRASE#[adjective] [character]#
to generate random strings.
- Fix - Fixed automator selection in
tutorials that could lead to editing the
wrong scraper during the execution of
the tutorial.
- Fix - Corrected a rare problem
which could cause the application to
start in full screen mode.
- Fix - Fixes in first name
recognition (removed short ambiguous
names & corrected a recent regression).
- Fix - Fixed #DISTINCT-COUNT# which
was not creating two columns.
- Fix - Multiple fixes in contacts.
- Fix - Fix for oversized query
directories which could be truncated and
rendered unusable if RAM was not large
enough (could only happen in extreme
cases of many hundreds of thousands or
millions of items).
- Enhancement - Now corrects URLs sent to (or pasted in) queries without a protocol, adding http:// (see the sketch after this version's notes).
- Enhancement - Allowed slash
character in some phone numbers (mostly
for Belgian phone formats).
- Enhancement - Fixes and
enhancements in text cleaning and
script/style/comment removal.
- Enhancement - Made #ignoreErrors#
work even for timeouts, unreachable, no
data....
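- Note - The URL correction mentioned above (adding http:// to protocol-less URLs) is a simple normalization. A minimal sketch of the idea; the Hub may apply further corrections beyond this.

```python
# Prepend http:// when a pasted URL has no scheme.
import re

def normalize_url(url: str) -> str:
    url = url.strip()
    if not re.match(r"^[a-zA-Z][a-zA-Z0-9+.\-]*://", url):
        url = "http://" + url
    return url

print(normalize_url("example.com/products"))  # http://example.com/products
print(normalize_url("https://example.com"))   # unchanged
```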
- Version 5.0.0.127
- Feature - Added preference to
prevent link extraction in Enterprise
edition (useful when loading extremely
large documents into OutWit Hub).
- Feature - Added epub format to
document recognition in documents view.
- Enhancement - Recognizes more Next
Page links automatically.
- Enhancement - Tested and enhanced upgrade/update/downgrade functions in a large number of configurations.
- Known Issue - Not signed for Firefox 43+. Can only be installed as an add-on to Firefox 43+ if the preference named xpinstall.signatures.required is set to false.
- Version 5.0.0.107 - Release 5.0
- Feature - Directive library and help in the scraper editor right-click menu.
- Feature - Recognition of "onclick"
javascript links as Next Page links.
- Feature - First implementation of
selectors in the bottom panels.
- Feature - 'Show in source' function
from the browser. (Right-click on the
page.)
- Feature - Split directory function
in the queries view. (Right-click on a
directory.)
- Feature - Script execution timeout
preference.
- Feature - FTP upload as a new
destination for macro data exports in
Pro and Enterprise editions.
- Feature - User replacements on
source load and on export.
- Features - Refactoring, dozens of additional features, new scraper directives.
- Feature - Enterprise edition:
Scraper directives: #nextPageReferrer#,
#skipIfIn#queryDirectory#,
#deduplicateWithinPage#
- Feature - Enterprise edition: Scraper click commands to be used in the Replace column (see the sketch after this version's notes): #CLICK-ID#nodeID#,
#CLICK-SELECTOR#cssSelector#,
#CLICK-SELECTOR-FIRST-NODE#cssSelector#,
#CLICK-SELECTOR-LAST-NODE#cssSelector#,
#CLICK-SELECTOR-FIRST-LINK#cssSelector#,
#CLICK-SELECTOR-LAST-LINK#cssSelector#,
#CLICK-SELECTOR-ALL#cssSelector#,
#CLICK-SELECTOR-NEXT-NODE#cssSelector#,
#CLICK-CLASS-ALL#cssClass#,
#CLICK-CLASS#cssClass#,
#CLICK-CLASS-FIRST-NODE#cssClass#,
#CLICK-CLASS-LAST-NODE#cssClass#,
#CLICK-CLASS-FIRST-LINK#cssClass#,
#CLICK-CLASS-LAST-LINK#cssClass#,
#CLICK-CLASS-NEXT-NODE#cssClass#,
#CLICK-CLASS-NEXT-LINK#cssClass#
- Enhancement - Verification of
profile files and config consistency at
startup and correction of known possible
problems.
- Enhancement - Handles combined Fast
Dig and Browse with the 'include
selected data' option on (or in macros
with 'catchData').
- Fix - Fixed blinking scrollbars on Macintosh in the scraper manager.
- Fix - Small fixes and enhancements
throughout the code.
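- Note - The #CLICK-SELECTOR...# and #CLICK-CLASS...# commands click page nodes matched by a CSS selector or class. For readers unfamiliar with the idea, the sketch below shows the equivalent operation in a generic browser-automation context; Selenium is used purely as a stand-in and is not part of OutWit Hub, and the selectors shown are hypothetical.

```python
# Generic illustration of "click the node(s) matching a CSS selector",
# the operation behind the #CLICK-SELECTOR...# commands. OutWit Hub
# performs such clicks inside its own browser.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://example.com")  # hypothetical page

# #CLICK-SELECTOR#cssSelector#: click the first matching node
driver.find_element(By.CSS_SELECTOR, "a.next-page").click()

# #CLICK-SELECTOR-ALL#cssSelector#: click every matching node
for node in driver.find_elements(By.CSS_SELECTOR, "button.load-more"):
    node.click()

driver.quit()
```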
- Version 4.1.2.18
- Fix - Fixed the contact extractor for emails like alt=name@example.com (see the sketch below).
- Fix - Several minor corrections
throughout the code.
- Enhancement - Refactoring and
optimization in scraper engine.
- Enhancement - Enhanced the paste links function.
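- Note - The contact extractor fix for addresses like alt=name@example.com amounts to matching the email even when it is glued to surrounding attribute text. The pattern below is a generic illustration of that kind of match, not the extractor's actual rules.

```python
# Match an email address even when it is embedded in attribute text
# such as alt=name@example.com. Generic pattern for illustration only.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

snippet = '<img alt=name@example.com src="photo.jpg">'
print(EMAIL.findall(snippet))  # ['name@example.com']
```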