Digimind Main Text Extractor

Objective

Information content is rarely isolated on the web. It is usually embedded in a page made up of navigation menus, headers, footers and other surrounding elements. The problem is that the text of interest is mixed in with the HTML code and is therefore difficult to identify. Digimind Main Text Extractor extracts this information content automatically, with no additional programming required.

How it operates

Digimind Main Text Extractor analyzes the HTML code of each page it receives, repairs any malformed markup it detects, and then applies a set of algorithms based on topology and on the theory of normed vector spaces.
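The text does not say which topological or vector-space algorithms the extractor uses, so the sketch below does not reproduce them. It is a minimal illustration, in Python, of a different and widely used heuristic for the same task: split the page into blocks at block-level tags, then keep only the blocks that contain enough text and are not dominated by link text (menus, headers and footers tend to be short and link-heavy). The names `BlockCollector` and `extract_main_text`, the tag sets, and the thresholds `min_chars` and `max_link_density` are all hypothetical. Python's `html.parser` tolerates unclosed tags but, unlike the repair step described above, it does not rebuild a corrected document tree.

```python
from html.parser import HTMLParser

# Hypothetical choice of tags that start or end a text block.
BLOCK_TAGS = {
    "p", "div", "li", "td", "article", "section", "header", "footer",
    "nav", "aside", "h1", "h2", "h3", "h4", "h5", "h6",
}
# Tags whose contents are never visible text.
SKIP_TAGS = {"script", "style", "noscript"}


def _visible_chars(text):
    """Count non-whitespace characters."""
    return len("".join(text.split()))


class BlockCollector(HTMLParser):
    """Splits a page into text blocks at block-level tags and records how
    many of each block's characters sit inside <a> links.

    html.parser is lenient: unclosed or stray tags do not raise errors,
    but no corrected DOM tree is built either."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []        # list of (text, link_chars) tuples
        self._parts = []        # raw text pieces of the current block
        self._link_chars = 0    # visible chars inside <a> in this block
        self._link_depth = 0
        self._skip_depth = 0

    def _flush(self):
        # Collapse runs of whitespace and store the block if non-empty.
        text = " ".join("".join(self._parts).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._parts = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            self._link_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a":
            self._link_depth = max(0, self._link_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip_depth:
            return
        self._parts.append(data)
        if self._link_depth:
            self._link_chars += _visible_chars(data)

    def close(self):
        super().close()
        self._flush()  # keep trailing text that no block tag closed


def extract_main_text(html, min_chars=40, max_link_density=0.3):
    """Keep blocks that are long enough and not dominated by links.
    Both thresholds are illustrative defaults, not tuned values."""
    parser = BlockCollector()
    parser.feed(html)
    parser.close()
    kept = []
    for text, link_chars in parser.blocks:
        chars = _visible_chars(text)
        link_density = link_chars / chars if chars else 1.0
        if chars >= min_chars and link_density <= max_link_density:
            kept.append(text)
    return "\n\n".join(kept)


if __name__ == "__main__":
    page = """<nav><a href="/">Home</a> | <a href="/news">News</a></nav>
<p>Information content is rarely isolated on the web: it sits among menus,
headers and footers.
<p>This second paragraph is unclosed, yet it is still picked up as a block.
<footer><a href="/legal">Legal notice</a></footer>"""
    print(extract_main_text(page))
```

This sketch scores each block in isolation and ignores where the block sits in the document tree, which is an obvious limitation compared with a structure-aware approach.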