
EVALTEC, Management of Research and Technological Development

Project management - Commercialisation of Technologies - Internationalisation


Text Mining


Text mining is the process of obtaining knowledge from large volumes of information, not necessarily structured, which includes comprehending the contents of the documents, that is, “understanding” them. The information can come from any source, such as databases, internal documents, the Internet, e-mail or news, and can be processed afterwards.


This means that tasks such as classifying documents, reading e-mail, summarising the contents of files and reports, and analysing news and comments can be performed automatically.


Phases in Text Mining


There are three main phases in the text mining process:

1 -   Extraction of the information.

2 -   Clustering or grouping.

3 -   Categorising.


In general, the working procedure starts with the extraction of the information, that is, the linguistic analysis of the primary sources. This phase requires prior knowledge of the language, the special characters and the terms related to the knowledge area to be analysed.
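
As an illustration of the extraction phase, the following is a minimal sketch in Python, not the method of any particular tool. The stop-word list, the domain vocabulary and the function names are assumptions made up for the example. The character class in the pattern shows where knowledge of the language's special characters comes in.

```python
import re
from collections import Counter

# Hypothetical stop-word list and domain vocabulary; real ones depend on
# the language and the knowledge area being analysed.
STOP_WORDS = {"the", "a", "of", "and", "is", "in", "to", "for"}
DOMAIN_TERMS = {"patent", "licence", "prototype"}


def extract_terms(text):
    """Lower-case the text, split it into word tokens and drop stop words."""
    # The character class includes accented letters so that words in
    # languages such as Spanish or French are not cut in half.
    tokens = re.findall(r"[a-záéíóúüñç]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_profile(text):
    """Count the extracted terms and pick out those in the domain vocabulary."""
    counts = Counter(extract_terms(text))
    domain_hits = {t: n for t, n in counts.items() if t in DOMAIN_TERMS}
    return counts, domain_hits


counts, hits = term_profile("The patent for the prototype is in the patent office.")
print(hits)
```

The resulting term counts are the kind of raw material that the later clustering and categorising phases can work on.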


Next, the information needs to be classified, either through already defined criteria or through a clusterer that suggests a number of groups and criteria to classify the information in an optimal way.
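
To make the idea of a clusterer concrete, here is a small Python sketch that suggests groups when no criteria have been defined in advance. It is a simple greedy heuristic based on Jaccard similarity, chosen for brevity; it does not guarantee an optimal grouping, and the threshold value, function names and sample documents are assumptions for illustration.

```python
def jaccard(a, b):
    """Share of terms that two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_clusters(docs, threshold=0.3):
    """Greedy single-pass clustering: each document joins the first cluster
    whose seed document is similar enough, otherwise it starts a new one."""
    clusters = []  # list of (seed_terms, [document names])
    for name, terms in docs.items():
        for seed, members in clusters:
            if jaccard(seed, terms) >= threshold:
                members.append(name)
                break
        else:
            # No existing cluster was close enough: start a new one.
            clusters.append((terms, [name]))
    return [members for _, members in clusters]


# Hypothetical documents, already reduced to sets of extracted terms.
docs = {
    "d1": {"patent", "licence", "royalty"},
    "d2": {"patent", "licence", "filing"},
    "d3": {"customer", "complaint", "invoice"},
    "d4": {"customer", "invoice", "refund"},
}
groups = suggest_clusters(docs)
print(groups)
```

The number of groups is not fixed beforehand; it follows from the threshold, which is the kind of criterion a clusterer would propose or tune.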


The categorisation process then assigns the information to the different clusters previously defined.
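
Categorising can be sketched in a few lines of Python once the groups exist. The example below assumes each group is described by a set of characteristic terms and assigns a document to the group with the largest overlap; the category names, vocabularies and the `categorise` function are hypothetical.

```python
# Hypothetical categories, each described by a set of characteristic terms.
CATEGORIES = {
    "intellectual_property": {"patent", "licence", "royalty", "filing"},
    "customer_service": {"customer", "complaint", "invoice", "refund"},
}


def categorise(terms, categories=CATEGORIES):
    """Return the category sharing the most terms with the document,
    or None when no category shares any term."""
    best, best_overlap = None, 0
    for name, vocabulary in categories.items():
        overlap = len(terms & vocabulary)
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    return best


label = categorise({"refund", "customer", "delay"})
print(label)
```

Returning None for documents that match no group keeps unrelated material out of the defined clusters instead of forcing it into the nearest one.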


Taxonomies and “cartridges”


Taxonomies are linguistic and conceptual structures that make up an area of knowledge.


A basic part of text mining tools is the “cartridge”. Each “cartridge” contains information on the criteria to be applied in the extraction, clustering and categorising processes. That is, it contains relationships, assigns relevance to certain expressions, and holds specific terms, modal verbs and judgements, or the typical structure in which information is presented. Cartridges enable the comprehension of written language in defined technical, commercial or knowledge areas. In summary, they are practical implementations of the taxonomies.
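
One way to picture a cartridge is as a small data structure that bundles domain criteria. The Python sketch below is only an assumption about how such a bundle could look; the `Cartridge` class, its fields and the sample weights are invented for the example and do not describe the format used by any specific product.

```python
from dataclasses import dataclass, field


@dataclass
class Cartridge:
    """Hypothetical container for the domain criteria a cartridge might hold."""
    domain: str
    term_weights: dict = field(default_factory=dict)  # relevance per expression
    synonyms: dict = field(default_factory=dict)      # variant -> canonical term

    def score(self, tokens):
        """Sum the weights of the canonical terms found among the tokens."""
        canonical = [self.synonyms.get(t, t) for t in tokens]
        return sum(self.term_weights.get(t, 0.0) for t in canonical)


# A made-up cartridge for a patents knowledge area.
patents = Cartridge(
    domain="patents",
    term_weights={"patent": 2.0, "claim": 1.5, "licence": 1.0},
    synonyms={"license": "licence", "patents": "patent"},
)
relevance = patents.score(["patents", "license", "office"])
print(relevance)
```

Swapping one cartridge for another changes the relationships and weights applied to the text without changing the surrounding extraction, clustering or categorising code.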


Text Mining applications


Therefore, the new text mining tools enable large savings in cost and time in processes such as analysing and mapping document collections, routing documentation, feeding knowledge databases, technology watch and information management in CRM centres.


As pointed out above, on the basis of these procedures, activities such as classifying documents, reading e-mail, summarising the contents of files and reports, and analysing news can be automated.



Source: Presentation by Dr Mrs Elicet Cruz ©2004 - IALE Tecnologia. Twitter: @ialeT



EVALTEC, Gestión de Investigación y Desarrollo Tecnológico, S.L. - C.I.F.: B-83399204 - ©2002-2021

Registered in the Madrid Mercantile Registry (Registro Mercantil de Madrid), Volume 18.001, Folio 56, Section 8, Sheet M-311087