Athento SE, Athento's document capture module, can extract the textual content of images already stored in the repository (Athento ECM or another ECM solution) as well as images captured through the module itself. As a result, documents can be found by searching for text contained in their images. This is achieved by combining OCR, which extracts the text from the image, with full-text indexing, which makes that text searchable.
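To illustrate how OCR output becomes searchable, here is a minimal sketch of a full-text (inverted) index in Python. It is not Athento's implementation: the OCR step is assumed to have already run, and the document ids, sample texts, and the function names `tokenize`, `build_index`, and `search` are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(ocr_texts):
    """Map each token to the set of document ids whose OCR text contains it."""
    index = defaultdict(set)
    for doc_id, text in ocr_texts.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results


# Hypothetical OCR output, keyed by a made-up document id.
ocr_texts = {
    "scan-001.tif": "Invoice No. 2024-117 issued to Acme Ltd",
    "scan-002.tif": "Delivery note for Acme Ltd, 3 pallets",
}
index = build_index(ocr_texts)
print(search(index, "acme invoice"))  # {'scan-001.tif'}
```

A production system would normally delegate this to a dedicated search engine; the sketch only shows why text recovered by OCR can be matched by a query once it has been indexed.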
Athento can also perform field or metadata extraction. Some of the available methods for data extraction are:
- Document text extraction (OCR)
- Extraction of semantic tags in documents
- Extraction of people and other entities (semantics)
- Extraction of metadata using coordinates
- Extraction of metadata using regular expressions
- Extraction of QR codes and barcodes
- Extraction of metadata contained in tables
- Extraction of metadata using anchors (hOCR)
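
Two of the methods listed above, regular expressions and coordinates, can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not Athento's API: the field patterns, the sample text, the word boxes (the kind of per-word bounding boxes an OCR engine can emit, for example in hOCR output), and the function names `extract_by_regex` and `extract_by_zone` are all hypothetical.

```python
import re

# Hypothetical field patterns; real documents would need their own.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s+No\.?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "date": re.compile(r"\b(\d{4}-\d{2}-\d{2})\b"),
}


def extract_by_regex(text, patterns=FIELD_PATTERNS):
    """Return the first captured group of each pattern that matches the text."""
    fields = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        if match:
            fields[name] = match.group(1)
    return fields


def extract_by_zone(words, zone):
    """Join the words whose bounding box lies entirely inside the zone.

    words: list of (text, x0, y0, x1, y1) tuples in page coordinates.
    zone:  (x0, y0, x1, y1) rectangle in the same coordinates.
    """
    zx0, zy0, zx1, zy1 = zone
    inside = [
        w for w in words
        if w[1] >= zx0 and w[2] >= zy0 and w[3] <= zx1 and w[4] <= zy1
    ]
    # Reading order: top to bottom, then left to right.
    inside.sort(key=lambda w: (w[2], w[1]))
    return " ".join(w[0] for w in inside)


# Made-up OCR text and word boxes for a single page.
text = "Invoice No. INV-2024-117 Date: 2024-03-15 Total: 1,250.00 EUR"
words = [
    ("Invoice", 50, 40, 120, 60),
    ("Total:", 400, 700, 460, 720),
    ("1,250.00", 470, 700, 540, 720),
    ("EUR", 545, 700, 580, 720),
]

print(extract_by_regex(text))
# {'invoice_number': 'INV-2024-117', 'date': '2024-03-15'}
print(extract_by_zone(words, (460, 690, 600, 730)))
# 1,250.00 EUR
```

Regular expressions suit fields with a recognizable textual shape wherever they appear on the page, while coordinate zones suit fixed-layout forms where the value always sits in the same place.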