Amazon Textract overview
The following is a brief overview of Amazon Textract.
Service
The Textract service is divided into two APIs:
Detect Document Text API: The Detect Document Text API uses optical character recognition (OCR) technology to extract text from a provided document.
Optical Character Recognition (OCR): Detects printed text and numbers in a scan or rendering of a document. It can be called through synchronous or asynchronous API operations, and the results are returned in JSON format. Synchronous operations suit single-page documents such as an image; asynchronous operations suit multipage documents.
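The two calling styles can be sketched in Python as follows. This is a minimal sketch, assuming the third-party boto3 SDK and a client created with boto3.client("textract"); the helper names, the polling interval, and the bucket and key arguments are illustrative and not part of the original setup. Polling is used here for brevity; Textract can also publish job completion to an SNS topic, which is not shown.

```python
import time


def detect_text_sync(textract, image_bytes):
    """Synchronous call: send one single-page image, get the Blocks back at once."""
    response = textract.detect_document_text(Document={"Bytes": image_bytes})
    return response["Blocks"]


def detect_text_async(textract, bucket, key, poll_seconds=5, max_polls=60):
    """Asynchronous call: start a job on a document stored in S3, then poll."""
    job = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state or we give up.
    for _ in range(max_polls):
        result = textract.get_document_text_detection(JobId=job_id)
        if result["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} did not finish in time")

    # Anything other than SUCCEEDED is treated as an error in this sketch.
    if result["JobStatus"] != "SUCCEEDED":
        raise RuntimeError(f"Textract job {job_id} ended as {result['JobStatus']}")

    # Large results are paginated; follow NextToken until it disappears.
    blocks = list(result["Blocks"])
    token = result.get("NextToken")
    while token:
        result = textract.get_document_text_detection(JobId=job_id, NextToken=token)
        blocks.extend(result["Blocks"])
        token = result.get("NextToken")
    return blocks


# Usage (requires boto3 and configured AWS credentials; names are placeholders):
#   import boto3
#   textract = boto3.client("textract")
#   blocks = detect_text_async(textract, "my-bucket", "scans/contract.pdf")
```

Passing the client in as an argument keeps both helpers easy to exercise with a stub before any real pages are billed.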
In the response, detected text is represented by Block objects; for example, "Hello, world." in the text "Hello, world. How are you?" is represented by Block objects.
Analyze Document API: The Analyze Document API extracts data from tables and key-value pairs from forms.
Key-Value Pair Extraction: Detects key-value pairs in document images automatically, retaining the inherent context of the document. Use synchronous or asynchronous operations to analyze text in a document. The results of the analysis are returned in JSON format.
Table Extraction: Preserves the composition of data stored in tables during extraction, so the extracted data can be loaded automatically into a database using a predefined schema.
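To make the JSON results more concrete, here is a minimal Python sketch of reading lines and key-value pairs out of the returned Block objects. It assumes the usual Block fields (Id, BlockType, Text, EntityTypes, Relationships); the function names are illustrative, and the sample data is hand-written and heavily trimmed, not real service output.

```python
def extract_lines(blocks):
    """Return the text of every LINE block, in response order."""
    return [b["Text"] for b in blocks if b["BlockType"] == "LINE"]


def child_text(block, by_id):
    """Join the text of the WORD blocks listed in a block's CHILD relationships."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_key_values(blocks):
    """Pair each KEY block with the text of the VALUE block(s) it points to."""
    by_id = {b["Id"]: b for b in blocks}
    pairs = {}
    for block in blocks:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value_parts = []
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_parts.extend(child_text(by_id[i], by_id) for i in rel["Ids"])
        pairs[child_text(block, by_id)] = " ".join(value_parts)
    return pairs


# Hand-written sample (assumption): one form field "Name: Ana".
SAMPLE_BLOCKS = [
    {"Id": "l1", "BlockType": "LINE", "Text": "Name: Ana"},
    {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Ana"},
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
]

print(extract_lines(SAMPLE_BLOCKS))       # ['Name: Ana']
print(extract_key_values(SAMPLE_BLOCKS))  # {'Name:': 'Ana'}
```

Tables are not covered here; TABLE and CELL blocks are linked through the same kind of CHILD relationships, so a similar walk should apply.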
Pricing
There are no minimum fees and no upfront commitments. Amazon Textract charges per page processed, and the price depends on whether only text is extracted or text together with tables and/or form data.
As AWS customers we have access to the APIs in a free tier for 12 months, with the following limits:
1,000 pages per month using the Detect Document Text API
100 pages per month using the Analyze Document API
The full price list is available at the following link:
https://aws.amazon.com/textract/pricing/
Testing
Below are the requirements to test the Amazon Textract API.
Dependencies:
Python 3.7
pip 19.03
Documentation:
https://docs.aws.amazon.com/cli/latest/userguide/install-linux.html#install-linux-pip
User configuration
Create an IAM user with the following permissions:
- AmazonTextractFullAccess
- AmazonSQSFullAccess
- Sufficient permissions to upload and read images from a bucket in S3
Documentation:
https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-started_create-admin-group.html
1. Install AWS CLI
The client can be installed on Linux, Windows, and macOS, inside a virtualenv, or with the bundled installer. We will test it on Ubuntu 16.04.
https://docs.aws.amazon.com/cli/latest/userguide/install-linux.html#install-linux-pip
Verify that the AWS CLI installed correctly.
2. Configure AWS CLI
Credentials sent to Manuel Navarro
User: manuel.navarro Access Key ID: [redacted] Secret Access Key: [redacted]
To select a region, check the Amazon Textract entry at the following link:
https://docs.aws.amazon.com/general/latest/gr/rande.html
Instructions
https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
3. Test by executing a list operation
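The same check can also be run from Python. The sketch below assumes the third-party boto3 SDK, which reads the credentials and region written by the AWS CLI configuration step, and assumes the IAM user is allowed to list S3 buckets; the function names are illustrative.

```python
def list_bucket_names(s3_client):
    """Run a simple list operation and return the bucket names it reports."""
    response = s3_client.list_buckets()
    return [bucket["Name"] for bucket in response.get("Buckets", [])]


def check_configuration():
    """Build a real S3 client from the configured credentials and list buckets."""
    import boto3  # third-party (pip install boto3); imported lazily on purpose

    return list_bucket_names(boto3.client("s3"))


# Usage (needs boto3 and configured credentials):
#   print(check_configuration())
```

If the call raises a credentials or access-denied error, the configuration or the IAM permissions most likely need another look.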
Related links:
https://docs.aws.amazon.com/aws-sdk-php/v3/api/class-Aws.Textract.TextractClient.html