The following is a brief overview of Amazon Textract.

...

Optical Character Recognition (OCR): Detects printed text and numbers in a scan or rendering of a document. Synchronous and asynchronous operations are available through the API, and the results are returned in JSON format. Synchronous operations suit single images, such as photos of posters or road signs; asynchronous operations suit multipage documents.
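As a minimal sketch of the synchronous path, assuming the boto3 SDK is installed and AWS credentials are configured, the snippet below calls the DetectDocumentText operation on a local image and collects the detected lines. The helper names extract_lines and detect_lines, and any file path passed in, are hypothetical and not part of the original notes.

```python
from typing import Any, Dict, List


def extract_lines(response: Dict[str, Any]) -> List[str]:
    """Return the text of every LINE block in a Textract response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_lines(image_path: str) -> List[str]:
    """Run synchronous OCR on a local image (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so extract_lines works without the SDK

    client = boto3.client("textract")
    with open(image_path, "rb") as image:
        response = client.detect_document_text(Document={"Bytes": image.read()})
    return extract_lines(response)
```

Calling detect_lines with the path of a JPEG or PNG file should return one string per detected line. Multipage documents would go through the asynchronous operations instead, which this sketch does not cover.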


[Image: Optical Character Recognition (OCR)]
The following diagram shows how the line "Hello, world." in the text "Hello, world. How are you?" is represented by Block objects.
[Diagram]
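To make the Block structure concrete, here is a small sketch that follows a LINE block's CHILD relationships to its WORD blocks. The sample data is hand-written in the shape of a Textract response; the Id values and the helper name words_of_line are made up for illustration.

```python
from typing import Any, Dict, List


def words_of_line(line: Dict[str, Any], blocks: List[Dict[str, Any]]) -> List[str]:
    """Follow a LINE block's CHILD relationships to the text of its WORD blocks."""
    by_id = {block["Id"]: block for block in blocks}
    words = []
    for relationship in line.get("Relationships", []):
        if relationship["Type"] != "CHILD":
            continue
        for child_id in relationship["Ids"]:
            child = by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return words


# Hand-written sample shaped like a Textract response; the Ids are invented.
sample_blocks = [
    {"Id": "line-1", "BlockType": "LINE", "Text": "Hello, world.",
     "Relationships": [{"Type": "CHILD", "Ids": ["word-1", "word-2"]}]},
    {"Id": "word-1", "BlockType": "WORD", "Text": "Hello,"},
    {"Id": "word-2", "BlockType": "WORD", "Text": "world."},
]

print(words_of_line(sample_blocks[0], sample_blocks))
```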

Analyze Document API: The Analyze Document API extracts data from tables and key-value pairs from forms.

Key-Value Pair Extraction: Detects key-value pairs in document images automatically, retaining the inherent context of the document. Use synchronous or asynchronous operations to analyze text in a document. The results of the text analysis are returned in JSON format.

...

[Image: Key-Value Pair Extraction]
The following diagram shows how the key-value pair "Name: Ana Carolina" is represented by Block objects.
[Diagram]
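The pairing can be sketched in code as follows. In an AnalyzeDocument response, a KEY_VALUE_SET block with entity type KEY links to its value block through a VALUE relationship, and both link to their WORD blocks through CHILD relationships. The sample data, Ids, and helper names below are hypothetical, and selection elements such as checkboxes are ignored to keep the sketch short.

```python
from typing import Any, Dict, List


def block_text(block: Dict[str, Any], by_id: Dict[str, Dict[str, Any]]) -> str:
    """Join the WORD children of a block into a single string."""
    words = []
    for relationship in block.get("Relationships", []):
        if relationship["Type"] == "CHILD":
            for child_id in relationship["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def key_value_pairs(blocks: List[Dict[str, Any]]) -> Dict[str, str]:
    """Map each KEY block's text to the text of its linked VALUE block(s)."""
    by_id = {block["Id"]: block for block in blocks}
    pairs = {}
    for block in blocks:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value_text = ""
        for relationship in block.get("Relationships", []):
            if relationship["Type"] == "VALUE":
                value_text = " ".join(
                    block_text(by_id[value_id], by_id)
                    for value_id in relationship["Ids"]
                )
        pairs[block_text(block, by_id)] = value_text
    return pairs


# Hand-written sample shaped like an AnalyzeDocument response; Ids are invented.
sample_blocks = [
    {"Id": "key-1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["value-1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "value-1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Ana"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Carolina"},
]

print(key_value_pairs(sample_blocks))
```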

Table Extraction: Preserves the composition of data stored in tables during extraction, so the extracted data can be loaded automatically into a database using a predefined schema.

...

[Image: Table Extraction]
The following diagram shows how a single cell in a table is represented by Block objects.
[Diagram]
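A rough sketch of how table cells can be read back: a TABLE block lists its CELL blocks through a CHILD relationship, each CELL carries a 1-based RowIndex and ColumnIndex, and the cell text comes from its WORD children. The sample data, Ids, and helper names are invented for illustration; merged cells are not handled.

```python
from typing import Any, Dict, List


def cell_text(cell: Dict[str, Any], by_id: Dict[str, Dict[str, Any]]) -> str:
    """Join the WORD children of a CELL block into a single string."""
    words = []
    for relationship in cell.get("Relationships", []):
        if relationship["Type"] == "CHILD":
            for child_id in relationship["Ids"]:
                child = by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def table_rows(table: Dict[str, Any], blocks: List[Dict[str, Any]]) -> List[List[str]]:
    """Rebuild a TABLE block as a list of rows, using each cell's indexes."""
    by_id = {block["Id"]: block for block in blocks}
    cells = {}
    for relationship in table.get("Relationships", []):
        if relationship["Type"] != "CHILD":
            continue
        for child_id in relationship["Ids"]:
            cell = by_id[child_id]
            if cell["BlockType"] == "CELL":
                position = (cell["RowIndex"], cell["ColumnIndex"])
                cells[position] = cell_text(cell, by_id)
    if not cells:
        return []
    row_count = max(row for row, _ in cells)
    column_count = max(column for _, column in cells)
    return [
        [cells.get((row, column), "") for column in range(1, column_count + 1)]
        for row in range(1, row_count + 1)
    ]


# Hand-written sample shaped like an AnalyzeDocument response; Ids are invented.
sample_blocks = [
    {"Id": "table-1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["cell-1", "cell-2"]}]},
    {"Id": "cell-1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "cell-2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Name"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Ana"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Carolina"},
]

print(table_rows(sample_blocks[0], sample_blocks))
```

Rows in this shape can then be mapped onto a database schema, which is the use case described above.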

Pricing

There are no minimum fees and no upfront commitments. Amazon Textract charges per page processed, and the rate depends on whether we extract only text from documents or text with tables and/or form data.

...

Dependencies:

Python 3.7

pip 19.03

Documentation for installing the dependencies:

https://serverfault.com/questions/918335/best-way-to-run-python-3-7-on-ubuntu-16-04-which-comes-with-python-3-5

https://docs.aws.amazon.com/cli/latest/userguide/install-linux.html#install-linux-pip


User configuration

Create an IAM user with the following permissions:

  • AmazonTextractFullAccess
  • AmazonSQSFullAccess
  • Sufficient permissions to upload and read images from a bucket in S3
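If the user is created from a script rather than the console, the setup could look roughly like the sketch below. It assumes boto3 is installed and that the caller already has IAM administrative rights; the function names are hypothetical. Only the two AWS managed policies named above are attached, and the S3 permissions are left out because the bucket is not specified here.

```python
from typing import List

# AWS managed policies named in the list above.
MANAGED_POLICIES = ["AmazonTextractFullAccess", "AmazonSQSFullAccess"]


def policy_arns(names: List[str]) -> List[str]:
    """Build the ARNs of AWS managed policies from their names."""
    return ["arn:aws:iam::aws:policy/" + name for name in names]


def create_textract_user(user_name: str) -> None:
    """Create an IAM user and attach the managed policies (needs boto3 and admin rights)."""
    import boto3  # imported lazily so policy_arns works without the SDK

    iam = boto3.client("iam")
    iam.create_user(UserName=user_name)
    for arn in policy_arns(MANAGED_POLICIES):
        iam.attach_user_policy(UserName=user_name, PolicyArn=arn)
```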

Documentation :

https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-started_create-admin-group.html


1. Install AWS CLI

The AWS CLI can be installed on Linux, Windows, and macOS, in a virtualenv, or with the bundled installer. We will test it on Ubuntu 16.04.

https://docs.aws.amazon.com/cli/latest/userguide/install-linux.html#install-linux-pip

Verify that the AWS CLI installed correctly.
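One way to run this check from Python, as a small sketch: look for the aws executable on the PATH and print what aws --version reports. The function name aws_cli_version is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def aws_cli_version() -> Optional[str]:
    """Return the output of `aws --version`, or None if the CLI is not on PATH."""
    executable = shutil.which("aws")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some older setups print the version to stderr
        universal_newlines=True,
    )
    return result.stdout.strip()


print(aws_cli_version() or "aws not found on PATH")
```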


2. Configure AWS CLI

...

3. Test by executing a list operation
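The notes do not say which list operation is meant. As an assumption, the sketch below lists S3 buckets, since an S3 bucket is used for the images. It accepts any client object that offers boto3's list_buckets() method; the helper name bucket_names is hypothetical.

```python
from typing import Any, List


def bucket_names(s3_client: Any) -> List[str]:
    """Return the bucket names reported by the client's list_buckets() call."""
    response = s3_client.list_buckets()
    return [bucket["Name"] for bucket in response.get("Buckets", [])]
```

With boto3 installed and the credentials from step 2 in place, calling bucket_names(boto3.client("s3")) should return the buckets visible to the configured user. An authorization error at this point usually means the credentials or permissions need another look.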



Related links:

https://docs.aws.amazon.com/aws-sdk-php/v3/api/class-Aws.Textract.TextractClient.html