
The following is a brief overview of Amazon Textract.

Service

The Textract service is divided into two APIs:

Detect Document Text API: The Detect Document Text API uses optical character recognition (OCR) technology to extract text from a provided document.

Optical Character Recognition (OCR): Detects printed text and numbers in a scan or rendering of a document. The API offers synchronous and asynchronous operations, and the results are returned in JSON format. Synchronous operations process single-page documents supplied as images and return the result immediately; asynchronous operations process multi-page documents.
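As an illustration of the JSON result, the sketch below calls the synchronous operation from Python and keeps only the detected lines. It assumes the boto3 SDK is installed (it is not among the dependencies listed on this page); the region default and the function names extract_lines and detect_text are hypothetical choices for this example.

```python
def extract_lines(response):
    """Return the text of every LINE block in a Textract response, in order."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_text(image_path, region_name="us-east-1"):
    """Send one single-page image to the synchronous Detect Document Text API.

    The region default is an assumption; use a region where Textract is offered.
    """
    import boto3  # imported lazily so extract_lines works without the SDK

    client = boto3.client("textract", region_name=region_name)
    with open(image_path, "rb") as image_file:
        response = client.detect_document_text(
            Document={"Bytes": image_file.read()}
        )
    return extract_lines(response)
```

The response also contains PAGE and WORD blocks; filtering on LINE is simply the most convenient granularity for reading the text back.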


image-20190205-011424.png

Analyze Document API: The Analyze Document API extracts data from tables and key-value pairs from forms.

Key-Value Pair Extraction: Detects key-value pairs in document images automatically, retaining the inherent context of the document. Synchronous or asynchronous operations can be used to analyze the text in a document. The results of the analysis are returned in JSON format.
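A minimal sketch of how the key-value pairs might be read back from the JSON result. It assumes the blocks come from an analyze_document call made with FeatureTypes=["FORMS"]; the function names are hypothetical, and the code handles only the simple case where each key points to its value through a VALUE relationship.

```python
def _text_of(block, blocks_by_id):
    """Join the WORD children of a block into one string."""
    words = []
    for relationship in block.get("Relationships", []):
        if relationship["Type"] != "CHILD":
            continue
        for child_id in relationship["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                words.append(child["Text"])
    return " ".join(words)


def extract_key_values(blocks):
    """Map each KEY block's text to the text of its linked VALUE block(s)."""
    blocks_by_id = {block["Id"]: block for block in blocks}
    pairs = {}
    for block in blocks:
        is_key = (
            block["BlockType"] == "KEY_VALUE_SET"
            and "KEY" in block.get("EntityTypes", [])
        )
        if not is_key:
            continue
        value_parts = []
        for relationship in block.get("Relationships", []):
            if relationship["Type"] == "VALUE":
                for value_id in relationship["Ids"]:
                    value_parts.append(
                        _text_of(blocks_by_id[value_id], blocks_by_id)
                    )
        pairs[_text_of(block, blocks_by_id)] = " ".join(value_parts)
    return pairs
```

Keys that appear more than once in a form overwrite each other in this sketch; a list of pairs would be a safer return type for such documents.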

image-20190205-011554.png

Table Extraction: Preserves the composition of data stored in tables during extraction, so the extracted data can be loaded automatically into a database using a predefined schema.
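To show how the table structure is preserved, the sketch below rebuilds each table as a list of rows from the JSON result. It assumes the blocks come from an analyze_document call made with FeatureTypes=["TABLES"]; the function name extract_tables is hypothetical, and merged cells are not handled.

```python
def extract_tables(blocks):
    """Rebuild every TABLE block as a list of rows, each a list of cell strings."""
    blocks_by_id = {block["Id"]: block for block in blocks}

    def child_ids(block):
        ids = []
        for relationship in block.get("Relationships", []):
            if relationship["Type"] == "CHILD":
                ids.extend(relationship["Ids"])
        return ids

    tables = []
    for block in blocks:
        if block["BlockType"] != "TABLE":
            continue
        cells = {}
        for cell_id in child_ids(block):
            cell = blocks_by_id[cell_id]
            if cell["BlockType"] != "CELL":
                continue
            words = [
                blocks_by_id[word_id]["Text"]
                for word_id in child_ids(cell)
                if blocks_by_id[word_id]["BlockType"] == "WORD"
            ]
            # RowIndex and ColumnIndex are 1-based in the response.
            cells[(cell["RowIndex"], cell["ColumnIndex"])] = " ".join(words)
        row_count = max((row for row, _ in cells), default=0)
        column_count = max((column for _, column in cells), default=0)
        tables.append([
            [cells.get((row, column), "") for column in range(1, column_count + 1)]
            for row in range(1, row_count + 1)
        ])
    return tables
```

Each inner list can then be mapped onto the columns of a predefined schema before it is inserted into a database.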

image-20190205-011622.png

Pricing

There are no minimum fees and no upfront commitments. Amazon Textract charges per page processed, and the rate depends on whether only text is extracted or text together with tables and/or form data.

As AWS customers, we have access to the APIs in a free tier for 12 months, with the following restrictions:

  • 1,000 pages per month using the Detect Document Text API

  • 100 pages per month using the Analyze Document API
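The free-tier limits above translate into a simple calculation of billable pages. The sketch below is only an estimate helper: the per-page price is a parameter to be taken from the pricing page, no rate is assumed here, and the function names are hypothetical.

```python
# Monthly free-tier allowances from the list above (first 12 months only).
FREE_TIER_PAGES = {
    "detect_document_text": 1000,
    "analyze_document": 100,
}


def billable_pages(api, pages_this_month):
    """Pages charged in a month once the free-tier allowance is used up."""
    return max(0, pages_this_month - FREE_TIER_PAGES[api])


def estimate_monthly_cost(api, pages_this_month, price_per_page):
    """Estimated charge; price_per_page must come from the AWS price list."""
    return billable_pages(api, pages_this_month) * price_per_page
```

For example, 250 pages sent to the Analyze Document API in one month leave 150 billable pages.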

The full price list is available at the following link:

https://aws.amazon.com/textract/pricing/


Testing

Below are the requirements to test the Amazon Textract API.

Dependencies:

Python 3.7

pip 19.0.3

Documentation:

https://serverfault.com/questions/918335/best-way-to-run-python-3-7-on-ubuntu-16-04-which-comes-with-python-3-5

https://docs.aws.amazon.com/cli/latest/userguide/install-linux.html#install-linux-pip


User configuration

Create an IAM user with the following permissions:

  • AmazonTextractFullAccess
  • AmazonSQSFullAccess
  • Sufficient permissions to upload and read images from a bucket in S3
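The two managed policies in the list can also be attached programmatically. This is a sketch, assuming the boto3 SDK is installed and that the caller already has IAM administration rights; the function names are hypothetical, and the S3 permissions from the third bullet are not covered because they depend on the bucket used.

```python
MANAGED_POLICY_NAMES = ["AmazonTextractFullAccess", "AmazonSQSFullAccess"]


def managed_policy_arn(policy_name):
    """Build the ARN of an AWS managed policy from its name."""
    return "arn:aws:iam::aws:policy/" + policy_name


def attach_test_policies(user_name, iam_client=None):
    """Attach the managed policies listed above to an existing IAM user."""
    if iam_client is None:
        import boto3  # imported lazily; only needed for a real call

        iam_client = boto3.client("iam")
    attached = []
    for policy_name in MANAGED_POLICY_NAMES:
        arn = managed_policy_arn(policy_name)
        iam_client.attach_user_policy(UserName=user_name, PolicyArn=arn)
        attached.append(arn)
    return attached
```

Passing iam_client explicitly makes it possible to try the function against a stub before touching a real account.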

Documentation:

https://docs.aws.amazon.com/IAM/latest/UserGuide/getting-started_create-admin-group.html


  1. Install AWS CLI

The AWS CLI can be installed on Linux, Windows, and macOS, inside a virtualenv, or with the bundled installer. We will test it on Ubuntu 16.04.

https://docs.aws.amazon.com/cli/latest/userguide/install-linux.html#install-linux-pip

Verify that the AWS CLI was installed correctly, for example by running aws --version.
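The same check can be scripted from Python 3.7. This is a minimal sketch, assuming the CLI executable is named aws and is on the PATH; the function name aws_cli_version is hypothetical.

```python
import shutil
import subprocess


def aws_cli_version():
    """Return the version string printed by the AWS CLI, or None if not found."""
    executable = shutil.which("aws")
    if executable is None:
        return None
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,  # available from Python 3.7
        text=True,
    )
    # Some CLI/Python combinations print the version to stderr instead of stdout.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    version = aws_cli_version()
    print(version if version else "AWS CLI not found on PATH")
```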

  2. Configure AWS CLI

Credentials were sent to Manuel Navarro (user: manuel.navarro). The access key ID and secret access key are not listed on this page; enter the values you received when running aws configure.

To select a region, check the Amazon Textract entry at the following link:

https://docs.aws.amazon.com/general/latest/gr/rande.html

Instructions:

https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html
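After configuration, the chosen region is stored in the CLI config file (~/.aws/config by default). The sketch below reads it back to confirm the setting; the function name region_for_profile and the profile name in the test data are hypothetical.

```python
import configparser


def region_for_profile(config_text, profile="default"):
    """Return the region configured for a profile, or None if it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Named profiles use a "profile <name>" section; the default one does not.
    section = "default" if profile == "default" else "profile " + profile
    if not parser.has_section(section):
        return None
    return parser.get(section, "region", fallback=None)


if __name__ == "__main__":
    import os

    path = os.path.expanduser("~/.aws/config")
    if os.path.exists(path):
        with open(path) as config_file:
            print(region_for_profile(config_file.read()))
    else:
        print("No AWS CLI config file found")
```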

  3. Test by executing a list operation
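The page does not say which list operation was used. One possibility, sketched here under the assumption that the boto3 SDK is installed and the configured user may list S3 buckets, is to list the bucket names; the function name list_bucket_names is hypothetical.

```python
def list_bucket_names(s3_client=None):
    """Return the names of the S3 buckets visible to the configured credentials."""
    if s3_client is None:
        import boto3  # imported lazily; only needed for a real call

        s3_client = boto3.client("s3")
    response = s3_client.list_buckets()
    return [bucket["Name"] for bucket in response.get("Buckets", [])]


if __name__ == "__main__":
    for name in list_bucket_names():
        print(name)
```

A successful call, even one that returns an empty list, confirms that the credentials and region from the previous step are being picked up.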



Related links:

https://docs.aws.amazon.com/aws-sdk-php/v3/api/class-Aws.Textract.TextractClient.html
