What's new in Form Recognizer: new Document API, Signature detection, 122 Languages and lots more
Published Oct 13 2021 11:44 PM
Microsoft

Form Recognizer is an AI service that provides pre-built or custom models to extract information from documents. Today, customers can take advantage of a new set of preview capabilities that enhance document process automation and knowledge mining. This release is packed with new features and updates.

 

What’s New in Form Recognizer

 

Document API

The general document API uses a pretrained model to extract text, tables, structure, key-value pairs, and entities from a form or document. With general document, you no longer need to train a model to extract key-value pairs that can be inferred from the structure or content of most documents. Start with the general document overview to learn more about this feature, or test the API in the new Form Recognizer Studio.

 

Try General Document in the Form Recognizer Studio

 

The new General Document API is only available on the latest version of the REST API, which has been redesigned for better usability. The migration guide describes the differences between the API versions and how you can start using the new API version.

Code Examples

REST API

 

 

 

 

 

curl -v -i -X POST "https://{endpoint}/formrecognizer/documentModels/prebuilt-document:analyze?api-version=2021-09-30-preview" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: {subscription key}" --data-ascii "{'source': '{your-document-url}'}"

 

 

 

A successful request returns an Operation-Location header that contains the URL for retrieving the result once the operation is complete.
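As a minimal sketch of that step, the shell function below pulls the Operation-Location value out of the response headers that `curl -i` prints. The function name `extract_operation_location` and the sample headers are illustrative assumptions, not part of the service or of the original sample.

```shell
# Hypothetical helper: read raw HTTP response headers on stdin and print the
# value of the Operation-Location header (header names are case-insensitive).
extract_operation_location() {
  tr -d '\r' | awk '
    tolower($0) ~ /^operation-location:/ {
      sub(/^[^:]*:[ \t]*/, "")   # drop the header name and any leading spaces
      print
      exit
    }'
}

# Made-up headers so the sketch runs offline; a real response comes from the POST request.
sample_headers='HTTP/1.1 202 Accepted
Content-Length: 0
Operation-Location: https://example.cognitiveservices.azure.com/formrecognizer/documentModels/prebuilt-document/analyzeResults/1234?api-version=2021-09-30-preview'

result_url=$(printf '%s\n' "$sample_headers" | extract_operation_location)
echo "$result_url"
```

In real use, the output of the POST request would be piped into the function in place of the sample headers.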

 

 

 

curl -v -i "https://{endpoint}/formrecognizer/documentModels/prebuilt-document/analyzeResults/{operation}?api-version=2021-09-30-preview" -H "Content-Type: application/json" -H "Ocp-Apim-Subscription-Key: {subscription key}"
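Because the analysis runs asynchronously, the result URL usually has to be requested more than once. The sketch below shows one way to poll until the response reports a final status. It assumes the JSON body carries a top-level `status` field with values such as `running`, `succeeded`, and `failed`. The helper names and the `fake_fetch` stand-in are hypothetical, and the status extraction is a rough text match rather than a real JSON parser.

```shell
# Hypothetical helper: print the first "status" value found in a JSON body on stdin.
extract_status() {
  awk '!found && match($0, /"status"[ \t]*:[ \t]*"[^"]*"/) {
    s = substr($0, RSTART, RLENGTH)
    sub(/^"status"[ \t]*:[ \t]*"/, "", s)   # strip the key and opening quote
    sub(/"$/, "", s)                        # strip the closing quote
    print s
    found = 1
  }'
}

# poll_until_done FETCH_CMD [MAX_ATTEMPTS] [DELAY_SECONDS]
# Runs FETCH_CMD until the status is "succeeded" (return 0) or "failed" (return 1).
poll_until_done() {
  local fetch_cmd="$1" max_attempts="${2:-30}" delay="${3:-2}"
  local attempt=1 body status
  while [ "$attempt" -le "$max_attempts" ]; do
    body=$("$fetch_cmd")
    status=$(printf '%s' "$body" | extract_status)
    case "$status" in
      succeeded) printf '%s\n' "$body"; return 0 ;;
      failed)    printf '%s\n' "$body" >&2; return 1 ;;
    esac
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "gave up after $max_attempts attempts" >&2
  return 2
}

# Stand-in for the real GET request so the sketch runs offline: it reports
# "running" on the first two calls and "succeeded" afterwards.
counter_file=$(mktemp)
echo 0 > "$counter_file"
fake_fetch() {
  local n
  n=$(( $(cat "$counter_file") + 1 ))
  echo "$n" > "$counter_file"
  if [ "$n" -lt 3 ]; then
    echo '{"status": "running"}'
  else
    echo '{"status": "succeeded", "analyzeResult": {}}'
  fi
}

final_body=$(poll_until_done fake_fetch 10 0)
final_status=$(printf '%s' "$final_body" | extract_status)
calls=$(cat "$counter_file")
rm -f "$counter_file"
echo "$final_status after $calls calls"
```

In real use, `fake_fetch` would be replaced by a function that wraps the GET request above, with `curl -s` and without `-v -i`, so that only the JSON body is printed.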

 

 

 

C# Sample

  1. Install the C# SDK
    dotnet add package Azure.AI.FormRecognizer --version 4.0.0-beta.1
  2. Authenticate the client
    using System;
    using Azure;
    using Azure.AI.FormRecognizer.DocumentAnalysis;

    string endpoint = "<your-endpoint>";
    string apiKey = "<your-apiKey>";
    var credential = new AzureKeyCredential(apiKey);
    var client = new DocumentAnalysisClient(new Uri(endpoint), credential);
    
  3. Analyze a document with General Document API

    string fileUri = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf";

    // Start the analysis with the prebuilt general document model and wait for it to finish.
    AnalyzeDocumentOperation operation = await client.StartAnalyzeDocumentFromUriAsync("prebuilt-document", fileUri);
    await operation.WaitForCompletionAsync();
    AnalyzeResult result = operation.Value;

    // Print each detected entity with its category and, if present, its sub-category.
    Console.WriteLine("Detected entities:");
    foreach (DocumentEntity entity in result.Entities)
    {
        if (entity.SubCategory == null)
        {
            Console.WriteLine($"  Found entity '{entity.Content}' with category '{entity.Category}'.");
        }
        else
        {
            Console.WriteLine($"  Found entity '{entity.Content}' with category '{entity.Category}' and sub-category '{entity.SubCategory}'.");
        }
    }

    // Print each key-value pair; a key can be detected without a value.
    Console.WriteLine("Detected key-value pairs:");
    foreach (DocumentKeyValuePair kvp in result.KeyValuePairs)
    {
        if (kvp.Value?.Content == null)
        {
            Console.WriteLine($"  Found key with no value: '{kvp.Key.Content}'");
        }
        else
        {
            Console.WriteLine($"  Found key-value pair: '{kvp.Key.Content}' and '{kvp.Value.Content}'");
        }
    }

 

For more information, see General Document API.

 

Signature detection

Signatures can now be detected in form fields in a custom form model. The signature field is a new field type in custom models that detects whether a signature exists in the specified field. In addition to key-value pairs, tables, and selection marks, you can now also train a model to detect signatures in documents.

 

Labeling experience for signature field in custom forms

 

To learn more about signature detection, see custom and composed models.

Language Expansion

Printed text extraction now covers a total of 122 languages with the addition of 49 new languages, including Russian and other Cyrillic and Latin languages. Handwritten text extraction now supports Chinese, French, German, Italian, Portuguese, and Spanish in addition to the existing English handwritten support. For the full list of supported languages, see here.

 

Hotel Receipts

Support for hotel receipts is now available in the Receipt model. You can now automatically process a hotel receipt and extract the required key-value pairs, such as date of arrival, date of departure, and line items.

 

Results for an analyzed hotel receipt in the Form Recognizer Studio

 

Learn more about the receipt model here.
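Assuming the receipt model is addressed the same way as the general document model in the REST example earlier, only the model ID in the URL changes. The sketch below builds the analyze URL for a given model. The helper name `build_analyze_url`, the example endpoint, and the `prebuilt-receipt` model ID are assumptions for illustration and should be checked against the API reference.

```shell
API_VERSION="2021-09-30-preview"

# Hypothetical helper: build the analyze URL for an endpoint and a model ID.
build_analyze_url() {
  local endpoint="$1" model_id="$2"
  endpoint=${endpoint#https://}   # tolerate an endpoint given with a scheme
  endpoint=${endpoint%/}          # and with a trailing slash
  printf 'https://%s/formrecognizer/documentModels/%s:analyze?api-version=%s\n' \
    "$endpoint" "$model_id" "$API_VERSION"
}

receipt_url=$(build_analyze_url "https://example.cognitiveservices.azure.com/" "prebuilt-receipt")
echo "$receipt_url"
```

The resulting URL would take the place of the `prebuilt-document` URL in the POST request; the headers and request body stay the same.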

Pre-built ID

The ID pre-built model now recognizes additional fields within the US driver’s license such as endorsements, restrictions, and vehicle classification.

 

Results for analyzed prebuilt ID in the Form Recognizer Studio

 

Learn more about the ID document model here.

New Form Recognizer Studio, REST API & Updated SDK

Form Recognizer Studio simplifies the use of the service: you can test pre-built and pre-trained models, and build and test custom models. As the service expands, the REST API has been redesigned for improved usability; the migration guide will help you transition to the new API.

 

The new Form Recognizer Studio to test, train and analyze document models

 

 

Get started 

Form Recognizer continues to improve AI quality and service performance. If you have any questions or feedback on either the preview APIs or the service, please contact us via email.

Last update: Jan 25 2024 08:00 AM