Tutorial: How to Extract Annotations from Documents

Learning Objectives

In this tutorial, you’ll learn how to:

Extract all annotations from an annotated document
Parse and process annotation data
Access annotation properties and metadata
Implement annotation extraction in different programming languages

What is Annotation Extraction?

Annotation extraction allows you to retrieve all annotations from a document as structured data. This is useful for analyzing annotations, generating reports, implementing approval workflows, or building collaborative review systems.

Prerequisites

Before starting this tutorial, ensure you have:

A GroupDocs.Annotation Cloud account (or a free trial)
Your Client ID and Client Secret credentials
A development environment for your preferred language
An annotated document uploaded to your GroupDocs.Annotation Cloud storage

Implementation Steps

Let’s walk through the process of extracting annotations from a document:

1. Authentication

First, authenticate with the GroupDocs.Annotation Cloud API:

# Get JWT token
curl -v "https://api.groupdocs.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"

The response is a JSON object. Save the JWT from its access_token field and send it as the Bearer token in subsequent API calls.
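The same token request can be sketched in Python with only the standard library. This is a minimal illustration, not the official SDK: the helper names (build_token_request, parse_token_response, fetch_token) are hypothetical, and reading the token from an access_token field assumes the standard OAuth 2.0 client-credentials response shape.

```python
import json
import urllib.parse
import urllib.request

# Same token URL as the cURL call above
TOKEN_URL = "https://api.groupdocs.cloud/connect/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the form-encoded POST request for a client-credentials token."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


def parse_token_response(body: bytes) -> str:
    """Return the token string (assumes the standard 'access_token' field)."""
    return json.loads(body)["access_token"]


def fetch_token(client_id: str, client_secret: str) -> str:
    """Send the request over the network and return the JWT."""
    request = build_token_request(client_id, client_secret)
    with urllib.request.urlopen(request) as response:
        return parse_token_response(response.read())
```

Calling fetch_token("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET") performs the network call. Keeping request building and response parsing in separate functions makes both easy to check without contacting the API.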

2. Extract Annotations

Use the POST /annotation/extract endpoint to retrieve all annotations:

# cURL example to extract annotations
curl -v "https://api.groupdocs.cloud/v2.0/annotation/extract" \
-X POST \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-d "{ \"FilePath\": \"annotated-document.docx\"}"
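For readers who prefer to call the endpoint without an SDK, the request above might look like this in Python. It is a sketch under the same assumptions as the cURL call (same URL, headers, and FilePath body); the function names build_extract_request and extract_annotations are hypothetical.

```python
import json
import urllib.request

# Same endpoint as the cURL call above
EXTRACT_URL = "https://api.groupdocs.cloud/v2.0/annotation/extract"


def build_extract_request(token: str, file_path: str) -> urllib.request.Request:
    """Build the JSON POST request that asks for all annotations in file_path."""
    payload = json.dumps({"FilePath": file_path}).encode("utf-8")
    return urllib.request.Request(
        EXTRACT_URL,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def extract_annotations(token: str, file_path: str) -> list:
    """Send the request and return the decoded JSON array of annotations."""
    with urllib.request.urlopen(build_extract_request(token, file_path)) as response:
        return json.loads(response.read())
```

Using json.dumps for the body avoids the hand-escaped quotes that the shell version needs.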

3. Process the Response

The API returns an array of annotation objects with all properties. Here’s a sample response:

[
  {
    "id": 0,
    "text": "This is ellipse annotation",
    "textToReplace": null,
    "horizontalAlignment": 0,
    "verticalAlignment": 0,
    "creatorId": 0,
    "creatorName": "John Doe",
    "creatorEmail": null,
    "box": {
      "x": 100,
      "y": 100,
      "width": 100,
      "height": 100
    },
    "points": null,
    "pageNumber": 0,
    "annotationPosition": null,
    "svgPath": null,
    "type": 4,
    "replies": [
      {
        "id": 0,
        "userId": 0,
        "userName": null,
        "userEmail": null,
        "comment": "First comment",
        "repliedOn": "2023-04-25T06:52:01.376Z",
        "parentReplyId": 0
      },
      {
        "id": 0,
        "userId": 0,
        "userName": null,
        "userEmail": null,
        "comment": "Second comment",
        "repliedOn": "2023-04-25T06:52:01.376Z",
        "parentReplyId": 0
      }
    ],
    "createdOn": "2023-04-25T06:52:01.376Z",
    "fontColor": null,
    "penColor": null,
    "penWidth": null,
    "penStyle": null,
    "backgroundColor": null,
    "fontFamily": null,
    "fontSize": null,
    "opacity": null,
    "angle": null,
    "url": null,
    "imagePath": null
  }
]

The response contains detailed information about each annotation, including:

Annotation ID and type
Text content
Position information (box, points)
Page number
Creator information
Creation date
Replies/comments
Style properties (colors, opacity, etc.)
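When working with the raw JSON rather than an SDK model, the fields listed above can be loaded into small typed records. The sketch below assumes the field names shown in the sample response; the Annotation and Reply classes and the parse_annotations helper are hypothetical names, and only a subset of the properties is mapped. Many properties can be null, so every optional field is read defensively. The timestamp parsing assumes three fractional digits, as in the sample.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Reply:
    comment: Optional[str]
    user_name: Optional[str]
    replied_on: Optional[datetime]


@dataclass
class Annotation:
    id: int
    type: int
    text: Optional[str]
    page_number: Optional[int]
    creator_name: Optional[str]
    created_on: Optional[datetime]
    replies: List[Reply] = field(default_factory=list)


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse timestamps such as '2023-04-25T06:52:01.376Z'; None stays None."""
    if value is None:
        return None
    # Older Python versions reject a trailing 'Z' in fromisoformat()
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_annotations(raw_json: str) -> List[Annotation]:
    """Convert the JSON array returned by the extract call into Annotation records."""
    annotations = []
    for item in json.loads(raw_json):
        replies = [
            Reply(r.get("comment"), r.get("userName"), parse_timestamp(r.get("repliedOn")))
            for r in item.get("replies") or []  # 'replies' may be null or missing
        ]
        annotations.append(Annotation(
            id=item["id"],
            type=item["type"],
            text=item.get("text"),
            page_number=item.get("pageNumber"),
            creator_name=item.get("creatorName"),
            created_on=parse_timestamp(item.get("createdOn")),
            replies=replies,
        ))
    return annotations
```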

Try It Yourself

Now, let’s implement annotation extraction in different programming languages.

C# Example

// For complete examples, visit: https://github.com/groupdocs-annotation-cloud/groupdocs-annotation-cloud-dotnet-samples
// AppKey and AppSID are the Client Secret and Client ID shown at https://dashboard.groupdocs.cloud
string MyAppKey = "YOUR_APP_KEY";
string MyAppSid = "YOUR_APP_SID";

var configuration = new Configuration(MyAppSid, MyAppKey);
var apiInstance = new AnnotateApi(configuration);

var fileInfo = new FileInfo { FilePath = "annotated-document.docx" };

// Extract annotations
var response = apiInstance.Extract(new ExtractRequest(fileInfo));

Console.WriteLine("Extracted annotations count: " + response.Count);

// Process the annotations
foreach (var annotation in response)
{
    Console.WriteLine($"Annotation ID: {annotation.Id}");
    Console.WriteLine($"Type: {annotation.Type}");
    Console.WriteLine($"Text: {annotation.Text}");
    Console.WriteLine($"Page Number: {annotation.PageNumber}");
    Console.WriteLine($"Creator: {annotation.CreatorName}");
    Console.WriteLine($"Created On: {annotation.CreatedOn}");
    
    // Process replies if present
    if (annotation.Replies != null && annotation.Replies.Count > 0)
    {
        Console.WriteLine("Replies:");
        foreach (var reply in annotation.Replies)
        {
            Console.WriteLine($" - {reply.Comment} (by: {reply.UserName ?? "Anonymous"}, on: {reply.RepliedOn})");
        }
    }
    
    Console.WriteLine("-------------------");
}

Java Example

// For complete examples, visit: https://github.com/groupdocs-annotation-cloud/groupdocs-annotation-cloud-java-samples
String clientId = "YOUR_CLIENT_ID"; // Get ClientId and ClientSecret from https://dashboard.groupdocs.cloud
String clientSecret = "YOUR_CLIENT_SECRET";

Configuration configuration = new Configuration(clientId, clientSecret);
AnnotateApi apiInstance = new AnnotateApi(configuration);

// Create request object
FileInfo fileInfo = new FileInfo();
fileInfo.setFilePath("annotated-document.docx");

ExtractRequest request = new ExtractRequest();
request.setfileInfo(fileInfo);

// Execute API method
List<AnnotationInfo> response = apiInstance.extract(request);

System.out.println("Extracted annotations count: " + response.size());

// Process the annotations
for (AnnotationInfo annotation : response) {
    System.out.println("Annotation ID: " + annotation.getId());
    System.out.println("Type: " + annotation.getType());
    System.out.println("Text: " + annotation.getText());
    System.out.println("Page Number: " + annotation.getPageNumber());
    System.out.println("Creator: " + annotation.getCreatorName());
    System.out.println("Created On: " + annotation.getCreatedOn());
    
    // Process replies if present
    if (annotation.getReplies() != null && !annotation.getReplies().isEmpty()) {
        System.out.println("Replies:");
        for (AnnotationReplyInfo reply : annotation.getReplies()) {
            System.out.println(" - " + reply.getComment() + 
                " (by: " + (reply.getUserName() != null ? reply.getUserName() : "Anonymous") + 
                ", on: " + reply.getRepliedOn() + ")");
        }
    }
    
    System.out.println("-------------------");
}

Python Example

# For complete examples, visit: https://github.com/groupdocs-annotation-cloud/groupdocs-annotation-cloud-python-samples
import groupdocs_annotation_cloud

# AppSID and AppKey are the Client ID and Client Secret shown at https://dashboard.groupdocs.cloud
app_sid = "YOUR_APP_SID"
app_key = "YOUR_APP_KEY"

api = groupdocs_annotation_cloud.AnnotateApi.from_keys(app_sid, app_key)

# Set up file info for the annotated document
file_info = groupdocs_annotation_cloud.FileInfo()
file_info.file_path = "annotated-document.docx"

# Create extract request
request = groupdocs_annotation_cloud.ExtractRequest(file_info)
result = api.extract(request)

print(f"Extracted annotations count: {len(result)}")

# Process the annotations
for annotation in result:
    print(f"Annotation ID: {annotation.id}")
    print(f"Type: {annotation.type}")
    print(f"Text: {annotation.text}")
    print(f"Page Number: {annotation.page_number}")
    print(f"Creator: {annotation.creator_name}")
    print(f"Created On: {annotation.created_on}")
    
    # Process replies if present
    if annotation.replies:
        print("Replies:")
        for reply in annotation.replies:
            print(f" - {reply.comment} (by: {reply.user_name or 'Anonymous'}, on: {reply.replied_on})")
    
    print("-------------------")

Working with Extracted Annotations

Here are some common use cases for extracted annotation data:

1. Building Approval Workflows

Extract annotations to identify approvals, rejections, or requests for changes in a review process:

# Sketch of an approval workflow; extracted_annotations is the list returned by api.extract()
approval_status = "Pending"
approver = approval_date = rejector = rejection_reason = None

for annotation in extracted_annotations:
    text = (annotation.text or "").upper()  # guard against a missing (null) text value
    if "APPROVED" in text:
        approval_status = "Approved"
        approver = annotation.creator_name
        approval_date = annotation.created_on
    elif "REJECTED" in text:
        approval_status = "Rejected"
        rejection_reason = annotation.text
        rejector = annotation.creator_name

2. Generating Annotation Reports

Create summary reports of all annotations in a document:

// Pseudocode for generating report
StringBuilder report = new StringBuilder();
report.AppendLine("Annotation Report - " + DateTime.Now);
report.AppendLine("Document: " + filePath);
report.AppendLine("Total Annotations: " + annotations.Count);
report.AppendLine();

var annotationsByPage = annotations.GroupBy(a => a.PageNumber);
foreach (var pageGroup in annotationsByPage)
{
    report.AppendLine($"Page {pageGroup.Key + 1}:");
    foreach (var annotation in pageGroup)
    {
        report.AppendLine($" - {annotation.Type}: {annotation.Text} (by {annotation.CreatorName})");
    }
}

3. Collaborative Review Analysis

Analyze collaboration patterns based on annotation data:

// Pseudocode for collaboration analysis
Map<String, Integer> contributorCounts = new HashMap<>();
Map<Integer, List<AnnotationInfo>> annotationsByPage = new HashMap<>();

for (AnnotationInfo annotation : annotations) {
    // Count contributions by user
    String creator = annotation.getCreatorName();
    contributorCounts.put(creator, contributorCounts.getOrDefault(creator, 0) + 1);
    
    // Group annotations by page
    Integer page = annotation.getPageNumber(); // keep boxed: unboxing a null page number would throw
    annotationsByPage.computeIfAbsent(page, k -> new ArrayList<>()).add(annotation);
}

Troubleshooting Tips

Empty Result: Ensure the document actually contains annotations; some file formats may not support all annotation types
Authentication Issues: Verify your JWT token is valid and not expired
File Not Found: Check that the file path is correct and the document exists in your cloud storage
Type Interpretation: The type property is returned as a numeric value; refer to the API documentation for a mapping of type values to annotation types
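For the authentication tip above, one way to check for an expired token before calling the API is to read the expiry claim from the JWT payload. This sketch assumes the token is a standard three-part JWT carrying an exp claim in Unix seconds; it does not verify the signature, and the helper names jwt_expiry and is_expired are hypothetical.

```python
import base64
import json
import time
from typing import Optional


def jwt_expiry(token: str) -> Optional[int]:
    """Return the 'exp' claim from a JWT payload, or None if it is absent.

    The signature is NOT verified; this only inspects the payload.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated parts")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims.get("exp")


def is_expired(token: str, now: Optional[float] = None, leeway_seconds: int = 30) -> bool:
    """Report whether the token has expired or will expire within the leeway."""
    exp = jwt_expiry(token)
    if exp is None:
        return False  # no expiry claim to compare against
    current = time.time() if now is None else now
    return current >= exp - leeway_seconds
```

If is_expired returns True, request a fresh token before retrying the call. The leeway treats a token that is about to expire as already expired, which avoids failures caused by clock drift or slow requests.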

What You’ve Learned

In this tutorial, you’ve learned how to:

Extract annotations from documents using the GroupDocs.Annotation Cloud API
Process and interpret annotation data
Implement annotation extraction in different programming languages
Work with extracted annotation data for various business use cases

Further Practice

To enhance your understanding of annotation extraction, try these exercises:

Create a system that extracts annotations and saves them to a database
Build a dashboard that visualizes annotation activity across multiple documents
Implement a notification system that alerts users when their annotations receive replies
Create a document comparison tool that identifies differences in annotations between document versions
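As a possible starting point for the first exercise, the sketch below stores extracted annotations in SQLite using Python's built-in sqlite3 module. The table layout and the save_annotations helper are hypothetical choices for illustration. The input is assumed to be the decoded JSON array from the extract call, that is, a list of dictionaries with the field names shown in the sample response.

```python
import sqlite3
from typing import Iterable, Mapping


def save_annotations(conn: sqlite3.Connection, document: str,
                     annotations: Iterable[Mapping]) -> int:
    """Insert one row per annotation and return the number of rows written."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS annotations (
               document TEXT NOT NULL,
               annotation_id INTEGER,
               type INTEGER,
               page_number INTEGER,
               creator_name TEXT,
               text TEXT,
               created_on TEXT
           )"""
    )
    rows = [
        (document, a.get("id"), a.get("type"), a.get("pageNumber"),
         a.get("creatorName"), a.get("text"), a.get("createdOn"))
        for a in annotations
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO annotations VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return len(rows)
```

A typical call opens a database with sqlite3.connect("annotations.db") and passes in the list returned by the extract request. Storing the document path in each row keeps annotations from several documents in one table.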

Next Steps

Continue your learning journey with these related tutorials: