Tutorial: How to Extract Annotations from Documents
Learning Objectives
In this tutorial, you’ll learn how to:
- Extract all annotations from an annotated document
- Parse and process annotation data
- Access annotation properties and metadata
- Implement annotation extraction in different programming languages
What is Annotation Extraction?
Annotation extraction allows you to retrieve all annotations from a document as structured data. This is useful for analyzing annotations, generating reports, implementing approval workflows, or building collaborative review systems.
Prerequisites
Before starting this tutorial, ensure you have:
- A GroupDocs.Annotation Cloud account (or a free trial)
- Your Client ID and Client Secret credentials
- A development environment for your preferred language
- An annotated document uploaded to your GroupDocs.Annotation Cloud storage
Implementation Steps
Let’s walk through the process of extracting annotations from a document:
1. Authentication
First, authenticate with the GroupDocs.Annotation Cloud API:
# Get JWT token
curl -v "https://api.groupdocs.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
Save the JWT token from the response; you will pass it in the Authorization header of subsequent API calls.
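The same token request can be made from Python. The following is a minimal sketch using only the standard library. The helper names build_token_request and fetch_token are illustrative, and the code assumes the response is a JSON object with an access_token field, as is usual for OAuth 2.0 client-credentials endpoints; check the actual response before relying on it.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://api.groupdocs.cloud/connect/token"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the form-encoded POST request shown in the cURL example."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


def fetch_token(client_id: str, client_secret: str) -> str:
    """Send the request and return the token (assumes an `access_token` field)."""
    with urllib.request.urlopen(build_token_request(client_id, client_secret)) as resp:
        return json.load(resp)["access_token"]
```

Keeping request construction separate from sending makes it possible to inspect the request without touching the network; call fetch_token("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET") when you are ready to authenticate.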
2. Extract Annotations
Use the POST /annotation/extract endpoint to retrieve all annotations:
# cURL example to extract annotations
curl -v "https://api.groupdocs.cloud/v2.0/annotation/extract" \
-X POST \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-d "{ \"FilePath\": \"annotated-document.docx\"}"
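For readers who prefer to call the endpoint without an SDK, here is a hedged Python equivalent of the cURL command, again using only the standard library. The function names are illustrative, and the code assumes the response body is the JSON array of annotations described in the next step.

```python
import json
import urllib.request

EXTRACT_URL = "https://api.groupdocs.cloud/v2.0/annotation/extract"


def build_extract_request(token: str, file_path: str) -> urllib.request.Request:
    """Build the JSON POST request shown in the cURL example."""
    body = json.dumps({"FilePath": file_path}).encode("utf-8")
    return urllib.request.Request(
        EXTRACT_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def extract_annotations(token: str, file_path: str) -> list:
    """Send the request and return the parsed body (assumed to be a JSON array)."""
    with urllib.request.urlopen(build_extract_request(token, file_path)) as resp:
        return json.load(resp)
```

A call such as extract_annotations(token, "annotated-document.docx") would then return a list of dictionaries, one per annotation.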
3. Process the Response
The API returns an array of annotation objects with all properties. Here’s a sample response:
[
{
"id": 0,
"text": "This is ellipse annotation",
"textToReplace": null,
"horizontalAlignment": 0,
"verticalAlignment": 0,
"creatorId": 0,
"creatorName": "John Doe",
"creatorEmail": null,
"box": {
"x": 100,
"y": 100,
"width": 100,
"height": 100
},
"points": null,
"pageNumber": 0,
"annotationPosition": null,
"svgPath": null,
"type": 4,
"replies": [
{
"id": 0,
"userId": 0,
"userName": null,
"userEmail": null,
"comment": "First comment",
"repliedOn": "2023-04-25T06:52:01.376Z",
"parentReplyId": 0
},
{
"id": 0,
"userId": 0,
"userName": null,
"userEmail": null,
"comment": "Second comment",
"repliedOn": "2023-04-25T06:52:01.376Z",
"parentReplyId": 0
}
],
"createdOn": "2023-04-25T06:52:01.376Z",
"fontColor": null,
"penColor": null,
"penWidth": null,
"penStyle": null,
"backgroundColor": null,
"fontFamily": null,
"fontSize": null,
"opacity": null,
"angle": null,
"url": null,
"imagePath": null
}
]
The response contains detailed information about each annotation, including:
- Annotation ID and type
- Text content
- Position information (box, points)
- Page number
- Creator information
- Creation date
- Replies/comments
- Style properties (colors, opacity, etc.)
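To make these fields concrete, the sketch below parses a trimmed copy of the sample response into a small Python data class. The AnnotationSummary class and the helper names are hypothetical conveniences, not part of any SDK; only the JSON field names come from the sample above.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AnnotationSummary:
    id: int
    type: int
    text: Optional[str]
    page_number: int
    creator_name: Optional[str]
    created_on: Optional[datetime]
    box: Optional[tuple]
    reply_comments: list


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp such as 2023-04-25T06:52:01.376Z."""
    if not value:
        return None
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize(raw: dict) -> AnnotationSummary:
    """Pick the commonly used fields out of one annotation object."""
    box = raw.get("box")
    return AnnotationSummary(
        id=raw["id"],
        type=raw["type"],
        text=raw.get("text"),
        page_number=raw["pageNumber"],
        creator_name=raw.get("creatorName"),
        created_on=parse_timestamp(raw.get("createdOn")),
        box=(box["x"], box["y"], box["width"], box["height"]) if box else None,
        reply_comments=[reply["comment"] for reply in raw.get("replies") or []],
    )


# Trimmed copy of the sample response above.
sample = json.loads("""
[{
  "id": 0, "text": "This is ellipse annotation", "creatorName": "John Doe",
  "box": {"x": 100, "y": 100, "width": 100, "height": 100},
  "points": null, "pageNumber": 0, "type": 4,
  "replies": [{"comment": "First comment"}, {"comment": "Second comment"}],
  "createdOn": "2023-04-25T06:52:01.376Z"
}]
""")

summaries = [summarize(item) for item in sample]
print(summaries[0].text, summaries[0].box, summaries[0].reply_comments)
```

Fields that may be null in the response (box, replies, createdOn) are read defensively so that a missing value does not raise an exception.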
Try It Yourself
Now, let’s implement annotation extraction in different programming languages.
C# Example
// For complete examples, visit: https://github.com/groupdocs-annotation-cloud/groupdocs-annotation-cloud-dotnet-samples
string MyAppKey = "YOUR_APP_KEY"; // App Key and App SID are the SDK's names for Client Secret and Client ID; get both from https://dashboard.groupdocs.cloud
string MyAppSid = "YOUR_APP_SID";
var configuration = new Configuration(MyAppSid, MyAppKey);
var apiInstance = new AnnotateApi(configuration);
var fileInfo = new FileInfo { FilePath = "annotated-document.docx" };
// Extract annotations
var response = apiInstance.Extract(new ExtractRequest(fileInfo));
Console.WriteLine("Extracted annotations count: " + response.Count);
// Process the annotations
foreach (var annotation in response)
{
    Console.WriteLine($"Annotation ID: {annotation.Id}");
    Console.WriteLine($"Type: {annotation.Type}");
    Console.WriteLine($"Text: {annotation.Text}");
    Console.WriteLine($"Page Number: {annotation.PageNumber}");
    Console.WriteLine($"Creator: {annotation.CreatorName}");
    Console.WriteLine($"Created On: {annotation.CreatedOn}");
    // Process replies if present
    if (annotation.Replies != null && annotation.Replies.Count > 0)
    {
        Console.WriteLine("Replies:");
        foreach (var reply in annotation.Replies)
        {
            Console.WriteLine($" - {reply.Comment} (by: {reply.UserName ?? "Anonymous"}, on: {reply.RepliedOn})");
        }
    }
    Console.WriteLine("-------------------");
}
Java Example
// For complete examples, visit: https://github.com/groupdocs-annotation-cloud/groupdocs-annotation-cloud-java-samples
String clientId = "YOUR_CLIENT_ID"; // Get ClientId and ClientSecret from https://dashboard.groupdocs.cloud
String clientSecret = "YOUR_CLIENT_SECRET";
Configuration configuration = new Configuration(clientId, clientSecret);
AnnotateApi apiInstance = new AnnotateApi(configuration);
// Create request object
FileInfo fileInfo = new FileInfo();
fileInfo.setFilePath("annotated-document.docx");
ExtractRequest request = new ExtractRequest();
request.setfileInfo(fileInfo);
// Execute API method
List<AnnotationInfo> response = apiInstance.extract(request);
System.out.println("Extracted annotations count: " + response.size());
// Process the annotations
for (AnnotationInfo annotation : response) {
    System.out.println("Annotation ID: " + annotation.getId());
    System.out.println("Type: " + annotation.getType());
    System.out.println("Text: " + annotation.getText());
    System.out.println("Page Number: " + annotation.getPageNumber());
    System.out.println("Creator: " + annotation.getCreatorName());
    System.out.println("Created On: " + annotation.getCreatedOn());
    // Process replies if present
    if (annotation.getReplies() != null && !annotation.getReplies().isEmpty()) {
        System.out.println("Replies:");
        for (AnnotationReplyInfo reply : annotation.getReplies()) {
            System.out.println(" - " + reply.getComment() +
                " (by: " + (reply.getUserName() != null ? reply.getUserName() : "Anonymous") +
                ", on: " + reply.getRepliedOn() + ")");
        }
    }
    System.out.println("-------------------");
}
Python Example
# For complete examples, visit: https://github.com/groupdocs-annotation-cloud/groupdocs-annotation-cloud-python-samples
import groupdocs_annotation_cloud
app_sid = "YOUR_APP_SID"  # App SID and App Key are the SDK's names for Client ID and Client Secret; get both from https://dashboard.groupdocs.cloud
app_key = "YOUR_APP_KEY"
api = groupdocs_annotation_cloud.AnnotateApi.from_keys(app_sid, app_key)
# Set up file info for the annotated document
file_info = groupdocs_annotation_cloud.FileInfo()
file_info.file_path = "annotated-document.docx"
# Create extract request
request = groupdocs_annotation_cloud.ExtractRequest(file_info)
result = api.extract(request)
print(f"Extracted annotations count: {len(result)}")
# Process the annotations
for annotation in result:
    print(f"Annotation ID: {annotation.id}")
    print(f"Type: {annotation.type}")
    print(f"Text: {annotation.text}")
    print(f"Page Number: {annotation.page_number}")
    print(f"Creator: {annotation.creator_name}")
    print(f"Created On: {annotation.created_on}")
    # Process replies if present
    if annotation.replies:
        print("Replies:")
        for reply in annotation.replies:
            print(f" - {reply.comment} (by: {reply.user_name or 'Anonymous'}, on: {reply.replied_on})")
    print("-------------------")
Working with Extracted Annotations
Here are some common use cases for extracted annotation data:
1. Building Approval Workflows
Extract annotations to identify approvals, rejections, or requests for changes in a review process:
# Approval workflow sketch; extracted_annotations is the list returned by api.extract()
approval_status = "Pending"
for annotation in extracted_annotations:
    text = (annotation.text or "").upper()  # text may be None
    if "APPROVED" in text:
        approval_status = "Approved"
        approver = annotation.creator_name
        approval_date = annotation.created_on
    elif "REJECTED" in text:
        approval_status = "Rejected"
        rejection_reason = annotation.text
        rejector = annotation.creator_name
2. Generating Annotation Reports
Create summary reports of all annotations in a document:
// Report sketch; annotations is the list returned by Extract() and filePath is the document path
// Requires: using System; using System.Linq; using System.Text;
StringBuilder report = new StringBuilder();
report.AppendLine("Annotation Report - " + DateTime.Now);
report.AppendLine("Document: " + filePath);
report.AppendLine("Total Annotations: " + annotations.Count);
report.AppendLine();
var annotationsByPage = annotations.GroupBy(a => a.PageNumber);
foreach (var pageGroup in annotationsByPage)
{
    // Page numbers in the sample response start at 0, so add 1 for display
    report.AppendLine($"Page {pageGroup.Key + 1}:");
    foreach (var annotation in pageGroup)
    {
        report.AppendLine($" - {annotation.Type}: {annotation.Text} (by {annotation.CreatorName})");
    }
}
3. Collaborative Review Analysis
Analyze collaboration patterns based on annotation data:
// Collaboration analysis sketch; annotations is the List<AnnotationInfo> returned by extract()
// Requires: java.util.ArrayList, java.util.HashMap, java.util.List, java.util.Map
Map<String, Integer> contributorCounts = new HashMap<>();
Map<Integer, List<AnnotationInfo>> annotationsByPage = new HashMap<>();
for (AnnotationInfo annotation : annotations) {
    // Count contributions by user (the creator name may be null)
    String creator = annotation.getCreatorName() != null ? annotation.getCreatorName() : "Anonymous";
    contributorCounts.merge(creator, 1, Integer::sum);
    // Group annotations by page
    annotationsByPage.computeIfAbsent(annotation.getPageNumber(), page -> new ArrayList<>()).add(annotation);
}
Troubleshooting Tips
- Empty Result: Ensure the document actually contains annotations; some file formats may not support all annotation types
- Authentication Issues: Verify your JWT token is valid and not expired
- File Not Found: Check that the file path is correct and the document exists in your cloud storage
- Type Interpretation: The type property is returned as a numeric value; refer to the API documentation for the mapping of type values to annotation types
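The first three checks above can be folded into a small helper around the HTTP call. This is a sketch under assumptions: it presumes the API signals an invalid or expired token with HTTP 401 and a missing file with HTTP 404, which is conventional but should be confirmed against the API reference. The function name diagnose is illustrative.

```python
from typing import Optional


def diagnose(status: int, annotations: Optional[list]) -> str:
    """Map an HTTP status code and parsed response body to a troubleshooting hint."""
    if status == 401:  # assumed status for an invalid or expired token
        return "Authentication issue: request a fresh JWT token and retry."
    if status == 404:  # assumed status for a missing document
        return "File not found: check the file path and your cloud storage."
    if status >= 400:
        return f"Unexpected error (HTTP {status}): inspect the response body."
    if not annotations:
        return "Empty result: the document may contain no annotations."
    return f"OK: {len(annotations)} annotation(s) extracted."
```

When calling the API with urllib.request, a failed request raises urllib.error.HTTPError, and its code attribute can be passed as the status argument.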
What You’ve Learned
In this tutorial, you’ve learned how to:
- Extract annotations from documents using the GroupDocs.Annotation Cloud API
- Process and interpret annotation data
- Implement annotation extraction in different programming languages
- Work with extracted annotation data for various business use cases
Further Practice
To enhance your understanding of annotation extraction, try these exercises:
- Create a system that extracts annotations and saves them to a database
- Build a dashboard that visualizes annotation activity across multiple documents
- Implement a notification system that alerts users when their annotations receive replies
- Create a document comparison tool that identifies differences in annotations between document versions
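As a possible starting point for the first exercise, the sketch below stores extracted annotations in SQLite using Python's built-in sqlite3 module. The table layout and the save_annotations helper are assumptions made for illustration; the input is a list of dictionaries shaped like the sample response earlier in this tutorial.

```python
import sqlite3


def save_annotations(conn: sqlite3.Connection, document: str, annotations: list) -> int:
    """Store annotation dictionaries and return the number of rows written."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS annotations (
               document TEXT, annotation_id INTEGER, type INTEGER,
               page_number INTEGER, creator_name TEXT, text TEXT,
               created_on TEXT, reply_count INTEGER)"""
    )
    rows = [
        (
            document, a.get("id"), a.get("type"), a.get("pageNumber"),
            a.get("creatorName"), a.get("text"), a.get("createdOn"),
            len(a.get("replies") or []),
        )
        for a in annotations
    ]
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO annotations VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return len(rows)


# Example with an in-memory database and one annotation from the sample response.
conn = sqlite3.connect(":memory:")
saved = save_annotations(conn, "annotated-document.docx", [
    {"id": 0, "type": 4, "pageNumber": 0, "creatorName": "John Doe",
     "text": "This is ellipse annotation", "createdOn": "2023-04-25T06:52:01.376Z",
     "replies": [{"comment": "First comment"}, {"comment": "Second comment"}]},
])
print(f"Saved {saved} annotation(s)")
```

Storing the reply count alongside each annotation also gives the notification exercise something to compare against on the next extraction run.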
Next Steps
Continue your learning journey with these related tutorials: