Tutorial: How to Retrieve Document Information with GroupDocs.Viewer Cloud API

Learning Objectives

In this tutorial, you’ll learn how to:

Retrieve basic information about documents using GroupDocs.Viewer Cloud API
Extract document properties like format, page count, and page dimensions
Get text coordinates for implementing text search and selection
Access attachment information for documents with embedded files
Utilize document information in your applications

Prerequisites

Before starting this tutorial, you should have:

A GroupDocs.Viewer Cloud account (get your free trial here)
Your Client ID and Client Secret
Basic understanding of REST APIs
Familiarity with your programming language of choice (C#, Java, Python, PHP, Ruby, Node.js, or Go)
A document for testing (we’ll use a sample DOCX in this tutorial)

Why Document Information Matters

Retrieving document information is often a crucial first step in document processing workflows. Having accurate metadata allows you to:

Plan rendering operations: Knowing page count and dimensions helps properly configure viewing options
Implement pagination: Page information enables efficient navigation in multi-page documents
Enable text search: Text coordinates make text selection and search possible
Handle attachments: Awareness of embedded files allows for complete document processing
Validate compatibility: File format detection ensures the document can be properly processed

GroupDocs.Viewer Cloud API provides a simple and efficient way to extract this valuable information before you begin the rendering process.

Step 1: Upload Your Document to Cloud Storage

Before retrieving document information, you need to upload a document to GroupDocs.Viewer Cloud storage.

# First get JSON Web Token
curl -v "https://api.groupdocs.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"

# Store JWT in a variable for reuse
JWT="YOUR_JWT_TOKEN"

# Upload file to storage
curl -v "https://api.groupdocs.cloud/v2.0/viewer/storage/file/SampleFiles/sample.docx" \
-X PUT \
-H "Content-Type: multipart/form-data" \
-H "Accept: application/json" \
-H "Authorization: Bearer $JWT" \
--data-binary "@/path/to/your/sample.docx"

Step 2: Retrieve Document Information

Now, let’s use the API to retrieve comprehensive information about the document:

curl -v "https://api.groupdocs.cloud/v2.0/viewer/info" \
-X POST \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Authorization: Bearer $JWT" \
-d "{
      'FileInfo': {
         'FilePath': 'SampleFiles/sample.docx'
      },
      'ViewFormat': 'HTML'
}"

Understanding the Response

The API returns detailed information about the document:

{
  "formatExtension": ".docx",
  "format": "Microsoft Word Open XML Document",
  "pages": [
    {
      "number": 1,
      "width": 595,
      "height": 841,
      "visible": true,
      "lines": []
    },
    {
      "number": 2,
      "width": 595,
      "height": 841,
      "visible": true,
      "lines": []
    },
    {
      "number": 3,
      "width": 595,
      "height": 841,
      "visible": true,
      "lines": []
    }
  ],
  "attachments": [],
  "archiveViewInfo": null,
  "cadViewInfo": null,
  "projectManagementViewInfo": null,
  "outlookViewInfo": null,
  "pdfViewInfo": null
}

Key information included in the response:

Format: The document’s format name and extension
Pages: Details about each page, including number, dimensions, and visibility
Attachments: List of any embedded files (empty for our example document)
Specialized Info: Additional format-specific information for special formats (null for our example)

Step 3: Retrieving Text Coordinates

If you need text coordinates for implementing text selection or search functionality, you can set the ExtractText parameter to true:

curl -v "https://api.groupdocs.cloud/v2.0/viewer/info" \
-X POST \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Authorization: Bearer $JWT" \
-d "{
      'FileInfo': {
         'FilePath': 'SampleFiles/sample.docx'
      },
      'ViewFormat': 'PNG',
      'RenderOptions': {
         'ExtractText': true
      }
}"

With text extraction enabled, the response will include detailed text information:

{
  "formatExtension": ".docx",
  "format": "Microsoft Word Open XML Document",
  "pages": [
    {
      "number": 1,
      "width": 595,
      "height": 841,
      "visible": true,
      "lines": [
        {
          "words": [
            {
              "characters": [
                {
                  "x": 229.607,
                  "y": 67.8,
                  "width": 10.674,
                  "height": 19.8,
                  "value": "T"
                },
                // More characters...
              ],
              "x": 229.607,
              "y": 67.8,
              "width": 39.721,
              "height": 19.8,
              "value": "This"
            },
            // More words...
          ],
          "x": 229.607,
          "y": 67.8,
          "width": 136.387,
          "height": 19.8,
          "value": "This is a sample"
        },
        // More lines...
      ]
    },
    // More pages...
  ],
  "attachments": [],
  "archiveViewInfo": null,
  "cadViewInfo": null,
  "projectManagementViewInfo": null,
  "outlookViewInfo": null,
  "pdfViewInfo": null
}

This hierarchical structure provides coordinates for:

Lines of text
Words within each line
Individual characters within each word

Step 4: Working with Document Attachments

For documents that contain attachments (like emails or archives), the API will return information about these embedded files:

curl -v "https://api.groupdocs.cloud/v2.0/viewer/info" \
-X POST \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Authorization: Bearer $JWT" \
-d "{
      'FileInfo': {
         'FilePath': 'SampleFiles/with_attachments.msg'
      },
      'ViewFormat': 'HTML'
}"

For a document with attachments, the response will include attachment details:

{
  "formatExtension": ".msg",
  "format": "Microsoft Outlook Message",
  "pages": [
    {
      "number": 1,
      "width": 612,
      "height": 792,
      "visible": true,
      "lines": []
    }
  ],
  "attachments": [
    {
      "name": "attachment-image.png",
      "filePath": null
    },
    {
      "name": "attachment-word.doc",
      "filePath": null
    }
  ],
  "archiveViewInfo": null,
  "cadViewInfo": null,
  "projectManagementViewInfo": null,
  "outlookViewInfo": {
    "folders": []
  },
  "pdfViewInfo": null
}

Step 5: Implement in Your Application

Now let’s implement document information retrieval in a real application using one of our supported SDKs.

C# Example

using GroupDocs.Viewer.Cloud.Sdk.Api;
using GroupDocs.Viewer.Cloud.Sdk.Client;
using GroupDocs.Viewer.Cloud.Sdk.Model;
using GroupDocs.Viewer.Cloud.Sdk.Model.Requests;
using System;
using System.Collections.Generic;
using System.IO;

namespace GroupDocs.Viewer.Cloud.Tutorial
{
    class Program
    {
        static void Main(string[] args)
        {
            // Get your client ID and client secret from https://dashboard.groupdocs.cloud/
            string MyClientId = "YOUR_CLIENT_ID";
            string MyClientSecret = "YOUR_CLIENT_SECRET";

            // Create API instance
            var configuration = new Configuration(MyClientId, MyClientSecret);
            var apiInstance = new InfoApi(configuration);

            // Define view options
            var viewOptions = new ViewOptions
            {
                FileInfo = new FileInfo
                {
                    FilePath = "SampleFiles/sample.docx"
                },
                ViewFormat = ViewOptions.ViewFormatEnum.HTML
            };

            try
            {
                // Call the API to get document information
                var response = apiInstance.GetInfo(new GetInfoRequest(viewOptions));
                
                Console.WriteLine("Document information retrieved successfully!");
                
                // Display basic document information
                Console.WriteLine($"Format: {response.Format} ({response.FormatExtension})");
                Console.WriteLine($"Total pages: {response.Pages.Count}");
                
                // Display information about each page
                Console.WriteLine("\nPage information:");
                foreach (var page in response.Pages)
                {
                    Console.WriteLine($"  Page {page.Number}: {page.Width}x{page.Height} pixels, Visible: {page.Visible}");
                }
                
                // Display attachment information if any
                if (response.Attachments != null && response.Attachments.Count > 0)
                {
                    Console.WriteLine("\nAttachments:");
                    foreach (var attachment in response.Attachments)
                    {
                        Console.WriteLine($"  {attachment.Name}");
                    }
                }
                else
                {
                    Console.WriteLine("\nNo attachments found.");
                }
                
                // Display format-specific information if available
                if (response.PdfViewInfo != null)
                {
                    Console.WriteLine("\nPDF-specific information available.");
                }
                if (response.ArchiveViewInfo != null)
                {
                    Console.WriteLine("\nArchive-specific information available.");
                }
                if (response.CadViewInfo != null)
                {
                    Console.WriteLine("\nCAD-specific information available.");
                }
                if (response.ProjectManagementViewInfo != null)
                {
                    Console.WriteLine("\nProject Management-specific information available.");
                }
                if (response.OutlookViewInfo != null)
                {
                    Console.WriteLine("\nOutlook-specific information available.");
                }
            }
            catch (Exception e)
            {
                Console.WriteLine("Exception while calling InfoApi: " + e.Message);
            }
            
            Console.WriteLine("\nPress any key to exit...");
            Console.ReadKey();
        }
    }
}

Python Example

# Import modules
import groupdocs_viewer_cloud

# Get your client ID and client secret from https://dashboard.groupdocs.cloud/
client_id = "YOUR_CLIENT_ID"
client_secret = "YOUR_CLIENT_SECRET"

# Create API instance
api_instance = groupdocs_viewer_cloud.InfoApi.from_keys(client_id, client_secret)

# Define view options
view_options = groupdocs_viewer_cloud.ViewOptions()
view_options.file_info = groupdocs_viewer_cloud.FileInfo()
view_options.file_info.file_path = "SampleFiles/sample.docx"
view_options.view_format = "HTML"

try:
    # Call the API to get document information
    request = groupdocs_viewer_cloud.GetInfoRequest(view_options)
    response = api_instance.get_info(request)
    
    print("Document information retrieved successfully!")
    
    # Display basic document information
    print(f"Format: {response.format} ({response.format_extension})")
    print(f"Total pages: {len(response.pages)}")
    
    # Display information about each page
    print("\nPage information:")
    for page in response.pages:
        print(f"  Page {page.number}: {page.width}x{page.height} pixels, Visible: {page.visible}")
    
    # Display attachment information if any
    if response.attachments and len(response.attachments) > 0:
        print("\nAttachments:")
        for attachment in response.attachments:
            print(f"  {attachment.name}")
    else:
        print("\nNo attachments found.")
    
    # Display format-specific information if available
    if response.pdf_view_info:
        print("\nPDF-specific information available.")
    if response.archive_view_info:
        print("\nArchive-specific information available.")
    if response.cad_view_info:
        print("\nCAD-specific information available.")
    if response.project_management_view_info:
        print("\nProject Management-specific information available.")
    if response.outlook_view_info:
        print("\nOutlook-specific information available.")

except groupdocs_viewer_cloud.ApiException as e:
    print(f"Exception while calling InfoApi: {e}")

Step 6: Extracting Text Coordinates for Selection

If you need to implement text selection functionality, you’ll need detailed text coordinates. Let’s modify our code to extract text data:

C# Example with Text Extraction

// Define view options with text extraction enabled
var viewOptions = new ViewOptions
{
    FileInfo = new FileInfo
    {
        FilePath = "SampleFiles/sample.docx"
    },
    ViewFormat = ViewOptions.ViewFormatEnum.PNG,
    RenderOptions = new ImageOptions
    {
        ExtractText = true
    }
};

try
{
    // Call the API to get document information with text coordinates
    var response = apiInstance.GetInfo(new GetInfoRequest(viewOptions));
    
    Console.WriteLine("Document information with text coordinates retrieved successfully!");
    
    // Process text information for the first page (as an example)
    if (response.Pages.Count > 0 && response.Pages[0].Lines != null && response.Pages[0].Lines.Count > 0)
    {
        Console.WriteLine("\nText content of the first page:");
        
        foreach (var line in response.Pages[0].Lines)
        {
            Console.WriteLine($"Line: \"{line.Value}\"");
            Console.WriteLine($"  Position: X={line.X}, Y={line.Y}, Width={line.Width}, Height={line.Height}");
            
            Console.WriteLine("  Words:");
            foreach (var word in line.Words)
            {
                Console.WriteLine($"    Word: \"{word.Value}\"");
                Console.WriteLine($"      Position: X={word.X}, Y={word.Y}, Width={word.Width}, Height={word.Height}");
                
                if (word.Characters != null && word.Characters.Count > 0)
                {
                    Console.WriteLine("      Characters:");
                    foreach (var character in word.Characters)
                    {
                        Console.WriteLine($"        Char: '{character.Value}' at X={character.X}, Y={character.Y}");
                    }
                }
            }
            Console.WriteLine();
        }
    }
    else
    {
        Console.WriteLine("No text data found in the document.");
    }
}
catch (Exception e)
{
    Console.WriteLine("Exception while calling InfoApi: " + e.Message);
}

Practical Applications

Let’s explore some practical applications of document information retrieval:

Creating a Document Preview

With document information, you can generate a preview that displays basic metadata:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document Preview</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            max-width: 800px;
            margin: 0 auto;
            padding: 20px;
        }
        .document-info {
            border: 1px solid #ddd;
            padding: 20px;
            border-radius: 5px;
            margin-bottom: 20px;
        }
        .document-info h2 {
            margin-top: 0;
            color: #333;
        }
        .info-row {
            display: flex;
            margin-bottom: 10px;
        }
        .info-label {
            width: 150px;
            font-weight: bold;
        }
        .pages-list {
            display: flex;
            flex-wrap: wrap;
            gap: 10px;
            margin-top: 20px;
        }
        .page-thumbnail {
            border: 1px solid #ddd;
            padding: 5px;
            width: 100px;
            height: 150px;
            display: flex;
            align-items: center;
            justify-content: center;
            background-color: #f9f9f9;
        }
        .attachments-list {
            margin-top: 20px;
        }
        .attachment-item {
            display: flex;
            align-items: center;
            margin-bottom: 10px;
        }
        .attachment-icon {
            margin-right: 10px;
            font-size: 20px;
        }
    </style>
</head>
<body>
    <div class="document-info">
        <h2>Document Information</h2>
        
        <div class="info-row">
            <div class="info-label">Format:</div>
            <div>Microsoft Word Open XML Document (.docx)</div>
        </div>
        
        <div class="info-row">
            <div class="info-label">Total Pages:</div>
            <div>3</div>
        </div>
        
        <div class="info-row">
            <div class="info-label">Page Dimensions:</div>
            <div>595x841 pixels</div>
        </div>
    </div>
    
    <h3>Page Thumbnails</h3>
    <div class="pages-list">
        <div class="page-thumbnail">Page 1</div>
        <div class="page-thumbnail">Page 2</div>
        <div class="page-thumbnail">Page 3</div>
    </div>
    
    <h3>Attachments</h3>
    <div class="attachments-list">
        <div class="attachment-item">
            <div class="attachment-icon">📄</div>
            <div>No attachments found</div>
        </div>
    </div>
</body>
</html>

Planning a Document Viewer UI

You can use document information to intelligently configure your document viewer:

// Example pseudo-code for a document viewer application
function configureDocumentViewer(documentInfo) {
    // Set the total number of pages for pagination
    viewer.setTotalPages(documentInfo.pages.length);
    
    // Set the initial zoom based on the page dimensions
    if (documentInfo.pages.length > 0) {
        const firstPage = documentInfo.pages[0];
        const aspectRatio = firstPage.width / firstPage.height;
        viewer.setOptimalZoom(aspectRatio);
    }
    
    // Configure attachments panel if needed
    if (documentInfo.attachments && documentInfo.attachments.length > 0) {
        viewer.showAttachmentsPanel();
        viewer.setAttachments(documentInfo.attachments);
    } else {
        viewer.hideAttachmentsPanel();
    }
    
    // Configure text selection if text coordinates are available
    const hasTextData = documentInfo.pages.some(page => 
        page.lines && page.lines.length > 0
    );
    
    if (hasTextData) {
        viewer.enableTextSelection(documentInfo.pages);
    } else {
        viewer.disableTextSelection();
    }
    
    // Configure special features based on document type
    if (documentInfo.pdfViewInfo) {
        viewer.enablePdfFeatures();
    }
    if (documentInfo.cadViewInfo) {
        viewer.enableCadFeatures();
    }
}

Try It Yourself

Now that you’ve learned how to retrieve document information using GroupDocs.Viewer Cloud API, try it with different document types:

Exercise 1: Compare Document Types

Retrieve information for different document types (PDF, DOCX, XLSX, PPTX)
Compare the structure and properties returned for each format
Note any format-specific information returned

Exercise 2: Implement Text Search

Retrieve document information with text coordinates enabled
Create a simple search function that highlights matching text based on the coordinates
Test with different search terms

Troubleshooting Tips

Authentication Issues: Ensure your Client ID and Client Secret are correct and that you’re generating a fresh JWT token
File Not Found: Verify that the file path in your request matches the actual path in cloud storage
Text Extraction: Text extraction is only available for certain document formats and requires setting the correct view format (PNG or JPG)
Large Documents: For very large documents, consider retrieving information for specific pages rather than the entire document

What You’ve Learned

In this tutorial, you’ve learned:

How to retrieve comprehensive document information using GroupDocs.Viewer Cloud API
How to extract text with coordinates for implementing text selection
How to access information about document attachments
How to use document information to plan and configure document viewing experiences
How to implement these features in your applications using SDKs

Next Steps

Ready to explore more document viewing capabilities? Check out these related tutorials:

Helpful Resources

Feedback and Questions

Have questions about retrieving document information? Need help implementing it in your application? We welcome your feedback and questions on our support forum.