Tutorial: How to Get Document Metadata with GroupDocs.Conversion Cloud
Learning Objectives
In this tutorial, you’ll learn:
- What document metadata is and why it’s important
- How to retrieve metadata from documents stored in the cloud
- How to interpret and use metadata information
- How to apply this knowledge in real-world scenarios
Prerequisites
Before starting this tutorial, you should have:
- A GroupDocs.Conversion Cloud account (get a free trial if needed)
- Your Client ID and Client Secret credentials
- Basic understanding of REST APIs
- Familiarity with your preferred programming language (C#, Java, PHP, etc.)
- Completed our Basic Document Conversion tutorial (recommended)
Why Access Document Metadata?
Document metadata provides valuable information about a file, including:
- Author and creation information
- Page count and dimensions
- Security settings (password protection)
- File size and format details
- Timestamps and version information
This information is essential for:
- Building document management systems
- Pre-conversion validation
- File cataloging and searching
- Document filtering and organization
- Security assessment
Tutorial Steps
Step 1: Understanding Document Metadata
Document metadata is information about a document that isn’t part of the content itself. Different document types have various metadata properties, but common ones include:
- File type/format
- Number of pages
- Document dimensions
- Creation and modification dates
- Author information
- File size
- Security settings
The GroupDocs.Conversion Cloud API allows you to retrieve this information easily from documents stored in your cloud storage.
Step 2: Set Up Your Environment
Let’s start by setting up our development environment:
Try it yourself:
- Create a new project in your preferred IDE
- Install the appropriate GroupDocs.Conversion Cloud SDK for your language
- Set up your authentication credentials:
// C# example
string MyClientSecret = ""; // Get from https://dashboard.groupdocs.cloud
string MyClientId = ""; // Get from https://dashboard.groupdocs.cloud
var configuration = new Configuration(MyClientId, MyClientSecret);
// Create necessary API instance
var infoApi = new InfoApi(configuration);
Step 3: Retrieve Document Metadata
Now, let’s retrieve metadata from a document in cloud storage:
Using cURL
# First get JSON Web Token
curl -v "https://api.groupdocs.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=xxxx&client_secret=xxxx" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
# Get document metadata
curl -X GET "https://api.groupdocs.cloud/v2.0/conversion/info?FilePath=words/four-pages.docx" \
-H "accept: application/json" \
-H "authorization: Bearer [Access Token]"
Using C# SDK
try
{
// Create document metadata request
var request = new GetDocumentMetadataRequest
{
FilePath = "WordProcessing/four-pages.docx",
StorageName = "MyStorage" // Optional, default storage is used if not specified
};
// Execute request
var metadata = infoApi.GetDocumentMetadata(request);
// Display metadata information
Console.WriteLine($"File Type: {metadata.FileType}");
Console.WriteLine($"Page Count: {metadata.PageCount}");
Console.WriteLine($"File Size: {metadata.Size} bytes");
Console.WriteLine($"Created Date: {metadata.CreatedDate}");
Console.WriteLine($"Modified Date: {metadata.ModifiedDate}");
Console.WriteLine($"Is Password Protected: {metadata.IsPasswordProtected}");
// Display additional information if available
if (metadata.Width > 0 && metadata.Height > 0)
{
Console.WriteLine($"Dimensions: {metadata.Width} x {metadata.Height}");
}
if (!string.IsNullOrEmpty(metadata.Author))
{
Console.WriteLine($"Author: {metadata.Author}");
}
if (!string.IsNullOrEmpty(metadata.Title))
{
Console.WriteLine($"Title: {metadata.Title}");
}
}
catch (Exception e)
{
Console.WriteLine("Error retrieving document metadata: " + e.Message);
}
Step 4: Working with Password-Protected Documents
If you need to retrieve metadata from a password-protected document, you’ll need to provide the password:
Using C# SDK
try
{
// Create document metadata request with password
var request = new GetDocumentMetadataRequest
{
FilePath = "WordProcessing/password-protected.docx",
StorageName = "MyStorage",
Password = "your-document-password" // Add the document password
};
// Execute request
var metadata = infoApi.GetDocumentMetadata(request);
// Display metadata
Console.WriteLine($"Successfully accessed password-protected document");
Console.WriteLine($"File Type: {metadata.FileType}");
Console.WriteLine($"Page Count: {metadata.PageCount}");
// ... other metadata properties
}
catch (Exception e)
{
Console.WriteLine("Error retrieving document metadata: " + e.Message);
}
Step 5: Practical Applications
Now let’s look at some practical applications of document metadata:
Checking if Conversion is Needed
/// <summary>
/// Determines if a document needs to be converted based on its type
/// </summary>
bool NeedsConversion(string filePath, string targetFormat, string storageName = null)
{
try
{
// Get metadata
var request = new GetDocumentMetadataRequest
{
FilePath = filePath,
StorageName = storageName
};
var metadata = infoApi.GetDocumentMetadata(request);
// Check if already in target format
return !metadata.FileType.Equals(targetFormat, StringComparison.OrdinalIgnoreCase);
}
catch (Exception e)
{
Console.WriteLine("Error checking document: " + e.Message);
throw;
}
}
// Usage example
if (NeedsConversion("documents/sample.docx", "docx"))
{
Console.WriteLine("Document already in DOCX format, no conversion needed.");
}
else
{
Console.WriteLine("Document needs conversion.");
// Perform conversion
}
Pre-conversion Document Validation
/// <summary>
/// Validates a document before conversion
/// </summary>
bool ValidateDocumentForConversion(string filePath, string storageName = null)
{
try
{
// Get metadata
var request = new GetDocumentMetadataRequest
{
FilePath = filePath,
StorageName = storageName
};
var metadata = infoApi.GetDocumentMetadata(request);
// Perform validation checks
bool isValid = true;
// Check if file is too large (e.g., over 10MB)
if (metadata.Size > 10 * 1024 * 1024)
{
Console.WriteLine("Warning: File is very large, conversion may take longer.");
// isValid = false; // Uncomment to fail validation for large files
}
// Check if file has too many pages (e.g., over 100 pages)
if (metadata.PageCount > 100)
{
Console.WriteLine("Warning: Document has many pages, conversion may take longer.");
// isValid = false; // Uncomment to fail validation for many pages
}
// Check if file is password protected
if (metadata.IsPasswordProtected)
{
Console.WriteLine("Error: File is password protected. Please provide a password.");
isValid = false;
}
return isValid;
}
catch (Exception e)
{
Console.WriteLine("Error validating document: " + e.Message);
return false;
}
}
// Usage example
if (ValidateDocumentForConversion("documents/large-document.pdf"))
{
Console.WriteLine("Document passed validation, proceeding with conversion.");
// Proceed with conversion
}
else
{
Console.WriteLine("Document failed validation, conversion aborted.");
}
Step 6: Complete Implementation Example
Here’s a complete example that demonstrates retrieving and using document metadata in C#:
// For complete examples and data files, please go to https://github.com/groupdocs-conversion-cloud/groupdocs-conversion-cloud-dotnet-samples
using System;
using GroupDocs.Conversion.Cloud.Sdk.Api;
using GroupDocs.Conversion.Cloud.Sdk.Client;
using GroupDocs.Conversion.Cloud.Sdk.Model;
using GroupDocs.Conversion.Cloud.Sdk.Model.Requests;
namespace DocumentMetadataTutorial
{
class Program
{
static void Main(string[] args)
{
// Get your client ID and client secret from https://dashboard.groupdocs.cloud
string MyClientSecret = "YOUR_CLIENT_SECRET";
string MyClientId = "YOUR_CLIENT_ID";
// Create API instances
var configuration = new Configuration(MyClientId, MyClientSecret);
var infoApi = new InfoApi(configuration);
var storageApi = new StorageApi(configuration);
var convertApi = new ConvertApi(configuration);
try
{
// Define document path in cloud storage
string filePath = "WordProcessing/four-pages.docx";
Console.WriteLine("Retrieving document metadata...");
// Create metadata request
var request = new GetDocumentMetadataRequest
{
FilePath = filePath,
StorageName = "MyStorage" // Optional, default storage is used if not specified
};
// Execute request
var metadata = infoApi.GetDocumentMetadata(request);
// Display metadata information
Console.WriteLine("\nDocument Metadata:");
Console.WriteLine($"---------------------------");
Console.WriteLine($"File Type: {metadata.FileType}");
Console.WriteLine($"Page Count: {metadata.PageCount}");
Console.WriteLine($"File Size: {metadata.Size} bytes");
Console.WriteLine($"Created Date: {metadata.CreatedDate}");
Console.WriteLine($"Modified Date: {metadata.ModifiedDate}");
Console.WriteLine($"Is Password Protected: {metadata.IsPasswordProtected}");
// Display additional information if available
if (metadata.Width > 0 && metadata.Height > 0)
{
Console.WriteLine($"Dimensions: {metadata.Width} x {metadata.Height}");
}
if (!string.IsNullOrEmpty(metadata.Author))
{
Console.WriteLine($"Author: {metadata.Author}");
}
if (!string.IsNullOrEmpty(metadata.Title))
{
Console.WriteLine($"Title: {metadata.Title}");
}
// Demonstrate practical use of metadata
Console.WriteLine("\nPractical application example:");
// Check if PDF conversion is needed based on file type
if (metadata.FileType.Equals("pdf", StringComparison.OrdinalIgnoreCase))
{
Console.WriteLine("Document is already in PDF format, no conversion needed.");
}
else
{
Console.WriteLine("Document is not in PDF format, conversion would be required.");
// Check if document meets conversion requirements
if (metadata.IsPasswordProtected)
{
Console.WriteLine("Warning: Document is password protected. Password required for conversion.");
}
if (metadata.PageCount > 50)
{
Console.WriteLine("Note: Document has many pages, conversion may take longer.");
}
}
}
catch (Exception e)
{
Console.WriteLine("Error: " + e.Message);
}
Console.WriteLine("\nPress any key to exit.");
Console.ReadKey();
}
}
}
SDK Examples for Other Languages
Java Example
// For complete examples and data files, please go to https://github.com/groupdocs-conversion-cloud/groupdocs-conversion-cloud-java-samples
InfoApi apiInstance = new InfoApi(Utils.AppSID, Utils.AppKey);
try {
GetDocumentMetadataRequest request = new GetDocumentMetadataRequest();
request.setStorageName(Utils.MYStorage);
request.setFilePath("conversions/sample.docx");
DocumentMetadata response = apiInstance.getDocumentMetadata(request);
System.out.println("File Type: " + response.getFileType());
System.out.println("Page Count: " + response.getPageCount());
System.out.println("Size: " + response.getSize() + " bytes");
System.out.println("Created Date: " + response.getCreatedDate());
} catch (ApiException e) {
System.err.println("Exception while calling InfoApi:");
e.printStackTrace();
}
PHP Example
<?php
include(dirname(__DIR__) . '\CommonUtils.php');
try {
$infoApi = CommonUtils::GetInfoApiInstance();
$request = new GroupDocs\Conversion\Model\Requests\GetDocumentMetadataRequest();
$request->setStorageName(CommonUtils::$MyStorage);
$request->setFilePath("conversions/sample.docx");
$metadata = $infoApi->getDocumentMetadata($request);
echo "File Type: " . $metadata->getFileType() . "<br/>";
echo "Page Count: " . $metadata->getPageCount() . "<br/>";
echo "Size: " . $metadata->getSize() . " bytes<br/>";
echo "Is Password Protected: " . ($metadata->getIsPasswordProtected() ? "Yes" : "No") . "<br/>";
} catch (Exception $e) {
echo "Something went wrong: " . $e->getMessage();
}
?>
Python Example
# Import modules
import groupdocs_conversion_cloud
from Common_Utilities.Utils import Common_Utilities
# Create instance of the API
api = Common_Utilities.Get_InfoApi_Instance()
try:
request = groupdocs_conversion_cloud.GetDocumentMetadataRequest()
request.storage_name = Common_Utilities.myStorage
request.file_path = "conversions/sample.docx"
metadata = api.get_document_metadata(request)
print("Document Information:")
print(f"File Type: {metadata.file_type}")
print(f"Page Count: {metadata.page_count}")
print(f"Size: {metadata.size} bytes")
print(f"Created Date: {metadata.created_date}")
print(f"Is Password Protected: {metadata.is_password_protected}")
except groupdocs_conversion_cloud.ApiException as e:
print("Exception while calling API: {0}".format(e.message))
Troubleshooting Tips
- File Not Found Errors: Ensure the file path in cloud storage is correct and the file exists.
- Authentication Issues: Verify your Client ID and Client Secret are correct.
- Permission Errors: Make sure your account has access to the specified storage.
- Password Required: If you get an “invalid password” error, the document is password-protected. Provide the correct password in your request.
What You’ve Learned
In this tutorial, you’ve learned how to:
- Retrieve metadata from documents stored in GroupDocs cloud storage
- Access key document properties like file type, page count, and security settings
- Handle password-protected documents
- Apply metadata information in practical scenarios like pre-conversion validation
Further Practice
To reinforce your learning, try these exercises:
- Create a document inventory tool that scans a folder in cloud storage and reports metadata for all documents
- Build a simple web interface that displays document metadata for uploaded files
- Implement a document filter that categorizes files by type, size, or page count based on metadata
Next Steps
Congratulations! You’ve completed our Quick Start Guide tutorial series for GroupDocs.Conversion Cloud API. You now have the skills to:
- Check supported document formats
- Convert documents using cloud storage
- Perform direct document conversions
- Retrieve and utilize document metadata
You’re now ready to implement these features in your own applications!
Helpful Resources
We’d love to hear your feedback on this tutorial series! Please feel free to ask questions or suggest improvements on our support forum.