Tutorial: Converting PDF Documents with Custom Options
In this tutorial, you’ll learn how to convert PDF documents to various formats using GroupDocs.Conversion Cloud API. You’ll master specialized PDF conversion options including annotation handling, security credentials, and advanced formatting control.
Learning Objectives
By the end of this tutorial, you will be able to:
- Convert PDF files to Word, Excel, images, and other formats
- Handle PDF security with password-protected documents
- Control annotation visibility during conversion
- Manage PDF form fields and embedded content
- Implement both storage-based and stream-based PDF conversions
- Troubleshoot common PDF conversion challenges
Prerequisites
Before starting this tutorial, you need:
- A GroupDocs.Conversion Cloud account
- Your Client ID and Client Secret credentials
- Basic understanding of REST API concepts
- Development environment with your preferred programming language set up
- Sample PDF files to test conversion (including some with annotations, form fields, and password protection)
Implementation Steps
Step 1: Authentication with GroupDocs.Conversion Cloud API
Before performing any operations, we need to authenticate with the API using your Client ID and Client Secret.
Try it yourself
First, let’s obtain a JWT access token using cURL:
# First get JSON Web Token
curl -v "https://api.groupdocs.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
Make sure to replace YOUR_CLIENT_ID
and YOUR_CLIENT_SECRET
with your actual credentials.
Step 2: Basic PDF to Word Conversion
Let’s start with a simple conversion from a PDF file to DOCX format:
Try it yourself
Using cURL:
curl -X POST "https://api.groupdocs.cloud/v2.0/conversion" \
-H "accept: application/json" \
-H "authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d "{
'FilePath': 'documents/sample.pdf',
'Format': 'docx',
'OutputPath': 'converted'
}"
Replace YOUR_JWT_TOKEN
with the actual token received in Step 1.
Step 3: Converting PDF to Word with Annotation Control
Now let’s implement a comprehensive example that converts a PDF to Word while handling annotations:
// C# SDK Example
using System;
using System.Collections.Generic;
using GroupDocs.Conversion.Cloud.Sdk.Api;
using GroupDocs.Conversion.Cloud.Sdk.Client;
using GroupDocs.Conversion.Cloud.Sdk.Model;
using GroupDocs.Conversion.Cloud.Sdk.Model.Requests;
namespace PdfConversionTutorial
{
class Program
{
static void Main(string[] args)
{
// Configure API client
var configuration = new Configuration("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET");
var apiInstance = new ConvertApi(configuration);
try
{
// Set up conversion from PDF to Word with annotation control
var settings = new ConvertSettings
{
StorageName = "MyStorage",
FilePath = "documents/annotated.pdf",
Format = "docx",
// PDF-specific load options
LoadOptions = new PdfLoadOptions()
{
Password = "", // If PDF is password-protected
HidePdfAnnotations = true, // Hide annotations in output
FlattenAllFields = true, // Flatten form fields
RemoveEmbeddedFiles = true // Remove embedded files for cleaner output
},
// Word-specific convert options
ConvertOptions = new DocxConvertOptions()
{
FromPage = 1,
PagesCount = 0, // All pages
Dpi = 300
},
OutputPath = "converted"
};
// Execute conversion
List<StoredConvertedResult> response = apiInstance.ConvertDocument(
new ConvertDocumentRequest(settings));
Console.WriteLine("PDF document converted successfully to DOCX: " + response[0].Url);
}
catch (Exception e)
{
Console.WriteLine("Error: " + e.Message);
}
}
}
}
Step 4: Converting Password-Protected PDF Documents
For PDFs that require a password to open:
// Java SDK Example
import com.groupdocs.cloud.conversion.api.*;
import com.groupdocs.cloud.conversion.client.ApiException;
import com.groupdocs.cloud.conversion.model.*;
import com.groupdocs.cloud.conversion.model.requests.*;
import java.util.List;
public class ProtectedPdfExample {
public static void main(String[] args) {
// Configure API client
String clientId = "YOUR_CLIENT_ID";
String clientSecret = "YOUR_CLIENT_SECRET";
Configuration configuration = new Configuration(clientId, clientSecret);
ConvertApi apiInstance = new ConvertApi(configuration);
try {
// Prepare convert settings for protected PDF
ConvertSettings settings = new ConvertSettings();
settings.setFilePath("documents/secured.pdf");
settings.setFormat("docx");
// Set PDF-specific load options with password
PdfLoadOptions loadOptions = new PdfLoadOptions();
loadOptions.setPassword("yourpassword"); // Password to open the PDF
loadOptions.setFlattenAllFields(true);
loadOptions.setHidePdfAnnotations(true);
settings.setLoadOptions(loadOptions);
// Configure Word-specific convert options
WordProcessingConvertOptions convertOptions = new WordProcessingConvertOptions();
convertOptions.setFromPage(1);
convertOptions.setPagesCount(0); // All pages
convertOptions.setDpi(300);
settings.setConvertOptions(convertOptions);
settings.setOutputPath("converted");
// Execute conversion
List<StoredConvertedResult> result = apiInstance.convertDocument(
new ConvertDocumentRequest(settings));
System.out.println("Protected PDF document converted successfully: " + result.get(0).getUrl());
} catch (ApiException e) {
System.err.println("Exception when calling ConvertApi: " + e.getMessage());
e.printStackTrace();
}
}
}
Step 5: Converting PDF to Image Format with Page Control
Converting PDF documents to images gives you precise control over rendering:
# Python SDK Example
import groupdocs_conversion_cloud
from groupdocs_conversion_cloud.models.requests import ConvertDocumentRequest
# Configure API client
client_id = "YOUR_CLIENT_ID"
client_secret = "YOUR_CLIENT_SECRET"
api_instance = groupdocs_conversion_cloud.ConvertApi.from_keys(client_id, client_secret)
try:
# Prepare conversion settings
settings = groupdocs_conversion_cloud.ConvertSettings()
settings.file_path = "documents/multipage.pdf"
settings.format = "jpg"
# Configure PDF-specific load options
load_options = groupdocs_conversion_cloud.PdfLoadOptions()
load_options.password = "" # If PDF is protected
load_options.hide_pdf_annotations = True
load_options.remove_embedded_files = True
settings.load_options = load_options
# Configure JPG-specific convert options
convert_options = groupdocs_conversion_cloud.JpegConvertOptions()
convert_options.from_page = 2 # Start from second page
convert_options.pages_count = 3 # Convert three pages
convert_options.dpi = 300 # High resolution
convert_options.quality = 100 # Maximum quality
convert_options.grayscale = False # Color image
settings.convert_options = convert_options
settings.output_path = "converted"
# Execute conversion
request = ConvertDocumentRequest(settings)
result = api_instance.convert_document(request)
print(f"PDF document converted successfully to JPG: {len(result)} images")
for i, image in enumerate(result):
print(f" - Page {i+2}: {image.name} ({image.size} bytes)")
except groupdocs_conversion_cloud.ApiException as e:
print(f"Exception when calling ConvertApi: {e}")
Step 6: Converting PDF to Excel with Table Recognition
PDF documents often contain tabular data that’s valuable in spreadsheet format:
// Node.js SDK Example
const { ConvertApi, Configuration } = require("groupdocs-conversion-cloud");
// Configure API client
const clientId = "YOUR_CLIENT_ID";
const clientSecret = "YOUR_CLIENT_SECRET";
const config = new Configuration(clientId, clientSecret);
const apiInstance = new ConvertApi(config);
// Prepare conversion settings
const settings = {
filePath: "documents/financial_report.pdf",
format: "xlsx",
loadOptions: {
// PDF-specific load options
password: "", // If PDF is protected
hidePdfAnnotations: true,
flattenAllFields: true
},
convertOptions: {
// Excel-specific convert options
fromPage: 1,
pagesCount: 0, // All pages
zoom: 100
},
outputPath: "converted"
};
// Execute conversion
apiInstance.convertDocument({ convertSettings: settings })
.then((result) => {
console.log(`PDF document converted successfully to Excel: ${result[0].url}`);
})
.catch((error) => {
console.log(`Error: ${error.message}`);
});
Step 7: Stream-Based PDF Conversion
For applications that need to process the converted document directly:
// Node.js SDK Example
const { ConvertApi, Configuration } = require("groupdocs-conversion-cloud");
const fs = require("fs");
// Configure API client
const clientId = "YOUR_CLIENT_ID";
const clientSecret = "YOUR_CLIENT_SECRET";
const config = new Configuration(clientId, clientSecret);
const apiInstance = new ConvertApi(config);
// Prepare conversion settings
const settings = {
filePath: "documents/sample.pdf",
format: "docx",
loadOptions: {
// PDF-specific load options
hidePdfAnnotations: true,
flattenAllFields: true
},
convertOptions: {
// DOCX-specific convert options
dpi: 300
},
// Set outputPath to null for stream output
outputPath: null
};
// Execute conversion
apiInstance.convertDocumentDownload({ convertSettings: settings })
.then((result) => {
// Save the stream to a file
const fileName = "./converted-pdf.docx";
const writeStream = fs.createWriteStream(fileName);
result.pipe(writeStream);
writeStream.on("finish", () => {
console.log(`PDF document converted and saved to ${fileName}`);
});
})
.catch((error) => {
console.log(`Error: ${error.message}`);
});
PDF-Specific Load Options
When converting PDF documents, you can leverage these specialized options:
Option | Description | Default | Impact |
---|---|---|---|
Password | Document open password | null | Required for protected PDFs |
HidePdfAnnotations | Hide annotations in output | false | Controls annotation visibility |
FlattenAllFields | Flatten form fields | false | Controls form field appearance |
RemoveEmbeddedFiles | Remove embedded files | false | Controls embedded content |
ExtractOCRText | Extract OCR text when available | true | Controls text extraction method |
EnableLayeredRendering | Enable layer rendering | false | Controls layer handling |
Troubleshooting Common Issues
1. Password and Security Problems
If you encounter issues with protected PDFs:
- Ensure the password is correct and provided in the correct case
- If a document has permission restrictions, you might need owner password
- For highly secured PDFs, some restrictions might prevent certain types of conversion
2. Annotation Handling Issues
When dealing with annotated PDFs:
- Use HidePdfAnnotations to control whether annotations appear in output
- If annotations appear unexpectedly, ensure this option is set to true
- For format-specific annotation conversion, test different target formats
3. Form Field Preservation
For PDFs with forms:
- Use FlattenAllFields to control whether fields become regular content
- When converting to editable formats, you might need to set this to false
- Test form field handling with different destination formats
4. Image Quality Challenges
When converting PDF to images:
- Adjust DPI for better quality (300 DPI is good for most purposes)
- For JPG conversion, set quality to a higher value (90-100)
- For text clarity, enable anti-aliasing if available
5. Layout Preservation Issues
For complex layouts:
- PDF to Word/Excel conversion might not preserve all formatting
- Try PDF to PDF/A conversion first for consistent results
- Consider using PDF to image conversion for exact visual fidelity
What You’ve Learned
In this tutorial, you’ve learned:
- How to convert PDF documents to various formats including Word, Excel, and images
- Handling password-protected PDF documents
- Controlling annotation visibility and form field handling
- Managing embedded content during conversion
- Implementing both storage-based and stream-based PDF conversions
- Troubleshooting common PDF conversion challenges
Further Practice
To reinforce your learning, try these exercises:
- Create a batch conversion utility that processes multiple PDF files with consistent settings
- Implement a web form that allows users to upload PDFs and choose conversion options
- Build a system that extracts tables from PDFs and converts them to Excel with proper formatting
- Create a PDF processing pipeline that handles annotations differently based on their type
Additional Resources
Have questions about this tutorial? Feel free to reach out on our forum for support.