Tutorial: Converting to HTML Format
In this tutorial, you’ll learn how to convert various document types to HTML format using GroupDocs.Conversion Cloud API. You’ll master web-friendly document conversions with options for responsive layouts, embedded resources, and display customizations.
Learning Objectives
By the end of this tutorial, you will be able to:
- Convert documents to HTML format with various customization options
- Control HTML layout and styling during conversion
- Manage resource handling for embedded content
- Implement responsive design considerations
- Handle both storage-based and stream-based HTML conversions
- Troubleshoot common HTML conversion challenges
Prerequisites
Before starting this tutorial, you need:
- A GroupDocs.Conversion Cloud account
- Your Client ID and Client Secret credentials
- Basic understanding of REST API concepts
- Development environment with your preferred programming language set up
- Sample documents to test conversion (we’ll use various formats including Word, PDF, and PowerPoint)
Implementation Steps
Step 1: Authentication with GroupDocs.Conversion Cloud API
Before performing any operations, we need to authenticate with the API using your Client ID and Client Secret.
Try it yourself
First, let’s obtain a JWT access token using cURL:
# First get JSON Web Token
curl -v "https://api.groupdocs.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"
Make sure to replace YOUR_CLIENT_ID
and YOUR_CLIENT_SECRET
with your actual credentials.
Step 2: Basic Document to HTML Conversion
Let’s start with a simple conversion from a Word document to HTML format:
Try it yourself
Using cURL:
curl -X POST "https://api.groupdocs.cloud/v2.0/conversion" \
-H "accept: application/json" \
-H "authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d "{
'FilePath': 'documents/sample.docx',
'Format': 'html',
'OutputPath': 'converted'
}"
Replace YOUR_JWT_TOKEN
with the actual token received in Step 1.
Step 3: Converting to HTML with Layout Options
Now let’s implement a comprehensive example that converts a document to HTML with specific layout options:
// C# SDK Example
using System;
using System.Collections.Generic;
using GroupDocs.Conversion.Cloud.Sdk.Api;
using GroupDocs.Conversion.Cloud.Sdk.Client;
using GroupDocs.Conversion.Cloud.Sdk.Model;
using GroupDocs.Conversion.Cloud.Sdk.Model.Requests;
namespace HtmlConversionTutorial
{
class Program
{
static void Main(string[] args)
{
// Configure API client
var configuration = new Configuration("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET");
var apiInstance = new ConvertApi(configuration);
try
{
// Set up conversion to HTML with options
var settings = new ConvertSettings
{
StorageName = "MyStorage",
FilePath = "documents/sample.docx",
Format = "html",
// For password-protected documents
LoadOptions = new DocxLoadOptions()
{
Password = ""
},
// HTML-specific convert options
ConvertOptions = new WebConvertOptions()
{
// Page selection
FromPage = 1,
PagesCount = 0, // All pages
// Layout options
FixedLayout = true, // Use fixed layout instead of fluid
FixedLayoutShowBorders = true, // Show page borders
// Resource handling
ResourcesEmbedded = true, // Embed resources (images, fonts)
// Output formatting
Zoom = 100, // Default zoom level
// Navigation options
UsePdf = false // Direct conversion (not via PDF)
},
OutputPath = "converted"
};
// Execute conversion
List<StoredConvertedResult> response = apiInstance.ConvertDocument(
new ConvertDocumentRequest(settings));
Console.WriteLine("Document converted successfully to HTML: " + response[0].Url);
}
catch (Exception e)
{
Console.WriteLine("Error: " + e.Message);
}
}
}
}
Step 4: Converting PDF to Responsive HTML
PDF documents often need special handling to convert to responsive HTML:
// Java SDK Example
import com.groupdocs.cloud.conversion.api.*;
import com.groupdocs.cloud.conversion.client.ApiException;
import com.groupdocs.cloud.conversion.model.*;
import com.groupdocs.cloud.conversion.model.requests.*;
import java.util.List;
public class PdfToHtmlExample {
public static void main(String[] args) {
// Configure API client
String clientId = "YOUR_CLIENT_ID";
String clientSecret = "YOUR_CLIENT_SECRET";
Configuration configuration = new Configuration(clientId, clientSecret);
ConvertApi apiInstance = new ConvertApi(configuration);
try {
// Prepare convert settings
ConvertSettings settings = new ConvertSettings();
settings.setFilePath("documents/sample.pdf");
settings.setFormat("html");
// Set PDF-specific load options
PdfLoadOptions loadOptions = new PdfLoadOptions();
loadOptions.setPassword(""); // If PDF is protected
loadOptions.setRemoveEmbeddedFiles(true); // Remove embedded files for cleaner output
loadOptions.setHidePdfAnnotations(true); // Hide annotations
settings.setLoadOptions(loadOptions);
// Configure HTML-specific convert options for responsive layout
WebConvertOptions convertOptions = new WebConvertOptions();
convertOptions.setFromPage(1);
convertOptions.setPagesCount(0); // All pages
// Use fluid layout (not fixed) for responsive design
convertOptions.setFixedLayout(false);
// Additional options
convertOptions.setResourcesEmbedded(true); // Keep resources embedded
settings.setConvertOptions(convertOptions);
settings.setOutputPath("converted");
// Execute conversion
List<StoredConvertedResult> result = apiInstance.convertDocument(
new ConvertDocumentRequest(settings));
System.out.println("PDF converted successfully to responsive HTML: " + result.get(0).getUrl());
} catch (ApiException e) {
System.err.println("Exception when calling ConvertApi: " + e.getMessage());
e.printStackTrace();
}
}
}
Step 5: Converting Presentations to HTML
PowerPoint and other presentation formats often contain complex slides that need special handling:
# Python SDK Example
import groupdocs_conversion_cloud
from groupdocs_conversion_cloud.models.requests import ConvertDocumentRequest
# Configure API client
client_id = "YOUR_CLIENT_ID"
client_secret = "YOUR_CLIENT_SECRET"
api_instance = groupdocs_conversion_cloud.ConvertApi.from_keys(client_id, client_secret)
try:
# Prepare conversion settings
settings = groupdocs_conversion_cloud.ConvertSettings()
settings.file_path = "documents/presentation.pptx"
settings.format = "html"
# Configure presentation-specific load options
load_options = groupdocs_conversion_cloud.PresentationLoadOptions()
load_options.show_hidden_slides = False # Skip hidden slides
load_options.password = "" # If presentation is protected
settings.load_options = load_options
# Configure HTML-specific convert options
convert_options = groupdocs_conversion_cloud.WebConvertOptions()
convert_options.from_page = 1
convert_options.pages_count = 0 # All slides
# Layout options - fixed is often better for presentations to preserve slide layout
convert_options.fixed_layout = True
convert_options.fixed_layout_show_borders = False # No borders around slides
# Resource handling - embed everything for standalone HTML
convert_options.resources_embedded = True
settings.convert_options = convert_options
settings.output_path = "converted"
# Execute conversion
request = ConvertDocumentRequest(settings)
result = api_instance.convert_document(request)
print(f"Presentation converted successfully to HTML: {result[0].url}")
except groupdocs_conversion_cloud.ApiException as e:
print(f"Exception when calling ConvertApi: {e}")
Step 6: Stream-Based HTML Conversion
For applications that need to process the converted HTML directly:
// Node.js SDK Example
const { ConvertApi, Configuration } = require("groupdocs-conversion-cloud");
const fs = require("fs");
// Configure API client
const clientId = "YOUR_CLIENT_ID";
const clientSecret = "YOUR_CLIENT_SECRET";
const config = new Configuration(clientId, clientSecret);
const apiInstance = new ConvertApi(config);
// Prepare conversion settings
const settings = {
filePath: "documents/sample.docx",
format: "html",
loadOptions: {
// DOCX-specific load options if needed
},
convertOptions: {
// HTML-specific convert options
fromPage: 1,
pagesCount: 0, // All pages
fixedLayout: false, // Fluid layout for responsive design
resourcesEmbedded: true // Embed all resources
},
// Set outputPath to null for stream output
outputPath: null
};
// Execute conversion
apiInstance.convertDocumentDownload({ convertSettings: settings })
.then((result) => {
// Save the stream to a file
const fileName = "./converted-document.html";
const writeStream = fs.createWriteStream(fileName);
result.pipe(writeStream);
writeStream.on("finish", () => {
console.log(`Document converted and saved to ${fileName}`);
// Read the HTML to demonstrate we have it
fs.readFile(fileName, 'utf8', (err, data) => {
if (err) {
console.error("Error reading HTML file:", err);
return;
}
// Display the first 100 characters of the HTML
console.log("HTML Preview: " + data.substring(0, 100) + "...");
});
});
})
.catch((error) => {
console.log(`Error: ${error.message}`);
});
HTML-Specific Conversion Options
When converting to HTML format, you can leverage these specialized options:
Option | Description | Default | Impact |
---|---|---|---|
FromPage | First page number to convert | 1 | Controls starting page/slide |
PagesCount | Number of pages to convert | All | Limits content amount |
FixedLayout | Use fixed layout instead of fluid | false | Affects responsiveness |
FixedLayoutShowBorders | Show page borders in fixed layout | false | Visual appearance |
ResourcesEmbedded | Embed resources (images, fonts) | false | File independence |
Zoom | Default zoom level (%) | 100 | Initial display size |
UsePdf | Use PDF as intermediate format | false | Can improve complex layouts |
Troubleshooting Common Issues
1. Layout and Responsiveness Issues
If the converted HTML doesn’t display correctly across devices:
- For responsive design, set
FixedLayout
to false - For preserving exact layout (like presentations), set
FixedLayout
to true - If content appears too small or large, adjust the
Zoom
parameter
2. Resource Handling Problems
If images, fonts, or other resources are missing:
- Set
ResourcesEmbedded
to true to include all resources in the HTML - Check if the original document has links to external resources that might be unavailable
- For documents with custom fonts, ensure these fonts are available in the conversion environment
3. Complex Layout Conversion Challenges
For documents with complex layouts:
- Try setting
UsePdf
to true, which may preserve complex layouts better - For tables, charts, and other complex elements, fixed layout often works better
- If only part of the document has complex layouts, consider converting specific pages
4. File Size Concerns
If the HTML output is too large:
- Consider setting
ResourcesEmbedded
to false and manage resources separately - For multi-page documents, convert pages in batches
- Check if original documents contain high-resolution images that could be optimized
What You’ve Learned
In this tutorial, you’ve learned:
- How to convert various document formats to HTML
- Controlling HTML layout options for different display requirements
- Managing embedded resources for standalone or externally-referenced content
- Implementing responsive design considerations for web viewing
- Handling both storage-based and stream-based HTML conversions
- Troubleshooting common HTML conversion challenges
Further Practice
To reinforce your learning, try these exercises:
- Create a document viewer web application that converts and displays documents as HTML
- Implement a batch conversion utility that processes multiple documents to responsive HTML
- Build a service that optimizes HTML output for mobile devices by adjusting layout options
- Create a comparison tool that shows different HTML conversion options side by side
Next Tutorial
Ready to learn more? Continue with our Tutorial: Converting to Text Format to master text extraction and conversion from various document types.
Additional Resources
Have questions about this tutorial? Feel free to reach out on our forum for support.