Tutorial: Converting to HTML Format

In this tutorial, you’ll learn how to convert various document types to HTML format using GroupDocs.Conversion Cloud API. You’ll master web-friendly document conversions with options for responsive layouts, embedded resources, and display customizations.

Learning Objectives

By the end of this tutorial, you will be able to:

  • Convert documents to HTML format with various customization options
  • Control HTML layout and styling during conversion
  • Manage resource handling for embedded content
  • Implement responsive design considerations
  • Handle both storage-based and stream-based HTML conversions
  • Troubleshoot common HTML conversion challenges

Prerequisites

Before starting this tutorial, you need:

  1. A GroupDocs.Conversion Cloud account
  2. Your Client ID and Client Secret credentials
  3. Basic understanding of REST API concepts
  4. Development environment with your preferred programming language set up
  5. Sample documents to test conversion (we’ll use various formats including Word, PDF, and PowerPoint)

Implementation Steps

Step 1: Authentication with GroupDocs.Conversion Cloud API

Before performing any operations, we need to authenticate with the API using your Client ID and Client Secret.

Try it yourself

First, let’s obtain a JWT access token using cURL:

# First get JSON Web Token
curl -v "https://api.groupdocs.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"

Make sure to replace YOUR_CLIENT_ID and YOUR_CLIENT_SECRET with your actual credentials.

Step 2: Basic Document to HTML Conversion

Let’s start with a simple conversion from a Word document to HTML format:

Try it yourself

Using cURL:

curl -X POST "https://api.groupdocs.cloud/v2.0/conversion" \
-H "accept: application/json" \
-H "authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d "{  
      'FilePath': 'documents/sample.docx',  
      'Format': 'html',  
      'OutputPath': 'converted'
    }"

Replace YOUR_JWT_TOKEN with the actual token received in Step 1.

Step 3: Converting to HTML with Layout Options

Now let’s implement a comprehensive example that converts a document to HTML with specific layout options:

// C# SDK Example
using System;
using System.Collections.Generic;
using GroupDocs.Conversion.Cloud.Sdk.Api;
using GroupDocs.Conversion.Cloud.Sdk.Client;
using GroupDocs.Conversion.Cloud.Sdk.Model;
using GroupDocs.Conversion.Cloud.Sdk.Model.Requests;

namespace HtmlConversionTutorial
{
    class Program
    {
        static void Main(string[] args)
        {
            // Configure API client
            var configuration = new Configuration("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET");
            var apiInstance = new ConvertApi(configuration);

            try
            {
                // Set up conversion to HTML with options
                var settings = new ConvertSettings
                {
                    StorageName = "MyStorage",
                    FilePath = "documents/sample.docx",
                    Format = "html",
                    // For password-protected documents
                    LoadOptions = new DocxLoadOptions() 
                    { 
                        Password = "" 
                    },
                    // HTML-specific convert options
                    ConvertOptions = new WebConvertOptions()
                    {
                        // Page selection
                        FromPage = 1,
                        PagesCount = 0,  // All pages
                        
                        // Layout options
                        FixedLayout = true,           // Use fixed layout instead of fluid
                        FixedLayoutShowBorders = true, // Show page borders
                        
                        // Resource handling
                        ResourcesEmbedded = true,      // Embed resources (images, fonts)
                        
                        // Output formatting
                        Zoom = 100,                   // Default zoom level
                        
                        // Navigation options
                        UsePdf = false                // Direct conversion (not via PDF)
                    },
                    OutputPath = "converted"
                };

                // Execute conversion
                List<StoredConvertedResult> response = apiInstance.ConvertDocument(
                    new ConvertDocumentRequest(settings));
                
                Console.WriteLine("Document converted successfully to HTML: " + response[0].Url);
            }
            catch (Exception e)
            {
                Console.WriteLine("Error: " + e.Message);
            }
        }
    }
}

Step 4: Converting PDF to Responsive HTML

PDF documents often need special handling to convert to responsive HTML:

// Java SDK Example
import com.groupdocs.cloud.conversion.api.*;
import com.groupdocs.cloud.conversion.client.ApiException;
import com.groupdocs.cloud.conversion.model.*;
import com.groupdocs.cloud.conversion.model.requests.*;
import java.util.List;

public class PdfToHtmlExample {
    public static void main(String[] args) {
        // Configure API client
        String clientId = "YOUR_CLIENT_ID";
        String clientSecret = "YOUR_CLIENT_SECRET";
        Configuration configuration = new Configuration(clientId, clientSecret);
        ConvertApi apiInstance = new ConvertApi(configuration);
        
        try {
            // Prepare convert settings
            ConvertSettings settings = new ConvertSettings();
            settings.setFilePath("documents/sample.pdf");
            settings.setFormat("html");
            
            // Set PDF-specific load options
            PdfLoadOptions loadOptions = new PdfLoadOptions();
            loadOptions.setPassword("");  // If PDF is protected
            loadOptions.setRemoveEmbeddedFiles(true);  // Remove embedded files for cleaner output
            loadOptions.setHidePdfAnnotations(true);   // Hide annotations
            
            settings.setLoadOptions(loadOptions);
            
            // Configure HTML-specific convert options for responsive layout
            WebConvertOptions convertOptions = new WebConvertOptions();
            convertOptions.setFromPage(1);
            convertOptions.setPagesCount(0);  // All pages
            
            // Use fluid layout (not fixed) for responsive design
            convertOptions.setFixedLayout(false);
            
            // Additional options
            convertOptions.setResourcesEmbedded(true);  // Keep resources embedded
            
            settings.setConvertOptions(convertOptions);
            settings.setOutputPath("converted");
            
            // Execute conversion
            List<StoredConvertedResult> result = apiInstance.convertDocument(
                new ConvertDocumentRequest(settings));
            
            System.out.println("PDF converted successfully to responsive HTML: " + result.get(0).getUrl());
        } catch (ApiException e) {
            System.err.println("Exception when calling ConvertApi: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Step 5: Converting Presentations to HTML

PowerPoint and other presentation formats often contain complex slides that need special handling:

# Python SDK Example
import groupdocs_conversion_cloud
from groupdocs_conversion_cloud.models.requests import ConvertDocumentRequest

# Configure API client
client_id = "YOUR_CLIENT_ID"
client_secret = "YOUR_CLIENT_SECRET"
api_instance = groupdocs_conversion_cloud.ConvertApi.from_keys(client_id, client_secret)

try:
    # Prepare conversion settings
    settings = groupdocs_conversion_cloud.ConvertSettings()
    settings.file_path = "documents/presentation.pptx"
    settings.format = "html"
    
    # Configure presentation-specific load options
    load_options = groupdocs_conversion_cloud.PresentationLoadOptions()
    load_options.show_hidden_slides = False  # Skip hidden slides
    load_options.password = ""  # If presentation is protected
    
    settings.load_options = load_options
    
    # Configure HTML-specific convert options
    convert_options = groupdocs_conversion_cloud.WebConvertOptions()
    convert_options.from_page = 1
    convert_options.pages_count = 0  # All slides
    
    # Layout options - fixed is often better for presentations to preserve slide layout
    convert_options.fixed_layout = True
    convert_options.fixed_layout_show_borders = False  # No borders around slides
    
    # Resource handling - embed everything for standalone HTML
    convert_options.resources_embedded = True
    
    settings.convert_options = convert_options
    settings.output_path = "converted"
    
    # Execute conversion
    request = ConvertDocumentRequest(settings)
    result = api_instance.convert_document(request)
    
    print(f"Presentation converted successfully to HTML: {result[0].url}")
except groupdocs_conversion_cloud.ApiException as e:
    print(f"Exception when calling ConvertApi: {e}")

Step 6: Stream-Based HTML Conversion

For applications that need to process the converted HTML directly:

// Node.js SDK Example
const { ConvertApi, Configuration } = require("groupdocs-conversion-cloud");
const fs = require("fs");

// Configure API client
const clientId = "YOUR_CLIENT_ID";
const clientSecret = "YOUR_CLIENT_SECRET";
const config = new Configuration(clientId, clientSecret);
const apiInstance = new ConvertApi(config);

// Prepare conversion settings
const settings = {
    filePath: "documents/sample.docx",
    format: "html",
    loadOptions: {
        // DOCX-specific load options if needed
    },
    convertOptions: {
        // HTML-specific convert options
        fromPage: 1,
        pagesCount: 0,  // All pages
        fixedLayout: false,  // Fluid layout for responsive design
        resourcesEmbedded: true  // Embed all resources
    },
    // Set outputPath to null for stream output
    outputPath: null
};

// Execute conversion
apiInstance.convertDocumentDownload({ convertSettings: settings })
    .then((result) => {
        // Save the stream to a file
        const fileName = "./converted-document.html";
        const writeStream = fs.createWriteStream(fileName);
        
        result.pipe(writeStream);
        
        writeStream.on("finish", () => {
            console.log(`Document converted and saved to ${fileName}`);
            
            // Read the HTML to demonstrate we have it
            fs.readFile(fileName, 'utf8', (err, data) => {
                if (err) {
                    console.error("Error reading HTML file:", err);
                    return;
                }
                
                // Display the first 100 characters of the HTML
                console.log("HTML Preview: " + data.substring(0, 100) + "...");
            });
        });
    })
    .catch((error) => {
        console.log(`Error: ${error.message}`);
    });

HTML-Specific Conversion Options

When converting to HTML format, you can leverage these specialized options:

OptionDescriptionDefaultImpact
FromPageFirst page number to convert1Controls starting page/slide
PagesCountNumber of pages to convertAllLimits content amount
FixedLayoutUse fixed layout instead of fluidfalseAffects responsiveness
FixedLayoutShowBordersShow page borders in fixed layoutfalseVisual appearance
ResourcesEmbeddedEmbed resources (images, fonts)falseFile independence
ZoomDefault zoom level (%)100Initial display size
UsePdfUse PDF as intermediate formatfalseCan improve complex layouts

Troubleshooting Common Issues

1. Layout and Responsiveness Issues

If the converted HTML doesn’t display correctly across devices:

  • For responsive design, set FixedLayout to false
  • For preserving exact layout (like presentations), set FixedLayout to true
  • If content appears too small or large, adjust the Zoom parameter

2. Resource Handling Problems

If images, fonts, or other resources are missing:

  • Set ResourcesEmbedded to true to include all resources in the HTML
  • Check if the original document has links to external resources that might be unavailable
  • For documents with custom fonts, ensure these fonts are available in the conversion environment

3. Complex Layout Conversion Challenges

For documents with complex layouts:

  • Try setting UsePdf to true, which may preserve complex layouts better
  • For tables, charts, and other complex elements, fixed layout often works better
  • If only part of the document has complex layouts, consider converting specific pages

4. File Size Concerns

If the HTML output is too large:

  • Consider setting ResourcesEmbedded to false and manage resources separately
  • For multi-page documents, convert pages in batches
  • Check if original documents contain high-resolution images that could be optimized

What You’ve Learned

In this tutorial, you’ve learned:

  • How to convert various document formats to HTML
  • Controlling HTML layout options for different display requirements
  • Managing embedded resources for standalone or externally-referenced content
  • Implementing responsive design considerations for web viewing
  • Handling both storage-based and stream-based HTML conversions
  • Troubleshooting common HTML conversion challenges

Further Practice

To reinforce your learning, try these exercises:

  1. Create a document viewer web application that converts and displays documents as HTML
  2. Implement a batch conversion utility that processes multiple documents to responsive HTML
  3. Build a service that optimizes HTML output for mobile devices by adjusting layout options
  4. Create a comparison tool that shows different HTML conversion options side by side

Next Tutorial

Ready to learn more? Continue with our Tutorial: Converting to Text Format to master text extraction and conversion from various document types.

Additional Resources

Have questions about this tutorial? Feel free to reach out on our forum for support.