Tutorial: How to Extract Pages from Documents

Learning Objectives

In this tutorial, you’ll learn how to:

Extract specific pages from a document by providing exact page numbers
Extract pages using a page range with options for even/odd pages
Work with both unprotected and password-protected documents
Implement page extraction in multiple programming languages

Prerequisites

Before starting this tutorial, make sure you have:

A GroupDocs.Merger Cloud account (get a free trial here
Your Client ID and Client Secret (available in the API dashboard
A document uploaded to your cloud storage
The appropriate SDK installed for your language of choice

Practical Scenario

Imagine you have a 10-page report and need to extract pages 2, 4, and 7 to create a summary document. Alternatively, you might need to extract all even pages from pages 1-10 for a two-sided printing job. This tutorial will teach you both approaches.

Method 1: Extract Pages by Exact Page Numbers

This approach allows you to specify exactly which pages you want to extract. This is useful when you need specific, non-sequential pages from your document.

Step 1: Obtain Your JWT Token

Before making any API request, you need to authenticate with the GroupDocs.Merger Cloud API:

curl -v "https://api.groupdocs.cloud/connect/token" \
-X POST \
-d "grant_type=client_credentials&client_id=YOUR_CLIENT_ID&client_secret=YOUR_CLIENT_SECRET" \
-H "Content-Type: application/x-www-form-urlencoded" \
-H "Accept: application/json"

The response will include your JWT token that you’ll use in subsequent requests.

Step 2: Define Your Extraction Request

Now, create a JSON request that specifies:

The source document path
The exact page numbers to extract
The destination path for the new document

Step 3: Execute the API Request

Using cURL

curl -v "https://api.groupdocs.cloud/v1.0/merger/pages/extract" \
-X POST \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-d "{
     'FileInfo': { 'FilePath': '/WordProcessing/sample-10-pages.docx'},
     'Pages':  [ 2, 4, 7 ],
     'OutputPath': 'output/extract-pages-by-numbers.docx'
 }"

Try it yourself!

Replace the placeholder values with your actual Client ID, Client Secret, and file paths, then run the command. The API will return a JSON response with the path to your newly created document containing only pages 2, 4, and 7.

Step 4: Verify the Results

Download the resulting document from your storage and open it. You should see a document with only the three pages you specified.

Method 2: Extract Pages by Page Range

This approach allows you to extract pages based on a range and filter them by even or odd page numbers.

Step 1: Define Your Range Extraction Request

Create a JSON request that specifies:

The source document path
The start and end page numbers of your range
The range mode (All, EvenPages, or OddPages)
The destination path for the new document

Step 2: Execute the API Request

Using cURL

curl -v "https://api.groupdocs.cloud/v1.0/merger/pages/extract" \
-X POST \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-d "{
     'FileInfo': { 'FilePath': '/WordProcessing/sample-10-pages.docx'},
     'StartPageNumber': 1,
     'EndPageNumber': 10,
     'RangeMode': 'EvenPages',
     'OutputPath': 'output/extract-pages-by-range.docx'
 }"

Try it yourself!

Run this command with your own JWT token. The resulting document will contain only the even-numbered pages (2, 4, 6, 8, 10) from your source document.

Working with Password-Protected Documents

If your document is password-protected, you need to include the password in your request:

curl -v "https://api.groupdocs.cloud/v1.0/merger/pages/extract" \
-X POST \
-H "Content-Type: application/json" \
-H "Accept: application/json" \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-d "{
     'FileInfo': { 
         'FilePath': '/WordProcessing/protected-document.docx',
         'Password': 'your_password'
     },
     'Pages':  [ 2, 4, 7 ],
     'OutputPath': 'output/extract-pages-protected.docx'
 }"

SDK Implementation Examples

C# Example

// Extract Pages using C# SDK
string MyClientSecret = ""; // Get ClientId and ClientSecret from https://dashboard.groupdocs.cloud
string MyClientId = ""; // Get ClientId and ClientSecret from https://dashboard.groupdocs.cloud
  
var configuration = new Configuration(MyClientId, MyClientSecret);
  
var apiInstance = new PagesApi(configuration);

var options = new ExtractOptions
{
    FileInfo = new FileInfo
    {
        FilePath = "WordProcessing/sample-10-pages.docx"
    },
    Pages = new List<int?> { 2, 4, 7 },
    OutputPath = "output/extract-pages-by-numbers.docx"
};

var response = apiInstance.Extract(new ExtractRequest(options));
Console.WriteLine("Output file path: " + response.Path);

Java Example

// Extract Pages using Java SDK
String MyClientSecret = ""; // Get ClientId and ClientSecret from https://dashboard.groupdocs.cloud
String MyClientId = ""; // Get ClientId and ClientSecret from https://dashboard.groupdocs.cloud

Configuration configuration = new Configuration(MyClientId, MyClientSecret);
PagesApi apiInstance = new PagesApi(configuration);

FileInfo fileInfo = new FileInfo();
fileInfo.setFilePath("WordProcessing/sample-10-pages.docx");

ExtractOptions options = new ExtractOptions();
options.setFileInfo(fileInfo);
options.setPages(Arrays.asList(2, 4, 7));
options.setOutputPath("output/extract-pages-by-numbers.docx");

ExtractRequest request = new ExtractRequest(options);
DocumentResult response = apiInstance.extract(request);
System.out.println("Output file path: " + response.getPath());

Python Example

# Extract Pages using Python SDK
import groupdocs_merger_cloud

# Get your app_sid and app_key at https://dashboard.groupdocs.cloud
my_client_id = ""
my_client_secret = ""

# Create instance of the API
configuration = groupdocs_merger_cloud.Configuration(my_client_id, my_client_secret)
api_instance = groupdocs_merger_cloud.PagesApi(configuration)

file_info = groupdocs_merger_cloud.FileInfo()
file_info.file_path = "WordProcessing/sample-10-pages.docx"

options = groupdocs_merger_cloud.ExtractOptions()
options.file_info = file_info
options.pages = [2, 4, 7]
options.output_path = "output/extract-pages-by-numbers.docx"

request = groupdocs_merger_cloud.ExtractRequest(options)
response = api_instance.extract(request)
print("Output file path: " + response.path)

Troubleshooting Tips

Error 401: If you receive an authentication error, make sure your JWT token is valid and hasn’t expired.
Error 400: Verify that the file path is correct and the file exists in your storage.
Empty Result Document: Ensure you’ve specified valid page numbers that exist in your source document.
Password Issues: If working with protected documents, check that the provided password is correct.

What You’ve Learned

In this tutorial, you’ve learned how to:

Extract specific pages from a document by providing exact page numbers
Extract pages using a range with filtering for even or odd pages
Handle password-protected documents
Implement page extraction functionality using various programming languages

Further Practice

To reinforce your learning, try these exercises:

Extract the first and last page from a document
Extract all odd pages from a 20-page document
Combine both methods to extract pages 1, 3, and all even pages between 10-20

Next Tutorial

Ready to learn more? Continue to the next tutorial: How to Remove Pages from Documents to learn how to permanently delete pages from your documents.

Helpful Resources

If you have any questions about this tutorial, please let us know in the comments below or through our support forum!