
How to Enable PDF Text Search in Web Applications

In the digital age, PDFs are one of the most common formats for sharing and archiving documents. They preserve layout, fonts, and images across platforms. However, not all PDFs are created equal. While many contain actual text data, others—especially scanned documents—are essentially just images wrapped in a PDF container. This difference is critical when it comes to text search.

A searchable PDF allows you to find words instantly using the viewer’s search feature, while an image-only PDF does not. If you work with scanned contracts, reports, or forms, being able to search quickly is a major productivity boost.

Starting from v3.0, the Dynamsoft Document Viewer SDK (DDV) supports text search in searchable PDFs directly in a web application. This means you can open a PDF, hit the search icon, and instantly navigate to matching text—all without extra backend processing. And if your PDF isn’t searchable yet, you can use OCR (Optical Character Recognition) to make it so.

In this tutorial, you’ll learn:

  • How to enable the DDV text search panel in your web app.
  • How to integrate a free OCR library to convert image-based documents to searchable PDFs before viewing.

Demo: Search Text in PDF Files

Online Demo

https://yushulx.me/web-twain-document-scan-management/examples/document_annotation/

Prerequisites

Before you start, you’ll need:

What is a Searchable PDF?

A searchable PDF is a hybrid file that contains:

  • Visual layer: The scanned page image, graphics, and layout exactly as they appear.
  • Hidden text layer: OCR-extracted text positioned precisely over the corresponding visual content.

When you open such a PDF:

  • You can select, copy, and paste text just like a native document.
  • The viewer’s search feature finds matches instantly.
  • Screen readers and accessibility tools can read it.

By contrast, an image-only PDF is just pixels. The search tool finds nothing because there’s no text data.
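To make the difference concrete, here is a minimal sketch of what a text search does once per-page text exists. It assumes each page's text has already been extracted into a plain string (the hypothetical pageTexts array). For an image-only PDF every string is empty, so nothing can match. This illustrates the idea only and is not how DDV implements its search.

```javascript
// Minimal sketch: case-insensitive search over per-page text.
// `pageTexts` is a hypothetical array with one extracted string per page;
// an image-only PDF yields empty strings and therefore no matches.
function findMatches(pageTexts, query) {
  const matches = [];
  const needle = query.toLowerCase();
  if (needle.length === 0) return matches; // an empty query matches nothing
  pageTexts.forEach((text, pageIndex) => {
    // Offsets index into the lowercased text (identical to the original for ASCII)
    const haystack = text.toLowerCase();
    let offset = haystack.indexOf(needle);
    while (offset !== -1) {
      matches.push({ pageIndex, offset });
      // Resume after this match, so overlapping occurrences are skipped
      offset = haystack.indexOf(needle, offset + needle.length);
    }
  });
  return matches;
}

// Usage: a two-page searchable PDF vs. an image-only one
console.log(findMatches(["Service contract", "Contract terms"], "contract"));
console.log(findMatches(["", ""], "contract"));
```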

How it’s created:
OCR software analyzes the image, recognizes the characters, and writes them into the PDF as invisible text. We'll do the same programmatically for image-based files.
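The key step in writing that invisible text is mapping each recognized word from image pixels to PDF page coordinates, so the text lands on top of the visible word. The sketch below assumes the OCR engine reports word boxes as { x0, y0, x1, y1 } in pixels with a top-left origin (similar to the word boxes Tesseract.js returns) and that the image is stretched to fill the whole page. PDF user space is measured in points (1/72 inch) with the origin at the bottom-left and y pointing up, so the y axis must be flipped. The function name and the font-size heuristic are illustrative assumptions, not taken from ocr-lib.js.

```javascript
// Minimal sketch: convert an OCR word box (image pixels, top-left origin)
// into PDF user-space coordinates (points, bottom-left origin).
// Assumes the image is scaled to cover the full page.
function imageBoxToPdf(bbox, imageWidthPx, imageHeightPx, pageWidthPt, pageHeightPt) {
  const scaleX = pageWidthPt / imageWidthPx;
  const scaleY = pageHeightPt / imageHeightPx;
  const x = bbox.x0 * scaleX;
  // Flip the y axis; approximate the text baseline by the box's bottom edge
  const baselineY = pageHeightPt - bbox.y1 * scaleY;
  const width = (bbox.x1 - bbox.x0) * scaleX;
  // Rough heuristic: use the scaled box height as the font size
  const fontSize = (bbox.y1 - bbox.y0) * scaleY;
  return { x, baselineY, width, fontSize };
}

// Usage: a word box on a 2480x3508 px scan (A4 at 300 DPI) placed on an A4 page
console.log(imageBoxToPdf({ x0: 300, y0: 400, x1: 600, y1: 450 }, 2480, 3508, 595.28, 841.89));
```

In a real converter the returned width would also be used to stretch the invisible text horizontally, so that selection highlights line up with the printed word.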

Project Structure

The sample project files:

document_annotation/
├── index.html              # Main HTML file with UI components
├── main.js                 # Core application logic
├── main.css                # Styling and responsive design
├── ocr-lib.js              # OCR library with multiple engine support
├── full.json               # Configuration templates
├── searchable-document.pdf # Sample searchable PDF for testing
└── README.md               

Step 1: Configure the UI for Text Search

DDV offers UI components for search:

  • TextSearchPanelSwitch (search icon button)
  • TextSearchPanel (search bar and results list)

We’ll add them to the desktop and mobile viewer configs in main.js.

Desktop config:


const pcEditViewerUiConfig = {
    type: Dynamsoft.DDV.Elements.Layout,
    flexDirection: "column",
    className: "ddv-edit-viewer-desktop",
    children: [
        {
            type: Dynamsoft.DDV.Elements.Layout,
            className: "ddv-edit-viewer-header-desktop",
            children: [
                {
                    type: Dynamsoft.DDV.Elements.Layout,
                    children: [
                        ...
                        Dynamsoft.DDV.Elements.TextSearchPanelSwitch
                    ],
                },
                {
                    type: Dynamsoft.DDV.Elements.Layout,
                    children: [
                        {
                            type: Dynamsoft.DDV.Elements.Pagination,
                            className: "ddv-edit-viewer-pagination-desktop",
                        },
                        downloadButton,
                    ],
                },
            ],
        },
        ...
    ],
};

Mobile config:

const mobileEditViewerUiConfig = {
    type: Dynamsoft.DDV.Elements.Layout,
    flexDirection: "column",
    className: "ddv-edit-viewer-mobile",
    children: [
        {
            type: Dynamsoft.DDV.Elements.Layout,
            className: "ddv-edit-viewer-header-mobile",
            children: [
                ...
                Dynamsoft.DDV.Elements.TextSearchPanelSwitch
            ],
        },
        {
            type: Dynamsoft.DDV.Elements.Layout,
            flexDirection: "column",
            children: [
                Dynamsoft.DDV.Elements.MainView,
                {
                    type: Dynamsoft.DDV.Elements.TextSearchPanel,
                    className: "ddv-edit-viewer-search-mobile"
                }
            ]
        },
        ...
    ],
};

Result: When you run the app and open a searchable PDF, you’ll see a magnifier icon. Clicking it shows the search panel.

Screenshot: the text search icon in the viewer header.
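Because the search switch and panel can sit anywhere in these nested layout trees, a small recursive check can catch a config that is missing one of them. This is a hypothetical helper, not part of DDV. It only assumes the shape used in the configs above, where a child is either a bare element (such as Dynamsoft.DDV.Elements.TextSearchPanelSwitch) or an object whose type is that element. The string stand-ins replace the real Dynamsoft.DDV.Elements values so the snippet runs without the SDK.

```javascript
// Hypothetical helper: does a DDV-style layout config contain `element`?
// A child may be the element itself or an object like { type: element, ... }.
function containsElement(node, element) {
  if (node === element) return true;
  if (node === null || typeof node !== "object") return false;
  if (node.type === element) return true;
  const children = Array.isArray(node.children) ? node.children : [];
  return children.some((child) => containsElement(child, element));
}

// Stand-ins for Dynamsoft.DDV.Elements.* so the sketch runs outside the browser
const Elements = {
  Layout: "Layout",
  MainView: "MainView",
  TextSearchPanelSwitch: "TextSearchPanelSwitch",
  TextSearchPanel: "TextSearchPanel",
};

// Simplified version of the mobile config above
const mobileConfig = {
  type: Elements.Layout,
  children: [
    { type: Elements.Layout, children: [Elements.TextSearchPanelSwitch] },
    {
      type: Elements.Layout,
      children: [Elements.MainView, { type: Elements.TextSearchPanel }],
    },
  ],
};

for (const name of ["TextSearchPanelSwitch", "TextSearchPanel"]) {
  console.log(name, containsElement(mobileConfig, Elements[name]));
}
```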

Step 2: Convert Images/PDFs to Searchable PDFs (Optional)

If your users might load image-only PDFs or images (JPG, PNG, TIFF), you can make them searchable on the fly before loading them into DDV.

This adds processing time, especially for large documents, but it lets users search files that would otherwise return no results.
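One way to limit that processing time is to skip OCR for PDFs that already contain text. The minimal sketch below assumes a pdf.js document proxy (for example from pdfjsLib.getDocument({ data }).promise). It reads each page with getPage() and getTextContent() and treats the file as searchable once enough characters are found. The function name and the minChars threshold are illustrative assumptions. A mixed PDF with some scanned pages would need a per-page decision instead.

```javascript
// Minimal sketch: decide whether a PDF already has a text layer.
// `pdf` is assumed to behave like a pdf.js PDFDocumentProxy
// (numPages, getPage(n) returning a page with getTextContent()).
async function hasTextLayer(pdf, minChars = 20) {
  let charCount = 0;
  // pdf.js page numbers are 1-based
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const content = await page.getTextContent();
    for (const item of content.items) {
      // Text items carry their string in `str`; the guard skips items without one
      charCount += (item.str || "").trim().length;
    }
    if (charCount >= minChars) return true; // stop early once enough text is found
  }
  return false;
}

// Usage with pdf.js in the browser (not executed here):
// const data = new Uint8Array(await blob.arrayBuffer());
// const pdf = await pdfjsLib.getDocument({ data }).promise;
// const needsOcr = !(await hasTextLayer(pdf));
```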

The logic is as follows:

  1. Include the libraries for OCR (Tesseract.js), PDF rendering (pdf.js), PDF generation (jsPDF), and TIFF decoding (UTIF), followed by ocr-lib.js:

    <script src="https://unpkg.com/tesseract.js@5.0.2/dist/tesseract.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/3.11.174/pdf.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/jspdf/2.5.1/jspdf.umd.min.js"></script>
    <script src="https://unpkg.com/utif2@4.1.0/UTIF.js"></script>
    <script src="ocr-lib.js"></script>
    
  2. Initialize the OCR library:

    const ocrLib = new OCRLibrary();
    
  3. In the async load(blob, password) function, convert the input blob to a searchable PDF before loading it into DDV:

    async function load(blob, password) {
        try {
            if (!currentDoc) {
                currentDoc = Dynamsoft.DDV.documentManager.createDocument({
                    name: Date.now().toString(),
                    author: "DDV",
                });
            }
    
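            // Convert the input (image or image-only PDF) into a searchable PDF.
            // The available engines and options are defined by ocr-lib.js.
            const searchablePDF = await ocrLib.convert2searchable(
                blob,
                {
                    engine: 'ocr.space',
                    language: 'eng'
                },
                // Progress callback: status message and completion percentage
                (message, percentage) => {
                    console.log(`${message} (${percentage}%)`);
                }
            );
    
            // Load the searchable PDF into the current DDV document
            const source = {
                fileData: searchablePDF,
                password: password,
                renderOptions: {
                    renderAnnotations: "loadAnnotations"
                }
            };
            await currentDoc.loadSource([source]);
            editViewer.openDocument(currentDoc);
            // Jump to the last page of the document
            editViewer.goToPage(editViewer.getPageCount() - 1);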
        } catch (error) {
            console.error(error);
        }
    }
    
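The load() function above sends every blob through the converter. To handle inputs differently, for example to check PDFs for existing text or to send TIFF files to UTIF, one option is to sniff the first bytes of the blob. The minimal sketch below uses the standard file signatures and assumes the PDF header starts at offset 0, which is typical but not guaranteed. The function name and returned labels are hypothetical, and ocr-lib.js may already perform its own detection.

```javascript
// Minimal sketch: identify PDF, PNG, JPEG, and TIFF by their magic bytes.
// `bytes` is a Uint8Array holding at least the first 8 bytes of the file.
function detectFileType(bytes) {
  const startsWith = (sig) => sig.every((b, i) => bytes[i] === b);
  if (startsWith([0x25, 0x50, 0x44, 0x46])) return "pdf"; // "%PDF"
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  if (startsWith([0xff, 0xd8, 0xff])) return "jpeg";
  if (startsWith([0x49, 0x49, 0x2a, 0x00]) || startsWith([0x4d, 0x4d, 0x00, 0x2a])) {
    return "tiff"; // little-endian "II*\0" or big-endian "MM\0*"
  }
  return "unknown";
}

// Usage in the browser: read only the header of the blob
// const header = new Uint8Array(await blob.slice(0, 8).arrayBuffer());
// console.log(detectFileType(header));
console.log(detectFileType(new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37])));
```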

Source Code

https://github.com/yushulx/web-twain-document-scan-management/tree/main/examples/document_annotation
