PDF search and indexing | Adobe PDF Services API | Adobe Acrobat Services

Search and Indexing

Overview

Digitizing documents raises two problems: archiving the accumulated files and unlocking the data inside them. Many companies hold a wealth of information in paper or scanned document archives, in a variety of formats, that cannot be searched or indexed properly. Because manual indexing and searching is labor intensive and not always accurate, the challenge is to create digital archives that are immediately accessible and available for editing or downstream processing.

With Adobe Acrobat Services, a workflow can:

Convert a variety of files to PDF format for standardization
Run OCR (optical character recognition) on scanned images to convert them to machine-readable text
Enable easy and fast searching of converted documents
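
As a rough illustration of how such a workflow might decide what each archived file needs, the sketch below maps a file name to an ordered list of steps. The step labels (`create_pdf`, `ocr`, `index`) and the function name `plan_operations` are hypothetical names chosen for this example; they are not Adobe PDF Services API identifiers, and no API call is made here.

```python
from pathlib import PurePath


def plan_operations(filename: str) -> list[str]:
    """Return the ordered, hypothetical step labels for one archived file."""
    steps = []
    # Files that are not already PDFs are converted first for standardization.
    if PurePath(filename).suffix.lower() != ".pdf":
        steps.append("create_pdf")
    # Every PDF is then sent through OCR and added to the search index.
    steps.extend(["ocr", "index"])
    return steps


# Example inventory (file names are made up for illustration).
for name in ["lease_scan.tiff", "addendum.PDF"]:
    print(name, plan_operations(name))
```

In a real workflow, each label would be replaced by a call to the corresponding service operation; the sketch only fixes the order of the steps.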

Relevant APIs

Adobe PDF Services API

Scenario

A firm has thousands of legal contracts that have been scanned and stored as digital files. For compliance purposes, documents with a particular clause or special addendum must be found and then updated. To be successful, the staff must inventory all digital documents, unlock the text, and create a searchable index that can be used to locate critical content.

Building this Solution

Identify a file system for digital document storage on a shared drive or database system
Convert files that are not already PDFs, such as scanned images, using the Adobe PDF Services API Create PDF operation
Send the PDF documents to the Adobe PDF Services API OCR operation to create fully searchable documents
Index and store the searchable documents using a search engine and database
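
The last step above can be sketched with a small in-memory inverted index. This is a minimal illustration only, assuming the text of each OCR-processed document has already been extracted into a Python string. The class name `ClauseIndex`, its methods, and the sample contracts are all hypothetical, and a production system would more likely use a dedicated search engine and database, as the step describes.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ClauseIndex:
    """Tiny inverted index that supports exact phrase lookup."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> ids of documents containing it
        self._texts = {}                   # document id -> space-joined tokens

    def add(self, doc_id, text):
        tokens = tokenize(text)
        self._texts[doc_id] = " ".join(tokens)
        for token in tokens:
            self._postings[token].add(doc_id)

    def search_phrase(self, phrase):
        tokens = tokenize(phrase)
        if not tokens:
            return []
        # Narrow to documents that contain every token of the phrase.
        candidates = set.intersection(
            *(self._postings.get(token, set()) for token in tokens)
        )
        # Then confirm the tokens appear consecutively, on word boundaries.
        needle = " " + " ".join(tokens) + " "
        return sorted(
            doc_id for doc_id in candidates
            if needle in " " + self._texts[doc_id] + " "
        )


# Sample data invented for this example.
index = ClauseIndex()
index.add("contract-001.pdf", "This Agreement includes a Force Majeure clause.")
index.add("contract-002.pdf", "The majeure event clause applies by force of law.")
index.add("contract-003.pdf", "Standard terms only.")
print(index.search_phrase("force majeure"))  # ['contract-001.pdf']
```

The two-stage lookup keeps the phrase check cheap: the token postings discard most documents first, so the substring comparison only runs on the few candidates that contain every word of the clause.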
