PDF Accessibility Auto-Tag API
API Output Format
The output of the PDF Accessibility Auto-Tag API contains the following:
- The tagged PDF file.
- A report in XLSX format, if report generation is enabled. The report describes the tags found in the original document, if any, and the tags in the auto-tagged document.
API limitations
- File size: Files up to a maximum of 100 MB are supported.
- Number of Pages: Non-scanned PDFs up to 200 pages and scanned PDFs up to 100 pages are supported; however, limits may be lower for files with a large number of tables.
- Rate limits: Keep request rate below 25 requests per minute.
- Page Size: The API supports standard page sizes no larger than 17.5” and no smaller than 6” in either dimension.
- Hidden Objects: PDF files that contain content that is not visible on the page, such as JavaScript or OCGs (optional content groups), are not supported. Files that contain such hidden information may fail to process. In such cases, removing the hidden content and processing the file again may return a successful result.
- Language: The API is currently optimized for English language content. Files containing content in French, German, Spanish, Danish, Dutch, Norwegian (Bokmål), Galician, Catalan, Finnish, Italian, Swedish, Portuguese, and Romanian should return good results most of the time. Files containing content in Afrikaans, Bosnian, Croatian, Czech, Hungarian, Indonesian, Malay, Polish, Russian, Serbian, Turkish, Hindi, Marathi, and other similar languages should return good results often. Non-English files may have issues with non-English punctuation. OCR is configured for English content.
- OCR and Scan quality: The quality of text extracted from scanned files is dependent on the clarity of content in the input file and is currently configured for English content. Conditions like skewed pages, shadowing, obscured or overlapping fonts, and page resolution less than 200 DPI can all result in lower quality text output.
- Form fields: Files containing XFA and other fillable form elements are not supported.
- Unprotected files: The API supports files that are unprotected or where security restrictions allow editing of content. Files that are secured and do not allow editing of content will not be processed. If the password of a protected PDF is known, the permissions of the file can be modified using the PDF Services API.
- Annotations: Content in PDF files containing annotations such as highlights and sticky notes will be processed, but annotations that obscure text could impact output quality. Text within annotations will not be included in the output.
- PDF Producers: The PDF Accessibility Auto-Tag API is designed to add tags to PDFs so that the files are easier to make accessible. Files created from applications that produce other types of content, such as illustrations, CAD drawings, or other types of vector art, may not return high quality results.
- PDF Collections: PDFs that are made from a collection of files including PDF Portfolios are not currently supported.
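The numeric limits above lend themselves to a client-side pre-flight check before uploading. The following is a minimal sketch and not part of any SDK: the function name `check_autotag_limits`, its parameters, and the reading of 100 MB as 100 × 1024 × 1024 bytes are assumptions made for illustration. The page sizes and scan status are expected to come from whatever PDF library you already use.

```python
from typing import List, Sequence, Tuple

# Limits taken from the list above.
# Treating 1 MB as 1024 * 1024 bytes is an assumption.
MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024
MAX_PAGES_NON_SCANNED = 200
MAX_PAGES_SCANNED = 100
MIN_PAGE_DIM_INCHES = 6.0
MAX_PAGE_DIM_INCHES = 17.5


def check_autotag_limits(
    file_size_bytes: int,
    page_sizes_inches: Sequence[Tuple[float, float]],
    is_scanned: bool,
) -> List[str]:
    """Return a list of limit violations (empty if none were found).

    Hypothetical helper: page_sizes_inches holds one (width, height)
    pair per page, measured in inches.
    """
    problems: List[str] = []

    if file_size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file exceeds 100 MB")

    page_limit = MAX_PAGES_SCANNED if is_scanned else MAX_PAGES_NON_SCANNED
    if len(page_sizes_inches) > page_limit:
        problems.append(f"file exceeds {page_limit} pages")

    for number, (width, height) in enumerate(page_sizes_inches, start=1):
        for dim in (width, height):
            if not MIN_PAGE_DIM_INCHES <= dim <= MAX_PAGE_DIM_INCHES:
                # Report only the first out-of-range dimension per page.
                problems.append(f"page {number} has a dimension of {dim} inches")
                break

    return problems
```

Passing this check does not guarantee success, since the page limits may be lower for files with many tables and the other limitations (hidden objects, form fields, protection) are not covered here.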
Error codes
Scenario | Error code | Error message |
---|---|---|
Unknown error/failure | ERROR | Unexpected error |
Timeout | TIMEOUT | Unexpected error: Processing timeout |
Disqualified | DISQUALIFIED | File is not suitable for conversion |
Unsupported XFA file | DISQUALIFIED_XFA | File is not suitable for conversion: File contains an XFA form |
Page limit violation | DISQUALIFIED_PAGE_LIMIT | File is not suitable for conversion: File exceeds page limit |
Scan page limit violation | DISQUALIFIED_SCAN_PAGE_LIMIT | File is not suitable for conversion: Scanned file exceeds page limit |
File size violation | DISQUALIFIED_FILE_SIZE | File is not suitable for conversion: File exceeds size limit |
Encryption permission | DISQUALIFIED_PERMISSIONS | File is not suitable for conversion: File permissions do not allow conversion |
Complex file | DISQUALIFIED_COMPLEX_FILE | File is not suitable for conversion: File content is too complex |
Unsupported language | DISQUALIFIED_LANGUAGE | File is not suitable for conversion: File content language is unsupported |
Bad PDF | BAD_PDF | The PDF file is damaged or its content is too complex |
Bad PDF file type | BAD_PDF_FILE_TYPE | The input file is not a PDF file |
Damaged input file | BAD_PDF_DAMAGED | The input file is damaged |
Complex table | BAD_PDF_COMPLEX_TABLE | The input file contains a table that is too complex to process |
Complex content | BAD_PDF_COMPLEX_INPUT | The input file contains content that is too complex to process |
Unsupported font | BAD_PDF_UNSUPPORTED_FONT | The input file contains font data that is corrupted or not supported |
Large PDF file | BAD_PDF_LARGE_FILE | The input file size exceeds the maximum allowed |
Protected PDF | PROTECTED_PDF | PDF is encrypted or password-protected |
Empty or corrupted input | BAD_INPUT | Input is corrupted or empty |
Invalid input parameters | BAD_INPUT_PARAMS | Invalid input parameters |
User not enrolled in an allowed Atlas plan | INVALID_PLAN_CODE | Unauthorized to execute this operation. User is not enrolled to plans allowed for the service |
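The codes in the table fall into families by prefix (`DISQUALIFIED`, `BAD_PDF`, `BAD_INPUT`), which suggests a simple way to route failures in client code. The sketch below is an illustration only: the function `classify_error_code`, the category names, and the idea that `ERROR` and `TIMEOUT` may be worth retrying are assumptions, not behavior documented by the API.

```python
def classify_error_code(code: str) -> str:
    """Map an Auto-Tag error code from the table above to a coarse category.

    The category names are hypothetical; adapt them to your own error handling.
    """
    if code in ("ERROR", "TIMEOUT"):
        # Assumption: generic failures and timeouts may succeed on retry.
        return "possibly-transient"
    if code.startswith("DISQUALIFIED"):
        # The file itself is outside the documented limits.
        return "unsupported-file"
    if code.startswith("BAD_PDF") or code == "PROTECTED_PDF":
        # The PDF is damaged, too complex, or protected.
        return "fix-input-pdf"
    if code.startswith("BAD_INPUT"):
        # The request carried empty, corrupted, or invalid input.
        return "fix-request"
    if code == "INVALID_PLAN_CODE":
        return "account"
    return "unknown"
```

For example, `classify_error_code("DISQUALIFIED_PAGE_LIMIT")` falls into the `"unsupported-file"` category, which a caller could use to skip retries and report the limit to the user instead.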
REST API
See our public API Reference for PDF Accessibility Auto-Tag API.
Generate tagged PDF from a PDF
The sample below generates a tagged PDF from a PDF.
Please refer to the API usage guide to understand how to use our APIs.
Samples are shown below for Java, .NET, Node JS, Python, and the REST API, in that order.
Java
```java
// Get the samples from https://github.com/adobe/pdfservices-java-sdk-samples
// Run the sample:
// mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.pdfservices.operation.samples.autotagpdf.AutotagPDF

public class AutotagPDF {
    // Initialize the logger.
    private static final Logger LOGGER = LoggerFactory.getLogger(AutotagPDF.class);

    public static void main(String[] args) {

        try (InputStream inputStream = Files.newInputStream(new File("src/main/resources/autotagPDFInput.pdf").toPath())) {
            // Initial setup, create credentials instance
            Credentials credentials = new ServicePrincipalCredentials(
                    System.getenv("PDF_SERVICES_CLIENT_ID"),
                    System.getenv("PDF_SERVICES_CLIENT_SECRET"));

            // Creates a PDF Services instance
            PDFServices pdfServices = new PDFServices(credentials);

            // Creates an asset(s) from source file(s) and upload
            Asset asset = pdfServices.upload(inputStream, PDFServicesMediaType.PDF.getMediaType());

            // Creates a new job instance
            AutotagPDFJob autotagPDFJob = new AutotagPDFJob(asset);

            // Submit the job and gets the job result
            String location = pdfServices.submit(autotagPDFJob);
            PDFServicesResponse<AutotagPDFResult> pdfServicesResponse = pdfServices.getJobResult(location, AutotagPDFResult.class);

            // Get content from the resulting asset(s)
            Asset resultAsset = pdfServicesResponse.getResult().getTaggedPDF();
            StreamAsset streamAsset = pdfServices.getContent(resultAsset);

            // Creates an output stream and copy stream asset's content to it
            Files.createDirectories(Paths.get("output/"));
            OutputStream outputStream = Files.newOutputStream(new File("output/autotagPDFOutput.pdf").toPath());
            LOGGER.info("Saving asset at output/autotagPDFOutput.pdf");
            IOUtils.copy(streamAsset.getInputStream(), outputStream);
            outputStream.close();
        } catch (ServiceApiException | IOException | SDKException | ServiceUsageException ex) {
            LOGGER.error("Exception encountered while executing operation", ex);
        }
    }
}
```
.NET
```csharp
// Get the samples from https://github.com/adobe/PDFServices.NET.SDK.Samples
// Run the sample:
// cd AutotagPDF/
// dotnet run AutotagPDF.csproj

namespace AutotagPDF
{
    class Program
    {
        private static readonly ILog log = LogManager.GetLogger(typeof(Program));

        static void Main()
        {
            //Configure the logging
            ConfigureLogging();
            try
            {
                // Initial setup, create credentials instance.
                // Replace the placeholder strings with your own client ID and client secret.
                Credentials credentials = Credentials.ServicePrincipalCredentialsBuilder()
                    .WithClientId("PDF_SERVICES_CLIENT_ID")
                    .WithClientSecret("PDF_SERVICES_CLIENT_SECRET")
                    .Build();

                //Create an ExecutionContext using credentials and create a new operation instance.
                ExecutionContext executionContext = ExecutionContext.Create(credentials);
                AutotagPDFOperation autotagPDFOperation = AutotagPDFOperation.CreateNew();

                // Provide an input FileRef for the operation
                autotagPDFOperation.SetInput(FileRef.CreateFromLocalFile(@"autotagPDFInput.pdf"));

                // Execute the operation
                AutotagPDFOutput autotagPDFOutput = autotagPDFOperation.Execute(executionContext);

                // Save the output file in the current directory.
                // Path.Combine inserts the directory separator that plain concatenation would omit.
                autotagPDFOutput.GetTaggedPDF().SaveAs(Path.Combine(Directory.GetCurrentDirectory(), "autotagPDFOutput.pdf"));
            }
            catch (ServiceUsageException ex)
            {
                log.Error("Exception encountered while executing operation", ex);
            }
            catch (ServiceApiException ex)
            {
                log.Error("Exception encountered while executing operation", ex);
            }
            catch (SDKException ex)
            {
                log.Error("Exception encountered while executing operation", ex);
            }
            catch (IOException ex)
            {
                log.Error("Exception encountered while executing operation", ex);
            }
            catch (Exception ex)
            {
                log.Error("Exception encountered while executing operation", ex);
            }
        }

        static void ConfigureLogging()
        {
            ILoggerRepository logRepository = LogManager.GetRepository(Assembly.GetEntryAssembly());
            XmlConfigurator.Configure(logRepository, new FileInfo("log4net.config"));
        }
    }
}
```
Node JS
```javascript
// Get the samples from https://github.com/adobe/pdfservices-node-sdk-samples
// Run the sample:
// node src/autotagpdf/autotag-pdf.js

const {
    ServicePrincipalCredentials,
    PDFServices,
    MimeType,
    AutotagPDFJob,
    AutotagPDFResult,
    SDKError,
    ServiceUsageError,
    ServiceApiError,
} = require("@adobe/pdfservices-node-sdk");
const fs = require("fs");

(async () => {
    let readStream;
    try {
        // Initial setup, create credentials instance
        const credentials = new ServicePrincipalCredentials({
            clientId: process.env.PDF_SERVICES_CLIENT_ID,
            clientSecret: process.env.PDF_SERVICES_CLIENT_SECRET
        });

        // Creates a PDF Services instance
        const pdfServices = new PDFServices({credentials});

        // Creates an asset(s) from source file(s) and upload
        readStream = fs.createReadStream("./autotagPDFInput.pdf");
        const inputAsset = await pdfServices.upload({
            readStream,
            mimeType: MimeType.PDF
        });

        // Creates a new job instance
        const job = new AutotagPDFJob({inputAsset});

        // Submit the job and get the job result
        const pollingURL = await pdfServices.submit({job});
        const pdfServicesResponse = await pdfServices.getJobResult({
            pollingURL,
            resultType: AutotagPDFResult
        });

        // Get content from the resulting asset(s)
        const resultAsset = pdfServicesResponse.result.taggedPDF;
        const streamAsset = await pdfServices.getContent({asset: resultAsset});

        // Creates an output stream and copy stream asset's content to it
        const outputFilePath = "./autotag-tagged.pdf";
        console.log(`Saving asset at ${outputFilePath}`);

        let writeStream = fs.createWriteStream(outputFilePath);
        streamAsset.readStream.pipe(writeStream);
    } catch (err) {
        if (err instanceof SDKError || err instanceof ServiceUsageError || err instanceof ServiceApiError) {
            console.log("Exception encountered while executing operation", err);
        } else {
            console.log("Exception encountered while executing operation", err);
        }
    } finally {
        readStream?.destroy();
    }
})();
```
Python
```python
# Get the samples from https://github.com/adobe/pdfservices-python-sdk-samples
# Run the sample:
# python src/autotagpdf/autotag_pdf.py

logging.basicConfig(level=os.environ.get('LOGLEVEL', 'INFO'))

try:
    # get base path.
    base_path = str(Path(__file__).parents[2])

    # Initial setup, create credentials instance.
    credentials = Credentials.service_principal_credentials_builder() \
        .with_client_id('PDF_SERVICES_CLIENT_ID') \
        .with_client_secret('PDF_SERVICES_CLIENT_SECRET') \
        .build()

    # Create an ExecutionContext using credentials and create a new operation instance.
    execution_context = ExecutionContext.create(credentials)
    autotag_pdf_operation = AutotagPDFOperation.create_new()

    # Set operation input from a source file.
    input_file_path = 'autotagPdfInput.pdf'
    source = FileRef.create_from_local_file(base_path + '/resources/' + input_file_path)
    autotag_pdf_operation.set_input(source)

    # Execute the operation.
    autotag_pdf_output: AutotagPDFOutput = autotag_pdf_operation.execute(execution_context)

    input_file_name = Path(input_file_path).stem
    base_output_path = base_path + '/output/AutotagPDF/'

    Path(base_output_path).mkdir(parents=True, exist_ok=True)
    tagged_pdf_path = f'{base_output_path}{input_file_name}-tagged.pdf'

    # Save the result to the specified location.
    autotag_pdf_output.get_tagged_pdf().save_as(tagged_pdf_path)

except (ServiceApiException, ServiceUsageException, SdkException) as e:
    logging.exception(f'Exception encountered while executing operation: {e}')
```
REST API
```shell
# Please refer to our REST API docs for more information
# https://developer.adobe.com/document-services/docs/apis/#tag/PDF-Accessibility-Auto-Tag

curl --location --request POST 'https://pdf-services.adobe.io/operation/autotag' \
--header 'x-api-key: {{Placeholder for client_id}}' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer {{Placeholder for token}}' \
--data-raw '{
    "assetID": "urn:aaid:AS:UE1:23c30ee0-2e4d-46d6-87f2-087832fca718"
}'
```
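The same REST call can be assembled from Python's standard library. This is a rough sketch that mirrors only the endpoint, headers, and `assetID` body of the curl command above. The helper name `build_autotag_request` and the example argument values are hypothetical, and the request is built but deliberately not sent.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
AUTOTAG_URL = "https://pdf-services.adobe.io/operation/autotag"


def build_autotag_request(client_id: str, token: str, asset_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example.

    Hypothetical helper; only the URL, headers, and body shape come from
    the curl command.
    """
    body = json.dumps({"assetID": asset_id}).encode("utf-8")
    return urllib.request.Request(
        AUTOTAG_URL,
        data=body,
        headers={
            "x-api-key": client_id,
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own client ID, token, and asset ID.
    request = build_autotag_request("my-client-id", "my-token", "urn:aaid:AS:UE1:example")
    print(request.get_method(), request.full_url)
    # To actually submit the job, pass the request to urllib.request.urlopen(request).
```

Separating request construction from sending keeps the example free of side effects and lets the headers and body be inspected before any network call is made.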
Generate tagged PDF by setting options with command line arguments
The sample below generates a tagged PDF by setting options through command line arguments.
Here is a sample list of command line arguments and their descriptions:
- --input < input file path >
- --output < output file path >
- --report { If this argument is present then the output will be generated with the report }
- --shift_headings { If this argument is present then the headings will be shifted in the output PDF file }
Samples are shown below for Java, .NET, Node JS, and Python, in that order.
Java
```java
// Get the samples from https://github.com/adobe/pdfservices-java-sdk-samples
// Run the sample:
// mvn -f pom.xml exec:java -Dexec.mainClass=com.adobe.pdfservices.operation.samples.autotagpdf.AutotagPDFParameterised

public class AutotagPDFParameterised {

    private static final org.slf4j.Logger LOGGER = LoggerFactory.getLogger(AutotagPDFParameterised.class);

    public static void main(String[] args) {
        LOGGER.info("--input " + getInputFilePathFromCmdArgs(args));
        LOGGER.info("--output " + getOutputFilePathFromCmdArgs(args));
        LOGGER.info("--report " + getGenerateReportFromCmdArgs(args));
        LOGGER.info("--shift_headings " + getShiftHeadingsFromCmdArgs(args));

        try (InputStream inputStream = Files.newInputStream(new File(getInputFilePathFromCmdArgs(args)).toPath())) {
            // Initial setup, create credentials instance
            Credentials credentials = new ServicePrincipalCredentials(
                    System.getenv("PDF_SERVICES_CLIENT_ID"),
                    System.getenv("PDF_SERVICES_CLIENT_SECRET"));

            // Creates a PDF Services instance
            PDFServices pdfServices = new PDFServices(credentials);

            // Creates an asset(s) from source file(s) and upload
            Asset asset = pdfServices.upload(inputStream, PDFServicesMediaType.PDF.getMediaType());

            // Create parameters for the job
            AutotagPDFParams autotagPDFParams = getOptionsFromCmdArgs(args);

            // Creates a new job instance
            AutotagPDFJob autotagPDFJob = new AutotagPDFJob(asset)
                    .setParams(autotagPDFParams);

            // Submit the job and gets the job result
            String location = pdfServices.submit(autotagPDFJob);
            PDFServicesResponse<AutotagPDFResult> pdfServicesResponse = pdfServices.getJobResult(location, AutotagPDFResult.class);

            // Get content from the resulting asset(s)
            Asset resultAsset = pdfServicesResponse.getResult().getTaggedPDF();
            Asset resultAssetReport = pdfServicesResponse.getResult().getReport();
            StreamAsset streamAsset = pdfServices.getContent(resultAsset);
            StreamAsset streamAssetReport = (autotagPDFParams != null && autotagPDFParams.isGenerateReport()) ?
                    pdfServices.getContent(resultAssetReport) : null;

            // Creating output streams and copying stream assets' content to it.
            // Create the directory that is actually written to, not just "output/".
            String outputPath = getOutputFilePathFromCmdArgs(args);
            Files.createDirectories(Paths.get(outputPath));
            OutputStream outputStream = Files.newOutputStream(new File(outputPath + "autotagPDFInput-tagged.pdf").toPath());
            LOGGER.info("Saving asset at " + outputPath + "autotagPDFInput-tagged.pdf");
            IOUtils.copy(streamAsset.getInputStream(), outputStream);
            outputStream.close();
            if (streamAssetReport != null) {
                OutputStream outputStreamReport = Files.newOutputStream(new File(outputPath + "autotagPDFInput-report.xlsx").toPath());
                LOGGER.info("Saving asset at " + outputPath + "autotagPDFInput-report.xlsx");
                IOUtils.copy(streamAssetReport.getInputStream(), outputStreamReport);
                outputStreamReport.close();
            }
        } catch (ServiceApiException | IOException | SDKException | ServiceUsageException e) {
            LOGGER.error("Exception encountered while executing operation", e);
        }
    }

    private static AutotagPDFParams getOptionsFromCmdArgs(String[] args) {
        Boolean generateReport = getGenerateReportFromCmdArgs(args);
        Boolean shiftHeadings = getShiftHeadingsFromCmdArgs(args);
        AutotagPDFParams.Builder autotagPDFParamsBuilder = AutotagPDFParams.autotagPDFParamsBuilder();

        if (generateReport)
            autotagPDFParamsBuilder.generateReport();
        if (shiftHeadings)
            autotagPDFParamsBuilder.shiftHeadings();

        return autotagPDFParamsBuilder.build();
    }

    private static Boolean getShiftHeadingsFromCmdArgs(String[] args) {
        return Arrays.asList(args).contains("--shift_headings");
    }

    private static Boolean getGenerateReportFromCmdArgs(String[] args) {
        return Arrays.asList(args).contains("--report");
    }

    private static String getInputFilePathFromCmdArgs(String[] args) {
        String inputFilePath = "src/main/resources/autotagPDFInput.pdf";
        int inputFilePathIndex = Arrays.asList(args).indexOf("--input");
        if (inputFilePathIndex >= 0 && inputFilePathIndex < args.length - 1) {
            inputFilePath = args[inputFilePathIndex + 1];
        } else
            LOGGER.info("input file not specified, using default value : autotagPDFInput.pdf");

        return inputFilePath;
    }

    private static String getOutputFilePathFromCmdArgs(String[] args) {
        String outputFilePath = "output/AutotagPDFParameterised/";
        int outputFilePathIndex = Arrays.asList(args).indexOf("--output");
        if (outputFilePathIndex >= 0 && outputFilePathIndex < args.length - 1) {
            outputFilePath = args[outputFilePathIndex + 1];
        } else
            LOGGER.info("output path not specified, using default value : " + outputFilePath);

        return outputFilePath;
    }
}
```
.NET
```csharp
// Get the samples from https://github.com/adobe/PDFServices.NET.SDK.Samples
// Run the sample:
// cd AutotagPDF/
// dotnet run .csproj

namespace AutotagPDFParameterised
{
    class Program
    {
        private static readonly ILog log = LogManager.GetLogger(typeof(Program));

        private static AutotagPDFOptions GetOptionsFromCmdArgs(String[] args)
        {
            Boolean generateReport = GetGenerateReportFromCmdArgs(args);
            Boolean shiftHeadings = GetShiftHeadingsFromCmdArgs(args);

            AutotagPDFOptions.Builder builder = AutotagPDFOptions.AutotagPDFOptionsBuilder();

            if (generateReport) builder.GenerateReport();
            if (shiftHeadings) builder.ShiftHeadings();

            return builder.Build();
        }

        private static Boolean GetShiftHeadingsFromCmdArgs(String[] args)
        {
            return Array.Exists(args, element => element == "--shift_headings");
        }

        private static Boolean GetGenerateReportFromCmdArgs(String[] args)
        {
            return Array.Exists(args, element => element == "--report");
        }

        private static String GetInputFilePathFromCmdArgs(String[] args)
        {
            String inputFilePath = @"autotagPdfInput.pdf";
            int inputFilePathIndex = Array.IndexOf(args, "--input");
            if (inputFilePathIndex >= 0 && inputFilePathIndex < args.Length - 1)
            {
                inputFilePath = args[inputFilePathIndex + 1];
            }
            else
                log.Info("input file not specified, using default value : autotagPdfInput.pdf");

            return inputFilePath;
        }

        private static String GetOutputFilePathFromCmdArgs(String[] args)
        {
            String outputFilePath = Directory.GetCurrentDirectory() + "/output/";
            int outputFilePathIndex = Array.IndexOf(args, "--output");
            if (outputFilePathIndex >= 0 && outputFilePathIndex < args.Length - 1)
            {
                outputFilePath = args[outputFilePathIndex + 1];
            }
            else
                log.Info("output path not specified, using default value : /output/");

            return outputFilePath;
        }

        static void Main(string[] args)
        {
            //Configure the logging
            ConfigureLogging();

            log.Info("--input " + GetInputFilePathFromCmdArgs(args));
            log.Info("--output " + GetOutputFilePathFromCmdArgs(args));
            log.Info("--report " + GetGenerateReportFromCmdArgs(args));
            log.Info("--shift_headings " + GetShiftHeadingsFromCmdArgs(args));

            try
            {
                // Initial setup, create credentials instance.
                // Replace the placeholder strings with your own client ID and client secret.
                Credentials credentials = Credentials.ServicePrincipalCredentialsBuilder()
                    .WithClientId("PDF_SERVICES_CLIENT_ID")
                    .WithClientSecret("PDF_SERVICES_CLIENT_SECRET")
                    .Build();

                //Create an ExecutionContext using credentials and create a new operation instance.
                ExecutionContext executionContext = ExecutionContext.Create(credentials);
                AutotagPDFOperation autotagPDFOperation = AutotagPDFOperation.CreateNew();

                // Provide an input FileRef for the operation
                autotagPDFOperation.SetInput(FileRef.CreateFromLocalFile(GetInputFilePathFromCmdArgs(args)));

                // Get and Build AutotagPDF options from command line args and set them into the operation
                AutotagPDFOptions autotagPDFOptions = GetOptionsFromCmdArgs(args);
                autotagPDFOperation.SetOptions(autotagPDFOptions);

                // Execute the operation
                AutotagPDFOutput autotagPDFOutput = autotagPDFOperation.Execute(executionContext);

                // Save the output files at the specified location.
                // The default output path already starts with the current directory,
                // so it is not prepended a second time here.
                string outputPath = GetOutputFilePathFromCmdArgs(args);
                FileRef taggedPDF = autotagPDFOutput.GetTaggedPDF();
                taggedPDF.SaveAs(outputPath + "autotagPDFInput-tagged.pdf");
                if (autotagPDFOptions != null && autotagPDFOptions.IsGenerateReport)
                    autotagPDFOutput.GetReport()
                        .SaveAs(outputPath + "autotagPDFInput-report.xlsx");
            }
            catch (ServiceUsageException ex)
            {
                log.Error("Exception encountered while executing operation", ex);
            }
            catch (ServiceApiException ex)
            {
                log.Error("Exception encountered while executing operation", ex);
            }
            catch (SDKException ex)
            {
                log.Error("Exception encountered while executing operation", ex);
            }
            catch (IOException ex)
            {
                log.Error("Exception encountered while executing operation", ex);
            }
            catch (Exception ex)
            {
                log.Error("Exception encountered while executing operation", ex);
            }
        }

        static void ConfigureLogging()
        {
            ILoggerRepository logRepository = LogManager.GetRepository(Assembly.GetEntryAssembly());
            XmlConfigurator.Configure(logRepository, new FileInfo("log4net.config"));
        }
    }
}
```
Node JS
```javascript
// Get the samples from https://github.com/adobe/pdfservices-node-sdk-samples
// Run the sample:
// node src/autotagpdf/autotag-pdf-parameterised.js

const {
    ServicePrincipalCredentials,
    PDFServices,
    MimeType,
    AutotagPDFParams,
    AutotagPDFJob,
    AutotagPDFResult,
    SDKError,
    ServiceUsageError,
    ServiceApiError
} = require("@adobe/pdfservices-node-sdk");
const fs = require("fs");
const args = process.argv;

(async () => {
    let readStream;
    try {
        console.log("--input " + getInputFilePathFromCmdArgs(args));
        console.log("--output " + getOutputFilePathFromCmdArgs(args));
        console.log("--report " + getGenerateReportFromCmdArgs(args));
        console.log("--shift_headings " + getShiftHeadingsFromCmdArgs(args));

        // Initial setup, create credentials instance
        const credentials = new ServicePrincipalCredentials({
            clientId: process.env.PDF_SERVICES_CLIENT_ID,
            clientSecret: process.env.PDF_SERVICES_CLIENT_SECRET
        });

        // Creates a PDF Services instance
        const pdfServices = new PDFServices({credentials});

        // Creates an asset(s) from source file(s) and upload
        readStream = fs.createReadStream(getInputFilePathFromCmdArgs(args));
        const inputAsset = await pdfServices.upload({
            readStream,
            mimeType: MimeType.PDF
        });

        // Create parameters for the job
        const params = new AutotagPDFParams({
            generateReport: getGenerateReportFromCmdArgs(args),
            shiftHeadings: getShiftHeadingsFromCmdArgs(args)
        });

        // Creates a new job instance
        const job = new AutotagPDFJob({inputAsset, params});

        // Submit the job and get the job result
        const pollingURL = await pdfServices.submit({job});
        const pdfServicesResponse = await pdfServices.getJobResult({
            pollingURL,
            resultType: AutotagPDFResult
        });

        // Get content from the resulting asset(s)
        const resultAsset = pdfServicesResponse.result.taggedPDF;
        const resultAssetReport = pdfServicesResponse.result.report;
        const streamAsset = await pdfServices.getContent({asset: resultAsset});
        const streamAssetReport = resultAssetReport
            ? await pdfServices.getContent({asset: resultAssetReport})
            : undefined;

        // Creates an output stream and copy stream asset's content to it
        const outputPath = getOutputFilePathFromCmdArgs(args);
        const outputFilePath = outputPath + "autotagPDFInput-tagged.pdf";
        console.log(`Saving asset at ${outputFilePath}`);

        const writeStream = fs.createWriteStream(outputFilePath);
        streamAsset.readStream.pipe(writeStream);
        if (resultAssetReport) {
            const outputFileReportPath = outputPath + "autotagPDFInput-report.xlsx";
            console.log(`Saving asset at ${outputFileReportPath}`);

            const writeStream = fs.createWriteStream(outputFileReportPath);
            streamAssetReport.readStream.pipe(writeStream);
        }
    } catch (err) {
        if (err instanceof SDKError || err instanceof ServiceUsageError || err instanceof ServiceApiError) {
            console.log("Exception encountered while executing operation", err);
        } else {
            console.log("Exception encountered while executing operation", err);
        }
    } finally {
        readStream?.destroy();
    }
})();

function getInputFilePathFromCmdArgs(args) {
    let inputFilePath = "resources/autotagPdfInput.pdf";
    let inputFilePathIndex = args.indexOf("--input");
    if (inputFilePathIndex >= 0 && inputFilePathIndex < args.length - 1) {
        inputFilePath = args[inputFilePathIndex + 1];
    } else
        console.log("input file not specified, using default value : autotagPdfInput.pdf");
    return inputFilePath;
}

function getOutputFilePathFromCmdArgs(args) {
    let outputFilePath = "output/";
    let outputFilePathIndex = args.indexOf("--output");
    if (outputFilePathIndex >= 0 && outputFilePathIndex < args.length - 1) {
        outputFilePath = args[outputFilePathIndex + 1];
    } else {
        console.log("output path not specified, using default value :" + outputFilePath);
        fs.mkdirSync(outputFilePath, {recursive: true});
    }
    return outputFilePath;
}

function getGenerateReportFromCmdArgs(args) {
    return args.includes("--report");
}

function getShiftHeadingsFromCmdArgs(args) {
    return args.includes("--shift_headings");
}
```
Python
```python
# Get the samples from https://github.com/adobe/pdfservices-python-sdk-samples
# Run the sample:
# python src/autotagpdf/autotag_pdf_parameterised.py --report --shift_headings --input resources/autotagPdfInput.pdf --output output/

logging.basicConfig(level=os.environ.get('LOGLEVEL', 'INFO'))


class AutotagPDFParameterised:

    _input_path: str
    _output_path: str
    _generate_report: bool
    _shift_headings: bool

    base_path = str(Path(__file__).parents[2])

    def __init__(self):
        pass

    @staticmethod
    def parse_args(*args: str):
        if not args:
            args = sys.argv[1:]
        parser = argparse.ArgumentParser(description='Autotag PDF')

        parser.add_argument('--input', help='Input file path', type=Path, metavar='input')
        parser.add_argument('--output', help='Output path', type=Path, dest='output')
        parser.add_argument('--report', dest='report', action='store_true', help='Generate report(in XLSX format)',
                            default=False)
        parser.add_argument('--shift_headings', dest='shift_headings', action='store_true', help='Shift headings',
                            default=False)

        return parser.parse_args(args)

    def get_default_input_file_path(self) -> str:
        return self.base_path + '/resources/autotagPdfInput.pdf'

    def get_default_output_file_path(self) -> str:
        return self.base_path + '/output/AutotagPDFParameterised'

    def get_autotag_pdf_options(self) -> AutotagPDFOptions:
        shift_headings = self._shift_headings
        generate_report = self._generate_report

        builder: AutotagPDFOptions.Builder = AutotagPDFOptions.builder()
        if shift_headings:
            builder.with_shift_headings()
        if generate_report:
            builder.with_generate_report()
        return builder.build()

    def execute(self, *args: str) -> None:
        args = self.parse_args(*args)
        self._input_path = args.input if args.input else self.get_default_input_file_path()
        self._output_path = args.output if args.output else self.get_default_output_file_path()
        self._generate_report = args.report
        self._shift_headings = args.shift_headings

        self.autotag_pdf()

    def autotag_pdf(self):
        try:
            # Initial setup, create credentials instance.
            credentials = Credentials.service_principal_credentials_builder() \
                .with_client_id('PDF_SERVICES_CLIENT_ID') \
                .with_client_secret('PDF_SERVICES_CLIENT_SECRET') \
                .build()

            # Create an ExecutionContext using credentials and create a new operation instance.
            execution_context = ExecutionContext.create(credentials)
            autotag_pdf_operation = AutotagPDFOperation.create_new()

            # Set operation input from a source file.
            source = FileRef.create_from_local_file(self._input_path)
            autotag_pdf_operation.set_input(source)

            # Build AutotagPDF options and set them into the operation
            autotag_pdf_operation.set_options(self.get_autotag_pdf_options())

            # Execute the operation.
            autotag_pdf_output: AutotagPDFOutput = autotag_pdf_operation.execute(execution_context)

            input_file_name = Path(self._input_path).stem
            base_output_path = self._output_path

            Path(base_output_path).mkdir(parents=True, exist_ok=True)

            # Save the result to the specified location.
            tagged_pdf_path = f'{base_output_path}/{input_file_name}-tagged.pdf'
            autotag_pdf_output.get_tagged_pdf().save_as(tagged_pdf_path)
            if self._generate_report:
                report_path = f'{base_output_path}/{input_file_name}-report.xlsx'
                autotag_pdf_output.get_report().save_as(report_path)

        except (ServiceApiException, ServiceUsageException, SdkException) as e:
            logging.exception(f'Exception encountered while executing operation: {e}')


if __name__ == "__main__":
    autotag_pdf_parameterised = AutotagPDFParameterised()
    autotag_pdf_parameterised.execute()
```