Published by Reynard Turner. Modified over 9 years ago.
1
PDF AUTOMATION Pro
The third eye for all your PDF Automation needs
SICPA Test Automation Consulting Proposal
2
Introduction to PDF Automation Pro
What?
- Cognizant’s solution for test automation of PDF documents, consisting of a suite of 3 tools
- Designed to address most aspects of PDF automation, such as comparison of similar documents, extraction of specific data from a document, automating an interactive form, etc.
- Supports integration with most of the functional testing tools in the market
Why?
- PDF Automation Pro helps to significantly reduce the manual effort required for PDF automation
- PDF Automation Pro eliminates manual errors which might creep in, especially for large documents
- PDF Automation Pro fits into the existing test infrastructure and enables integration with end-to-end tests
Who?
- PDF Automation Pro has been created by the Research and Development team from Cognizant’s Automation Centre of Excellence
- PDF Automation Pro is continuously enhanced and updated by the R&D team, based on feedback from the end users of the tool
- PDF Automation Pro has a dedicated helpdesk to assist end users with implementation and troubleshooting
3
Overview of PDF Automation Pro
PDF Probe
- Provides a solution for comparing PDF documents and reporting the differences, if any
- Also supports comparison of a PDF document with an MS Word document
PDF Assist
- Provides a solution to extract specific content from within a PDF document based on user-defined criteria
- The extracted content can subsequently be validated against an expected result for testing purposes
PDF PerFORM
- Provides a solution for automation of PDF interactive forms
- Supports filling in empty forms as well as extracting data from filled-in forms
Core Features
- Each of these tools comes with a simple, user-friendly GUI which can be used directly to automate the PDF documents as required.
- In addition, all 3 tools expose APIs which enable easy integration with most of the functional automation tools in the market.
- A code generator is included with all the tools; it automatically generates the API calls required to automate the PDF documents.
- These code generators support multiple languages, including VBScript, C#, VB.NET and Java, and generate code which is consistent with Cognizant’s accepted standards and conventions.
4
Overview of PDF Probe
Highlights
[Diagram: SRC, DOCUMENT COMPARATOR, TRG]
Comparison Features
- Textual content comparison
- Font size comparison
- Font family comparison
- Font style comparison (Bold and Italics)
- Font colour comparison
- Line spacing comparison
- Whitespace comparison
Special Features
- Ability to compare a specified range of pages
- Batch comparison of multiple document sets
- Batch comparison of multiple documents against a specified template
- Provision to ignore the case (uppercase/lowercase) while comparing
- Supports comparison of multi-column text, tables, header and footer
- Supports comparison of password protected documents
Comparison Reports
- Visual report in HTML format
- Detailed report in Excel format (optional)
- Both reports contain a high level summary, as well as corresponding performance statistics
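
As a rough illustration of what the comparison features above involve, the sketch below compares two runs of words together with their font attributes, with an option to ignore case. It is not PDF Probe's API: the `Word` record and the `compare_words` function are hypothetical names, the word lists are assumed to have been extracted from the source and target documents already, and the position-by-position matching is a deliberate simplification.

```python
from dataclasses import dataclass
from itertools import zip_longest
from typing import List


@dataclass(frozen=True)
class Word:
    """Hypothetical record for one extracted word and its font attributes."""
    text: str
    font_family: str
    font_size: float
    bold: bool = False
    italic: bool = False


def compare_words(source: List[Word], target: List[Word],
                  ignore_case: bool = False) -> List[str]:
    """Compare words position by position and describe every difference."""
    differences = []
    for index, (src, trg) in enumerate(zip_longest(source, target)):
        # zip_longest pads the shorter list with None.
        if src is None or trg is None:
            missing = "source" if src is None else "target"
            differences.append(f"word {index}: missing in {missing}")
            continue
        src_text, trg_text = src.text, trg.text
        if ignore_case:
            src_text, trg_text = src_text.lower(), trg_text.lower()
        if src_text != trg_text:
            differences.append(f"word {index}: text {src.text!r} != {trg.text!r}")
        # Font attributes are compared independently of the text.
        for attr in ("font_family", "font_size", "bold", "italic"):
            if getattr(src, attr) != getattr(trg, attr):
                differences.append(
                    f"word {index}: {attr} {getattr(src, attr)!r} != {getattr(trg, attr)!r}"
                )
    return differences


source = [Word("Invoice", "Arial", 12.0, bold=True), Word("Total", "Arial", 10.0)]
target = [Word("invoice", "Arial", 12.0, bold=True), Word("Total", "Arial", 11.0)]
print(compare_words(source, target, ignore_case=True))
```

With `ignore_case=True` only the font size difference on the second word is reported; without it, the differing capitalisation of the first word is reported as well.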
5
Overview of PDF Assist
Highlights
[Diagram: SEARCH, VALIDATE, APPLICATION PROGRAMMING INTERFACE]
Text Extraction features
- Get the occurrence count of a specified word
- Get the word next to a given search key
- Get the text in between two specified words
- Get the hash value for a given key based on a specified delimiter (for key-value pairs separated by a delimiter such as “:”)
- Get the metadata of a given word, including font name, colour, width, etc.
- Get the document metadata, including PDF author, PDF title, PDF producer, etc.
Special features
- Enables fine-tuning the content extraction with features such as limiting the search to a specified range of pages, case sensitive searching, etc.
- Supports searching within tables as well as document headers/footers
- Supports extracting content from password protected documents
Image Extraction features
- Extract the specified image from the document
- Get the metadata of a specified image, including the position, dimensions, and pixel-by-pixel data
UI features
- Clearly displays the description, input parameters and return values for the API selected
- Validates the user inputs to ensure that they are within acceptable boundaries
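
To make the text extraction features above more concrete, here is a minimal sketch of four of them applied to plain text that is assumed to have been extracted from a PDF already. These are not PDF Assist's API calls: all function names are hypothetical, and the regular-expression approach is only one plausible way to implement such searches.

```python
import re
from typing import Dict, Optional


def occurrence_count(text: str, word: str, case_sensitive: bool = True) -> int:
    """Count whole-word occurrences (assumes `word` is made of word characters)."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return len(re.findall(rf"\b{re.escape(word)}\b", text, flags))


def word_after(text: str, key: str) -> Optional[str]:
    """Return the whitespace-delimited token that follows the first `key`."""
    match = re.search(rf"\b{re.escape(key)}\b\s+(\S+)", text)
    return match.group(1) if match else None


def text_between(text: str, start: str, end: str) -> Optional[str]:
    """Return the text between the first `start` and the next `end`."""
    match = re.search(rf"{re.escape(start)}(.*?){re.escape(end)}", text, re.DOTALL)
    return match.group(1).strip() if match else None


def key_value_pairs(text: str, delimiter: str = ":") -> Dict[str, str]:
    """Build a key-to-value mapping from lines of the form 'key<delimiter>value'."""
    pairs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(delimiter)
        if sep and key.strip():  # skip lines without the delimiter
            pairs[key.strip()] = value.strip()
    return pairs


sample = "Invoice Number: 4711\nCustomer: ACME Corp\nAmount due 120.50 EUR"
print(occurrence_count(sample, "invoice", case_sensitive=False))
print(word_after(sample, "due"))
print(text_between(sample, "Customer:", "Amount"))
print(key_value_pairs(sample)["Invoice Number"])
```

In a test, each returned value would then be asserted against the expected result, which corresponds to the VALIDATE step in the diagram.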
6
Overview of PDF PerFORM
Highlights
[Diagram: APPLICATION PROGRAMMING INTERFACE, FILL, VALIDATE, EXTRACT]
Form filling features
- Get the complete list of form fields from the document loaded
- Select specific fields to be filled in (this includes all types of fields such as textboxes, checkboxes, radio buttons, etc.)
- Specify appropriate values for the selected fields
- Fill the specified values and save the filled form into a specified location
Form values extraction/validation features
- Get the complete list of form fields from the document loaded
- Select specific fields whose values are to be extracted (this includes all types of fields such as textboxes, checkboxes, radio buttons, etc.)
- If required, specify the expected values for the selected fields
- Extract the values from the fields specified
- Compare the extracted values with the expected results (if specified), and report any differences found
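
The final validation step above (comparing extracted field values with expected results and reporting differences) can be sketched as follows. This is an illustration only, not PDF PerFORM's API: `validate_fields` is a hypothetical helper, and the extracted values are assumed to be available as a simple field-name-to-value mapping.

```python
from typing import Dict, List, Tuple


def validate_fields(extracted: Dict[str, str],
                    expected: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (field, expected, actual) for every expected field that differs."""
    mismatches = []
    for field, expected_value in expected.items():
        # Fields absent from the form are reported with a placeholder value.
        actual = extracted.get(field, "<missing>")
        if actual != expected_value:
            mismatches.append((field, expected_value, actual))
    return mismatches


# Hypothetical values extracted from a filled-in form.
extracted = {"FirstName": "Ada", "Newsletter": "Off", "Country": "UK"}
# Only the fields selected for validation carry an expected value.
expected = {"FirstName": "Ada", "Newsletter": "Yes"}

for field, want, got in validate_fields(extracted, expected):
    print(f"{field}: expected {want!r}, found {got!r}")
```

Fields without an expected value (here `Country`) are simply not checked, mirroring the "if specified" wording of the slide.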
7
PDF Probe Comparison with other tools
Tools compared: DiffPDF, DiffDoc, Adobe Acrobat Pro, i-net PDFC, PDF Probe
Comparison criteria:
- Textual content comparison (including headers, footers, tables, multi-column text, etc.)
- Comparison of metadata such as font color, font family, font size, font style, etc.
- Comparison of images
  - Partially possible, using the "Compare Appearance" mode
- Integration with functional automation tools
  - Execution can be triggered using the command line, but the comparison results cannot be retrieved and reported from the automation tool
  - No API or command line execution possible to enable integrations with functional automation tools
  - A Java API is provided, which enables integration with any Java based automation tool; this can be used for continuous integration as well. Apart from this, a command line option is also available; however, the comparison results cannot be retrieved and reported from the automation tool in this case.
  - PDF Probe: Yes, the API provided enables integration with most of the automation tools. This can be used for continuous integration as well. The API calls are automatically generated by the tool.
- Support for password protected documents
- Provision for bulk comparison and template comparison
- Visual report highlighting the differences
- Detailed report documenting the differences
- Compare MS Word with PDF
- Licensing: Open Source, Licensed, Priced
8
Comparison with other tools (contd.)
Document content extraction tools:
- There are many tools which enable extraction of content from a PDF document
- However, such tools provide only basic features such as extracting all the text from the document or from a specific page
- None of the tools provide the range of search criteria provided by PDF Assist, which helps to zero in on the exact content to be extracted from the document
- To sum up, PDF Assist is probably the most advanced tool in the PDF content extraction space
Interactive PDF forms automation tools:
- There are many APIs available which enable the automation of PDF forms by writing appropriate scripts
- Adobe has also released its Adobe Test Toolkit to cater to this requirement; however, the tool has not really matured yet
- The USP of PDF PerFORM in this space is the code generation facility, as well as the ability to directly fill an empty form or extract content from a filled-in form through the GUI provided
9
Appendix
10
Limitations of PDF Automation Pro
General:
- Documents created by non-standard PDF writers may not be processed properly.
- If a single word or a single line contains multiple font faces, the results may be unexpected.
- The time taken to load the document for processing is directly proportional to the size of the document. Large documents may take a long time to load.
PDF Probe:
- Images cannot be compared. The recommended approach here is to use PDF Assist to extract the required images and use any available image comparison algorithms.
- Values in form fields like checkboxes, radio buttons, etc. cannot be compared, and the presence of such fields may affect the accuracy of the comparison.
- When images are present in the document, the line spacing comparison might be affected.
- The comparison may be inaccurate if there are significant differences with respect to margin and line spacing between the documents.
- Split sections within documents are supported; however, the word wrapping must be similar across the source and target documents.
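
For the image workaround mentioned above (extract the images, then apply an image comparison algorithm), one very simple option is a pixel-by-pixel comparison. The sketch below assumes the pixel data of both images is already available as rows of RGB tuples; `differing_pixel_ratio` is a hypothetical helper and not part of PDF Automation Pro.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # one RGB pixel
Image = List[List[Pixel]]      # rows of pixels


def differing_pixel_ratio(first: Image, second: Image, tolerance: int = 0) -> float:
    """Return the fraction of pixels with any channel differing by more than `tolerance`."""
    if len(first) != len(second) or any(len(a) != len(b) for a, b in zip(first, second)):
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in first)
    if total == 0:
        return 0.0
    differing = sum(
        1
        for row_a, row_b in zip(first, second)
        for pixel_a, pixel_b in zip(row_a, row_b)
        if any(abs(ca - cb) > tolerance for ca, cb in zip(pixel_a, pixel_b))
    )
    return differing / total


white, black, grey = (255, 255, 255), (0, 0, 0), (250, 250, 250)
image_a = [[white, white], [black, black]]
image_b = [[white, grey], [black, white]]

print(differing_pixel_ratio(image_a, image_b))                # 0.5
print(differing_pixel_ratio(image_a, image_b, tolerance=10))  # 0.25
```

A small tolerance helps absorb compression noise. Since extracted images reflect the original file rather than how it was placed in the document, a check like this would not notice resizing or rotation applied at embedding time.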
11
Limitations of PDF Automation Pro
PDF Probe (contd.):
- Word documents with tables in headers cannot be compared.
- If there is a text content deviation together with any other deviation (font size, color, etc.), only the text deviation will be highlighted in the tool’s HTML report. The Excel report, however, will capture all the differences.
- Documents may not be compared properly if the font size of the words in the document is too small.
- Border lines, underlines and table borders may not be displayed in the HTML report.
- The comparison may be inaccurate if the same content from the source is scattered across different positions (pages) of the target document.
- PDF Probe does not support page ranges for Word-PDF comparison.
- Word-PDF comparison is slower than PDF-PDF comparison.
- The HTML report is generated from coordinates retrieved by a third-party tool (used internally for retrieving the PDF content). The accuracy of the HTML report therefore depends on the quality of the PDF.
- Word documents containing images can give unexpected results.
- The tool reads content line by line, even within a table; it does not read values cell by cell or column by column. Therefore, if a line contains a text deviation together with any other deviation (font size, color, etc.), only the text deviation will be highlighted in the tool’s HTML report and in the Excel report.
12
Limitations of PDF Automation Pro
PDF Assist:
General
- Images accessed using PDF Assist reflect the properties of the original image file, even if some of these properties changed while embedding it into the document. For example:
  - The image may have been resized within the document, but PDF Assist will return only the original size of the image.
  - The image may have been rotated by some angle while placing it into the document, but PDF Assist will return the original orientation of the image.
- In rare cases, words in upper case may be wrongly perceived by PDF Assist to be lower case.
API
- Values in form fields like checkboxes and special characters cannot be extracted.
- Split sections within documents are supported; however, the following points must be taken into consideration:
  - PDF Assist considers each line of text as one cutting across all the sections.
  - In some documents, the split sections may not be aligned equally on the horizontal plane, causing PDF Assist to read each of the sectioned portions as a separate line.
- In some cases, images in “.tiff” format may be recognized as “.png” images.
- Though the API supports Java, it is not possible to use the API on platforms other than Windows.
13
Limitations of PDF Automation Pro
UI
- Certain documents may not load properly in the UI; however, this will not affect the working of the API. For example:
  - If there is any text overlapping on top of an image, it may not be rendered properly.
  - If there is any text which is aligned vertically in the document, it will be rendered horizontally within the UI.
- For API functions which return an array, the UI generates code only for the first element in the array. This code has to be extended if the user needs to access other elements of the array.
PDF PerFORM:
- The API does not have any provision to obtain the page numbers on which each of the form fields is present (unless the document contains bookmarks).
14
Thank you