Session 803: Processing PDF Files
Gaeir Dietrich
Director, High Tech Center Training Unit

Overview
Explanation of PDF
Programs that work with PDF files
–Adobe Reader
–Acrobat Pro
Processing with Acrobat Pro
Processing with OCR programs
Clean-up in Word

PDF
Great starting point
–Contains all text and graphics
–Easy to generate Word files once you learn how
–Reduces retyping
Excellent format for creating large print

What is a PDF?
Portable Document Format (PDF)
Reads the same on any computer
Looks like the book
Contains all the text
Easy for publishers

Types of PDF Documents
Text-based PDF
–Searchable
Graphical PDF
–Picture of text (i.e., a graphic)
Use the text-selection (I-beam) tool to tell the difference
–Text can be selected; graphics cannot

PDFs and Publishers
Easy for publishers
–Even small publishers can create a PDF
Most accurate format
–Looks like the book
–Includes page numbers and all text
Will be complete
–BUT watch out for teacher’s editions

Security Issues
PDF files can be locked in various ways
Some files can be read, but no text can be extracted
If you receive a locked PDF, go back to the publisher

Working with PDF Files
Native utilities from Adobe
–Adobe Reader
–Acrobat Pro
Optical character recognition (OCR)
Free extraction tool: Balabolka

Which PDF Software?
Adobe Reader
–Free
–Open, view, and read (including TTS)
Acrobat Pro
–Discounted educational pricing
–Crop pages, delete/combine pages, renumber pages, extract text
–Highly recommended for alternate format producers

Reading Features in Adobe Reader
Access text-based PDFs within Reader
Reads aloud
–But does not highlight or track
Enlarges text
–Nice reflow feature
Changes text/background colors
Text highlighting, sticky notes, and comments

Production Features in Reader
Really designed for reading, not reformatting
Export PDF
–Subscription service (about $20/year)
–Upload the PDF file; the service converts it to Word; download the result

Process with Acrobat Pro
Cropping
Enlargement for printing
Tiling
Extracting/deleting pages
Combining/inserting pages
Text extraction
–Works best with text-based PDFs

Customize Quick Tools
Click on the “gear”
View > Show/hide > Toolbar Items > Quick Tools

Quick Tools Menu
Customize
Please Note
To enable single-key shortcuts
–Open the Preferences dialog box (Ctrl + K)
–Under General, select Use Single-Key Accelerators To Access Tools (first checkbox under Basic Tools)

Cropping
Tools > Pages > Crop
Shortcut: C
(Please note: This shortcut brings up the mouse-driven cropping tool; you must double-click to open the dialog box!)

Crop Tool
Crop Toolbox
Enlarging
Choose paper size/printer
File > Print > Size…to Fit
Shortcut: Ctrl + P (tab through)
Tip: Crop document before enlarging

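Why cropping first helps is simple arithmetic: a fit-to-paper print scales the page uniformly by whichever ratio (paper width ÷ page width or paper height ÷ page height) is smaller, so trimming blank margins lets the text grow larger on the same sheet. The sketch below assumes Size > Fit behaves as a plain uniform scale-to-fit and ignores printer margins; `fit_scale` is a hypothetical helper, and the 6 × 9 inch page, the 0.75 inch margins, and tabloid paper are illustrative values, not figures from the session.

```python
def fit_scale(page_w, page_h, paper_w, paper_h):
    """Uniform scale that fits a page on the paper without distortion.

    Hypothetical helper: assumes a plain scale-to-fit with no printer margins.
    """
    return min(paper_w / page_w, paper_h / page_h)


# Illustrative sizes in inches.
TABLOID = (11, 17)
uncropped = (6, 9)        # full 6 x 9 textbook page
cropped = (4.5, 7.5)      # same page with 0.75 inch trimmed from every edge

for name, (w, h) in [("uncropped", uncropped), ("cropped", cropped)]:
    print(f"{name}: {fit_scale(w, h, *TABLOID):.0%}")
# uncropped: 183%
# cropped: 227%
```

Under these assumptions, the same tabloid sheet carries the text at roughly 227% instead of 183% once the margins are gone.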
Print to Fit
Tiling
Choose paper size/printer
File > Print > Poster > Tile Scale and Overlap
Shortcut: Ctrl + P (tab through)
Tip: Crop document before tiling

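The same tip applies to tiling, where the practical question is how many sheets a poster needs. Here is a rough model, assuming the full paper size is printable and each additional sheet advances by the paper size minus the overlap (Acrobat's actual layout and printer margins may differ). `tiles_along` and `tile_grid` are hypothetical helpers, and the page size, 175% tile scale, letter paper, and 0.5 inch overlap are illustrative values.

```python
import math


def tiles_along(length, sheet, overlap):
    """Sheets needed to cover `length` when neighboring sheets share `overlap`."""
    if length <= sheet:
        return 1
    step = sheet - overlap  # new ground each extra sheet covers (assumed model)
    if step <= 0:
        raise ValueError("overlap must be smaller than the sheet size")
    return 1 + math.ceil((length - sheet) / step)


def tile_grid(page_w, page_h, scale, sheet_w, sheet_h, overlap):
    """(columns, rows) of sheets for a page enlarged by `scale` (1.0 = 100%)."""
    cols = tiles_along(page_w * scale, sheet_w, overlap)
    rows = tiles_along(page_h * scale, sheet_h, overlap)
    return cols, rows


# Inches: 6 x 9 page (uncropped) vs. 4.5 x 7.5 (cropped), 175% on letter paper.
print(tile_grid(6, 9, 1.75, 8.5, 11, 0.5))      # (2, 2)
print(tile_grid(4.5, 7.5, 1.75, 8.5, 11, 0.5))  # (1, 2)
```

In this example, cropping the margins drops each poster page from four sheets to two.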
Enlarge with Tiling
Extracting Pages
Tools > Pages > Extract
Delete Shortcut: Ctrl + Shift + D
Extract Pages Shortcut: Alt V + T + P (opens the Pages pane; F6 moves focus into the pane, then arrow down)

Extraction Tool
Tips for Extracting Chapters
Crop the complete file before extracting
Work on a copy!!!!!
Extract from end toward front!
Use the table of contents to help
Place focus on the first page of the chapter to extract (beginning with the last chapter)

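Extracting from the back matters when the extracted pages are also deleted from the working copy: removing a later chapter never renumbers the pages in front of it, so the start pages read from the table of contents stay valid for every remaining chapter. A small simulation follows, with a Python list standing in for the PDF. `chapter_ranges` and `extract_back_to_front` are hypothetical helpers, and the sketch assumes the table-of-contents page numbers have already been translated into PDF page numbers (front matter often makes the two differ).

```python
def chapter_ranges(starts, total_pages):
    """Turn ascending 1-based chapter start pages into (first, last) ranges."""
    ends = [s - 1 for s in starts[1:]] + [total_pages]
    return list(zip(starts, ends))


def extract_back_to_front(pages, starts):
    """Extract each chapter, last one first, deleting it from `pages` afterward.

    Deleting from the back never shifts the pages in front, so every start
    number taken from the table of contents still points at the right page.
    """
    ranges = chapter_ranges(starts, len(pages))  # computed once, before any deletion
    chapters = {}
    for first, last in reversed(ranges):
        chapters[first] = pages[first - 1:last]  # extract (copy) the chapter
        del pages[first - 1:last]                # then delete it from the working copy
    return chapters


working_copy = [f"p{n}" for n in range(1, 11)]   # a 10-page stand-in document
result = extract_back_to_front(working_copy, [1, 4, 8])
print(result[4])       # ['p4', 'p5', 'p6', 'p7']
print(working_copy)    # []
```

Running the loop front to back instead would delete pages 1 to 3 first, after which start page 4 would point at what was originally page 7.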
Starting from the Back
Combining
File > Pages > Insert
OR
Create > Combine files

Inserting Pages
Combining Pages
Auto Extracting Text
File > Save As > MS Word
–Retains styles and paragraphs
File > Save As > More options…
–Text (Accessible): loses styles; places a hard return at the end of each line
–Text (Plain): loses styles; keeps paragraphs
Shortcut: Alt F + A

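If you do save as Text (Accessible), the hard return at the end of every line has to be removed before the text will reflow. A minimal sketch of that clean-up, assuming paragraphs in the exported file are separated by at least one blank line (if they are not, this would merge the whole file into one paragraph); `rejoin_lines` is a hypothetical helper. Words hyphenated at a line end are left alone, since a simple script cannot tell a printed line-break hyphen from a real one.

```python
import re


def rejoin_lines(text):
    """Join hard-wrapped lines into paragraphs.

    Assumes a blank line separates paragraphs; every other line break is
    treated as a line-end hard return and replaced with a single space.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        lines = [line.strip() for line in para.splitlines() if line.strip()]
        joined.append(" ".join(lines))
    return "\n\n".join(joined)


sample = "Reads the same on\nany computer.\n\nLooks like\nthe book.\n"
print(rejoin_lines(sample))
```

This prints two paragraphs, "Reads the same on any computer." and "Looks like the book.", separated by a blank line.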
Save As Options
More Control over Text
For graphical PDFs
Or
To maintain more control over extracting text from text-based PDFs
Use an OCR program!

Better Text Extraction
Use an optical character recognition (OCR) program
OCR programs analyze text and structure
–Acrobat Pro has built-in OCR, but other programs provide more control

OCR Programs
ABBYY FineReader Pro
–Easier to learn
–Somewhat better with structure
–About $75
Nuance OmniPage
–A bit more accessible
–A bit better with STEM materials
–About $100

Note for Kurzweil Users
If students are using Kurzweil, use Kurzweil for the OCR
–Do not OCR and then load into Kurzweil unless you do not care about the page structure
Use the KESI virtual printer
–Print from Acrobat or Adobe Reader
–Creates KESI files
–Will not work with locked files

OCR Programs
Treat all graphics files the same
–PDFs, TIFFs, JPEGs
Load image file
–Create templates
Zone (analyze structure)
Run OCR

OCR Process Details
Crop before loading into the OCR program
Turn on multiple languages as needed
–If doing math, turn on Greek
–Only turn on the languages you need
Edit in the OCR program
–Some OCR programs have font matching features
Save to Word

Once in Word
Learn to use “show hidden” (formatting marks)
–Ctrl + Shift + 8
Beware of the optional hyphen
–Search and replace to delete
–Search for ^- and replace with nothing
–Run spell check
Use styles to structure files for the braille program

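The ^- search-and-replace works inside Word. If the same text has already left Word as plain text, the equivalent step is deleting the optional-hyphen character itself. A minimal sketch, assuming the optional hyphens appear either as the Unicode soft hyphen (U+00AD) or as character 31, which is assumed here to be the code Word uses internally for an optional hyphen; `strip_optional_hyphens` is a hypothetical helper. Ordinary hyphens are untouched, so the spell check is still worth running afterward.

```python
# U+00AD (soft hyphen) and chr(31) (assumed: Word's internal optional hyphen).
OPTIONAL_HYPHENS = {0x00AD: None, 0x1F: None}


def strip_optional_hyphens(text):
    """Delete optional (soft) hyphens; regular hyphens are kept."""
    return text.translate(OPTIONAL_HYPHENS)


print(strip_optional_hyphens("infor\u00admation and large-print"))
# information and large-print
```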
More Information
Gaeir (rhymes with “fire”) Dietrich