DIzzIE's Scanning Tutorial (on using FineReader)
BY: DIzzIE [antikopyright 2003]

Intro.
This is a quick tutorial on how to scan content (i.e. books, magazines, pamphlets and the like) using the popular software called FineReader. I'm well aware that there are already a couple of tuts out on this, but they all focus on scanning fiction; that is to say, they're all focused on OCRing text from text-only books, or conversely on perfecting images, i.e. scanning comic books. This guide will briefly discuss working with raw images (as well as touch on OCRing and the basic features of FineReader).

0. Naturally, you first need a scanner. You can pick one up cheap at a pawnshop or a Salvation Army type store, or get one from a friend or nearby library. You may also want to try scamming a scanner: dizzy.ws/kodak.htm . Alternatively, if you have a high quality digital camera, some would suggest simply taking snapshots of pages.

1. Get ABBYY FineReader OCR Professional 7.0 (the latest version at the time of this guide). Download here: download.com.com/3000-2079-10228095.html?tag=lst-0-1 or fill in your e-mail and get a download link emailed to you here: download.abbyy.com/content/default.aspx

2. Download a keygen to register your try-n-buy version here: allcracks.net/html/a-1.html or find more places to download here: dizzy.ws/serials.htm

3. Once you install/run the keygen, connect your scanner to your computer and run FineReader (FR).

4. The first thing to do is go to Tools > Options. Under the Scan/Open Image Tab, if your scanner is not automatically listed in the TWAIN Driver box, click on Select Source. If nothing shows up, FR can't detect your scanner. Make sure the scanner is connected and turned on, and that you have the latest drivers. Go to the scanner manufacturer's website to download the latest drivers for your scanner, then restart your computer.
If FR is still not picking up your scanner after you have updated the drivers and checked the connection, try running the default software that came with your scanner. If even that does not work, contact your scanner's manufacturer.

5. If FR recognized your scanner in Step 4, that is, if you can see your scanner's name in the TWAIN Driver box, then select the Use FineReader Interface radio button, not Use TWAIN-Source interface. If, however, in Step 4 you could only get your scanner to work with its default program, then keep the TWAIN-Source interface button checked.

6. If you are using the TWAIN-Source interface for scanner settings (the default program that came with your scanner), you will need to configure the same things I describe in the next steps in that default program. Using the TWAIN-Source software is recommended only if you could not get FR to recognize your scanner.

7. Still in the Scan/Open Image Tab, click Scanner Settings.

8. Here you can configure a variety of options; these will vary depending on what you are scanning. A few guidelines:
*Unless you are scanning something that is very lightly printed, the Brightness should be kept on Automatic (default), with the slide bar in the middle of the light/dark spectrum bar.
*Paper size should be changed to match the dimensions of whatever you are scanning. This saves time in that you don't have to wait for the scanner bar to go all the way to the end for every scan. It also saves us the trouble of splitting excess image blocks later on.
*Pause between pages is how long you want the scanner to wait before automatically scanning the next page. 5-10 seconds should be sufficient.
*The Resolution should be at a minimum of 300dpi, moving upwards to 600dpi if what you are scanning has small print or detailed pictures.
*Pictures Scanning Mode should be color if you're scanning color images (magazines, book covers, etc), or grayscale if you're scanning text/b&w pictures.
The black & white mode is not recommended, as it produces grainy, poor-quality images.
*Unless you want to see the Scanner Settings dialog every time you scan a page, uncheck Show This Dialog Before Scanning.
*If you have a feeder scanner (versus a flatbed), that is, if you feed pages into your scanner like a fax machine rather than laying them down on the scanner like a copy machine, you may want to select Use automatic document feeder (doesn't work for all feeder scanners).
*Finally, hit OK to exit out of Scanner Settings.

9. Now let's configure a few more things in the Options menu. Still in the Scan/Open Image Tab, select the following options:
*Despeckle Image
*Split Dual Pages (optional, more on this in a little bit)
*Detect Image Orientation (during recognition)
*Open Image During Scanning

10. Under the Recognition Tab select the following options:
*Recognition Language: obviously, make sure it's set to the language of the content that you're scanning
*Autodetect Layout
*Clear Background Noise
*Autodetect (print type)
*Do not use user patterns

11. Under the Formatting Tab select the following options:
*Retain Full Page Layout
*Keep Pictures

12. Finally, hit OK to exit out of the Options menu. Feel free to look at any other options and modify them as you wish; most are self-explanatory, and if not, FR has a great help file (just hit F1), or you can download an additional FR tutorial from the manufacturer: download.abbyy.com/content/default.aspx

13. One more thing that needs to be changed: go to Process and select Start Background Recognition.

14. Before you start scanning, clean your scanner glass (if it's a flatbed) with some window cleaning solution or just soapy water. Use window cleaner if possible, or else wipe with a towel in even swipes, to avoid leaving streaks. Once your scanner is clean and dry, proceed to step 15.

15. Now then, onto scanning. Position the material onto the scanner and hit the Scan&Read button. You should see a "collaborating scanner...." pop-up window followed by a ScanGear progress bar. The image should then be scanned. Wait for the automatic recognition process to finish, and then you can work on the image.

16. Let's look at the image you scanned. You should see a thumbnail picture of the image in the left-hand menu, a larger picture in the middle menu, and any recognized text in the right-hand menu. The middle "image" window is what we'll be looking at next.

17. You need to make some decisions about how you want your finished scan to look: do you want it OCRed (OCR stands for optical character recognition), meaning that the words will be converted to text? The upside of OCRing is that your finished product will be smaller in terms of file size, it will be searchable for specific words, and it will be easier to read. The downside is that it takes more time to produce an OCRed text, because it will require at least minimal proofreading of the text to root out any OCRing mistakes. OCRing is thus recommended if you're scanning a largely text-only book, have sufficient time on your hands to proofread the scan, and are not doing a precise text that involves important formulas/calculations. If you're scanning a magazine, comic book, or a scientific text with precise formulae, OCRing is NOT recommended.

18. In the Image menu, you should see a list of buttons on the left-hand side. The two we'll be working with are the OCR (text) button and the Image button. They are the 2nd (the green-bordered T button) and 4th (the red-bordered mountain button) buttons from the top, respectively. Briefly, you select text blocks (that will be OCRed) and image blocks (that won't) and then hit the Read All button. But before you do this, there are a few other things you need to do.

19. If you had automatic image splitting enabled and FR didn't split the scanned images the way you want, or you want to get rid of excess borders and such, go to Image > Split Image and then select how you want to split the image.
You can then delete portions you don't want by clicking on the thumbnail image in the left-hand menu and pressing delete.

20. If the image is not rotated correctly, go to Image and choose the needed rotation.

21. Also, if at any time you find that an image scanned badly or that you skipped a page, scan the image again; it should now appear as the last numbered page in the thumbnailed Batch menu. Then, if you are simply replacing an image, select the image to be replaced in the thumbnailed Batch menu and delete it. Then highlight (select) the rescanned image, go to Batch > Renumber Pages... and, selecting Selected Pages, type in the page number of the original image, thus sliding the rescan into place.
If you're inserting a missed image, things are a little trickier. Find the spot where the image should be (for this example, the image should have been #21) and do the following: select all images from the current 21 (inclusive, meaning select the current 21) to the end (non-inclusive, meaning don't select the image that you are going to be inserting). To do this, click on number 21, hold down shift, and click on the second-to-last image. Go to Batch > Renumber Pages and, selecting All Pages, Continuous Page Renumbering, type in 22 for First Renumbered Page. This shifts the old 21 and everything after it up by one. Then renumber the newly scanned image to 21, following the steps for replacing an image explained in the preceding paragraph.

22. Now that you have your images scanned/fitted correctly, back to the middle Image menu we go. If you're not happy with the fields/boxes that auto recognition selected for you, you can click on a box and just delete it. Then select the portions that you want OCRed (if any), and the images. After you have done this for all scanned images, hit Read All. Note that after you experiment with a few sample pages, you can select Scan&Read Multiple Images from the Scan&Read dropdown menu (this will save you the trouble of hitting the same button for every scan).

23. Once the new recognition has finished, if you have only chosen to recognize images (no OCRing), you are ready to save.

24. Click the Save button. If you want to save all your images as one PDF file (FR has a built-in PDF printer driver, so there is no need to install any additional software), click on Formats Settings... and go to the PDF Tab. Experiment with the Save Mode options by saving only a page or two of your scan and seeing if you're satisfied with how it looks in the created PDF document. Text and Pictures Only will save only the pictures you recognized (recommended), while Page Image saves the original, unedited image seen as a thumbnail in the left-hand Batch menu.

25. Under Font Use Mode, keep the default Use Standard Fonts option, and under Reduce Picture Resolution To and JPEG Quality, experiment with amounts to balance the total file size against image quality (the higher the resolution/quality, the larger the file). You may want to create two versions of your scan: one with a smaller file size and slightly worse quality, and one with a larger size and better quality. Regardless of the different sizes/recognition ratios, any of your versions should be readable without eyestrain. After the Formats Settings, click Save to File (keeping the Keep Pictures box checked), select PDF, and keep the default save options unless there's something you want to change (all the save options are self-explanatory, so I won't go into them here).

26. Once you are satisfied with your PDF, you are done.

27. If, however, your scan involves text that you felt like OCRing, there is some more work that you will have to do.

28. Once you have selected all the text/image portions and clicked Read All (as per step 22), you will need to edit/format the scanned material. This is best done in a word processing program rather than FR.

29. Select the Save button, and click on Formats Settings.
Under the DOC/RTF/Word XML Tab select the following options:
*Default Paper Size: Letter (the 'automatically increase paper size' feature usually does not matter; if you start getting irregular paper sizes, by all means uncheck it)
*Make sure that everything else is unchecked, save for Retain Text Color and Save in Word 97 or Later Format (both are default options)

30. Back in the main save menu, be sure to select either retain font and font size or remove all formatting in the retain layout section. Keeping the default radio button, retain full page layout, selected will result in restricted margins, awkward page breaks and other annoyances when you are editing the file.

31. Now save the scan as either doc or rtf (both can be opened using Microsoft Word, the free WordPad, or the free office suite OpenOffice – .openoffice.org ). You will now have to proofread/format the scan. Some basic things to do include:
*Thoroughly skim over the text and run a spell check to catch any spelling mistakes, as well as any false positives, that is, words that when OCRed form real words, just not the correct contextual words, for instance "mom" instead of "morn".
*Cut/Paste misplaced pictures/captions/titles. While the advantages of saving without the formatting feature in FR are many, one of the disadvantages is that graphics often get shifted from their correct order, sometimes requiring that you look at the original paper (treeware) document to see where they belong.
*Set desired spacing. Various spacing issues may need to be fixed as well; these include (but are not limited to) paragraph indentation, spacing between chapters, and removing the '-' mark that may have split words in the original treeware version.
*Renumber the Table of Contents (TOC). If your document included a table of contents, you may want to change the page numbers to match your scanned version.

32. Finally, you are ready to save your work and release it to the public.
You may save as rtf, which is a popular format similar to doc, save for the fact that it is much more versatile and does not require special software, while at the same time allowing formatting features (unlike pure txt files). The downside is that rtf files are usually a bit larger than doc files. If you wish to save in pdf format from your word processor, you will need to get a pdf printer driver such as FinePrint pdfFactory Pro (check out dizzie.serein.us/serials.htm for tips on finding serial numbers).

33. Also remember that even if you scanned your document using another program, and now just have images of pages, you can easily import them into FR and OCR/format/save as pdf (basically any of the aforementioned steps). To import images, go to File > Open Image and select the images you want to import (hold down ctrl or shift to select more than one image). If a pop-up window appears asking about resizing, select Leave Original. The images should now appear in the left-hand Batch menu.

Well, I should wrap this up; this guide has gotten a tad bit longer than I intended. As you will have doubtless realized by now, FR is a very powerful tool with a vast array of features. To give a final summary, a basic process of creating an e-document involves: 1) scanning the document and 2) editing/formatting/proofing/saving the document. Obviously, not everything could be covered in this guide; if you have a question about something, look through the official FR help file, and if you still can't find an answer, feel free to drop me a line.

- Comments? Get in touch: xcon0 @t yahoo \/d0t/\ c||o|m (or call +1 (610) 887-6072)
For more knowledge check out www.rorta.net and www.dizzy.ws
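Postscript to step 31: removing the '-' marks that split words at line ends gets tedious by hand on a long book. The cleanup can be roughed out with a small script. What follows is only a minimal sketch in Python, not a feature of FR. It assumes you saved a plain txt copy of the OCRed text, and the function name dehyphenate is made up for this illustration.

```python
import re

# A word fragment, a hyphen at the very end of a line, then the rest of the
# word at the start of the next line (stray spaces or tabs are tolerated).
_SPLIT_WORD = re.compile(r"(\w+)-[ \t]*\r?\n[ \t]*(\w+)")


def dehyphenate(text: str) -> str:
    """Rejoin words that were split across a line break with a hyphen.

    Hypothetical helper for illustration only. Hyphens in the middle of a
    line (e.g. "left-hand") are left untouched.
    """
    return _SPLIT_WORD.sub(r"\1\2", text)


if __name__ == "__main__":
    sample = "Position the mate-\nrial onto the scan-\nner and hit the button.\n"
    print(dehyphenate(sample))
```

Note that this will also glue together a genuine compound that happened to break at a line end ("left-" / "hand" becomes "lefthand"), and it does nothing about false positives like "mom" for "morn", so you still need to proofread the result against the treeware original.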