
Convert Paper to Digital Files

Introduction to Optical Character Recognition (OCR)

Here is a very brief overview of OCR. This is not by any means complete, nor does it touch on more powerful capabilities of expensive software packages. But if you've ever wondered if OCR could be useful for you, read on.

With modern tools, it is possible to scan paper documents and convert them into text files for your computer faster and cheaper than having them retyped by hand. In theory this is easy to do. In practice, the details of the process make it difficult without the right tools.

The right tools include a fast scanner with an automatic feed, good software to scan and store the images, and quality OCR software to convert the images of characters into text files accurately. Having all that, you still need someone to handle the paper, process the data and store the text files with proper naming so the digital files can be found when needed. It's not quite as simple as it first appears.

When you add setup and support costs plus overhead to the equipment and space costs, this outwardly simple process becomes expensive. Only companies that need to process a lot of paper on a regular basis can afford to bring this process in-house. Large banks, mortgage companies and other financial organizations are the biggest in-house users. Almost everyone else uses independent service organizations.

The reasons above for using an outside service are even stronger for files that contain graphic material. These can be scanned and saved as images on a digital medium. A CD or DVD with an index can become a long-term archive of old paper files, saving space and making backup copies easy to produce. Compared with text-only conversion, this process requires more expensive software and takes much more storage space. It is only worthwhile if the graphics are essential to save with the other materials.

Scanning Paper to Text via OCR

Here are some guidelines for Paper to Text File Processing:

  1. Paper must be clean and characters must be clear. The poorer the condition of the source, the lower the OCR accuracy.
  2. Folds, creases, and tears make the pages impossible to feed automatically. In this case, make clean copies (20-pound copy paper is fine) and scan the copies. This also solves the problems caused by staples or paper clips. The paper doesn't have to be perfect, but it must be free of creases and folds.
  3. Paper is usually processed in chunks of 50 or fewer pages. This does not limit the size of individual files, which may be much larger. You should separate the paper for each file into a separate folder or use different color separation pages.
  4. Paper clips or staples will crease the paper and could cause feed jams. Chunks delivered in that form cause extra work, and accuracy may be impacted.
  5. Each chunk must have a name that is meaningful to you, a relevant date, and optionally, a description for CD image files.
  6. OCR (Optical Character Recognition) is capable of 98% to 99+% accuracy on clean input. Actual accuracy depends on the issues listed above, the quality of the printing, and the legibility of the character set. If you can't easily tell the O and the 0 apart, neither can the computer.
  7. In some cases, an adequate OCR copy can be made from a fax. It is also possible to send adequate OCR copy via fax using the high quality setting. This is only useful for small quantities or samples. OCR quality of faxes rarely matches that from a clean paper copy.
  8. If you are considering something printed on a dot matrix printer, useful OCR conversion is limited to 24-pin dot matrix output, printed with a good ribbon in letter quality mode. In this case, making a copy of the original may increase the accuracy of conversion by darkening and filling in the letters. A test sample is recommended.
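
Two of the guidelines above come down to simple arithmetic: splitting a file into chunks of 50 or fewer pages, and judging what an accuracy figure means in corrections per page. The following is only a rough sketch. The function names and the figure of 2000 characters per page are assumptions made for illustration, not part of any scanning software.

```python
def plan_chunks(total_pages, chunk_size=50):
    """Split a file into (first_page, last_page) ranges of at most chunk_size pages."""
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    return [
        (start, min(start + chunk_size - 1, total_pages))
        for start in range(1, total_pages + 1, chunk_size)
    ]


def expected_errors(chars_per_page, accuracy):
    """Average number of misread characters on one page at a given accuracy."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return chars_per_page * (1.0 - accuracy)


# Assumption: a typed page holds roughly 2000 characters.
CHARS_PER_PAGE = 2000

if __name__ == "__main__":
    # A 120 page file becomes three chunks, the last one partial.
    for first, last in plan_chunks(120):
        print(f"chunk: pages {first} to {last}")

    for accuracy in (0.98, 0.99, 0.999):
        errors = expected_errors(CHARS_PER_PAGE, accuracy)
        print(f"{accuracy:.1%} accuracy: about {errors:.0f} errors per page")
```

Under that page-size assumption, 98% accuracy still leaves around 40 characters per page to correct, which is why clean source paper matters so much.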

BW Services Paper to Text Pricing

BWS is now offering a conversion service for paper to text files. The following items outline the services offered and pricing. BWS will be happy to quote on special requirements.

  1. Paper to be converted to text files must be delivered to the scanning site at the customer's expense. Paper will be returned if desired, also at the customer's expense. Processing will be in order of receipt unless expedited service is paid for. Normal turnaround for fewer than 1000 pages is two to five days.
  2. Large quantities (over 1000 pages) can qualify for a discount. Call for a specific quote. Special requirements are possible - please call for an estimate.
  3. Paper to be scanned must be separate pages (unbound) when delivered. Two-sided pages are processed at the one-page price per side.
  4. Full payment must accompany the items to be processed. If you want the paper returned, add an amount equal to your shipping cost to the payment. Paper files will be returned the same way they were shipped. Paper not returned will be recycled without shredding. Other arrangements can only be made in advance.
  5. Files up to 1.4 MB in size can be delivered on floppy disk using the standard FAT format with 8-character names at no extra charge. Files may also be sent as email attachments, with zip password encryption if desired. Files are also available on CD (Joliet format) with long names at extra cost.
  6. In case of problems, we need a daytime phone number and the name of a contact person who can help us resolve anything that comes up.
  7. If you are unsure how well your material will convert, it is best to send a representative sample of 10 to 20 pages of each typestyle to be tested. Based on those results, you can judge whether a full conversion is worthwhile. Test fees can be credited against the full run.
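
To illustrate the floppy delivery option in item 5, here is a minimal sketch of how a descriptive file name might be cut down to an 8-character FAT name, and how a file could be checked against the 1.4 MB limit. The helper names, the default TXT extension, and the reading of 1.4 MB as 1,400,000 bytes are all assumptions for illustration; this is not a description of how BWS names or sizes its files.

```python
import re

# Assumption: "1.4 MB" is read as 1,400,000 bytes.
FLOPPY_LIMIT_BYTES = 1_400_000


def fat_name(descriptive_name, extension="TXT"):
    """Reduce a descriptive name to an 8-character base plus an extension.

    Keeps only letters and digits, upper-cases them, and truncates to 8.
    It does not detect two names that shorten to the same result.
    """
    base = re.sub(r"[^A-Za-z0-9]", "", descriptive_name).upper()[:8]
    if not base:
        raise ValueError("name has no usable letters or digits")
    return f"{base}.{extension}"


def fits_on_floppy(size_bytes):
    """Return True if a file of this size is within the assumed floppy limit."""
    return size_bytes <= FLOPPY_LIMIT_BYTES


if __name__ == "__main__":
    print(fat_name("Smith contract 1998"))
    print(fits_on_floppy(1_000_000))
    print(fits_on_floppy(2_000_000))
```

Because long names lose detail when shortened this way, it helps to choose names whose first eight characters are already distinct, or to request the CD option with long names.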

Contact BW Services

Shipping Address