Convert Paper to Digital Files
Introduction to Optical Character Recognition (OCR)
Here is a very brief overview of OCR. It
is by no means complete, and it does not
cover the more powerful capabilities of expensive
software packages. But if you've ever wondered
whether OCR could be useful for you, read on.
With modern tools, it is possible to scan
paper documents and convert them into text
files for your computer faster and cheaper
than having them retyped by hand. In theory
this is easy to do. In actual practice, the
details and process make it difficult without
the right tools.
The right tools include a fast scanner with
an automatic feed, good software to scan
and store the images, and quality OCR software
to convert the images of characters into
text files accurately. Having all that, you
still need someone to handle the paper, process
the data and store the text files with proper
naming so the digital files can be found
when needed. It's not quite as simple as
it first appears.
When you add setup and support costs plus
overhead to the equipment and space costs,
this outwardly simple process becomes expensive.
Only companies that need to process
a lot of paper on a regular basis can
afford to bring this process in-house. Large
banks, mortgage companies and other financial
organizations are the biggest in-house users.
Almost everyone else uses independent service
organizations.
The case for using an outside service
is even stronger for files that contain
graphics. These can be scanned and
saved as images on a digital medium. A CD or DVD with
an index can become a long term archive of
old paper files, saving space and making
backup copies easy to produce. This process
requires more expensive software and takes
much more storage space. It is only worthwhile
if the graphics are essential to save with
the other materials.
Scanning Paper to Text via OCR
Here are some guidelines for Paper to Text
File Processing:
- Paper must be clean and characters must be
clear. If this requirement is not met,
OCR accuracy degrades as the source gets
worse.
- Best results are with 10 point or larger
Times Roman, but other common type styles
do almost as well. Light color stains do
not degrade the OCR if the contrast is still
good.
- If the type style changes noticeably, it
usually must be processed as a separate group.
- Pages printed with weak coverage (not all
characters are dark) should be copied first with
the contrast turned up to make dark characters.
Light or mixed-shade text does not convert
because the OCR cannot find the edges of
the characters.
- Graphics or illustrations on the text page
cause problems with OCR. They are not a problem
for image-only scanning to CD or DVD, but
OCR is easily confused. Either test a batch
or make copies of the text only and scan
the copies.
- Folds, creases, and tears make the pages
impossible to feed automatically. In this
case, make clean copies (20 pound copy paper
is fine) and scan the copies. This also solves
the problems caused by staples or paper clips.
The paper doesn't have to be perfect, but
it must be free of creases and folds.
- Paper is usually processed in chunks of 50
or fewer pages. This does not limit the size
of individual files, which may be much larger.
You should separate the paper for each file
into a separate folder or use different color
separation pages.
- Paper clips or staples will crease the paper
and could cause feed jams. Chunks delivered
in that form cause extra work, and accuracy
may be impacted.
- Each chunk must have a name that is meaningful
to you, a relevant date, and optionally,
a description for CD image files.
- OCR (Optical Character Recognition) is capable
of 98%-99+% accuracy on clean input. Accuracy
will depend on the issues listed above and
the quality of printing plus the legibility
of the character set. If you can't easily
tell the O and the 0 apart, neither can the
computer.
- In some cases, an adequate OCR copy can be
made from a fax. It is also possible to send
adequate OCR copy via fax using the high
quality setting. This is only useful for
small quantities or samples. OCR quality
of faxes rarely matches that from a clean
paper copy.
- If you are considering something printed
on a dot matrix printer, useful OCR conversions
are limited to 24-pin dot matrix output, with a
good ribbon, using the letter quality setting.
In this case, making a copy of the original
may increase the accuracy of conversion by
darkening and filling in the letters. A test
sample is recommended.
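To put the accuracy figures above in perspective, here is a rough
sketch in Python of how many errors to expect after conversion. It
assumes the accuracy figure is a per-character rate and that a full
page of text holds about 2000 characters; both are assumptions made
for illustration, not figures from this page. The function name
expected_errors is hypothetical.

```python
def expected_errors(chars_per_page, accuracy, pages=1):
    """Estimate how many characters OCR will get wrong.

    accuracy is a fraction between 0 and 1 (0.99 means 99%).
    Assumption: accuracy applies to each character independently.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if chars_per_page < 0 or pages < 0:
        raise ValueError("counts must not be negative")
    return chars_per_page * pages * (1.0 - accuracy)


if __name__ == "__main__":
    # Assumed page size of 2000 characters, for illustration only.
    for accuracy in (0.98, 0.99):
        errors = expected_errors(2000, accuracy)
        print(f"{accuracy:.0%} accuracy: about {errors:.0f} errors per page")
```

Under these assumptions, even 99% accuracy leaves about 20
characters per page to correct, and 98% leaves about 40. That is why
clean input and a test sample matter.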
BW Services Paper to Text Pricing
BWS is now offering a conversion service
for paper to text files. The following items
outline the services offered and pricing.
BWS will be happy to quote on special requirements.
- Paper to be converted to text files must
be delivered to the scanning site at customer's
expense. Paper will be returned if desired,
also at customer's expense. Processing will
be in order of receipt unless expedited service
is paid for. Normal turnaround for less
than 1000 pages is two to five days.
- Normal $0.20 per page
- Expedited $0.25 per page
- Fax to OCR $0.30 per page
- Files on CD-ROM $10.00 per CD
- Surcharge for paper clips and staples: 20%
- Surcharge for legal size paper: 25%
- Large quantities (over 1000 pages) can qualify
for a discount. Call for a specific quote.
Special requirements are possible - please
call for an estimate.
- Paper to be scanned must be separate pages
(unbound) when delivered. Two sided pages
are processed at the one page price per side.
- Full payment must accompany the items to
be processed. If you want the paper to be
returned, add an amount equal to
your shipping cost to the payment. Paper
files will be returned the same way as shipped.
Paper not returned will be recycled without
shredding. Other arrangements can only be
made in advance.
- Files up to 1.4 MB in size can be delivered
on floppy disk using the standard FAT format
with 8 character names at no extra charge.
Files may also be sent as email
attachments, with zip password encryption if desired.
Files are also available on CD (Joliet format)
with long names at extra cost.
- In case of problems, we need a daytime phone
contact number and person who can help us
resolve anything that comes up.
- If there is any question about conversion results,
it is best to send a representative sample
of 10 to 20 pages of each typestyle
to be tested. Based on those results, you
can judge if a full conversion is worthwhile.
Test fees can be credited against the full
run.
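For budgeting, the price list above can be turned into a rough
estimate. The Python sketch below is illustrative only. The rates and
surcharges come from the list, but the way surcharges combine is an
assumption: the two percentages are added together and applied to the
per-page subtotal only, not to the CD fee. Volume discounts and return
shipping are quoted individually, so they are left out. The function
name estimate_cost is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-page rates from the price list.
RATES = {
    "normal": Decimal("0.20"),
    "expedited": Decimal("0.25"),
    "fax": Decimal("0.30"),
}
CD_FEE = Decimal("10.00")           # per CD
CLIP_SURCHARGE = Decimal("0.20")    # paper clips and staples: 20%
LEGAL_SURCHARGE = Decimal("0.25")   # legal size paper: 25%


def estimate_cost(sides, service="normal", clips=False, legal=False, cds=0):
    """Return an estimated charge in dollars.

    sides counts each printed side, since two sided pages are
    charged at the one page price per side.
    Assumption: surcharges are summed and applied to the per-page
    subtotal only. Discounts and shipping are not modelled.
    """
    if sides < 0 or cds < 0:
        raise ValueError("sides and cds must not be negative")
    subtotal = RATES[service] * sides
    surcharge = Decimal("0")
    if clips:
        surcharge += CLIP_SURCHARGE
    if legal:
        surcharge += LEGAL_SURCHARGE
    total = subtotal * (1 + surcharge) + CD_FEE * cds
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # 150 two sided sheets = 300 sides, stapled, delivered on one CD.
    print(estimate_cost(300, clips=True, cds=1))
```

Treat the result as a starting point and call for a firm quote,
especially for jobs over 1000 pages or with special requirements.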
Contact BW Services
- voice: (360) 458-9851
- email: bwserve | at | ywave.com (all lower case)
Shipping Address
- Post Office Only: PO Box 2107
- By other carrier: 17242 Carwilliam Lane
- Both: Yelm, WA 98597