Favorite tools: ABBYY FineReader September 24, 2008

Posted by Jill (@bonnjill) in Business practices, Tools, Translation Sites.

It seems like 80% of my source texts are now in PDF format, so my all-time favorite tool is without a doubt ABBYY FineReader. To quote the ABBYY web site:

ABBYY FineReader® is an intelligent and easy-to-use OCR (Optical Character Recognition) and PDF conversion software that is the optimal alternative to manual data entry and typing. It is an ideal choice for professionals that want to save time and effort while producing excellent results. It provides powerful functionality for converting images received from a scanner, a digital camera or by fax, as well as PDF files, into editable and searchable formats. The program accurately retains formatting and layout of documents and supports a wide range of recognition languages and output file formats.

ABBYY FineReader is very intuitive to use and sometimes even replicates graphics and logos. Users can process documents in 184 languages, including Chinese, Japanese, Thai, Hebrew, Armenian, Cyrillic, Greek, and Latin. ABBYY FineReader also reads pre- and post-reform German orthography, Old German script, scripting languages, and simple chemical formulas. The text recognition software includes dictionaries with spell-checking capabilities for 38 languages, allowing verification of recognized text directly in the FineReader Editor. Also, FineReader 9.0 Professional Edition apparently now automatically recognizes the document’s language, as well as spreadsheets and tables, which saves you the step of manually selecting the appropriate recognition language.

PDF Transformer is a scaled-down version of ABBYY FineReader. It is also “a comprehensive PDF conversion and creation tool” that “accurately transforms PDF files into editable formats and creates searchable PDF documents from Microsoft Office applications.” PDF Transformer only costs $99.99 and does basically the same thing (PDF conversion and PDF creation).

One thing I really like about FineReader is that it does not create as many text boxes as, say, OmniPage. FineReader can convert PDFs as well as graphics (such as TIFs or the eFax attachments I receive) into Word files that can be processed with a translation memory tool. Its Check Spelling feature allows me to ensure that the words were recognized properly, and I can correct them before saving the file. I have also been known to do a “down and dirty” OCR without spellchecking just to get a quick and fairly accurate word count estimate with PractiCount (or AnyCount, Total Assistant or whichever counting tool you prefer). You can also play with the save options to find your ideal settings.
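For readers who would rather script the rough count than open a dedicated counting tool, here is a minimal sketch in Python. It assumes the quick OCR output has been saved as plain text, and the function name `estimate_word_count` is hypothetical. It is not how PractiCount, AnyCount or any other counting tool works; it simply counts whitespace-separated tokens that contain at least one letter or digit, so stray OCR marks such as a lone dash are ignored.

```python
def estimate_word_count(text: str) -> int:
    """Return a rough word count for OCR output saved as plain text.

    A "word" here is any whitespace-separated token that contains at
    least one letter or digit. This is an assumption made for the sake
    of a quick estimate, not a billing standard.
    """
    tokens = text.split()  # split on any run of whitespace
    return sum(1 for token in tokens if any(ch.isalnum() for ch in token))


if __name__ == "__main__":
    # Example text standing in for a page of quick OCR output.
    sample = "Sehr geehrte Damen und Herren, - 2008"
    print(estimate_word_count(sample))
```

Counting tools differ in how they treat numbers, hyphenated words and punctuation, so a figure like this is only good enough for a quick quote, not for a final invoice.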

I sometimes have to clean up the file by hand, because FineReader has problems with handwritten text and tables (and checkboxes, some bullet lists, etc.). I simply open a clean Word file and copy and paste the text into the new file using the Edit->Paste Special command, which removes all the formatting. I then format the text manually. This also ensures that the margins are consistent and not haphazard, which sometimes happens during the conversion. I also find that the program sometimes creates columns when a table would be much more convenient. Again, I simply create the table and then copy and paste the text into it.

Some translators I know feel that OCR and formatting should be the job of the agency or project manager and/or they should be paid extra for converting a file into Word. I disagree, but I am willing to accept alternate opinions in the comments below. 🙂 Some smaller agencies aren’t as savvy and don’t know how to use OCR programs. Converting the file also allows me to quickly translate it using my translation memory tool (although one client specifically does not want me to use Trados with its files for confidentiality reasons, which I respect). I also feel I have more control over the actual document I am going to work with if I do the conversion myself. I have seen too many files that were sloppily OCRed and not spellchecked, which makes the text confusing and could easily lead to a mistranslation. If I do get a file that has been run through an OCR program, I always insist on getting the original as well so that I can compare the two files.

One very important tip I have is that if you are going to order FineReader, try ordering it from a country other than the United States. I bought my copy off eBay.de. The prices displayed to people accessing the ABBYY website from the States differ drastically from the prices shown to people accessing the website from Europe. ABBYY FineReader 9.0 Professional is displayed to European visitors as costing EUR 139/GBP 89 for the download version, while a price of $399.99 is displayed to someone who lives in the States. I bought my version (FineReader 8.0) for €90 ($116) and received the original CD and a manual in the mail, but you can also order a digital version of the software.

Download the 30-day trials of ABBYY FineReader and PDF Transformer to see which one you prefer and shop around for the best price. You’ll find the tools quickly pay for themselves.

Comments»

1. Fabio - September 25, 2008

FineReader is really good. Version 9 is a must for every translator who works with PDF and JPG sources. So much better than OmniPage and everything else. But fortunately I don’t have to use OCR software as often as you do, since my sources are almost always DOC and XLS files.
Also, IMO, PDF/JPG conversion should be done by the agency, unless of course the agency is willing to pay extra for this non-translation work.
On the other hand, just yesterday I got a bunch of DOC files that were PDF-converted by the end client (not the agency), and they had some flaws that required e-mails back and forth and will certainly require more attention from me during translation. I had a feeling that if I had done the conversion myself I would be more comfortable now. 🙂

2. Maxim - September 25, 2008

Jill, I would not recommend the 9.0 version. In the ninth version the developers tweaked something to improve the recognition of low-quality documents. This worsened the recognition of high-quality texts. So, if you don’t work with old illegible papers, I recommend you don’t upgrade your 8.0, as it works perfectly.

Btw, here in Russia version 9 is offered for $150 (3750 rubles).

3. jillsommer - September 25, 2008

Thanks for the tip, Maxim, because I was seriously thinking about upgrading.

4. Mariana Hernández - October 18, 2013

Thanks for the article. I just downloaded the trial version and am converting the document. How does it work with CAT tools? Is it easy to use? I work with MemoQ.

Jill (@bonnjill) - October 20, 2013

I usually take the finished product and then paste it into a fresh Word document without the formatting. Once I have formatted the file in Word I can then import it into MemoQ and translate it. It takes a little bit of effort to format it but I find there is less chance of weird margins and other formatting glitches if I do it myself.

