Very important discovery about Adobe Acrobat Standard

2007-02-08

I just made two very important discoveries. As much as I loathed buying Acrobat Standard, and as poorly as it runs on my machine with its case-sensitive filesystem, it does have a redeeming feature.

I knew it had a built-in OCR engine, but I hadn’t tried it. I decided to try it on an academic paper that I had in my archive. However, when I loaded the paper (which looked scanned), I was already able to select the text, though I didn’t know why. I could do the same in Preview, so it couldn’t have been a feature of Acrobat.

I took another paper that was clearly scanned and ran OCR on it. Unlike the first paper, its text wasn’t selectable to start with. However, after OCR… it was.

So, the two important discoveries are that Acrobat will overlay your scanned documents with an invisible layer of selectable text, and that Circulation Research appears to have already done this for the downloadable PDFs of its older articles.

This explains why PDFs that I thought were scanned have been showing up in Spotlight searches that pick up their contents.
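For anyone who wants to check this from the command line, here is a minimal sketch that asks Spotlight which PDFs in a folder contain a given word in their indexed text. It assumes macOS and its `mdfind` tool; the function names `build_spotlight_query` and `find_pdfs` are made up for illustration, and a scanned PDF should only turn up if it carries a text layer that Spotlight has indexed.

```python
import subprocess
import sys


def build_spotlight_query(term: str) -> str:
    """Build a Spotlight query for PDFs whose indexed text contains `term`.

    Hypothetical helper for this sketch; it only assembles the query string.
    """
    if '"' in term or "\\" in term:
        # Escaping is deliberately left out of this sketch.
        raise ValueError("quotes and backslashes are not handled here")
    # The trailing 'c' and 'd' modifiers make the match case- and
    # diacritic-insensitive; the asterisks allow a match anywhere in the text.
    return (
        'kMDItemContentType == "com.adobe.pdf" && '
        f'kMDItemTextContent == "*{term}*"cd'
    )


def find_pdfs(term: str, folder: str) -> list[str]:
    """Run mdfind (macOS only) and return the paths of matching PDFs."""
    result = subprocess.run(
        ["mdfind", "-onlyin", folder, build_spotlight_query(term)],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


if __name__ == "__main__":
    # Usage: python find_pdfs.py <search term> <folder>
    if len(sys.argv) == 3:
        for path in find_pdfs(sys.argv[1], sys.argv[2]):
            print(path)
```

Running it against the paper archive before and after OCR would be one way to see whether a given scan has become searchable, keeping in mind that Spotlight may take a moment to reindex a changed file.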

ADDENDUM: Apparently Acrobat Pro can do this in batch mode. This has major implications for me. I might even consider buying it at some point, once they come out with a universal binary.

3 Comments »

  1. News flash: Acrobat 8 Professional *is* a Universal Binary application; there is no Acrobat 8 Standard version on the Mac.

    Comment posted 2007-02-08 @ 22:30

  2. I looked after writing the post and saw that Acrobat 8 had come out. I wasn’t aware — I’m still using Acrobat Standard 7 with Rosetta.

    Comment posted 2007-02-09 @ 05:06

  3. [...] When I read scientific papers, I mark them up a lot. I underline useful snippets of information, circle reference numbers to check later, write questions in the margins, and so on. Until recently, I was doing this on paper, and then transcribing my notes manually to a FreeMind outline. However, since discovering that I can do all of this and more in Adobe Acrobat, using the built-in OCR scanner to produce text even from scanned PDFs, I’ve gone digital. [...]

    Pingback posted 2007-05-29 @ 20:09
