How to remove OCR from a PDF?



























I have been searching Google for some time but cannot find an answer to my question.



I have unwanted layers of OCR in a document that I recently scanned with Adobe Acrobat. The OCR was not done properly, and when I try to redact some information, the OCR layer causes information I want to keep to be erased as well. I converted the files to TIFs, but noticed a (very) significant quality loss. I have heard that printing to another PDF either keeps the text or reduces the image quality.



I would appreciate any help solving this issue ASAP.



Thank you.










      pdf adobe-acrobat ocr tif






      edited Oct 12 '14 at 15:00 by Sanoo

      asked Oct 11 '14 at 6:32 by Sanoo






















          6 Answers
                  In Acrobat Pro DC, the appropriate command is "Remove Hidden Information," which is available through both the "Protect" and "Redact" tools.



                  Running the command only searches for hidden information; it does not change the document. You must then tell Acrobat which information to remove. In this case, select "Hidden Text" in the Results pane, click the Remove button, and save the changed document.






                  edited Sep 22 '17 at 1:06 by Warren Young

                  answered Apr 11 '17 at 4:11 by user1125483













                  • I have used "Remove Hidden Information", but for some reason it just removes parts of the image on certain pages. Thanks for your reply, however.

                    – Sanoo
                    Apr 11 '17 at 4:20











                  • This is not universally true. Somehow (probably macOS PDFKit bugs) my ABBYY FineReader-OCRed text got corrupted, and checking "Hidden text" under Redact → Remove Hidden did remove the text without any issues; I was then able to successfully use Enhance Scans → Recognize Text to perform OCR within Acrobat itself.

                    – Nicholas Riley
                    Jan 21 '18 at 20:16











                  • The problem for me is that after I remove the hidden text, I'm still not able to run OCR with "ClearScan" (i.e. "Editable Text and Images"). It's strange because the text layer appears to be gone, yet running OCR produces the error "Acrobat could not perform recognition because: page contains renderable text."

                    – user1125483
                    Sep 18 '18 at 10:38
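Some background on what "Hidden Text" usually means in a scanned PDF. OCR tools commonly lay the recognized text over the page image using text rendering mode 3 (the `3 Tr` operator). That mode paints neither fill nor stroke, so the text is searchable and selectable but invisible.

Below is a minimal Python sketch that removes such text objects from one page content stream. It is not how Acrobat implements Remove Hidden Information. It makes two assumptions:

- The content stream has already been extracted and decompressed by some other tool.
- Each OCR text object sets `3 Tr` inside its own `BT ... ET` block. This is common but not guaranteed, since the mode can also be inherited from the graphics state.

The helper name `strip_invisible_text` and the sample page are hypothetical.

```python
import re

# One text object: from a BT operator to the nearest following ET operator.
# Simplification: assumes no string literal inside a block contains a
# standalone "ET" token.
TEXT_OBJECT = re.compile(rb"\bBT\b.*?\bET\b", re.DOTALL)

# "3 Tr" sets text rendering mode 3 (invisible: neither fill nor stroke).
# The lookbehind avoids matching operands such as "13 Tr" or "0.3 Tr".
INVISIBLE_MODE = re.compile(rb"(?<![\d.])3\s+Tr\b")


def strip_invisible_text(content: bytes) -> bytes:
    """Return a copy of a decompressed page content stream with every
    BT...ET text object that sets rendering mode 3 removed.

    Everything outside the removed text objects, including the operators
    that draw the scanned image, is left unchanged.
    """

    def drop_if_invisible(match: re.Match) -> bytes:
        block = match.group(0)
        return b"" if INVISIBLE_MODE.search(block) else block

    return TEXT_OBJECT.sub(drop_if_invisible, content)


if __name__ == "__main__":
    # Hypothetical page: a scanned image, one invisible OCR line, one visible line.
    page = (
        b"q 612 0 0 792 0 0 cm /Im0 Do Q\n"
        b"BT 3 Tr /F1 12 Tf 72 720 Td (hidden OCR) Tj ET\n"
        b"BT 0 Tr /F1 12 Tf 72 700 Td (visible) Tj ET\n"
    )
    print(strip_invisible_text(page).decode("latin-1"))
```

Only the matched text objects are deleted, so this kind of removal does not by itself touch or re-encode the image. Writing the modified stream back into a valid PDF (updating its `/Length`, recompressing) is left to a proper PDF library.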



















                  After a lot of experimenting, I found that printing to Adobe PDF from Adobe Acrobat produces a copy without the OCR layer and without a noticeable loss of quality (some resolution is lost, but it is not noticeable at first glance).

                  However, many sites claim that this does not work. I also tried other printers, such as Foxit Reader and OneNote, but the quality was reduced. Converting to JPEG gave the same result.

                  Please keep in mind that your mileage may vary.

                  Note: I am leaving this thread marked as unanswered in the hope of finding a better answer than mine.






                      edited Oct 13 '14 at 7:53

                      answered Oct 13 '14 at 6:06 by Sanoo























                          In Acrobat Pro, use "Remove Hidden Information" (under Protection). Select all the items it finds, run it, and the OCR is gone.






                              answered Oct 20 '16 at 15:55 by jazzzz
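You can check whether the OCR really is gone after running the command by looking at which text objects remain in a page's content stream. The following minimal Python sketch makes two assumptions:

- The content stream has already been extracted and decompressed.
- A simplified regex scan is good enough in place of a real content-stream parser.

It counts `BT ... ET` blocks by the last text rendering mode each one sets. In the PDF specification, modes 0, 1, 2, 4, 5 and 6 paint glyphs, mode 3 paints nothing, and mode 7 only adds the glyph outlines to the clipping path. The helper names `text_render_modes` and `has_possibly_visible_text` are hypothetical.

```python
import re
from collections import Counter

# One BT...ET text object (simplified: assumes no string literal inside a
# block contains a standalone "ET" token).
TEXT_OBJECT = re.compile(rb"\bBT\b.*?\bET\b", re.DOTALL)
# A "Tr" operator with its single-digit operand (valid modes are 0-7).
RENDER_MODE = re.compile(rb"(?<![\d.])([0-7])\s+Tr\b")

# Text rendering modes that actually paint glyphs on the page.
PAINTS_GLYPHS = {0, 1, 2, 4, 5, 6}


def text_render_modes(content: bytes) -> Counter:
    """Count BT...ET blocks in a decompressed content stream by the last
    rendering mode each block sets (None = the block sets no mode itself)."""
    counts: Counter = Counter()
    for block in TEXT_OBJECT.findall(content):
        modes = RENDER_MODE.findall(block)
        counts[int(modes[-1]) if modes else None] += 1
    return counts


def has_possibly_visible_text(content: bytes) -> bool:
    """True if any text object paints glyphs, or inherits its mode from the
    graphics state (treated conservatively as possibly visible)."""
    return any(mode is None or mode in PAINTS_GLYPHS
               for mode in text_render_modes(content))


if __name__ == "__main__":
    # Hypothetical content streams before and after removing the OCR layer.
    before = b"q 612 0 0 792 0 0 cm /Im0 Do Q\nBT 3 Tr (OCR text) Tj ET\n"
    after = b"q 612 0 0 792 0 0 cm /Im0 Do Q\n"
    print(text_render_modes(before), text_render_modes(after))
```

An empty count means the scan found no text objects at all. A block that sets no mode of its own is flagged because its mode comes from the graphics state, whose initial value is 0 (fill).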























                                  In Acrobat X, under Protection, there is a Sanitize Document button that removes everything except what can be seen (including the OCR'd text layer), converting the document to a flattened bitmap.






                                      edited Jan 30 '18 at 16:51 by darthbith

                                      answered Dec 14 '17 at 8:49 by Dave























                                          (One year later...)

                                          If, as you say, the documents are scanned (and not, for example, printed to PDF from Word), you can easily remove the OCR in Acrobat:

                                          Select Document > Examine Document, and you can then remove the hidden text (OCR).






                                          answered Dec 10 '15 at 10:50 by Fran













                                          • Thanks for your reply. I'll test it out as soon as I can and let you know. Thanks for the answer!

                                            – Sanoo
                                            Feb 19 '16 at 14:31











                                          • I thought I already commented on this, but the problem is that I have Acrobat DC Pro, and those menus have been removed. Thanks for your answer anyway.

                                            – Sanoo
                                            Jul 17 '16 at 7:43



















                                          I built a free tool for this, PDF Redactor. If you upload your PDF and just click Redact, it flattens the PDF and removes the OCR. You can also draw redaction marks on the document if you want.












































                                              edited Jan 31 at 8:19

























                                              answered Jan 31 at 7:31









                                              levinology

                                              1113

































