LEADTOOLS Getting Started Tutorial: Using LEADTOOLS .NET OCR
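The LEADTOOLS OCR feature provides methods for incorporating optical character recognition (OCR) technology into an application. OCR converts bitmap images into text.
Once the LEADTOOLS .NET OCR toolkit is installed on the system, you can use LEADTOOLS OCR in your programs. Note that OCR support must be unlocked before you use any OCR properties, methods, or events.
To start using LEADTOOLS for .NET OCR, add references to the Leadtools.Forms.Ocr.dll and Leadtools.Forms.DocumentWriters.dll assemblies. These assemblies contain the various interfaces, classes, structures, and delegates.
Because the LEADTOOLS OCR toolkit supports multiple engines, the code that actually talks to an engine lives in a separate assembly that is loaded dynamically when you create an IOcrEngine interface instance. You must therefore make sure that the engine assembly you plan to use is located next to the Leadtools.Forms.Ocr.dll assembly. If you want dependencies to be detected automatically, you can add the engine assembly as a reference to your project.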
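Because the engine assembly is loaded dynamically, a missing file only shows up at run time. The following is a minimal sketch of a deployment check, using only System.IO and no LEADTOOLS calls. The class name OcrDeploymentCheck, the method name EngineAssemblyIsAdjacent, and both parameter names are hypothetical, and the engine assembly's file name is left to the caller because it depends on the engine and toolkit version.

```csharp
using System.IO;

static class OcrDeploymentCheck
{
    // Returns true when a file named engineAssemblyFileName exists in the same
    // folder as the assembly at ocrAssemblyPath (for example the path of
    // Leadtools.Forms.Ocr.dll). Both parameter names are hypothetical.
    public static bool EngineAssemblyIsAdjacent(string ocrAssemblyPath, string engineAssemblyFileName)
    {
        // Resolve relative paths first so the folder is unambiguous.
        string folder = Path.GetDirectoryName(Path.GetFullPath(ocrAssemblyPath));
        if (string.IsNullOrEmpty(folder))
            return false;

        return File.Exists(Path.Combine(folder, engineAssemblyFileName));
    }
}
```

One way to use it is to pass the location of the loaded OCR assembly, for instance typeof(IOcrEngine).Assembly.Location, together with the engine assembly's file name, and to report a clear error before creating the engine if the check returns false.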
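LEADTOOLS provides methods to:
- Recognize text and export it to a variety of text, word-processing, database, or spreadsheet document formats;
- Perform OCR processing in a single-threaded or multi-threaded environment;
- Select the language of the documents to be recognized, such as English, Danish, Dutch, Finnish, French, German, Italian, Norwegian, Portuguese, Russian, Spanish, or Swedish;
- Segment complex pages, automatically or manually, into text zones, image zones, table zones, lines, headers, and footers;
- Set accuracy thresholds before recognition to control recognition accuracy;
- Automatically detect fax, dot-matrix, and other degraded documents;
- Save to a variety of document formats, such as Adobe PDF, PDF/A, MS Word, MS Excel, and Unicode text;
- Process both text and graphics.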
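LEADTOOLS uses an OCR handle to interact with the OCR engine and with the OCR document that contains the list of pages. The OCR handle is a communication session between LEADTOOLS OCR and an OCR engine installed on the system. It is an internal structure that holds all the information needed for recognition, getting information, setting information, and text verification.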
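To recognize one or more pages:
1. Select the engine type you want and create an IOcrEngine interface instance;
2. Start the OCR engine with the IOcrEngine.Startup method;
3. Create an OCR document with one or more pages;
4. Create zones on the pages, manually or automatically;
5. (Optional) Set the active languages for the OCR engine;
6. (Optional) Set the spell-checking language;
7. (Optional) Set any special recognition options;
8. Recognize;
9. Save the recognition results;
10. Shut down the OCR engine.
Steps 4, 5, 6, and 7 can be performed in any order, as long as they come after the OCR engine has been started and before the pages are recognized.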
The following example shows how to perform the steps above:
Visual Basic
' Assuming you added "Imports Leadtools.Forms.Ocr" and "Imports Leadtools.Forms.DocumentWriters" at the beginning of this class

' *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
' We will use the LEADTOOLS OCR Advantage engine and use it in the same process
Dim ocrEngine As IOcrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, False)

' *** Step 2: Startup the engine.
' Use the default parameters
ocrEngine.Startup(Nothing, Nothing, Nothing, "C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime")

' *** Step 3: Create an OCR document with one or more pages.
Dim ocrDocument As IOcrDocument = ocrEngine.DocumentManager.CreateDocument()
' Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages("C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, Nothing)

' *** Step 4: Establish zones on the page(s), either manually or automatically
' Automatic zoning
ocrDocument.Pages.AutoZone(Nothing)

' *** Step 5: (Optional) Set the active languages to be used by the OCR engine
' Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(New String() {"en", "de"})

' *** Step 6: (Optional) Set the spell checking language
' Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native
ocrEngine.SpellCheckManager.SpellLanguage = "en"

' *** Step 7: (Optional) Set any special recognition module options
' Change the fill method for the first zone in the first page to be Omr
Dim ocrZone As OcrZone = ocrDocument.Pages(0).Zones(0)
ocrZone.FillMethod = OcrZoneFillMethod.Omr
ocrDocument.Pages(0).Zones(0) = ocrZone

' *** Step 8: Recognize
ocrDocument.Pages.Recognize(Nothing)

' *** Step 9: Save recognition results
' Save the results to a PDF file
ocrDocument.Save("C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, Nothing)
ocrDocument.Dispose()

' *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown()
ocrEngine.Dispose()
C#
// Assuming you added "using Leadtools.Codecs;", "using Leadtools.Forms.Ocr;" and "using Leadtools.Forms.DocumentWriters;" at the beginning of this class

// *** Step 1: Select the engine type and create an instance of the IOcrEngine interface.
// We will use the LEADTOOLS OCR Advantage engine and use it in the same process
IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);

// *** Step 2: Startup the engine.
// Use the default parameters
ocrEngine.Startup(null, null, null, @"C:\LEADTOOLS 18\Bin\Common\OcrAdvantageRuntime");

// *** Step 3: Create an OCR document with one or more pages.
IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
// Add all the pages of a multi-page TIF image to the document
ocrDocument.Pages.AddPages(@"C:\Users\Public\Documents\LEADTOOLS Images\Ocr.tif", 1, -1, null);

// *** Step 4: Establish zones on the page(s), either manually or automatically
// Automatic zoning
ocrDocument.Pages.AutoZone(null);

// *** Step 5: (Optional) Set the active languages to be used by the OCR engine
// Enable English and German languages
ocrEngine.LanguageManager.EnableLanguages(new string[] { "en", "de" });

// *** Step 6: (Optional) Set the spell checking language
// Enable the spell checking system and set English as the spell language
ocrEngine.SpellCheckManager.SpellCheckEngine = OcrSpellCheckEngine.Native;
ocrEngine.SpellCheckManager.SpellLanguage = "en";

// *** Step 7: (Optional) Set any special recognition module options
// Change the fill method for the first zone in the first page to be default
OcrZone ocrZone = ocrDocument.Pages[0].Zones[0];
ocrZone.FillMethod = OcrZoneFillMethod.Default;
ocrDocument.Pages[0].Zones[0] = ocrZone;

// *** Step 8: Recognize
ocrDocument.Pages.Recognize(null);

// *** Step 9: Save recognition results
// Save the results to a PDF file
ocrDocument.Save(@"C:\Users\Public\Documents\LEADTOOLS Images\Document.pdf", DocumentFormat.Pdf, null);
ocrDocument.Dispose();

// *** Step 10: Shut down the OCR engine when finished
ocrEngine.Shutdown();
ocrEngine.Dispose();
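The listings above release the document and shut down the engine only when every step succeeds. As a hedged sketch rather than the toolkit's prescribed pattern, the same calls can be wrapped in try/finally blocks so that cleanup also runs when a step throws. It uses only the LEADTOOLS calls already shown in this tutorial and needs the toolkit to compile and run. The class name OcrCleanupSketch, the method name RecognizeToPdf, and its three parameters are hypothetical, and the optional steps 5 to 7 are left out for brevity.

```csharp
using Leadtools.Forms.DocumentWriters;
using Leadtools.Forms.Ocr;

static class OcrCleanupSketch
{
    // inputTif, outputPdf and runtimeDir are hypothetical parameter names.
    public static void RecognizeToPdf(string inputTif, string outputPdf, string runtimeDir)
    {
        // Step 1: create the engine instance
        IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.Advantage, false);
        try
        {
            // Step 2: start the engine
            ocrEngine.Startup(null, null, null, runtimeDir);
            try
            {
                // Step 3: create the document and add all pages
                IOcrDocument ocrDocument = ocrEngine.DocumentManager.CreateDocument();
                try
                {
                    ocrDocument.Pages.AddPages(inputTif, 1, -1, null);
                    // Step 4: automatic zoning
                    ocrDocument.Pages.AutoZone(null);
                    // Step 8: recognize
                    ocrDocument.Pages.Recognize(null);
                    // Step 9: save the results as PDF
                    ocrDocument.Save(outputPdf, DocumentFormat.Pdf, null);
                }
                finally
                {
                    // Release the document even if a step above failed
                    ocrDocument.Dispose();
                }
            }
            finally
            {
                // Step 10: reached only if Startup succeeded
                ocrEngine.Shutdown();
            }
        }
        finally
        {
            ocrEngine.Dispose();
        }
    }
}
```

Nesting the blocks this way means Shutdown is attempted only after a successful Startup, while Dispose runs for the engine in every case.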