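Integrating Aspose.Words with Azure Data Lake
Aspose.Words is an advanced Word document processing API for performing a wide range of document management and manipulation tasks. The API supports generating, modifying, converting, rendering, and printing documents in cross-platform applications without using Microsoft Word directly.
Aspose APIs support popular file formats and can export or convert documents to fixed-layout file formats and the most common image and multimedia formats.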
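Aspose.Words can be integrated with the Microsoft Azure Data Lake services: Azure Data Lake Analytics (ADLA) and Azure Data Lake Storage (ADLS). This lets you combine the big data analytics capabilities of the Azure Data Lake cloud storage solution with Aspose.Words, so that applications can programmatically generate, modify, render, read, or convert documents between formats.
This article explains how to configure a C# project in Visual Studio for use with ADLA and provides an example that demonstrates integrating Aspose.Words with Azure Data Lake.
Prerequisites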
- An active Microsoft Azure subscription. If you don't have one, create a free account before you begin.
- Visual Studio 2019 or Visual Studio 2017 with the Azure development workload installed.
- Azure Data Lake Tools for Visual Studio installed.
- Visual Studio configured with your ADLA account.
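Creating a document with data from Azure Data Lake
This topic demonstrates how to use Aspose.Words to generate a document containing a table from a database on Azure Data Lake. This requires creating a simple database and implementing the IOutputter interface to create a user-defined outputter that writes data from ADLS in a format supported by Aspose.Words.
Creating a database in Azure Data Lake Storage (ADLS)
For demonstration purposes, create a simple database containing sample data that will populate the resulting document.
The sample Customers table resides in the sample_db database on ADLS. To create this sample database, sign in to your ADLA account, click "New job", and submit the following script: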
U-SQL
CREATE DATABASE IF NOT EXISTS sample_db;
USE DATABASE sample_db;
CREATE SCHEMA IF NOT EXISTS dbo;

DROP TABLE IF EXISTS dbo.Customers;
CREATE TABLE dbo.Customers
(
    Customer_id int,
    Customer_name string,
    Customer_domain string,
    Customer_city string,
    INDEX idx_customer_id CLUSTERED (Customer_id ASC)
)
DISTRIBUTED BY RANGE (Customer_id);

INSERT INTO sample_db.dbo.Customers
    (Customer_id, Customer_name, Customer_domain, Customer_city)
VALUES
    (1, "John Smith", "History", "Boston"),
    (2, "Lisa Jaine", "Chemistry", "LA"),
    (3, "James Johnson", "Heraldry", "Milwaukee"),
    (4, "Sara Soyer", "IT", "Miami");
Implementing the IOutputter interface
In Visual Studio, create a new project by adding a C# Class Library (For U-SQL Application), and add a NuGet reference to Aspose.Words.
The following code example shows how to implement the IOutputter interface:
using Microsoft.Analytics.Interfaces;
using System;
using System.IO;
using System.Linq;
using Aspose.Words;

namespace AsposeWordsOutputterUSql
{
    [SqlUserDefinedOutputter(AtomicFileProcessing = true)]
    public class AsposeWordsOutputer : IOutputter
    {
        public AsposeWordsOutputer(SaveFormat saveFormat)
        {
            // Pass the specified save format.
            mSaveFormat = saveFormat;

            // Create an instance of DocumentBuilder, which will be used to build the document.
            mDocumentBuilder = new DocumentBuilder();
        }

        /// <summary>
        /// The Close method is used to write the document to the file. It is executed only once, after all rows.
        /// </summary>
        public override void Close()
        {
            // End the table.
            mDocumentBuilder.EndTable();

            // The stream passed from IUnstructuredWriter.BaseStream does not support seeking.
            // This causes an exception when saving to PDF.
            // To avoid problems, save the output document into MemoryStream first
            // and then write its content to the IUnstructuredWriter.BaseStream.
            using (BinaryWriter writer = new BinaryWriter(mOutputStream))
            {
                // Save the document and close the stream.
                using (MemoryStream ms = new MemoryStream())
                {
                    mDocumentBuilder.Document.Save(ms, mSaveFormat);
                    writer.Write(ms.ToArray());
                }
            }
        }

        public override void Output(IRow row, IUnstructuredWriter output)
        {
            // Table with header row output--runs only once.
            if (mIsHeaderRow)
                ProcessHeaderRow(row.Schema);

            ProcessRow(row);

            // Reference to the instance of the IO.Stream object for saving document.
            mOutputStream = output.BaseStream;
        }

        /// <summary>
        /// Create HeaderRow of the table.
        /// </summary>
        private void ProcessHeaderRow(ISchema schema)
        {
            // Start the table before building it.
            mDocumentBuilder.StartTable();

            // Build the table.
            for (int i = 0; i < schema.Count(); i++)
            {
                IColumn col = schema[i];
                mDocumentBuilder.InsertCell();

                // Write a header with bold font.
                mDocumentBuilder.Font.Bold = true;
                mDocumentBuilder.Write(col.Name);
            }

            mDocumentBuilder.EndRow();

            // Write data with normal font.
            mDocumentBuilder.Font.Bold = false;

            // Table with header row output--runs only once.
            mIsHeaderRow = false;
        }

        /// <summary>
        /// Create Row of the table.
        /// </summary>
        private void ProcessRow(IRow row)
        {
            // Metadata schema initialization to enumerate column names.
            ISchema schema = row.Schema;

            // Data row output.
            for (int i = 0; i < schema.Count(); i++)
            {
                IColumn col = schema[i];
                string val = "";
                Type type = col.Type;

                // Get the cell value in the current row by column name and cast it to the column type.
                if (type == typeof(string))
                    val = row.Get<string>(col.Name);
                else if (type == typeof(int))
                    val = row.Get<int>(col.Name).ToString();
                else
                    val = "Column type is not supported.";

                mDocumentBuilder.InsertCell();
                mDocumentBuilder.Write(val);
            }

            mDocumentBuilder.EndRow();
        }

        private readonly DocumentBuilder mDocumentBuilder;
        private readonly SaveFormat mSaveFormat;
        private Stream mOutputStream;
        private bool mIsHeaderRow = true;

        static AsposeWordsOutputer()
        {
            // Note: The Aspose.Words license needs to be applied only once before any Document instance is created.
            // To execute the code only once, a static constructor is used. The below code will find and activate the license.
            // Uncomment the following code and add your license file as an embedded resource in the project.

            // Aspose.Words.License lic = new Aspose.Words.License();
            // lic.SetLicense("Aspose.Words.lic");
        }
    }
}
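The ProcessRow method in the sample handles only string and int columns and writes a placeholder message for everything else. One possible way to cover more column types is a small conversion helper such as the sketch below. CellFormatter and ToCellText are hypothetical names, not part of Aspose.Words or the U-SQL SDK, and the helper works on a value that has already been read from the row; how that value is obtained is left to the caller.

```csharp
using System;
using System.Globalization;

// Hypothetical helper (not part of Aspose.Words or the U-SQL SDK).
public static class CellFormatter
{
    // Converts an already-retrieved cell value to culture-invariant text.
    public static string ToCellText(object value)
    {
        // Treat missing values as empty cells.
        if (value == null)
            return string.Empty;

        switch (value)
        {
            case string s:
                return s;
            case DateTime d:
                // ISO-style date; the format is an illustrative choice.
                return d.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture);
            case IFormattable f:
                // Covers int, long, double, decimal and similar numeric types.
                return f.ToString(null, CultureInfo.InvariantCulture);
            default:
                return value.ToString();
        }
    }
}
```

Using the invariant culture keeps numbers and dates identical regardless of the locale of the machine that runs the job.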
Note the licensing details described in the static constructor of the AsposeWordsOutputer sample above.
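As a minimal sketch of that licensing step, the license file can be loaded from an embedded resource and applied through a stream. The resource name "AsposeWordsOutputterUSql.Aspose.Words.lic" and the LicenseLoader class are assumptions made for illustration; the actual resource name depends on your project's default namespace and the folder that holds the file.

```csharp
using System.IO;
using System.Reflection;
using Aspose.Words;

// Hypothetical helper; call it once, for example from the outputter's static constructor.
public static class LicenseLoader
{
    // Assumed resource name: <default namespace>.<file name>. Adjust to your project.
    private const string ResourceName = "AsposeWordsOutputterUSql.Aspose.Words.lic";

    // Returns true if the embedded license was found and applied.
    public static bool TryApply()
    {
        Assembly assembly = Assembly.GetExecutingAssembly();
        using (Stream stream = assembly.GetManifestResourceStream(ResourceName))
        {
            // GetManifestResourceStream returns null when the resource is missing.
            if (stream == null)
                return false;

            License license = new License();
            license.SetLicense(stream);
            return true;
        }
    }
}
```

Returning false instead of throwing when the resource is missing leaves the decision to the caller: fail the job, or continue in evaluation mode.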
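Registering the assembly in Azure Data Lake Analytics (ADLA)
To integrate the project's C# class library with your ADLA account, register the assembly with the account:
- In Visual Studio, right-click the project name and select "Register Assembly".
- Select the ADLA account name and the database name.
- Expand the "Managed Dependencies" panel and check Aspose.Words, as shown in the screenshot below.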
Running the U-SQL job in the Azure portal
To start the application, run the following U-SQL code in ADLA. It contains the necessary references and calls the user-defined outputter:
U-SQL
USE DATABASE [sample_db];
REFERENCE ASSEMBLY AsposeWordsOutputterUSQL;
REFERENCE ASSEMBLY [Aspose.Words];

@test =
    SELECT *
    FROM dbo.Customers;

OUTPUT @test
TO "/output/Customers_AW.docx"
USING new AsposeWordsOutputterUSql.AsposeWordsOutputer(Aspose.Words.SaveFormat.Docx);
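You can output the document in whichever format suits your project, such as Docx, Doc, Pdf, Rtf, Text, or Jpeg. For details, see the SaveFormat enumeration.
Locate the file in the output folder on ADLS and download it.
The following screenshot shows what the output document looks like after the application runs.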
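Because the save format and the output file extension have to agree, one option is to derive the format from the output path. The sketch below is an illustration only: SaveFormatResolver and FromPath are hypothetical names, and the mapping covers just the formats listed above.

```csharp
using System;
using System.IO;
using Aspose.Words;

// Hypothetical helper (not part of Aspose.Words).
public static class SaveFormatResolver
{
    // Picks a SaveFormat from the extension of the output path.
    public static SaveFormat FromPath(string path)
    {
        if (path == null)
            throw new ArgumentNullException(nameof(path));

        string ext = Path.GetExtension(path).ToLowerInvariant();
        switch (ext)
        {
            case ".docx": return SaveFormat.Docx;
            case ".doc": return SaveFormat.Doc;
            case ".pdf": return SaveFormat.Pdf;
            case ".rtf": return SaveFormat.Rtf;
            case ".txt": return SaveFormat.Text;
            case ".jpg":
            case ".jpeg": return SaveFormat.Jpeg;
            default:
                // Fail early rather than write a file whose content does not match its name.
                throw new ArgumentException("Unsupported output extension: " + ext, nameof(path));
        }
    }
}
```

For example, SaveFormatResolver.FromPath("/output/Customers_AW.pdf") yields SaveFormat.Pdf, which could then be passed to the AsposeWordsOutputer constructor.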