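SQL Prompt tutorial: Why is SELECT * (BP005) bad in production code? (Part 2)
SQL Prompt offers code suggestions as you type, drawing on the database's object names, on syntax, and on code snippets. Its automatic script formatting keeps code simple and readable, which is especially useful when a developer is unfamiliar with a script. SQL Prompt works as soon as it is installed and can greatly speed up coding. You can also customize it so that it behaves the way you expect.
If SQL Prompt warns you about an asterisk, or "star" (*), in a SELECT statement, consider replacing it with an explicit column list. This prevents unnecessary network load and query performance problems, and it avoids trouble when inserting into a table if the column order changes. This article covers the second half of the tutorial: the remainder of "Why is SELECT * bad in production code?" (continuing from Part 1), followed by "SELECT * in applications".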
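Misinterpretation
With SELECT *, you cannot guarantee that your code will always return the same columns in the same order, which means it is not resilient to database refactoring. An upstream change to the table source can alter the order or the number of columns. If you use INSERT INTO…SELECT * to transfer data, the best outcome you can hope for is an error, because the consequences of assigning data to the wrong target column can be horrifying.
I will demonstrate how dangerous it is to use this in production code and then refactor the database. Here, we make a mistake while copying sensitive information. It is very easy to do, and it can lead to financial irregularities without triggering any error. If you are of a nervous disposition, look away now.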
/* we create a table just for our testing */
CREATE TABLE dbo.ExchangeRates --let's pretend we have this data
  (
  CurrencyRateDate DATETIME NOT NULL,
  AverageRate MONEY NOT NULL,
  EndOfDayRate MONEY NOT NULL,
  FromCurrency NVARCHAR(50) NOT NULL,
  FromRegion NVARCHAR(50) NOT NULL,
  ToCurrency NVARCHAR(50) NOT NULL,
  ToRegion NVARCHAR(50) NOT NULL
  );
/* we now steal data for it from AdventureWorks next-door */
INSERT INTO dbo.ExchangeRates
  SELECT CurrencyRate.CurrencyRateDate, CurrencyRate.AverageRate,
    CurrencyRate.EndOfDayRate, Currency.Name AS FromCurrency,
    CountryRegion.Name AS FromRegion, CurrencyTo.Name AS ToCurrency,
    CountryRegionTo.Name AS ToRegion
  FROM Adventureworks2016.Sales.CurrencyRate
    INNER JOIN Adventureworks2016.Sales.Currency
      ON CurrencyRate.FromCurrencyCode = Currency.CurrencyCode
    INNER JOIN Adventureworks2016.Sales.CountryRegionCurrency
      ON Currency.CurrencyCode = CountryRegionCurrency.CurrencyCode
    INNER JOIN Adventureworks2016.Person.CountryRegion
      ON CountryRegionCurrency.CountryRegionCode = CountryRegion.CountryRegionCode
    INNER JOIN Adventureworks2016.Sales.Currency AS CurrencyTo
      ON CurrencyRate.ToCurrencyCode = CurrencyTo.CurrencyCode
    INNER JOIN Adventureworks2016.Sales.CountryRegionCurrency AS CountryRegionCurrencyTo
      ON CurrencyTo.CurrencyCode = CountryRegionCurrencyTo.CurrencyCode
    INNER JOIN Adventureworks2016.Person.CountryRegion AS CountryRegionTo
      ON CountryRegionCurrencyTo.CountryRegionCode = CountryRegionTo.CountryRegionCode;
GO
/* so we start our test by creating a view to show exchange rates from Ecuador */
CREATE VIEW dbo.EquadorExhangeRates
AS
SELECT ExchangeRates.CurrencyRateDate, ExchangeRates.AverageRate,
  ExchangeRates.EndOfDayRate, ExchangeRates.FromCurrency,
  ExchangeRates.FromRegion, ExchangeRates.ToCurrency, ExchangeRates.ToRegion
  FROM dbo.ExchangeRates
  WHERE ExchangeRates.FromRegion = 'Ecuador';
GO
/* now we just fill a table variable with the rows from the view
and display the first ten of them */
DECLARE @MyUsefulExchangeRates TABLE
  (
  CurrencyRateDate DATETIME NOT NULL,
  AverageRate MONEY NOT NULL,
  EndOfDayRate MONEY NOT NULL,
  FromCurrency NVARCHAR(50) NOT NULL,
  FromRegion NVARCHAR(50) NOT NULL,
  ToCurrency NVARCHAR(50) NOT NULL,
  ToRegion NVARCHAR(50) NOT NULL
  );
INSERT INTO @MyUsefulExchangeRates
  (CurrencyRateDate, AverageRate, EndOfDayRate, FromCurrency, FromRegion,
   ToCurrency, ToRegion)
  SELECT * --this isn't good at all
    FROM dbo.EquadorExhangeRates;
--display the first ten rows from the table to see what we have
SELECT TOP 10 UER.CurrencyRateDate, UER.AverageRate, UER.EndOfDayRate,
  UER.ToCurrency, UER.ToRegion, UER.FromCurrency, UER.FromRegion
  FROM @MyUsefulExchangeRates AS UER
  ORDER BY UER.CurrencyRateDate DESC;
GO
/* end of first part. Now someone decides to alter the view */
ALTER VIEW dbo.EquadorExhangeRates
AS
SELECT ExchangeRates.CurrencyRateDate, ExchangeRates.AverageRate,
  ExchangeRates.EndOfDayRate, ExchangeRates.ToCurrency, ExchangeRates.ToRegion,
  ExchangeRates.FromCurrency, ExchangeRates.FromRegion
  FROM dbo.ExchangeRates
  WHERE ExchangeRates.FromRegion = 'Ecuador';
GO
/* we repeat the routine to extract the first ten rows exactly as before */
DECLARE @MyUsefulExchangeRates TABLE
  (
  CurrencyRateDate DATETIME NOT NULL,
  AverageRate MONEY NOT NULL,
  EndOfDayRate MONEY NOT NULL,
  FromCurrency NVARCHAR(50) NOT NULL,
  FromRegion NVARCHAR(50) NOT NULL,
  ToCurrency NVARCHAR(50) NOT NULL,
  ToRegion NVARCHAR(50) NOT NULL
  );
INSERT INTO @MyUsefulExchangeRates
  (CurrencyRateDate, AverageRate, EndOfDayRate, FromCurrency, FromRegion,
   ToCurrency, ToRegion)
  SELECT * --bad, bad, bad
    FROM dbo.EquadorExhangeRates;
--check that the data is the same. It isn't, is it? No sir!
SELECT TOP 10 UER.CurrencyRateDate, UER.AverageRate, UER.EndOfDayRate,
  UER.ToCurrency, UER.ToRegion, UER.FromCurrency, UER.FromRegion
  FROM @MyUsefulExchangeRates AS UER
  ORDER BY UER.CurrencyRateDate DESC;
GO
/* now just tidy up and tear down */
DROP VIEW dbo.EquadorExhangeRates;
DROP TABLE dbo.ExchangeRates;
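Here are the "before" and "after" results…
As you can see, by switching the "to" and "from" columns we have "unintentionally" corrupted the data. Writing out a column list does make your code more verbose. However, it executes even faster than specifying all the columns with just an asterisk and assuming that they arrive in a particular order.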
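A minimal sketch of the safer pattern, assuming that the dbo.EquadorExhangeRates view from the demonstration still exists (the script above drops it in its final step, so this would have to run before the teardown). The alias EER is introduced here purely for illustration. Because every source column is named, each value is matched to its target column by name rather than by the position it happens to occupy in the view, so reordering the view's columns should no longer swap the "to" and "from" data.

```sql
-- Sketch only: assumes dbo.EquadorExhangeRates (created above) has not yet been dropped
DECLARE @MyUsefulExchangeRates TABLE
  (
  CurrencyRateDate DATETIME NOT NULL,
  AverageRate MONEY NOT NULL,
  EndOfDayRate MONEY NOT NULL,
  FromCurrency NVARCHAR(50) NOT NULL,
  FromRegion NVARCHAR(50) NOT NULL,
  ToCurrency NVARCHAR(50) NOT NULL,
  ToRegion NVARCHAR(50) NOT NULL
  );
-- Name the columns on both sides, so the mapping does not depend on
-- the column order of the view (EER is an illustrative alias)
INSERT INTO @MyUsefulExchangeRates
  (CurrencyRateDate, AverageRate, EndOfDayRate, FromCurrency, FromRegion,
   ToCurrency, ToRegion)
  SELECT EER.CurrencyRateDate, EER.AverageRate, EER.EndOfDayRate,
    EER.FromCurrency, EER.FromRegion, EER.ToCurrency, EER.ToRegion
    FROM dbo.EquadorExhangeRates AS EER;
-- Same check as in the demonstration
SELECT TOP 10 UER.CurrencyRateDate, UER.AverageRate, UER.EndOfDayRate,
  UER.ToCurrency, UER.ToRegion, UER.FromCurrency, UER.FromRegion
  FROM @MyUsefulExchangeRates AS UER
  ORDER BY UER.CurrencyRateDate DESC;
```

With this version, if someone renames or removes a column in the view, the insert fails with an error instead of silently loading values into the wrong columns.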
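Binding problems
When we use SELECT * with a large number of joined tables, we can, and probably will, get duplicate column names. Here is a simple query from AdventureWorks: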
SELECT *
  FROM HumanResources.Employee AS e
    INNER JOIN Person.Person AS p
      ON p.BusinessEntityID = e.BusinessEntityID
    INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh
      ON e.BusinessEntityID = edh.BusinessEntityID
    INNER JOIN HumanResources.Department AS d
      ON edh.DepartmentID = d.DepartmentID
  WHERE (edh.EndDate IS NULL);
This code will list the duplicated column names:
DECLARE @SourceCode NVARCHAR(4000)='
SELECT *
  FROM HumanResources.Employee AS e
    INNER JOIN Person.Person AS p
      ON p.BusinessEntityID = e.BusinessEntityID
    INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh
      ON e.BusinessEntityID = edh.BusinessEntityID
    INNER JOIN HumanResources.Department AS d
      ON edh.DepartmentID = d.DepartmentID
  WHERE (edh.EndDate IS NULL);
--';
SELECT Count(*) AS Duplicates, name
  FROM sys.dm_exec_describe_first_result_set(@SourceCode, NULL, 1)
  GROUP BY name
  HAVING Count(*) > 1
  ORDER BY Count(*) DESC;
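This causes problems for any application that tries to make sense of such a result by selecting columns by name. If you try to create a temporary table from the result, using SELECT…INTO, it will fail.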
SELECT *
  INTO MyTempTable
  FROM HumanResources.Employee AS e
    INNER JOIN Person.Person AS p
      ON p.BusinessEntityID = e.BusinessEntityID
    INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh
      ON e.BusinessEntityID = edh.BusinessEntityID
    INNER JOIN HumanResources.Department AS d
      ON edh.DepartmentID = d.DepartmentID
  WHERE (edh.EndDate IS NULL);

Msg 2705, Level 16, State 3, Line 19
Column names in each table must be unique. Column name 'BusinessEntityID' in table 'MyTempTable' is specified more than once.
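Again, this means that your SELECT * code is fragile. If someone renames a column in one table, it may create a duplicate column in a SELECT * INTO somewhere else, and you are left scratching your head, wondering why a routine that worked perfectly has suddenly crashed.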
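A minimal sketch of a more robust version of the SELECT…INTO above, assuming the same AdventureWorks tables. The choice of columns, the alias names (DepartmentName, EmployeeModifiedDate, DepartmentModifiedDate) and the temporary table name #EmployeeDepartments are illustrative assumptions rather than part of the original example. Each column is listed once, and columns that share a name across tables are given distinct aliases, so the new table cannot end up with duplicate column names.

```sql
-- Sketch only: column choice, aliases and table name are illustrative assumptions
SELECT e.BusinessEntityID,
  p.FirstName,
  p.LastName,
  e.JobTitle,
  d.Name AS DepartmentName,
  e.ModifiedDate AS EmployeeModifiedDate,   -- alias disambiguates the shared name
  d.ModifiedDate AS DepartmentModifiedDate  -- alias disambiguates the shared name
  INTO #EmployeeDepartments -- hypothetical temporary table
  FROM HumanResources.Employee AS e
    INNER JOIN Person.Person AS p
      ON p.BusinessEntityID = e.BusinessEntityID
    INNER JOIN HumanResources.EmployeeDepartmentHistory AS edh
      ON e.BusinessEntityID = edh.BusinessEntityID
    INNER JOIN HumanResources.Department AS d
      ON edh.DepartmentID = d.DepartmentID
  WHERE (edh.EndDate IS NULL);
```

A column that is later added to, or renamed in, one of the joined tables does not change the shape of this result unless the statement itself references it.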
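There is one place where SELECT * has a special meaning and cannot be substituted. This is when you convert a result to JSON and need the joined tables to be embedded in the result as objects.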
SELECT *
  FROM HumanResources.Employee AS employee
    INNER JOIN Person.Person AS person
      ON person.BusinessEntityID = employee.BusinessEntityID
    INNER JOIN HumanResources.EmployeeDepartmentHistory AS history
      ON employee.BusinessEntityID = history.BusinessEntityID
    INNER JOIN HumanResources.Department AS d
      ON history.DepartmentID = d.DepartmentID
  WHERE (history.EndDate IS NULL)
  FOR JSON AUTO
This will give you… (I show only the first document in the array)
[{
  "BusinessEntityID": 1,
  "NationalIDNumber": "295847284",
  "LoginID": "adventure-works\\ken0",
  "JobTitle": "Chief Executive Officer",
  "BirthDate": "1969-01-29",
  "MaritalStatus": "S",
  "Gender": "M",
  "HireDate": "2009-01-14",
  "SalariedFlag": true,
  "VacationHours": 99,
  "SickLeaveHours": 69,
  "CurrentFlag": true,
  "rowguid": "F01251E5-96A3-448D-981E-0F99D789110D",
  "ModifiedDate": "2014-06-30T00:00:00",
  "person": [{
    "BusinessEntityID": 1,
    "PersonType": "EM",
    "NameStyle": false,
    "FirstName": "Ken",
    "MiddleName": "J",
    "LastName": "Sánchez",
    "EmailPromotion": 0,
    "Demographics": "0<\/TotalPurchaseYTD><\/IndividualSurvey>",
    "rowguid": "92C4279F-1207-48A3-8448-4636514EB7E2",
    "ModifiedDate": "2009-01-07T00:00:00",
    "history": [{
      "BusinessEntityID": 1,
      "DepartmentID": 16,
      "ShiftID": 1,
      "StartDate": "2009-01-14",
      "ModifiedDate": "2009-01-13T00:00:00",
      "d": [{
        "DepartmentID": 16,
        "Name": "Executive",
        "GroupName": "Executive General and Administration",
        "ModifiedDate": "2008-04-30T00:00:00"
      }]
    }]
  }]
}]
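There is no conflict here, because each ModifiedDate column is encapsulated in the object that represents its source table.
The corresponding XML looks like this: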
<employee BusinessEntityID="1" NationalIDNumber="295847284"
    LoginID="adventure-works\ken0" JobTitle="Chief Executive Officer"
    BirthDate="1969-01-29" MaritalStatus="S" Gender="M" HireDate="2009-01-14"
    SalariedFlag="1" VacationHours="99" SickLeaveHours="69" CurrentFlag="1"
    rowguid="F01251E5-96A3-448D-981E-0F99D789110D"
    ModifiedDate="2014-06-30T00:00:00">
  <person BusinessEntityID="1" PersonType="EM" NameStyle="0" FirstName="Ken"
      MiddleName="J" LastName="Sánchez" EmailPromotion="0"
      rowguid="92C4279F-1207-48A3-8448-4636514EB7E2"
      ModifiedDate="2009-01-07T00:00:00">
    <Demographics>
      <IndividualSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey">
        <TotalPurchaseYTD>0</TotalPurchaseYTD>
      </IndividualSurvey>
    </Demographics>
    <history BusinessEntityID="1" DepartmentID="16" ShiftID="1"
        StartDate="2009-01-14" ModifiedDate="2009-01-13T00:00:00">
      <d DepartmentID="16" Name="Executive"
          GroupName="Executive General and Administration"
          ModifiedDate="2008-04-30T00:00:00"/>
    </history>
  </person>
</employee>
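Maintainability
When you lay out your code, naming the columns not only avoids mistakes in assigning values to the right columns or variables, it also makes the code more readable. Wherever you can, if only for your own future sake or for the poor soul who will one day be responsible for maintaining your code, spell out the names of the columns involved. Sure, the code looks a little clunkier, but if a fairy appeared on your shoulder and said that your code would be clearer and more reliable if you typed it in twice, would you do it?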
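SELECT * in applications
Sometimes you will see long-running queries that request every column and that originate from an application, usually one that uses LINQ. Usually this is not deliberate: the developer has simply neglected to specify the columns, and an innocent-looking LINQ query is translated into SELECT * or into a column list that contains every column. If the WHERE clause is too general, or is left out entirely, the consequences are compounded, because the network is usually the slowest component and all the unnecessary data piles up on it.
For example, using AdventureWorks and LINQPad, you can do this in LINQ: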
Persons.OrderBy (p => p.BusinessEntityID).Take (100)
…and LINQ translates it into the query that is actually executed. You will see that it selects all the columns…
SELECT TOP (100) [t0].[BusinessEntityID], [t0].[PersonType], [t0].[NameStyle],
  [t0].[Title], [t0].[FirstName], [t0].[MiddleName], [t0].[LastName],
  [t0].[Suffix], [t0].[EmailPromotion], [t0].[AdditionalContactInfo],
  [t0].[Demographics], [t0].[rowguid] AS [Rowguid], [t0].[ModifiedDate]
FROM [Person].[Person] AS [t0]
ORDER BY [t0].[BusinessEntityID]
Likewise, this expression
from row in Persons select row
…will deliver every column of every row in the entire table.
SELECT [t0].[BusinessEntityID], [t0].[PersonType], [t0].[NameStyle],
  [t0].[Title], [t0].[FirstName], [t0].[MiddleName], [t0].[LastName],
  [t0].[Suffix], [t0].[EmailPromotion], [t0].[AdditionalContactInfo],
  [t0].[Demographics], [t0].[rowguid] AS [Rowguid], [t0].[ModifiedDate]
FROM [Person].[Person] AS [t0]
By contrast, this…
from row in Persons.Where(i => i.LastName == "Bradley") select row.FirstName+" "+row.LastName
…translates into the more sensible:
-- Region Parameters
DECLARE @p0 NVarChar(1000) = 'Bradley'
DECLARE @p1 NVarChar(1000) = ' '
-- EndRegion
SELECT ([t0].[FirstName] + @p1) + [t0].[LastName] AS [value]
FROM [Person].[Person] AS [t0]
WHERE [t0].[LastName] = @p0
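Conclusion
Requesting more data than you need is a general code smell. Letting the data source do the filtering for you is almost always the better and faster approach. The use of SELECT *, although perfectly legitimate in some circumstances, is often a sign of this more general problem. Developers who are skilled in C# or VB but not in SQL are tempted to download entire rows, or even entire tables, and do the filtering on more familiar territory. The extra network load and latency should in themselves be enough to deter the practice, but the result is often misdiagnosed as "a slow database". A long column list, which often names every column, is almost as harmful as SELECT *, although SELECT * carries extra risks during any refactoring.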
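That is the end of this tutorial. If you are interested, keep following us, as we will continue to publish new articles. You can also download the free version of SQL Prompt to evaluate it.
Related content:
SQL Prompt tutorial: Why is SELECT * (BP005) bad in production code? (Part 1)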