developer tip

개수 (*) 대 개수 (1)-SQL Server

optionbox 2020. 9. 30. 10:25
반응형

개수 (*) 대 개수 (1)-SQL Server


너희들 중 하나를 사용하는 경우 그냥 궁금 Count(1)이상 Count(*)과 성능 또는 경우에 띄는 차이가 있으면이 사라 과거의 일에서 이월 된 단지 기존의 습관은?

(특정 데이터베이스는 SQL Server 2005입니다.)


다른 점이 없다.

이유:

온라인 도서에 " COUNT ( { [ [ ALL | DISTINCT ] expression ] | * } )" 라고 표시됨

"1"은 널이 아닌 표현식이므로 COUNT(*). 옵티마이 저는 그것이 무엇인지를 인식합니다.

EXISTS (SELECT * ...또는 동일EXISTS (SELECT 1 ...

예:

SELECT COUNT(1) FROM dbo.tab800krows
SELECT COUNT(1),FKID FROM dbo.tab800krows GROUP BY FKID

SELECT COUNT(*) FROM dbo.tab800krows
SELECT COUNT(*),FKID FROM dbo.tab800krows GROUP BY FKID

동일한 IO, 동일한 계획, 작업

편집, 2011 년 8 월

DBA.SE에 대한 유사한 질문 .

편집, 2011 년 12 월

COUNT(*)ANSI-92 에서 특별히 언급 되었습니다 ( " Scalar expressions 125"검색).

케이스:

a) COUNT (*)가 지정된 경우 결과는 T의 카디널리티입니다.

즉, ANSI 표준은 당신이 의미하는 바가 명백한 출혈로 인식합니다. 이 미신 때문에COUNT(1) RDBMS 공급 업체에 의해 최적화되었습니다 . 그렇지 않으면 ANSI에 따라 평가됩니다.

b) 그렇지 않으면 TX를 T의 각 행에 <value expression>을 적용하고 null 값을 제거한 결과 인 단일 열 테이블이되도록합니다. 하나 이상의 null 값이 제거되면 완료 조건이 발생합니다. warning-


SQL Server에서 이러한 문은 동일한 계획을 생성합니다.

대중의 의견과는 달리 Oracle에서도 마찬가지입니다.

SYS_GUID() Oracle에서는 상당히 계산 집약적 인 기능입니다.

내 테스트 데이터베이스에서 t_even이있는 테이블1,000,000

이 쿼리 :

SELECT  COUNT(SYS_GUID())
FROM    t_even

대한 실행 48기능 요구 이후의 초 각 평가하는 SYS_GUID()그것이 아니다 있는지 확인 반환 NULL.

그러나이 쿼리 :

SELECT  COUNT(*)
FROM    (
        SELECT  SYS_GUID()
        FROM    t_even
        )

2평가를 시도하지 않기 때문에 몇 초 동안 실행됩니다 SYS_GUID()( *에 인수 임에도 불구하고 COUNT(*))


분명히 COUNT (*)와 COUNT (1)은 항상 동일한 결과를 반환합니다. 따라서 하나가 다른 것보다 느리면 사실상 옵티 마이저 버그 때문일 것입니다. 두 형식 모두 쿼리에서 매우 자주 사용되기 때문에 DBMS가 이러한 버그가 수정되지 않은 상태로 유지되도록 허용하는 것은 의미가 없습니다. 따라서 두 형식의 성능이 모든 주요 SQL DBMS에서 (아마도) 동일하다는 것을 알 수 있습니다.


SQL-92 표준에서 COUNT(*)특히 "테이블 표현식의 카디널리티"를 의미합니다 (기본 테이블,`VIEW, 파생 테이블, CTE 등이 될 수 있음).

COUNT(*)파싱하기 쉬운 아이디어라고 생각합니다 . 다른 표현식을 사용하려면 파서가 열을 참조하지 않도록해야합니다 ( COUNT('a')여기서는 a리터럴이고 COUNT(a)어디 a는 열이 다른 결과를 생성 할 수 있는지 ).

같은 맥락 COUNT(*)에서 SQL 표준에 익숙한 인간 코더가 쉽게 선택할 수 있으며, 이는 둘 이상의 공급 업체의 SQL 오퍼링으로 작업 할 때 유용한 기술입니다.

또한 특별한 경우 SELECT COUNT(*) FROM MyPersistedTable;에는 DBMS가 테이블의 카디널리티에 대한 통계를 보유 할 가능성이 있다고 생각합니다.

Therefore, because COUNT(1) and COUNT(*) are semantically equivalent, I use COUNT(*).


COUNT(*) and COUNT(1) are same in case of result and performance.


I work on the SQL Server team and I can hopefully clarify a few points in this thread (I had not seen it previously, so I am sorry the engineering team has not done so previously).

First, there is no semantic difference between select count(1) from table vs. select count(*) from table. They return the same results in all cases (and it is a bug if not). As noted in the other answers, select count(column) from table is semantically different and does not always return the same results as count(*).

Second, with respect to performance, there are two aspects that would matter in SQL Server (and SQL Azure): compilation-time work and execution-time work. The Compilation time work is a trivially small amount of extra work in the current implementation. There is an expansion of the * to all columns in some cases followed by a reduction back to 1 column being output due to how some of the internal operations work in binding and optimization. I doubt it would show up in any measurable test, and it would likely get lost in the noise of all the other things that happen under the covers (such as auto-stats, xevent sessions, query store overhead, triggers, etc.). It is maybe a few thousand extra CPU instructions. So, count(1) does a tiny bit less work during compilation (which will usually happen once and the plan is cached across multiple subsequent executions). For execution time, assuming the plans are the same there should be no measurable difference. (One of the earlier examples shows a difference - it is most likely due to other factors on the machine if the plan is the same).

As to how the plan can potentially be different. These are extremely unlikely to happen, but it is potentially possible in the architecture of the current optimizer. SQL Server's optimizer works as a search program (think: computer program playing chess searching through various alternatives for different parts of the query and costing out the alternatives to find the cheapest plan in reasonable time). This search has a few limits on how it operates to keep query compilation finishing in reasonable time. For queries beyond the most trivial, there are phases of the search and they deal with tranches of queries based on how costly the optimizer thinks the query is to potentially execute. There are 3 main search phases, and each phase can run more aggressive(expensive) heuristics trying to find a cheaper plan than any prior solution. Ultimately, there is a decision process at the end of each phase that tries to determine whether it should return the plan it found so far or should it keep searching. This process uses the total time taken so far vs. the estimated cost of the best plan found so far. So, on different machines with different speeds of CPUs it is possible (albeit rare) to get different plans due to timing out in an earlier phase with a plan vs. continuing into the next search phase. There are also a few similar scenarios related to timing out of the last phase and potentially running out of memory on very, very expensive queries that consume all the memory on the machine (not usually a problem on 64-bit but it was a larger concern back on 32-bit servers). Ultimately, if you get a different plan the performance at runtime would differ. I don't think it is remotely likely that the difference in compilation time would EVER lead to any of these conditions happening.

Net-net: Please use whichever of the two you want as none of this matters in any practical form. (There are far, far larger factors that impact performance in SQL beyond this topic, honestly).

I hope this helps. I did write a book chapter about how the optimizer works but I don't know if its appropriate to post it here (as I get tiny royalties from it still I believe). So, instead of posting that I'll post a link to a talk I gave at SQLBits in the UK about how the optimizer works at a high level so you can see the different main phases of the search in a bit more detail if you want to learn about that. Here's the video link: https://sqlbits.com/Sessions/Event6/inside_the_sql_server_query_optimizer


I would expect the optimiser to ensure there is no real difference outside weird edge cases.

As with anything, the only real way to tell is to measure your specific cases.

That said, I've always used COUNT(*).


As this question comes up again and again, here is one more answer. I hope to add something for beginners wondering about "best practice" here.

SELECT COUNT(*) FROM something counts records which is an easy task.

SELECT COUNT(1) FROM something retrieves a 1 per record and than counts the 1s that are not null, which is essentially counting records, only more complicated.

Having said this: Good dbms notice that the second statement will result in the same count as the first statement and re-interprete it accordingly, as not to do unnecessary work. So usually both statements will result in the same execution plan and take the same amount of time.

However from the point of readability you should use the first statement. You want to count records, so count records, not expressions. Use COUNT(expression) only when you want to count non-null occurences of something.


I ran a quick test on SQL Server 2012 on an 8 GB RAM hyper-v box. You can see the results for yourself. I was not running any other windowed application apart from SQL Server Management Studio while running these tests.

My table schema:

CREATE TABLE [dbo].[employee](
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Name] [nvarchar](50) NOT NULL,
 CONSTRAINT [PK_employee] PRIMARY KEY CLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

Total number of records in Employee table: 178090131 (~ 178 million rows)

First Query:

Set Statistics Time On
Go    
Select Count(*) From Employee
Go    
Set Statistics Time Off
Go

Result of First Query:

 SQL Server parse and compile time: 
 CPU time = 0 ms, elapsed time = 35 ms.

 (1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 10766 ms,  elapsed time = 70265 ms.
 SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

Second Query:

    Set Statistics Time On
    Go    
    Select Count(1) From Employee
    Go    
    Set Statistics Time Off
    Go

Result of Second Query:

 SQL Server parse and compile time: 
   CPU time = 14 ms, elapsed time = 14 ms.

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 11031 ms,  elapsed time = 70182 ms.
 SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

You can notice there is a difference of 83 (= 70265 - 70182) milliseconds which can easily be attributed to exact system condition at the time queries are run. Also I did a single run, so this difference will become more accurate if I do several runs and do some averaging. If for such a huge data-set the difference is coming less than 100 milliseconds, then we can easily conclude that the two queries do not have any performance difference exhibited by the SQL Server Engine.

Note : RAM hits close to 100% usage in both the runs. I restarted SQL Server service before starting both the runs.


SET STATISTICS TIME ON

select count(1) from MyTable (nolock) -- table containing 1 million records. 

SQL Server Execution Times:
CPU time = 31 ms, elapsed time = 36 ms.

select count(*) from MyTable (nolock) -- table containing 1 million records. 

SQL Server Execution Times:
CPU time = 46 ms, elapsed time = 37 ms.

I've ran this hundreds of times, clearing cache every time.. The results vary from time to time as server load varies, but almost always count(*) has higher cpu time.


There is an article showing that the COUNT(1) on Oracle is just an alias to COUNT(*), with a proof about that.

I will quote some parts:

There is a part of the database software that is called “The Optimizer”, which is defined in the official documentation as “Built-in database software that determines the most efficient way to execute a SQL statement“.

One of the components of the optimizer is called “the transformer”, whose role is to determine whether it is advantageous to rewrite the original SQL statement into a semantically equivalent SQL statement that could be more efficient.

Would you like to see what the optimizer does when you write a query using COUNT(1)?

With a user with ALTER SESSION privilege, you can put a tracefile_identifier, enable the optimizer tracing and run the COUNT(1) select, like: SELECT /* test-1 */ COUNT(1) FROM employees;.

After that, you need to localize the trace files, what can be done with SELECT VALUE FROM V$DIAG_INFO WHERE NAME = 'Diag Trace';. Later on the file, you will find:

SELECT COUNT(*) “COUNT(1)” FROM “COURSE”.”EMPLOYEES” “EMPLOYEES”

As you can see, it's just an alias for COUNT(*).

Another important comment: the COUNT(*) was really faster two decades ago on Oracle, before Oracle 7.3:

Count(1) has been rewritten in count(*) since 7.3 because Oracle like to Auto-tune mythic statements. In earlier Oracle7, oracle had to evaluate (1) for each row, as a function, before DETERMINISTIC and NON-DETERMINISTIC exist.

So two decades ago, count(*) was faster

For another databases as Sql Server, it should be researched individually for each one.

I know that this question is specific for Sql Server, but the other questions on SO about the same subject, without mention the database, was closed and marked as duplicated from this answer.


COUNT(1) is not substantially different from COUNT(*), if at all. As to the question of COUNTing NULLable COLUMNs, this can be straightforward to demo the differences between COUNT(*) and COUNT(<some col>)--

USE tempdb;
GO

IF OBJECT_ID( N'dbo.Blitzen', N'U') IS NOT NULL DROP TABLE dbo.Blitzen;
GO

CREATE TABLE dbo.Blitzen (ID INT NULL, Somelala CHAR(1) NULL);

INSERT dbo.Blitzen SELECT 1, 'A';
INSERT dbo.Blitzen SELECT NULL, NULL;
INSERT dbo.Blitzen SELECT NULL, 'A';
INSERT dbo.Blitzen SELECT 1, NULL;

SELECT COUNT(*), COUNT(1), COUNT(ID), COUNT(Somelala) FROM dbo.Blitzen;
GO

DROP TABLE dbo.Blitzen;
GO

In all RDBMS, the two ways of counting are equivalent in terms of what result they produce. Regarding performance, I have not observed any performance difference in SQL Server, but it may be worth pointing out that some RDBMS, e.g. PostgreSQL 11, have less optimal implementations for COUNT(1) as they check for the argument expression's nullability as can be seen in this post.

I've found a 10% performance difference for 1M rows when running:

-- Faster
SELECT COUNT(*) FROM t;

-- 10% slower
SELECT COUNT(1) FROM t;

참고URL : https://stackoverflow.com/questions/1221559/count-vs-count1-sql-server

반응형