Postgres_쿼리 실행 계획 분석

728x90

참조 : https://seungtaek-overflow.tistory.com/5

쿼리 실행 계획 분석하는 이유

DB 쿼리 성능을 최적화하기 위해선 1) DB에서 쿼리를 어떻게 실행하는지 그리고 2) 쿼리의 성능이 어떤지를 알면 좋다. 이를 위해서 DB의 Query execution plan(쿼리 실행 계획)을 분석해야 한다.

테스트 준비

# id, title, author, created_at 컬럼 4개로 구성된 post 테이블 생성하기

CREATE TABLE post (
    id serial PRIMARY KEY,
    title varchar(255),
    author varchar(255),
    created_at timestamp
);

# 데이터 삽입 100000개
DO $$
DECLARE
    i INTEGER := 1;
BEGIN
    WHILE i < 1000000 LOOP
        INSERT INTO post(title, author, created_at)
            VALUES(CONCAT('title', i), CONCAT('author', i % 100), now() + i * INTERVAL '1 second');
        i := i + 1;
    END LOOP;
END $$;

Query Plan이란

DB 엔진은 쿼리를 수행하기 전에 어떤 방식으로 스캔을 하는지? 어떤 순서로 처리할까?에 대한 계획을 수립하는 영리한 친구다. 이 계획을 Query execution plan(쿼리 실행 계획)이라 부른다.
이 계획을 알기 위해선 EXPLAIN 키워드를 sql문에 추가해주면 된다. 이에 ANALYZE 키워드도 함께 추가하여 실행계획을 구체적으로 분석한다.

Table Scan

PostgreSQL이 어떤 방식으로 스캔하는지 분석하기 위해선 어떤 스캔 유형이 있는지 알아야 한다. 스캔 방식은 총 5개다.

Sequential Scan : 모든 데이터를 하나씩 확인하는 방법. index없는 컬럼들을 검색 시 주로 활용
Index Scan: Index를 탐색하는 방법. 주로 조건문 조회 시 활용
Index Only Scan : index에 필요한 값이 이미 포함된 경우 사용하는 방식.
Bitmap Scan : Index Scan과 Sequential Scan 섞은 방식이다.
TID Scan : 테이블 데이터 식별하기 위해 사용하는 TID(Tuple Indicator)를 활용한 쿼리 스캔 방식.

Sequential Scan

# Sequential Scan

postgres=# explain analyze select * from post where title = 'title412';
                                                    QUERY PLAN

-----------------------------------------------------------------------------------
--------------------------------
 Gather  (cost=1000.00..14369.43 rows=1 width=31) (actual time=0.803..76.245 rows=1
 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on post  (cost=0.00..13369.33 rows=1 width=31) (actual tim
e=7.497..30.564 rows=0 loops=3)
         Filter: ((title)::text = 'title412'::text)
         Rows Removed by Filter: 333333
 Planning Time: 0.355 ms
 Execution Time: 76.284 ms
(8 rows)

쿼리 실행 계획은 2단계로 진행

Gather
Seq Scan

-> 뒤에 있는 Seq Scan은 child plan을 의미하며, 안쪽에서 먼저 실행한 뒤 역으로 바깥의 단계로 넘어가는 바텀업 방식으로 진행된다.

title은 인덱스가 없어서 Seq Scan으로 순차적으로 검색했다. 2개의 워커 프로세스를 만들어서 병렬적으로 테이블을 traverse했다. 각 워커 프로세스가 찾아낸 row를 종합하여 쿼리를 수행했다.

Row Removed by Filter은 조회를 위해 참고는 했는데 조건에 맞지 않아 누락시켜버린 row 수를 말한다. 그만큼 누락의 수가 많다는 것은 허탕을 많이 쳤다는 말이다. Row Removed by Filter수가 적으면 쿼리의 최적화 수준이 높다는 것을 알 수 있는 지표다.

Execution Time : 총 실행 소요 시간이다.

Index Scan

postgres=# explain analyze select * from post where id = 3333;

QUERY PLAN
--------------------------------------------------------
 Index Scan using post_pkey on post  (cost=0.42..8.44 rows=1 width=31) (actual time
=0.060..0.063 rows=1 loops=1)
   Index Cond: (id = 3333)
 Planning Time: 0.229 ms
 Execution Time: 0.108 ms
(4 rows)

postgres=# explain analyze select * from post where id < 1000;
                                                      QUERY PLAN
-----------------------------------
 Index Scan using post_pkey on post  (cost=0.42..40.47 rows=974 width=31) (actual t
ime=0.024..0.348 rows=999 loops=1)
   Index Cond: (id < 1000)
 Planning Time: 0.224 ms
 Execution Time: 0.443 ms
(4 rows)

id column은 pk여서 자동으로 인덱스가 지정되어있다. 인덱스는 스캔 방법 중 가장 빠른 방법이다.

postgres=# explain analyze select * from post where id < 1000 order by id desc;
                                                          QUERY PLAN

-----------------------------------------------------------------------------------
--------------------------------------------
 Index Scan Backward using post_pkey on post  (cost=0.42..40.47 rows=974 width=31)
(actual time=0.029..0.371 rows=999 loops=1)
   Index Cond: (id < 1000)
 Planning Time: 0.305 ms
 Execution Time: 0.500 ms
(4 rows)

Index Scan Backward는 id를 역순으로 조회했다는 말이다. 이는 index가 항상 정렬되어있기 때문에 이런 구조로부터 얻는 효율성을 이용하고자 역으로 조회한 것이다.

postgres=# explain analyze select * from post where id > 1000;
                                                  QUERY PLAN

-----------------------------------------------------------------------------------
---------------------------
 Seq Scan on post  (cost=0.00..20660.99 rows=999024 width=31) (actual time=0.099..8
6.495 rows=998999 loops=1)
   Filter: (id > 1000)
   Rows Removed by Filter: 1000
 Planning Time: 0.179 ms
 Execution Time: 122.320 ms
(5 rows)

이번에는 1000 미만이 아니라 1000 초과하는 id 번호를 조회하는 해본다. 이때는 이전과 달리 Seq Scan으로 서치를 한다. 이는 DB 엔진이 index 스캔 방식보다 seq 스캔 방식이 더 효율적이라고 판단했기 때문이다. 스마트하다.

Index Only Scan

인덱스에 필요한 데이터가 있는 경우 바로 가져오는 방식을 Index Only Scan이라 한다. index가 적용된 컬럼만 참조할 경우 쓴다.

postgres=# explain analyze select id from post where id < 3000;
                                                         QUERY PLAN

-----------------------------------------------------------------------------------
-----------------------------------------
 Index Only Scan using post_pkey on post  (cost=0.42..87.61 rows=2925 width=4) (act
ual time=0.023..0.606 rows=2999 loops=1)
   Index Cond: (id < 3000)
   Heap Fetches: 0
 Planning Time: 0.233 ms
 Execution Time: 0.859 ms
(5 rows)

Bitmap Scan

# author 인덱스 생성
postgres=# create index idx_author on post(author);
CREATE INDEX

postgres=# explain analyze select id from post where id < 600000 and author = 'author54';
                                                         QUERY PLAN

----------------------------------------------------------------------------------------
-------------------------------------
 Bitmap Heap Scan on post  (cost=115.98..8868.17 rows=6228 width=4) (actual time=5.256..
24.752 rows=6000 loops=1)
   Recheck Cond: ((author)::text = 'author54'::text)
   Filter: (id < 600000)
   Rows Removed by Filter: 4000
   Heap Blocks: exact=8160
   ->  Bitmap Index Scan on idx_author  (cost=0.00..114.42 rows=10400 width=0) (actual t
ime=2.642..2.642 rows=10000 loops=1)
         Index Cond: ((author)::text = 'author54'::text)
 Planning Time: 0.228 ms
 Execution Time: 25.081 ms
(9 rows)

우선 child plan으로 Bitmap Index Scan이 수행되고, 그 다음 Bitmap Heap Scan이 수행되었다.

Bitmap Index Scan
- 인덱스 자료 구조 스캔하여 조건에 해당하는 인덱스 데이터를 이용하여 bitmap을 생성한다. bitmap에는 인덱스와 매핑되는 테이블의 row 위치 정보가 담겨있다.
- ex) author가 aurthor54인 데이터 10,000개의 row의 위치정보가 담긴 bitmap 생성
Bitmap Heap Scan
- 앞서 만든 Bitmap을 스캔하면서 우리가 찾고자 하는 데이터를 추출해낸다
- ex) 앞서 만든 bitmap에서 id가 600000 미만인 값을 필터링(4000개개는 버리기)하여 총 6000개의 데이터를 추출하기

물론 범위가 바뀌면 DB엔진이 Bitmap Index Scan 대신 Index Scan 방식을 사용하기도 한다.

TID Scan

postgres=# explain analyze select id from post where ctid = '(1, 1)';
                                          QUERY PLAN

----------------------------------------------------------------------------------------
------
 Tid Scan on post  (cost=0.00..4.01 rows=1 width=4) (actual time=0.014..0.015 rows=1 loo
ps=1)
   TID Cond: (ctid = '(1,1)'::tid)
 Planning Time: 0.283 ms
 Execution Time: 0.059 ms
(4 rows)

참고

https://seungtaek-overflow.tistory.com/5
https://www.cybertec-postgresql.com/en/postgresql-indexing-index-scan-vs-bitmap-scan-vs-sequential-scan-basics/

728x90

저작자표시 (새창열림)

'Research > Database' 카테고리의 다른 글

postgres_row count 성능 개선 방법 (0)	2023.03.13
Postgres_index로 가지고 놀기 (0)	2023.03.09
db_Redis 클라우드 사용하는 방법 (0)	2023.03.01
nestjs_cache-manager로 redis 사용하기 (0)	2023.02.27
DB_Redis 설치 및 명령어 Cheatsheet(Mac OS) (0)	2023.02.27

Programmer Way

Postgres_쿼리 실행 계획 분석

쿼리 실행 계획 분석하는 이유

테스트 준비

Query Plan이란

Table Scan

Sequential Scan

Index Scan

Index Only Scan

Bitmap Scan

TID Scan

참고

'Research > Database' 카테고리의 다른 글

댓글

티스토리툴바

Postgres_쿼리 실행 계획 분석

쿼리 실행 계획 분석하는 이유

테스트 준비

Query Plan이란

Table Scan

Sequential Scan

Index Scan

Index Only Scan

Bitmap Scan

TID Scan

참고

'Research > Database' 카테고리의 다른 글

관련글

댓글

티스토리툴바