[AWS] S3: Inventory

S3 is convenient, but once usage grows it turns into a real headache.
Checking usage patterns and optimizing cost is surprisingly tricky. It is hard to tell which files are actually eating up the space, or whether they are safe to delete.

S3 Inventory is a tool for tracking the file list in exactly that situation. It automatically pulls out which files exist under which paths, how large they are, and so on, and writes the result as a report to another S3 bucket.
You could fetch the same information yourself with the LIST API, but that takes too long, racks up its own costs, and is inconvenient.




Cost

This is not free either.
You pay for storing the reports in S3, and on top of that there is a charge based on the number of objects.

https://aws.amazon.com/ko/s3/pricing/?nc=sn&loc=4


The S3 bucket metrics show the object count, so you can use that to estimate the cost.
In this case it is 22.3 × 0.0028, which comes to about 0.06 dollars. Not exactly expensive.
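The estimate above is just the object count multiplied by the unit price. Here is a small sketch of that arithmetic, assuming the 22.3 is the object count in millions and the 0.0028 is the listed price in USD per million objects (the units are my reading, not stated in the post, and the function name is made up for illustration). Check the pricing page for the current rate in your region.

```python
def estimate_inventory_cost(objects_in_millions: float, usd_per_million: float) -> float:
    """Return the estimated charge in USD for the given object count."""
    return objects_in_millions * usd_per_million


# Figures from the post; the units (millions of objects, USD per million) are assumed.
cost = estimate_inventory_cost(22.3, 0.0028)
print(f"about {cost:.2f} USD")
```

If the report runs on a schedule, the same figure would apply to each report that gets generated.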




Setting up Inventory

First, prepare a bucket to store the reports in.
You can store them under a subdirectory of the existing bucket, but I recommend creating a separate one.

Then go to the Management tab of the S3 bucket, which leads to the inventory configuration screen.

Specify the path of the destination bucket,


and then choose the format the report is stored in.
If the object count is truly huge and cost or read performance looks like a burden, export as Parquet. If you just want something easy to look at, export as CSV.


You can also select additional fields.
Something like size, last modified date, and storage class (the columns the Athena table later in this post uses) seems like enough.
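The console choices above (destination, format, additional fields) map onto one configuration structure. The following is a rough sketch of that structure as a Python dict, shaped like the `InventoryConfiguration` argument that boto3's `put_bucket_inventory_configuration` accepts. The bucket names, config ID, daily schedule, and the helper function are all placeholders I made up, not values from the post.

```python
import json


def build_inventory_config(config_id, dest_bucket, dest_prefix, fmt="CSV"):
    """Build a dict shaped like the S3 InventoryConfiguration request structure."""
    if fmt not in ("CSV", "ORC", "Parquet"):
        raise ValueError(f"unsupported inventory format: {fmt}")
    return {
        "Id": config_id,
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Daily"},
        # Additional fields; these match the columns of the Athena table in this post.
        "OptionalFields": ["Size", "LastModifiedDate", "StorageClass"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": f"arn:aws:s3:::{dest_bucket}",  # the destination is given as an ARN
                "Prefix": dest_prefix,
                "Format": fmt,
            }
        },
    }


# Placeholder names, not taken from the post.
config = build_inventory_config("daily-inventory", "example-inventory-reports", "inventory")
print(json.dumps(config, indent=2))

# With boto3 installed and credentials configured, this could be applied with:
#   boto3.client("s3").put_bucket_inventory_configuration(
#       Bucket="example-source-bucket", Id=config["Id"], InventoryConfiguration=config)
```

Keeping the configuration in code like this is mostly useful when the same setup has to be repeated across several buckets. For a single bucket the console is just as quick.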


Once you create it... nothing happens.
It is not a real-time service, so nothing shows up right away. The report is collected and delivered within 48 hours at most.
That said, 48 hours is only the limit on paper. In practice it usually arrived within a few hours.

Like so.
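As far as I understand the delivered layout, each report comes with a `manifest.json` that names the data files belonging to that particular report and, for CSV, the column order. Reading it is a handy way to see what was delivered, since the `data/` directory accumulates files from every report over time. The sketch below parses a made-up sample manifest. The bucket names, keys, and sizes are placeholders, and the helper name is hypothetical.

```python
import json

# Made-up sample in the general shape of an inventory manifest.json;
# bucket names, keys, and sizes are placeholders.
SAMPLE_MANIFEST = """
{
  "sourceBucket": "example-source-bucket",
  "destinationBucket": "arn:aws:s3:::example-inventory-reports",
  "fileFormat": "CSV",
  "fileSchema": "Bucket, Key, Size, LastModifiedDate, StorageClass",
  "files": [
    {"key": "inventory/example-source-bucket/daily-inventory/data/part-0001.csv.gz", "size": 2048},
    {"key": "inventory/example-source-bucket/daily-inventory/data/part-0002.csv.gz", "size": 1024}
  ]
}
"""


def summarize_manifest(manifest_text):
    """Return (column names, data file keys) described by one CSV manifest."""
    manifest = json.loads(manifest_text)
    # For CSV reports the schema is a comma-separated list of column names.
    columns = [name.strip() for name in manifest["fileSchema"].split(",")]
    data_keys = [entry["key"] for entry in manifest["files"]]
    return columns, data_keys


columns, data_keys = summarize_manifest(SAMPLE_MANIFEST)
print(columns)
for key in data_keys:
    print(key)
```

The column list is worth checking against any table definition you write by hand, because the order depends on which additional fields were selected.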

After that, read it however you like, whether by downloading the files or by using Athena.
The simplest option is to point Athena at it and run queries directly.

CREATE EXTERNAL TABLE s3_inventory (
  bucket                STRING,
  key                   STRING,
  size                  BIGINT,
  last_modified_date    STRING,
  storage_class         STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS INPUTFORMAT  'org.apache.hadoop.mapred.TextInputFormat'
         OUTPUTFORMAT  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://<dest-bucket>/<dest-prefix>/<src-bucket>/<config-id>/data/'
TBLPROPERTIES ('skip.header.line.count'='0');

With that, the results should come out nicely, like this.


SELECT
  split_part(key, '/', 1)                              AS top_dir,
  COUNT(*)                                             AS object_count,
  SUM(CAST(size AS BIGINT))                            AS total_bytes,
  ROUND(AVG(CAST(size AS BIGINT)), 0)                  AS avg_bytes,
  MIN(CAST(size AS BIGINT))                            AS min_bytes,
  MAX(CAST(size AS BIGINT))                            AS max_bytes,
  -- Human-friendly units (GB/MB)
  ROUND(SUM(CAST(size AS BIGINT)) / 1024.0/1024/1024, 2) AS total_gb,
  ROUND(AVG(CAST(size AS BIGINT)) / 1024.0/1024, 2)      AS avg_mb,
  ROUND(MAX(CAST(size AS BIGINT)) / 1024.0/1024, 2)      AS max_mb
FROM s3_inventory
GROUP BY split_part(key, '/', 1)
ORDER BY total_bytes DESC;

This is a simple query that aggregates by top-level prefix. From there you can keep drilling down the same way.
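If you went the download route instead of Athena, the same top-level aggregation can be done locally. This is a minimal sketch using only the standard library, assuming gzip-compressed, headerless CSV data files with the five columns from the table definition above. The sample rows and function names are made up for illustration.

```python
import csv
import gzip
import io
from collections import defaultdict


def aggregate_by_top_dir(rows):
    """Sum object count and bytes per first path segment of the key."""
    totals = defaultdict(lambda: {"object_count": 0, "total_bytes": 0})
    for _bucket, key, size, *_rest in rows:
        top_dir = key.split("/", 1)[0]
        totals[top_dir]["object_count"] += 1
        totals[top_dir]["total_bytes"] += int(size or 0)  # an empty size counts as 0
    # Largest prefixes first, like ORDER BY total_bytes DESC.
    return sorted(totals.items(), key=lambda item: item[1]["total_bytes"], reverse=True)


def read_inventory_csv_gz(raw_bytes):
    """Yield rows from the bytes of one gzip-compressed, headerless CSV data file."""
    with gzip.open(io.BytesIO(raw_bytes), mode="rt", encoding="utf-8", newline="") as handle:
        yield from csv.reader(handle)


# Made-up rows standing in for a downloaded data file.
sample = (
    '"example-bucket","logs/a.gz","100","2024-01-01T00:00:00.000Z","STANDARD"\n'
    '"example-bucket","logs/b.gz","300","2024-01-02T00:00:00.000Z","STANDARD"\n'
    '"example-bucket","images/c.png","50","2024-01-03T00:00:00.000Z","STANDARD"\n'
)
report = aggregate_by_top_dir(read_inventory_csv_gz(gzip.compress(sample.encode("utf-8"))))
for top_dir, stats in report:
    print(top_dir, stats["object_count"], stats["total_bytes"])
```

This is fine for a quick look at a small report. Once the files get large, Athena is the more comfortable option.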



References
https://docs.aws.amazon.com/ko_kr/AmazonS3/latest/userguide/storage-inventory.html
https://docs.aws.amazon.com/ko_kr/AmazonS3/latest/userguide/configure-inventory.html
https://dev.classmethod.jp/articles/extract_object_metadata_from_s3_bucket_using_s3_inventory/