
๐Ÿ–ฅ๏ธ Video segmentation with Python using SAM2! - ํŒŒ์ด์ฌ SAM2 ์‹ค์Šต : ๋น„๋””์˜ค์—์„œ ๋ˆ„๋ผ๋”ฐ๊ธฐ!

๐Ÿ–ฅ๏ธ Video segmentation with Python using SAM2! - ํŒŒ์ด์ฌ SAM2 ์‹ค์Šต : ๋น„๋””์˜ค์—์„œ ๋ˆ„๋ผ๋”ฐ๊ธฐ!

๐Ÿฆ–(ENGLISH) Python SAM2 Practice: Background Removal from Video!

In this post, we'll walk through a practical use case of SAM2, the successor to Meta's Segment Anything Model (SAM).
SAM2 is already supported by the popular ultralytics library, making it incredibly simple to use!
That means anyone can try it, easily and quickly! 🚀


🧱 1. Installing ultralytics

  • Already installed? Then skip this step!

pip install ultralytics

📦 2. Download & Load the Model

from ultralytics import SAM
import cv2
import matplotlib.pyplot as plt
import numpy as np

# Load the SAM2 model
model = SAM("sam2.1_b.pt")

We also loaded a few additional packages for visualization.


🧊 3. Run Image Segmentation!

Let's segment a dog image using a bounding box prompt:

img_name = "dog.jpg"

my_bboxes = [1430.2, 828, 4471.9, 3836.4]
# Run inference with a box prompt [x_min, y_min, x_max, y_max]
results = model(img_name, bboxes=my_bboxes)

# Load the original image (for visualization)
image = cv2.imread(img_name)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Show results
plt.figure(figsize=(10, 10))
plt.imshow(image_rgb)

# Overlay mask
for result in results:
    if result.masks is not None:
        mask = result.masks.data[0].cpu().numpy()
        plt.imshow(mask, alpha=0.5, cmap='jet')

# Draw bounding box
rect = plt.Rectangle((my_bboxes[0], my_bboxes[1]), my_bboxes[2] - my_bboxes[0], my_bboxes[3] - my_bboxes[1],
                     linewidth=2, edgecolor='red', facecolor='none', label=f'my_bboxes {my_bboxes}')
plt.gca().add_patch(rect)

# Final touches
plt.title(f"SAM2 Segmentation with Box Prompt on {img_name}")
plt.legend()
plt.axis('off')
plt.show()

# Optional: Print extra info
print("Segmentation Result:")
print(f"Number of masks: {len(results[0].masks.data)}")
print(f"Mask shape: {results[0].masks.data[0].shape}")
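Beyond the prints above, a couple of handy numbers can be recovered straight from a mask array: its foreground area and a tight bounding box. A minimal sketch on a synthetic mask (`mask_stats` is a hypothetical helper of mine, not part of ultralytics; with real output you would pass `results[0].masks.data[0].cpu().numpy()`):

```python
import numpy as np

def mask_stats(mask):
    """Foreground pixel count and tight [x_min, y_min, x_max, y_max] box."""
    ys, xs = np.where(mask > 0.5)
    area = len(xs)
    box = [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]
    return area, box

# Synthetic 10x10 mask with a 3x5 foreground block
mask = np.zeros((10, 10))
mask[2:5, 3:8] = 1.0

area, box = mask_stats(mask)
print(area, box)  # 15 [3, 2, 7, 4]
```

Comparing this recovered box against the box prompt you supplied is a quick sanity check that the mask landed where you asked.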

[Image: sam2_dog segmentation result]

Pretty impressive, right?
But to be fair, this is something even SAM1 handled quite well 😎
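Since the title promises background removal, here is one way to turn a mask into an actual cutout: use it as the alpha channel of an RGBA image. A minimal sketch on synthetic data (`cutout_rgba` is a hypothetical helper; with real outputs you would pass the `image_rgb` and `mask` from the snippet above):

```python
import numpy as np

def cutout_rgba(image_rgb, mask):
    """Use a binary mask as the alpha channel: background becomes transparent."""
    alpha = (mask > 0.5).astype(np.uint8) * 255
    return np.dstack([image_rgb, alpha])

# Tiny synthetic example: 4x4 gray image, mask covering the top-left 2x2 block
image = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.float32)
mask[:2, :2] = 1.0

rgba = cutout_rgba(image, mask)
print(rgba.shape, rgba[0, 0, 3], rgba[3, 3, 3])  # (4, 4, 4) 255 0
```

To keep the transparency on disk, the result can be saved as a PNG, e.g. with `cv2.imwrite("dog_cutout.png", cv2.cvtColor(rgba, cv2.COLOR_RGBA2BGRA))`.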


🚀 4. Run Video Segmentation!

Now let's try something SAM1 couldn't do: video segmentation, a core strength of SAM2!

I used a highway CCTV video and provided a single point prompt at (405, 205) in the first frame where a car is visible:

from ultralytics.models.sam import SAM2VideoPredictor

# Create SAM2VideoPredictor
overrides = dict(conf=0.25, task="segment", mode="predict", imgsz=1024, model="sam2_b.pt")
predictor = SAM2VideoPredictor(overrides=overrides)

# Run inference with a single point
results = predictor(source="street.mp4", points=[405, 205], labels=[1])

I can't upload videos here, but as shown in the screenshots below,
SAM2 was able to segment the car only while it was visible, and stopped at the right moment. Very smart!

[Screenshots: the car stays segmented frame by frame until it leaves the scene]
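One way to confirm programmatically that the track ends when the car disappears is to check, per frame, whether the predicted mask contains any foreground pixels. A minimal sketch on synthetic per-frame masks (`visible_frames` is a hypothetical helper; real per-frame masks would come from iterating over the predictor's `results`):

```python
import numpy as np

def visible_frames(masks):
    """Indices of frames whose mask has at least one foreground pixel."""
    return [i for i, m in enumerate(masks) if m is not None and np.any(m > 0.5)]

# Synthetic track: the object is visible in frames 0-2, then gone
masks = [np.ones((4, 4)), np.ones((4, 4)), np.ones((4, 4)),
         np.zeros((4, 4)), None]

idx = visible_frames(masks)
print(idx)  # [0, 1, 2]
```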


🎉 Final Thoughts

I was genuinely impressed not only by SAM2's ability to segment in videos,
but also by how well it tracked objects over time with minimal input.

Highly recommend trying this out โ€” especially for any video segmentation tasks!



This post is licensed under CC BY 4.0 by the author.