Video Segmentation with Python using SAM2: Background Removal from Video!
In this post, we'll walk through a practical use case of SAM2, the successor to Meta's Segment Anything Model (SAM).
SAM2 is already supported by the popular ultralytics library, making it incredibly simple to use!
That means anyone can try it, easily and quickly!
1. Installing ultralytics
- Already installed? Then skip this step!

pip install ultralytics
2. Download & Load the Model

from ultralytics import SAM
import cv2
import matplotlib.pyplot as plt
import numpy as np

# Load the SAM2 model
model = SAM("sam2.1_b.pt")
We also loaded a few additional packages for visualization.
3. Run Image Segmentation!
Let's segment a dog image using a bounding box prompt:
img_name = "dog.jpg"
my_bboxes = [1430.2, 828, 4471.9, 3836.4]

# Run inference with a box prompt [x_min, y_min, x_max, y_max]
results = model(img_name, bboxes=my_bboxes)

# Load the original image (for visualization)
image = cv2.imread(img_name)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Show results
plt.figure(figsize=(10, 10))
plt.imshow(image_rgb)

# Overlay mask
for result in results:
    if result.masks is not None:
        mask = result.masks.data[0].cpu().numpy()
        plt.imshow(mask, alpha=0.5, cmap='jet')

# Draw bounding box
rect = plt.Rectangle((my_bboxes[0], my_bboxes[1]), my_bboxes[2] - my_bboxes[0], my_bboxes[3] - my_bboxes[1],
                     linewidth=2, edgecolor='red', facecolor='none', label=f'my_bboxes {my_bboxes}')
plt.gca().add_patch(rect)

# Final touches
plt.title(f"SAM2 Segmentation with Box Prompt on {img_name}")
plt.legend()
plt.axis('off')
plt.show()

# Optional: Print extra info
print("Segmentation Result:")
print(f"Number of masks: {len(results[0].masks.data)}")
print(f"Mask shape: {results[0].masks.data[0].shape}")
Pretty impressive, right?
But to be fair, this is something even SAM1 handled quite well.
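On the "background removal" promised in the title: once you have a mask, stripping the background is a simple NumPy masking step. Here's a minimal sketch; the `remove_background` helper is my own, and it assumes `mask` is a 2D binary array already resized to the image's height and width (SAM2 masks may need `cv2.resize` first):

```python
import numpy as np

def remove_background(image_rgb: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep only the pixels inside the mask; fill everything else with white.

    image_rgb: (H, W, 3) array, mask: (H, W) binary array at the same resolution.
    """
    keep = mask.astype(bool)
    out = np.full_like(image_rgb, 255)  # start from a white background
    out[keep] = image_rgb[keep]         # copy only the segmented pixels
    return out
```

With the variables from the code above, `cutout = remove_background(image_rgb, mask)` would give you the dog on a white background (resize `mask` first if its shape differs from the image).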
4. Run Video Segmentation!
Now let's try something SAM1 couldn't do: video segmentation, a core strength of SAM2!
I used a highway CCTV video and provided a single point prompt at (405, 205) in the first frame where a car is visible:
from ultralytics.models.sam import SAM2VideoPredictor
# Create SAM2VideoPredictor
overrides = dict(conf=0.25, task="segment", mode="predict", imgsz=1024, model="sam2_b.pt")
predictor = SAM2VideoPredictor(overrides=overrides)
# Run inference with a single point
results = predictor(source="street.mp4", points=[405, 205], labels=[1])
I can't upload videos here, but as shown in the screenshots below,
SAM2 was able to segment the car only while it was visible, and stopped at the right moment. Very smart!
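If you want to render the per-frame masks yourself instead of relying on screenshots, a small blending helper works on any frame/mask pair. This is a sketch under the assumption that each frame's mask can be pulled out as a 2D array (as in the image example above); the `overlay_mask` name, the red color, and the alpha value are my own choices:

```python
import numpy as np

def overlay_mask(frame: np.ndarray, mask: np.ndarray,
                 color=(0, 0, 255), alpha=0.5) -> np.ndarray:
    """Blend a solid color onto the masked region of a BGR frame.

    frame: (H, W, 3) array, mask: (H, W) binary array at the same resolution.
    """
    out = frame.copy()
    m = mask.astype(bool)
    # Weighted blend: alpha * color + (1 - alpha) * original pixel
    out[m] = (alpha * np.array(color) + (1 - alpha) * frame[m]).astype(frame.dtype)
    return out
```

Applied frame by frame (e.g. while reading the video with `cv2.VideoCapture`), this reproduces the kind of red car highlight shown in the screenshots.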
Final Thoughts
I was genuinely impressed not only by SAM2's ability to segment in videos,
but also by how well it tracked objects over time with minimal input.
Highly recommend trying this out, especially for any video segmentation tasks!