Video segmentation with Python using SAM2! - 파이썬 SAM2 실습 : 비디오에서 누끼따기!

Posted May 29, 2025

By DrFirst

6 min read

🦖(ENGLISH) Python SAM2 Practice: Background Removal from Video!

In this post, we’ll walk through a practical use case of SAM2, the successor to Meta’s Segment Anything Model (SAM).
SAM2 is already supported by the popular ultralytics library, making it incredibly simple to use!
That means — anyone can try it, easily and quickly! 🚀

🧱 1. Installing `ultralytics`

Already installed? Then skip this step!

pip install ultralytics

📦 2. Download & Load the Model

  
from ultralytics import SAM
import cv2
import matplotlib.pyplot as plt
import numpy as np

# Load the SAM2 model
model = SAM("sam2.1_b.pt")

We also loaded a few additional packages for visualization.

🧊 3. Run Image Segmentation!

Let’s segment a dog image using a bounding box prompt:

  
img_name = "dog.jpg"

my_bboxes = [1430.2, 828, 4471.9, 3836.4]
# Run inference with a box prompt [x_min, y_min, x_max, y_max]
results = model(img_name, bboxes=my_bboxes)

# Load the original image (for visualization)
image = cv2.imread(img_name)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Show results
plt.figure(figsize=(10, 10))
plt.imshow(image_rgb)

# Overlay mask
for result in results:
    if result.masks is not None:
        mask = result.masks.data[0].cpu().numpy()
        plt.imshow(mask, alpha=0.5, cmap='jet')

# Draw bounding box
rect = plt.Rectangle((my_bboxes[0], my_bboxes[1]), my_bboxes[2] - my_bboxes[0], my_bboxes[3] - my_bboxes[1],
                     linewidth=2, edgecolor='red', facecolor='none', label=f'my_bboxes {my_bboxes}')
plt.gca().add_patch(rect)

# Final touches
plt.title(f"SAM2 Segmentation with Box Prompt on {img_name}")
plt.legend()
plt.axis('off')
plt.show()

# Optional: Print extra info
print("Segmentation Result:")
print(f"Number of masks: {len(results[0].masks.data)}")
print(f"Mask shape: {results[0].masks.data[0].shape}")

Pretty impressive, right?
But to be fair — this is something even SAM1 handled quite well 😎

🚀 4. Run Video Segmentation!

Now let’s try something SAM1 couldn’t do: video segmentation — a core strength of SAM2!

I used a highway CCTV video and provided a single point prompt at (405, 205) in the first frame where a car is visible:

  
from ultralytics.models.sam import SAM2VideoPredictor

# Create SAM2VideoPredictor
overrides = dict(conf=0.25, task="segment", mode="predict", imgsz=1024, model="sam2_b.pt")
predictor = SAM2VideoPredictor(overrides=overrides)

# Run inference with a single point
results = predictor(source="street.mp4", points=[405, 205], labels=[1])

I can’t upload videos here, but as shown in the screenshots below,
SAM2 was able to segment the car only while it was visible — and stopped at the right moment. Very smart!

🎉 Final Thoughts

I was genuinely impressed not only by SAM2’s ability to segment in videos,
but also by how well it tracked objects over time with minimal input.

Highly recommend trying this out — especially for any video segmentation tasks!

🦖(한국어) 파이썬 SAM2 실습 : 비디오에서 누끼따기!

이번 포스팅은 SAM의 후속 모델인 SAM2 실습입니다!
SAM2는 유명한 모델이어서 그런지, ultralytics 의 패키지로 쉽게 사용할수 있었습니다!! 그래서 모두 간단히, 쉽게 사용가능합니다!!!

🧱 1. ultralytics 설치

이미 설치된 분이라면 스킵쓰!
1 pip install ultralytics

📦 2. 모델 다운 및 로드!

  
from ultralytics import SAM
import cv2
import matplotlib.pyplot as plt
import numpy as np

# 모델 로드
model = SAM("sam2.1_b.pt")

시각화를 위해 그외 패키지도 로드해주었습니다!

🧊 3. 이미지 Segment 실행!!

기존 사용했던 강아지 이미지를, bbox와 함께 segment해보았습니다!!

  
img_name = "dog.jpg"

my_bboxes=[1430.2,   828,  4471.9, 3836.4]
# 박스 프롬프트로 추론 ([x_min, y_min, x_max, y_max])
results = model(img_name, bboxes=my_bboxes)

# 원본 이미지 로드 (시각화를 위해)
image = cv2.imread(img_name)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # BGR -> RGB 변환

# 결과 시각화
plt.figure(figsize=(10, 10))
plt.imshow(image_rgb)

# 마스크 오버레이
for result in results:
    if result.masks is not None:
        mask = result.masks.data[0].cpu().numpy()  # 첫 번째 마스크 추출
        plt.imshow(mask, alpha=0.5, cmap='jet')  # 마스크를 반투명하게 표시

# 박스 프롬프트 표시
rect = plt.Rectangle((my_bboxes[0], my_bboxes[1]), my_bboxes[2] - my_bboxes[0], my_bboxes[3] - my_bboxes[1], 
                     linewidth=2, edgecolor='red', facecolor='none', label=f'my_bboxes {my_bboxes}')
plt.gca().add_patch(rect)

# 제목 및 설정
plt.title(f"SAM2 Segmentation with Box Prompt on {img_name}")
plt.legend()
plt.axis('off')
plt.show()

# 추가 정보 출력 (선택 사항)
print("Segmentation Result:")
print(f"Number of masks: {len(results[0].masks.data)}")
print(f"Mask shape: {results[0].masks.data[0].shape}")

참 잘되죠~ 그런데 이건 SAM도 잘하건건데!?

🚀 4. 비디오 Segment 실행!!

그래서, 이번엔 SAM2의 특징인!
비디오의 segment도 진행해보았습니다!

저는 고속도로의 CCTV영상을 바탕으로 진행했구요! 첫 프래임에서 차가 있는 곳의 위치(405,205)를 프롬포트로 제공했습니다!

  
from ultralytics.models.sam import SAM2VideoPredictor

# Create SAM2VideoPredictor
overrides = dict(conf=0.25, task="segment", mode="predict", imgsz=1024, model="sam2_b.pt")
predictor = SAM2VideoPredictor(overrides=overrides)

# Run inference with single point
results = predictor(source="street.mp4", points=[405, 205], labels=[1])

동영상을 올릴순 없지만!!
아래 스크린샷같이 차가 사라지는 시점까지만 딱!!segment를 정말 잘하더라구요!!

🎉 마무리

동영상의 segmentation에 더하여, 저는 Tracking이 이렇게 잘된다는것이 너무 인상적이었습니다!

AI, Experiment

This post is licensed under CC BY 4.0 by the author.