🖥️ Image segmentation with Python using SAM! - 파이썬으로 누끼따기!? SAM (Segment Anything Model) 실습

Posted May 5, 2025

By DrFirst

17 min read

🧼 Image segmentation with Python? SAM in Action!

Today, let’s practice how to do Image segmentation using SAM (Segment Anything Model),
based on the theory we explored in the previous post!

We’ll use Ultralytics to run SAM easily,
instead of downloading and configuring the model from Meta or Hugging Face.

Let’s start by loading the SAM model using Ultralytics:

  
from ultralytics import SAM
import cv2
import matplotlib.pyplot as plt
import numpy as np

# Load model
model = SAM("sam_b.pt")
model.eval()

Once the model is loaded, the architecture consists mainly of:

image_encoder
prompt_encoder
mask_decoder

And more specifically, the structure looks like this:

# SAM (Segment Anything Model)
- model: SAMModel
  - image_encoder: ImageEncoderViT
    - patch_embed: PatchEmbed
      - proj: Conv2d(3, 768, kernel_size=16, stride=16)
    - blocks: ModuleList
      - Block[0-11] (12 Transformer Blocks)
        - norm1: LayerNorm(768)
        - attn: REAttention
          - qkv: Linear(768 → 2304)
          - proj: Linear(768 → 768)
        - norm2: LayerNorm(768)
        - mlp: MLPBlock
          - lin1: Linear(768 → 3072)
          - lin2: Linear(3072 → 768)
          - act: GELU
    - neck: Sequential
      - Conv2d(768 → 256, 1x1)
      - LayerNorm2d
      - Conv2d(256 → 256, 3x3, padding=1)
      - LayerNorm2d

  - prompt_encoder: PromptEncoder
    - pe_layer: PositionEmbeddingRandom
    - point_embeddings: ModuleList (4 x Embedding(1, 256))
    - not_a_point_embed: Embedding(1, 256)
    - no_mask_embed: Embedding(1, 256)
    - mask_downscaling: Sequential
      - Conv2d(1 → 4, kernel=2, stride=2)
      - LayerNorm2d
      - GELU
      - Conv2d(4 → 16, kernel=2, stride=2)
      - LayerNorm2d
      - GELU
      - Conv2d(16 → 256, kernel=1, stride=1)

  - mask_decoder: MaskDecoder
    - transformer: TwoWayTransformer
      - layers: ModuleList (2 x TwoWayAttentionBlock)
        - self_attn: Attention
        - cross_attn_token_to_image: Attention
        - cross_attn_image_to_token: Attention
        - norm1, norm2, norm3, norm4: LayerNorm(256)
        - mlp: MLPBlock
          - lin1: Linear(256 → 2048)
          - lin2: Linear(2048 → 256)
          - act: ReLU
      - final_attn_token_to_image: Attention
      - norm_final_attn: LayerNorm(256)
    - iou_token: Embedding(1, 256)
    - mask_tokens: Embedding(4, 256)
    - output_upscaling: Sequential
      - ConvTranspose2d(256 → 64, kernel=2, stride=2)
      - LayerNorm2d
      - GELU
      - ConvTranspose2d(64 → 32, kernel=2, stride=2)
      - GELU
    - output_hypernetworks_mlps: ModuleList (4 x MLP)
      - Linear(256 → 256)
      - Linear(256 → 256)
      - Linear(256 → 32)
    - iou_prediction_head: MLP
      - Linear(256 → 256)
      - Linear(256 → 256)
      - Linear(256 → 4)

Now let’s try out three different segmentation methods — point, bounding box, and fully automatic!

We’ll use the following image of a dog:

🟢 Prompt Type 1: Point-Based Segmentation

You specify a single point, and the model segments the object around it.

from ultralytics import SAM
import cv2
import matplotlib.pyplot as plt
import numpy as np

model = SAM("sam_b.pt")

img_name = "dog.jpg"
my_points = [3000, 2000]

results = model(img_name, points=[my_points], labels=[1])

image = cv2.imread(img_name)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

mask = None
for result in results:
    if result.masks is not None:
        mask = result.masks.data[0].cpu().numpy()
        break

if mask is None:
    raise ValueError("No mask found.")

def draw_point(img, point, color=(255, 0, 0), radius=15):
    img_with_point = img.copy()
    cv2.circle(img_with_point, tuple(point), radius, color, -1)
    return img_with_point

image_with_point = draw_point(image_rgb, my_points)

mask_visual = (mask * 255).astype(np.uint8)
colored_mask = cv2.applyColorMap(mask_visual, cv2.COLORMAP_JET)
combined = cv2.addWeighted(image_with_point, 0.7, colored_mask, 0.5, 0)

plt.figure(figsize=(20, 6))

plt.subplot(1, 3, 1)
plt.imshow(image_with_point)
plt.title("Original + Point")
plt.axis("off")

plt.subplot(1, 3, 2)
plt.imshow(mask, cmap='gray')
plt.title("Mask Only")
plt.axis("off")

plt.subplot(1, 3, 3)
plt.imshow(combined)
plt.title("Overlay: Image + Mask + Point")
plt.axis("off")

plt.tight_layout()
plt.show()

As shown above, the model extracts the dog’s head based on the red point!

🔴 Prompt Type 2: Bounding Box-Based Segmentation

You provide a rectangular bounding box, and the model segments the object inside that region.

from ultralytics import SAM
import cv2
import matplotlib.pyplot as plt
import numpy as np

model = SAM("sam_b.pt")

img_name = "dog.jpg"
my_bboxes = [1430.2, 828, 4471.9, 3836.4]

results = model(img_name, bboxes=my_bboxes)

image = cv2.imread(img_name)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

plt.figure(figsize=(10, 10))
plt.imshow(image_rgb)

for result in results:
    if result.masks is not None:
        mask = result.masks.data[0].cpu().numpy()
        plt.imshow(mask, alpha=0.5, cmap='jet')

rect = plt.Rectangle((my_bboxes[0], my_bboxes[1]),
                     my_bboxes[2] - my_bboxes[0],
                     my_bboxes[3] - my_bboxes[1],
                     linewidth=2, edgecolor='red', facecolor='none')
plt.gca().add_patch(rect)

plt.title(f"SAM Segmentation with Box Prompt on {img_name}")
plt.axis('off')
plt.show()

The red bounding box lets SAM extract the subject precisely within the box.

⚙️ Prompt Type 3: No Prompt (Automatic Mask Generation / AMG)

This time, we don’t provide any prompts at all — SAM automatically segments the image into multiple parts.

from ultralytics import SAM
import cv2
import matplotlib.pyplot as plt
import numpy as np

print("\n--- Method 3: Automatic Mask Generation (No Prompt) ---")

model_amg = SAM("sam_b.pt")
image_path = "dog.jpg"

image = cv2.imread(image_path)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

print(f"Running prompt-free segmentation on '{image_path}'...")
results_amg = model_amg(image_path)
print("Done.")

plt.figure(figsize=(12, 10))
plt.imshow(image_rgb)
plt.title("Method 3: SAM Automatic Mask Generation")
plt.axis('off')

if results_amg and results_amg[0].masks is not None:
    masks_amg = results_amg[0].masks.data.cpu().numpy()
    print(f"{len(masks_amg)} masks generated")

    np.random.seed(42)
    for i, mask in enumerate(masks_amg):
        color = np.concatenate([np.random.random(3) * 255, np.array([128])], axis=0)
        h, w = mask.shape[-2:]
        mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
        plt.imshow(mask_image.astype(np.uint8))
else:
    print("No masks found.")

plt.show()

The image is segmented into 33 distinct masks, each representing different parts of the dog or the background.

🎉 Conclusion

With just a few lines of Python, we’ve performed professional-level segmentation using SAM!
The model’s performance is truly impressive. Whether using points, boxes, or no prompt at all,
SAM is a powerful tool for background removal and object segmentation.

Let’s keep exploring!

(한국어) 파이썬으로 누끼따기!? SAM 실습

누끼(일본어: 抜き)는 주로 시각 디자인 현장에서 쓰이는 용어로,
원본 이미지의 피사체로부터 배경을 분리하기 위해 피사체의 외곽선을 따는 것을 ‘누끼’ 혹은 ‘누끼 따기’라고 지칭합니다!

오늘은 지난 포스팅 에서 알아본 SAM의 원리를 바탕으로 실습을 진행해보겠습니다!!
Python 코드로 누끼따기!!
SAM (Segment Anything Model)을 활용해서 실습해보아요~~ SAM 모델을 다운받는 법은 Meta 공식 사이트에서 다운받는 방법, Huggingface 모델 받는 방법 등이 가장 근본적인 방법이며,
오늘은 보다 간단하게 사용할수 있는 Ultralytics 를 활용해서쉽게 SAM을 진행해보겠습니다!

아래와 같이 ultralytics로 부터 SAM 모델을 불러옵니다~~

  
from ultralytics import SAM
import cv2
import matplotlib.pyplot as plt
import numpy as np

# 모델 로드
model = SAM("sam_b.pt")
model.eval()

로드된 모델구조를 분석해보면!!
간단히는 image_encoder / prompt_encoder / mask_decoder 로,
세부적으로는는 아래와 같았습니다~!

# SAM (Segment Anything Model)
- model: SAMModel
  - image_encoder: ImageEncoderViT
    - patch_embed: PatchEmbed
      - proj: Conv2d(3, 768, kernel_size=16, stride=16)
    - blocks: ModuleList
      - Block[0-11] (12개의 Transformer Blocks)
        - norm1: LayerNorm(768)
        - attn: REAttention
          - qkv: Linear(768 → 2304)
          - proj: Linear(768 → 768)
        - norm2: LayerNorm(768)
        - mlp: MLPBlock
          - lin1: Linear(768 → 3072)
          - lin2: Linear(3072 → 768)
          - act: GELU
    - neck: Sequential
      - Conv2d(768 → 256, 1x1)
      - LayerNorm2d
      - Conv2d(256 → 256, 3x3, padding=1)
      - LayerNorm2d

  - prompt_encoder: PromptEncoder
    - pe_layer: PositionEmbeddingRandom
    - point_embeddings: ModuleList (4 x Embedding(1, 256))
    - not_a_point_embed: Embedding(1, 256)
    - no_mask_embed: Embedding(1, 256)
    - mask_downscaling: Sequential
      - Conv2d(1 → 4, kernel=2, stride=2)
      - LayerNorm2d
      - GELU
      - Conv2d(4 → 16, kernel=2, stride=2)
      - LayerNorm2d
      - GELU
      - Conv2d(16 → 256, kernel=1, stride=1)

  - mask_decoder: MaskDecoder
    - transformer: TwoWayTransformer
      - layers: ModuleList (2 x TwoWayAttentionBlock)
        - self_attn: Attention
        - cross_attn_token_to_image: Attention
        - cross_attn_image_to_token: Attention
        - norm1, norm2, norm3, norm4: LayerNorm(256)
        - mlp: MLPBlock
          - lin1: Linear(256 → 2048)
          - lin2: Linear(2048 → 256)
          - act: ReLU
      - final_attn_token_to_image: Attention
      - norm_final_attn: LayerNorm(256)
    - iou_token: Embedding(1, 256)
    - mask_tokens: Embedding(4, 256)
    - output_upscaling: Sequential
      - ConvTranspose2d(256 → 64, kernel=2, stride=2)
      - LayerNorm2d
      - GELU
      - ConvTranspose2d(64 → 32, kernel=2, stride=2)
      - GELU
    - output_hypernetworks_mlps: ModuleList (4 x MLP)
      - Linear(256 → 256)
      - Linear(256 → 256)
      - Linear(256 → 32)
    - iou_prediction_head: MLP
      - Linear(256 → 256)
      - Linear(256 → 256)
      - Linear(256 → 4)

이제~!!
점,bounding box, 아무것도 없이, 세가지 방식으로 진행해보겠습니다!!!

진행에는 아래의 강아지 이미지를 바탕으로 고고고!!

프롬포트1. Point 기반!!

Point 기반은 하나의 점을 제시하고,
그 점을 둘러싼 부분의 Segment 를 추출합니다!

  
from ultralytics import SAM
import cv2
import matplotlib.pyplot as plt
import numpy as np

# 모델 로드
model = SAM("sam_b.pt")

# 이미지 및 포인트 정의
img_name = "dog.jpg"
my_points = [3000, 2000]

# 세그멘테이션 수행
results = model(img_name, points=[my_points], labels=[1])

# 이미지 로드 및 변환
image = cv2.imread(img_name)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# 마스크 추출
mask = None
for result in results:
    if result.masks is not None:
        mask = result.masks.data[0].cpu().numpy()
        break

if mask is None:
    raise ValueError("마스크가 없습니다.")

# 빨간 점 그리기 함수
def draw_point(img, point, color=(255, 0, 0), radius=15):
    img_with_point = img.copy()
    cv2.circle(img_with_point, tuple(point), radius, color, -1)
    return img_with_point

# 이미지 + 점
image_with_point = draw_point(image_rgb, my_points)

# 마스크만 시각화용으로 변환
mask_visual = (mask * 255).astype(np.uint8)

# 마스크 컬러맵 오버레이 생성
colored_mask = cv2.applyColorMap(mask_visual, cv2.COLORMAP_JET)
combined = cv2.addWeighted(image_with_point, 0.7, colored_mask, 0.5, 0)

# 시각화
plt.figure(figsize=(20, 6))

# 1. 원본 이미지 + 점
plt.subplot(1, 3, 1)
plt.imshow(image_with_point)
plt.title("Original + Point")
plt.axis("off")

# 2. 마스크만
plt.subplot(1, 3, 2)
plt.imshow(mask, cmap='gray')
plt.title("Mask Only")
plt.axis("off")

# 3. 마스크 + 원본 이미지 + 점
plt.subplot(1, 3, 3)
plt.imshow(combined)
plt.title("Overlay: Image + Mask + Point")
plt.axis("off")

plt.tight_layout()
plt.show()

# 결과 출력
print("Segmentation Result:")
print(f"Number of masks: {len(results[0].masks.data)}")
print(f"Mask shape: {results[0].masks.data[0].shape}")

그 결과!!
첫번쨰 이미지와 같이 빨간점이 있는것을 중심으로 강아지 segment 를 실시하면!
두번째 이미지와 같이 마스크를 추출하게 되고,
이 두 이미지를 겹처보면 세번쨰와 같이 강아지의 얼굴 부분만을! 추출하게 됩니다~!

프롬포트2. Bounding Box 기반!!

Bounding Box 기반은 네 개의 점으로 bouding box를 제시하고,
그 박스 내의 부분의 Segment 를 추출합니다!

  
from ultralytics import SAM
import cv2
import matplotlib.pyplot as plt
import numpy as np

# 모델 로드
model = SAM("sam_b.pt")

img_name = "dog.jpg"

my_bboxes=[1430.2,   828,  4471.9, 3836.4]
# 박스 프롬프트로 추론 ([x_min, y_min, x_max, y_max])
results = model(img_name, bboxes=my_bboxes)

# 원본 이미지 로드 (시각화를 위해)
image = cv2.imread(img_name)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # BGR -> RGB 변환

# 결과 시각화
plt.figure(figsize=(10, 10))
plt.imshow(image_rgb)

# 마스크 오버레이
for result in results:
    if result.masks is not None:
        mask = result.masks.data[0].cpu().numpy()  # 첫 번째 마스크 추출
        plt.imshow(mask, alpha=0.5, cmap='jet')  # 마스크를 반투명하게 표시

# 박스 프롬프트 표시
rect = plt.Rectangle((my_bboxes[0], my_bboxes[1]), my_bboxes[2] - my_bboxes[0], my_bboxes[3] - my_bboxes[1], 
                     linewidth=2, edgecolor='red', facecolor='none', label=f'my_bboxes {my_bboxes}')
plt.gca().add_patch(rect)

# 제목 및 설정
plt.title(f"SAM Segmentation with Box Prompt on {img_name}")
plt.legend()
plt.axis('off')
plt.show()

# 추가 정보 출력 (선택 사항)
print("Segmentation Result:")
print(f"Number of masks: {len(results[0].masks.data)}")
print(f"Mask shape: {results[0].masks.data[0].shape}")

그 결과!!
빨간 Bbox(bounding box) 내에서 강아지 이미지를 추출해줍니다~!
참 쉽지요~?
두번째 이미지와 같이 마스크를 추출하게 되고,
이 두 이미지를 겹처보면 세번쨰와 같이 강아지의 얼굴 부분만을! 추출하게 됩니다~!

프롬포트3. 자율로 맡기기!! AMG 방식 기반!!

이번에는 별도의 프롬포트 없이 이미지를 세세하게 나누게됩니다!!

  
from ultralytics import SAM
import cv2
import matplotlib.pyplot as plt
import numpy as np

print("\n--- 방법 3: Automatic Mask Generation (AMG) 방식 (프롬프트 없음) ---")

# 모델 로드 (동일 모델 사용 가능)
model_amg = SAM("sam_b.pt")

# 이미지 경로
image_path = "dog.jpg"

# 원본 이미지 로드
image = cv2.imread(image_path)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# **프롬프트 없이** 추론 실행 (AMG 방식)
# ultralytics 라이브러리에서는 프롬프트를 제공하지 않으면
# 내부적으로 AMG와 유사한 방식으로 여러 세그먼트를 생성하려고 시도할 수 있습니다.
print(f"'{image_path}'에서 프롬프트 없이 (AMG 방식) 추론 중...")
results_amg = model_amg(image_path) # bboxes 인자 없음!
print("추론 완료.")

# 결과 시각화
plt.figure(figsize=(12, 10))
plt.imshow(image_rgb)
plt.title("방법 3: SAM Automatic Mask Generation (No Prompt)")
plt.axis('off')

# 마스크 오버레이
if results_amg and results_amg[0].masks is not None:
    masks_amg = results_amg[0].masks.data.cpu().numpy()
    print(f"AMG 방식 결과: {len(masks_amg)}개의 마스크 생성됨")

    # 여러 개의 마스크를 다른 색상으로 표시
    np.random.seed(42) # 일관된 색상 생성을 위한 시드 고정
    for i, mask in enumerate(masks_amg):
        # 각 마스크에 랜덤 색상 적용 (반투명)
        color = np.concatenate([np.random.random(3) * 255, np.array([128])], axis=0)
        h, w = mask.shape[-2:]
        mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
        plt.imshow(mask_image.astype(np.uint8)) # 정수형으로 변환하여 오버레이
        # print(f"  - Mask {i+1} shape: {mask.shape}") # 개별 마스크 정보 출력 (선택 사항)
else:
    print("AMG 방식 세그멘테이션 결과를 찾을 수 없습니다.")

plt.show()

# 추가 정보 출력
if results_amg and results_amg[0].masks is not None:
    print("\nAMG 방식 세그멘테이션 상세 정보:")
    print(f" - 생성된 총 마스크 수: {len(results_amg[0].masks.data)}")
    # AMG 결과에는 보통 각 마스크에 대한 추가 정보(점수, 영역 등)가 있을 수 있으나,
    # ultralytics 래퍼의 결과 구조를 확인해야 함.
    # print(f" - 결과 객체 타입: {type(results_amg[0])}") # 결과 구조 확인용

이를 통해서 아래와 같이 강아지 이미지가 여러개의 이미지로
분할된 것을 확인할수 있습니다!!

33개의 세그먼트로 분할되었네요~!

지금까지 알아본 SAM의 파이썬 실습!!

정말 모델의 성능이 놀랍습니다!!

AI, Experiment

This post is licensed under CC BY 4.0 by the author.