🖥️ Studying CAM with Python!


Today, we will dig into CAM (Class Activation Map) in detail using Python code!! Before we begin, the necessary packages are listed below!!
Don't worry, everything runs on a CPU, no GPU required~!^^

```python
import torch
from torchvision import models, transforms
from PIL import Image
import requests
from io import BytesIO
import matplotlib.pyplot as plt
import numpy as np
import cv2
```

CAM basically starts with image classification!
Today, we aim to create a CAM image based on a ResNet classification model trained on ImageNet!!!

What is ImageNet!?

image_net

  • Contains over 14 million images, organized into a hierarchy of roughly 20,000+ noun categories (based on WordNet).
  • Made a significant contribution to the development of deep learning in computer vision.
  • ResNet is also trained on this ImageNet data!!

What is ResNet?

resnet

  • An innovative model in the vision field: an important architecture that greatly improved image recognition performance!! Announced in 2015 by Microsoft Research!!
  • Overcame the difficulty of training deep neural networks with residual connections.
  • Residual connections: prevent vanishing gradients by adding the learned change back onto the input (see the sketch right after this list)!!
  • Enables truly deep networks (DNNs): effective learning is possible even in deep layers!
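
To make the residual idea concrete, here is a minimal PyTorch sketch (my own simplified illustration, not torchvision's exact BasicBlock): the block learns a change F(x) and adds it back onto the input x.

```python
import torch
import torch.nn as nn

class TinyResidualBlock(nn.Module):
    """Simplified residual block: output = ReLU(F(x) + x)."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))  # first half of the learned change F(x)
        out = self.bn2(self.conv2(out))           # second half of F(x)
        return self.relu(out + x)                 # skip connection: add the input back

# Shapes are preserved, so such blocks can be stacked very deep
block = TinyResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```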

Code Start!!

```python
# ✅ Loading ImageNet class index (for dog class identification)
import json
imagenet_url = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
imagenet_classes = requests.get(imagenet_url).text.splitlines()

# ✅ Loading test image (e.g., from the internet)
img_url = "https://images.unsplash.com/photo-1558788353-f76d92427f16"  # Dog photo
response = requests.get(img_url)
img = Image.open(BytesIO(response.content)).convert("RGB")

# ✅ Loading pre-trained ResNet18 (inference without training)
model = models.resnet18(pretrained=True)
model.eval()
```

Through the above process, we load the ImageNet class labels, the dog photo, and finally the pre-trained ResNet18 model!
You can also see the model structure via eval(), as shown below~~
We will explore the detailed structure of the model in the next ResNet study!
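
(In a notebook, model.eval() returns the model and the notebook displays its repr; in a plain script you can print the same structure yourself.)

```python
print(model)  # prints the module tree shown below
```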

View ResNet Detailed Structure

```
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)
```

Preprocessing Start!!

```python
# ✅ Extracting dog classes (simple method: names containing 'golden retriever')
dog_classes = [i for i, name in enumerate(imagenet_classes) if 'golden retriever' in name.lower()]

# ✅ Image preprocessing
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
input_tensor = transform(img).unsqueeze(0)  # shape: [1, 3, 224, 224]
```
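
One caveat: torchvision's pretrained models were trained on normalized inputs, and the official preprocessing adds a Normalize step after ToTensor. The pipeline above skips it and still classifies this photo correctly, but the canonical recipe (note: using it will change the exact logit values shown later) looks like this:

```python
# Standard ImageNet normalization for torchvision's pretrained weights
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```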

Now all preparations are complete!! Letโ€™s put the data into the model and perform inference!!

Simple Classification Model!! (Fully Connected Layer + Softmax)

```python
# ✅ Load pre-trained ResNet18 (inference without training)
model = models.resnet18(pretrained=True)
model.eval()

# ✅ Inference
with torch.no_grad():
    output = model(input_tensor)
    pred_class = output.argmax(dim=1).item()

# ✅ Result check
pred_label = imagenet_classes[pred_class]
is_dog = pred_class in dog_classes

print(f"Predicted label: {pred_label}")
print("🦴 Is it a dog?", "✅ Yes" if is_dog else "❌ No")
```

Through the code above, you can check whether the image is simply classified as a dog!
This is how a plain classification model worked before CAM~~

If you actually examine the model's output vector,

```python
output[0][205:210]
```

You can see that the value at index 207 is indeed the largest, at 13.7348!
(Since argmax(dim=1) returned 207, it has to be the highest value, right!?)

```
tensor([ 9.8655,  6.4875, 13.7348, 11.1263,  8.8567])
```
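
Instead of slicing around index 207 by hand, you can also read the top scores off directly with torch.topk (a small extra snippet of mine, not in the original code):

```python
# Show the five highest logits and their class names
top5 = torch.topk(output[0], k=5)
for score, idx in zip(top5.values, top5.indices):
    print(f"{imagenet_classes[idx]}: {score.item():.4f}")
```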

CAM Start!!!!

Letโ€™s refer back to the CAM structure we summarized in the previous post.

| Step | Data Shape | Description |
| --- | --- | --- |
| 📷 Input image | [3, 224, 224] | RGB image |
| 🔄 CNN (ResNet) last conv output | [512, 7, 7] | 512 feature maps of size 7×7 |
| 🔥 CAM calculation: weighted sum of the last conv output and class_weight | [7, 7] | 7×7 feature map |
| 🔼 Final CAM image creation (upsample) | [224, 224] | Heatmap overlay possible on the original image |
| 📉 GAP (Global Average Pooling) | [512] | Channel-wise average vector of the [512, 7, 7] feature map |
| 🧮 FC layer | [N_classes] | Converts the GAP result to class scores |
| 🎯 Softmax | [N_classes] | Outputs predicted class probability values |

Load the model again, just like before!

```python
# ✅ Load pretrained ResNet18
model = models.resnet18(pretrained=True)
model.eval()
```

Feature Map Extraction!!

Now, the important part begins! Itโ€™s as follows!!

```python
features = []

def hook_fn(module, input, output):
    features.append(output)

model.layer4.register_forward_hook(hook_fn)  # Last conv block
```

  • hook_fn: a callback that captures data from inside the model. module is the layer object, input is the input tuple, and output is the output tensor!
  • model.layer4.register_forward_hook(hook_fn): attaches hook_fn to the end of model.layer4 so that the output of the last conv block (layer4) is stored in the features list.
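
A practical aside (not in the original code): register_forward_hook returns a handle, and the hook fires on every forward pass, so features keeps growing if you push several images through. A common cleanup pattern is:

```python
handle = model.layer4.register_forward_hook(hook_fn)
# ... run one or more forward passes ...
features.clear()  # reset the captured outputs between experiments
handle.remove()   # detach the hook when you are done
```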

After placing the hook_fn function at the end of the model's layer4,
we run the model in exactly the same way as in the simple classification step, as shown below.
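
Concretely, that forward pass is identical to the earlier one (it also appears in the full code at the end of the post); running it is what makes the hook fire and fill features:

```python
# ✅ Predict (the hook captures layer4's output as a side effect)
with torch.no_grad():
    output = model(input_tensor)
    pred_class = output.argmax(dim=1).item()
```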

Now, letโ€™s proceed with the [CNN(resnet) Last conv Output]!

| Step | Data Shape | Description |
| --- | --- | --- |
| 🔄 CNN (ResNet) last conv output | [512, 7, 7] | 512 feature maps of size 7×7 |

```python
# ✅ Get weights from the final linear layer
params = list(model.parameters())
fc_weights = params[-2]  # shape: [1000, 512]
class_weights = fc_weights[pred_class].detach().cpu().numpy()  # [512]

# ✅ Get feature map from hook
feature_map = features[0].squeeze(0).detach().cpu().numpy()  # [512, 7, 7]
```

Through this, we have extracted the [512, 7, 7] feature map!

Create the CAM!!!

Now, this is the process of creating a CAM image from this feature map!

| Step | Data Shape | Description |
| --- | --- | --- |
| 🔥 CAM calculation: weighted sum of the last conv output and class_weight | [7, 7] | 7×7 feature map |
| 🔼 Final CAM image creation (upsample) | [224, 224] | Heatmap overlay possible on the original image |

```python
# ✅ Compute CAM
cam = np.zeros((7, 7), dtype=np.float32)
for i in range(len(class_weights)):
    cam += class_weights[i] * feature_map[i]

cam = np.maximum(cam, 0)
cam = cam - np.min(cam)
cam = cam / np.max(cam)
cam = cv2.resize(cam, (224, 224))
heatmap = cv2.applyColorMap(np.uint8(255 * cam), cv2.COLORMAP_JET)
```

In the above process, we obtain a [7, 7] CAM by taking the weighted sum of class_weights and the [512, 7, 7] feature map from ResNet's last conv output! Then we normalize it and create the final heatmap image through resizing, i.e., upsampling!
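
In formula form this is CAM(x, y) = Σ_k w_k · f_k(x, y) over the 512 channels, so the loop can equivalently be written as a single contraction (same class_weights and feature_map arrays as above):

```python
# Weighted sum over the 512 channels in one call
cam = np.tensordot(class_weights, feature_map, axes=1)  # shape: [7, 7]
```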

Finally, we overlay the heatmap image on the original image for visualization!

```python
# ✅ Overlay CAM on original image
img_cv = np.array(transforms.Resize((224, 224))(img))[:, :, ::-1]  # PIL → OpenCV BGR
overlay = cv2.addWeighted(img_cv, 0.5, heatmap, 0.5, 0)

# ✅ Show
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.title("Original Image")
plt.imshow(img)
plt.axis('off')

plt.subplot(1, 2, 2)
plt.title(f"CAM: {imagenet_classes[pred_class]}")
plt.imshow(overlay[:, :, ::-1])  # Back to RGB
plt.axis('off')
plt.tight_layout()
plt.show()
```

Then!! You will get the image shown directly below, which we saw in the previous post~!

golden

Image Classification Using CAM!

In addition to this!!
We can also classify from within the CAM pipeline!!!
By passing the feature map through GAP and the FC layer and then computing the softmax, we can also see how confident the prediction is~!

| Step | Data Shape | Description |
| --- | --- | --- |
| 📉 GAP (Global Average Pooling) | [512] | Channel-wise average vector of the [512, 7, 7] feature map |
| 🧮 FC layer | [N_classes] | Converts the GAP result to class scores |
| 🎯 Softmax | [N_classes] | Outputs predicted class probability values |

```python
# ✅ GAP operation: [512, 7, 7] → [512]
gap_vector = feature_map.mean(axis=(1, 2))  # shape: [512]

# ✅ FC operation: [1000, 512] × [512] → [1000]
logits = np.dot(fc_weights.detach().cpu().numpy(), gap_vector)  # shape: [1000]

# ✅ Softmax
exp_logits = np.exp(logits - np.max(logits))  # numerical stability
probs = exp_logits / exp_logits.sum()

# ✅ Predicted class
gap_pred_class = np.argmax(probs)
gap_pred_label = imagenet_classes[gap_pred_class]

# ✅ Result comparison
print("\n✅ GAP → FC → Softmax based prediction result")
print(f"Predicted label: {gap_pred_label}")
print("🦴 Is it a dog?", "✅ Yes" if gap_pred_class in dog_classes else "❌ No")
```

After going through the above process!?
You can confirm the result:
Predicted label: golden retriever

Is it really?

```python
probs[205:210]
```

Looking at this, the value at index 207 is again the largest, right!?
Notice, however, that it differs from the 13.7348 in the earlier classification vector: softmax has converted the raw logits into probabilities!!

```
[1.7553568e-02, 6.1262056e-04, 8.4515899e-01, 6.3063554e-02, 6.3092457e-03]
```
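
Unlike the raw logits, these values are probabilities: non-negative and summing to 1 over all 1000 classes. A quick sanity check:

```python
print(probs.sum())        # ~1.0
print(probs[pred_class])  # ≈ 0.845 for index 207 (golden retriever)
```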

Through todayโ€™s process, we were able to understand the detailed operation of CAM well!!

View Full Code

```python
import torch
from torchvision import models, transforms
from PIL import Image
import requests
from io import BytesIO
import numpy as np
import matplotlib.pyplot as plt
import cv2
import json

imagenet_url = "https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt"
imagenet_classes = requests.get(imagenet_url).text.splitlines()

# ✅ Loading test image (e.g., from the internet)
img_url = "https://images.unsplash.com/photo-1558788353-f76d92427f16"  # Dog photo
response = requests.get(img_url)
img = Image.open(BytesIO(response.content)).convert("RGB")

# ✅ Loading pre-trained ResNet18 (inference without training)
model = models.resnet18(pretrained=True)
model.eval()

# ✅ Extracting dog classes (simple method: names containing 'golden retriever')
dog_classes = [i for i, name in enumerate(imagenet_classes) if 'golden retriever' in name.lower()]

# ✅ Image preprocessing
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])
input_tensor = transform(img).unsqueeze(0)  # shape: [1, 3, 224, 224]

# ✅ Inference
with torch.no_grad():
    output = model(input_tensor)
    pred_class = output.argmax(dim=1).item()

# ✅ Result check
pred_label = imagenet_classes[pred_class]
is_dog = pred_class in dog_classes

print(f"Predicted label: {pred_label}")
print("🦴 Is it a dog?", "✅ Yes" if is_dog else "❌ No")

# ✅ Load pretrained ResNet18
model = models.resnet18(pretrained=True)
model.eval()

# ✅ Hook to get final conv feature map
features = []

def hook_fn(module, input, output):
    features.append(output)

model.layer4.register_forward_hook(hook_fn)  # Last conv block

# ✅ Predict
with torch.no_grad():
    output = model(input_tensor)
    pred_class = output.argmax(dim=1).item()

# ✅ Get weights from the final linear layer
params = list(model.parameters())
fc_weights = params[-2]  # shape: [1000, 512]
class_weights = fc_weights[pred_class].detach().cpu().numpy()  # [512]

# ✅ Get feature map from hook
feature_map = features[0].squeeze(0).detach().cpu().numpy()  # [512, 7, 7]

# ✅ Compute CAM
cam = np.zeros((7, 7), dtype=np.float32)
for i in range(len(class_weights)):
    cam += class_weights[i] * feature_map[i]

# Normalize & resize
cam = np.maximum(cam, 0)
cam = cam - np.min(cam)
cam = cam / np.max(cam)
cam = cv2.resize(cam, (224, 224))
heatmap = cv2.applyColorMap(np.uint8(255 * cam), cv2.COLORMAP_JET)

# ✅ Overlay CAM on original image
img_cv = np.array(transforms.Resize((224, 224))(img))[:, :, ::-1]  # PIL → OpenCV BGR
overlay = cv2.addWeighted(img_cv, 0.5, heatmap, 0.5, 0)

# ✅ Show
plt.figure(figsize=(10, 5))
plt.subplot(1, 2, 1)
plt.title("Original Image")
plt.imshow(img)
plt.axis('off')

plt.subplot(1, 2, 2)
plt.title(f"CAM: {imagenet_classes[pred_class]}")
plt.imshow(overlay[:, :, ::-1])  # Back to RGB
plt.axis('off')
plt.tight_layout()
plt.show()

# ✅ Result text output
print(f"Predicted label: {imagenet_classes[pred_class]}")
print("🦴 Is it a dog?", "✅ Yes" if pred_class in dog_classes else "❌ No")
```

This post is licensed under CC BY 4.0 by the author.