Object Detection with DETR! Python Practice!!
In the previous post we studied DETR!! Today we will use that DETR model to run Object Detection ourselves!
- Let's start with the conclusion again!!!
- DETR finds multiple objects in the image and marks each of them!!
- It labels the many people and the frisbee, each with its confidence score!!
- Let's explore the process together with Python code!!
1. Loading the DETR model from Hugging Face!!
Today's DETR model will be loaded from Hugging Face, using the `facebook/detr-resnet-50` model.
```python
import torch
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

# 1. Set the device (use CUDA if a GPU is available)
device = "cuda" if torch.cuda.is_available() else "cpu"

# 2. Load the pretrained DETR model and its image processor
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50").to(device)
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")
```
The code above loads the pretrained `facebook/detr-resnet-50` model and processor. Let's look at what each one does!
processor: Image Processor (DetrImageProcessor)
Role: Preprocesses the input image into the format the DETR model expects.
Main Tasks:
- Image Resizing: Resizes the input image to the size the model expects (with this checkpoint's settings, the shortest edge becomes 800 pixels and the longest edge is capped at 1333).
- Image Normalization: Rescales pixel values to the [0, 1] range and then normalizes them with a per-channel mean and standard deviation, which keeps training and inference stable.
- Tensor Conversion: Converts the image into a tensor that deep learning frameworks such as PyTorch can use.
- Handling Model-Specific Requirements: Performs additional preprocessing that the model architecture needs (e.g., padding and generating the mask that marks the padded area).
If we print the processor, we can see its preprocessing settings as below:
```
DetrImageProcessor {
  "do_convert_annotations": true,
  "do_normalize": true,
  "do_pad": true,
  "do_rescale": true,
  "do_resize": true,
  "format": "coco_detection",
  "image_mean": [
    0.485,
    0.456,
    0.406
  ],
  "image_processor_type": "DetrImageProcessor",
  "image_std": [
    0.229,
    0.224,
    0.225
  ],
  "pad_size": null,
  "resample": 2,
  "rescale_factor": 0.00392156862745098,
  "size": {
    "longest_edge": 1333,
    "shortest_edge": 800
  }
}
```
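To make those settings a bit more concrete, here is a minimal pure-Python sketch of the arithmetic they imply: an aspect-preserving resize followed by rescale and normalize. The function names `resized_size` and `normalize_pixel` are made up for illustration and are not part of `transformers`. The library's own rounding may differ by a pixel.

```python
# Values copied from the processor config printed above.
IMAGE_MEAN = (0.485, 0.456, 0.406)
IMAGE_STD = (0.229, 0.224, 0.225)
RESCALE_FACTOR = 1 / 255  # shown as 0.00392156862745098 in the config


def resized_size(width, height, shortest_edge=800, longest_edge=1333):
    """Hypothetical helper: aspect-preserving target size as (width, height)."""
    # Scale so the shorter side reaches `shortest_edge` ...
    scale = shortest_edge / min(width, height)
    # ... unless that would push the longer side past `longest_edge`.
    if max(width, height) * scale > longest_edge:
        scale = longest_edge / max(width, height)
    return round(width * scale), round(height * scale)


def normalize_pixel(rgb):
    """Hypothetical helper: rescale one 0-255 RGB pixel, then standardize it."""
    return tuple(
        (value * RESCALE_FACTOR - mean) / std
        for value, mean, std in zip(rgb, IMAGE_MEAN, IMAGE_STD)
    )


print(resized_size(640, 480))         # shorter side scaled up to 800
print(resized_size(2000, 500))        # longer side capped at 1333
print(normalize_pixel((255, 255, 255)))
```

A very wide image hits the `longest_edge` cap first, so its shorter side ends up smaller than 800.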
model: DETR Object Detection Model (DetrForObjectDetection)
Role: The core component. It takes the preprocessed image, detects the objects in it, and predicts the location and class of each one.
Main Tasks:
- Feature Extraction: Extracts the visual features that matter for object detection from the input image (here with a ResNet-50 backbone).
- Transformer Encoder-Decoder: Passes the extracted features through the Transformer to capture the relationships between objects in the image and build a representation of each object.
- Object Prediction: Finally outputs the bounding box coordinates, the class label, and the confidence score of each detected object.
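As a rough illustration of where the confidence score comes from, the sketch below turns the class logits of one prediction slot into a label index and a score. It assumes, as in the DETR design, that the last logit stands for "no object" and is dropped before picking the best class. The helper names and the logit values are invented for the example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_query(logits):
    """Hypothetical helper: (best_class_index, confidence) for one prediction slot.

    Assumption: the last entry of `logits` is the "no object" class.
    """
    probabilities = softmax(logits)
    object_probabilities = probabilities[:-1]  # drop "no object"
    best = max(range(len(object_probabilities)), key=object_probabilities.__getitem__)
    return best, object_probabilities[best]


# Invented logits: three real classes plus the trailing "no object" entry.
confident_slot = score_query([0.1, 4.0, 0.3, 1.0])
empty_slot = score_query([0.0, 0.0, 0.0, 5.0])
print(confident_slot)  # class 1 wins with a high score
print(empty_slot)      # "no object" dominates, so the best real class scores low
```

Slots where "no object" dominates end up with a low score, which is why a threshold such as 0.7 filters them out.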
The DETR model is structured as shown below:
2. Starting Object Detection with DETR!
It's done with just a few lines of simple code!!!
I prepared the image above, in which several people are playing with a frisbee! And then!
```python
import torch
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

# 1. Set the device (use CUDA if a GPU is available)
device = "cuda" if torch.cuda.is_available() else "cpu"

# 2. Load the pretrained DETR model and its image processor
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50").to(device)
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")

# 3. Load catch_frisbee.jpg from the local directory
image_path = "catch_frisbee.jpg"
image = Image.open(image_path).convert("RGB")  # make sure the image has 3 channels

# 4. Preprocess the image (convert to the DETR model input format)
inputs = processor(images=image, return_tensors="pt").to(device)

# 5. Model inference (no gradients needed)
with torch.no_grad():
    outputs = model(**inputs)

# 6. Post-process the results (convert to scores, labels, and pixel-space boxes)
target_sizes = torch.tensor([image.size[::-1]])  # PIL gives (width, height); DETR wants (height, width)
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes)[0]

# 7. Print the detected objects with confidence above 70%
for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    if score > 0.7:
        box = [round(i, 2) for i in box.tolist()]  # (x_min, y_min, x_max, y_max)
        print(f"Detected {model.config.id2label[label.item()]} with confidence {round(score.item(), 3)} at {box}")
```
If we briefly analyze the code above:
- It loads the model.
- It loads the `catch_frisbee.jpg` image!
- It preprocesses the image through the `processor`.
- It puts the result into the `model` and performs inference!
- It prints the detected content from `results`!
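One detail hidden inside the post-processing step is the box format. DETR predicts boxes as normalized (center_x, center_y, width, height), and the post-processing turns them into pixel-space corner coordinates. The sketch below shows that conversion for a single box. `cxcywh_to_xyxy` is a name chosen for this example, and the sample box is made up.

```python
def cxcywh_to_xyxy(box, image_width, image_height):
    """Hypothetical helper: normalized (cx, cy, w, h) -> pixel (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    x_min = (cx - w / 2) * image_width
    y_min = (cy - h / 2) * image_height
    x_max = (cx + w / 2) * image_width
    y_max = (cy + h / 2) * image_height
    return x_min, y_min, x_max, y_max


# Made-up box: centered in the image, 20% of its width and 40% of its height.
corners = cxcywh_to_xyxy((0.5, 0.5, 0.2, 0.4), image_width=640, image_height=480)
print([round(value, 2) for value in corners])
```

This is also why `target_sizes` is needed: without the original image size, the normalized boxes cannot be mapped back to pixels.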
Then the output is as shown below! For each detected object it tells us the class label, the confidence score, and finally the bounding box coordinates as (x_min, y_min, x_max, y_max) in pixels!
```
Detected person with confidence 0.783 at [12.91, 355.33, 32.23, 383.66]
Detected person with confidence 0.999 at [279.08, 255.76, 365.66, 423.82]
Detected person with confidence 0.995 at [533.57, 280.23, 584.71, 401.82]
Detected umbrella with confidence 0.744 at [459.41, 324.56, 496.24, 340.89]
Detected person with confidence 0.933 at [488.93, 340.06, 510.23, 376.37]
Detected person with confidence 0.835 at [0.01, 355.79, 11.03, 384.31]
Detected person with confidence 0.906 at [261.05, 346.35, 284.02, 378.22]
Detected person with confidence 0.99 at [574.15, 301.1, 605.79, 395.45]
Detected person with confidence 0.713 at [244.5, 349.68, 262.29, 378.9]
Detected person with confidence 0.997 at [132.21, 31.6, 310.32, 329.97]
Detected person with confidence 0.732 at [349.66, 352.63, 365.67, 378.28]
Detected person with confidence 0.796 at [209.17, 326.9, 232.89, 355.65]
Detected person with confidence 0.777 at [149.0, 347.84, 169.28, 381.43]
Detected person with confidence 0.991 at [163.45, 299.99, 206.14, 399.0]
Detected frisbee with confidence 1.0 at [181.55, 139.33, 225.96, 161.49]
Detected person with confidence 0.734 at [200.95, 350.37, 229.14, 380.88]
Detected person with confidence 0.737 at [467.46, 347.11, 483.07, 376.49]
Detected person with confidence 0.978 at [413.58, 253.38, 465.11, 416.57]
Detected person with confidence 0.73 at [597.38, 342.37, 613.34, 380.89]
Detected person with confidence 0.998 at [304.64, 70.92, 538.5, 410.45]
```
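If you want to summarize a listing like this, a small parser is enough. The sketch below reads a few of the printed lines back into (label, score, box) tuples, counts the objects per class, and keeps only the very confident ones. The regular expression and the helper `parse_detection` are my own for this example, and the 0.9 cutoff is an arbitrary choice.

```python
import re
from collections import Counter

# Matches the print format used in the detection loop above.
LINE_PATTERN = re.compile(
    r"Detected (?P<label>.+?) with confidence (?P<score>[\d.]+) at \[(?P<box>[^\]]+)\]"
)


def parse_detection(line):
    """Hypothetical helper: one printed line -> (label, score, [x_min, y_min, x_max, y_max])."""
    match = LINE_PATTERN.match(line)
    if match is None:
        raise ValueError(f"Unrecognized detection line: {line!r}")
    box = [float(value) for value in match["box"].split(",")]
    return match["label"], float(match["score"]), box


# Four lines copied from the output above.
sample_lines = [
    "Detected person with confidence 0.783 at [12.91, 355.33, 32.23, 383.66]",
    "Detected person with confidence 0.999 at [279.08, 255.76, 365.66, 423.82]",
    "Detected umbrella with confidence 0.744 at [459.41, 324.56, 496.24, 340.89]",
    "Detected frisbee with confidence 1.0 at [181.55, 139.33, 225.96, 161.49]",
]

detections = [parse_detection(line) for line in sample_lines]
counts = Counter(label for label, _, _ in detections)
very_confident = [d for d in detections if d[1] >= 0.9]  # arbitrary stricter cutoff

print(counts)
print([label for label, _, _ in very_confident])
```

In a real script you would more likely build these tuples straight from `results` instead of parsing text.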
3. Visualization of Object Detection Results!!
Instead of just printing text, let's draw the bounding boxes on the image!
```python
import torch
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from PIL import Image
from transformers import DetrImageProcessor, DetrForObjectDetection

# 1. Set the device (use CUDA if a GPU is available)
device = "cuda" if torch.cuda.is_available() else "cpu"

# 2. Load the pretrained DETR model and its image processor
model = DetrForObjectDetection.from_pretrained("facebook/detr-resnet-50").to(device)
processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50")

# 3. Load catch_frisbee.jpg from the local directory
image_path = "catch_frisbee.jpg"
image = Image.open(image_path).convert("RGB")  # make sure the image has 3 channels

# 4. Preprocess the image (convert to the DETR model input format)
inputs = processor(images=image, return_tensors="pt").to(device)

# 5. Model inference (no gradients needed)
with torch.no_grad():
    outputs = model(**inputs)

# 6. Post-process the results (convert to scores, labels, and pixel-space boxes)
target_sizes = torch.tensor([image.size[::-1]])  # PIL gives (width, height); DETR wants (height, width)
results = processor.post_process_object_detection(outputs, target_sizes=target_sizes)[0]

# 7. Visualize the detected objects as bounding boxes on the image
fig, ax = plt.subplots(1, figsize=(10, 6))
ax.imshow(image)

for score, label, box in zip(results["scores"], results["labels"], results["boxes"]):
    if score > 0.7:  # only draw objects with confidence above 70%
        # Boxes are corner coordinates, so width and height must be derived
        x_min, y_min, x_max, y_max = [round(i, 2) for i in box.tolist()]
        rect = patches.Rectangle((x_min, y_min), x_max - x_min, y_max - y_min,
                                 linewidth=2, edgecolor="r", facecolor="none")
        ax.add_patch(rect)
        ax.text(x_min, y_min, f"{model.config.id2label[label.item()]}: {round(score.item(), 2)}",
                fontsize=12, bbox=dict(facecolor="yellow", alpha=0.5))

# 8. Save the result
output_path = "detr_output.jpg"  # filename to save
ax.axis("off")  # remove the axes
plt.savefig(output_path, bbox_inches="tight")
plt.show()
print(f"Detection result saved as {output_path}")
```
Through the code above, the detected objects are visualized and saved as `detr_output.jpg`!!
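The one step that is easy to get wrong when drawing is the rectangle itself: matplotlib's `Rectangle` takes an anchor point plus a width and a height, while the detector returns two corners. Here is a tiny worked example using the frisbee box from the output above. `corners_to_rectangle` is just an illustrative name.

```python
def corners_to_rectangle(box):
    """Hypothetical helper: (x_min, y_min, x_max, y_max) -> ((x, y), width, height)."""
    x_min, y_min, x_max, y_max = box
    return (x_min, y_min), x_max - x_min, y_max - y_min


frisbee_box = [181.55, 139.33, 225.96, 161.49]  # copied from the printed output
anchor, width, height = corners_to_rectangle(frisbee_box)
print(anchor, round(width, 2), round(height, 2))  # (181.55, 139.33) 44.41 22.16
```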
Object detection, it's really easy, right?
However, it took 8.5 seconds to detect the objects in a single image... that's still a bit slow!
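To see where that time goes, it helps to time each step separately, since the post does not say whether the 8.5 seconds includes loading the model. Below is a minimal timing sketch with `time.perf_counter`. `time_call` is a helper invented for this example, and the workload is a stand-in so the snippet runs without the model. In practice you would pass something like `lambda: model(**inputs)`, and on a GPU call `torch.cuda.synchronize()` before reading the clock because CUDA work runs asynchronously.

```python
import time


def time_call(fn, repeats=3):
    """Hypothetical helper: run fn() `repeats` times, return elapsed seconds per run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload so the example runs anywhere; swap in the real inference call.
durations = time_call(lambda: sum(i * i for i in range(100_000)))
print(f"first run: {durations[0]:.4f}s, best run: {min(durations):.4f}s")
```

Comparing the first run with the best run gives a hint of how much is one-time warm-up rather than steady-state inference.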