
🦖 DINO Practice: Running Object Detection with Pretrained Models


In the previous post, I studied the core ideas behind DINO.
Now let's install the DINO model from its GitHub repo and run object detection on our own images!

[Image: detection result]

  • Starting with the result, as always!
  • Just like DETR, it detects multiple objects in the image and visualizes them.
  • Let's walk through each step together in Python code!
  • This time there's no simple pip install: we need to clone the GitHub repo and build it ourselves. Let's get started! 😄

🔗 1. Clone the DINO repository from GitHub

🔗 Official GitHub Repo

A very well-organized GitHub repository.
Huge thanks to the author! 🙏


📦 2. Create a virtual environment and install dependencies

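# Create and activate an isolated conda environment
conda create --name DINO python=3.9
conda activate DINO

# Compiler toolchain and Python headers, needed for the build in step 3
sudo apt update && sudo apt install -y build-essential python3-dev
pip install cython

conda install -c conda-forge libstdcxx-ng
conda install -c pytorch pytorch torchvision

# Run this from the root of the cloned DINO repo
pip install -r requirements.txt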

⚙️ 3. Build and test the model code

cd models/dino/ops
python setup.py build install

# Unit test - should print all True if successful
python test.py

cd ../../..
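
Before moving on, it can be worth confirming from Python that the pieces installed so far are actually visible in the environment. The sketch below only checks whether a few modules can be found. `missing_modules` is a hypothetical helper, and `MultiScaleDeformableAttention` is my assumption for the name under which the compiled ops get installed, so adjust it if your build reports something different.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The last name is an assumption about what `setup.py` installs.
    required = ["torch", "torchvision", "MultiScaleDeformableAttention"]
    missing = missing_modules(required)
    print("all modules found" if not missing else f"missing: {missing}")
```

An empty result means every listed module was found. It does not prove the ops run correctly, which is what `python test.py` is for.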

🗂️ 4. Prepare the COCO2017 dataset

Organize the COCO dataset as follows:

COCODIR/
├── train2017/
├── val2017/
└── annotations/
    ├── instances_train2017.json
    └── instances_val2017.json
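
A quick way to catch a misplaced folder is to check the layout programmatically. This is a minimal sketch using only the standard library. `missing_coco_entries` and `EXPECTED_ENTRIES` are hypothetical names introduced here for illustration, not something the DINO repo provides.

```python
from pathlib import Path

# Entries expected under the dataset root, taken from the layout above.
EXPECTED_ENTRIES = [
    "train2017",
    "val2017",
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
]


def missing_coco_entries(coco_dir):
    """Return the expected entries that do not exist under `coco_dir`."""
    root = Path(coco_dir)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]
```

Calling `missing_coco_entries("COCODIR")` returns an empty list when everything is in place, and otherwise lists what still needs to be moved or downloaded.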

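📥 5. Download Pretrained Model

You can download the pretrained DINO-4scale, Swin-L (36 epochs) model from the following Google Drive link.
Whichever checkpoint you pick, it has to match the config file you load: the code in step 6 pairs checkpoint0033_4scale.pth with config/DINO/DINO_4scale.py, so a checkpoint trained with a different backbone would need the matching config instead.

🔗 https://drive.google.com/drive/folders/1qD5m1NmK0kjE5hh-G17XUX751WsEG-h_?usp=sharing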


🧠 6. Load the Model and Run Inference

Import Packages

# Run this from the root of the cloned DINO repo so that `main`, `util`, and `datasets` resolve
import os, sys
import torch, json
import numpy as np

from main import build_model_main
from util.slconfig import SLConfig
from datasets import build_dataset
from util.visualizer import COCOVisualizer
from util import box_ops

from PIL import Image
import datasets.transforms as T

Load COCO Class ID-to-Name Mapping

with open('{your_path1}/DINO/util/coco_id2name.json') as f:
    id2name = json.load(f)
# JSON object keys are always strings, so convert them back to integer class IDs
id2name = {int(k): v for k, v in id2name.items()}
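
The `int(k)` conversion matters because JSON object keys are always strings, while the model's predicted labels are integers. Here is a small self-contained illustration that uses a two-entry stand-in for `coco_id2name.json` (the real file has many more classes):

```python
import json

# A tiny stand-in for coco_id2name.json, for illustration only.
raw = json.loads('{"1": "person", "2": "bicycle"}')
print(list(raw.keys()))  # ['1', '2']  (keys come back as strings)

# Same conversion as in the post: string keys become integer class IDs.
id2name = {int(k): v for k, v in raw.items()}
print(id2name[1])  # person
```

Without the conversion, a lookup with an integer label would raise a `KeyError`.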

Load Model Configuration and Checkpoint

model_config_path = "{your_path1}/DINO/config/DINO/DINO_4scale.py"
model_checkpoint_path = "{your_path2}/checkpoint0033_4scale.pth"

args = SLConfig.fromfile(model_config_path) 
args.device = 'cuda' 
model, criterion, postprocessors = build_model_main(args)

checkpoint = torch.load(model_checkpoint_path, map_location='cpu')
model.load_state_dict(checkpoint['model'])
_ = model.eval()

Load and Preprocess the Image

image = Image.open("{image_path}/catch_rugby_ball_001480.jpg").convert("RGB")

transform = T.Compose([
    T.RandomResize([800], max_size=1333),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
image, _ = transform(image, None)
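
As I understand the DETR-style transforms this repo uses, `T.RandomResize([800], max_size=1333)` scales the shorter side of the image to 800 pixels while capping the longer side at 1333. The sketch below re-implements that size rule in plain Python to make the effect concrete. `resized_hw` is a hypothetical helper, not part of the DINO code, and the real transform may round slightly differently.

```python
def resized_hw(width, height, size=800, max_size=1333):
    """Return (height, width) after scaling the shorter side to `size`,
    shrinking further if the longer side would exceed `max_size`."""
    shorter, longer = min(width, height), max(width, height)
    if longer / shorter * size > max_size:
        # Cap the longer side by choosing a smaller target for the shorter side.
        size = int(round(max_size * shorter / longer))
    if width < height:
        return int(size * height / width), size
    return size, int(size * width / height)


print(resized_hw(640, 480))    # (800, 1066)
print(resized_hw(1920, 1080))  # (750, 1333)
```

The second call shows the cap in action: a wide 1920x1080 image cannot reach a height of 800 without its width exceeding 1333, so the height stops at 750.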

Run Inference

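with torch.no_grad():  # inference only: skip gradient tracking
    output = model.cuda()(image[None].cuda())
# A target size of (1.0, 1.0) keeps the predicted boxes normalized to [0, 1]
output = postprocessors['bbox'](output, torch.Tensor([[1.0, 1.0]]).cuda())[0]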
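
Passing `[[1.0, 1.0]]` as the target size leaves the boxes in normalized coordinates. If pixel coordinates are needed instead, my understanding is that the postprocessor scales the boxes by the image height and width it is given. The plain-Python sketch below shows that scaling on a single box. `to_pixel_xyxy` is a hypothetical helper, not a function from the repo.

```python
def to_pixel_xyxy(box, image_height, image_width):
    """Scale a normalized (x0, y0, x1, y1) box to pixel coordinates."""
    x0, y0, x1, y1 = box
    return (x0 * image_width, y0 * image_height,
            x1 * image_width, y1 * image_height)


print(to_pixel_xyxy((0.25, 0.5, 0.75, 1.0), image_height=800, image_width=1000))
# (250.0, 400.0, 750.0, 800.0)
```

Note the order: x values scale with the width and y values with the height.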

🖼️ 7. Visualize the Results

threshold = 0.3  # set confidence threshold

vslzr = COCOVisualizer()

scores = output['scores']
labels = output['labels']
# Convert (x0, y0, x1, y1) boxes to (cx, cy, w, h) before handing them to the visualizer
boxes = box_ops.box_xyxy_to_cxcywh(output['boxes'])
select_mask = scores > threshold  # keep only detections above the threshold

box_label = [id2name[int(item)] for item in labels[select_mask]]
pred_dict = {
    'boxes': boxes[select_mask],
    'size': torch.Tensor([image.shape[1], image.shape[2]]),  # (height, width)
    'box_label': box_label
}

vslzr.visualize(image, pred_dict, savedir=None, dpi=100)

And just like that, boom! The detection works beautifully! 🙌

[Image: detection result]
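
To make the two tensor operations in the visualization step less opaque, here is a plain-Python sketch of what `select_mask` and `box_ops.box_xyxy_to_cxcywh` do. The helper names and the sample detections are made up for illustration, and the real code works on batched tensors rather than Python lists.

```python
def xyxy_to_cxcywh(box):
    """Convert (x0, y0, x1, y1) corners to (center_x, center_y, width, height)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0)


def keep_confident(scores, boxes, labels, threshold=0.3):
    """Keep only the detections whose score is strictly above `threshold`."""
    return [
        (score, xyxy_to_cxcywh(box), label)
        for score, box, label in zip(scores, boxes, labels)
        if score > threshold
    ]


# Made-up detections, for illustration only.
scores = [0.9, 0.2]
boxes = [(0.0, 0.0, 0.5, 1.0), (0.5, 0.5, 1.0, 1.0)]
labels = ["person", "sports ball"]
print(keep_confident(scores, boxes, labels))
# [(0.9, (0.25, 0.5, 0.5, 1.0), 'person')]
```

Raising the threshold gives fewer but more reliable boxes, and lowering it shows more candidates at the cost of extra false positives.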


✅ Wrap-Up

At first, I was a bit nervous about using a model that isn't available through timm or HuggingFace yet,
but once I got into it, everything was quite manageable!
The DINO repo in particular is incredibly well organized,
which made the process even smoother.

I'll continue experimenting with more models and sharing my findings. Stay tuned! 🚀



This post is licensed under CC BY 4.0 by the author.