🦖 DINO Practice: Running Object Detection with Pretrained Models!
In the previous post, I studied the core ideas behind DINO!!!
Now, let's dive into actually installing the DINO model from GitHub and running object detection on our own images~!
- Starting with the result, as always!!
- Just like DETR, it detects multiple objects in the image and visualizes them!!
- Let's walk through each step together in Python code!
- This time, there's no simple pip install; we need to download and build from the GitHub repo. Let's get started!! 🚀
📌 1. Clone the DINO repository from GitHub
🔗 Official GitHub Repo
A very well-organized GitHub repository.
Huge thanks to the author!! 😊
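For reference, a minimal clone step (assuming the official IDEA-Research/DINO repository, which is the one linked above):

```bash
# Clone the official DINO implementation and move into it
git clone https://github.com/IDEA-Research/DINO.git
cd DINO
```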
📦 2. Create a virtual environment and install dependencies
```bash
conda create --name DINO python=3.9
conda activate DINO
sudo apt update && sudo apt install -y build-essential python3-dev
pip install cython
conda install -c conda-forge libstdcxx-ng
conda install -c pytorch pytorch torchvision
pip install -r requirements.txt
```
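Before moving on, it's worth a quick sanity check that this PyTorch install actually sees your GPU, since the CUDA ops in the next step won't build without it. This check makes no DINO-specific assumptions:

```bash
# Should print the torch version and True; False means a CPU-only build is active
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```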
⚙️ 3. Build and test the model code
```bash
cd models/dino/ops
python setup.py build install
# Unit test - should print all True if successful
python test.py
cd ../../..
```
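If `test.py` prints all True, the compiled deformable-attention extension is installed. You can also confirm it imports cleanly from anywhere; the module name below comes from the Deformable-DETR lineage, which (as far as I know) DINO keeps:

```bash
# Importing the compiled CUDA extension directly is a quick smoke test
python -c "import MultiScaleDeformableAttention; print('deformable attention op OK')"
```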
🗂️ 4. Prepare the COCO2017 dataset
Organize the COCO dataset as follows:
```
COCODIR/
├── train2017/
├── val2017/
└── annotations/
    ├── instances_train2017.json
    └── instances_val2017.json
```
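If you don't have COCO2017 locally, the standard downloads from cocodataset.org unzip into exactly this layout. Note that train2017 is roughly 18 GB; for pure inference on your own images you can skip it and grab only val2017 plus the annotations:

```bash
# Fetch COCO2017 images and annotations into COCODIR/
mkdir -p COCODIR && cd COCODIR
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip train2017.zip && unzip val2017.zip && unzip annotations_trainval2017.zip
```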
📥 5. Download the Pretrained Model
You can download the pretrained DINO-4scale, Swin-L (36 epochs) model from the following Google Drive link:
🔗 https://drive.google.com/drive/folders/1qD5m1NmK0kjE5hh-G17XUX751WsEG-h_?usp=sharing
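On a headless server, one convenient option for Google Drive downloads is gdown; a browser download works just as well (the `--folder` flag needs a reasonably recent gdown version):

```bash
pip install gdown
# Downloads the shared folder's contents into the current directory
gdown --folder "https://drive.google.com/drive/folders/1qD5m1NmK0kjE5hh-G17XUX751WsEG-h_?usp=sharing"
```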
🧠 6. Load the Model and Run Inference
Import Packages
```python
import os, sys
import torch, json
import numpy as np
from main import build_model_main
from util.slconfig import SLConfig
from datasets import build_dataset
from util.visualizer import COCOVisualizer
from util import box_ops
from PIL import Image
import datasets.transforms as T
```
Load COCO Class ID-to-Name Mapping
```python
with open('{your_path1}/DINO/util/coco_id2name.json') as f:
    id2name = json.load(f)
id2name = {int(k): v for k, v in id2name.items()}
```
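The keys are COCO category IDs and the values are human-readable class names. A quick peek verifies the mapping loaded correctly; if the file follows the standard COCO categories, IDs 1 and 2 should come back as person and bicycle:

```python
print(len(id2name))            # number of classes in the mapping
print(id2name[1], id2name[2])  # expected (standard COCO): person bicycle
```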
Load Model Configuration and Checkpoint
```python
model_config_path = "{your_path1}/DINO/config/DINO/DINO_4scale.py"
model_checkpoint_path = "{your_path2}/checkpoint0033_4scale.pth"

args = SLConfig.fromfile(model_config_path)
args.device = 'cuda'
model, criterion, postprocessors = build_model_main(args)
checkpoint = torch.load(model_checkpoint_path, map_location='cpu')
model.load_state_dict(checkpoint['model'])
_ = model.eval()
```
Load and Preprocess the Image
```python
image = Image.open("{image_path}/catch_rugby_ball_001480.jpg").convert("RGB")

transform = T.Compose([
    T.RandomResize([800], max_size=1333),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
image, _ = transform(image, None)
```
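Two details worth noting: the mean/std values are the standard ImageNet normalization constants, and `RandomResize([800], max_size=1333)` scales the shorter side to 800 px while capping the longer side at 1333, the same recipe DETR uses. After the transform, `image` is a CHW tensor:

```python
# Transformed image is a 3 x H x W float tensor, shorter side resized to 800
print(image.shape)
```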
Run Inference
```python
output = model.cuda()(image[None].cuda())
output = postprocessors['bbox'](output, torch.Tensor([[1.0, 1.0]]).cuda())[0]
```
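Because we passed `[[1.0, 1.0]]` as the target size, the postprocessor leaves the boxes in normalized xyxy coordinates (values in [0, 1]). If you need pixel coordinates, e.g. for drawing with another library, scale by the original image size; a small sketch where `orig_w` and `orig_h` are placeholders you'd fill in:

```python
# Hypothetical original image size; replace with your image's actual width/height
orig_w, orig_h = 1280, 720
scale = torch.tensor([orig_w, orig_h, orig_w, orig_h], device=output['boxes'].device)
boxes_px = output['boxes'] * scale  # normalized xyxy -> pixel xyxy
```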
🖼️ 7. Visualize the Results
```python
threshold = 0.3  # set confidence threshold

vslzr = COCOVisualizer()

scores = output['scores']
labels = output['labels']
boxes = box_ops.box_xyxy_to_cxcywh(output['boxes'])
select_mask = scores > threshold

box_label = [id2name[int(item)] for item in labels[select_mask]]
pred_dict = {
    'boxes': boxes[select_mask],
    'size': torch.Tensor([image.shape[1], image.shape[2]]),
    'box_label': box_label
}
vslzr.visualize(image, pred_dict, savedir=None, dpi=100)
```
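One note on the dict above: `COCOVisualizer` expects normalized cxcywh boxes, which is why we convert from the postprocessor's xyxy output. And if you want to keep the figure rather than just display it, passing a directory to `savedir` should write the image there (an assumption based on the parameter name; check the repo's `util/visualizer.py` for the exact behavior):

```python
# Save the visualization to ./vis_results instead of only displaying it
vslzr.visualize(image, pred_dict, savedir='./vis_results', dpi=100)
```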
And just like that, boom! The detection works beautifully! 🎉
Wrap-Up
At first, I was a bit nervous about using models not yet uploaded to timm or HuggingFace, but once I got into it, everything was quite manageable!
Especially in this case, the DINO repo was incredibly well organized, which made the process even smoother.
I'll continue experimenting with more models and sharing my findings. Stay tuned! 🚀