
Grounding DINO Practice with Python!


🦖 (English) Grounding DINO Practice! Running the Evolved Model of DINO

This post is a hands-on guide to Grounding DINO, the evolved version of DINO!
Just like with DINO, we clone the code from GitHub and run it, and surprisingly, it's even simpler 😊
So this time, we'll skip the theory for now and jump straight into running the code!!


🧱 1. Clone the GitHub Repository

```bash
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO/
```


📦 2. Install the Model

```bash
pip install -e .
```
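A quick way to confirm the editable install worked is to ask Python whether it can find the `groundingdino` package (the import name used in the repository's own code). This is a minimal standard-library sketch; `is_installed` is a hypothetical helper name, not part of GroundingDINO.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current Python path."""
    # find_spec locates a top-level module without importing it,
    # so no model code or CUDA extensions are loaded by this check.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run this with the same Python environment you used for `pip install -e .`
    print("groundingdino found:", is_installed("groundingdino"))
```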


🧊 3. Download the Pretrained Weights

```bash
mkdir weights
cd weights/
wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
cd ..
```
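Because `wget -q` prints nothing, a failed or partial download is easy to miss. Here is a minimal standard-library check, meant to be run from the `GroundingDINO/` directory, that the config and checkpoint used later are present and non-empty. `missing_files` is a hypothetical helper, and the two paths are the example values used in this post.

```python
from pathlib import Path

# Relative paths used in this post (the config ships with the repo, the weights were just downloaded).
REQUIRED_FILES = (
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "weights/groundingdino_swint_ogc.pth",
)


def missing_files(repo_root: str = ".") -> list:
    """Return the required files that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing = []
    for rel_path in REQUIRED_FILES:
        path = root / rel_path
        # A zero-byte file usually means the download did not complete.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel_path)
    return missing


if __name__ == "__main__":
    problems = missing_files(".")  # run from the GroundingDINO/ directory
    print("Setup looks OK" if not problems else f"Missing or empty: {problems}")
```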


🚀 4. Run the Inference (Command Template)

The command for running Grounding DINO is shown below.
Replace each part marked with `{}` with the appropriate value for your environment:

```bash
CUDA_VISIBLE_DEVICES={GPU_ID} python demo/inference_on_a_image.py \
  -c {CONFIG_PATH} \
  -p {CHECKPOINT_PATH} \
  -i {INPUT_IMAGE_PATH} \
  -o {OUTPUT_DIR} \
  -t "{TEXT_PROMPT}"
```
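If you would rather launch runs from Python (for example from a notebook), the same command can be assembled as an argument list and passed to `subprocess.run`. This is a minimal sketch: `build_inference_cmd` and `run_inference` are hypothetical helpers, and only the `-c`/`-p`/`-i`/`-o`/`-t` flags from the template above are used.

```python
import os
import subprocess
import sys


def build_inference_cmd(config_path, checkpoint_path, image_path, output_dir, text_prompt):
    """Return the argv list for demo/inference_on_a_image.py with the flags from the template."""
    return [
        sys.executable,  # the current interpreter, i.e. the environment GroundingDINO was installed into
        "demo/inference_on_a_image.py",
        "-c", config_path,
        "-p", checkpoint_path,
        "-i", image_path,
        "-o", output_dir,
        "-t", text_prompt,  # one argv item, so spaces in the prompt need no shell quoting
    ]


def run_inference(gpu_id, **paths_and_prompt):
    """Run the demo from the repository root with CUDA_VISIBLE_DEVICES set to gpu_id."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_id))
    subprocess.run(build_inference_cmd(**paths_and_prompt), env=env, check=True)


# Example (run from the GroundingDINO/ directory):
# run_inference(
#     0,
#     config_path="groundingdino/config/GroundingDINO_SwinT_OGC.py",
#     checkpoint_path="weights/groundingdino_swint_ogc.pth",
#     image_path="/home/user/images/catch_rugby_ball_001480.jpg",
#     output_dir="/home/user/images/GroundingDINO/results",
#     text_prompt="chair",
# )
```

Passing a list instead of a single shell string avoids quoting problems with multi-word prompts, and `check=True` raises an error if the script exits with a failure.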

Example values:

| Variable | Description |
|---|---|
| `{GPU_ID}` | GPU ID to use (e.g., `0`); you can check it with `nvidia-smi` |
| `{CONFIG_PATH}` | Config file path (e.g., `groundingdino/config/GroundingDINO_SwinT_OGC.py`) |
| `{CHECKPOINT_PATH}` | Path to the pretrained weights (e.g., `weights/groundingdino_swint_ogc.pth`) |
| `{INPUT_IMAGE_PATH}` | Input image path (e.g., `/home/user/images/catch_rugby_ball_001480.jpg`) |
| `{OUTPUT_DIR}` | Directory to save the result (e.g., `/home/user/images/GroundingDINO/results`) |
| `{TEXT_PROMPT}` | Text prompt describing what to detect (e.g., `"chair"`) |

- You can change `TEXT_PROMPT` to try different detection phrases!!
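To look for several things in one run, the GroundingDINO README suggests separating category names with periods, as in `"chair . person . dog ."`. Below is a small sketch of a prompt builder in that style. `build_caption` is a hypothetical helper. The lowercasing mirrors what, as far as I can tell, the demo script already does to the caption, so it mainly keeps the prompt you log identical to the one the model sees.

```python
def build_caption(*phrases: str) -> str:
    """Join target phrases into one prompt such as 'person . rugby ball .'."""
    # Drop blank entries, trim whitespace, and lowercase each phrase.
    cleaned = [p.strip().lower() for p in phrases if p.strip()]
    if not cleaned:
        return ""
    # Separate categories with " . " and finish with a period.
    return " . ".join(cleaned) + " ."


print(build_caption("Person", " rugby ball "))  # person . rugby ball .
```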

✅ Real-World Prompt Tests!

Let's now run the inference while changing only `TEXT_PROMPT` and see the results!
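Because only the prompt changes between runs, a small script can generate one command per prompt. This sketch also gives each prompt its own output directory, assuming the demo saves its result under a fixed file name inside `-o` (I believe it writes `pred.jpg`), so reusing one directory would overwrite earlier results. `output_dir_for` and `sweep_commands` are hypothetical helpers.

```python
import re
import shlex
from pathlib import Path

# The prompts tried in this post.
PROMPTS = ("person", "cat", "rugby", "jump", "player is jumping")


def output_dir_for(prompt: str, base_dir: str) -> str:
    """Give each prompt its own sub-directory, e.g. 'player is jumping' -> base_dir/player_is_jumping."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return str(Path(base_dir) / slug)


def sweep_commands(image_path, base_dir, config_path, checkpoint_path, prompts=PROMPTS):
    """Build one demo command per prompt; only the prompt and output directory change."""
    return [
        ["python", "demo/inference_on_a_image.py",
         "-c", config_path, "-p", checkpoint_path,
         "-i", image_path, "-o", output_dir_for(prompt, base_dir),
         "-t", prompt]
        for prompt in prompts
    ]


if __name__ == "__main__":
    commands = sweep_commands(
        image_path="/home/user/images/catch_rugby_ball_001480.jpg",
        base_dir="/home/user/images/GroundingDINO/results",
        config_path="groundingdino/config/GroundingDINO_SwinT_OGC.py",
        checkpoint_path="weights/groundingdino_swint_ogc.pth",
    )
    for cmd in commands:
        print(shlex.join(cmd))  # shell-quoted, ready to paste into a terminal
```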


person: a common COCO category.

Of course it works great!

[Image: detection result for the prompt "person"]


cat: will it produce any false positives even when there is no cat in the image?

Nothing detected! Well done 😎
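Why does nothing show up? As far as I can tell from `demo/inference_on_a_image.py`, every candidate box gets a score for each token of the prompt, and a box is kept only if its highest token score is above the box threshold (the `--box_threshold` option, 0.3 by default). Below is a pure-Python sketch of that rule. `keep_boxes` is a hypothetical helper, and the scores are made up for illustration, not real model output.

```python
def keep_boxes(token_scores, box_threshold=0.3):
    """Return the indices of boxes whose best token score exceeds box_threshold.

    token_scores holds one list per candidate box, with that box's score
    for each token of the text prompt.
    """
    return [
        i for i, scores in enumerate(token_scores)
        if scores and max(scores) > box_threshold
    ]


# Made-up scores for a one-token prompt such as "cat".
no_cat_in_image = [[0.08], [0.12], [0.05]]  # every box scores low
cat_in_image = [[0.71], [0.12], [0.33]]     # two boxes clear the threshold

print(keep_boxes(no_cat_in_image))  # []
print(keep_boxes(cat_in_image))     # [0, 2]
```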

[Image: detection result for the prompt "cat"]


rugby: probably not among the categories in common test sets. Will it still work?

Oh nice! The result makes sense!!

[Image: detection result for the prompt "rugby"]


jump: now let's try a verb!

Whoa~ It works for actions too!?

[Image: detection result for the prompt "jump"]


player is jumping: what about a full sentence?

Hmm… It seems to split the sentence into separate pieces instead of treating it as one phrase.
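One plausible explanation for the split involves how the demo script labels boxes. As I understand it, once a box is kept, its label is built only from the prompt tokens whose score for that box exceeds a text threshold (`--text_threshold`, 0.25 by default). Different boxes can then match different words of the sentence. Below is a simplified sketch using whitespace tokens and made-up scores. The real model tokenizes with BERT and uses its own phrase-extraction helper, so treat this only as an illustration of the idea. `label_for_box` is a hypothetical name.

```python
def label_for_box(tokens, token_scores, text_threshold=0.25):
    """Join the prompt tokens whose score for this box exceeds text_threshold."""
    return " ".join(
        tok for tok, score in zip(tokens, token_scores) if score > text_threshold
    )


tokens = "player is jumping".split()

# Made-up per-token scores for three boxes.
box_person = [0.62, 0.10, 0.18]   # mostly matches "player"
box_motion = [0.20, 0.05, 0.47]   # mostly matches "jumping"
box_both = [0.55, 0.30, 0.40]     # every token clears the threshold

for scores in (box_person, box_motion, box_both):
    print(label_for_box(tokens, scores))
# player
# jumping
# player is jumping
```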

[Image: detection result for the prompt "player is jumping"]


🎉 Conclusion

Grounding DINO was very easy to install, and the inference workflow is intuitive!
It's especially useful if you want to experiment with diverse text prompts.
If it could also understand full sentences and support segmentation,
that would be amazing, right!? 😄

Well, maybe there's already a model out there that does that?
Letโ€™s keep exploring together!



This post is licensed under CC BY 4.0 by the author.