🖥️ Grounded SAM Hands-On with Python!

🦖 Hands-On with Grounded SAM! Detect objects with DINO, then segment with SAM!

In this post, we'll do a hands-on walkthrough of Grounding DINO + SAM = Grounded SAM!
As before, we'll follow the GitHub repo and run its code,
and if you go step by step, it's not too hard!
So once again, let's skip the theory for now
and dive straight into the code to understand what Grounded SAM is all about!


🧱 1. Clone the GitHub Repository

git clone https://github.com/IDEA-Research/Grounded-Segment-Anything
cd Grounded-Segment-Anything/

📦 2. Install the Models

  • From here on, we're following the setup instructions from the GitHub repo directly!
  • Please start in an environment where PyTorch and the GPU (CUDA) are set up correctly.
  • If not… you'll likely run into many issues! 😅
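Before running the install commands, it can help to confirm that PyTorch really sees the GPU. Here is a minimal sketch of such a check. The helper name `check_environment` is hypothetical (it is not part of the repo), and it only relies on `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util


def check_environment() -> dict:
    """Report whether PyTorch is importable and whether it can see a CUDA GPU.

    Hypothetical helper for this post, not part of the Grounded-Segment-Anything repo.
    """
    report = {"torch_installed": False, "torch_version": None, "cuda_available": False}
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not installed in this environment
    try:
        import torch  # imported lazily so the check itself never crashes
    except Exception:
        return report  # installed but broken (e.g. mismatched CUDA libraries)
    report["torch_installed"] = True
    report["torch_version"] = str(torch.__version__)
    report["cuda_available"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back `False`, it is probably worth fixing the PyTorch/CUDA setup first, since GroundingDINO is built with `BUILD_WITH_CUDA=True`.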
export AM_I_DOCKER=False
export BUILD_WITH_CUDA=True
# Placeholder path: point this at your local CUDA toolkit
export CUDA_HOME=/path/to/cuda-11.3/

# Install SAM
python -m pip install -e segment_anything

# Install GroundingDINO
pip install --no-build-isolation -e GroundingDINO

# Install diffusers (quoted so shells like zsh don't expand the brackets)
pip install --upgrade "diffusers[torch]"

# OSX-specific install: I skipped this on Ubuntu, but if you're on Mac, you should run this!
git submodule update --init --recursive
# Run in a subshell so the following commands still start from the repo root
(cd grounded-sam-osx && bash install.sh)

# Install RAM & Tag2Text
git clone https://github.com/xinyu1205/recognize-anything.git
pip install -r ./recognize-anything/requirements.txt
pip install -e ./recognize-anything/

# Final dependencies: these may vary per user!
pip install opencv-python pycocotools matplotlib onnxruntime onnx ipykernel

# Bonus tip!! The supervision version must match exactly as below!
## I found this after many errors. Trust me, use this version!
pip install supervision==0.21.0
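Since the supervision version caused so many errors, it may be worth double-checking what actually got installed. Here is a small sketch using the standard library's `importlib.metadata`. The helper names `installed_version` and `check_pin` are made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_pin(package: str, expected: str) -> str:
    """Compare the installed version against the version pinned in this post."""
    found = installed_version(package)
    if found is None:
        return f"{package}: not installed (expected {expected})"
    if found != expected:
        return f"{package}: found {found}, expected {expected}"
    return f"{package}: OK ({found})"


if __name__ == "__main__":
    # 0.21.0 is the version pinned in the install commands above.
    print(check_pin("supervision", "0.21.0"))
```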

🚀 3. Run Object Detection (from Jupyter Notebook)

Now, with the grounded_sam.ipynb notebook provided in the repo, you can jump straight into segmentation!
I reused the same image from our previous Grounding DINO test.
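Conceptually, Grounded SAM hands the boxes found by Grounding DINO to SAM as box prompts. One detail that is easy to trip over is the box format. The sketch below assumes the detector returns normalized (cx, cy, w, h) boxes and the segmenter expects absolute (x0, y0, x1, y1) pixel boxes, which is how I understand the two libraries. The function name `cxcywh_to_xyxy_pixels` is hypothetical, and the notebook uses its own utilities for this step.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def cxcywh_to_xyxy_pixels(box: Box, image_width: int, image_height: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1).

    Assumption: inputs are in [0, 1] relative to the image size.
    The result is clamped so the box never leaves the image.
    """
    cx, cy, w, h = box
    x0 = (cx - w / 2) * image_width
    y0 = (cy - h / 2) * image_height
    x1 = (cx + w / 2) * image_width
    y1 = (cy + h / 2) * image_height
    # Clamp each coordinate to the image bounds.
    x0 = min(max(x0, 0.0), float(image_width))
    y0 = min(max(y0, 0.0), float(image_height))
    x1 = min(max(x1, 0.0), float(image_width))
    y1 = min(max(y1, 0.0), float(image_height))
    return (x0, y0, x1, y1)


if __name__ == "__main__":
    # A centered box covering half the width and height of a 640x480 image.
    print(cxcywh_to_xyxy_pixels((0.5, 0.5, 0.5, 0.5), 640, 480))
```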

Just like before, I entered prompts and tested various labels.
Here are the results!
Please note: only one segment is returned per prompt!
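One plausible way to end up with a single segment per prompt is to keep only the highest-scoring detection and pass that one box to SAM. This is an assumption for illustration, not something confirmed from the notebook. The `Detection` type and `top_detection` helper are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Detection(NamedTuple):
    """Hypothetical container for one detector output."""
    label: str
    score: float
    box: Tuple[float, float, float, float]


def top_detection(detections: Sequence[Detection],
                  min_score: float = 0.0) -> Optional[Detection]:
    """Return the highest-scoring detection, or None if nothing passes min_score."""
    candidates = [d for d in detections if d.score >= min_score]
    if not candidates:
        return None  # corresponds to "detection failed" for this prompt
    return max(candidates, key=lambda d: d.score)


if __name__ == "__main__":
    found = [
        Detection("person", 0.82, (10.0, 20.0, 110.0, 220.0)),
        Detection("person", 0.47, (200.0, 30.0, 280.0, 210.0)),
    ]
    print(top_detection(found, min_score=0.35))
```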

  • person. The simplest prompt, and one of the standard COCO dataset labels!

Image

From detection to segmentation โ€“ flawless!

  • cat. Grounding DINO already failed to detect this in the previous post, so I'm skipping it.

  • rugby. I hoped it would detect the ball, but sadly, detection failed again!

Image

  • helmet. Fantastic result!

Image

  • jump. It accurately captured the jumping person!

Image

How about a full sentence this time: player is running?

Once again, performance on full sentences isn't quite there yet!

Image

Now I tried a different image.

holding: curious to see what it catches~

I was hoping it might isolate just the hand, but I guess that's asking too much!

Image

bat: can it detect a small baseball bat?

Absolutely! To help understand, here's the mask version too!

Image
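For reference, a segmentation mask is just a per-pixel yes/no grid the same size as the image. The toy sketch below shows how to read a mask's area and tight bounding box from such a grid. It uses plain nested lists so it runs anywhere; real masks would be NumPy or torch arrays. The `mask_summary` helper is hypothetical.

```python
from typing import Dict, List


def mask_summary(mask: List[List[int]]) -> Dict[str, object]:
    """Summarize a binary mask: pixel area, fraction of the image, tight bbox.

    The bbox is (x_min, y_min, x_max, y_max) in inclusive pixel indices.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    # Collect (x, y) for every pixel that belongs to the object.
    coords = [(x, y) for y, row in enumerate(mask) for x, value in enumerate(row) if value]
    if not coords:
        return {"area": 0, "fraction": 0.0, "bbox": None}
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return {
        "area": len(coords),
        "fraction": len(coords) / (rows * cols),
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
    }


if __name__ == "__main__":
    toy_mask = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    print(mask_summary(toy_mask))
```

A small object like the bat would show up as a mask with a low `fraction` and a narrow bounding box.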

catcher and referee!!

It clearly picks out both as large human figures!

Image
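When prompting with more than one label, Grounding DINO's convention, as far as I know, is lowercase category names separated by periods. Here is a tiny sketch of a prompt builder that follows the "person." style used in this post. The `build_prompt` helper is hypothetical, not part of the repo.

```python
from typing import Iterable


def build_prompt(labels: Iterable[str]) -> str:
    """Join labels into a lowercase, period-separated text prompt.

    Assumption: categories are separated by " . " and the prompt ends with ".".
    """
    # Normalize each label: trim whitespace, lowercase, drop any trailing period.
    cleaned = [label.strip().lower().rstrip(".").strip() for label in labels]
    cleaned = [label for label in cleaned if label]
    if not cleaned:
        raise ValueError("at least one non-empty label is required")
    return " . ".join(cleaned) + "."


if __name__ == "__main__":
    print(build_prompt(["catcher", "referee"]))
    print(build_prompt(["person"]))
```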


🎉 Final Thoughts

Grounded SAM!! After Grounding DINO,
we now go from detection to actual image segmentation!
SAM on its own only presented text prompts as a concept, which was a bit disappointing,
so it's great that Grounded SAM makes text prompts actually usable! 😄
That said, imagine how powerful it would be if it could handle large images and multiple segments in one shot!


/home/smartride/DrFirst/LOCATE/AGD20K/Seen/trainset/exocentric/hit/baseball_bat/hit_baseball_bat_000029.jpg
This post is licensed under CC BY 4.0 by the author.