
๐Ÿ–ฅ๏ธ SEEM Practice!! - SEEM ์‹ค์Šต!! with python. gradio

๐Ÿ–ฅ๏ธ SEEM Practice!! - SEEM ์‹ค์Šต!! with python. gradio

🦖 SEEM Model Practice!!

SEEM: Segment Everything Everywhere All At Once

We previously introduced SEEM in this post.
This time, we'll go through a hands-on tutorial using the SEEM model!


🧱 1. SEEM Git Clone

Clone the official repository from the GitHub page:

git clone git@github.com:UX-Decoder/Segment-Everything-Everywhere-All-At-Once.git

📦 2. Required Packages in a Virtual Environment

I used a conda virtual environment to install the required packages:

conda create -n seem python=3.9 -y
conda activate seem

Then install the dependencies from the repo:

# Python Package Installation
pip install -r assets/requirements/requirements.txt
pip install -r assets/requirements/requirements_custom.txt

# Custom Operator [only needed for training deformable vision encoder]
cd modeling/vision/encoder/ops && sh make.sh && cd ../../../../

# System Package [only needed for SEEM demo]
sudo apt update
sudo apt install ffmpeg

Note: I hit an error while installing mpi4py, so I set a few environment variables before reinstalling:

mpicc --version
which mpicc
export LD=ld
export CC=mpicc
export CXX=mpicxx
pip install --no-cache-dir --force-reinstall mpi4py

🧊 3. Run the SEEM Model

Run the following from the repository root:
The model weights download automatically and a Gradio-based interface launches!

export PYTHONPATH=$(pwd)  # Required to avoid ModuleNotFoundError: No module named 'modeling'
python demo/seem/app.py
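The `PYTHONPATH` export matters because Python only resolves a top-level package like `modeling` when its parent directory is on `sys.path`. A minimal, self-contained sketch of the same mechanism (using a throwaway `toy_modeling` package, not the actual repo):

```python
import os
import sys
import tempfile

# Build a throwaway "repo" containing a top-level package named toy_modeling.
repo_root = tempfile.mkdtemp()
pkg_dir = os.path.join(repo_root, "toy_modeling")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("NAME = 'toy_modeling'\n")

# Putting the repo root on sys.path is exactly what
# `export PYTHONPATH=$(pwd)` does for the SEEM checkout;
# without it, the import raises ModuleNotFoundError.
sys.path.insert(0, repo_root)
import toy_modeling

print(toy_modeling.NAME)  # -> toy_modeling
```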

You will see logs like the following:

  inputs = [ImageMask(label="[Stroke] Draw on Image",type="pil"), gr.inputs.CheckboxGroup(choices=["Stroke", "Example", "Text", "Audio", "Video", "Panoptic"], type="value", label="Interative Mode"), ImageMask(label="[Example] Draw on Referring Image",type="pil"), gr.Textbox(label="[Text] Referring Text"), gr.Audio(label="[Audio] Referring Audio", source="microphone", type="filepath"), gr.Video(label="[Video] Referring Video Segmentation",format="mp4",interactive=True)]
.../demo/seem/app.py:131: GradioDeprecationWarning: `optional` parameter is deprecated
.../demo/seem/app.py:136: GradioDeprecationWarning: gr.outputs is deprecated, use gradio.components instead
Running on local URL:  http://0.0.0.0:8507
IMPORTANT: You are using gradio version 3.42.0, however version 4.44.1 is available.
--------
Running on public URL: https://98c5c3e82a6870cfe5.gradio.live

This share link expires in 72 hours.

Now, if you open https://98c5c3e82a6870cfe5.gradio.live, you'll be able to access the environment!
(Note: Since I won't keep the server running, please use your own URL.)

Here's what the interface looks like:

ui

  • Interactive Mode: You can try various prompts including stroke, text, audio, etc.

🧊 4. SEEM Model Testing

Letโ€™s try some tests!

Text prompt: "A person using a computer on the couch" worked really well!
test1
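Conceptually, text-referring segmentation embeds the prompt and every candidate mask into a joint space and picks the best match. A toy illustration with made-up embeddings (nothing here comes from SEEM's actual weights):

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented joint-space embeddings for three candidate masks.
mask_embeds = {
    "person_on_couch": np.array([0.9, 0.1, 0.3]),
    "table":           np.array([0.1, 0.9, 0.2]),
    "lamp":            np.array([0.2, 0.3, 0.9]),
}
# Invented embedding for "A person using a computer on the couch".
text_embed = np.array([0.85, 0.15, 0.25])

scores = {name: cosine(e, text_embed) for name, e in mask_embeds.items()}
best = max(scores, key=scores.get)
print(best)  # -> person_on_couch
```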

Segmentation based on keywords works great too!
test2

If you just upload an image, it automatically segments objects and labels them!
test3
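A panoptic result like this boils down to an integer label map with one class id per pixel. A toy numpy sketch (invented classes, not SEEM's real output format) of splitting it into named per-class masks:

```python
import numpy as np

# Toy 4x4 label map: each pixel holds a class id
# (0 = background, 1 = person, 2 = couch -- invented classes).
seg_map = np.array([
    [0, 0, 2, 2],
    [0, 1, 1, 2],
    [0, 1, 1, 2],
    [0, 0, 2, 2],
])
labels = {1: "person", 2: "couch"}

# One boolean mask per labeled class, plus its pixel count.
masks = {name: seg_map == cls for cls, name in labels.items()}
for name, mask in masks.items():
    print(name, int(mask.sum()))  # person 4, couch 6
```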

Using the stroke feature, it segments only the selected region and even labels it!
test4
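Stroke selection can be thought of as a majority vote over the label map under the user's scribble. A toy sketch (invented labels and stroke coordinates):

```python
import numpy as np
from collections import Counter

# Toy label map (invented classes): 0 = background, 1 = laptop, 2 = table.
seg_map = np.array([
    [0, 2, 2, 2],
    [0, 2, 1, 1],
    [0, 2, 1, 1],
    [0, 0, 0, 0],
])
names = {0: "background", 1: "laptop", 2: "table"}

# (row, col) pixels the user's stroke passed over.
stroke = [(1, 2), (1, 3), (2, 2)]

# Majority class under the stroke picks the segment to keep.
votes = Counter(int(seg_map[r, c]) for r, c in stroke)
selected = votes.most_common(1)[0][0]
mask = seg_map == selected  # mask of just the selected segment

print(names[selected], int(mask.sum()))  # laptop 4
```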


🎉 Conclusion

SEEM truly feels like the peak of current segmentation models.
Its flexibility and ease of use are simply amazing!



This post is licensed under CC BY 4.0 by the author.