
LISA Practice!! - Hands-On with the Reasoning Segmentation LLM LISA!!

🦖 (English) Reasoning Segmentation LLM LISA Practice!!

LISA, isn't the name lovely!? It stands for Large Language Instructed Segmentation Assistant!

This post is about hands-on practice with LISA, a model that performs image segmentation through reasoning.
The model is so fascinating that I jumped into the practice first!
Let's look at the theory later~!!


🧱 1. Clone the LISA Git Repository

git clone git@github.com:dvlab-research/LISA.git

📦 2. Install Required Packages in Virtual Environment

I installed the required packages using a conda virtual environment!!

conda create -n lisa python=3.9 -y
conda activate lisa

Now, install the requirements provided in the repo!

pip install -r requirements.txt
pip install flash-attn --no-build-isolation

⚠️ Note: Before installing, check the first line of requirements.txt:
--extra-index-url https://download.pytorch.org/whl/cu117
The cu117 suffix targets CUDA 11.7, so change it to match your own CUDA version!!
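
As a rough sketch of that edit, the snippet below rewrites the `--extra-index-url` line for a given CUDA version. The helper names `pytorch_index_url` and `patch_requirements` and the sample requirements text are made up for illustration and are not part of the LISA repo. It assumes the usual `cu<major><minor>` naming of the PyTorch wheel index, and it does not check that a wheel matching the pinned torch version actually exists for your CUDA build.

```python
def pytorch_index_url(cuda_version: str) -> str:
    """Map a CUDA version such as '11.8' to a PyTorch wheel index URL.

    Assumes the 'cu<major><minor>' naming convention of the index.
    """
    major, minor = cuda_version.strip().split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


def patch_requirements(text: str, cuda_version: str) -> str:
    """Return requirements text with its --extra-index-url line replaced."""
    new_line = f"--extra-index-url {pytorch_index_url(cuda_version)}"
    lines = [
        new_line if line.startswith("--extra-index-url") else line
        for line in text.splitlines()
    ]
    return "\n".join(lines)


# Hypothetical two-line requirements content, for illustration only.
example = "--extra-index-url https://download.pytorch.org/whl/cu117\ntorch\n"
print(patch_requirements(example, "11.8"))
```

In practice you would read requirements.txt, pass its content through `patch_requirements`, and write it back before running `pip install -r requirements.txt`.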

That's it for the installation~~


🧊 3. Run the LISA Model!!

For reference, I'm running a single GeForce RTX 4090 with 24GB of VRAM!
Running the standard inference command results in an out-of-memory error:

CUDA_VISIBLE_DEVICES=0 python chat.py --version='xinlai/LISA-13B-llama2-v1'

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 136.00 MiB...

However!! LISA kindly provides a lightweight mode suitable for a single 24GB or 12GB GPU.
It is the same checkpoint, just loaded with fp16 precision and 8-bit quantization rather than a separate model.
So, I ran the light version as follows:

CUDA_VISIBLE_DEVICES=0 python chat.py --version='xinlai/LISA-13B-llama2-v1' --precision='fp16' --load_in_8bit
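
To see why 8-bit loading helps, here is a back-of-the-envelope estimate of the memory needed for the weights alone. It assumes "13B" means roughly 13 billion parameters, and it ignores activations, the vision components, and framework overhead, so real usage will be higher. The function name `weight_memory_gib` is just for this sketch.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights only, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3


# Assumption: "13B" is taken as about 13e9 parameters.
params = 13e9
for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(params, bits):.1f} GiB for weights")
```

Under these assumptions, 16-bit weights alone come to roughly 24 GiB, which is already more than the 23.49 GiB the GPU reports in the error message, while 8-bit weights need about half of that.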

And voilà~! You'll see a prompt input like this:

Please input your prompt: 

I used a toothbrush image and asked about the part used for cleaning:

The part of a toothbrush used to remove food particles from teeth is called the bristles.

And the output was:

Please input your prompt: The part of a toothbrush used to remove food particles from teeth is called the bristles.
Please input the image path: /home/user/data/AGD20K/Seen/trainset/egocentric/brush_with/toothbrush/toothbrush_000127.jpg
text_output:  <s>A chat between a curious human and an artificial intelligence assistant...
./vis_output/The part of a toothbrush used to remove food particles from teeth. _toothbrush_000127_mask_0.jpg has been saved.
./vis_output/The part of a toothbrush used to remove food particles from teeth. _toothbrush_000127_masked_img_0.jpg has been saved.

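Just a bit of theory: the full text_output (truncated above) ends with "ASSISTANT: Sure, [SEG] .", and the [SEG] tag in the response indicates that a mask output is included.
And indeed, two image files are saved (the mask and the masked image), as confirmed in the last two lines!!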
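
If you want to check for the tag programmatically, a minimal sketch could look like this. The helpers `assistant_reply` and `has_mask` are hypothetical names, not part of chat.py, and the sample string is an abbreviated version of the text_output format seen in this post's log.

```python
def assistant_reply(text_output: str) -> str:
    """Extract the assistant's reply from a LISA-style text_output string."""
    reply = text_output.split("ASSISTANT:")[-1]
    # Drop the end-of-sequence marker and surrounding whitespace.
    return reply.replace("</s>", "").strip()


def has_mask(text_output: str) -> bool:
    """True if the reply contains the [SEG] tag, i.e. a mask is expected."""
    return "[SEG]" in assistant_reply(text_output)


# Abbreviated sample modeled on the log output in this post.
sample = (
    "<s>A chat between a curious human and an artificial intelligence "
    "assistant. USER: <im_start> <im_end> The part of a toothbrush used to "
    "remove food particles from teeth is called the bristles. "
    "ASSISTANT: Sure, [SEG] .</s>"
)
print(assistant_reply(sample))
print(has_mask(sample))
```

Checking only the part after "ASSISTANT:" avoids a false positive if the user prompt itself happens to contain the text "[SEG]".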

Shall we see the image!?

The model clearly segments only the bristle part!
[Image: toothbrush]

Here are more test results I tried:

wine glass – It segments well even with a simple noun!
[Image: wine_glass]

glove – It nicely extracts the hand area!
[Image: glove]

where is the handle? – It handles full-sentence questions well!
[Image: knifehandle]

Which part of a baseball bat is the handle that people hold?
[Image: baseballhandle]

Wearing glove – Would be better if it focused just on the hand! It seems to read the prompt as "the person wearing the glove".
[Image: wearingglove]

A vegetable that's healthy but not liked by most kids – Broccoli… not the best result!
[Image: brocoli]


🎉 Conclusion

It's the era of Segmentation!
We're now going beyond basic segmentation into reasoning-based segmentation!!
Curious to see how far this field will go~!


🦖 (Korean) Reasoning Segmentation LLM LISA Practice!!

LISA, isn't the name pretty~!? It stands for Large Language Instructed Segmentation Assistant!

This post is hands-on practice with the LISA model, which segments images through reasoning!
The model is so fascinating that I'm doing the practice first,
and then we'll look into the theory~!!

🧱 1. LISA Git Clone

git clone git@github.com:dvlab-research/LISA.git

📦 2. Install Required Packages in a Virtual Environment!!

I installed the required packages in a conda virtual environment!!

conda create -n lisa python=3.9 -y
conda activate lisa

Now, install the requirements provided by the repo!

pip install -r requirements.txt
pip install flash-attn --no-build-isolation

Here!! The important point is that the line at the very top of requirements.txt,
--extra-index-url https://download.pytorch.org/whl/cu117
must be changed to match your own CUDA version!!

And the installation is done~~


🧊 3. Run the LISA Model!!

Before we start, my environment is a single GeForce RTX 4090 with 24GB!! On this setup, running the standard inference command runs out of memory, as shown below!

CUDA_VISIBLE_DEVICES=0 python chat.py --version='xinlai/LISA-13B-llama2-v1'
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 136.00 MiB. GPU 0 has a total capacity of 23.49 GiB of which 116.19 MiB is free. Including non-PyTorch memory, this process has 22.92 GiB memory in use. Of the allocated memory 22.54 GiB is allocated by PyTorch, and 2.73 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

However!! LISA kindly provides a lightweight mode that can run even on a single 24G or 12G GPU (the same checkpoint loaded with fp16 precision and 8-bit quantization),
so I ran it that way!!

CUDA_VISIBLE_DEVICES=0 python chat.py --version='xinlai/LISA-13B-llama2-v1' --precision='fp16' --load_in_8bit

And then~~ ta-da!~! It now asks you to enter a prompt~~

Please input your prompt: 

Using a toothbrush image, I asked about the part that cleans the teeth, as follows!
The part of a toothbrush used to remove food particles from teeth is called the bristles.

And then!!

The answer comes out as shown below~~!

Please input your prompt: The part of a toothbrush used to remove food particles from teeth is called the bristles.
Please input the image path: /home/user/data/AGD20K/Seen/trainset/egocentric/brush_with/toothbrush/toothbrush_000127.jpg
text_output:  <s>A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: <im_start> <im_end> The part of a toothbrush used to remove food particles from teeth is called the bristles. ASSISTANT: Sure, [SEG] .</s>
./vis_output/The part of a toothbrush used to remove food particles from teeth is called the bristles. _toothbrush_000127_mask_0.jpg has been saved.
./vis_output/The part of a toothbrush used to remove food particles from teeth is called the bristles. _toothbrush_000127_masked_img_0.jpg has been saved.

To share a little theory: if the response contains [SEG], it means there is separate mask information!
That is why the last 2 lines report that the images were saved!!

Shall we look at the image!?

As you can see, it clearly picks out only the bristle part~!!
[Image: toothbrush]

Here are some other results from my tests~!

wine glass: It separates the object well even from a simple word, right!?
[Image: wine_glass]

glove: It extracts just the hand part nicely~!
[Image: glove]

where is the handle?: It handles sentences well!!!
[Image: knifehandle]

Which part of a baseball bat is the handle that people hold?
[Image: baseballhandle]

Wearing glove: It would be nice if it covered only the hand, but it doesn't work that way! It seems to understand the prompt as "the person wearing the glove"!
[Image: wearingglove]

A vegetable that's healthy but not liked by most kids: Broccoli.. it doesn't do so well here!!
[Image: brocoli]


🎉 Wrap-up

The era of Segmentation! Now going beyond simple segmentation, all the way to reasoning!!
Looking forward to seeing how much further it develops~!

This post is licensed under CC BY 4.0 by the author.