LISA Practice!! - Hands-on Practice with the Reasoning Segmentation LLM LISA!!
(English) Reasoning Segmentation LLM LISA Practice!!
LISA, isn't the name lovely!? It stands for Large Language Instructed Segmentation Assistant!
This post is about hands-on practice with LISA, a model that performs image segmentation through reasoning.
The model is so fascinating that I jumped into the practice first!
Let's look at the theory later~!!
🧱 1. Clone the LISA Git Repository
- Clone the repo from the official Git site!
git clone git@github.com:dvlab-research/LISA.git
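The SSH URL above only works if you have an SSH key registered with GitHub. The repo is public, so it can also be cloned over HTTPS. Below is a minimal sketch of a hypothetical helper, ssh_to_https, which is my own name and not part of LISA. It turns a git@github.com:owner/repo.git URL into its HTTPS form and assumes the standard GitHub URL layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the LISA repo): convert a GitHub SSH
# clone URL such as git@github.com:owner/repo.git into its HTTPS equivalent.
ssh_to_https() {
  local url="$1"
  case "$url" in
    git@github.com:*)
      # Drop the "git@github.com:" prefix and put the HTTPS host in front.
      printf 'https://github.com/%s\n' "${url#git@github.com:}"
      ;;
    *)
      # Leave any other URL untouched.
      printf '%s\n' "$url"
      ;;
  esac
}
```

For example: git clone "$(ssh_to_https git@github.com:dvlab-research/LISA.git)" && cd LISA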
📦 2. Install Required Packages in a Virtual Environment
I installed the required packages using a conda virtual environment!!
conda create -n lisa python=3.9 -y
conda activate lisa
Now, move into the cloned LISA directory and install the requirements provided in the repo!
cd LISA
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
⚠️ Note: Before running the install above, you must edit the first line of requirements.txt:
--extra-index-url https://download.pytorch.org/whl/cu117
Change cu117 to match your CUDA version!!
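Editing that line by hand is easy. If you want to script it, here is a small sketch that does the same thing with sed. The function name set_torch_index is hypothetical, and the sketch assumes PyTorch's cuXYZ index naming (CUDA 11.8 → cu118). You should still check that the tag you pick exists on download.pytorch.org for the torch version pinned in requirements.txt. Your local CUDA version can be read from nvcc --version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point the --extra-index-url line of requirements.txt
# at the PyTorch wheel index for a given CUDA version.
# Assumes PyTorch's "cuXYZ" naming, e.g. CUDA 11.8 -> cu118.
set_torch_index() {
  local cuda_version="$1"   # e.g. "11.8" (see `nvcc --version`)
  local req_file="$2"       # path to LISA's requirements.txt
  local tag tmp
  tag="cu$(printf '%s' "$cuda_version" | tr -d '.')"
  tmp="$(mktemp)"
  # Replace only the cuXYZ suffix on the --extra-index-url line;
  # every other line is copied through unchanged.
  sed "s#^\(--extra-index-url https://download.pytorch.org/whl/\)cu[0-9]*#\1${tag}#" \
    "$req_file" > "$tmp" && cat "$tmp" > "$req_file"
  # cat (rather than mv) keeps the original file's permissions.
  rm -f "$tmp"
}
```

For example, set_torch_index 11.8 requirements.txt turns the cu117 line into a cu118 line and leaves the rest of the file as it was.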
That's it for the installation~~
🧠 3. Run the LISA Model!!
For reference, I'm running a single GeForce RTX 4090 with 24GB of VRAM!
Running the model with the default settings results in an out-of-memory error:
CUDA_VISIBLE_DEVICES=0 python chat.py --version='xinlai/LISA-13B-llama2-v1'
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 136.00 MiB...
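Why does the default run fail? A rough back-of-the-envelope check helps here. It assumes the default run keeps the 13B LLM weights in 16-bit precision, which is 2 bytes per parameter. Weight memory is then roughly the parameter count times the bytes per parameter. The weight_gib function below is a hypothetical helper that does this arithmetic. It ignores activations, the vision modules and CUDA overhead, so real usage will be higher.

```shell
#!/usr/bin/env bash
# Rough weight-only memory estimate:
#   memory ~= parameters * bytes per parameter
# Ignores activations, the vision modules and CUDA overhead,
# so actual GPU usage will be higher.
weight_gib() {
  local params_billion="$1"   # e.g. 13 for a 13B LLM
  local bytes_per_param="$2"  # 2 = fp16/bf16, 1 = 8-bit, 0.5 = 4-bit
  awk -v p="$params_billion" -v b="$bytes_per_param" \
    'BEGIN { printf "%.1f\n", p * 1e9 * b / (1024 ^ 3) }'
}
```

weight_gib 13 2 prints 24.2 (GiB), which is already more than the card's total memory before anything else is loaded. weight_gib 13 1 (8-bit) prints 12.1, which leaves headroom on a 24GB GPU.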
However!! LISA kindly supports a lighter-weight way to run the same model that fits on a single 24GB or 12GB GPU.
So, I loaded the same checkpoint in fp16 with 8-bit loading as follows:
CUDA_VISIBLE_DEVICES=0 python chat.py --version='xinlai/LISA-13B-llama2-v1' --precision='fp16' --load_in_8bit
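If you switch between GPUs, you can script the flag choice. Below is a minimal sketch built on a hypothetical lisa_flags function. For a 24GB card it returns the fp16 + 8-bit flags used above. For a 12GB card it returns --load_in_4bit. That 4-bit flag comes from my reading of the LISA README's 12GB guidance, not from this post, and I did not run it here. Pass the card's nominal memory in GB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose chat.py precision flags from a GPU's
# nominal memory in GB (e.g. 24 for an RTX 4090).
# The 4-bit branch is an assumption based on the LISA README's 12GB guidance.
lisa_flags() {
  local vram_gb="$1"
  if [ "$vram_gb" -ge 24 ]; then
    # What this post used on a 24GB card; larger cards also land here,
    # although they may be able to run the default command instead.
    printf '%s\n' "--precision=fp16 --load_in_8bit"
  elif [ "$vram_gb" -ge 12 ]; then
    printf '%s\n' "--precision=fp16 --load_in_4bit"
  else
    printf '%s\n' "less than 12GB of VRAM is not covered here" >&2
    return 1
  fi
}
```

Usage: CUDA_VISIBLE_DEVICES=0 python chat.py --version='xinlai/LISA-13B-llama2-v1' $(lisa_flags 24). The $(...) is left unquoted on purpose, so the two flags split into separate arguments.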
And voilà~! You'll see an input prompt like this:
Please input your prompt:
I used a toothbrush image and asked about the part used for cleaning:
The part of a toothbrush used to remove food particles from teeth is called the bristles.
And the output was:
Please input your prompt: The part of a toothbrush used to remove food particles from teeth is called the bristles.
Please input the image path: /home/user/data/AGD20K/Seen/trainset/egocentric/brush_with/toothbrush/toothbrush_000127.jpg
text_output: <s>A chat between a curious human and an artificial intelligence assistant...
./vis_output/The part of a toothbrush used to remove food particles from teeth. _toothbrush_000127_mask_0.jpg has been saved.
./vis_output/The part of a toothbrush used to remove food particles from teeth. _toothbrush_000127_masked_img_0.jpg has been saved.
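chat.py asks for a prompt and then an image path, so you can also pipe the answers in instead of typing them. That lets one loaded model handle several images. This is a minimal sketch. It assumes chat.py keeps reading these two inputs in a loop, and once the piped input runs out it will likely stop with an EOFError. lisa_batch_input is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit prompt/image-path pairs in the order chat.py
# asks for them (prompt first, then image path), one pair per image,
# so they can be piped into chat.py instead of typed.
lisa_batch_input() {
  local prompt="$1"
  shift
  local img
  for img in "$@"; do
    printf '%s\n%s\n' "$prompt" "$img"
  done
}
```

Usage, with bat1.jpg and bat2.jpg as placeholder paths: lisa_batch_input "Which part of a baseball bat is the handle that people hold?" bat1.jpg bat2.jpg | CUDA_VISIBLE_DEVICES=0 python chat.py --version='xinlai/LISA-13B-llama2-v1' --precision='fp16' --load_in_8bit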
Just a bit of theory: a [SEG] token in the response means the model has also produced a segmentation mask.
And indeed, the last two lines confirm that two image files were saved: the mask itself and the masked image!!
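To check a result from a script instead of by eye, here is a small sketch with two hypothetical helpers. count_seg counts the [SEG] tokens in a text_output string. list_masks lists the saved mask files. It assumes the *_mask_<i>.jpg naming seen in the log and the default ./vis_output folder.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking LISA's answer from a script.

# count_seg: number of [SEG] tokens in a text_output string.
# -F matches "[SEG]" literally; -o prints one line per match.
count_seg() {
  printf '%s\n' "$1" | grep -oF '[SEG]' | wc -l | tr -d ' '
}

# list_masks: print saved mask files, assuming the "*_mask_<i>.jpg"
# naming seen in the log (the *_masked_img_* overlays are not matched).
list_masks() {
  local dir="${1:-./vis_output}"
  local f
  for f in "$dir"/*_mask_*.jpg; do
    # If nothing matches, the glob stays literal, so check the file exists.
    if [ -e "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

For example, count_seg "ASSISTANT: Sure, [SEG] ." prints 1, which lines up with the single mask file saved for the toothbrush prompt.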
Shall we see the image!?
Here are more test results I tried:
Which part of a baseball bat is the handle that people hold?
Wearing glove: It would be better if it segmented just the hand!
A vegetable that's healthy but not liked by most kids: Broccoli... not the best result!
Conclusion
It's the era of Segmentation!
Weโre now going beyond basic segmentation into reasoning-based segmentation!!
Curious to see how far this field will go~!
(Korean) Reasoning Segmentation LLM LISA Practice!!
LISA, isn't the name pretty~!?
It's short for Large Language Instructed Segmentation Assistant!
In this post, we get hands-on with LISA, a model that segments images through reasoning!
The model is so fascinating that I'm starting with the practice!
We'll look into the theory later~!!
🧱 1. LISA Git Clone
- Clone the repo from the official Git site!!
git clone git@github.com:dvlab-research/LISA.git
📦 2. Installing the Required Packages in a Virtual Environment!!
I installed the required packages in a conda virtual environment!!
conda create -n lisa python=3.9 -y
conda activate lisa
Now, move into the cloned LISA folder and install the requirements provided by the repo!
cd LISA
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
Here's the important part!! Before installing, the very first part of requirements.txt,
--extra-index-url https://download.pytorch.org/whl/cu117
must be changed to match your CUDA version!!
Installation done~~
🧠 3. Running the LISA Model!!
First, my setup is a single GeForce RTX 4090 with 24GB!! With it, running the model with the regular inference settings gives an out-of-memory error like the one below!
CUDA_VISIBLE_DEVICES=0 python chat.py --version='xinlai/LISA-13B-llama2-v1'
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 136.00 MiB. GPU 0 has a total capacity of 23.49 GiB of which 116.19 MiB is free. Including non-PyTorch memory, this process has 22.92 GiB memory in use. Of the allocated memory 22.54 GiB is allocated by PyTorch, and 2.73 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
However!! LISA kindly provides a lightweight option that runs even on a single 24G or 12G GPU,
so I ran it with that option!!
CUDA_VISIBLE_DEVICES=0 python chat.py --version='xinlai/LISA-13B-llama2-v1' --precision='fp16' --load_in_8bit
Then~~ ta-da!~! It now asks you to input a prompt~~
Please input your prompt:
I used a toothbrush image and asked about the part that cleans the teeth, as follows!
The part of a toothbrush used to remove food particles from teeth is called the bristles.
Then!!
The answer comes out like this~~!
Please input your prompt: The part of a toothbrush used to remove food particles from teeth is called the bristles.
Please input the image path: /home/user/data/AGD20K/Seen/trainset/egocentric/brush_with/toothbrush/toothbrush_000127.jpg
text_output: <s>A chat between a curious human and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the human's questions. USER: <im_start> <im_end> The part of a toothbrush used to remove food particles from teeth is called the bristles. ASSISTANT: Sure, [SEG] .</s>
./vis_output/The part of a toothbrush used to remove food particles from teeth is called the bristles. _toothbrush_000127_mask_0.jpg has been saved.
./vis_output/The part of a toothbrush used to remove food particles from teeth is called the bristles. _toothbrush_000127_masked_img_0.jpg has been saved.
To share a bit of theory: if the answer contains [SEG], it means there is separate mask information!
That's why the last two lines tell you that the images were saved!!
Shall we look at the image!?
As you can see below, it cleanly segmented just the bristles~!!
Here are some other results from my tests~!
Which part of a baseball bat is the handle that people hold?
Wearing glove: It would be nice if it picked out just the hand, but it didn't! It seems to have understood the prompt as the person wearing the glove!
A vegetable that's healthy but not liked by most kids: Broccoli... it doesn't handle this one well!!
Wrap-up
The era of Segmentation! Now it goes beyond simple Segmentation all the way to reasoning!!
I'm excited to see how much further it will develop~!