Understanding LISA
LISA: A New Frontier in Reasoning-Based Segmentation
An innovative model that understands complex linguistic instructions and segments the corresponding regions in an image!
Paper: LISA: Reasoning Segmentation via Large Language Model
Conference: CVPR 2024 (by CUHK, MSRA, SmartMore)
Code: dvlab-research/LISA
Comment: A groundbreaking approach combining the language understanding ability of LLMs with visual segmentation!
Limitations of Existing Visual Recognition Systems
Despite the many high-performance segmentation models available, they lack the ability to understand implicit user intent and perform reasoning!
- Explicit instructions required: Users must directly specify the target object.
- Dependent on predefined categories: Difficult to handle new objects or scenarios flexibly.
- Lacks complex reasoning: Cannot understand or process instructions like "foods rich in Vitamin C."
To overcome these limitations, a new task called "reasoning segmentation" was introduced: segmenting image regions from complex and implicit language instructions!
Example from the paper: when someone says "Change the TV channel," a robot doesn't understand. Instead, it needs commands like "go to the table, find the remote, and press the channel button." LISA introduces reasoning to close exactly this gap.
Key Features of LISA!
1. Reasoning Segmentation
- Understands complex language instructions: can process commands like "Segment the US president in this image and explain why."
- Utilizes world knowledge: e.g., "foods rich in Vitamin C."
- Provides explanations: can generate a rationale for the segmentation output.
2. Unified Processing! LISA Model Architecture
- SEG token introduction: adds a new <SEG> token and uses the embedding-as-mask paradigm.
- Multimodal LLM integration: combines the LLM's language understanding with visual information.
- End-to-end training: maps language instruction + image directly to a segmentation mask.
3. Creation of the ReasonSeg Benchmark!
To evaluate LISA's performance, a new benchmark called ReasonSeg was created!
- Total samples: 1,218
- Data split:
  - Train: 239
  - Validation: 200
  - Test: 779
- Image sources: OpenImages, ScanNetv2
- Instruction types: short phrases + complex sentences
ReasonSeg is designed to evaluate the model's reasoning-based segmentation capabilities.
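To make "evaluates reasoning-based segmentation" concrete, here is a minimal sketch of the per-sample IoU that underlies the gIoU/cIoU numbers reported on ReasonSeg; the tensors below are dummies, and in an actual evaluation they would be the predicted and ground-truth masks for each instruction in the split.

import torch

def mask_iou(pred_logits: torch.Tensor, gt_mask: torch.Tensor) -> float:
    """IoU between a predicted logit mask and a binary ground-truth mask."""
    pred = (pred_logits > 0).float()           # threshold logits at 0 (sigmoid 0.5)
    gt = (gt_mask > 0.5).float()
    inter = (pred * gt).sum()
    union = pred.sum() + gt.sum() - inter
    return (inter / union.clamp(min=1e-6)).item()

# Dummy tensors for illustration; a real run would loop over the 200 validation
# samples and average the per-image IoU.
pred = torch.randn(512, 512)                   # predicted mask logits
gt = (torch.rand(512, 512) > 0.5).float()      # binary ground-truth mask
print(f"IoU: {mask_iou(pred, gt):.3f}")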
Training Methodology
LISA is trained in an end-to-end manner using the following three main data sources:
1. Semantic Segmentation Datasets
Datasets: ADE20K, COCO-Stuff, LVIS-PACO. These teach "what it is" (e.g., chair).
- Input: image + class name
- Output: binary mask → learns pixel-level semantic understanding
- QA format example:
USER: <IMAGE> Can you segment the chair in this image?
ASSISTANT: It is <SEG>.
2. Referring Segmentation Datasets
Datasets: refCOCO, refCOCO+, refCOCOg, refCLEF
These ref* datasets are known to facilitate reasoning-oriented understanding!
Explicit referring expressions are converted into QA format: "the red chair on the right" → "Can you segment the red chair on the right in this image?" (a sketch of this conversion follows below).
Learns not only "what" but also "which one specifically" (e.g., wooden chair).
- Input: image + explicit object description
- Output: binary mask for the target object → learns to localize and segment based on natural language
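As a rough illustration of the conversion mentioned above, the sketch below wraps either a class name or a referring expression into the QA template used in this post; the helper is hypothetical and not taken from the official LISA repository.

def to_qa_sample(expression: str) -> dict:
    """Wrap a class name or referring expression into the LISA-style QA format."""
    return {
        "question": f"<IMAGE> Can you segment {expression} in this image?",
        "answer": "It is <SEG>.",
    }

# Works the same way for a plain class name and for a referring expression.
print(to_qa_sample("the chair"))
print(to_qa_sample("the red chair on the right"))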
3. Visual Question Answering (VQA)
Important: even though no reasoning segmentation examples were included in training, LISA performs impressively on ReasonSeg in a zero-shot setting!
- Input: image + natural language question
- Output: natural language answer → learns to integrate visual and language understanding
Datasets used:
- LLaVA-Instruct-150k (v1)
- LLaVA-v1.5-mix665k (v1.5)
LISA Architecture: Embedding-as-Mask Paradigm
Prior polygon-sequence methods are expensive and less generalizable.
LISA introduces a new structure called Embedding-as-Mask.
Key Components
- Add a <SEG> token to mark the segmentation request
- Extract the <SEG> embedding from the last LLM layer
- Pass it through an MLP to generate the mask embedding
- Combine it with the vision encoder features and pass it to the decoder
- Output the final binary mask
To better understand how the mask is produced from the <SEG> token, we follow the pseudocode below:
# Image and text input
x_img = load_image_tensor(...)  # [3, H, W]
x_txt = "Can you segment the red chair in this image? It is <SEG>."

# 1. Tokenize text and find the <SEG> token index
input_ids = tokenizer(x_txt, return_tensors='pt')
seg_token_index = input_ids.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids("<SEG>"))

# 2. Vision encoder extracts image features
f_img = vision_encoder(x_img)  # [B, C, H', W']

# 3. Multimodal LLM encoding (image tokens + text tokens)
output_hidden_states = multimodal_llm(input_ids, image_features=f_img, output_hidden_states=True)

# 4. Extract the <SEG> embedding from the final hidden state
h_tilde_seg = output_hidden_states.last_hidden_state[0, seg_token_index]  # [hidden_dim]

# 5. Project with an MLP
h_seg = mlp_projection(h_tilde_seg)  # [proj_dim]

# 6. Decode to a segmentation mask
pred_mask = mask_decoder(h_seg, f_img)  # [1, H, W]

# 7. Loss (during training)
loss = bce_loss(pred_mask, gt_mask) + dice_loss(pred_mask, gt_mask)
Training Objective Function
L = λ_txt * L_txt + λ_mask * L_mask
- L_txt: text generation loss (autoregressive cross-entropy)
- L_mask: mask loss = BCE + DICE
- λ_txt, λ_mask: loss-weighting hyperparameters
1. Text Generation Loss L_txt
Evaluates the accuracy of the natural-language portion generated before <SEG>.
- Uses the same autoregressive cross-entropy loss as standard language modeling
2. Mask Loss L_mask
Evaluates the accuracy of the segmentation mask generated from the <SEG> token embedding.
Combines two losses:
- BCE: pixel-wise accuracy
- DICE: overall shape similarity
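Putting the objective together, here is a minimal PyTorch-style sketch of L = λ_txt * L_txt + λ_mask * L_mask with the mask term as BCE + DICE; the smoothing constant and the λ values are illustrative defaults, not the paper's exact hyperparameters.

import torch
import torch.nn.functional as F

def dice_loss(pred_logits, gt_mask, eps=1.0):
    """DICE loss: 1 - 2|P∩G| / (|P| + |G|), capturing overall shape overlap."""
    pred = pred_logits.sigmoid().flatten(1)
    gt = gt_mask.flatten(1)
    inter = (pred * gt).sum(-1)
    return (1 - (2 * inter + eps) / (pred.sum(-1) + gt.sum(-1) + eps)).mean()

def lisa_loss(text_logits, text_targets, mask_logits, gt_mask,
              lambda_txt=1.0, lambda_mask=1.0):
    # L_txt: autoregressive cross-entropy over the generated answer tokens
    l_txt = F.cross_entropy(text_logits.reshape(-1, text_logits.size(-1)),
                            text_targets.reshape(-1))
    # L_mask: per-pixel BCE + DICE on the predicted mask logits
    l_mask = F.binary_cross_entropy_with_logits(mask_logits, gt_mask) \
             + dice_loss(mask_logits, gt_mask)
    return lambda_txt * l_txt + lambda_mask * l_mask

# Example with dummy shapes: 10 answer tokens over a 32k vocab, one 256x256 mask
loss = lisa_loss(torch.randn(1, 10, 32000), torch.randint(0, 32000, (1, 10)),
                 torch.randn(1, 256, 256), torch.randint(0, 2, (1, 256, 256)).float())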
Efficiency and Performance
Model | GPU Resources | Training Time
---|---|---
VisionLLM | 4 × 8 × A100 80GB | 50 epochs (unrealistic)
LISA-7B | 8 × RTX 3090 24GB | < 3 days
LISA is a practical segmentation model that excels in both efficiency and performance.
Conclusion
LISA empowers multimodal LLMs with reasoning-based image segmentation,
evolving them into models capable of understanding and executing complex natural language instructions.
Initially, it seemed like multimodal models alone could do everything. Going forward, we can expect new models in many different styles, and perhaps a unified solution that integrates them all!