# LogicKor
A LogicKor-based benchmark for quantitatively measuring Lotte GPT's performance on each task.
  
A 12B model can be run on a single 80GB GPU.

### 1. Generate inference results

```bash
python generator.py --model LDCC/Chat-Mistral-Nemo-12B-32k --gpu_devices 2 --model_len 32000

python3 lotte-generator.py --model LDCC/Chat-Mistral-Nemo-12B-32k --gpu_devices 2 --model_len 32000
```


### 2. Evaluate the model with OpenAI

```bash
python evaluator.py -o generated/LDCC/Chat-Mistral-Nemo-12B-32k -m LDCC/Chat-Mistral-Nemo-12B-32k -k sk-### -t 30 -j gpt-4o

python lotte-evaluator.py -o generated/LDCC/Chat-Mistral-Nemo-12B-32k -m LDCC/Chat-Mistral-Nemo-12B-32k -k sk-### -t 30 -j gpt-4o
```


### 3. Check the results

```bash
python score.py -p evaluated/LDCC/Chat-Mistral-Nemo-12B-32k/default.jsonl
python score.py -p evaluated/LDCC/Chat-Mistral-Nemo-12B-32k/1-shot.jsonl
python score.py -p evaluated/LDCC/Chat-Mistral-Nemo-12B-32k/cot-1-shot.jsonl
python lotte-score.py -p evaluated/LDCC/Chat-Mistral-Nemo-12B-32k/lotte_single_turn.jsonl
```
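
The scoring commands above differ only in the result file name, so they can be built from the model name. The following is a minimal sketch, assuming the `evaluated/<model>/<name>.jsonl` layout seen in those commands; the helper `score_commands` is hypothetical and not part of this repository.

```python
import shlex

# Model name and result file names taken from the commands above.
MODEL = "LDCC/Chat-Mistral-Nemo-12B-32k"
PROMPT_FILES = ["default", "1-shot", "cot-1-shot"]


def score_commands(model, names=PROMPT_FILES):
    """Build one `python score.py -p ...` argument list per result file.

    Hypothetical helper: it only assumes the path layout
    evaluated/<model>/<name>.jsonl used in the commands above.
    """
    return [
        ["python", "score.py", "-p", f"evaluated/{model}/{name}.jsonl"]
        for name in names
    ]


# Print the commands instead of running them.
for cmd in score_commands(MODEL):
    print(shlex.join(cmd))
```

To execute the commands rather than print them, each argument list could be passed to `subprocess.run(cmd, check=True)` from the repository root.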

### default
| Category | Single turn | Multi turn |
|---|---|---|
| 추론(Reasoning) | 9.43 | 9.14 |
| 코딩(Coding) | 9.71 | 9.14 |
| 글쓰기(Writing) | 9.86 | 9.29 |
| 수학(Math) | 8.86 | 9.14 |
| 이해(Understanding) | 10.00 | 10.00 |
| 문법(Grammar) | 9.14 | 10.00 |

| Category | Score |
|---|---|
| Single turn | 9.50 |
| Multi turn | 9.45 |
| Overall | 9.48 |
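
The summary rows are consistent with an unweighted mean over the six category scores. The sketch below reproduces the `default` summary from the per-category table. It assumes plain averaging; how `score.py` actually aggregates is not shown here, and the function `summarize` is hypothetical.

```python
from statistics import mean

# (single turn, multi turn) scores copied from the `default` table above.
DEFAULT_SCORES = {
    "추론(Reasoning)": (9.43, 9.14),
    "코딩(Coding)": (9.71, 9.14),
    "글쓰기(Writing)": (9.86, 9.29),
    "수학(Math)": (8.86, 9.14),
    "이해(Understanding)": (10.00, 10.00),
    "문법(Grammar)": (9.14, 10.00),
}


def summarize(scores):
    """Average per-category scores into single-turn, multi-turn, and overall.

    Assumption: every category carries equal weight, so the overall score
    is the mean of the single-turn and multi-turn averages.
    """
    single = mean(s for s, _ in scores.values())
    multi = mean(m for _, m in scores.values())
    overall = mean([single, multi])
    return {
        "Single turn": round(single, 2),
        "Multi turn": round(multi, 2),
        "Overall": round(overall, 2),
    }


print(summarize(DEFAULT_SCORES))
```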

### 1-shot
| Category | Single turn | Multi turn |
|---|---|---|
| 수학(Math) | 8.29 | 9.29 |
| 추론(Reasoning) | 9.57 | 7.43 |
| 코딩(Coding) | 9.71 | 9.00 |
| 글쓰기(Writing) | 9.71 | 9.00 |
| 이해(Understanding) | 9.43 | 10.00 |
| 문법(Grammar) | 10.00 | 10.00 |

| Category | Score |
|---|---|
| Single turn | 9.45 |
| Multi turn | 9.12 |
| Overall | 9.29 |

### cot-1-shot
| Category | Single turn | Multi turn |
|---|---|---|
| 추론(Reasoning) | 9.71 | 9.71 |
| 수학(Math) | 6.57 | 8.00 |
| 코딩(Coding) | 9.57 | 9.29 |
| 글쓰기(Writing) | 9.86 | 9.71 |
| 이해(Understanding) | 9.57 | 10.00 |
| 문법(Grammar) | 10.00 | 10.00 |

| Category | Score |
|---|---|
| Single turn | 9.21 |
| Multi turn | 9.45 |
| Overall | 9.33 |

### lotte score
| Category | Single turn |
|---|---|
| task_assistant_mail_introduce | 9.00 |
| text2sql | 8.00 |
| task_assistant_mail_meeting | 9.00 |
| task_assistant_mail_share | 9.00 |
| search_keyword | 1.00 |
| mrc | 3.00 |
| task_assistant_mail_pr | 9.00 |
| lotte_qa | 9.00 |
| search_summary | 9.00 |
| meeting_summary | 9.00 |
| task_assistant_hire | 8.00 |
| review_summary | 8.00 |


### Example questions
```json
{"id": 42, "category": "문법(Grammar)", "questions": ["나는어제친구와김치찌개를먹었다.\n\n이 문장을 올바르게 띄어 써보아라.", "아래 문장의 높임 표현을 올바르게 수정보아라.\n\n할머니가 밥을 먹는다."], "references": ["나는 어제 친구와 김치찌개를 먹었다.", "할머니께서 진지를 잡수신다."]}

{"id": 22, "category": "코딩(Coding)", "questions": ["시간 복잡도를 어떻게 구할 수 있는지 설명해주고, 많이 쓰이는 알고리즘 중에 최적화를 통해 시간 복잡도를 줄인 예시를 알려줘.", "공간 복잡도라는 용어도 있던데 뭐가 다른 거야?"], "references": [null, null]}
```
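
Each record holds two turns in `questions`, with a matching entry in `references` that is `null` when no reference answer exists. The following is a minimal sketch of reading such records with the standard `json` module, assuming one JSON object per line; the helpers `load_records` and `iter_turns` are hypothetical names, not functions from this repository.

```python
import json

# The two records from the example above, one JSON object per line.
# A raw string keeps the JSON "\n" escapes intact for json.loads.
RAW = r'''
{"id": 42, "category": "문법(Grammar)", "questions": ["나는어제친구와김치찌개를먹었다.\n\n이 문장을 올바르게 띄어 써보아라.", "아래 문장의 높임 표현을 올바르게 수정보아라.\n\n할머니가 밥을 먹는다."], "references": ["나는 어제 친구와 김치찌개를 먹었다.", "할머니께서 진지를 잡수신다."]}
{"id": 22, "category": "코딩(Coding)", "questions": ["시간 복잡도를 어떻게 구할 수 있는지 설명해주고, 많이 쓰이는 알고리즘 중에 최적화를 통해 시간 복잡도를 줄인 예시를 알려줘.", "공간 복잡도라는 용어도 있던데 뭐가 다른 거야?"], "references": [null, null]}
'''


def load_records(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def iter_turns(record):
    """Yield (turn number, question, reference or None) for each turn."""
    pairs = zip(record["questions"], record["references"])
    for turn, (question, reference) in enumerate(pairs, start=1):
        yield turn, question, reference


records = load_records(RAW)
for record in records:
    for turn, question, reference in iter_turns(record):
        has_reference = "yes" if reference is not None else "no"
        print(record["id"], record["category"], f"turn {turn}", f"reference: {has_reference}")
```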

### category
Each category has 7 questions, and each question consists of a single turn and a multi turn (2 turns).
- **Existing**
	- 추론(Reasoning)
	- 수학(Math)
	- 글쓰기(Writing)
	- 코딩(Coding)
	- 이해(Understanding)
	- 문법(Grammar)
- **Planned additions**
	- [Keyword search](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/instruct/search_keyword.json)
	- [Search raw data summarization](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/instruct/search_summary.json)
	- [Product review summarization](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/instruct/review_summary.json)
	- [Meeting summarization](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/instruct/meeting_summary.json)
	- [Task assistant](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/instruct/task_assistant.json)
	- [text2sql](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/instruct/text2sql.json)
	- [sql2text](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/instruct/sql2answer.json)
	- [Empathetic chat](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/empathetic_dialogues_mutli_turn.json)
	- [Lotte QA](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/lotte/%EB%A1%AF%EB%8D%B0QA_240105.json)