# LogicKor

A multi-domain reasoning benchmark for Korean language models.

For a 12B model, a single 80 GB GPU is sufficient.
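
As a rough sanity check on the single-GPU claim, the sketch below estimates the memory taken by the weights alone. It assumes 16-bit weights (2 bytes per parameter), which the README does not state. The helper name `weight_memory_gb` is hypothetical, and the estimate ignores the KV cache and runtime overhead.

```python
# Back-of-the-envelope estimate; the 2-bytes-per-parameter figure is an assumption.
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the model weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


weights_gb = weight_memory_gb(12e9)  # 12B parameters at 2 bytes each
headroom_gb = 80 - weights_gb        # what remains for KV cache and overhead
print(f"weights: {weights_gb:.0f} GB, headroom on an 80 GB GPU: {headroom_gb:.0f} GB")
```

Under these assumptions the weights take about 24 GB, leaving most of the card for the KV cache that a long context needs.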

### 1. Generate inference results

```bash
python generator.py --model LDCC/Chat-Mistral-Nemo-12B-32k --gpu_devices 2 --model_len 32000
```


### 2. Evaluate the model with OpenAI

```bash
python evaluator.py -o generated/LDCC/Chat-Mistral-Nemo-12B-32k -k sk-### -t 30
```
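
Typing the API key on the command line leaves it in shell history. One possible workaround, sketched below, reads the key from an environment variable and builds the same argument list. The helper `evaluator_command` and the variable name `OPENAI_API_KEY` are assumptions for illustration. Only the `-o`, `-k`, and `-t` flags come from the command above, and the value of `-t` is passed through unchanged because the README does not describe it.

```python
import os


def evaluator_command(output_dir: str, t: int = 30) -> list[str]:
    """Build the evaluator.py argument list, taking the API key from the environment."""
    api_key = os.environ.get("OPENAI_API_KEY")  # assumed variable name
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY before running the evaluation.")
    # Flags mirror the README command; -t is forwarded as-is (30 in the README).
    return ["python", "evaluator.py", "-o", output_dir, "-k", api_key, "-t", str(t)]
```

The returned list can be handed to `subprocess.run(..., check=True)`. The key is still visible in the process arguments while the evaluator runs, so this only keeps it out of shell history and scripts.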


### 3. Check the results

```bash
python score.py -p evaluated/LDCC/Chat-Mistral-Nemo-12B-32k/default.jsonl
python score.py -p evaluated/LDCC/Chat-Mistral-Nemo-12B-32k/1-shot.jsonl
python score.py -p evaluated/LDCC/Chat-Mistral-Nemo-12B-32k/cot-1-shot.jsonl
```
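
The three commands above differ only in the prompt-mode file name, so they can be generated in a loop. The sketch below is a convenience wrapper, not part of the repository: `score_commands` and `PROMPT_MODES` are hypothetical names, and the directory layout is taken from the paths shown above.

```python
from pathlib import Path

# File stems used by the score.py commands in this README.
PROMPT_MODES = ["default", "1-shot", "cot-1-shot"]


def score_commands(model: str, evaluated_dir: str = "evaluated") -> list[list[str]]:
    """Return one score.py argument list per prompt mode for the given model."""
    return [
        ["python", "score.py", "-p",
         (Path(evaluated_dir) / model / f"{mode}.jsonl").as_posix()]
        for mode in PROMPT_MODES
    ]


for cmd in score_commands("LDCC/Chat-Mistral-Nemo-12B-32k"):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to run the scoring step.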

### default
| Category | Single turn | Multi turn |
|---|---|---|
| Reasoning | 9.43 | 9.14 |
| Coding | 9.71 | 9.14 |
| Writing | 9.86 | 9.29 |
| Math | 8.86 | 9.14 |
| Understanding | 10.00 | 10.00 |
| Grammar | 9.14 | 10.00 |

| Category | Score |
|---|---|
| Single turn | 9.50 |
| Multi turn | 9.45 |
| Overall | 9.48 |
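
The summary rows are consistent with plain averages of the per-category scores. This is inferred from the numbers in the tables, not from the `score.py` source, so treat the sketch below as a consistency check and not as a description of the script.

```python
from statistics import mean

# (single turn, multi turn) scores copied from the default per-category table.
default_scores = {
    "Reasoning": (9.43, 9.14),
    "Coding": (9.71, 9.14),
    "Writing": (9.86, 9.29),
    "Math": (8.86, 9.14),
    "Understanding": (10.00, 10.00),
    "Grammar": (9.14, 10.00),
}

single = mean(s for s, _ in default_scores.values())
multi = mean(m for _, m in default_scores.values())
# Assumption: "Overall" is the mean over all twelve category scores.
overall = mean(v for pair in default_scores.values() for v in pair)

print(f"Single turn {single:.2f} | Multi turn {multi:.2f} | Overall {overall:.2f}")
# Single turn 9.50 | Multi turn 9.45 | Overall 9.48
```

The same calculation matches the 1-shot and cot-1-shot summaries when their category scores are substituted.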

### 1-shot
| Category | Single turn | Multi turn |
|---|---|---|
| Math | 8.29 | 9.29 |
| Reasoning | 9.57 | 7.43 |
| Coding | 9.71 | 9.00 |
| Writing | 9.71 | 9.00 |
| Understanding | 9.43 | 10.00 |
| Grammar | 10.00 | 10.00 |

| Category | Score |
|---|---|
| Single turn | 9.45 |
| Multi turn | 9.12 |
| Overall | 9.29 |

### cot-1-shot
| Category | Single turn | Multi turn |
|---|---|---|
| Reasoning | 9.71 | 9.71 |
| Math | 6.57 | 8.00 |
| Coding | 9.57 | 9.29 |
| Writing | 9.86 | 9.71 |
| Understanding | 9.57 | 10.00 |
| Grammar | 10.00 | 10.00 |

| Category | Score |
|---|---|
| Single turn | 9.21 |
| Multi turn | 9.45 |
| Overall | 9.33 |