README.md 4.88 KB
Newer Older
kihoon.lee's avatar
kihoon.lee committed
1
# LogicKor
kihoon.lee's avatar
kihoon.lee committed
2
3
각 Task별 롯데 GPT의 정량적 성능 측정을 위한 LogicKor 기반 벤치마크
  
kihoon.lee's avatar
kihoon.lee committed
4
12B 기준 80GB 1장 가능
kihoon.lee's avatar
kihoon.lee committed
5

kihoon.lee's avatar
update    
kihoon.lee committed
6
7
8
9
10
11
12
## Quick Start
```bash
sh start.sh
```
requirments 설치 뒤, 위의 sh 파일 실행하면 됩니다.

## Detailled Usage
kihoon.lee's avatar
kihoon.lee committed
13
### 1. 인퍼런스 결과 생성
kihoon.lee's avatar
kihoon.lee committed
14

kihoon.lee's avatar
kihoon.lee committed
15
16
```bash
python generator.py --model LDCC/Chat-Mistral-Nemo-12B-32k --gpu_devices 2 --model_len 32000
kihoon.lee's avatar
update    
kihoon.lee committed
17
18

python3 lotte-generator.py --model LDCC/Chat-Mistral-Nemo-12B-32k --gpu_devices 2 --model_len 32000
kihoon.lee's avatar
kihoon.lee committed
19
20
21
```


kihoon.lee's avatar
kihoon.lee committed
22
#### 2. 모델 평가 with OpenAI
kihoon.lee's avatar
kihoon.lee committed
23

kihoon.lee's avatar
kihoon.lee committed
24
```bash
kihoon.lee's avatar
update    
kihoon.lee committed
25
26
27
python evaluator.py -o generated/LDCC/Chat-Mistral-Nemo-12B-32k -m LDCC/Chat-Mistral-Nemo-12B-32k -k sk-### -t 30 -j gpt-4o

python lotte-evaluator.py -o generated/LDCC/Chat-Mistral-Nemo-12B-32k -m LDCC/Chat-Mistral-Nemo-12B-32k -k sk-### -t 30 -j gpt-4o
kihoon.lee's avatar
kihoon.lee committed
28
```
kihoon.lee's avatar
kihoon.lee committed
29
30


kihoon.lee's avatar
kihoon.lee committed
31
### 3. 결과 확인
kihoon.lee's avatar
kihoon.lee committed
32

kihoon.lee's avatar
kihoon.lee committed
33
34
35
36
```bash
python score.py -p evaluated/LDCC/Chat-Mistral-Nemo-12B-32k/default.jsonl
python score.py -p evaluated/LDCC/Chat-Mistral-Nemo-12B-32k/1-shot.jsonl
python score.py -p evaluated/LDCC/Chat-Mistral-Nemo-12B-32k/cot-1-shot.jsonl
kihoon.lee's avatar
update    
kihoon.lee committed
37
python lotte-score.py -p evaluated/LDCC/Chat-Mistral-Nemo-12B-32k/lotte_single_turn.jsonl
kihoon.lee's avatar
kihoon.lee committed
38
```
kihoon.lee's avatar
kihoon.lee committed
39

kihoon.lee's avatar
kihoon.lee committed
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
### default
| Category | Single turn | Multi turn |
|---|---|---|
| 추론(Reasoning) | 9.43 | 9.14 |
| 코딩(Coding) | 9.71 | 9.14 |
| 글쓰기(Writing) | 9.86 | 9.29 |
| 수학(Math) | 8.86 | 9.14 |
| 이해(Understanding) | 10.00 | 10.00 |
| 문법(Grammar) | 9.14 | 10.00 |

| Category | Score |
|---|---|
| Single turn | 9.50 |
| Multi turn | 9.45 |
| Overall | 9.48 |

### 1-shot
| Category | Single turn | Multi turn |
|---|---|---|
| 수학(Math) | 8.29 | 9.29 |
| 추론(Reasoning) | 9.57 | 7.43 |
| 코딩(Coding) | 9.71 | 9.00 |
| 글쓰기(Writing) | 9.71 | 9.00 |
| 이해(Understanding) | 9.43 | 10.00 |
| 문법(Grammar) | 10.00 | 10.00 |

| Category | Score |
|---|---|
| Single turn | 9.45 |
| Multi turn | 9.12 |
| Overall | 9.29 |

### cot-1-shot
| Category | Single turn | Multi turn |
|---|---|---|
| 추론(Reasoning) | 9.71 | 9.71 |
| 수학(Math) | 6.57 | 8.00 |
| 코딩(Coding) | 9.57 | 9.29 |
| 글쓰기(Writing) | 9.86 | 9.71 |
| 이해(Understanding) | 9.57 | 10.00 |
| 문법(Grammar) | 10.00 | 10.00 |

| Category | Score |
|---|---|
| Single turn | 9.21 |
| Multi turn | 9.45 |
kihoon.lee's avatar
kihoon.lee committed
86
87
| Overall | 9.33 |

kihoon.lee's avatar
update    
kihoon.lee committed
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
### lotte score
| Category | Single turn |
|---|---|
| task_assistant_mail_introduce | 9.00 |
| text2sql | 8.00 |
| task_assistant_mail_meeting | 9.00 |
| task_assistant_mail_share | 9.00 |
| search_keyword | 1.00 |
| mrc | 3.00 |
| task_assistant_mail_pr | 9.00 |
| lotte_qa | 9.00 |
| search_summary | 9.00 |
| meeting_summary | 9.00 |
| task_assistant_hire | 8.00 |
| review_summary | 8.00 |

kihoon.lee's avatar
kihoon.lee committed
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120

### 문제 예시
```json
{"id": 42, "category": "문법(Grammar)", "questions": ["나는어제친구와김치찌개를먹었다.\n\n이 문장을 올바르게 띄어 써보아라.", "아래 문장의 높임 표현을 올바르게 수정보아라.\n\n할머니가 밥을 먹는다."], "references": ["나는 어제 친구와 김치찌개를 먹었다.", "할머니께서 진지를 잡수신다."]}

{"id": 22, "category": "코딩(Coding)", "questions": ["시간 복잡도를 어떻게 구할 수 있는지 설명해주고, 많이 쓰이는 알고리즘 중에 최적화를 통해 시간 복잡도를 줄인 예시를 알려줘.", "공간 복잡도라는 용어도 있던데 뭐가 다른 거야?"], "references": [null, null]}
```

### category
각각 7개씩 존재하며, 단일턴과 멀티턴(2턴)으로 구성되어있음.
- **기존**
	- 추론(Reasoning)
	- 수학(Math)
	- 글쓰기(Writing)
	- 코딩(Coding)
	- 이해(Understanding)
	- 문법(Grammar)
kihoon.lee's avatar
update    
kihoon.lee committed
121

kihoon.lee's avatar
kihoon.lee committed
122
- **추가 예정**
kihoon.lee's avatar
update    
kihoon.lee committed
123
124
125
	- [사용자 질의를 검색용 키워드로 변환](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/instruct/search_keyword.json)
	- [검색된 텍스트를 정리하여 요약](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/instruct/search_summary.json)
	- [상품 리뷰를 정리하여 요약](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/instruct/review_summary.json)
kihoon.lee's avatar
kihoon.lee committed
126
127
128
129
130
131
	- [회의 요약](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/instruct/meeting_summary.json)
	- [업무도우미](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/instruct/task_assistant.json)
	- [상품 리뷰 요약](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/instruct/review_summary.json)
	- [text2sql](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/instruct/text2sql.json)
	- [sql2text](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/instruct/sql2answer.json)
	- [감성채팅](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/empathetic_dialogues_mutli_turn.json)
kihoon.lee's avatar
update    
kihoon.lee committed
132
	- [롯데 QA](https://ldccai.lotte.net/gitlab/wonchul_kim/koalpaca/-/blob/main/data_chat/lotte/%EB%A1%AF%EB%8D%B0QA_240105.json)
kihoon.lee's avatar
kihoon.lee committed
133