# LogicKor

A multi-domain reasoning benchmark for Korean language models.

A 12B model fits on a single 80 GB GPU.

### 1. Generate inference results

```bash
python generator.py --model LDCC/Chat-Mistral-Nemo-12B-32k --gpu_devices 2 --model_len 32000
```

### 2. Evaluate the model with OpenAI

Replace `sk-###` with your OpenAI API key.

```bash
python evaluator.py -o generated/LDCC/Chat-Mistral-Nemo-12B-32k -k sk-### -t 30
```

### 3. Check the results

```bash
python score.py -p evaluated/LDCC/Chat-Mistral-Nemo-12B-32k/default.jsonl
python score.py -p evaluated/LDCC/Chat-Mistral-Nemo-12B-32k/1-shot.jsonl
python score.py -p evaluated/LDCC/Chat-Mistral-Nemo-12B-32k/cot-1-shot.jsonl
```

### default

| Category | Single turn | Multi turn |
|---|---|---|
| Reasoning | 9.43 | 9.14 |
| Coding | 9.71 | 9.14 |
| Writing | 9.86 | 9.29 |
| Math | 8.86 | 9.14 |
| Understanding | 10.00 | 10.00 |
| Grammar | 9.14 | 10.00 |

| Aggregate | Score |
|---|---|
| Single turn | 9.50 |
| Multi turn | 9.45 |
| Overall | 9.48 |

### 1-shot

| Category | Single turn | Multi turn |
|---|---|---|
| Math | 8.29 | 9.29 |
| Reasoning | 9.57 | 7.43 |
| Coding | 9.71 | 9.00 |
| Writing | 9.71 | 9.00 |
| Understanding | 9.43 | 10.00 |
| Grammar | 10.00 | 10.00 |

| Aggregate | Score |
|---|---|
| Single turn | 9.45 |
| Multi turn | 9.12 |
| Overall | 9.29 |

### cot-1-shot

| Category | Single turn | Multi turn |
|---|---|---|
| Reasoning | 9.71 | 9.71 |
| Math | 6.57 | 8.00 |
| Coding | 9.57 | 9.29 |
| Writing | 9.86 | 9.71 |
| Understanding | 9.57 | 10.00 |
| Grammar | 10.00 | 10.00 |

| Aggregate | Score |
|---|---|
| Single turn | 9.21 |
| Multi turn | 9.45 |
| Overall | 9.33 |
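The aggregate rows appear to be plain averages of the per-category table: in every run above, Single turn and Multi turn match the unweighted mean of the six category rows, and Overall matches the mean of all twelve cells. The sketch below recomputes them with `awk` as a quick sanity check. It is a hypothetical helper, not part of the LogicKor repo, and it assumes (not confirmed from `score.py`) that all categories carry equal weight.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of LogicKor: recompute the aggregate rows
# from a per-category table. Assumption: each category is weighted equally.
summarize() {
  # Each input line: "<category> <single-turn score> <multi-turn score>"
  # Prints: "<single-turn mean> <multi-turn mean> <overall mean>"
  awk '{ s += $2; m += $3; n++ }
       END { printf "%.2f %.2f %.2f\n", s / n, m / n, (s + m) / (2 * n) }'
}

# Per-category scores from the "default" run above.
summarize <<'EOF'
Reasoning 9.43 9.14
Coding 9.71 9.14
Writing 9.86 9.29
Math 8.86 9.14
Understanding 10.00 10.00
Grammar 9.14 10.00
EOF
# prints: 9.50 9.45 9.48
```

Because the per-category values are already rounded to two decimals, a recomputed average can occasionally differ from `score.py` in the last digit.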