Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models

Robustness-Aware Summarization Leaderboard

Column abbreviations: R1, R2, RL = ROUGE-1, ROUGE-2, ROUGE-L; SC = SummaC; BS = BERTScore; Cv = Coverage; De = Density; Cp = Compression.

VietNews

| Models | R1 | R2 | RL | SC | BS | Cv | De | Cp |
|---|---|---|---|---|---|---|---|---|
| URA-LLaMa 70B | 0.34 ± 0.00 | 0.15 ± 0.00 | 0.23 ± 0.00 | -0.06 ± 0.00 | -0.11 ± 0.18 | 0.10 ± 0.00 | 0.10 ± 0.00 | 39.63 ± 0.87 |
| URA-LLaMa 13B | 0.35 ± 0.00 | 0.14 ± 0.00 | 0.23 ± 0.00 | -0.09 ± 0.00 | -0.07 ± 0.17 | 0.64 ± 0.00 | 0.65 ± 0.00 | 134.65 ± 3.76 |
| URA-LLaMa 7B | 0.37 ± 0.00 | 0.12 ± 0.00 | 0.24 ± 0.00 | -0.10 ± 0.00 | -0.24 ± 0.18 | 0.65 ± 0.00 | 0.65 ± 0.00 | 17.92 ± 0.87 |
| LLaMa-2 13B | 0.05 ± 0.00 | 0.01 ± 0.00 | 0.04 ± 0.00 | -0.15 ± 0.00 | -0.24 ± 0.18 | 0.03 ± 0.00 | 0.03 ± 0.00 | 55.91 ± 0.65 |
| LLaMa-2 7B | 0.05 ± 0.00 | 0.01 ± 0.00 | 0.05 ± 0.00 | -0.10 ± 0.00 | -0.19 ± 0.04 | 0.07 ± 0.00 | 0.07 ± 0.00 | 55.29 ± 0.88 |
| Vietcuna 7B | 0.03 ± 0.00 | 0.01 ± 0.00 | 0.02 ± 0.00 | -0.10 ± 0.00 | -0.18 ± 0.06 | 0.91 ± 0.00 | 0.91 ± 0.00 | 1026.61 ± 3.86 |
| MixSUra 8x7B | 0.41 ± - | 0.19 ± - | 0.26 ± - | - ± - | -0.03 ± - | 0.86 ± - | 0.87 ± - | 29.15 ± - |
| GPT-3.5 | 0.34 ± 0.00 | 0.19 ± 0.00 | 0.23 ± 0.00 | -0.10 ± 0.00 | 0.05 ± 0.14 | 0.81 ± 0.00 | 0.81 ± 0.00 | 128.44 ± 2.94 |
| GPT-4 | 0.39 ± 0.00 | 0.21 ± 0.00 | 0.26 ± 0.00 | -0.10 ± 0.09 | 0.04 ± 0.00 | 0.83 ± 0.00 | 0.83 ± 0.71 | 24.48 ± 0.00 |

WikiLingua

| Models | R1 | R2 | RL | SC | BS | Cv | De | Cp |
|---|---|---|---|---|---|---|---|---|
| URA-LLaMa 70B | 0.28 ± 0.00 | 0.11 ± 0.00 | 0.19 ± 0.00 | -0.16 ± 0.00 | 0.25 ± 0.23 | 0.50 ± 0.01 | 0.50 ± 0.01 | 167.42 ± 7.09 |
| URA-LLaMa 13B | 0.20 ± 0.00 | 0.07 ± 0.00 | 0.13 ± 0.00 | -0.17 ± 0.00 | 0.20 ± 0.11 | 0.38 ± 0.00 | 0.38 ± 0.00 | 103.69 ± 3.33 |
| URA-LLaMa 7B | 0.37 ± 0.00 | 0.12 ± 0.00 | 0.24 ± 0.00 | -0.17 ± 0.00 | 0.11 ± 0.18 | 0.65 ± 0.00 | 0.65 ± 0.00 | 20.49 ± 0.95 |
| LLaMa-2 13B | 0.04 ± 0.00 | 0.00 ± 0.00 | 0.03 ± 0.00 | -0.17 ± 0.00 | 0.09 ± 0.00 | 0.05 ± 0.00 | 0.05 ± 0.00 | 66.85 ± 6.72 |
| LLaMa-2 7B | 0.04 ± 0.00 | 0.00 ± 0.00 | 0.04 ± 0.00 | -0.17 ± 0.00 | 0.15 ± 0.00 | 0.06 ± 0.00 | 0.06 ± 0.00 | 58.32 ± 3.32 |
| Vietcuna 7B | 0.08 ± 0.00 | 0.02 ± 0.00 | 0.05 ± 0.00 | -0.17 ± 0.00 | -0.19 ± 0.05 | 0.78 ± 0.00 | 0.78 ± 0.00 | 505.45 ± 8.64 |
| MixSUra 8x7B | 0.46 ± - | 0.21 ± - | 0.28 ± - | - ± - | 0.26 ± - | 0.88 ± - | 0.98 ± - | 19.10 ± - |
| GPT-3.5 | 0.39 ± 0.00 | 0.19 ± 0.00 | 0.25 ± 0.00 | -0.17 ± 0.00 | 0.28 ± 0.11 | 0.82 ± 0.00 | 0.82 ± 0.00 | 200.90 ± 7.40 |
| GPT-4 | 0.45 ± 0.00 | 0.20 ± 0.00 | 0.27 ± 0.00 | -0.17 ± 0.00 | 0.28 ± 0.00 | 0.80 ± 0.03 | 0.81 ± 0.00 | 20.40 ± 1.59 |