Crossing Linguistic Horizons: Finetuning and Comprehensive Evaluation of Vietnamese Large Language Models

Fairness-Aware Language Modeling Leaderboard

MLQA-MLM

| Models | EM | CER | WER | CED | WED | PLX |
|---|---|---|---|---|---|---|
| URA-LLaMa 70B | 0.01 ± 0.00 | 0.58 ± 0.01 | 0.70 ± 0.01 | 653.57 ± 12.05 | 150.64 ± 2.73 | 1.25 ± 0.06 |
| URA-LLaMa 13B | 0.02 ± 0.00 | 0.40 ± 0.01 | 0.56 ± 0.01 | 518.38 ± 11.19 | 125.24 ± 2.66 | 1.48 ± 0.11 |
| URA-LLaMa 7B | 0.01 ± 0.00 | 0.40 ± 0.01 | 0.55 ± 0.01 | 492.93 ± 11.32 | 117.82 ± 2.72 | 1.22 ± 0.01 |
| LLaMa-2 13B | 0.01 ± 0.00 | 0.76 ± 0.00 | 0.89 ± 0.00 | 782.03 ± 11.71 | 192.66 ± 2.83 | 1.27 ± 0.04 |
| LLaMa-2 7B | 0.00 ± 0.00 | 0.79 ± 0.00 | 0.96 ± 0.00 | 761.38 ± 10.65 | 197.18 ± 2.66 | 1.75 ± 0.20 |
| Vietcuna 7B | 0.00 ± 0.00 | 1.04 ± 0.00 | 1.06 ± 0.00 | 940.71 ± 12.48 | 208.05 ± 2.81 | 1.40 ± 0.00 |
| MixSUra 8x7B | 0.00 ± - | 0.56 ± - | 0.63 ± - | 535.76 ± - | 133.64 ± - | 1.00 ± - |
| GPT-3.5 | 0.03 ± 0.00 | 0.29 ± 0.01 | 0.46 ± 0.01 | 398.19 ± 11.01 | 96.42 ± 2.54 | - |
| GPT-4 | 0.06 ± 0.00 | 0.36 ± 0.01 | 0.41 ± 0.01 | 347.82 ± 10.23 | 86.96 ± 2.41 | - |

VSEC

| Models | EM | CER | WER | CED | WED | PLX |
|---|---|---|---|---|---|---|
| URA-LLaMa 70B | 0.30 ± 0.00 | 0.11 ± 0.00 | 0.14 ± 0.00 | 15.19 ± 0.42 | 4.12 ± 0.11 | 1.13 ± 0.00 |
| URA-LLaMa 13B | 0.32 ± 0.00 | 0.07 ± 0.00 | 0.21 ± 0.00 | 2.98 ± 0.11 | 1.24 ± 0.03 | 1.15 ± 0.00 |
| URA-LLaMa 7B | 0.20 ± 0.00 | 0.54 ± 0.01 | 0.67 ± 0.01 | 41.77 ± 1.57 | 10.12 ± 0.35 | 1.07 ± 0.00 |
| LLaMa-2 13B | 0.15 ± 0.00 | 0.07 ± 0.00 | 0.22 ± 0.00 | 3.39 ± 0.16 | 1.52 ± 0.04 | 1.01 ± 0.00 |
| LLaMa-2 7B | 0.12 ± 0.00 | 0.35 ± 0.01 | 0.48 ± 0.01 | 47.54 ± 0.85 | 11.82 ± 0.19 | 1.06 ± 0.00 |
| Vietcuna 7B | 0.06 ± 0.00 | 4.78 ± 0.06 | 4.80 ± 0.06 | 634.48 ± 8.58 | 145.12 ± 1.94 | 1.46 ± 0.01 |
| MixSUra 8x7B | 0.07 ± - | 0.20 ± - | 0.29 ± - | 25.96 ± - | 8.79 ± - | 1.00 ± - |
| GPT-3.5 | 0.59 ± 0.00 | 0.06 ± 0.00 | 0.19 ± 0.00 | 1.99 ± 0.08 | 0.74 ± 0.02 | - |
| GPT-4 | 0.67 ± 0.00 | 0.01 ± 0.00 | 0.02 ± 0.00 | 1.30 ± 0.04 | 0.54 ± 0.01 | - |
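
The column abbreviations in the table above are not expanded on this page. As an illustration only, the sketch below assumes the common readings: EM as exact match, CED and WED as character-level and word-level Levenshtein edit distances, and CER and WER as those distances divided by the reference length. The function names `edit_distance` and `text_metrics` are hypothetical and are not taken from the authors' evaluation code. PLX (presumably perplexity) is omitted because it requires model token probabilities rather than text alone.

```python
from typing import Dict, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: insertions, deletions, substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference items
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def text_metrics(reference: str, prediction: str) -> Dict[str, float]:
    """Assumed definitions of EM, CED, WED, CER, and WER for one example."""
    ref_words, hyp_words = reference.split(), prediction.split()
    ced = edit_distance(reference, prediction)  # over characters
    wed = edit_distance(ref_words, hyp_words)   # over whitespace-split words
    return {
        "EM": float(reference == prediction),
        "CED": ced,
        "WED": wed,
        # Normalised by reference length (assumption); guard against empty input.
        "CER": ced / max(len(reference), 1),
        "WER": wed / max(len(ref_words), 1),
    }


scores = text_metrics("a b c", "a x c")
```

Under these assumed definitions, CER and WER are not bounded by 1: a prediction much longer than its reference yields more edits than the reference has characters or words, which would be consistent with values such as the 4.78 CER reported for Vietcuna 7B on VSEC.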