Comparative Analysis of Compact Language Models for Low-Resource NLP: A Study on DistilBERT, TinyBERT, and MobileBERT

Nima Garshasebi

DOI: 10.5281/zenodo.15907007

Abstract:

Compact language models such as DistilBERT, TinyBERT, and MobileBERT are designed for resource-constrained devices such as mobile phones. DistilBERT, with 66 million parameters, achieves a GLUE score of 77 and occupies 207 MB of memory on mobile devices, but shows reduced performance on SQuAD (F1 79.8). TinyBERT-4, with only 14.5 million parameters, matches DistilBERT's GLUE score of 77, runs 9.4 times faster than BERT-Base, and uses an estimated 55 MB of memory, though it struggles with low-data tasks such as CoLA. TinyBERT-6, with 67 million parameters, reaches a GLUE score of 79.4. MobileBERT, with 25.3 million parameters, scores 77.7 on GLUE, excels on SQuAD (F1 90.3), and has a latency of 62 ms on a Pixel 4. MobileBERT-TINY, with 15.1 million parameters, achieves lower latency (40 ms) but a lower GLUE score of 75.8. This paper compares these models in terms of accuracy, inference speed, and memory usage, demonstrating that MobileBERT is better suited for complex tasks such as question answering, while TinyBERT-4 is optimal for ultra-light applications.

Keywords:

Compact Models, Low-Resource NLP, MobileBERT, TinyBERT, Inference Speed