Understanding Transformers via N-gram Statistics