Most of your texts are longer than about 400 words, and your primary goal is to train a classifier on them.
The chunk-and-average strategy splits the entire text into chunks of n tokens, extracts a classification-token embedding (for example, the [CLS] embedding in BERT-style models) for each chunk, and then averages those embeddings into a single vector for the text. The chunk size depends on the chosen model but is often 512 tokens, or about 400 words. If the input text is shorter than the model's maximum sequence length, it fits in a single chunk, so this technique is equivalent to the truncate method.
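The chunking and averaging steps can be sketched in plain Python as below. This is a minimal illustration, not a specific library's API: `chunk_tokens`, `chunk_and_average`, and the `embed_cls` callable are hypothetical names, and `embed_cls` stands in for whatever model call returns the classification-token embedding for one chunk of token IDs. The sketch uses an unweighted mean, so a short final chunk counts as much as a full one.

```python
from typing import Callable, List, Sequence

Embedding = List[float]


def chunk_tokens(tokens: Sequence[int], chunk_size: int) -> List[List[int]]:
    """Split a token sequence into consecutive, non-overlapping chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [list(tokens[i:i + chunk_size]) for i in range(0, len(tokens), chunk_size)]


def chunk_and_average(
    tokens: Sequence[int],
    embed_cls: Callable[[List[int]], Embedding],  # hypothetical: returns one chunk's classification-token embedding
    chunk_size: int = 512,  # assumption: models that add special tokens may need a smaller value to leave room for them
) -> Embedding:
    """Embed each chunk separately, then return the element-wise mean of the chunk embeddings."""
    chunks = chunk_tokens(tokens, chunk_size)
    if not chunks:
        raise ValueError("cannot embed an empty token sequence")
    embeddings = [embed_cls(chunk) for chunk in chunks]
    dim = len(embeddings[0])
    # Unweighted average across chunks, one dimension at a time.
    return [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]
```

With a toy embedding function such as `lambda chunk: [float(len(chunk)), float(sum(chunk))]`, a 10-token input with `chunk_size=4` is split into chunks of 4, 4, and 2 tokens whose embeddings are averaged. An input shorter than `chunk_size` produces a single chunk, so the result is just that chunk's embedding, which matches the truncate case described above.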