In most cases, one uses tests whose size is equal to the significance level.
i would say by collecting data
9.5m questions
12.2m answers