The term for this is "reliable."
Studies, especially those that rely on statistical tests and p-values, often use a sample that is too small or drawn from a special population, so researchers who later repeat the experiment do not get the same results. Tests that do give the same results when repeated are called reliable, and they are generally given more credibility because someone other than the original researcher was able to show the same outcome.
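As a rough illustration of the sample-size point, here is a minimal simulation sketch. It is not taken from any particular study: the function names, the assumed true effect of 0.5 standard deviations, and the |z| > 1.96 cutoff (a normal approximation, which is itself crude for small samples) are all assumptions made for the example. It repeats the same two-group experiment many times and reports how often the result comes out "significant."

```python
import random
import statistics


def study_is_significant(n, effect, rng):
    """Simulate one two-group study with n subjects per group.

    Returns True if the difference in means gives |z| > 1.96
    (assumed cutoff, normal approximation to a two-sided 5% test).
    """
    control = [rng.gauss(0.0, 1.0) for _ in range(n)]
    treated = [rng.gauss(effect, 1.0) for _ in range(n)]
    diff = statistics.mean(treated) - statistics.mean(control)
    # Standard error of the difference between two independent means.
    se = (statistics.variance(control) / n + statistics.variance(treated) / n) ** 0.5
    return abs(diff / se) > 1.96


def replication_rate(n, effect=0.5, trials=2000, seed=0):
    """Fraction of simulated repeat studies that reach significance."""
    rng = random.Random(seed)
    hits = sum(study_is_significant(n, effect, rng) for _ in range(trials))
    return hits / trials


small = replication_rate(n=10)    # hypothetical small study
large = replication_rate(n=100)   # hypothetical larger study
print(f"n=10 per group:  significant in {small:.0%} of repeats")
print(f"n=100 per group: significant in {large:.0%} of repeats")
```

Under these assumptions the effect is real in every simulated study, yet the small-sample version reaches significance far less often than the large-sample one, which is one way a genuine finding can fail to show up again when someone redoes the experiment.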