I believe the answer is: low interrater reliability
Interrater reliability refers to the degree of agreement among raters who independently evaluate the same thing. Conclusions reached with low interrater reliability tend not to be seen as valid and can cause confusion if the raters are working for the same client.
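For context, interrater reliability is usually quantified with an agreement statistic such as Cohen's kappa. Here is a minimal Python sketch with two hypothetical raters and made-up ratings (the names and data are illustrative, not from the question) showing how agreement is measured after correcting for chance:

```python
from collections import Counter

# Hypothetical ratings: two raters scoring the same eight items on a yes/no scale
rater_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
rater_b = ["yes", "no",  "no", "yes", "no", "yes", "yes", "no"]

def cohens_kappa(a, b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n                    # raw agreement
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(a) | set(b)
    expected = sum(counts_a[l] * counts_b[l] for l in labels) / n**2    # agreement expected by chance
    return (observed - expected) / (1 - expected)

print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")  # values near 0 indicate low interrater reliability
```

Kappa near 1 means the raters agree almost perfectly; values near 0 (or negative) mean their agreement is no better than chance, which is what "low interrater reliability" describes.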