Answer:
I believe it is the fourth one.
Explanation: A, while you could work out the structure of the speech by listening, you won't get it directly from the tone.
B, again, is something you can get from listening, but not from the tone.
C, unless the speaker specifically mentions their location, there isn't a reliable way to tell where they are just from listening.
This leaves the fourth choice.
Also, if you look up the definition of tone, one of the answers you get is