This mainly occurs when a person receives a call or dials a virtual number for the first time. When the response provided by a client's callback is executed, the media file must first be downloaded from its current location and saved on the Africa's Talking servers. Only after that is the file played.

To prevent the delay, you can upload the media file to our servers in advance using the upload media function. Here's a tutorial on how to do this:

The file will be uploaded directly to our servers.
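As a rough illustration of what a pre-upload request might look like, here is a minimal Python sketch that only builds the request and does not send it. The endpoint URL, the form field names (username, phoneNumber, url) and the apiKey header are assumptions based on the Voice API's usual conventions, and build_media_upload_request is a hypothetical helper name. Please confirm all of them against the current Voice API documentation before relying on this.

```python
import urllib.parse
import urllib.request

# Assumed endpoint for the upload media function; verify against the docs.
MEDIA_UPLOAD_URL = "https://voice.africastalking.com/mediaUpload"


def build_media_upload_request(username, api_key, phone_number, media_url):
    """Build (but do not send) a POST request asking the server to fetch
    and store the media file at media_url for the given virtual number.

    Field and header names here are assumptions, not confirmed API details.
    """
    form = urllib.parse.urlencode({
        "username": username,          # your application username
        "phoneNumber": phone_number,   # the virtual number that will play the file
        "url": media_url,              # where the media file currently lives
    }).encode("utf-8")

    return urllib.request.Request(
        MEDIA_UPLOAD_URL,
        data=form,
        headers={
            "apiKey": api_key,
            "Accept": "application/json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

To actually send it, you would pass the returned object to urllib.request.urlopen and read the response. Doing this once, before the first call, means the file should already be on the server by the time the callback response refers to it.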

