Friday, April 4, 2014

Issue implementing a read/write interface to FTP with the libcurl multi interface

The libcurl example of an I/O interface (fopen, read, fclose) works perfectly with libcurl 7.23.0, but may fail since at least libcurl 7.30.0 (the problem still exists in the latest release, 7.36.0) if the URL contains a host name instead of an IP address. Code modeled on this example may not actually read anything from the FTP server.
After calling
curl_multi_fdset(multi_handle, &fdread, &fdwrite, &fdexcep, &maxfd);
from the function fill_buffer, the variable maxfd gets the value -1 and no reading is performed.
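
For reference, the libcurl documentation for curl_multi_fdset says that maxfd may be -1 while libcurl is doing something the application cannot monitor with a socket, and suggests waiting a short time (about 100 ms) and then calling curl_multi_perform anyway. A minimal sketch of that wait step is shown here. The helper name wait_for_transfer and its parameters are hypothetical, the code assumes a POSIX system, and I have not verified that this alone cures the problem on the versions mentioned above.

```c
#define _POSIX_C_SOURCE 200112L
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical helper: wait until one of the descriptors reported by
 * curl_multi_fdset() is ready, or until timeout_ms has elapsed.
 * Returns the result of select(), or 0 when there was nothing to monitor. */
int wait_for_transfer(int maxfd, fd_set *fdread, fd_set *fdwrite,
                      fd_set *fdexcep, long timeout_ms)
{
    struct timeval tv;

    if (maxfd == -1) {
        /* No descriptor to watch yet: sleep about 100 ms instead of
         * blocking for the full timeout on empty sets. The caller is
         * expected to call curl_multi_perform() afterwards. */
        tv.tv_sec = 0;
        tv.tv_usec = 100 * 1000;
        select(0, NULL, NULL, NULL, &tv);
        return 0;
    }

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    return select(maxfd + 1, fdread, fdwrite, fdexcep, &tv);
}
```

In fill_buffer this would take the place of the direct select call: whenever the helper returns 0 or a positive value, call curl_multi_perform again and repeat until enough data has arrived or no transfers are running.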

The cause of this behaviour lies in the libcurl source code. It resolves the host name to an IP address in a separate thread (function getaddrinfo_thread in asyn-thread.c), but this thread is poorly synchronized with the main thread, where the connection is made. As a result, a race condition occurs and the connection is attempted before the name has been resolved.

If the URL contains an IP address instead of a host name, the resolving stage is skipped and everything works as expected.

In a real-world application, without patching libcurl, you can perform the resolution yourself, e.g.:
1. Get the host name and port (if present) from the URL
2. Use the function getaddrinfo (it has the same signature on Unix and Windows) to get the list of IP addresses for the host name
3. Iterate over the received IP list, testing the connection with the original URL in which the host name is replaced by the corresponding IP. Use CURLOPT_CONNECT_ONLY to perform only the connection without any other actions
4. Add the IP you successfully connected to to a cache, e.g. a map of host name to IP, for later use
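
Steps 2 and 3 could be sketched roughly as follows. This is only an illustration, not the code from the project mentioned below: the function names resolve_host_ips and replace_url_host and the constant IP_STR_LEN are made up for the example, the headers are the POSIX ones (on Windows you would include winsock2.h and ws2tcpip.h and call WSAStartup first), and the URL helper assumes the host part is a name rather than a bracketed IPv6 literal. The connection test with CURLOPT_CONNECT_ONLY and the cache are left out.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

#define IP_STR_LEN 64  /* hypothetical buffer size for one textual IP */

/* Resolve host and store up to max_ips numeric addresses in ips.
 * Returns the number of addresses stored, or -1 if resolving failed. */
int resolve_host_ips(const char *host, char ips[][IP_STR_LEN], int max_ips)
{
    struct addrinfo hints, *res, *p;
    int count = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* one entry per address */
    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;

    for (p = res; p != NULL && count < max_ips; p = p->ai_next) {
        /* NI_NUMERICHOST converts the address to text without a lookup */
        if (getnameinfo(p->ai_addr, p->ai_addrlen, ips[count], IP_STR_LEN,
                        NULL, 0, NI_NUMERICHOST) == 0)
            count++;
    }
    freeaddrinfo(res);
    return count;
}

/* Write a copy of url to out with its host part replaced by ip.
 * User info, port and path are kept. Returns 0 on success, -1 on error. */
int replace_url_host(const char *url, const char *ip, char *out, size_t outlen)
{
    const char *sep = strstr(url, "://");
    const char *host, *at;
    size_t authority_len, host_len;
    int v6, n;

    if (sep == NULL)
        return -1;
    host = sep + 3;

    /* Skip an optional "user:password@" prefix inside the authority part */
    authority_len = strcspn(host, "/");
    at = memchr(host, '@', authority_len);
    if (at != NULL)
        host = at + 1;

    host_len = strcspn(host, ":/");  /* host ends at the port or the path */
    v6 = strchr(ip, ':') != NULL;    /* IPv6 literals need brackets in URLs */

    n = snprintf(out, outlen, "%.*s%s%s%s%s", (int)(host - url), url,
                 v6 ? "[" : "", ip, v6 ? "]" : "", host + host_len);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Each URL produced this way can then be tried with CURLOPT_CONNECT_ONLY set on an easy handle, and the first address that connects goes into the cache.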

This approach is used successfully in one of the projects I have been involved in.

Another approach is to avoid the multi interface and use only the easy interface of libcurl. In that case you have to implement the read/write interface yourself, running the transfer in your own threads. I implemented this approach in another project I was involved in, and it is still used successfully by tens of thousands of people.
