cf4ocl (C Framework for OpenCL)
v2.1.0
Object-oriented framework for developing and benchmarking OpenCL projects in C/C++
|
cf4ocl user guide.
The C Framework for OpenCL, cf4ocl, is a cross-platform pure C object-oriented framework for accelerating the development of OpenCL projects in C, with support for C++.
This user guide is organized as follows. First, in section Using the library, the basics of cf4ocl are introduced, followed by a more thorough discussion on how to use wrapper and other library modules. The utilities provided with cf4ocl are presented in section Using the utilities. Advanced topics, such as the wrapper architecure, are discussed in section Advanced.
The cf4ocl library offers an object-oriented interface to the OpenCL API using wrapper classes and methods (or structs and functions, in C terms), grouped into modules of the same name, as shown in the following table:
cf4ocl module | cf4ocl wrapper class | Wrapped OpenCL type |
---|---|---|
Platform module | CCLPlatform* | cl_platform_id |
Device module | CCLDevice* | cl_device_id |
Context module | CCLContext* | cl_context |
Queue module | CCLQueue* | cl_command_queue |
Program module | CCLProgram* | cl_program |
Kernel module | CCLKernel* | cl_kernel |
Event module | CCLEvent* | cl_event |
MemObj module | CCLMemObj* | cl_mem |
Buffer module | CCLBuffer* | cl_mem |
Image module | CCLImage* | cl_mem |
Sampler module | CCLSampler* | cl_sampler |
Some of the provided methods directly wrap OpenCL functions (e.g. ccl_buffer_enqueue_copy()), while others perform a number of OpenCL operations in one function call (e.g. ccl_kernel_set_args_and_enqueue_ndrange()). The wrapper classes are organized in a hierarchical inheritance tree.
Additional modules are also available:
cf4ocl module | Description |
---|---|
Device selector module | Automatically select devices using filters. |
Device query module | Helpers for querying device information, mainly used by the ccl_devinfo program. |
Errors module | Convert OpenCL error codes into human-readable strings. |
Platforms module | Management of the OpencL platforms available in the system. |
Profiler module | Simple, convenient and thorough profiling of OpenCL events. |
The cf4ocl constructors and destructors have new
and destroy
in their name, respectively. In cf4ocl, the new/destroy rule states the following:
For each invoked constructor, the respective destructor must also be invoked.
This might seem obvious, but in many instances several objects are obtained using other (non-constructor) methods during the course of a program. These objects are automatically released and should not be destroyed by client code.
For example, it is possible to get a kernel belonging to a program using the ccl_program_get_kernel() function:
The returned kernel wrapper object will be freed when the program is destroyed; as such, there is no need to free it. Destroying the program will suffice:
The ccl_<class>_get_info_<scalar|array>()
macros can be used to get information about OpenCL objects. Use the array
version when the expected return value is a pointer or array, or the scalar
version otherwise (e.g. when the expected return value is primitive or scalar type).
For example, to get the name and the number of compute cores on a device:
The ccl_<class>_get_info()
macros serve more specific scenarios, and are likely to be used less often. These macros return a CCLWrapperInfo* object, which contains two public fields:
value
- A pointer to the requested value as returned by the OpenCL clGet*Info()
functions.size
- The size in bytes of the value pointed to by the value
field.To use the value, a cast should be performed on the value
field to convert it to the required type (which is what the ccl_<class>_get_info_<scalar|array>()
macros automatically do).
The values and objects returned by these macros are automatically released when the respective wrapper object is destroyed and should never be directly freed by client code.
Error-reporting cf4ocl functions provide two methods for client-side error handling:
The first method consists of analysing the return value of a function. Error-throwing functions which return a pointer will return NULL
if an error occurs. The remaining error-reporting functions return CL_FALSE
if an error occurs (or CL_TRUE
otherwise). Client code can check for errors by looking for NULL
or CL_FALSE
return values, depending on the function. This error handling method does not provide additional information about the reported error. For example:
The second method is more flexible. A CCLErr object is initialized to NULL
, and a pointer to it is passed as the last argument to the function being called. If the CCLErr object is still NULL
after the function call, no error has occurred. Otherwise, an error occurred and it is possible to get a user-friendly error message:
An error domain and error code are also available in the CCLErr object. The domain indicates the module or library in which the error was generated, while the code indicates the specific error that occurred. Three kinds of domain can be returned by error-reporting cf4ocl functions, each of them associated with distinct error codes:
Domain | Codes | Description |
---|---|---|
CCL_ERROR | ccl_error_code enum | Error in cf4ocl not related with external libraries |
CCL_OCL_ERROR | cl.h | Error in OpenCL function calls |
A GLib domain | GLib-module dependent | Error in GLib function call (file open/save, etc) |
For example, it is possible for client code to act on different OpenCL errors:
Finally, if client code wants to continue execution after an error was caught, it is mandatory to use the ccl_err_clear() function to free the error object and reset its value to NULL
. Not doing so is a bug, especially if more error-reporting functions are to be called moving forward. For example:
Even if the program terminates due to an error, the ccl_err_clear() function can be still be called to destroy the error object, avoiding memory leaks to be reported by tools such as Valgrind.
The internals of CCLErr-based error handling are discussed in further detail in section The GLib and OpenCL dependencies.
Each OpenCL class is associated with a cf4ocl module. At the most basic level, each module offers a wrapper class and functions which wrap or map their OpenCL equivalents. This, in itself, already simplifies working with OpenCL in C, because the cf4ocl wrapper classes internally manage related objects, which may be created during the course of a program. As such, client code just needs to follow the new/destroy rule for directly created objects, thus not having to worry with memory allocation/deallocation of intermediate objects.
In most cases, however, each cf4ocl module also provides methods for other common and not-so-common OpenCL host code patterns, allowing the programmer to avoid their verbosity and focus on OpenCL device code.
All cf4ocl wrapper classes extend the CCLWrapper* abstract wrapper class. The properties and methods of this class, which are concerned with reference counts, wrapping/unwrapping of OpenCL objects and getting object information, are essentially of internal use by other cf4ocl classes. This functionality is also available to client code which requires a more advanced integration with cf4ocl, as explained in the advanced section.
Several OpenCL objects, namely cl_platform_id
, cl_context
and cl_program
, have a direct relationship with a set of cl_device_id
objects. In order to map this relationship, cf4ocl provides the CCLDevContainer* class, which is an intermediate class between the CCLWrapper* parent class and the CCLPlatform*, CCLContext* and CCLProgram* wrappers. The CCLDevContainer* class implements functionality for managing a set of CCLDevice* wrapper instances. This functionality is exposed to client code through concrete wrapper methods. For example, the CCLContext* class provides the ccl_context_get_all_devices(), ccl_context_get_device() and ccl_context_get_num_devices() methods for this purpose.
The platform wrapper module provides functionality for simple handling of OpenCL platform objects. Platform wrappers can be obtained in two ways:
The provided functions allow to get the device wrappers associated with the given platform wrapper, as well as to obtain the supported OpenCL version of the underlying OpenCL platform object.
Information about platform objects can be fetched using the platform info macros:
However, because the platform info return type is always char*
, cf4ocl provides an additional helper macro for platform wrappers, ccl_platform_get_info_string(), which is simpler to use.
The CCLPlatform* class extends the CCLDevContainer* class; as such, it provides methods for handling a list of devices associated with the platform:
Usage example:
The device wrapper module provides functionality for simple handling of OpenCL device objects. In most cases, device wrapper objects should not be directly instanced by client code. They are usually fetched from device container objects such as CCLPlatform* or CCLContext* instances, or created as sub-devices using the ccl_device_create_subdevices() function. In either case, when the parent object is destroyed, the associated devices (or sub-devices) are also destroyed. As such, and in accordance with the cf4ocl new/destroy rule, the ccl_device_destroy() destructor function will rarely be used.
Information about device objects can be fetched using the device info macros:
Example: getting the first device in a context
The context wrapper module provides functionality for simple handling of OpenCL context objects. Context wrappers can be created using three different approaches:
Instantiation and destruction of context wrappers follows the cf4ocl new/destroy rule; as such, context wrapper objects must be released with the ccl_context_destroy() function.
Information about context objects can be fetched using the context info macros:
The CCLContext* class extends the CCLDevContainer* class; as such, it provides methods for handling a list of devices associated with the context:
Example: using all devices in a platform
Example: select device from menu
The command queue wrapper module provides functionality for simple handling of OpenCL command queue objects. Queue wrappers can be instantiated with the ccl_queue_new() and ccl_queue_new_full() constructors. While both constructors can be used with any OpenCL version, the later is targeted for OpenCL >= 2.0, exposing features such as on-device queues to client code. If OpenCL >= 2.0 features are requested for platforms which do not support them, a warning will be logged and the queue will be created without the unsupported features.
Instantiation and destruction of queue wrappers follows the cf4ocl new/destroy rule; as such, queues should be freed with the ccl_queue_destroy() destructor.
Queue wrappers created with the CL_QUEUE_PROFILING_ENABLE
property can be automatically profiled with the profiler module.
Information about queue objects can be fetched using the info macros:
Example:
The memory object wrapper module provides functionality for simple handling of generic OpenCL memory objects. All the functions in this module are direct wrappers of the respective OpenCL memory object functions, with the exception of ccl_memobj_get_opencl_version(), which returns the OpenCL version of the platform associated with the memory object.
For specific buffer and image handling, see the buffer wrapper and image wrapper modules.
Information about memory objects can be fetched using the memory object info macros:
The buffer wrapper module provides functionality for simple handling of OpenCL buffer objects. All the functions in this module are direct wrappers of the respective OpenCL buffer functions, except for the ccl_buffer_new_from_region() function. This function wraps clCreateSubBuffer() but assumes that the sub-buffer will represent a specific region in the original buffer (which is the only sub-buffer type, up to OpenCL 2.1).
Buffer wrapper objects can be directly passed as kernel arguments to functions such as ccl_kernel_set_args_and_enqueue_ndrange() or ccl_kernel_set_args_v().
Information about buffer objects can be fetched using the info macros from the memory object module:
Instantiation and destruction of buffer wrappers follows the cf4ocl new/destroy rule.
Example:
The image wrapper module provides functionality for simple handling of OpenCL image objects. All the functions in this module are direct wrappers of the respective OpenCL image functions. The ccl_image_new() constructor accepts a variable list of arguments which describe the image to be created. There is also the ccl_image_new_v() constructor, which accepts the image description arguments given in a CCLImageDesc* object. Both constructors will automatically use the old style clCreateImage2D()/clCreateImage3D() OpenCL functions if the OpenCL version of the underlying platform is less or equal than 1.1, or the new clCreateImage() constructor otherwise. Instantiation and destruction of image wrappers follows the cf4ocl new/destroy rule; as such, images should be freed with the ccl_image_destroy() destructor.
Image wrapper objects can be directly passed as kernel arguments to functions such as ccl_program_enqueue_kernel() or ccl_kernel_set_arg().
Information about image objects can be fetched using the image info macros:
If the information to be fetched is relative to the memory object parent class (e.g. CL_MEM_TYPE
or CL_MEM_FLAGS
), then the memory object module info macros should be used instead:
Example:
enqueue some image processing kernel...
The sampler wrapper module provides functionality for simple handling of OpenCL sampler objects. Sampler wrapper objects can be instanced with two constructors:
The former follows the constructor format in OpenCL 1.0-1.2, where three main sampler properties are directly given as constructor arguments, namely the normalized coordinates flag, the addressing mode and the filter mode. The later uses the OpenCL 2.0 constructor format, in which sampler properties are given in a zero-terminated array of cl_sampler_properties
. Both formats can be used without concern for the underlying platform's OpenCL version, because cf4ocl will automatically select the most adequate OpenCL constructor.
Sampler wrapper objects should be freed with the ccl_sampler_destroy() function, in accordance with the cf4ocl new/destroy rule.
Sampler wrapper objects can be directly passed as kernel arguments to functions such as ccl_kernel_set_args_and_enqueue_ndrange() or ccl_kernel_set_args().
Information about sampler objects can be fetched using the sampler info macros:
Example:
The program wrapper module provides functionality for simple handling of OpenCL program objects. OpenCL program objects can be created from source code, from binary data or from built-in kernels using the clCreateProgramWithSource(), clCreateProgramWithBinary() or clCreateProgramWithBuiltInKernels(), respectively (the later requires OpenCL >= 1.2). cf4ocl provides a set of CCLProgram* program wrapper constructors which not only map the native OpenCL program constructors, but also extend some of their functionality.
For creating programs from source code, cf4ocl provides the following constructors:
Program constructors which use binary data follow the same pattern as their source code counterparts:
The ccl_program_new_from_built_in_kernels() constructor directly wraps the native OpenCL clCreateProgramWithBuiltInKernels() function, allowing to create programs from built-in kernels. This method is only available for platforms which support OpenCL version 1.2 or higher.
Like most cf4ocl wrapper objects, program wrapper objects follow the new/destroy rule, and should be released with the ccl_program_destroy() destructor.
The ccl_program_build() and ccl_program_build_full() methods allow to build a program executable from the program source or binary. While the later directly maps the native clBuildProgram() OpenCL function, the former provides a simpler interface which will be useful in many situations.
Compilation and linking (which require OpenCL >= 1.2) are provided by the ccl_program_compile() and ccl_program_link() functions.
Information about program objects can be obtained using the program module info macros:
However, program binaries cannot be retrieved using these macros. Consequently, cf4ocl provides a specific and straightforward API for handling them:
Program build information can be obtained using a specific set of info macros:
For simple programs and kernels, the program wrapper module offers three functions, which can be used after a program is built:
NULL
-terminated variable list of parameters.NULL
-terminated array of parameters.Program wrapper objects only keep one kernel wrapper instance per kernel function; as such, for a given kernel function, these methods will always use the same kernel wrapper instance (and consequently, the same OpenCL kernel object). While this will work for single-threaded host code, it will fail if the same kernel wrapper is invoked from different threads. In such cases, use the kernel wrapper module API for handling kernel wrapper objects.
The CCLProgram* class extends the CCLDevContainer* class; as such, it provides methods for handling a list of devices associated with the program:
Example:
Kernel code:
Host code:
The kernel wrapper module provides functionality for simple handling of OpenCL kernel objects. Kernel wrappers can be obtained using two approaches:
While the first approach might be more convenient, it will not work properly if the same kernel function is to be handled and executed by different threads. In these cases, use the second approach to create distinct kernel wrapper instances (wrapping distinct OpenCL kernel objects) for the same kernel function, one for each thread.
This module offers several functions which simplify kernel execution. For example, the ccl_kernel_set_args_and_enqueue_ndrange() function can set all kernel arguments and execute the kernel in one call.
Information about kernel objects can be fetched using the kernel info macros:
Six additional macros are provided for getting kernel workgroup info and kernel argument info (the later are only available from OpenCL 1.2 onwards). These work in the same way as the regular info macros:
Example: getting a kernel wrapper from a program wrapper
Example: creating a kernel wrapper
This module defines the CCLArg* class which wraps kernel arguments. Several functions in the kernel wrapper module, such as ccl_kernel_set_args() or ccl_kernel_set_args_and_enqueue_ndrange(), accept kernel arguments as parameters. CCLBuffer*, CCLImage* and CCLSampler* objects can be directly passed as global kernel arguments to these functions. However, local and private kernel arguments need to be passed using the macros provided in this module, namely ccl_arg_local() and ccl_arg_priv(), respectively.
The ccl_arg_skip constant can be passed to methods which accept a variable list of ordered arguments in order to skip a specific argument.
Example:
Kernel code:
Host code:
The event wrapper module provides functionality for simple handling of OpenCL event objects. Typically, event wrappers will not be directly created by client code. They are returned by event producing functions (such as ccl_image_enqueue_write(), which wraps the clEnqueueWriteImage() OpenCL function). As such, and in accordance with the new/destroy rule, regular event wrappers objects should not be destroyed by client code. They are automatically released when the command queue wrapper where the event took place is destroyed. User events (OpenCL >= 1.1), created with the ccl_user_event_new() constructor, are the only exception. These are special events which allow applications to enqueue commands that wait on user-controlled occurrences before the command is executed by the device. These events should be destroyed with ccl_event_destroy().
The event wait list section provides additional information on how to use events to synchronize the execution of OpenCL commands. Events are also used by the profiler module, although indirectly via CCLQueue* wrappers, to profile and benchmark applications.
Information about event objects can be fetched using the respective info macros:
Three additional macros are provided for getting event profiling info. These work in the same way as the regular info macros:
This module provides simple management of event wait lists. Client code must initialize CCLEventWaitList variables to NULL
, and can reuse them between ccl_*_enqueue_*()
function calls. No allocation and deallocation of events and event wait lists is required if populated event wait lists are consumed by ccl_*_enqueue_*()
functions; otherwise, unused non-empty event wait lists should be freed with the ccl_event_wait_list_clear() function.
Example 1:
Example 2:
This module offers a mechanism for selecting OpenCL devices, mainly for context creation, although its functionality can be used for other purposes. The ccl_context_new_from_filters_full() context wrapper constructor (and the ccl_context_new_from_filters() macro) accepts a CCLDevSelFilters object containing a set of filters. These filters define which devices can be used in the context. Instances of CCLDevSelFilters must be initialized to NULL:
Filters can then be added to the CCLDevSelFilters object with the ccl_devsel_add_dep_filter() and ccl_devsel_add_indep_filter() functions, which add dependent or independent filters, respectively. Filters are processed in the order in which they are added. The next example shows how to add an independent and a dependent filter to the CCLDevSelFilters object:
At this stage, the most common use for the CCLDevSelFilters object is to pass it to a context wrapper constructor:
However, the CCLDevSelFilters object can also be used for explicit device selection:
The CCLDevSelDevices object represented by devices
is just a GLib pointer array, so we have direct access to the list of device wrappers and its size. For example, let us list the filtered devices by name:
CCLDevSelFilters objects are automatically freed and reset to NULL
when passed to context wrapper constructors or to the ccl_devsel_select() function.
See the complete example here.
The device selector module provides two additional helper functions which return and print a list of all OpenCL devices available in the system, respectively:
Finally, the device selector module also offers the ccl_devsel_devices_new() function, which returns a CCLDevSelDevices object containing device wrappers for all OpenCL devices in the system. This array should not be modified directly, and when no longer required, should be freed with ccl_devsel_devices_destroy().
This module facilitates the querying of OpenCL devices. It primarily supports the ccl_devinfo utility, but may also be of use to client code.
This module offers a function to convert OpenCL error codes into human-readable strings. It is widely used by other cf4ocl modules, but may also be useful to client code which directly uses OpenCL functions.
Example:
The platforms module provides functionality for managing the OpenCL platforms available in the system. The ccl_platforms_new() function returns a list of platforms available in the system. The ccl_platforms_count() can be used to get the number of platforms in the list, while the ccl_platforms_get() will return the platform.
Example:
The profiler module provides classes and methods for profiling wrapped OpenCL events and queues.
The profiling module offers two methods for obtaining information about the performed computations:
ccl_prof_*_summary()
functions.In order to use the first method, the CL_QUEUE_PROFILING_ENABLE
property should be specified when creating command queue wrappers with ccl_queue_new() or ccl_queue_new_full().
After all the computations and memory transfers take place, the utilized queue wrappers are passed to the profiler using the ccl_prof_add_queue() function. The ccl_prof_calc() function can then be called to perform the required analysis.
At this stage, different types of profiling information become available, and can be iterated over:
While this information can be subject to different types of examination by client code, the profiler module also offers some functionality which allows for a more immediate interpretation of results:
Example: Conway's game of life using double-buffered images (complete example)
The output of ccl_prof_print_summary() will be something like:
Aggregate times by event : ------------------------------------------------------------------ | Event name | Rel. time (%) | Abs. time (s) | ------------------------------------------------------------------ | NDRANGE_KERNEL | 97.2742 | 3.7468e-02 | | READ_IMAGE | 2.6747 | 1.0303e-03 | | WRITE_IMAGE | 0.0511 | 1.9690e-05 | ------------------------------------------------------------------ | Total | 3.8518e-02 | --------------------------------- Event overlaps : ------------------------------------------------------------------ | Event 1 | Event2 | Overlap (s) | ------------------------------------------------------------------ | READ_IMAGE | NDRANGE_KERNEL | 1.3618e-04 | ------------------------------------------------------------------ | Total | 1.3618e-04 | ----------------------------------------- Tot. of all events (eff.) : 3.838198e-02s Total ellapsed time : 4.295200e-02s Time spent in device : 89.36% Time spent in host : 10.64%
Instead of the default command-based event names such as NDRANGE_KERNEL
, specific names can be set with the ccl_event_set_name() function. This allows to: (a) separate the aggregation of events of the same type (e.g., differentiate between the execution of two different kernels); and, (b) aggregate events of different types (e.g., aggregate reads and writes into a single "comms" event).
The ccl_plot_events script can be used to plot a Gantt-like chart of the events which took place in the queues. Running the following command...
$ python ccl_plot_events.py prof.tsv
...will produce the following image:
cf4ocl is bundled with the following utilities:
The wrapper classes, which wrap OpenCL types, are implemented using an object-oriented (OO) approach in C. While C does not directly provide OO constructs, it is possible to implement features such as inheritance, polymorphism or encapsulation. Using this approach, cf4ocl is able to offer a clean and logical class system, while being available in a form (C) which can be directly or indirectly invoked from other programming languages.
Each cf4ocl wrapper class is defined by a source (.c) file and a header (.h) file. The former contains the private class properties and the method implementations, while the later defines its public API. The class body is implemented in the source file as a C struct
; the header file provides an opaque pointer to it, which is the public side of the class from a client code perspective. Inheritance is implemented by including a member representing the parent class struct
in the body of the child class struct
. This requires the sharing of parent class implementations. In order to keep these opaque, the respective struct
is defined in "private" header files which are not included in the public API. This way, instances of the child class can be cast to its parent type when required. The child class struct
effectively extends the parent class struct
. An example of this approach can be shown with the definitions of the abstract CCLWrapper* class and of the concrete CCLEvent* class, which extends CCLWrapper*:
In _ccl_abstract_wrapper.h
(not part of public API):
In ccl_common.h:
Methods are implemented as functions which accept the object on which they operate as the first parameter. When useful, function-like macros are also used as class methods, such as the case of the info macros. Polymorphism is not used, as the so called "abstract" methods are just functions which provide common operations to concrete methods, named differently for each concrete class. For example, the ccl_dev_container_get_device()
abstract method is called by the ccl_context_get_device(), ccl_platform_get_device() and ccl_program_get_device() concrete methods, for which it provides common functionality.
The cf4ocl class hierarchy is shown in the following inheritance diagram:
The CCLWrapper* base class is responsible for common functionality of wrapper objects, namely:
clGet*Info()
OpenCL functions)Wrapper constructors create the OpenCL object to be wrapped, but delegate memory allocation to the special ccl_<class>_new_wrap()
functions. These accept the OpenCL object, and in turn call the ccl_wrapper_new()
function, passing it not only the object, but also the size in bytes of the wrapper to be created. The ccl_wrapper_new()
function allocates memory for the wrapper (initializing this memory to zero), and keeps the OpenCL object (wrapping it) in the created wrapper instance. For example, the ccl_kernel_new() creates the cl_kernel
object with the clCreateKernel() OpenCL function, but then relies on the ccl_kernel_new_wrap() function (and thus, on ccl_wrapper_new()
) for allocation and initialization of the new CCLKernel* wrapper object memory.
The destruction of wrapper objects and respective memory deallocation is performed in a similar fashion. Each wrapper class has its own ccl_<class>_destroy()
method, but this method delegates actual object release to the "abstract" ccl_wrapper_unref()
function. This function accepts the wrapper to be destroyed, its size in bytes, and two function pointers: the first, with prototype defined by ccl_wrapper_release_fields()
, is a wrapper specific function for releasing internal wrapper objects, which the super class has no knowledge of; the second is the OpenCL object destructor function, with prototype defined by ccl_wrapper_release_cl_object()
. Continuing on the kernel example, the ccl_kernel_destroy() method delegates kernel wrapper destruction to ccl_wrapper_unref()
, passing it the kernel wrapper object, its size (i.e. sizeof(
CCLKernel )
), the "private" (static in C) ccl_kernel_release_fields()
function for destroying kernel internal objects, and the clReleaseKernel() OpenCL kernel destructor function.
As such, all cf4ocl wrapper objects use a common memory allocation and deallocation strategy, implemented in the CCLWrapper* super class.
The ccl_<class>_new_wrap()
special constructors respect the new/destroy rule. Wrappers created with their special constructor must be released with the respective ccl_<class>_destroy()
function. This allows client code to create OpenCL objects directly with OpenCL functions, and then wrap the objects to take advantage of cf4ocl functionality and features. The OpenCL object can be retrieved from its wrapper at all times with the respective ccl_<class>_unwrap
method.
If ccl_<class>_new_wrap()
functions are passed an OpenCL object which is already wrapped, a new wrapper will not be created. Instead, the existing wrapper is returned, with its reference count increased by 1. Thus, there is always a one-to-one relationship between wrapped OpenCL objects and their respective wrappers. In reality, the ccl_<class>_destroy()
functions decreases the reference count of the respective wrapper, only destroying it if the reference count reaches zero. Client code can increase and decrease the reference count of a wrapper object using the associated ccl_<class>_ref()
and ccl_<class>_unref()
macros. The ccl_<class>_ref()
macros call the ccl_wrapper_ref() function, casting the wrapper to its base class (CCLWrapper*), while the ccl_<class>_unref()
macros are just aliases for the respective ccl_<class>_destroy()
functions.
The CCLWrapper* class maintains a static hash table which associates OpenCL objects (keys) to cf4ocl wrappers (values). Access to this table is thread-safe and performed by the ccl_wrapper_new()
and ccl_wrapper_unref()
functions.
The management of OpenCL object information is also handled by the CCLWrapper* class. The ccl_wrapper_get_info() method accepts two wrapper objects, the first being the object to query; the second is an auxiliary object required by some lower-level OpenCL info functions, such clGetKernelWorkGroupInfo(), which requires a device object besides the kernel object. ccl_wrapper_get_info() also requires the specification of the type of query to perform via the CCLInfo enumeration. ccl_wrapper_get_info() returns a CCLWrapperInfo* object, which contains two public properties: the queried value and its size. To be useful, the value must be cast to the correct type. The ccl_wrapper_get_info_value() and ccl_wrapper_get_info_size() methods call ccl_wrapper_get_info(), but directly return the value and size of the CCLWrapper* object, respectively.
The requested information is kept in the information table of the respective wrapper object. When the wrapper object is destroyed, all the information objects are also released. As such, client code does not need to worry about freeing objects returned by the ccl_wrapper_get_info*()
methods. These also accept a use_cache
boolean argument, which if true, causes the methods to first search for the information in the wrappers information table, in case it has already been requested; if not, they proceed with the query as normal.
Client code will commonly use the info macros of each wrapper in order to fetch information about the underlying OpenCL object. These macros expand into the ccl_wrapper_get_info*()
methods, automatically casting objects and values to the appropriate type, selecting the correct clGet*Info()
function for the object being queried. The cache is never used by the info macros.
The intermediate CCLDevContainer* class provides functionality for managing a set of CCLDevice* wrapper instances, abstracting code common to the CCLPlatform*, CCLContext* and CCLProgram* classes, all of which internally keep a set of devices. The CCLDevContainer* class contains three "abstract" methods for accessing the associated set of CCLDevice* wrappers, namely:
ccl_dev_container_get_all_devices()
: get all CCLDevice* wrappers in device container object.ccl_dev_container_get_device()
: get CCLDevice* wrapper at given index.ccl_dev_container_get_num_devices()
: return number of devices in device container object.Concrete wrapper implementations rely on this functionality, which is exposed to client code via specific methods, e.g. in the case of CCLProgram* objects, these are ccl_program_get_all_devices(), ccl_program_get_device() and ccl_program_get_num_devices(), respectively.
The relationship between the CCLMemObj* class and the CCLBuffer* and CCLImage* classes follows that of the respective OpenCL types. In other words, both OpenCL images and buffers are memory objects with common functionality, and cf4ocl directly maps this relationship with the respective wrappers.
cf4ocl relies heavily on its two dependencies: GLib and OpenCL. In previous versions of cf4ocl no special care was taken to shield client code from these dependencies, and in some cases client applications were required to link against them during the build process. However, developers may not wish to tackle additional libraries, and keeping cf4ocl self-contained promotes a cleaner build process. As such, since version 2.1.0, client applications are not required to link against either GLib or OpenCL, except if they specifically wish to use the functionality provided by these libraries.
One feature of GLib previously exposed to cf4ocl client code was its error reporting framework, since cf4ocl uses it for internal and external error handling. From version 2.1.0 onwards, this framework is no longer directly exposed to client code. Two backward-compatible changes were implemented:
Newly written client code should use the CCLErr class and the ccl_err_clear() function in order to avoid a direct dependency on GLib.
A second aspect of GLib indirectly exposed to client code is the use of its pointer arrays as the underlying type for the CCLDevSelDevices, CCLDevSelFilters and CCLEventWaitList classes. The last two are automatically freed in typical client code usage, but there can be situations in which CCLDevSelDevices objects may have to be explicitly released. This is can be accomplished with the ccl_devsel_devices_destroy() function, which is a wrapper for GLib's g_ptr_array_free() function. Thus, client code never needs to directly or explicitly manage GLib pointer arrays.
Finally, several cf4ocl functions, such as ccl_devsel_get_device_strings(), return a NULL
-terminated array of strings. Previously, these arrays were released using the g_strfreev() GLib function. From version 2.1.0 onwards, cf4ocl provides the ccl_strv_clear() for this purpose. In practice this function is just a wrapper for g_strfreev(). Thus, this change does not break compatibility with existing client code.
cf4ocl internally uses the GLib message logging framework to log messages and warnings. cf4ocl log output is handled by GLib's default log handler, which outputs warnings and messages to stderr
. Debug messages are also shown if the G_MESSAGES_DEBUG
environment variable is set to cf4ocl2
. If client code wishes to redirect this output, it can do so by specifying another log function for the cf4ocl2
log domain with g_log_set_handler(). For example:
This requires the client application to be linked against GLib..