Tuesday, May 27, 2014

Tracing Your OpenGL Application with Vogl Just Got Easier!

VoglEditor now has support for launching and tracing your application directly from the UI. Previously (and this is still available), you had to launch your application from the command line, which could cause confusion about where to supply vogl arguments versus application command line arguments.
> cd vogl_build/bin
> VOGL_CMD_LINE="--vogl_tracefile vogltrace.glxspheres64.bin" \
  LD_PRELOAD=$(readlink -f libvogltrace64.so) ./glxspheres64
Also, it is not clear what options vogl has to control its tracing behavior; what would you run to find out those options?!? (Answer: there actually is no executable to run which will provide that help information, but the information is at the bottom of this post)
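For scripted runs, the same launch can be sketched in Python. This is only an illustration of the two environment variables used in the command above; the helper names build_trace_environment and launch_traced are hypothetical and are not part of vogl.

```python
import os
import subprocess


def build_trace_environment(tracer_library, vogl_cmd_line, base_env=None):
    """Return a copy of the environment with the two tracing variables set."""
    env = dict(os.environ if base_env is None else base_env)
    # Tracer options travel in VOGL_CMD_LINE, not on the application's command line.
    env["VOGL_CMD_LINE"] = vogl_cmd_line
    # os.path.realpath resolves symlinks, much like `readlink -f` in the shell.
    env["LD_PRELOAD"] = os.path.realpath(tracer_library)
    return env


def launch_traced(app_path, app_args, tracer_library, vogl_cmd_line):
    """Start the application with the tracer preloaded and wait for it to exit."""
    env = build_trace_environment(tracer_library, vogl_cmd_line)
    return subprocess.call([app_path] + list(app_args), env=env)
```

Calling launch_traced("./glxspheres64", [], "libvogltrace64.so", "--vogl_tracefile vogltrace.glxspheres64.bin") from vogl_build/bin mirrors the shell command, and keeps the vogl arguments and the application arguments in separate parameters.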

Generate Trace


The new addition to VoglEditor is available via a "Generate Trace" button on the main toolbar, and provides a clean interface for launching your application as well as setting some common command line options.





Application to trace: Path to the application that you'd like to trace.
Application arguments: Command line arguments to your application.
Output trace file: Path to the resulting trace file.
vogl options:
  • Use SteamLauncher
    •  When launching a Steam game, this is the preferred approach to enable tracing, although you'll need to sync the vogl_chroot repo to get the script. If the script is not available, the option will be disabled.
    • You should already have Steam running prior to using this option, otherwise the client may appear to hang after the application has closed.
    • The dialog asking to load the new trace file may pop-up before the application has been launched; this is due to the way that the script works. Please wait until AFTER you exit your application to load the trace file.
    • Application arguments are not currently supported by the script.
    • The "Application to trace" should be the game ID, or one of the following pre-configured names:
      • 214910 - AirConflicts
      • 400       - Portal1
      • 218060 - BitTripRunner
      • 570       - Dota2
      • 35720   - Trine2
      • 440       - TF2
      • 41070   - Sam3
      • 1500     - Darwinia
      • 550       - L4D2
      • 1500     - Darwinia2
      • 570       - Dota2Beta
      • 221810 - TheCave
      • 220200 - KerbalSpaceProgram
      • 44200   - GalconFusion
      • 201040 - GalconLegends
      • 25000   - Overgrowth
      • 211820 - Starbound
  • 64-bit
    • Enable if your application is 64-bit
  • Force debug context
    • Force the OpenGL Debug Context so that additional driver performance information can be output
  • Gather call stacks
    • If your application has been compiled with symbols, this will tell the tracer to gather a call stack at every OpenGL API call
  • Disable glProgramBinary
    • Applications which use binary programs will be able to replay on the same driver and hardware as they were traced, but may not replay correctly after updating the graphics driver, or replaying on a different platform. To solve this, the tracer can disable the GL_ARB_get_program_binary extension and will force all calls to glProgramBinary() to fail, which should cause your application to take a fallback path. These calls should replay correctly and the replayed stream will reflect the fallback path of your application.
The trace will complete when your application exits, and VoglEditor will ask you if you'd like to load the resulting trace file, allowing you to quickly dive in and identify any rendering artifacts.

Command Line Vogl Tracing Options

Although the UI only supports 3 of the vogl tracing options, many more are available at the command line, and we plan to integrate UI support for them in the future. My apologies for not adding detailed help for these options, but I believe the names themselves should be sufficient, and the code is available if you wish to understand them more thoroughly ;-)
  • "vogl_dump_gl_full"
  • "vogl_dump_gl_calls"
  • "vogl_dump_gl_buffers"
  • "vogl_dump_gl_shaders"
  • "vogl_sleep_at_startup <duration>"
  • "vogl_pause"
  • "vogl_long_pause"
  • "vogl_quiet"
  • "vogl_debug"
  • "vogl_verbose"
  • "vogl_flush_files_after_each_call"
  • "vogl_flush_files_after_each_swap"
  • "vogl_disable_signal_interception"
  • "vogl_logfile <filename>"
  • "vogl_logfile_append"
  • "vogl_tracefile <output trace filename>"
  • "vogl_tracepath <path for generated files>"
  • "vogl_dump_png_screenshots"
  • "vogl_dump_jpeg_screenshots"
  • "vogl_jpeg_quality"
  • "vogl_screenshot_prefix <prefix for files>"
  • "vogl_hash_backbuffer"
  • "vogl_dump_backbuffer_hashes <filename>"
  • "vogl_sum_hashing"
  • "vogl_null_mode"
  • "vogl_force_debug_context"
  • "vogl_disable_client_side_array_tracing"
  • "vogl_disable_gl_program_binary"
  • "vogl_func_tracing"
  • "vogl_backtrace_all_calls"
  • "vogl_backtrace_no_calls"
  • "vogl_exit_after_x_frames <frame count>"
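Several of these options can be combined into a single VOGL_CMD_LINE value. The sketch below assumes that every option takes the same "--name value" form as the --vogl_tracefile example at the top of this post; the helper build_vogl_cmd_line is hypothetical, and values containing spaces are not handled.

```python
def build_vogl_cmd_line(options):
    """Join (name, value) pairs into one VOGL_CMD_LINE string.

    A value of None marks a flag that takes no argument.
    """
    parts = []
    for name, value in options:
        parts.append("--" + name)
        if value is not None:
            parts.append(str(value))
    return " ".join(parts)


cmd_line = build_vogl_cmd_line([
    ("vogl_tracefile", "vogltrace.glxspheres64.bin"),
    ("vogl_force_debug_context", None),
    ("vogl_exit_after_x_frames", 100),
])
print(cmd_line)
```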

Thursday, May 8, 2014

Shared Contexts, Uniforms, ARB Programs, and Buffers get added to VoglEditor

After an extended holiday, a flurry of work has been completed on vogleditor, adding improved support for shared contexts, program uniforms, ARB programs, and visualizing buffer contents. In the meantime, John McDonald has been working on the Windows port, and Rich Geldreich has been adding the final bits of the GL3.3 feature set and making progress on GL4!

Shared Contexts

It turns out that a significant number of games use shared contexts so that resources can be used by multiple contexts and on different threads. Very quickly it became clear that simply supporting the 'current context' was not going to be sufficient.



A drop-down has been added above the state tabs which lists all the available contexts. As expected, the current context is the default selection, and the set of contexts which are shared by each context are indicated.



In the case above, we can see that both the current context 0x84bd954 and context 0x8a9cbd4 share context 0x8533d24. Most OpenGL objects are accessible from all three contexts; Vogl represents this by storing these shareable objects in the shared context (0x8533d24), and this can be seen via the objects listed in the state tree (below, left). Other objects (primarily Framebuffer and Vertex Array Objects) are not shareable, and therefore are only listed in the corresponding context's state tree (below, right).

Note that although the shared objects are only listed in the state tree of the owning context, they will appear in the object explorers for all contexts that can access those objects.

 

Program Uniforms

... are now visible below the shader source code. The location, name, value, and type can be seen. Currently these values are NOT editable, but it could be an easy addition if someone wants to add it in soon.
 



 

ARB Programs

A new explorer has been added to display ARB Program Objects. These programs only contain source for a single shader type, so there is no drop-down control to select a shader type. This is in contrast to a Program Object which has several shaders attached to it. Instead of uniforms, ARB Programs rely on environment and local parameters, which are always 4 component floating point vectors. The parameters which are used by the program are visible in the bottom half of the explorer. The program source is editable and can be saved and will be used when replaying the trace. Environment and local parameters are not yet editable.


Buffer Objects

Buffer objects can be used to hold arbitrary data - vertex indices, vertex attributes, or even texture data. A single buffer can hold a variety of data types, so the explorer provides a set of common types that you can use to interpret the buffer contents - 8, 16, 32, 64-bit hex, float, double, byte, unsigned byte, short, unsigned short, int, and unsigned int are among the options available.
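To illustrate why the choice of type matters, here is a small Python sketch that reinterprets the same raw bytes in different ways using the standard struct module. It is not the explorer's code; interpret_buffer is a hypothetical helper, and little-endian data is assumed.

```python
import struct


def interpret_buffer(raw_bytes, format_char):
    """Decode raw_bytes as a sequence of one struct type (little-endian).

    format_char is a struct code such as "f" (float), "H" (unsigned short)
    or "I" (unsigned int). Trailing bytes that do not fill an item are ignored.
    """
    item_size = struct.calcsize("<" + format_char)
    count = len(raw_bytes) // item_size
    fmt = "<%d%s" % (count, format_char)
    return list(struct.unpack(fmt, raw_bytes[:count * item_size]))


raw = struct.pack("<3f", 0.0, 1.0, 0.5)
as_floats = interpret_buffer(raw, "f")
as_hex32 = ["0x%08x" % word for word in interpret_buffer(raw, "I")]
print(as_floats)
print(as_hex32)
```

The same twelve bytes read sensibly as three floats, but look like unrelated large numbers when viewed as 32-bit hex, which is why being able to switch interpretations is useful.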
 

Monday, March 24, 2014

VoglEditor Adds Trimming Support and Persistent Settings

After receiving a few bug reports about users not being able to load large trace files, it became very clear that VoglEditor needed support for trimming large trace files.

VoglReplay Trim Support

The functionality was already available to users via command line options to voglreplay, but getting there after making a trace file is not an obvious second step.



VoglEditor Trim Support

The command line above can now be executed interactively from within VoglEditor when a trace file is loaded. VoglEditor detects whether the trace contains "too many" frames, and will display the following prompt. Selecting 'No' will continue to load the selected trace; 'Yes' will bring up the Trim Trace dialog.



The dialog will ensure that valid values are entered for 'trim_frame' and 'trim_len', validating that the user selects a frame that actually exists in the loaded trace file, and that "too many" frames are not included in the trimmed file (as this would cause the same dialog to appear when loading the trimmed trace). A new trim filename and path can be typed directly into the dialog, or chosen by clicking the '...' button to open a typical FileSaveDialog.
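The checks described above can be sketched roughly as follows. This is not VoglEditor's code: the function name is hypothetical, and it assumes that frames are numbered from 0 and that a trace has "too many" frames once it holds more than the configured limit.

```python
def validate_trim(trim_frame, trim_len, total_frames, large_trace_size):
    """Return a list of problems with the requested trim; empty means valid.

    Assumptions: frames are numbered from 0, and a trace is "large" once
    it contains more than large_trace_size frames.
    """
    problems = []
    if not 0 <= trim_frame < total_frames:
        problems.append("trim_frame must name a frame in the loaded trace")
    if trim_len < 1:
        problems.append("trim_len must be at least 1")
    elif trim_len > large_trace_size:
        problems.append("trim_len would produce another large trace")
    return problems
```

For example, validate_trim(10, 50, 1000, 200) returns an empty list, while asking for 500 frames with a limit of 200 reports one problem.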

Clicking 'OK' checks whether the file already exists and, if it does, prompts you to overwrite it (or cancel to return to the dialog and select a new file name). The trace file will be trimmed in another process and saved to disk, and you are then prompted to open the newly trimmed trace.


As expected, 'Yes' will load the new trimmed file into VoglEditor; 'No' will cause the originally selected trace file to be loaded.

This same trimming functionality is also now made available on-demand using the new 'Trim Trace' toolbar button.


NOTE: Currently, the 'Trim Trace' behavior is slightly different than the 'Play Trace', in that its input is the original trace file, whereas 'Play Trace' operates on the trace data that is in memory in the editor. This means that any edits made in VoglEditor will be reflected by 'Play Trace', but NOT reflected in 'Trim Trace'.

So... what constitutes "too many" frames? You decide! (continue below)

VoglEditor Settings File

When implementing the automatic detection of "large traces" to prompt for trimming, it became clear that different users will have different definitions of "large" - in house we often look at multi-gigabyte trace files with hundreds of frames, and other systems may only be able to support a few hundred frames, or perhaps for debugging a particular issue only a handful of frames will be needed. In order to configure the definition of "large traces", a settings file was added.

To keep it easily readable, editable, and consistent with other vogl features, the settings file is stored in JSON. To stick with Linux standards, it is stored in one of the following paths:
$XDG_CONFIG_HOME/vogleditor_settings.json
or
$HOME/.config/vogleditor/vogleditor_settings.json
In addition to the number of frames at which to consider a trace "large", the initial implementation of the settings file also supports storing the window position and size.

Here are the contents of the default settings file, which will be automatically created when you first open vogleditor, and automatically resaved when the editor is closed.
{
   "metadata" : {
      "vogleditor_settings_file_format_version" : "0x1"
   },
   "settings" : {
      "trim_large_trace_prompt_size" : 200,
      "window_position_left" : 0,
      "window_position_top" : 0,
      "window_size_width" : 1024,
      "window_size_height" : 768
   }
}
Manually editing this file should be done while VoglEditor is closed, as any manual changes will be overwritten when the editor is closed.
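If you would rather script the change than edit the file by hand, something like the following Python sketch would work on the format shown above. The function names are hypothetical, and the order in which the two locations are tried is an assumption on my part rather than a description of what the editor does. As with manual edits, run it while VoglEditor is closed.

```python
import json
import os


def candidate_settings_paths(environ=None):
    """List the two documented settings locations that can be resolved."""
    environ = os.environ if environ is None else environ
    paths = []
    xdg = environ.get("XDG_CONFIG_HOME")
    if xdg:
        paths.append(os.path.join(xdg, "vogleditor_settings.json"))
    home = environ.get("HOME")
    if home:
        paths.append(os.path.join(home, ".config", "vogleditor",
                                  "vogleditor_settings.json"))
    return paths


def set_large_trace_size(settings_path, frame_count):
    """Rewrite trim_large_trace_prompt_size, keeping every other setting."""
    with open(settings_path) as settings_file:
        data = json.load(settings_file)
    data["settings"]["trim_large_trace_prompt_size"] = int(frame_count)
    with open(settings_path, "w") as settings_file:
        json.dump(data, settings_file, indent=3)
    return data
```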

Thursday, March 20, 2014

VoglEditor Feature List

The past few months have been spent plugging away with Valve on a user interface for a new OpenGL debugging tool called VOGL, which we've recently open sourced on github - https://github.com/ValveSoftware/vogl

We are developing entirely in Linux, but are also assisting several members of the community who are working on ports to other platforms.

Since I've primarily been working on the user interface, called "VoglEditor", here's an initial feature set as of the time VOGL went public. New features are being added almost daily, so keep an eye out for updates, and feel free to get involved in the project!

Current VoglEditor features:
  • Loading multi-frame trace files
  • Obtaining a GL state snapshot at any API call that is not within a glBegin/glEnd block
  • Save / Load debug sessions
    • Saves a JSON file which links together the base trace file and all collected state snapshots
    • Saves all collected state snapshots to disk
  • Traces can be replayed from within vogleditor
  • CPU-based timeline 
    • Shows API call execution and cost
    • Most expensive call is shown in red and all other calls are scaled between Red -> Green based on relative execution time
  • API call hierarchy
    • Shows frames and API calls hierarchically
    • Clicking on an API call will launch the replayer and collect a state snapshot after that call executes
    • Icons indicate which API calls have a snapshot
    • Supports searching entrypoints and parameters for a supplied string, and navigating to prev / next search match
    • Supports jumping to prev / next draw call
    • Supports jumping to prev / next snapshot   
  • State snapshot panel 
    • Viewing all GL state within a snapshot
    • Automatic diff'ing of state between two snapshots
    • GL object explorers will default to displaying currently bound / active objects
    • Visualization of all existing framebuffers, renderbuffers, and textures
      • All visualizations can be scrolled and zoomed
      • Ability to view RGBA (with customizable alpha blend color), RGB, individual color components, 1-component, and 1/component
      • Support for viewing individual samples
      • Support for Y-flipping the images
    • Viewing of all created shader objects 
    • Viewing of all created program objects and their linked / attached shaders
      • Linked shaders can be edited and saved back into the snapshot so that the changes affect the trace replay 

Thursday, October 24, 2013

Writing Debug Visualizers for GDB / QtCreator 2.8

Whether you call them Debug Helpers, Debug Dumpers, Debug Visualizers, or whatever else, sometimes you want to customize the way your classes are displayed by the debugger. One of the most common reasons for needing a custom debug visualizer is when you've written a class with a dynamically allocated array. Typically, the debugger will simply see this as a pointer to a single object and will either show you the value of the pointer (an address), or the value of the object being pointed to, but you won't get to see all the elements in the array.

We ran into this exact scenario when we started using QtCreator 2.8.1 for a recent project. The available documentation (http://qt-project.org/doc/qtcreator-2.8/creator-debugging-helpers.html) provides a lot of good information, but it seems to only be part of the puzzle, and you're left to dig under the couch to find the missing pieces.

I'm going to assume that you've read through the linked documentation above so you have a general idea of what is going on, but don't worry if you don't understand it all right away (I certainly didn't). In the step-by-step guide below, I'll highlight the missing pieces and explain what I was able to figure out.

Here's an example of the default visualization:


 ... and an example of what we'd like to see...
 
...collapsed:



...and expanded:
 
...
 


QtCreator has support for debug helpers written in both C++ and Python, with some caveats to each.

C++ based helpers:
  • must be recompiled for each version of Qt, and the resulting library will be dynamically loaded into the debugged application, which can put additional stress on the application.
  • can be used with QtCreator on all platforms.

Python based helpers:
  • do not need to be recompiled.
  • can be used by any python-enabled gdb, even outside of QtCreator.

We decided to use the Python based helpers because we have some engineers that love using gdb directly while others are using QtCreator, and also because we wanted to avoid having an extra library loaded into our application while debugging.

It is useful to know that QtCreator installs its Dumper class source code to: /<home dir>/qtcreator-2.8.1/share/qtcreator/dumper/* Here you will find the C++ header and implementation files, several python files including qttypes.py which does formatting of the Qt data types, and a set of tests. All of these files are very useful references.

Step 1: Create a python file to contain the debug visualizer.

For this example, put the file in your home directory, and call it debugVisualizer.py

Step 2: Tell gdb about the new python file.

Create a ~/.gdbinit file if you don't already have one, and add the following line, replacing <home dir> with the full path to your home directory (note that it will not work if you use '~/'):
python execfile('/<home dir>/debugVisualizer.py')
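Note that execfile() only exists in Python 2. If your gdb is built against Python 3, the same effect can be had by compiling and executing the file contents yourself. Below is a minimal sketch of that idea; load_visualizer is a hypothetical helper name, and it also expands '~' so the path does not need to be spelled out in full.

```python
import os


def load_visualizer(path, namespace=None):
    """Execute a Python file in the given namespace, like Python 2's execfile()."""
    full_path = os.path.expanduser(path)
    if namespace is None:
        namespace = {}
    with open(full_path) as source_file:
        # Compiling with the real file name keeps tracebacks readable.
        code = compile(source_file.read(), full_path, "exec")
    exec(code, namespace)
    return namespace
```

In a ~/.gdbinit file the one-line form of the same idea is: python exec(open('/<home dir>/debugVisualizer.py').read())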


Step 3: Identify the members of the class you want to visualize.

I'll demonstrate here using a dynamic array of templated objects, which has a pointer to the templated type and a size of the array.

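#include <cstddef>  // for NULL

// Minimal dynamic array: a raw pointer to the elements plus an element count.
template<typename T> class DynamicArray {
   public:
      DynamicArray()
      : m_pArray(NULL),
        m_size(0)
      {}

      T* m_pArray;          // first element; NULL until an array is allocated
      unsigned int m_size;  // number of elements in m_pArray
};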

Step 4: Decide on how you want it to be displayed.

There are several ways we could view this data; see the earlier screenshots for what we're going to accomplish. To help ease the introduction, I'll start off by just exposing the array size and the array pointer, then we'll add all the array elements in the final step.

Step 5: Formatting the data.

Part 1: Specify the exact string 

(this part can actually be skipped, but it is worth explaining as it can be used to debug and test visualizations later on)

As the documentation explains, the debugger plugin needs a string that provides the formatting, but it doesn't give a very good description of each of the tags.

iname: This is an internal name that just needs to be unique. It is optional, and if not specified it will be automatically generated by the debugger plugin. It appears to be best to simply let this be automatically generated: if I specify an iname as part of a child, the child will not appear in the debugger.
name: The variable's name, or whatever you want to show up in the 'name' column. It is optional, but should be specified for clarity.
addr: The address of the variable. This is also optional, but very useful when debugging and wanting to look at memory locations.
value: This will be the content of the value column. The debugger knows how to display basic data types, so they can be used when generating a more complex value for our DynamicArray class.
type: This can be a customized string to display in the type column. Although the documentation does not specify this as being optional, I've found that the correct type tends to be displayed automatically.
numchild: This identifies the number of children, primarily for the purposes of whether or not the node should be expandable (hence the documentation says that "zero/nonzero is sufficient"). It's probably best to make sure this number is accurate.
childnumchild: The default number of grandchildren. I'm assuming this is used to improve memory allocation, but it's marked as optional and I haven't confirmed its importance yet.
children: This node specifies the children, all of which can have any of the tags above. Getting this part of the string to be formatted correctly is one of the most difficult parts of writing a custom debug visualizer. Also, this node can be added (and should ONLY be added) if the parent item is expanded in view. This will get more discussion later on.

The following python code will display a node with two children - one for the size and one for the array pointer.

#!/usr/bin/python

def qdump__DynamicArray(d, value):
    array = value["m_pArray"]
    size = value["m_size"]
    innerType = d.templateArgument(value.type, 0)
    p = gdb.Value(array.cast(innerType.pointer()))
    d.put('value="[%d] @0x%x",' % (size, p.dereference().address) +
          'addr="%x", ' % value.address +
          'numchild="2",' +
          'children=[' +
          '{name="m_size", value="%d", numchild="0",},' % size +
          '{name="m_pArray", value="@0x%x", numchild="0",},' % p.dereference().address +
          ']')

NOTE: In the documentation where it shows what the string should look like, the quotation marks are not correct. If you use that format (with double quotes on the outside and single quotes to surround the values) the variables will not appear in the debugger at all! You must use the quotations as I have shown them in the example above. Also, the starting and ending braces '{' and '}' will be automatically added by the debug plugin, so they do not need to be included in the string.

The first two lines define 'array' and 'size' variables equal to the array and size members of our DynamicArray class. gdb's Python API hands us the object as a gdb.Value, and the members of a struct or class value are obtained using dictionary-style indexing by member name.

Using the put(..) member of the Dumper class allows us to append a string directly to the dumper. In this case we're not specifying an 'iname', 'name', or 'addr', as the debug plugin already knows that information. We are providing a value to display, the number of children, and a child node for each of the member variables.

A subtle detail that you may have noticed is that we are printing the value of the 'm_pArray' variable (an address) as part of the value for the overall object. The reason for this is so that when the node is collapsed in the tree, the important information is easily visible. In order to get the correct pointer (and to use it later on for referencing the actual values), we obtain the inner type of the template argument and cast the array member to that type. We are however using the actual address of the object ('value.address') when we set the 'addr' field so that the debugger can easily navigate to the appropriate memory location.
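Since the quoting is so easy to get wrong, it can help to assemble the string outside of the debugger first and simply look at it. The sketch below builds the same string as the example above from plain integers; the function name and the sample addresses are made up for illustration, and no gdb is involved.

```python
def build_dynamic_array_string(size, array_address, object_address):
    """Assemble the dumper string from plain integers (no gdb required)."""
    return ('value="[%d] @0x%x",' % (size, array_address) +
            'addr="%x", ' % object_address +
            'numchild="2",' +
            'children=[' +
            '{name="m_size", value="%d", numchild="0",},' % size +
            '{name="m_pArray", value="@0x%x", numchild="0",},' % array_address +
            ']')


# Stand-in values: 3 elements, array at 0x1000, object at 0x2000.
text = build_dynamic_array_string(3, 0x1000, 0x2000)
print(text)
```

Printing the result makes it easy to confirm that every value is wrapped in double quotes and that the children list opens and closes where you expect.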

If you make a mistake here, you may experience errors in several ways:
  • The variable may not appear at all in the visualizer - likely due to incorrect quotation marks.
  • The debug plugin may revert back to using the default visualization - likely due to python syntax errors, in which case running gdb from the command line might display useful information.
  • The visualizer may show a value of "<not accessible>" meaning that gdb could not access something that you are trying to do in the python code - most likely due to incorrectly trying to access or use the members of class being visualized.

Part 2: Put the Dumper class to work.

Now that this initial data is displaying the way I'd like, I'm going to use the Dumper class to recreate the same contents:

#!/usr/bin/python

def qdump__DynamicArray(d, value):
    array = value["m_pArray"]
    size = value["m_size"]
    innerType = d.templateArgument(value.type, 0)
    p = gdb.Value(array.cast(innerType.pointer()))
    d.putValue('[%d] @0x%x' % (size, p.dereference().address))
    d.putAddress(value.address)
    d.putNumChild(2)
    with Children(d):
        d.putSubItem("m_size", size)
        with SubItem(d, "m_pArray"):
            d.putValue("@0x%x" % p.dereference().address)

You can see how the hardcoded text is now being replaced by functions of the dumper object. You might notice that I am using d.putSubItem(..) for the size, but "with SubItem(..)" for the array. The dumper object does not have an overloaded putSubItem(..) which formats the value as an address, and instead will display the object that is being pointed to (i.e., the first element in the array). Since that is not what we want just yet, we have to use "with SubItem(..)" and do our own formatting of the address.

As it turns out, we will have to use "with SubItem(..)" anyway since we will want to add more children for each element in the array.
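If the "with" blocks look like magic, it may help to see how context managers can produce the nested string from Part 1. The classes below are simplified stand-ins that I wrote for illustration only; they share names with the real Children and SubItem helpers but are NOT QtCreator's implementation, and the exact text the real helpers emit may differ.

```python
class MockDumper(object):
    """Collects output fragments, standing in for the real Dumper."""

    def __init__(self):
        self.output = []

    def put(self, text):
        self.output.append(text)

    def putValue(self, text):
        self.put('value="%s",' % text)

    def putNumChild(self, count):
        self.put('numchild="%d",' % count)


class Children(object):
    """Mock: wraps everything emitted inside the block in children=[...]."""

    def __init__(self, d):
        self.d = d

    def __enter__(self):
        self.d.put('children=[')

    def __exit__(self, exc_type, exc_value, traceback):
        self.d.put('],')
        return False


class SubItem(object):
    """Mock: wraps everything emitted inside the block in {name="...", ...}."""

    def __init__(self, d, name):
        self.d = d
        self.name = name

    def __enter__(self):
        self.d.put('{name="%s",' % self.name)

    def __exit__(self, exc_type, exc_value, traceback):
        self.d.put('},')
        return False


d = MockDumper()
d.putValue('[3] @0x1000')
d.putNumChild(2)
with Children(d):
    with SubItem(d, "m_size"):
        d.putValue("3")
        d.putNumChild(0)
    with SubItem(d, "m_pArray"):
        d.putValue("@0x1000")
        d.putNumChild(0)
result = "".join(d.output)
print(result)
```

The opening text is written when a block is entered and the closing text when it is left, so the nesting of the "with" blocks guarantees that the brackets and braces are balanced.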

Part 3: Dynamically add array elements as children

#!/usr/bin/python

def qdump__DynamicArray(d, value):
    array = value["m_pArray"]
    size = value["m_size"]
    maxDisplayItems = 100
    innerType = d.templateArgument(value.type, 0)
    p = gdb.Value(array.cast(innerType.pointer()))
    d.putValue('[%d] @0x%x' % (size, p.dereference().address))
    d.putAddress(value.address)
    d.putNumChild(2)
    with Children(d):
        d.putSubItem("m_size", size)
        with SubItem(d, "m_pArray"):
            d.putItemCount(size)
            d.putNumChild(size)
            if d.isExpanded():
                numDisplayItems = min(maxDisplayItems, size)
                with Children(d, numChild=size,
                       maxNumChild=numDisplayItems,
                       childType=innerType,
                       addrBase=p,
                       addrStep=p.dereference().__sizeof__):
                    for i in range(0,numDisplayItems):
                        d.putSubItem(i, p.dereference())
                        p += 1

In this version, we've replaced the 'm_pArray' SubItem with a more complex sub item which has 'size' number of children. The "putItemCount(size)" line tells the debug plugin to set the value to "<size items>" (e.g. "<1000 items>"). Since we are displaying an array, showing the number of items in the array is appropriate (even though that same number will be visible as the value of 'm_size').

You will also notice the use of "d.isExpanded()" - this tells the debugger to only evaluate the inner text once the parent item is expanded. Since our array may hold many items, adding all the children (whether visible or not) by default could really slow down the debugger when a breakpoint is hit and all the variables in scope need to be evaluated. By moving the addition of children into the isExpanded condition, we delay that processing until the parent node is expanded. Note that once the node has been expanded and all the children are added, they will not be re-evaluated if the user continually collapses and expands the node. It will however be reevaluated if the user steps or runs the debugger.
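The effect of deferring the work can be shown with a small stand-alone Python sketch. LazyArrayNode is a hypothetical class that only imitates the pattern (it is not part of the Dumper API): the summary is always cheap, and the per-element formatting only happens when the node is reported as expanded.

```python
class LazyArrayNode(object):
    """Stand-in for an expandable node; not part of QtCreator's Dumper API."""

    def __init__(self, values, max_display_items=100):
        self.values = values
        self.max_display_items = max_display_items
        self.evaluated = 0  # how many elements have been formatted so far

    def describe(self, expanded):
        """Return (summary, children); children stay empty while collapsed."""
        summary = "<%d items>" % len(self.values)
        if not expanded:
            return summary, []
        shown = self.values[:self.max_display_items]
        self.evaluated += len(shown)
        children = ["[%d] = %r" % (i, v) for i, v in enumerate(shown)]
        return summary, children


node = LazyArrayNode(list(range(1000)))
collapsed_summary, collapsed_children = node.describe(expanded=False)
work_while_collapsed = node.evaluated
expanded_summary, expanded_children = node.describe(expanded=True)
```

While collapsed, no elements are formatted at all; once expanded, only the first 100 of the 1000 elements are.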

Another trick to save processing time (and perhaps screen real estate) is to limit the number of elements shown in the list. This is achieved using the "maxDisplayItems = 100" variable and comparing that value to the size of the array to determine the number of items to display ("numDisplayItems = min(maxDisplayItems, size)"). I've added "maxDisplayItems" near the top of the function to make it easy to find and customize that value if desired. The "numDisplayItems" is then passed into "with Children(...)" to help customize how the debugger handles the long list of children. See below for more details on these parameters and the role they play:

numChild: The number of children that will be added.
maxNumChild: This is the number of children that will be added and displayed. NOTE: This does not actually control how many children will get displayed by the debugger, this number is compared to numChild, and if the maxNumChild is less than numChild, a final node will be automatically added to the children with the name "<incomplete>". This will happen even if you add all numChild SubItems to the parent.
childType: The default data type to use for each child.
addrBase: The base address to use for the array object. In our case, it is the value of the array pointer.
addrStep: The number of bytes needed to step from one element to the next. Since we are adding children for an array of elements, each element is a fixed size and so the debugger can automatically calculate the address for each child using this value.
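To make the roles of addrBase, addrStep, numChild and maxNumChild concrete, here is the arithmetic written out in plain Python. Both functions are hypothetical helpers that restate the descriptions above; they are not taken from the debugger plugin.

```python
def child_addresses(addr_base, addr_step, num_children):
    """Address of each child when elements are addr_step bytes apart."""
    return [addr_base + index * addr_step for index in range(num_children)]


def needs_incomplete_marker(num_child, max_num_child):
    """True when the "<incomplete>" node would be appended, per the text above."""
    return max_num_child < num_child


# Example: 1000 four-byte elements starting at 0x1000, showing at most 100.
size = 1000
num_display_items = min(100, size)
addresses = child_addresses(0x1000, 4, num_display_items)
marker = needs_incomplete_marker(size, num_display_items)
```

With these numbers the first three children sit at 0x1000, 0x1004 and 0x1008, and because only 100 of the 1000 elements are shown, the "<incomplete>" node would be added.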

Finally, the children are added one at a time using a for-loop. We are specifying the name of the SubItem as 'i' which will be automatically formatted with brackets around it like so: "[0]". Also, because putSubItem(...) is being used to add the item, the object being dereferenced will automatically be formatted based on its debug helper. In this manner, if our array contains elements of another custom class which has a custom debug helper, it will automatically be used by the debug plugin and formatted correctly!

All done! Have fun debugging!