

I'm Dmitry Popov,
lead developer and director of Infognition.

Known in the interwebs as Dee Mon since 1997. You could see me as thedeemon on reddit or LiveJournal.

Accessing raw video data in DirectShow
October 18, 2013

In DirectShow one makes multimedia apps by building graphs where nodes (called filters) process the data (capture or read, convert, compress, write etc.) and the graph's edges are streams of multimedia samples (video frames, chunks of audio, etc.). All the work with raw data takes place inside the filters, and the host application usually doesn't deal with video/audio data directly; it only arranges the filters in a graph and tells them what to do. This is fine for typical tasks when you've got all the necessary filters, but sometimes there is no ready-made filter for your task, and then you need to create your own, which can be rather tedious and complicated. However, there are many cases when you only want to peek at the data, analyse it or save some part of it. You don't need a transform that changes the media type, so making your own filter is overkill. There is a nice standard filter in DirectShow that gives you access to raw video data (let's stick with just video for now) without writing a filter of your own: it's called Sample Grabber. MSDN says it's deprecated, but this filter is still available in all versions of Windows including Windows 8. Even if it goes away some time in the future, recreating it will be very easy, so your program will not need to change.

In this little tutorial I'll show how to make a small DirectShow app in C++ which takes a video stream from a USB camera, applies a simple video effect (accessing raw video data) and shows it on screen. The idea is simple: make a graph where the video stream flows from the camera through the sample grabber to a video renderer. Each time a frame passes the sample grabber it calls my callback, where I manipulate the raw video data before it's sent down the stream for displaying. With GraphEditPlus making such an application is a matter of minutes. First I need to create my graph in GraphEditPlus and generate code for it. I start by selecting the "Video Capture Sources" category in the filters window. There is only one capture source on my laptop, "USB2.0 Camera", so I add it to the graph with a double click.

The camera can provide video in different formats and resolutions. The DirectShow filter representing the camera exposes the IAMStreamConfig interface, which allows listing the available output formats and selecting the one in which to provide the data. I right click its output pin and select "IAMStreamConfig::SetFormat" to see the list of media types and pick one of them:

This selection will be reflected in the generated source code. My camera can produce uncompressed YUY2 video and compressed MJPG, both in different resolutions. Also, media format can be either FORMAT_VideoInfo or FORMAT_VideoInfo2, and it's important to use the first one, otherwise Sample Grabber will not accept it. So I select YUY2 640x480 FORMAT_VideoInfo.

Then I need to add the Sample Grabber, so I just start typing its name in the filters search box and after entering "sa" here it is. Double click, added to the graph.

Then I just need to connect it with the camera and then render video stream from its output pin (right click on the pin, "Render"). Graph is ready, I run it and see the video from my camera, here's a view from my office window:

I tell GraphEditPlus to generate source code and open Visual Studio 2010. Two minutes from the start and I already have a ready DirectShow app. Well, almost. In the good old times we had VC6, DirectX 9 SDK and life was simple. And in the C# world life is still simple. But in C++ you now need some Windows SDK to use DirectShow, and in recent versions of this SDK Microsoft got more serious about deprecating some parts of it, so there is no qedit.h file, the one which describes Sample Grabber (although qedit.h should really be about DirectShow Editing Services, the engine of Movie Maker). The last version of Windows SDK containing qedit.h is version 6.0 (SDK for Vista and .NET 3), but even there it is useless because it refers to "dxtrans.h", which is missing. There is even a pragma message in qedit.h saying "To compile qedit.h you must install the DirectX 9 SDK, to obtain the dxtrans.h header." And although it's not mentioned, not every version of DX9 SDK will help. But luckily you don't really need them. If you do have an SDK with qedit.h, you can comment out the reference to dxtrans.h there and include it this way in your source file:

#define __IDxtCompositor_INTERFACE_DEFINED__
#define __IDxtAlphaSetter_INTERFACE_DEFINED__
#define __IDxtJpeg_INTERFACE_DEFINED__
#include <Qedit.h>

And if you don't have qedit.h in your SDK then you can just use this file: SampleGrabber.h. It's an excerpt from original headers describing just the Sample Grabber and nothing else.

So, I create a Win32 console application, paste the source code generated by GraphEditPlus, add references to Windows SDK Include directory, Lib directory and two lib files (strmiids.lib and quartz.lib, parts of DirectShow). In the BuildGraph function I see a media format description structure is rigorously populated to be passed to IAMStreamConfig::SetFormat, but some fields like dwBitRate and AvgTimePerFrame are not really necessary, so they can be skipped. What's really important is media major type, subtype (MEDIASUBTYPE_YUY2 in my case) and resolution.
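The generated code fills the format structure with bare numbers such as biCompression = 844715353 and lSampleSize = 614400. Here is a small sketch of where those numbers come from. It is plain C++ with no Windows headers; make_fourcc and packed_frame_size are hypothetical helper names of mine, and the size formula assumes tightly packed rows without padding.

```cpp
#include <cassert>
#include <cstdint>

// Pack four ASCII characters into a little-endian FOURCC code
// (first character in the lowest byte).
constexpr std::uint32_t make_fourcc(char a, char b, char c, char d) {
    return  static_cast<std::uint32_t>(static_cast<unsigned char>(a))
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(b)) << 8)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(c)) << 16)
         | (static_cast<std::uint32_t>(static_cast<unsigned char>(d)) << 24);
}

// Size in bytes of one uncompressed packed frame.
// Assumption: rows are tightly packed, no stride padding.
inline std::uint32_t packed_frame_size(std::uint32_t width,
                                       std::uint32_t height,
                                       std::uint32_t bits_per_pixel) {
    assert(bits_per_pixel % 8 == 0);
    return width * height * (bits_per_pixel / 8);
}

// 'YUY2' read as a 32-bit little-endian number is 0x32595559.
static_assert(make_fourcc('Y', 'U', 'Y', '2') == 0x32595559u,
              "FOURCC for YUY2");
```

For YUY2 at 640x480 and 16 bits per pixel, packed_frame_size gives 640 * 480 * 2 = 614400 bytes, and 0x32595559 is 844715353 in decimal, which matches the generated values.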

The video rendering part is presented in the generated code in all detail, but it's also not necessary to create and connect those filters by hand, simple RenderStream with NULL in last argument is enough to render video stream from Sample Grabber on screen. So the graph building code after all adjustments looks like this:

#include "SampleGrabber.h"

// {C1F400A0-3F08-11D3-9F0B-006008039E37}
DEFINE_GUID(CLSID_SampleGrabber,
0xC1F400A0, 0x3F08, 0x11D3, 0x9F, 0x0B, 0x00, 0x60, 0x08, 0x03, 0x9E, 0x37); //qedit.dll

HRESULT BuildGraph(IGraphBuilder *pGraph)
{
    HRESULT hr = S_OK;

    //graph builder
    CComPtr<ICaptureGraphBuilder2> pBuilder;
    hr = pBuilder.CoCreateInstance(CLSID_CaptureGraphBuilder2);
    CHECK_HR(hr, _T("Can't create Capture Graph Builder"));
    hr = pBuilder->SetFiltergraph(pGraph);
    CHECK_HR(hr, _T("Can't SetFiltergraph"));

    //add USB2.0 Camera
    CComPtr<IBaseFilter> pUSB20Camera = CreateFilterByName(L"USB2.0 Camera", CLSID_VideoCaptureSources);
    hr = pGraph->AddFilter(pUSB20Camera, L"USB2.0 Camera");
    CHECK_HR(hr, _T("Can't add USB2.0 Camera to graph"));

    //describe the format we want from the camera: YUY2 640x480
    AM_MEDIA_TYPE pmt;
    ZeroMemory(&pmt, sizeof(AM_MEDIA_TYPE));
    pmt.majortype = MEDIATYPE_Video;
    pmt.subtype = MEDIASUBTYPE_YUY2;
    pmt.formattype = FORMAT_VideoInfo;
    pmt.bFixedSizeSamples = TRUE;
    pmt.cbFormat = 88;          //sizeof(VIDEOINFOHEADER)
    pmt.lSampleSize = 614400;   //640 * 480 * 2 bytes
    pmt.bTemporalCompression = FALSE;
    VIDEOINFOHEADER format;
    ZeroMemory(&format, sizeof(VIDEOINFOHEADER));
    format.bmiHeader.biSize = 40;
    format.bmiHeader.biWidth = 640;
    format.bmiHeader.biHeight = 480;
    format.bmiHeader.biPlanes = 1;
    format.bmiHeader.biBitCount = 16;
    format.bmiHeader.biCompression = 844715353; //FOURCC 'YUY2'
    format.bmiHeader.biSizeImage = 614400;
    pmt.pbFormat = (BYTE*)&format;
    CComQIPtr<IAMStreamConfig, &IID_IAMStreamConfig> isc(GetPin(pUSB20Camera, L"Capture"));
    hr = isc->SetFormat(&pmt);
    CHECK_HR(hr, _T("Can't set format"));

    //add SampleGrabber
    CComPtr<IBaseFilter> pSampleGrabber;
    hr = pSampleGrabber.CoCreateInstance(CLSID_SampleGrabber);
    CHECK_HR(hr, _T("Can't create SampleGrabber"));
    hr = pGraph->AddFilter(pSampleGrabber, L"SampleGrabber");
    CHECK_HR(hr, _T("Can't add SampleGrabber to graph"));
    CComQIPtr<ISampleGrabber, &IID_ISampleGrabber> pSampleGrabber_isg(pSampleGrabber);

    //here we provide our callback:
    hr = pSampleGrabber_isg->SetCallback(new CallbackObject(), 0);
    CHECK_HR(hr, _T("Can't set callback"));

    //connect USB2.0 Camera and SampleGrabber
    hr = pBuilder->RenderStream(NULL, NULL, pUSB20Camera, NULL, pSampleGrabber);
    CHECK_HR(hr, _T("Can't render stream to SampleGrabber"));

    //render the video in a window
    hr = pBuilder->RenderStream(NULL, NULL, pSampleGrabber, NULL, NULL);
    CHECK_HR(hr, _T("Can't render stream from SampleGrabber"));
    return S_OK;
}

The crucial part is where I call ISampleGrabber::SetCallback. The first argument is the callback object implementing the ISampleGrabberCB interface, and the second says which of its methods to call: 0 selects SampleCB, 1 selects BufferCB. I pass 0 because SampleCB receives the IMediaSample itself, which is what I want to modify. I'm going to leak a few bytes of memory by not deleting the callback object, but it's OK in my case as it's only going to be created once.
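If the leak bothers you, the usual COM answer is real reference counting: the object deletes itself when the last reference is released. Below is a portable sketch of just that pattern. RefCountedCallback is a hypothetical stand-in of mine, not the real ISampleGrabberCB; it starts with a count of zero and assumes whoever receives the pointer follows normal COM rules and calls AddRef and Release in pairs. The destroyed_flag parameter exists only so the behaviour can be observed.

```cpp
#include <atomic>
#include <cassert>

// Portable stand-in for a COM-style object (hypothetical, for illustration).
class RefCountedCallback {
public:
    // destroyed_flag is optional and only used to observe destruction.
    explicit RefCountedCallback(bool* destroyed_flag = nullptr)
        : refs_(0), destroyed_(destroyed_flag) {}

    // Each holder of the pointer takes one reference.
    unsigned long AddRef() { return ++refs_; }

    // Dropping the last reference destroys the object.
    unsigned long Release() {
        unsigned long left = --refs_;
        if (left == 0) delete this;
        return left;
    }

private:
    // Private destructor: the object can only die through Release().
    ~RefCountedCallback() {
        if (destroyed_) *destroyed_ = true;
    }

    std::atomic<unsigned long> refs_;
    bool* destroyed_;
};
```

In the real callback class the same two method bodies would go into the STDMETHODIMP_(ULONG) AddRef and Release overrides, with the rest of the class unchanged.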

So we need an implementation of ISampleGrabberCB to be given to Sample Grabber. Here is one, very simple:

class CallbackObject : public ISampleGrabberCB {
public:
    CallbackObject() {};

    STDMETHODIMP QueryInterface(REFIID riid, void **ppv)
    {
        if (NULL == ppv) return E_POINTER;
        if (riid == __uuidof(IUnknown)) {
            *ppv = static_cast<IUnknown*>(this);
            return S_OK;
        }
        if (riid == __uuidof(ISampleGrabberCB)) {
            *ppv = static_cast<ISampleGrabberCB*>(this);
            return S_OK;
        }
        return E_NOINTERFACE;
    }
    //no real reference counting here, the object lives as long as the program
    STDMETHODIMP_(ULONG) AddRef() {    return S_OK;  }
    STDMETHODIMP_(ULONG) Release() {   return S_OK;  }

    STDMETHODIMP SampleCB(double SampleTime, IMediaSample *pSample);
    STDMETHODIMP BufferCB(double SampleTime, BYTE *pBuffer, long BufferLen) { return S_OK; }
};

STDMETHODIMP CallbackObject::SampleCB(double SampleTime, IMediaSample *pSample)
{
    if (!pSample)
        return E_POINTER;
    long sz = pSample->GetActualDataLength();
    BYTE *pBuf = NULL;
    pSample->GetPointer(&pBuf); //get the address of the frame data
    if (sz <= 0 || pBuf==NULL) return E_UNEXPECTED;
    //YUY2: every even byte is luma, invert it and leave chroma alone
    for(int i=0;i<sz;i+=2)
        pBuf[i] = 255 - pBuf[i];
    return S_OK;
}
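The loop at the end of SampleCB doesn't depend on DirectShow at all, so it can be tried on a plain byte buffer. A minimal sketch, assuming the standard packed YUY2 layout (Y0 U Y1 V, two pixels in four bytes); invert_luma_yuy2 and inverted_copy are hypothetical names of mine, not part of the generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// YUY2 packs two pixels into four bytes: Y0 U Y1 V.
// Luma therefore sits at every even byte offset, chroma at every odd one.
inline void invert_luma_yuy2(std::uint8_t* buf, std::size_t size) {
    assert(buf != nullptr || size == 0);
    for (std::size_t i = 0; i < size; i += 2)
        buf[i] = static_cast<std::uint8_t>(255 - buf[i]);
}

// Convenience wrapper for experiments: works on a copy of the frame.
inline std::vector<std::uint8_t> inverted_copy(std::vector<std::uint8_t> frame) {
    invert_luma_yuy2(frame.data(), frame.size());
    return frame;
}
```

For a two-pixel frame with bytes {10, 128, 20, 128} the result is {245, 128, 235, 128}: both luma values are flipped and the chroma bytes stay as they were. Applying the function twice gives back the original frame.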

CallbackObject is a COM object, hence the obligatory QueryInterface, AddRef and Release, the latter two implemented in a lousy way, completely ignoring the COM reference counting thingy. Then come the two ISampleGrabberCB methods: BufferCB does nothing, as it will never be called, and SampleCB is the most important one, this is where the magic happens. Each time a video frame is produced by the camera it is passed to the sample grabber, which calls this method with an IMediaSample pointer through which we can request the amount of data in the sample and a pointer to the data itself. The data is mutable, so as long as we don't change the video resolution and format we can just mutate the contents of this buffer, and this is what the next filter down the chain will receive. In this case I want to invert each pixel's intensity (luma) while keeping color (chroma) intact. Since the video comes in YUY2 format this is pretty easy: every other byte in the buffer is some pixel's luma, so I just subtract it from 255. And this is it, no more code is required (the rest, i.e. the main loop as well as the filter creation and pin search routines, are all generated by GraphEditPlus). Here's what I see after I compile and run the program:

Instead of showing video on screen you can direct the stream to some muxer and file writer to record it to disk. Or just use the Null Renderer if you don't need the data after it leaves Sample Grabber. For example, your callback can just save each frame to a bitmap file or send it over the network. Either way, something needs to be connected to Sample Grabber's output pin, so use the Null Renderer if you don't have anything meaningful to connect.
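As a rough illustration of the "save each frame" idea, here is a sketch that dumps only the luma of a YUY2 frame as a grayscale image. To keep it short I swapped the bitmap file for binary PGM (P5), a much simpler format, so this is not a BMP writer. write_luma_pgm is a hypothetical helper of mine, and it assumes a tightly packed frame of width * height * 2 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Write the luma of a packed YUY2 frame as a binary PGM (P5) image.
// Assumption: the frame holds width * height * 2 bytes with no row padding.
inline void write_luma_pgm(std::ostream& out, const std::uint8_t* yuy2,
                           int width, int height) {
    assert(yuy2 != nullptr && width > 0 && height > 0);
    // PGM header: magic number, dimensions, maximum gray value.
    out << "P5\n" << width << ' ' << height << "\n255\n";
    const std::size_t pixels =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    // Pixel p has its luma at byte 2 * p; the chroma bytes are skipped.
    for (std::size_t p = 0; p < pixels; ++p)
        out.put(static_cast<char>(yuy2[2 * p]));
}
```

To write to disk, pass a std::ofstream opened with std::ios::binary. Inside SampleCB the pointer would be the one obtained from the sample, and the width and height the ones set in the media type.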

That's all, next time I'll show how to do the same in C#.