Working with raw video data in C# with DirectShow
October 19, 2013

Many of the questions I see on forums and in emails revolve around one theme: how do I access the actual video data in a DirectShow app? People want to save images from their web cams, draw something into the video, analyze it, etc. But in DirectShow, direct work with raw data is encapsulated and hidden inside filters, the building blocks out of which we assemble DirectShow graphs. So when dealing with DirectShow we've got a bunch of filters on our hands and we tell them what to do, but we never get the actual raw bytes of audio/video data, and there isn't always a suitable filter that does exactly what you need. In some cases we're forced to create our own filter, but that can be rather complicated and tedious. However, many cases can be solved with one standard filter that is part of DirectShow itself: the Sample Grabber. Here's an example of using it to implement a video effect and apply it to a video stream from a web camera.

This short tutorial basically repeats my previous post, but this time we'll use C# instead of C++. I want to make a simple app in C# that gets a video stream from a USB camera, applies a video effect (directly accessing the raw video data) and displays the result in a window. The idea is simple: build a graph where the video stream flows from the camera through a sample grabber to a video renderer. Each time a frame passes the sample grabber, it calls my callback, where I manipulate the raw video data before it's sent downstream for display. All I need to do is build the graph in GraphEditPlus, generate code for it and then tweak it a little. I open GraphEditPlus, select the "Video Capture Sources" category in the filters window and add the only source available on my laptop to the graph.

Capture sources can usually provide video in different formats and resolutions. The DirectShow filter representing the camera exposes the IAMStreamConfig interface, which allows listing the available output formats and selecting the one in which to provide the data. I right-click its output pin and select "IAMStreamConfig::SetFormat" to see the list of media types and pick one of them:

This selection will be reflected in the generated source code. My camera can produce uncompressed YUY2 video and compressed MJPG, both in various resolutions. Also, the media format can be either FORMAT_VideoInfo or FORMAT_VideoInfo2, and it's important to use the former, otherwise the Sample Grabber will not accept it. So I select YUY2 640x480 with FORMAT_VideoInfo.
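
By the way, the same list the dialog shows can also be obtained programmatically. Here's a minimal sketch (assuming DirectShowLib's interop definitions; ListPinFormats is just an illustrative name) that enumerates a capture pin's formats via IAMStreamConfig:

static void ListPinFormats(IPin capturePin)
{
    IAMStreamConfig cfg = (IAMStreamConfig)capturePin;
    int count, size;
    int hr = cfg.GetNumberOfCapabilities(out count, out size);
    DsError.ThrowExceptionForHR(hr);
    IntPtr caps = Marshal.AllocCoTaskMem(size); // receives VIDEO_STREAM_CONFIG_CAPS, not used here
    try
    {
        for (int i = 0; i < count; i++)
        {
            AMMediaType mt;
            hr = cfg.GetStreamCaps(i, out mt, caps);
            DsError.ThrowExceptionForHR(hr);
            if (mt.formatType == FormatType.VideoInfo)
            {
                VideoInfoHeader vih = (VideoInfoHeader)Marshal.PtrToStructure(mt.formatPtr, typeof(VideoInfoHeader));
                Console.WriteLine("{0} {1}x{2}", mt.subType, vih.BmiHeader.Width, vih.BmiHeader.Height);
            }
            DsUtils.FreeAMMediaType(mt);
        }
    }
    finally { Marshal.FreeCoTaskMem(caps); }
}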

Then I need to add the Sample Grabber, so I just start typing its name in the filters search box, and after entering "sa" there it is. One double-click and it's added to the graph.
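
Optionally, you can also tell the Sample Grabber explicitly which media type it should accept via ISampleGrabber::SetMediaType. I don't do that in this post, since setting the format on the camera pin is enough, but a sketch would look like this (inside the graph-building code, with pSampleGrabber and hr in scope):

AMMediaType grabType = new AMMediaType();
grabType.majorType = MediaType.Video;
grabType.subType = MediaSubType.YUY2;
grabType.formatType = FormatType.VideoInfo;
hr = ((ISampleGrabber)pSampleGrabber).SetMediaType(grabType);
checkHR(hr, "Can't set media type on grabber");
DsUtils.FreeAMMediaType(grabType);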

Then I just need to connect it to the camera and render the video stream from its output pin (right-click on the pin, "Render"). The graph is built; I run it and see the video from my camera. Here's the view from my office window:

The graph is ready, so now I tell GraphEditPlus to generate C# code for me and paste it into a fresh C# console app project in VS 2010. The nicest thing about DirectShow in C# is that I don't need to bother with installing different SDKs and headers; I only need to add a reference to DirectShowLib.

Now, the changes I need to make in the code are pretty simple. First, the BuildGraph function contains a lot of code initializing the media type that is passed to SetFormat; not all of those details are required, the most important ones being the major type, subtype and resolution. Some fields like dwBitRate and AvgTimePerFrame can be skipped. Second, I don't really need to create and connect the rendering part of the graph manually, in this case the AVI Decompressor (which performs color space conversion) and the Video Renderer: a single call to RenderStream with nulls in the last two arguments is enough for the graph builder to create the rendering part automatically. Finally, I need to tell the Sample Grabber to call my callback method for each video frame passing by; in that callback I will change the video data, performing the video effect I need. Here's the full graph-building code after the changes:

static void BuildGraph(IGraphBuilder pGraph)
{
    int hr = 0;
    //graph builder
    ICaptureGraphBuilder2 pBuilder = (ICaptureGraphBuilder2)new CaptureGraphBuilder2();
    hr = pBuilder.SetFiltergraph(pGraph);
    checkHR(hr, "Can't SetFiltergraph");

    Guid CLSID_VideoCaptureSources = new Guid("{860BB310-5D01-11D0-BD3B-00A0C911CE86}"); // CLSID_VideoInputDeviceCategory
    Guid CLSID_SampleGrabber = new Guid("{C1F400A0-3F08-11D3-9F0B-006008039E37}"); //qedit.dll

    //add USB2.0 Camera
    IBaseFilter pUSB20Camera = CreateFilterByName(@"USB2.0 Camera", CLSID_VideoCaptureSources);
    hr = pGraph.AddFilter(pUSB20Camera, "USB2.0 Camera");
    checkHR(hr, "Can't add USB2.0 Camera to graph");

    //add SampleGrabber
    IBaseFilter pSampleGrabber = (IBaseFilter)Activator.CreateInstance(Type.GetTypeFromCLSID(CLSID_SampleGrabber));
    hr = pGraph.AddFilter(pSampleGrabber, "SampleGrabber");
    checkHR(hr, "Can't add SampleGrabber to graph");
    //set callback: 0 selects ISampleGrabberCB.SampleCB (1 would select BufferCB)
    hr = ((ISampleGrabber)pSampleGrabber).SetCallback(new SampleGrabberCallback(), 0);
    checkHR(hr, "Can't set callback.");

    AMMediaType pmt = new AMMediaType();
    pmt.majorType = MediaType.Video;
    pmt.subType = MediaSubType.YUY2;
    pmt.formatType = FormatType.VideoInfo; // Sample Grabber won't accept FORMAT_VideoInfo2
    pmt.fixedSizeSamples = true;
    pmt.formatSize = 88;     // size of VIDEOINFOHEADER
    pmt.sampleSize = 614400; // 640 * 480 * 2 bytes per pixel
    pmt.temporalCompression = false;
    VideoInfoHeader format = new VideoInfoHeader();
    format.SrcRect = new DsRect();
    format.TargetRect = new DsRect();
    format.BmiHeader = new BitmapInfoHeader();
    format.BmiHeader.Size = 40; // size of BITMAPINFOHEADER
    format.BmiHeader.Width = 640;
    format.BmiHeader.Height = 480;
    format.BmiHeader.Planes = 1;
    format.BmiHeader.BitCount = 16;
    format.BmiHeader.Compression = 844715353; // FOURCC 'YUY2'
    format.BmiHeader.ImageSize = 614400;
    pmt.formatPtr = Marshal.AllocCoTaskMem(Marshal.SizeOf(format));
    Marshal.StructureToPtr(format, pmt.formatPtr, false);
    hr = ((IAMStreamConfig)GetPin(pUSB20Camera, "Capture")).SetFormat(pmt);
    DsUtils.FreeAMMediaType(pmt);
    checkHR(hr, "Can't set format");

    //connect USB2.0 Camera and SampleGrabber
    hr = pGraph.ConnectDirect(GetPin(pUSB20Camera, "Capture"), GetPin(pSampleGrabber, "Input"), null);
    checkHR(hr, "Can't connect USB2.0 Camera and SampleGrabber");

    //render the video
    hr = pBuilder.RenderStream(null, null, pSampleGrabber, null, null);
    checkHR(hr, "Can't render video from grabber");
}
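
BuildGraph relies on three helpers that GraphEditPlus generates along with it: checkHR, CreateFilterByName and GetPin. To keep this post self-contained, here's roughly what they can look like (a sketch assuming DirectShowLib; the generated versions may differ in details):

static void checkHR(int hr, string msg)
{
    if (hr < 0)
    {
        Console.WriteLine(msg);
        DsError.ThrowExceptionForHR(hr);
    }
}

static IBaseFilter CreateFilterByName(string filterName, Guid category)
{
    // enumerate devices in the given category and match by friendly name
    foreach (DsDevice device in DsDevice.GetDevicesOfCat(category))
        if (device.Name == filterName)
        {
            object source;
            Guid iid = typeof(IBaseFilter).GUID;
            device.Mon.BindToObject(null, null, ref iid, out source);
            return (IBaseFilter)source;
        }
    return null;
}

static IPin GetPin(IBaseFilter filter, string pinName)
{
    // find a pin on the filter by its name
    IEnumPins epins;
    checkHR(filter.EnumPins(out epins), "Can't enumerate pins");
    IPin[] pins = new IPin[1];
    while (epins.Next(1, pins, IntPtr.Zero) == 0)
    {
        PinInfo info;
        pins[0].QueryPinInfo(out info);
        bool found = (info.name == pinName);
        DsUtils.FreePinInfo(info);
        if (found) return pins[0];
        Marshal.ReleaseComObject(pins[0]);
    }
    return null;
}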

I call SetCallback on the sample grabber and provide two things: an object of a class implementing the ISampleGrabberCB interface, and 0, which tells the sample grabber which of that interface's two methods to call. Passing 0 selects SampleCB (passing 1 would select BufferCB instead), so BufferCB will never be called here, and SampleCB is where all the magic happens. Each time a video frame passes through the sample grabber, it calls my SampleCB method with the video sample as an IMediaSample value, which I can query for the data length and a pointer to the actual data. So here's the full code of my callback class:

class SampleGrabberCallback : ISampleGrabberCB
{
    public SampleGrabberCallback()
    {
    }

    // Never called: we passed 0 to SetCallback, selecting SampleCB instead.
    public int BufferCB(double SampleTime, IntPtr pBuffer, int BufferLen)
    {
        return 0;
    }

    // Called by the Sample Grabber for each video frame passing through.
    public int SampleCB(double SampleTime, IMediaSample pSample)
    {
        if (pSample == null) return -1;
        int len = pSample.GetActualDataLength();
        IntPtr pbuf;
        if (pSample.GetPointer(out pbuf) == 0 && len > 0)
        {
            // copy the frame to the managed heap, invert every luma (Y) byte, copy it back
            byte[] buf = new byte[len];
            Marshal.Copy(pbuf, buf, 0, len);
            for (int i = 0; i < len; i += 2)
                buf[i] = (byte)(255 - buf[i]);
            Marshal.Copy(buf, 0, pbuf, len);
        }
        Marshal.ReleaseComObject(pSample);
        return 0;
    }
}

C# will take care of all the COM stuff except one thing: the actual video data is not in the managed heap, so we cannot access it directly without going unsafe. We can either use an unsafe block and call IntPtr.ToPointer to work with the video data directly, or use Marshal.Copy to copy the data to an array in the managed heap, modify the data in that array and then copy it back. The latter approach doesn't require an unsafe block, and that's what I did here.
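
For comparison, here's what the inner part of SampleCB could look like with the unsafe approach (a sketch; it requires enabling unsafe code in the project settings):

if (pSample.GetPointer(out pbuf) == 0 && len > 0)
{
    unsafe
    {
        byte* p = (byte*)pbuf.ToPointer();
        for (int i = 0; i < len; i += 2)
            p[i] = (byte)(255 - p[i]); // invert luma in place, no copies
    }
}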

My video effect is simple: I want to invert each pixel's intensity without changing its color. The data in my case arrives in YUY2 format, where every four bytes encode two pixels as Y0 U Y1 V, so every even byte is some pixel's luma (intensity); I just subtract it from 255 to invert it. The chroma bytes remain intact, keeping the colors.

That's it. I compile and run the program and see it working as expected:

The rest of the code (creating the filters and the main loop) was generated by GraphEditPlus and remained unchanged. The whole development took just a few minutes; describing the solution in this post took far longer than producing the solution itself.
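
For completeness, the generated entry point looks roughly like this (a sketch; the exact code GraphEditPlus emits may differ slightly):

static void Main(string[] args)
{
    try
    {
        // create the filter graph, build it, run it, and wait for the user to stop
        IGraphBuilder graph = (IGraphBuilder)new FilterGraph();
        Console.WriteLine("Building graph...");
        BuildGraph(graph);
        Console.WriteLine("Running...");
        IMediaControl mediaControl = (IMediaControl)graph;
        int hr = mediaControl.Run();
        checkHR(hr, "Can't run the graph");
        Console.WriteLine("Press ENTER to stop.");
        Console.ReadLine();
        mediaControl.Stop();
        Marshal.ReleaseComObject(graph);
    }
    catch (COMException ex)
    {
        Console.WriteLine("COM error: " + ex.ToString());
    }
}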