Working with raw video data in C# with DirectShow
October 19, 2013
Too many of the questions I see on forums and in emails revolve around
one theme: how do I access the actual video data in a DirectShow app? People want to
save images from their web cams, draw something over the video, analyse it, etc.
But in DirectShow, direct work with raw data is encapsulated and hidden inside filters,
the building blocks out of which we build DirectShow graphs. So when dealing with DirectShow
we have a bunch of filters on our hands and we tell them what to do, but we don't
get the actual raw bytes of audio/video data. Hence all the questions: there
isn't always a suitable filter that does exactly what you need. In some cases we're forced
to write our own filter, but that can be rather complicated and tedious. However, many
cases can be solved with one standard filter which is part of DirectShow itself:
Sample Grabber.
Here's an example of using it to implement a video effect and apply it to a stream of video
from a web camera.
This short tutorial will basically repeat my previous post but this time we'll use C#
instead of C++.
I want to make a simple app in C# that gets a video stream from a USB camera, applies a
video effect (directly accessing raw video data) and displays the result in a window.
The idea is simple: build a graph where the video stream flows from the camera through a
sample grabber to a video renderer. Each time a frame passes through the sample grabber, it calls my
callback, where I manipulate the raw video data before it is sent downstream for
display.
All I need to do is build a graph in
GraphEditPlus,
generate code for it and then tweak it a little. I open GraphEditPlus, select the
"Video Capture Sources" category in the filters window and add the only source available
on my laptop to the graph.
Capture sources can usually provide video in different formats and resolutions.
The DirectShow filter
representing the camera exposes the IAMStreamConfig interface, which allows listing the available
output formats and selecting the one in which the data should be provided.
I right-click its output pin and select "IAMStreamConfig::SetFormat" to see the list of
media types and pick one of them:
This selection will be reflected in the generated source code. My camera can produce uncompressed
YUY2 video and compressed MJPG, both in various resolutions. Also, the media format can be
either FORMAT_VideoInfo or FORMAT_VideoInfo2, and it's important to use the former,
otherwise the Sample Grabber will not accept it. So I select YUY2 640x480 with FORMAT_VideoInfo.
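The same enumeration that GraphEditPlus performs in its UI can also be done in code through IAMStreamConfig. Here is a minimal sketch using DirectShowLib, assuming pPin is the camera's capture pin obtained elsewhere (the names pPin and cfg are mine, not from the generated code):

//enumerate a pin's supported formats via IAMStreamConfig
IAMStreamConfig cfg = (IAMStreamConfig)pPin;
int count, size;
int hr = cfg.GetNumberOfCapabilities(out count, out size);
DsError.ThrowExceptionForHR(hr);
IntPtr caps = Marshal.AllocCoTaskMem(size); //receives a VIDEO_STREAM_CONFIG_CAPS
try
{
    for (int i = 0; i < count; i++)
    {
        AMMediaType mt;
        hr = cfg.GetStreamCaps(i, out mt, caps);
        DsError.ThrowExceptionForHR(hr);
        Console.WriteLine("{0} / {1} / {2}", mt.majorType, mt.subType, mt.formatType);
        DsUtils.FreeAMMediaType(mt);
    }
}
finally
{
    Marshal.FreeCoTaskMem(caps);
}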
Then I need to add the Sample Grabber, so I just start typing its name in the filters
search box, and after entering "sa" there it is. One double click and it's added to the graph.
Then I just need to connect it to the camera and render the video stream from its output pin
(right-click on the pin, "Render"). The graph is built; I run it and see the video from my camera.
Here's a view from my office window:
The graph is ready, so now I tell GraphEditPlus to generate C# code for me, and I paste it
into a fresh C# console app project in VS 2010. The nicest thing about DirectShow in C# is that
I don't need to bother with having different SDKs and headers installed;
I only need to add a reference to DirectShowLib.
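For reference, the code below assumes the usual usings at the top of the file:

using System;
using System.Runtime.InteropServices;
using DirectShowLib;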
Now, the changes I need to make to the code are pretty simple. First, there is a lot of code
in the BuildGraph function initializing the media type for the video stream format to be passed to
SetFormat; not all of those details are required, the most important ones being the major
type, subtype and resolution. Some fields like dwBitRate and AvgTimePerFrame can be skipped.
Also, I don't really need to manually create and connect the rendering part of the graph,
in this case the AVI Decompressor (which performs color space conversion) and the Video Renderer.
A call to RenderStream with nulls in the last two arguments is enough for the graph builder to
create the rendering part automatically. Finally, I need to tell the Sample Grabber to call
my callback method for each video frame passing by; in that callback I will change the
video data, performing the video effect I need. Here's the full graph building code after
the changes:
static void BuildGraph(IGraphBuilder pGraph)
{
    int hr = 0;

    //graph builder
    ICaptureGraphBuilder2 pBuilder = (ICaptureGraphBuilder2)new CaptureGraphBuilder2();
    hr = pBuilder.SetFiltergraph(pGraph);
    checkHR(hr, "Can't SetFiltergraph");

    Guid CLSID_VideoCaptureSources = new Guid("{860BB310-5D01-11D0-BD3B-00A0C911CE86}");
    Guid CLSID_SampleGrabber = new Guid("{C1F400A0-3F08-11D3-9F0B-006008039E37}"); //qedit.dll

    //add USB2.0 Camera
    IBaseFilter pUSB20Camera = CreateFilterByName(@"USB2.0 Camera", CLSID_VideoCaptureSources);
    hr = pGraph.AddFilter(pUSB20Camera, "USB2.0 Camera");
    checkHR(hr, "Can't add USB2.0 Camera to graph");

    //add SampleGrabber
    IBaseFilter pSampleGrabber = (IBaseFilter)Activator.CreateInstance(Type.GetTypeFromCLSID(CLSID_SampleGrabber));
    hr = pGraph.AddFilter(pSampleGrabber, "SampleGrabber");
    checkHR(hr, "Can't add SampleGrabber to graph");

    //set callback: 0 means SampleCB will be called for each sample
    hr = ((ISampleGrabber)pSampleGrabber).SetCallback(new SampleGrabberCallback(), 0);
    checkHR(hr, "Can't set callback.");

    //set the camera's output format: YUY2 640x480, FORMAT_VideoInfo
    AMMediaType pmt = new AMMediaType();
    pmt.majorType = MediaType.Video;
    pmt.subType = MediaSubType.YUY2;
    pmt.formatType = FormatType.VideoInfo;
    pmt.fixedSizeSamples = true;
    pmt.formatSize = 88; //size of VIDEOINFOHEADER
    pmt.sampleSize = 614400; //640 * 480 * 2 bytes per pixel
    pmt.temporalCompression = false;
    VideoInfoHeader format = new VideoInfoHeader();
    format.SrcRect = new DsRect();
    format.TargetRect = new DsRect();
    format.BmiHeader = new BitmapInfoHeader();
    format.BmiHeader.Size = 40;
    format.BmiHeader.Width = 640;
    format.BmiHeader.Height = 480;
    format.BmiHeader.Planes = 1;
    format.BmiHeader.BitCount = 16;
    format.BmiHeader.Compression = 844715353; //FourCC 'YUY2'
    format.BmiHeader.ImageSize = 614400;
    pmt.formatPtr = Marshal.AllocCoTaskMem(Marshal.SizeOf(format));
    Marshal.StructureToPtr(format, pmt.formatPtr, false);
    hr = ((IAMStreamConfig)GetPin(pUSB20Camera, "Capture")).SetFormat(pmt);
    DsUtils.FreeAMMediaType(pmt);
    checkHR(hr, "Can't set format");

    //connect USB2.0 Camera and SampleGrabber
    hr = pGraph.ConnectDirect(GetPin(pUSB20Camera, "Capture"), GetPin(pSampleGrabber, "Input"), null);
    checkHR(hr, "Can't connect USB2.0 Camera and SampleGrabber");

    //render the video: the graph builder adds the decoder and renderer automatically
    hr = pBuilder.RenderStream(null, null, pSampleGrabber, null, null);
    checkHR(hr, "Can't render video from grabber");
}
I call SetCallback on the sample grabber and provide two things: an object of a class
implementing the
ISampleGrabberCB
interface, and 0, which tells the sample grabber which method
of ISampleGrabberCB to call. So of the two methods of that interface, BufferCB and SampleCB,
one of them (BufferCB) will never be called, and the other, SampleCB, is the place where all the magic
happens. Each time a video frame passes through the sample grabber, it will call my SampleCB
method and provide the video sample as an IMediaSample value, which I can query for the data length
and a pointer to the actual data. So here's the full code of my callback class:
class SampleGrabberCallback : ISampleGrabberCB
{
    //not used: we passed 0 to SetCallback, so BufferCB is never called
    public int BufferCB(double SampleTime, IntPtr pBuffer, int BufferLen)
    {
        return 0;
    }

    //called for every video frame passing through the sample grabber
    public int SampleCB(double SampleTime, IMediaSample pSample)
    {
        if (pSample == null) return -1;
        int len = pSample.GetActualDataLength();
        IntPtr pbuf;
        if (pSample.GetPointer(out pbuf) == 0 && len > 0)
        {
            //copy the frame to a managed array, invert every luma byte
            //(even offsets in YUY2), then copy the data back
            byte[] buf = new byte[len];
            Marshal.Copy(pbuf, buf, 0, len);
            for (int i = 0; i < len; i += 2)
                buf[i] = (byte)(255 - buf[i]);
            Marshal.Copy(buf, 0, pbuf, len);
        }
        Marshal.ReleaseComObject(pSample);
        return 0;
    }
}
C# will take care of all the COM stuff, except for one thing: the actual video data is not in the
managed heap, so we cannot access it directly without going unsafe. We either use an unsafe
block and call IntPtr.ToPointer to work with the video data directly, or we use
Marshal.Copy to copy the data to an array in the managed heap, modify the data in this
array and then copy it back. The latter approach doesn't require an unsafe block, and it is
what I did here.
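For comparison, here's a sketch of the unsafe variant of the same loop, which modifies the buffer in place without the two copies (hypothetical code, not part of this project, and it requires the "Allow unsafe code" project option):

unsafe
{
    byte* p = (byte*)pbuf.ToPointer();
    for (int i = 0; i < len; i += 2) //even bytes are luma in YUY2
        p[i] = (byte)(255 - p[i]);
}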
My video effect is simple: I want to invert each pixel's intensity without changing its
color. Since the data in my case arrives in YUY2 format, where bytes are laid out as
Y0 U Y1 V, I know that every even byte is some pixel's luma, so I just subtract it from 255
to invert it. The chroma bytes remain intact, keeping the colors.
This is it. I compile and run the program and see it work as expected:
The rest of the code (creating the filters and the main loop) was generated by GraphEditPlus
and remained unchanged. The whole development took just a few minutes; describing the
solution in this post takes a lot more time than producing the solution itself.
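In case you're curious, that generated boilerplate is roughly like the sketch below (simplified, not the exact generated code): a checkHR helper that turns failed HRESULTs into exceptions, and a main loop that runs the graph and polls for events until a key is pressed. The GetPin and CreateFilterByName helpers simply enumerate pins and filters by name.

static void checkHR(int hr, string msg)
{
    if (hr < 0)
    {
        Console.WriteLine(msg);
        DsError.ThrowExceptionForHR(hr);
    }
}

static void Main(string[] args)
{
    IGraphBuilder graph = (IGraphBuilder)new FilterGraph();
    Console.WriteLine("Building graph...");
    BuildGraph(graph);
    Console.WriteLine("Running...");
    IMediaControl mediaControl = (IMediaControl)graph;
    IMediaEvent mediaEvent = (IMediaEvent)graph;
    int hr = mediaControl.Run();
    checkHR(hr, "Can't run the graph");
    bool stop = false;
    while (!stop)
    {
        System.Threading.Thread.Sleep(500);
        Console.Write(".");
        EventCode ev;
        IntPtr p1, p2;
        if (mediaEvent.GetEvent(out ev, out p1, out p2, 0) == 0)
        {
            if (ev == EventCode.Complete || ev == EventCode.UserAbort)
                stop = true;
            mediaEvent.FreeEventParams(ev, p1, p2);
        }
        if (Console.KeyAvailable) //press any key to stop
            stop = true;
    }
    mediaControl.Stop();
}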
tags: directshow