DARPA nails cash to project 'FENCE' — a smart camera that only sends pics when pixels change

The USA’s Defense Advanced Research Projects Agency (DARPA) has announced it will fund development of a new type of “event-based” camera that only transmits information about pixels that have changed. The Agency announced last week that Raytheon, BAE Systems and Northrop Grumman will develop the new snapper under the …

  1. Anonymous Coward
    Anonymous Coward

    Why does this system seem like a variation of MPEG video compression, minus regular keyframes?

    As I understand MPEG, it first sends a complete picture (keyframe), then for subsequent frames it only sends what has changed between frames, until something (time-based, or the amount of difference) triggers a new keyframe.
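The scheme described above could be sketched like this (a toy illustration only; the keyframe interval, the list-of-pixels frame format, and all names are invented for the example, not anything from MPEG itself or from DARPA's programme):

```python
# Toy keyframe + delta encoder: every KEYFRAME_INTERVAL frames a complete
# frame is sent; in between, only (index, new_value) pairs for changed pixels.

KEYFRAME_INTERVAL = 5  # illustrative choice

def encode(frames):
    messages = []
    prev = None
    for i, frame in enumerate(frames):
        if prev is None or i % KEYFRAME_INTERVAL == 0:
            messages.append(("key", list(frame)))       # full picture
        else:
            deltas = [(j, v) for j, (u, v) in enumerate(zip(prev, frame))
                      if u != v]
            messages.append(("delta", deltas))          # changes only
        prev = frame
    return messages

def decode(messages):
    frames = []
    current = None
    for kind, payload in messages:
        if kind == "key":
            current = list(payload)
        else:
            current = list(current)
            for j, v in payload:
                current[j] = v
        frames.append(list(current))
    return frames
```

Note that, exactly as Neil points out below the original comment, the decoder state is rebuilt from deltas, so a single corrupted delta would poison everything until the next keyframe.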

    1. Neil Barnes Silver badge

      It does indeed sound like normal compressed video - but that uses key frames where the whole image is sent every few frames, with frames between them showing the difference as things change. Plus lots of other goodies as well, but that's the basis.

      This FENCE will have to resolve the same issue: if it gets one bit wrong in its transmission, everything afterwards is confused... so they'll have to send some sort of key frame to cope with inevitable disconnection issues, and key frames hold a shedload more data than difference frames. So it's going to be a juggling act, I feel.

      Perhaps they've got a perfect transmission path, guaranteed?

      1. Def Silver badge

        But video encoders only encode what you give them.

        All this camera needs to do is keep the previous raw frame in memory, and compare the current frame to that. If there are no differences (or very subtle differences), don't forward it to the video encoder. The result will be a variable frame rate (read: super low frame rate) video stream.

        With a simple way to turn this mode on or off, this would take me a few hours to write, test, and ship.
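The frame gate Def describes can indeed be sketched in a few lines (a toy version: the thresholds and the flat-list frame format are illustrative, and a real implementation would feed the surviving frames to a video encoder):

```python
# Hold the previous raw frame; forward the current one only if enough
# pixels differ. The result is a variable (read: very low) frame rate.

DIFF_THRESHOLD = 10   # per-pixel difference that counts as a change
MIN_CHANGED = 3       # minimum changed pixels before we bother the encoder

def changed_enough(prev, curr):
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > DIFF_THRESHOLD)
    return changed >= MIN_CHANGED

def gate(frames):
    """Yield only the frames worth encoding; the rest are dropped."""
    prev = None
    for frame in frames:
        if prev is None or changed_enough(prev, frame):
            yield frame
            prev = frame  # update the reference only when a frame is forwarded
```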

        1. Fruit and Nutcase Silver badge

          this would take me a few hours to write, test, and ship.

          Yes but that would not feed $$$ to the hungry military mega-corps.

          1. Yet Another Anonymous coward Silver badge

            This is different though: it describes the content of each transmitted pixel in a separate XML file, which also lists all the security classifications

            For backwards compatibility the XML file is also in EBCDIC

        2. Cynic_999 Silver badge

          The term for what you have just described is "motion detection." You can already buy cheap cameras that have this facility, so no need to write your own. It is not, however, what is described in the article.

          1. Anonymous Coward
            Anonymous Coward

            "you have just described is motion detection."

            Actually, it kind of is - it's the virtual counterpart: "scene detection". Set the matrix to a very low tolerance, then every time a new reference I-frame is created... send said I-frame :-/. You should see the savings you get with anime content... it's nuts.

            I'm not sure how the camera computes what is different, but absolutely everything this Reg article describes has been done for 25+ years digitally (who knows how long in analog). While not all analog ATEM devices have a digital version, they've certainly had a mark button for scene detection for 40+ years.

            Maybe this is one of those "$1200 hammer" government contracts. I wouldn't be surprised if the source code for this project is 99% FFmpeg.

            1. Mage Silver badge

              Dates from analogue era

              The analogue ones in the late 1970s didn't work well with Image Intensifiers as they are noisy, so cameras with motion detection in the dark used a pair of IR flood lamps. Actually a pair of 200W heat lamps with black filters. Then later CCD replaced tube cameras and IR LEDs replaced the filtered heat lamps.

              There must be more to this than suggested in the article. But the word "smart" on a product or project is meaningless.

            2. Def Silver badge

              I'm pretty sure the $1200 hammer contracts are a myth used to cover up funding for black projects. I.e., the hammer cost $100 and $1100 mysteriously ended up elsewhere.

              1. Anonymous Coward
                Anonymous Coward

                The "$1200 hammer" happens all the time, but to cover up mistakes. i.e. XYZ was supposed to be performed, but someone didn't do XYZ (lazy / vacation / didn't feel like it), so they bought a $1200 hammer and said... "Well, we're out of money to do XYZ".

                This.... is.... real.

        3. Anonymous Coward
          Anonymous Coward

          Event based cameras don't simply compare frames for differences - the data change is detected at the pixel level and the output can be per-pixel asynchronous - it's wrong to think in terms of frames at all.

          An advantage of this is that a system using such a camera can have very fast reactions, comparatively speaking, since it doesn't have to wait 1/60th of a second or whatever for a whole load of data (or less data, after a whole load of data has been compared, as is proposed above). Rather, any change in the image is relayed back as soon as detected. This reaction speed may be why DARPA are interested.
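The per-pixel, frameless behaviour described above can be modelled roughly like this (a simplified model of how published event-camera sensors such as DVS chips are usually described, not DARPA's design; the contrast threshold and tuple layout are illustrative):

```python
# Each pixel independently emits (timestamp, polarity) whenever its
# log-intensity moves by more than a contrast threshold -- no frames at all,
# and events appear as soon as the change happens, not at a frame boundary.

import math

CONTRAST_THRESHOLD = 0.2  # illustrative log-intensity step

def pixel_events(samples, threshold=CONTRAST_THRESHOLD):
    """samples: time-ordered (timestamp, intensity) readings for ONE pixel.
    Returns events as (timestamp, polarity), polarity being +1 or -1."""
    events = []
    ref = math.log(samples[0][1])          # reference level at last event
    for t, intensity in samples[1:]:
        level = math.log(intensity)
        while level - ref >= threshold:    # brightness increased a step
            ref += threshold
            events.append((t, +1))
        while ref - level >= threshold:    # brightness decreased a step
            ref -= threshold
            events.append((t, -1))
    return events
```

A static pixel produces nothing at all, which is the bandwidth saving; a fast-moving edge produces a burst of precisely timestamped events, which is the latency advantage.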

          1. Anonymous Coward
            Anonymous Coward

            "..since it doesn't have to wait 1/60th"

            But no modern sensor does.

            1/60, 1/120, 1/128,000 etc. is only for output/creation compliance; it's not related to the limits of the sensor directly. If only computation is desired, then the limit is the CPU driving it or the sensor itself, as seen with how electronic viewfinders and "pixel peeping" work. *IF* these cameras detect when light changes without polling of some kind, then they'd have to be more analog than digital, which I'm not sure is the case (but that is obviously possible).

            "Event based cameras don't simply compare frames for differences - the data change is detected at the pixel level and the output can be per-pixel asynchronous"

            That's simply how any digital camera works, in fact that might describe exactly how the chemical process in film works (not sure though... has to be close to it). CCTV sensors mix it up a bit by having a co-processor due to the low light requirements (like Sony "Starvis" or whatever), but they just do the same thing in their own routine under their own direction.

            The real problem is all the questions on how this DARPA camera is different, while simultaneously being "classified".

      2. katrinab Silver badge

        Sure, but one thing you are going to see a lot of is changes in daylight/temperature, due to either a change in the time of day, or a change in the weather. They don't care about that sort of change, and don't want the camera to tell them about it.

        Then, if it starts snowing / raining, that would change the actual scene, but again it is probably not something they want to be told about.

        Or if the wind blows, and there are trees in the scene, they will move, and they probably don't want to know about that.

        Or if there are non-human animals in the vicinity, they probably don't care about what they are up to.
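One simple way to suppress the scene-wide changes katrinab lists (my own assumption about an approach, not anything DARPA has described) is to estimate the global brightness shift between frames, remove it, and only report pixels that still differ afterwards:

```python
# Daylight or weather changes move (almost) every pixel by a similar amount;
# a person moves only a few pixels by a large amount. Taking the median
# difference as the "global" shift separates the two cases.

import statistics

def interesting_pixels(prev, curr, threshold=10):   # threshold illustrative
    diffs = [b - a for a, b in zip(prev, curr)]
    global_shift = statistics.median(diffs)         # ambient change estimate
    return [i for i, d in enumerate(diffs) if abs(d - global_shift) > threshold]
```

This catches the time-of-day case; rain, snow and waving trees are harder, because they are genuinely local changes, which is presumably where the machine-learning part comes in.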

  2. FF22

    Differential compression....

    .. is the term DARPA doesn't seem to know, and wants to reinvent, despite it being about half a century old.

    I myself have written remote control software for slow modems (1200-9600 baud) that used it, and only sent those regions of the screen over the cable that had actually changed, reducing typical bandwidth usage by >95%.
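The changed-region scheme described above is usually done with tiles (this is a generic sketch of the technique, not FF22's actual code; the tile size and flat row-major screen layout are illustrative):

```python
# Split the screen into TILE x TILE tiles and report only the tiles whose
# contents differ from the last transmitted copy; everything else stays quiet.

TILE = 8  # tile width/height in pixels, illustrative

def changed_tiles(prev, curr, width, height):
    """prev/curr: flat row-major pixel lists. Returns (tx, ty) coordinates
    of tiles that need retransmitting."""
    dirty = []
    for ty in range(0, height, TILE):
        for tx in range(0, width, TILE):
            for y in range(ty, min(ty + TILE, height)):
                row = y * width
                if prev[row + tx:row + min(tx + TILE, width)] != \
                   curr[row + tx:row + min(tx + TILE, width)]:
                    dirty.append((tx, ty))
                    break  # one differing row is enough to mark the tile
    return dirty
```

On a mostly static desktop almost every tile is clean, which is where the >95% saving over a dumb full-screen refresh comes from.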

    1. Anonymous Coward
      Anonymous Coward

      Re: Differential compression....

      Clearly you need millions in research cash to reinvent the wheel. Like you I also came up with the same obvious idea in the 1990s when trying to write Remote Access Tools for use over 56k modems.

      Just goes to show how bad their cameras must have been before this.

    2. Julz Silver badge

      Re: Differential compression....

      Came here to say something very similar. The compression in Sun Microsystems' Sun Ray thin clients took this approach, as I guess did many other implementations, as you mention. It seems that reinventing the wheel can be lucrative though.

      Edit: having looked up event cameras, there is a twist. They are good at spotting fast-moving things, which would seem to be useful in a military situation. From Wikipedia:

      "Image reconstruction from events has the potential to create images and video with high dynamic range, high temporal resolution and minimal motion blur. Image reconstruction can be achieved using temporal smoothing, e.g. high-pass or complementary filter. Alternative methods include optimization and gradient estimation followed by Poisson integration."

    3. Anonymous South African Coward Silver badge

      Re: Differential compression....

      Which begs the question: does remote control software (such as UltraVNC/Remote Desktop/TeamViewer) use differential compression in order to reduce load on the link?

  3. Gene Cash Silver badge

    "Open sourced"

    > The open-sourcing excludes the program’s secure architectures.

    So like open-sourcing the Linux kernel, except you just leave out the low-level parts. What's the point, besides buzzword bingo?

  4. Pascal Monett Silver badge

    "detected by the thermal detector [..] and machine learning algorithms"

    I have a problem understanding that. Does that mean that the camera has a statistical analysis machine sitting behind it, judging what has changed and what to send?

    Or is it that they're going to ML the thing thoroughly and put the resulting code in the camera's software? That sounds more likely.

    Oh, and I like the video that starts with the mention that it is comparing actual "normal" camera output with a simulation of what a "neuromorphic" camera would produce (because anything high-tech these days is either quantum or neuro-something, obviously). In other words, their fancy video is just a pie-in-the-sky, we-have-no-proof PR puff piece.

  5. elsergiovolador Silver badge

    Low hanging fruit

    Seems like they picked all the low-hanging fruit and are now beating around the bush. Slap some trending keywords on the proposal and let that funding flow in.

  6. Anonymous South African Coward Silver badge

    SSITH... FETT... can we start with the Star Wars jokes then?

    Although SSITH tends to remind me of Slithe from ThunderCats.

  7. Michael H.F. Wilkinson

    Potential problems: trees, leaves, wind and sunshine (or clouds)

    This bears a striking similarity to a traffic camera system produced by a local IT company I visited a decade or so ago at least. The aim was to let traffic cameras only record the passing cars, and stay shtum when there was no traffic. So a simple image differencing method was implemented, and if the sum of absolute differences between consecutive frames was sufficient, the system started sending a burst of frames, until the situation became static again.
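The triggering scheme described above amounts to this (a toy reconstruction from the description, with illustrative thresholds, not the company's actual code):

```python
# Sum of absolute differences (SAD) between consecutive frames; while the
# SAD stays above the threshold, frames are transmitted in a burst, and the
# camera goes quiet again once the scene is static.

SAD_THRESHOLD = 50  # illustrative

def recorded_frames(frames):
    """Return the indices of frames the camera would actually transmit."""
    sent, recording, prev = [], False, None
    for i, frame in enumerate(frames):
        if prev is not None:
            sad = sum(abs(a - b) for a, b in zip(prev, frame))
            recording = sad > SAD_THRESHOLD
        if recording:
            sent.append(i)
        prev = frame
    return sent
```

The failure mode Michael describes falls straight out of this: dappled sunlight or rushing clouds keep the SAD above threshold all day, so the camera never shuts up.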

    Some traffic cameras happily transmitted continuously from sunrise to sunset, especially on windy, sunny days, when the pattern of the shadow and light caused by the sun shining through the trees along the road caused loads of pixels to change, without any actual vehicle or person passing by the camera. Rushing clouds could trigger similar problems, as could snowflakes, hail, or rain. In the end they had to do far more advanced object recognition, in particular recognizing license plates, to make the system robust (pedestrians and cyclists were not of interest in this system).

    No doubt the boffins and DARPA will have thought of this

  8. Cynic_999 Silver badge

    Why "AI" is required

    While differential video is used in MPEG encoding and is nothing new, the downside of a simple pixel comparison algorithm is that it has poor compression on scenes with objects such as trees moving in the wind, clouds moving across the sky, etc. A complex "AI" algorithm is required in order to differentiate between changes that are not significant and changes that are.

    An interesting fact is that the image that we "see" is not the real-time image that enters our eyeballs by a long chalk. The signal from our retina takes a significant fraction of a second to travel along the optic nerve to the brain, and if this were fed to the processing part of our brain directly, it would arrive too late to enable us to e.g. catch a ball. So instead the real-time video signal is passed through a portion of our brain that acts as an optical pre-processor which extrapolates what the image is likely to be in a few hundred ms' time, and that is the made-up image that we think we are seeing. This organic optical processor also fills in any missing pixels (e.g. in the area where the optic nerve enters the retina) based on the patterns that surround the missing area and which were seen in previous "frames", and "corrects" images that seem to be inconsistent. It also creates an imaginary image to bridge the periodic complete loss of video that occurs every time we blink. There are many optical illusions that clearly show how our brain's optical processor can be fooled.

    If the real image entering the eyeball subsequently proves to be different to the image our optical processor had extrapolated (i.e. events did not unfold as expected), then our *memory* of the incorrect image is *deleted* and the real image that was eventually received is substituted. This is why things can "suddenly appear out of nowhere."

    Just like the reflex reaction that causes us to pull away from the source of pain even before the pain registers in our brain, a reflex reaction will cause us to close our eyes if an object is heading towards our eyes even before the image of that object has reached our brain. This is because our nerves do not act only as simple wires that carry signals from our sensory organs to our brain, but nerves also have limited processing powers that can send commands to our muscles that bypass the brain completely if they detect an "emergency" situation.

    We have a *very* long way to go before technology becomes as sophisticated as the human body.

  9. Nick Ryan Silver badge

    Hmmm... I read it that the aim is to produce a camera (sensor) that only sends the changed information, rather than a normal camera sensor followed by lots of processing before the data is sent. The latter would not be low power, although it would be a suitable way to prototype the algorithm.

    As noted already above, slow changes would have to be filtered out somehow, which means that individual light sensors would likely have to communicate directly with, or be compared against, their neighbours, and only send an update if a light reading had moved beyond a certain threshold.
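The per-sensor update rule suggested above could look like this (a minimal sketch of the thresholding part only, leaving out the neighbour comparison; the class name and threshold are invented for the example):

```python
# Each pixel remembers the last value it reported and only sends a new
# reading once the input has drifted past a threshold from that value, so
# sub-threshold flicker generates no traffic at all.

THRESHOLD = 15  # illustrative

class PixelSensor:
    def __init__(self, initial):
        self.reported = initial

    def sample(self, value):
        """Return the new value if it is worth reporting, else None."""
        if abs(value - self.reported) > THRESHOLD:
            self.reported = value
            return value
        return None
```

Note this still eventually reports a slow drift once it accumulates past the threshold, which is why some neighbour or time-based filtering would also be needed.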

  10. HildyJ Silver badge


    Despite the prior comments, this doesn't sound like normal compression.

    As I read it, they are looking for the camera not only to detect (and "compress" out) changes in the sequence of images, but also to classify those changes as significant or insignificant and ignore the insignificant pixels. An example might be a moving leaf. One might be insignificant, a large group might be insignificant, but a man-sized group might be very significant. Hence the 'neuro'.
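The size test in HildyJ's leaf example could be sketched as a connected-component filter (a classical technique offered as an illustration; the size band and coordinate scheme are invented, and it is of course far cruder than whatever the 'neuro' part would actually do):

```python
# Group changed pixels into 4-connected blobs and keep only blobs whose
# pixel count falls inside a "person-sized" band: one leaf is too small,
# a whole swaying tree is too big.

def significant_blobs(changed, min_size=4, max_size=50):
    """changed: set of (x, y) changed pixels. Returns blobs worth reporting."""
    seen, blobs = set(), []
    for start in changed:
        if start in seen:
            continue
        blob, stack = [], [start]
        seen.add(start)
        while stack:                       # flood fill one blob
            x, y = stack.pop()
            blob.append((x, y))
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in changed and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if min_size <= len(blob) <= max_size:
            blobs.append(blob)
    return blobs
```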

    I have my doubts that a camera can be trained to do this and, like facial recognition, a lot will depend on the training set. But if DARPA can pull it off, it would be a major development in military hardware.

  11. Kinetic

    Not your regular compression

    Yes, small amounts of data are good, but here the main aim seems to be low power. Transmission over long distances is likely to use a lot of power, so minimising that is a good idea. However, doing lots of processing also consumes power - hence creating a sensor that only gives you the changes in the first place. Low-light noise and vibration probably complicate this.

    One presumes that they have a clever low power way to achieve all that.

  12. David Pearce


    The reason that MPEG works with describing block motion is that in the real world of shaky cameras and turbulence the entire image shifts a few pixels in some random direction.

    Encoding by pixel change only works if everything is VERY steady
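The point can be illustrated with a tiny motion search (a generic sketch of the block-matching idea MPEG-style encoders use, with an invented frame layout and a deliberately small search range):

```python
# Find the (dx, dy) shift that best aligns the new frame with the old one.
# A camera-shake shift of a few pixels then costs a two-number motion vector,
# where a pixel-change-only encoder would see almost every pixel as "changed".

def best_shift(prev, curr, width, height, search=2):
    """prev/curr: flat row-major pixel lists. Returns the (dx, dy) minimising
    the sum of absolute differences over the overlapping region."""
    best, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            sad = 0
            for y in range(max(0, dy), min(height, height + dy)):
                for x in range(max(0, dx), min(width, width + dx)):
                    sad += abs(curr[y * width + x] -
                               prev[(y - dy) * width + (x - dx)])
            if best_sad is None or sad < best_sad:
                best, best_sad = (dx, dy), sad
    return best
```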
