Wednesday, January 19, 2011

NASA Conference on Intelligent Data Understanding

In October last year I had the privilege of attending the NASA Conference on Intelligent Data Understanding (CIDU), hosted by the Intelligent Systems Division at Ames Research Center in Mountain View, CA.

CIDU focuses on applying data mining, knowledge discovery and machine learning techniques to a number of NASA-relevant domains, including aviation, earth sciences and astronomy. These core areas of NASA's mission share a common problem: an exponential increase in the data being generated, whether that comes from the increased fidelity and resolution of next-generation telescopes such as the James Webb Space Telescope, or from high-resolution satellite data for near-real-time mapping of the Earth's surface.

The main focus of the conference was algorithm development: both improving the accuracy and computational performance of existing algorithms, and developing new algorithms to solve specific classes of problem. Some of the applications were, for me, quite spectacular. In particular, I was struck by the attempts to carry out real-time classification of astronomical events from data streaming in from digital sky surveys such as SDSS.

George Djorgovski, Co-Director for Advanced Computing Research at Caltech, gave a lecture on applying a number of advanced data mining techniques and algorithms to identify events such as supernovae. The really impressive aspect was the sheer scale of the data being processed, in the petabytes, and the fact that the goal was to be able to react to these events in real time, directing the right telescopes and observation resources to investigate and gather detailed data.

Closer to my profession (and the reason I was there) were techniques to help drive diagnostics and prognostics in the aviation domain. Honeywell gave an interesting presentation on their collaboration with NASA on a next-generation Vehicle Level Reasoning System. GE presented their research on prognostics and anomaly detection for jet engines. The key difference between the Earth and space science fields and systems engineering is that, in general, systems engineering problems tend to require combined (fused) data-driven and model-based (physics) approaches, rather than data mining on its own.

I guess to the casual observer these computing techniques may appear esoteric and only really relevant to high-end science and complex engineering systems, but I believe the exponential growth of data in the business and consumer spaces will require these same techniques if we are to have any chance of "making sense" of it all. This is very much IBM's current viewpoint with their Smarter Planet theme.

The really exciting part of this conference, though, was the knowledge that, as esoteric as they may seem, these computational capabilities are available to everyone as open source through Apache Mahout. Mahout provides many of the core algorithms that the researchers presenting at CIDU were improving and/or extending. With Mahout sitting on top of the Hadoop platform, and Hadoop able to run on Amazon EC2, everyone has (reasonable) access to their very own NASA Ames supercomputing facility!

All in all, a very fruitful event. Here's hoping I get to attend next year!