Big Data has been getting lots of attention recently, and for good reason – the capability to store and analyze large data sets presents opportunities we have never had before.
I have used ad hoc tools to analyze “Medium Data” for instruction set optimization, company-wide debug tool usage, and random failures in a compute farm. It is easy to see the potential for larger data sets and more powerful analytic tools.
In the RTL verification space, the largest data sets are simulation results and coverage, especially at the SoC level. But having a lot of data does not automatically solve problems – the data has to be structured appropriately and contain enough detail to enable analysis.
Much of the simulation and coverage data today is scattered across different files in different formats, and important relationships in the data aren’t always recorded. When the data is brought together in a common database, standard data mining approaches may not be enough – we will likely need to incorporate specific domain knowledge.
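To make the idea of a common database concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, test names, and coverage point names are all hypothetical, chosen only for illustration; a real flow would populate the tables from simulator logs and coverage databases. The point is that keeping the relationship between a run and the coverage it produced makes cross-cutting questions easy to ask.

```python
import sqlite3

# In-memory database standing in for a shared results store (hypothetical schema).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE runs (
        run_id INTEGER PRIMARY KEY,
        test   TEXT,
        seed   INTEGER,
        passed INTEGER              -- 1 = pass, 0 = fail
    );
    CREATE TABLE coverage_hits (
        run_id INTEGER REFERENCES runs(run_id),
        point  TEXT,                -- coverage point name
        hits   INTEGER
    );
""")

# Made-up example rows: three simulation runs and the coverage points they hit.
con.executemany("INSERT INTO runs VALUES (?, ?, ?, ?)", [
    (1, "dma_basic", 11, 1),
    (2, "dma_basic", 12, 0),
    (3, "cache_stress", 7, 1),
])
con.executemany("INSERT INTO coverage_hits VALUES (?, ?, ?)", [
    (1, "dma.fifo_full", 4),
    (2, "dma.fifo_full", 9),
    (2, "cache.evict_dirty", 1),
    (3, "cache.evict_dirty", 6),
])

# Because each hit is tied to its run, we can ask which coverage points
# were exercised by failing runs.
failing_points = con.execute("""
    SELECT DISTINCT c.point
    FROM coverage_hits AS c
    JOIN runs AS r ON r.run_id = c.run_id
    WHERE r.passed = 0
    ORDER BY c.point
""").fetchall()

print(failing_points)
```

If results and coverage lived in separate files with no shared run identifier, this join would not be possible without reconstructing the relationship by hand.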
In a previous post, A Simple Approach to System Coverage, I made the case that a relatively small number of important coverage points identified at the block level are the basis of the most sensible form of system-level coverage. This “Small Data” could turn out to be an important part of the domain knowledge needed to analyze system-level Big Data.
University research groups such as Dr. Li-C Wang’s team at UCSB have made some progress applying data mining and machine learning to simulations, identifying stimulus patterns leading to targeted events for specific designs (described in this paper from 2014 DAC).
In general, this appears to be a hard problem, requiring domain knowledge to be incorporated into a kernel to guide the learning algorithms. From Dr. Wang’s paper:
“Our experiences show that the challenges in practical implementation are often related to the kernel or feature development, while choosing an existing learning algorithm to apply is relatively easy.”
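For readers unfamiliar with the term, a kernel here is a function that scores how similar two samples are, and it is where domain knowledge enters. The sketch below is a generic n-gram ("spectrum") kernel over sequences of operations; it is not the kernel from Dr. Wang's paper, and the operation names and traces are invented for illustration. The domain-specific choices are what counts as a token and how long a run of tokens matters.

```python
from collections import Counter


def ngram_counts(ops, n=2):
    """Count each run of n consecutive operations in one test's stimulus."""
    return Counter(tuple(ops[i:i + n]) for i in range(len(ops) - n + 1))


def spectrum_kernel(ops_a, ops_b, n=2):
    """Similarity of two stimulus sequences: dot product of their n-gram counts."""
    counts_a = ngram_counts(ops_a, n)
    counts_b = ngram_counts(ops_b, n)
    # A Counter returns 0 for n-grams it has not seen, so unmatched terms drop out.
    return sum(count * counts_b[gram] for gram, count in counts_a.items())


# Hypothetical bus-operation traces from three tests.
test_1 = ["WR", "RD", "WR", "RD", "FLUSH"]
test_2 = ["WR", "RD", "FLUSH", "RD"]
test_3 = ["NOP", "NOP", "RD", "RD"]

sim_12 = spectrum_kernel(test_1, test_2)  # shares WR->RD and RD->FLUSH
sim_13 = spectrum_kernel(test_1, test_3)  # shares no two-operation runs

print(sim_12, sim_13)
```

A learning algorithm that accepts a similarity function could use one like this unchanged, which is consistent with the observation that the hard part is designing the kernel or features rather than picking the algorithm.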
As the use of Big Data for verification evolves from research to practice, today’s system-level coverage “Small Data” may prove to be a key to making use of Big Data later.
© Ken Albin and System Semantics, 2014. All Rights Reserved.