Hello everyone!
I am Niraj, and I will be sharing my code contributions from the tenth week of GSoC.
Background
One of our users mentioned that we weren't showing the file path where a CVE was found in the output report; we were only logging it to the console. So my peer Harmandeep Singh implemented a way to store paths with the CVE data and write them to the output file. Unfortunately, this broke CVEScanner whenever we used the --input-file flag to scan CVEs from a CSV or JSON file.
When I started digging into it, I also found several issues with the current data structure for all_cve_data, which are as follows:
- The old CVEData was a NamedTuple, and since the newly added paths attribute was mutable, it could create hard-to-find bugs.
- To update paths, we needed to scan all_cve_data to find the product whose paths we wanted to append to. Doing that for every detected product takes O(n²) time, which can be reduced to O(n) with a better structure.
- Passing vendor, product and version separately to different functions decreased readability. A ProductInfo data structure would be a nice way to pack this data together, since we never need these values alone.
- TriageData wasn't in sync with the old CVEData, so csv2cve and input_engine were breaking.
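To illustrate the first issue, here is a minimal sketch of why a NamedTuple holding a mutable field is risky. The class name OldCVEData and its fields are hypothetical stand-ins, not the project's actual definition. The tuple refuses attribute reassignment, yet its list can still be changed in place, and the instance can no longer be hashed, so it cannot be stored in a set.

```python
from typing import List, NamedTuple, Tuple


class OldCVEData(NamedTuple):
    # Hypothetical stand-in for a NamedTuple-based CVEData with a paths field
    vendor: str
    product: str
    version: str
    cves: Tuple[str, ...]
    paths: List[str]  # mutable field inside an "immutable" tuple


def demo():
    # Placeholder values, not real products or CVE IDs
    record = OldCVEData("example_vendor", "example_product", "1.0", ("CVE-XXXX-0001",), ["/a"])

    # Reassigning a field is rejected...
    try:
        record.paths = ["/b"]
        reassign_blocked = False
    except AttributeError:
        reassign_blocked = True

    # ...but mutating the list in place silently succeeds
    record.paths.append("/b")

    # Hashing a tuple hashes every field, and lists are unhashable
    try:
        hash(record)
        hashable = True
    except TypeError:
        hashable = False

    return reassign_blocked, record.paths, hashable


print(demo())  # (True, ['/a', '/b'], False)
```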
What did I do this week?
I experimented with various data structures to find one that handles all of the issues mentioned above efficiently. In the past, all_cve_data was a Set[CVEData], which was sufficient at the time because all attributes of CVEData were immutable and we were only using the set to remove duplicates from the final output.
But when we introduced the paths attribute, we needed to update paths every time we detected the same product in a different file, and a set has no easy way to get a stored value back other than looping over the whole set to find what we are looking for.
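A rough sketch of that set-based update, assuming a simplified hashable record (Entry and add_path are hypothetical names, not the project's code): because the set only answers membership queries, finding one product means looping over the entries, and because the record is immutable, it has to be removed and re-added with the new paths.

```python
from typing import FrozenSet, NamedTuple, Optional, Set


class Entry(NamedTuple):
    # Hypothetical immutable record; paths is a frozenset so the tuple stays hashable
    vendor: str
    product: str
    version: str
    paths: FrozenSet[str]


def add_path(all_cve_data: Set[Entry], vendor: str, product: str, version: str, path: str) -> int:
    """Add path to the matching entry; return how many entries were inspected."""
    inspected = 0
    match: Optional[Entry] = None
    for entry in all_cve_data:  # O(n): a set cannot look an entry up by product
        inspected += 1
        if (entry.vendor, entry.product, entry.version) == (vendor, product, version):
            match = entry
            break
    if match is None:
        all_cve_data.add(Entry(vendor, product, version, frozenset({path})))
    else:
        # The record is immutable, so swap it for an updated copy
        all_cve_data.remove(match)
        all_cve_data.add(match._replace(paths=match.paths | {path}))
    return inspected
```

Calling add_path once per detected file repeats that linear scan each time, which is where the quadratic cost comes from.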
Note: a set can only tell you whether an object is in the container; it has no way to retrieve the actual stored object. I implemented a MutableSet data structure that can retrieve the actual object from the container using __getitem__, but I didn't want to rely on a custom data structure as long as standard data structures were feasible.
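One possible way to build such a container (a sketch, not necessarily how the custom MutableSet mentioned above was written) is to subclass collections.abc.MutableSet and back it with a dict that maps each element to itself, so an equal key retrieves the stored instance. RetrievableSet and Tracked are hypothetical names used only for illustration.

```python
from collections.abc import MutableSet
from typing import Any, Dict, Iterable, Iterator


class RetrievableSet(MutableSet):
    """Hypothetical set that can return the stored object equal to a given key."""

    def __init__(self, items: Iterable[Any] = ()) -> None:
        # Each element maps to itself, so a dict lookup returns the stored instance
        self._items: Dict[Any, Any] = {}
        for item in items:
            self.add(item)

    def __contains__(self, item: object) -> bool:
        return item in self._items

    def __iter__(self) -> Iterator[Any]:
        return iter(self._items)

    def __len__(self) -> int:
        return len(self._items)

    def add(self, item: Any) -> None:
        # Keep the first stored instance, as a built-in set does
        self._items.setdefault(item, item)

    def discard(self, item: Any) -> None:
        self._items.pop(item, None)

    def __getitem__(self, key: Any) -> Any:
        # Average O(1): return the stored object that compares equal to key
        return self._items[key]


class Tracked:
    # Equality and hash use only name; paths is extra mutable state
    def __init__(self, name: str) -> None:
        self.name = name
        self.paths = []

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Tracked) and self.name == other.name

    def __hash__(self) -> int:
        return hash(self.name)


store = RetrievableSet([Tracked("example_product")])
store[Tracked("example_product")].paths.append("/usr/bin/example")
print(store[Tracked("example_product")].paths)  # ['/usr/bin/example']
```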
So I refactored CVEData into two parts: 1) an immutable ProductInfo(vendor, product, version) and 2) a mutable CVEData(list_of_cves, paths_of_cves). I store a mapping from ProductInfo to CVEData (Dict[ProductInfo, CVEData]) in all_cve_data, so we can now access the CVEData of a product without traversing the whole all_cve_data.
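The refactored layout can be sketched roughly as follows. ProductInfo, CVEData, list_of_cves and paths_of_cves come from the description above; the field types, the use of a dataclass for CVEData, and the helper record_product are assumptions for illustration, not the project's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, NamedTuple, Set


class ProductInfo(NamedTuple):
    # Immutable and hashable, so it can serve as a dict key
    vendor: str
    product: str
    version: str


@dataclass
class CVEData:
    # Mutable part; these field types are assumptions
    list_of_cves: List[str] = field(default_factory=list)
    paths_of_cves: Set[str] = field(default_factory=set)


def record_product(
    all_cve_data: Dict[ProductInfo, CVEData],
    product_info: ProductInfo,
    cves: List[str],
    path: str,
) -> None:
    """Hypothetical helper: merge one detection into all_cve_data."""
    # One average-O(1) dict lookup replaces the linear scan over a set
    cve_data = all_cve_data.setdefault(product_info, CVEData())
    for cve in cves:
        if cve not in cve_data.list_of_cves:
            cve_data.list_of_cves.append(cve)
    cve_data.paths_of_cves.add(path)


# Placeholder product and CVE ID, detected in two different files
all_cve_data: Dict[ProductInfo, CVEData] = {}
example = ProductInfo("example_vendor", "example_product", "1.0")
record_product(all_cve_data, example, ["CVE-XXXX-0001"], "/usr/bin/example")
record_product(all_cve_data, example, ["CVE-XXXX-0001"], "/opt/bin/example")
print(all_cve_data[example])
```

Because the immutable ProductInfo is the key and only the mutable CVEData value changes, updating paths never alters a hashed object.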
I moved all data structures into utils to avoid circular imports, and I also added tests for paths.
What am I doing this week?
I will continue to improve the documentation of the code I wrote by adding docstrings and comments. I am also going to add the requested how-to guides to improve the user experience.