right tool(s) for data classification request

Posted: Thu Aug 17, 2017 12:43 pm
by qt_ky
We are already licensed for IA and use it for traditional data profiling against relational databases. Recently some interesting questions have arisen.

A customer needs to crawl a large number of servers (web servers, file servers, database servers, application servers, etc.) to find where sensitive data resides (like PII), which is one of the features that IA advertises. With the wide variety of servers and file types, I assume they cannot be predefined as importable metadata.

This sounds a bit like what an antivirus product does except that it would try to classify the data.

Does IA have any file-crawling capabilities that could be used to find where PII data resides in this scenario?

Is there another tool or utility that could possibly be used to bridge the gap to help IA to find the sensitive data?

Or is there a more appropriate tool for the job?

Posted: Thu Aug 17, 2017 3:51 pm
by PaulVL
That is not so much a DataStage question as a UNIX security scan question.

Not sure which forum that would be.

IA is not the tool to crawl your network, since it cannot dynamically create connections, schemas, etc.

Posted: Thu Aug 17, 2017 5:40 pm
by qt_ky
It is an IA question right now because "data classification / find PII" is an IA feature that the customer is quite excited about.

Yes, it may be similar to a UNIX security scan function although I would wager most of the servers or virtual servers are Windows and some are likely to be Linux-flavored.

Posted: Fri Aug 18, 2017 8:02 am
by PaulVL
To my knowledge, IA doesn't have the ability to crawl across hosts looking for files that may or may not contain an SSN. That is his fundamental issue. Once he finds the file he can scan it for SSNs, but I believe the fact that he found it implies SSN detection of some sort.

And how is IA supposed to know the schema layout of the file?

Posted: Fri Aug 18, 2017 9:05 am
by UCDI
Text files might be doable, but in a binary file a 9-digit SSN, for example, could be encoded as almost any grouping of bytes anywhere on the disk. And that assumes uncompressed files; compression or encryption would make detection impossible as well.
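
To illustrate the "text files might be doable" case, here is a minimal sketch in plain Python. It is not an IA feature. It walks a directory tree and flags lines that look like a dashed SSN. The names scan_file and scan_tree, and the NNN-NN-NNNN pattern, are assumptions made for illustration; the sketch only handles uncompressed, unencrypted text, and a pattern match is only a candidate, not proof of PII.

```python
import os
import re

# Assumed layout: NNN-NN-NNNN, not embedded in a longer run of digits.
# Real data may store SSNs without dashes or in other formats.
SSN_PATTERN = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def scan_file(path):
    """Return the 1-based line numbers that contain an SSN-like pattern."""
    hits = []
    try:
        # errors="ignore" lets the scan continue past undecodable bytes.
        with open(path, "r", encoding="utf-8", errors="ignore") as handle:
            for lineno, line in enumerate(handle, start=1):
                if SSN_PATTERN.search(line):
                    hits.append(lineno)
    except OSError:
        # Unreadable files (permissions, locks) are skipped, not fatal.
        pass
    return hits


def scan_tree(root):
    """Walk root recursively and map each flagged file path to its hit lines."""
    results = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            hits = scan_file(path)
            if hits:
                results[path] = hits
    return results
```

Calling scan_tree on a mounted share or local directory returns a dictionary of candidate files. Each candidate would still need review, and binary, compressed, or encrypted files would need a different approach altogether.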

Posted: Mon Aug 21, 2017 12:06 am
by ray.wurlod
The Discovery tool's functionality is being incorporated into the Information Analyzer thin client, if indeed it hasn't been already (I haven't looked at FP2 yet).