Intellectual property (IP) is more sought after than ever. But don’t just take my word for it; look at attacks such as Hydraq and Stuxnet, and at recent thefts of proprietary designs and programs from large corporations. Between 2008 and 2009, American business losses due to cyber attacks grew to more than $1 trillion worth of IP.
For organizations, this means that protecting intellectual property and the sensitive information contained in documents, spreadsheets, and product design files is more important than ever before.
To protect their intellectual property, organizations must first know where it is. But locating IP throughout the organization has become much more difficult. Intellectual property is often buried in a sea of unstructured data that is spread out across physical, virtual and cloud-based infrastructure. And there’s more and more of it, as noted in a recent InformationWeek article:
“The challenge for IT is that unstructured data is growing at a breakneck pace– a compound annual growth rate of 61 percent, according to the International Data Corporation (IDC), almost three times the growth rate of structured data. It’s also scattered throughout the enterprise: in folders on file servers, on laptops, and tucked inside USB drives.”
Also, the difference between sensitive and non-sensitive data is often subtle, as is the case with proprietary source code versus open source, which makes sensitive data even harder to identify.
Data Loss Prevention Technologies Must Evolve to Protect IP
Many organizations implement Data Loss Prevention (DLP) programs to identify sensitive information and create policies to control where data should and shouldn’t go. DLP has historically relied on two categories of detection technology to find sensitive data: fingerprinting and describing technologies.
Fingerprinting is extremely accurate, but you have to collect all the data that needs to be protected before it can be fingerprinted. The alternative, describing data, requires creating lists of keywords that characterize the data. Describing data can be time-consuming, and it requires ongoing expertise and tuning to ensure accurate detection.
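To make the trade-off concrete, here is a minimal Python sketch of both approaches. It is an illustration only, not how any particular DLP product works: the function names, the five-word window size, the sample documents and the keyword list are all assumptions made for this example.

```python
import hashlib
import re


def fingerprint(text, k=5):
    """Toy fingerprint: the set of SHA-256 hashes of every k-word window."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        windows = [" ".join(words)] if words else []
    else:
        windows = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {hashlib.sha256(w.encode("utf-8")).hexdigest() for w in windows}


def fingerprint_match(text, protected_index, k=5):
    """Fraction of the text's windows that also appear in the protected index."""
    fp = fingerprint(text, k)
    if not fp:
        return 0.0
    return len(fp & protected_index) / len(fp)


def keyword_match(text, keywords):
    """Toy 'describing' check: how many of the listed keywords occur in the text."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(words & {kw.lower() for kw in keywords})


# Hypothetical documents, invented for this example.
protected = ("The rotor assembly tolerance for project Falcon is "
             "0.02 millimetres at operating temperature")
index = fingerprint(protected)  # the document must be collected up front

leak = ("FYI: the rotor assembly tolerance for project Falcon is "
        "0.02 millimetres at operating temperature, as agreed")
new_doc = ("Draft notes on a different rotor design for project Falcon, "
           "never registered")

leak_score = fingerprint_match(leak, index)        # high: copied passage
new_doc_score = fingerprint_match(new_doc, index)  # zero: never fingerprinted
keyword_hits = keyword_match(new_doc, ["rotor", "falcon"])
```

In this sketch the copied passage scores high against the fingerprint index, while the new draft that was never collected scores zero. The keyword list does catch the draft, but only because someone thought to list those words, and it would also flag any harmless text that happens to mention them.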
A New Way to Find and Protect Data: Vector Machine Learning
A new category of DLP detection technology has emerged that overcomes the limitations of current detection technologies: software that learns to detect the types of confidential data that require protection. Vector Machine Learning (VML) is trained on sample documents to recognize the defining characteristics of sensitive data and to identify the subtle differences between sensitive and non-sensitive data. Its accuracy can be improved over time as additional positive and negative samples are fed back into the system.
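The idea of learning from positive and negative samples can be sketched in a few lines of Python. To be clear, this is not the Vector Machine Learning algorithm itself, whose internals are not described here. It is a deliberately simple stand-in (a nearest-centroid classifier over word-count vectors, compared by cosine similarity), and the class name, labels and sample texts are all hypothetical.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn text into a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class CentroidClassifier:
    """Keeps one summed vector per label and picks the most similar one."""

    def __init__(self):
        self.centroids = {"sensitive": Counter(), "non_sensitive": Counter()}

    def add_sample(self, text, label):
        # Summing instead of averaging is fine here: cosine similarity
        # ignores vector length, so the direction is the same.
        self.centroids[label].update(vectorize(text))

    def classify(self, text):
        vec = vectorize(text)
        scores = {label: cosine(vec, c) for label, c in self.centroids.items()}
        return max(scores, key=scores.get)


clf = CentroidClassifier()
# Positive samples (hypothetical)
clf.add_sample("proprietary scheduler source code internal build license key",
               "sensitive")
clf.add_sample("confidential design document internal roadmap proprietary algorithm",
               "sensitive")
# Negative samples (hypothetical)
clf.add_sample("open source library released under a public license on a public mirror",
               "non_sensitive")
clf.add_sample("public press release and marketing newsletter",
               "non_sensitive")

label_a = clf.classify("internal proprietary algorithm source code")
label_b = clf.classify("public newsletter about an open source library")
```

Both test phrases contain the words "source" and, in one case, "code", yet they land on different sides because the surrounding vocabulary differs. Feeding a misclassified document back in through `add_sample` with the correct label shifts the corresponding vector, which is the feedback loop described above in miniature.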
While machine learning as a concept has been around for decades and has been used in everything from anti-spam engines to Google™ algorithms for translating text, it is only now being applied to DLP content analysis. As a DLP detection technology, Vector Machine Learning helps to quickly and efficiently protect IP and confidential information among increasing amounts of unstructured data.
Vector Machine Learning, combined with current describing and fingerprinting technologies, provides a new model for improving the efficiency and performance of DLP products and programs. If your organization has highly dispersed, growing data sets of unstructured proprietary and confidential information, you’ll want to examine and evaluate Vector Machine Learning more closely.