If your data is used to train a Machine Learning model, chances are that a Data Scientist, a Data Engineer, or an ML Engineer is going to stumble upon it! I know, for example, that Walmart categorically refuses to use AWS: Amazon is a direct competitor, and Walmart does not want to risk having its data fall into the wrong hands.
One solution to that data privacy problem could be to encrypt the data, but with typical encryption, the data would have to be decrypted before a model could be trained on it. And if the Data Engineers own the key that encrypts the data, what would stop them from decrypting new customer data as it comes in? One way around this is Fully Homomorphic Encryption (FHE).
FHE is an encryption scheme that preserves addition and multiplication: if E is the encryption function, we have

E(a + b) = E(a) + E(b) and E(a x b) = E(a) x E(b)
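To make this concrete, here is a small sketch of those identities using the TenSEAL library and its CKKS scheme (I am assuming TenSEAL's standard API here, and CKKS is an approximate scheme, so the equalities hold up to a small amount of noise):

```python
import tenseal as ts

# Set up a CKKS context; the secret key lives inside the context
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40

a = ts.ckks_vector(context, [1.0, 2.0, 3.0])  # E(a)
b = ts.ckks_vector(context, [4.0, 5.0, 6.0])  # E(b)

enc_sum = a + b    # E(a) + E(b)
enc_prod = a * b   # E(a) x E(b), element-wise

print(enc_sum.decrypt())   # ~[5.0, 7.0, 9.0]   = a + b
print(enc_prod.decrypt())  # ~[4.0, 10.0, 18.0] = a x b
```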
In practice, it means that FHE preserves any polynomial transformation of the data, since a polynomial is built entirely from additions and multiplications. Encrypting and then computing gives the exact same result as computing and then encrypting: if P is a polynomial transformation, then

P(E(a)) = E(P(a))
If D is the decryption function, we have D(E(a)) = a (decryption recovers the original data). This means that

D(P(E(a))) = D(E(P(a))) = P(a)
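Here is the same idea with an actual polynomial, P(x) = x^2 + 3x + 2, again as a hedged sketch on top of TenSEAL (polyval is TenSEAL's helper for evaluating a polynomial on a ciphertext, with coefficients listed from lowest degree to highest):

```python
import tenseal as ts

context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40

x = 2.5
enc_x = ts.ckks_vector(context, [x])  # E(x)

# P(E(x)) with P(x) = x^2 + 3x + 2, coefficients from lowest degree
enc_px = enc_x.polyval([2, 3, 1])

print(enc_px.decrypt())    # ~[15.75] -> D(P(E(x)))
print(x ** 2 + 3 * x + 2)  # 15.75    -> P(x), same value up to noise
```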
That means computations on fully encrypted data are completely equivalent to computations on unencrypted data! If P is an ML model, we can train the model on non-encrypted data and run inference on encrypted data. The model output will be encrypted as well and can only be decrypted by the party that encrypted the data in the first place, so the host of the model server will never see raw customer data.
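As an illustration, here is what encrypted inference could look like for a tiny linear model (a hypothetical setup on top of TenSEAL: the features, weights, and bias are made up, and I am assuming TenSEAL's dot-product and plain-operand APIs):

```python
import tenseal as ts

# Client side: encrypt the features; only the client holds the secret key
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()  # rotation keys needed for the dot product

features = [0.3, -1.2, 0.8]
enc_features = ts.ckks_vector(context, features)

# Server side: a plaintext linear model w.x + b applied to the ciphertext;
# the server never sees the features or the score in the clear
weights = [0.5, -0.1, 0.25]
bias = [0.7]
enc_score = enc_features.dot(weights) + bias

# Client side: only the key holder can read the prediction
print(enc_score.decrypt())  # ~[1.17]
```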
That is great, but it means that our ML model needs to be a polynomial transformation. Operations in neural networks such as ReLU or Softmax involve max and exp functions, which are not polynomial, so we need to modify the basic components of ML models if we want to use FHE. For example, this paper proposes precise polynomial approximations of NN components: https://arxiv.org/pdf/2105.10879.
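To get a feel for why polynomial approximations can work at all, here is a quick NumPy experiment that fits a degree-4 polynomial to ReLU on [-5, 5] with a plain least-squares fit (just an illustration, not the method from the paper above):

```python
import numpy as np

# Fit a degree-4 polynomial to ReLU on [-5, 5] by least squares
x = np.linspace(-5, 5, 1000)
relu = np.maximum(x, 0)

coeffs = np.polyfit(x, relu, deg=4)   # polynomial coefficients
relu_poly = np.polyval(coeffs, x)     # polynomial "ReLU"

print(f"max approximation error: {np.max(np.abs(relu_poly - relu)):.3f}")
```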
Another application of FHE is federated learning: multiple models are trained on local machines with private data and aggregated on a remote server, and the remote server and the local machines then sync their models after the aggregation. One problem with that setup is that information about the private data can always be reverse-engineered from a trained model. But because the model aggregation on the remote server is typically a simple average (a sum followed by a multiplication by a constant, so a polynomial), we can send the encrypted models to the aggregation server instead and decrypt the synced model locally. Unfortunately, that means each client needs to share the same encryption and decryption keys, which makes the scheme prone to attacks.
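As a sketch, here is what encrypted federated averaging could look like with TenSEAL (the weight vectors are made up, and I am assuming ciphertexts can be multiplied by a plain scalar; note that all clients share one context, and therefore one secret key, which is exactly the weakness mentioned above):

```python
import tenseal as ts

# One shared context = one shared secret key for every client
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40

# Each client encrypts its locally trained weight vector
client_weights = [
    [0.10, 0.20, 0.30],
    [0.40, 0.50, 0.60],
    [0.70, 0.80, 0.90],
]
encrypted_models = [ts.ckks_vector(context, w) for w in client_weights]

# Server side: average the ciphertexts without ever seeing the weights;
# a sum followed by a scalar multiplication is a polynomial, so FHE applies
enc_avg = encrypted_models[0]
for enc_model in encrypted_models[1:]:
    enc_avg = enc_avg + enc_model
enc_avg = enc_avg * (1.0 / len(encrypted_models))

# Client side: decrypt the synced global model locally
print(enc_avg.decrypt())  # ~[0.4, 0.5, 0.6]
```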