d419: Federated Learning: Collaborative Machine Learning without Centralized Training Data

Federated Learning – Machine learning without centralized training data:

Google Research Blog: https://research.googleblog.com/2017/04/federated-learning-collaborative.html
Discussion, Hacker News: https://news.ycombinator.com/item?id=14055655
Discussion, Reddit: https://www.reddit.com/r/programming/comments/63yka1/google_tensorflowbased_federated_learning_ml/
Technical paper: http://eprint.iacr.org/2017/281.pdf (“Practical Secure Aggregation for Privacy-Preserving Machine Learning”) [PDF]

Abstract: We design a novel, communication-efficient, failure-robust protocol for secure aggregation of high-dimensional data. Our protocol allows a server to compute the sum of large, user-held data vectors from mobile devices in a secure manner (i.e. without learning each user’s individual contribution), and can be used, for example, in a federated learning setting, to aggregate user-provided model updates for a deep neural network. We prove the security of our protocol in the honest-but-curious and malicious settings, and show that security is maintained even if an arbitrarily chosen subset of users drop out at any time.
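To make the idea of "computing the sum without learning each contribution" concrete, here is a minimal sketch of pairwise masking, the basic trick that secure aggregation builds on. It is not the paper's protocol: it assumes no user drops out, it replaces the pairwise key agreement with a single seeded random generator, and the names `pairwise_masks`, `mask_update`, `server_sum` and the value of `MODULUS` are made up for illustration.

```python
import random

MODULUS = 2**16  # assumed: all arithmetic is done modulo this value


def pairwise_masks(num_users, dim, seed=0):
    """Return one random vector per pair of users (u, v) with u < v."""
    rng = random.Random(seed)  # stand-in for a real pairwise key agreement
    return {
        (u, v): [rng.randrange(MODULUS) for _ in range(dim)]
        for u in range(num_users)
        for v in range(u + 1, num_users)
    }


def mask_update(user, update, masks, num_users):
    """Add the masks shared with higher-numbered users, subtract the others."""
    masked = list(update)
    for other in range(num_users):
        if other == user:
            continue
        pair = (min(user, other), max(user, other))
        sign = 1 if user < other else -1
        masked = [(m + sign * s) % MODULUS for m, s in zip(masked, masks[pair])]
    return masked


def server_sum(vectors):
    """The server only ever adds up what it receives."""
    return [sum(column) % MODULUS for column in zip(*vectors)]


updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
masks = pairwise_masks(len(updates), dim=3)
masked = [mask_update(u, vec, masks, len(updates)) for u, vec in enumerate(updates)]
total = server_sum(masked)
print(total)  # [111, 222, 333]
```

Each mask is added by one user and subtracted by another, so the masks cancel in the sum while every individual masked vector looks random to the server. Handling users who drop out mid-round is the hard part that the paper addresses and this sketch leaves out.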

Federated Learning with Secure Aggregation:

Summary by Erik Jonker:

“Your device downloads the current model, improves it by learning from data on your phone, and then summarizes the changes as a small focused update. Only this update to the model is sent to the cloud, using encrypted communication, where it is immediately averaged with other user updates to improve the shared model. All the training data remains on your device, and no individual updates are stored in the cloud.

This is a great model for protecting privacy while still training a central model!”
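The round trip described in that summary (download the model, train locally, send back only the change, average the changes) can be sketched in a few lines. This is a toy illustration, not Google's implementation: it assumes a one-parameter model y ≈ w·x, plain gradient descent on each device, and an average weighted by how many examples each device holds. The names `local_update` and `federated_round` and the sample data are hypothetical.

```python
def local_update(weight, data, lr=0.01, epochs=5):
    """Train on one device's data and return only the change in the weight."""
    w = weight
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # gradient of squared error for y ~ w*x
            w -= lr * grad
    return w - weight  # the "small focused update"; raw data never leaves


def federated_round(weight, clients):
    """Average the client updates, weighted by each client's example count."""
    total = sum(len(data) for data in clients)
    avg_delta = sum(local_update(weight, data) * len(data) for data in clients) / total
    return weight + avg_delta


# Three devices, each holding private (x, y) pairs generated by y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5), (2.5, 7.5)],
]

w = 0.0
for _ in range(50):
    w = federated_round(w, clients)
print(round(w, 3))  # close to 3.0
```

The server in this sketch only ever sees the per-device deltas, never the `(x, y)` pairs. In the system the post describes, even those deltas would be combined through secure aggregation rather than inspected one by one.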

A simplified view by TheMiamiWhale:

[Old Approach] User: here is my data, use it to update your model. Please don’t do anything shady with it!

[New Approach] User: I calculated the values you should use to update your model, here they are. I’ll keep the raw data for myself, thank you very much!

Acknowledgements (Google Research team):
This post reflects the work of many people in Google Research, including Blaise Agüera y Arcas, Galen Andrew, Dave Bacon, Keith Bonawitz, Chris Brumme, Arlie Davis, Jac de Haan, Hubert Eichner, Wolfgang Grieskamp, Wei Huang, Vladimir Ivanov, Chloé Kiddon, Jakub Konečný, Nicholas Kong, Ben Kreuter, Alison Lentz, Stefano Mazzocchi, Sarvar Patel, Martin Pelikan, Aaron Segal, Karn Seth, Ananda Theertha Suresh, Iulia Turc, Felix Yu, Antonio Marcedone and our partners in the Gboard team.
