Adversarial Machine-Learning-Enabled Anonymization of OpenWiFi Data
Wireless World: Research and Trends Magazine

Keywords

Anonymization
clustering techniques
cluster validation
generative CTGAN

Abstract

Protecting data privacy through anonymization is a critical concern for network operators and data owners before their data are shared for other uses. With the adoption of Artificial Intelligence (AI), anonymization becomes more likely to conceal sensitive information, helping to prevent data leakage and information loss. OpenWiFi networks are vulnerable to any adversary trying to gain access to, or knowledge of, their traffic, regardless of what the data owners know. This work addresses the risk of exposing actual traffic information by applying a conditional tabular generative adversarial network (CTGAN). CTGAN produces synthetic data that passes for actual data while keeping the sensitive information of the actual data hidden. The similarity of the synthetic data to the actual data is assessed with clustering algorithms, whose performance is then compared using unsupervised cluster validation metrics. The well-known K-means algorithm outperforms the other algorithms in assessing the similarity of synthetic to real data, achieving the nearest scores of 0.634, 23714.57, and 0.598 for the Silhouette, Calinski-Harabasz, and Davies-Bouldin metrics, respectively. A comparative analysis of validation scores across several algorithms identifies K-means as the exemplary unsupervised clustering algorithm for confirming that synthetic data can be used directly as a replacement for real data. Hence, the experimental results aim to show the viability of using CTGAN-generated synthetic data, instead of published anonymized data, in various applications.
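The similarity assessment described above (cluster the real and the synthetic data, then compare Silhouette, Calinski-Harabasz, and Davies-Bouldin scores) can be sketched as follows. This is a minimal NumPy-only illustration under stated assumptions, not the authors' pipeline: it assumes the records are already numeric and comparably scaled, uses K-means with k-means++ seeding and several restarts, and replaces both the OpenWiFi data and the CTGAN output with hypothetical Gaussian blobs. The helper names (`kmeans`, `validation_scores`, and so on) are illustrative only.

```python
import numpy as np


def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's K-means with k-means++ seeding; keeps the run with the lowest inertia."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        # k-means++ seeding: each new centroid is drawn with probability
        # proportional to its squared distance from the nearest chosen centroid.
        centroids = [X[rng.integers(len(X))]]
        for _ in range(1, k):
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
            centroids.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centroids = np.array(centroids)
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Move each centroid to the mean of its points (keep it if the cluster is empty).
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                            for j in range(k)])
            if np.allclose(new, centroids):
                break
            centroids = new
        inertia = dist.min(axis=1).sum()  # within-cluster sum of squared distances
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


def silhouette(X, labels):
    """Mean silhouette coefficient in [-1, 1]; higher is better."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # a point alone in its cluster scores 0
        a = D[i, own].sum() / (own.sum() - 1)  # mean distance to the rest of its cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other cluster
        s[i] = (b - a) / max(a, b)
    return float(s.mean())


def calinski_harabasz(X, labels):
    """Between- vs. within-cluster dispersion ratio; higher is better."""
    clusters = np.unique(labels)
    n, k = len(X), len(clusters)
    mean = X.mean(axis=0)
    between = sum((labels == c).sum() * ((X[labels == c].mean(axis=0) - mean) ** 2).sum()
                  for c in clusters)
    within = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum() for c in clusters)
    return float(between * (n - k) / (within * (k - 1)))


def davies_bouldin(X, labels):
    """Mean worst-case ratio of cluster scatter to centroid separation; lower is better."""
    clusters = np.unique(labels)
    cents = np.array([X[labels == c].mean(axis=0) for c in clusters])
    scatter = np.array([np.linalg.norm(X[labels == c] - cents[j], axis=1).mean()
                        for j, c in enumerate(clusters)])
    k = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / np.linalg.norm(cents[i] - cents[j])
                 for j in range(k) if j != i) for i in range(k)]
    return float(np.mean(worst))


def validation_scores(X, k, seed=0):
    """Cluster X with K-means and return the three unsupervised validation scores."""
    labels = kmeans(X, k, seed=seed)
    return {"silhouette": silhouette(X, labels),
            "calinski_harabasz": calinski_harabasz(X, labels),
            "davies_bouldin": davies_bouldin(X, labels)}


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    centers = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])
    # Hypothetical stand-ins: "real" is three Gaussian blobs; "synthetic" imitates
    # a generator's output by redrawing the same blobs with slightly wider noise.
    real = np.vstack([c + rng.normal(scale=0.7, size=(100, 2)) for c in centers])
    synthetic = np.vstack([c + rng.normal(scale=0.8, size=(100, 2)) for c in centers])
    r, s = validation_scores(real, 3), validation_scores(synthetic, 3)
    for name in r:
        print(f"{name:18s} real={r[name]:9.3f} synthetic={s[name]:9.3f}")
```

A small gap between the real and synthetic scores suggests that the synthetic data preserves the cluster structure of the real data. Silhouette and Calinski-Harabasz are better when higher, while Davies-Bouldin is better when lower. In practice, scikit-learn's `KMeans`, `silhouette_score`, `calinski_harabasz_score`, and `davies_bouldin_score` compute the same quantities; the from-scratch version here only keeps the sketch dependency-light.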

https://doi.org/10.13052/2794-7254.005