
SoftBank Corp. Develops Synthetic Data Generation Pipeline to Enable Secure Training for its Large Telecom Model

Pipeline leverages NVIDIA NeMo Safe Synthesizer to balance data quality and security

March 17, 2026
SoftBank Corp.

SoftBank Corp. ("SoftBank") announced that it has built a synthetic data generation pipeline for its Large Telecom Model (LTM)^*1, a generative AI foundation model for the telecommunications industry. The pipeline enables secure, highly accurate training while protecting confidential information related to network operations. By using NVIDIA NeMo Safe Synthesizer, a synthetic data generation tool that supports Differential Privacy^*2, SoftBank balances data quality with the protection of confidential information, allowing the data to be used safely for LTM training.

[Notes]

*1
For more details on the "Large Telecom Model", please refer to the press release dated March 19, 2025, titled "SoftBank Corp. Develops a Foundational Large Telecom Model (LTM)."
*2
Differential Privacy is a data protection technology that mathematically guarantees that outputs will not change significantly depending on whether any specific individual record is present. This is achieved by adding carefully calibrated noise during model training or data analysis.

Background

In advancing its LTM, SoftBank has continuously trained the model on detailed, large-scale proprietary data, including actual network operational and quality metrics, as well as base station configurations. However, because the training data contains highly confidential information, strict privacy and security constraints have historically limited the scope of model and data application. In addition, telecom network data consists of diverse information that is intricately linked. Applying simple anonymization or uniform noise processing destroys the subtle correlations that indicate signs of network failure or causes of quality degradation, rendering the data useless for AI training. To address this, SoftBank leveraged NVIDIA NeMo Safe Synthesizer for its LTM to build a synthetic data generation pipeline that enables secure and highly accurate model training.
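The loss of correlation described above can be illustrated with a small, self-contained sketch. It does not use SoftBank data or any NVIDIA tooling. The two metrics (cell_load and latency_ms), their linear relationship, and the noise widths are all hypothetical values chosen only to show how uniform noise large enough to hide individual values also hides the link between them.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def add_uniform_noise(values, width, rng):
    """Add independent noise drawn uniformly from [-width, width] to each value."""
    return [v + rng.uniform(-width, width) for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical linked metrics: latency rises linearly with cell load.
cell_load = [i / 199 for i in range(200)]
latency_ms = [10.0 + 40.0 * load for load in cell_load]

clean_corr = pearson(cell_load, latency_ms)

# Noise roughly ten times wider than each metric's own spread (assumed widths).
noisy_corr = pearson(
    add_uniform_noise(cell_load, 5.0, rng),
    add_uniform_noise(latency_ms, 200.0, rng),
)

print(f"correlation before noise: {clean_corr:.3f}")
print(f"correlation after noise:  {noisy_corr:.3f}")
```

In this toy setting the clean metrics are perfectly correlated, while the independently noised copies show almost no relationship, which is the kind of signal a model would need to learn failure precursors.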

Features of the synthetic data generation pipeline

The newly constructed synthetic data generation pipeline integrates NVIDIA NeMo Safe Synthesizer into LTM's data processing workflow. By applying this to large-scale network data, SoftBank can generate secure synthetic datasets that contain no confidential information while fully maintaining complex informational correlations. Robust data protection that preserves data quality is achieved primarily through the following two methods:

Mathematically guaranteed data protection via differential privacy

By utilizing differential privacy techniques, it is mathematically guaranteed that removing or changing any single record's data has only a negligible effect on what the model learns or outputs. This places a quantifiable limit on the risk of the AI model memorizing rare, confidential information, such as unique data tied to individual network equipment.
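
The guarantee described above is usually stated as follows: a randomized mechanism M is epsilon-differentially private if, for any two datasets D and D' that differ in one record and any set of outputs S, Pr[M(D) in S] <= exp(epsilon) * Pr[M(D') in S]. The press release does not say which mechanism NVIDIA NeMo Safe Synthesizer uses internally, so the sketch below is not the tool's implementation. It only illustrates the idea with the textbook Laplace mechanism applied to a clipped sum; the function names, clipping bounds, and epsilon values are assumptions made for the example.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_sum(values, lower, upper, epsilon, rng):
    """Release a sum with epsilon-differential privacy (add/remove one record).

    Each value is clipped to [lower, upper], so adding or removing one
    record changes the clipped sum by at most max(|lower|, |upper|).
    Laplace noise with scale sensitivity / epsilon is then added.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if lower > upper:
        raise ValueError("lower must not exceed upper")
    clipped = [min(max(v, lower), upper) for v in values]
    sensitivity = max(abs(lower), abs(upper))
    if sensitivity == 0:
        return 0.0  # every clipped value is zero, so nothing can leak
    return sum(clipped) + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Hypothetical per-equipment alarm counts; 100 is a rare outlier.
alarm_counts = [1, 2, 3, 100]
print(dp_sum(alarm_counts, lower=0.0, upper=10.0, epsilon=1.0, rng=rng))
```

Clipping is what bounds the influence of the rare outlier, and a smaller epsilon means more noise and a tighter limit on what any single record can reveal.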

Resistance evaluation against inference attacks (MIA/AIA)

In addition to the protection provided by differential privacy, vulnerability testing is conducted on the generated synthetic datasets to evaluate their resistance against inference attacks. Specifically, this verifies that the success rate of Membership Inference Attacks (MIA)—which attempt to guess if specific data was used in training—is no better than random guessing. It also validates against Attribute Inference Attacks (AIA)—which attempt to deduce confidential attributes like base station locations from common attributes—confirming that estimating confidential information is virtually impossible. Only data that passes these strict security evaluations is approved for LTM training and sharing.
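The "no better than random guessing" criterion for MIA can be made concrete with a minimal sketch. The press release does not describe how the evaluation is implemented, so the following is only one common style of test, assumed here for illustration: a distance-to-closest-record attack scored by AUC, where 0.5 corresponds to random guessing. All names, the sample records, and the tolerance are hypothetical.

```python
def nearest_distance(record, synthetic):
    """Euclidean distance from a record to its closest synthetic record."""
    return min(
        sum((a - b) ** 2 for a, b in zip(record, s)) ** 0.5
        for s in synthetic
    )


def attack_auc(member_scores, nonmember_scores):
    """AUC of the attack: probability that a random member scores higher
    than a random non-member, counting ties as one half."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def mia_auc(train, holdout, synthetic):
    """Score records by closeness to the synthetic data.

    The attacker guesses "member" for records lying unusually close to a
    synthetic record, so the score is the negated nearest distance.
    """
    member_scores = [-nearest_distance(r, synthetic) for r in train]
    nonmember_scores = [-nearest_distance(r, synthetic) for r in holdout]
    return attack_auc(member_scores, nonmember_scores)


def looks_resistant(auc, tolerance=0.05):
    """Hypothetical approval gate: AUC must stay within tolerance of 0.5."""
    return abs(auc - 0.5) <= tolerance


train = [(0.0, 0.0), (1.0, 1.0)]      # records used to fit the generator
holdout = [(5.0, 5.0), (6.0, 6.0)]    # records the generator never saw

leaky_synthetic = list(train)         # worst case: synthetic data copies training data
print(mia_auc(train, holdout, leaky_synthetic))
print(looks_resistant(mia_auc(train, holdout, leaky_synthetic)))
```

A generator that simply copies training records yields an AUC of 1.0 and fails the gate. An attribute inference check could follow the same pattern, comparing how well a confidential attribute is predicted from the synthetic data against a baseline guess.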

To verify the effectiveness of this new pipeline, SoftBank trained and evaluated its LTM using the generated synthetic datasets. In evaluations on downstream tasks, such as network performance analysis and the interpretation of complex operational data, the LTM achieved a level of accuracy suitable for real-world business operations.

Future developments

SoftBank plans to collaborate with telecom operators in Japan and overseas, network equipment vendors, and educational institutions to conduct Proofs of Concept (PoCs) using the LTM synthetic data generation pipeline, further verifying its utility and safety in real-world business environments. Additionally, SoftBank will promote multi-layered security measures, including the introduction of LLM guardrails^*3 during AI inference. In the long term, SoftBank aims to contribute to the ecosystem through industry associations like the AI-RAN Alliance, leading the advancement of next-generation communication networks and driving the secure, real-world deployment of generative AI across the entire telecommunications industry.

[Note]

*3
LLM guardrails are a safety technology that monitors and controls the inputs and outputs of large language models.

"While the utilization of generative AI is essential for advancing telecommunications infrastructure, the highly confidential nature of data has historically been the biggest barrier to external collaboration," said Ryuji Wakikawa, Vice President and Head of the Research Institute of Advanced Technology at SoftBank Corp. "Building a pipeline using NVIDIA NeMo Safe Synthesizer that achieves a high-level balance between data utility and robust data protection is a major breakthrough toward the real-world deployment of LTM. SoftBank will expand this secure AI foundation into the broader industry ecosystem, driving the evolution of next-generation communication networks."

"SoftBank's synthetic data pipeline for its Large Telecom Model is a major milestone in telecom AI," said Chris Penrose, Vice President of Business Development for Telecoms at NVIDIA. "By leveraging NVIDIA NeMo Safe Synthesizer, SoftBank is unlocking its network data in a privacy‑protected way, opening the door to a new wave of innovation across the industry—from software developers to network vendors and operators. This sets the stage for a secure, collaborative ecosystem where partners can build and train on high‑quality synthetic network data, accelerating the deployment of generative and agentic AI on 5G‑Advanced and beyond."

SoftBank, the SoftBank name and logo are registered trademarks or trademarks of SoftBank Group Corp. in Japan and other countries.
Other company, product and service names in this press release are registered trademarks or trademarks of the respective companies.

About the SoftBank Research Institute of Advanced Technology: Guided by its mission to implement new technologies into society, SoftBank Corp.'s Research Institute of Advanced Technology promotes R&D and business creation for advanced technologies that support next-generation social infrastructure, including AI-RAN and Beyond 5G/6G, as well as telecommunications, AI, computing, quantum technologies, and technologies in the space and energy sectors. Through industry-academia collaboration and joint research with universities, research institutions and partner companies in Japan and abroad, the SoftBank Research Institute of Advanced Technology is contributing to the creation of global businesses and a sustainable society. For more details, please visit the official website.