High-Speed Vision Transformers Advance Metal 3D Printing
At Carnegie Mellon University’s Next Manufacturing Center, researchers have developed an off-axis imaging system to capture the fleeting dynamics of molten pools in laser powder bed fusion (L-PBF) metal 3D printing. The setup combines a high-speed camera with magnification optics and optical filters that block the dominant wavelengths emitted by the plasma plume, a turbulent mix of ionized vapor, condensed particles, and ejected material that can exceed 3500 K. By filtering out much of this glare, the system resolves gradients in melt-pool light emission more clearly than conventional direct imaging, enabling detailed observation of pool geometry at up to 54,000 frames per second.
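For scale, the frame rate sets the time between consecutive frames. The scan speed used below is an illustrative assumption, not a parameter reported for these experiments.

```latex
\Delta t = \frac{1}{f} = \frac{1}{54\,000~\mathrm{s^{-1}}} \approx 18.5~\mu\mathrm{s},
\qquad
\Delta x = v\,\Delta t \approx (1~\mathrm{m/s})(18.5~\mu\mathrm{s}) = 18.5~\mu\mathrm{m}
```

At an assumed scan speed of 1 m/s, the laser therefore advances roughly 18.5 μm between frames.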

Tests on Ti-6Al-4V tracks printed across a range of power–velocity (P–V) settings revealed distinct signatures for keyholing, stable melting, balling, and lack-of-fusion. As scan speed increases, melt pools become smaller but more elongated, except under lack-of-fusion conditions, where low energy density produces shallow, compact pools. In keyholing, high energy density drives deep vapor cavities and strong oscillations, while balling, linked to Plateau–Rayleigh instability and Marangoni flow, produces disconnected molten segments and periodic bead-like tracks.
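The regimes above are often discussed in terms of energy delivered per unit length of track. A minimal sketch, assuming linear energy density E_l = P / v as the metric (the article does not say which energy-density definition was used), ranks a hypothetical P–V grid. The function names and grid values are illustrative only.

```python
# Sketch: rank a hypothetical P-V grid by linear energy density E_l = P / v.
# Assumption: the article does not state which energy-density definition was used.

def linear_energy_density(power_w: float, speed_mm_s: float) -> float:
    """Return linear energy density in J/mm (laser power in W / scan speed in mm/s)."""
    if speed_mm_s <= 0:
        raise ValueError("scan speed must be positive")
    return power_w / speed_mm_s


def energy_grid(powers_w, speeds_mm_s):
    """Map every (power, speed) combination to its linear energy density."""
    return {(p, v): linear_energy_density(p, v) for p in powers_w for v in speeds_mm_s}


if __name__ == "__main__":
    # Hypothetical grid, not the study's parameters.
    grid = energy_grid(powers_w=[100, 200, 300], speeds_mm_s=[250, 500, 1000])
    for (p, v), e in sorted(grid.items(), key=lambda kv: kv[1]):
        print(f"P={p:>3} W  v={v:>4} mm/s  E_l={e:.2f} J/mm")
```

Sorting by E_l puts the low-power, high-speed corner (toward lack-of-fusion) first and the high-power, low-speed corner (toward keyholing) last. Where the regime boundaries actually fall depends on the alloy and machine, and a single scalar cannot capture every regime.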
To quantify these dynamics, the team analyzed frame-to-frame intensity fluctuations and correlation coefficients. Severe keyholing showed the widest fluctuation range, with intensity oscillating at 8–17 kHz. These frequencies are higher than previously reported keyhole depth oscillations, although plume emissions can obscure direct depth measurements. Cross-correlation mapping across P–V space revealed how surface stability varies with processing parameters.
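The exact fluctuation and correlation metrics are not spelled out here. A minimal NumPy sketch, assuming "intensity fluctuation" means the mean brightness of each frame and "correlation coefficient" means Pearson's r between consecutive frames, shows how both, plus a dominant oscillation frequency, could be extracted from a clip. At 54,000 frames per second the Nyquist limit is 27 kHz, so oscillations in the 8–17 kHz band are resolvable in principle. The function names are hypothetical.

```python
import numpy as np

FPS = 54_000  # peak frame rate quoted above; Nyquist limit = FPS / 2 = 27 kHz


def frame_correlations(video: np.ndarray) -> np.ndarray:
    """Pearson correlation between consecutive frames of a (T, H, W) video; length T - 1."""
    flat = video.reshape(video.shape[0], -1).astype(float)
    return np.array([np.corrcoef(flat[t], flat[t + 1])[0, 1] for t in range(len(flat) - 1)])


def dominant_frequency(video: np.ndarray, fps: float = FPS) -> float:
    """Frequency (Hz) of the strongest oscillation in mean frame intensity."""
    signal = video.reshape(video.shape[0], -1).mean(axis=1)
    signal = signal - signal.mean()  # remove the DC offset so it cannot dominate
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fps)
    return float(freqs[np.argmax(spectrum)])


if __name__ == "__main__":
    # Synthetic clip: a fixed pattern whose brightness oscillates at 12 kHz, plus noise.
    rng = np.random.default_rng(0)
    t = np.arange(540) / FPS  # 540 frames = 10 ms
    pattern = rng.random((16, 16))
    video = pattern[None] * (1 + 0.3 * np.sin(2 * np.pi * 12_000 * t))[:, None, None]
    video += rng.normal(scale=0.02, size=video.shape)
    print(f"dominant frequency ≈ {dominant_frequency(video):.0f} Hz")
    print(f"lowest frame-to-frame correlation: {frame_correlations(video).min():.3f}")
```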
For automated classification, the researchers turned to vision transformers (ViTs), which adapt the transformer architecture, originally developed for natural language processing, to computer vision. Using a tubelet embedding approach, they divided each video sequence into nonoverlapping 3D patches spanning space and time, linearly projected each patch into a token, added positional embeddings, and passed the resulting sequence to transformer encoders. Given the small field of view and limited number of frames per sequence, the model relied on regularization rather than heavy data augmentation to preserve subtle spatiotemporal cues.
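A minimal NumPy sketch of tubelet embedding as described above. The clip size, tubelet size, token width, and random weights are hypothetical placeholders; a trained model would learn the projection and positional embeddings. Flattening each tubelet and multiplying by one shared matrix is equivalent to a 3D convolution whose stride equals its kernel size.

```python
import numpy as np


def tubelet_embed(video, t, p, weight, pos_embed):
    """Cut a (T, H, W, C) video into nonoverlapping t x p x p tubelets and project each to a token.

    weight:    (t * p * p * C, D) projection matrix (learned in a real model).
    pos_embed: (N, D) positional embeddings, N = (T // t) * (H // p) * (W // p).
    Returns an (N, D) token sequence ordered time-major, then row, then column.
    """
    T, H, W, C = video.shape
    if T % t or H % p or W % p:
        raise ValueError("video dimensions must be divisible by the tubelet size")
    nt, nh, nw = T // t, H // p, W // p
    # Split each axis into (blocks, within-block) and bring the block indices to the front.
    x = video.reshape(nt, t, nh, p, nw, p, C).transpose(0, 2, 4, 1, 3, 5, 6)
    tubelets = x.reshape(nt * nh * nw, t * p * p * C)  # one flattened tubelet per row
    return tubelets @ weight + pos_embed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.random((16, 64, 64, 1))  # hypothetical clip: 16 grayscale 64x64 frames
    t, p, d_model = 2, 16, 128           # hypothetical tubelet size and token width
    n_tokens = (16 // t) * (64 // p) * (64 // p)
    weight = rng.normal(scale=0.02, size=(t * p * p * 1, d_model))
    pos_embed = rng.normal(scale=0.02, size=(n_tokens, d_model))
    print(tubelet_embed(video, t, p, weight, pos_embed).shape)  # (128, 128)
```

With 2-frame, 16 × 16-pixel tubelets, a 16-frame clip yields 8 × 4 × 4 = 128 tokens. Smaller tubelets keep finer spatiotemporal detail at the cost of a longer token sequence.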
The system classified videos into four categories: desirable tracks and three defect types—keyholing, balling, and lack-of-fusion. Keyholing defects involve unstable, deep, narrow penetration that can trap pores and reduce fatigue life. Balling yields rough, undercut surfaces, while lack-of-fusion leaves unmelted powder and gaps. Experiments spanned SS316L stainless steel, Ti-6Al-4V titanium alloy, and IN718 nickel alloy. Cross-dataset evaluations showed strong generalization: a model trained on SS316L achieved 96.63% top-1 accuracy on IN718, but lower accuracy on Ti-6Al-4V, which produces denser plumes and has distinct thermophysical properties.
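In a cross-dataset test, a model trained on one alloy is scored on videos of another. A minimal sketch of that scoring, assuming the classifier outputs one score per class for each video; the class order and example scores are hypothetical. A confusion matrix complements the single top-1 figure by showing which defect types are mistaken for one another.

```python
import numpy as np

# Hypothetical label order for the four classes named above.
CLASSES = ["desirable", "keyholing", "balling", "lack_of_fusion"]


def top1_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Fraction of videos whose highest-scoring class equals the true label."""
    return float(np.mean(np.argmax(logits, axis=1) == labels))


def confusion_matrix(logits: np.ndarray, labels: np.ndarray, n_classes: int = len(CLASSES)) -> np.ndarray:
    """Counts with rows = true class and columns = top-1 predicted class."""
    preds = np.argmax(logits, axis=1)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (labels, preds), 1)  # unbuffered add handles repeated (true, pred) pairs
    return cm


if __name__ == "__main__":
    # Made-up scores for five test videos from an unseen alloy (not real results).
    logits = np.array([[2.0, 0.1, 0.3, 0.0],
                       [0.2, 1.5, 0.9, 0.1],
                       [0.1, 1.2, 0.8, 0.0],
                       [0.0, 0.3, 0.2, 2.2],
                       [1.1, 0.0, 0.4, 0.2]])
    labels = np.array([0, 1, 2, 3, 0])
    print(f"top-1 accuracy: {top1_accuracy(logits, labels):.2%}")
    print(confusion_matrix(logits, labels))
```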
Using the classification results, the team generated process maps in situ, plotting print outcomes across P–V combinations. These closely matched ex situ characterization, with a single misclassification in the Ti-6Al-4V balling regime. The researchers note that class boundaries are not absolute: keyholing can coexist with balling, and balling can occur in both shallow and elongated melt pools.
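The article does not say how per-video predictions were combined into a map. A minimal sketch, assuming a simple majority vote over the videos recorded at each P–V setting; the function names and records are hypothetical. Comparing such a map cell by cell with ex situ labels is how a lone disagreement, like the Ti-6Al-4V balling case, would surface.

```python
from collections import Counter, defaultdict


def build_process_map(records):
    """Majority-vote class for each (power, speed) cell.

    records: iterable of (power_W, speed_mm_s, predicted_label), one tuple per video.
    Ties go to the label seen first in that cell (Counter.most_common ordering).
    """
    votes = defaultdict(Counter)
    for power, speed, label in records:
        votes[(power, speed)][label] += 1
    return {cell: counts.most_common(1)[0][0] for cell, counts in votes.items()}


def render(process_map, width=16):
    """Text grid with powers as rows (high to low) and speeds as columns (low to high)."""
    powers = sorted({p for p, _ in process_map}, reverse=True)
    speeds = sorted({v for _, v in process_map})
    lines = ["P \\ v".ljust(8) + "".join(str(v).rjust(width) for v in speeds)]
    for p in powers:
        cells = "".join(process_map.get((p, v), "-").rjust(width) for v in speeds)
        lines.append(str(p).ljust(8) + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical per-video predictions, not data from the study.
    predictions = [
        (300, 250, "keyholing"), (300, 250, "keyholing"), (300, 1000, "balling"),
        (200, 500, "desirable"), (200, 500, "desirable"), (200, 500, "balling"),
        (100, 1000, "lack_of_fusion"),
    ]
    print(render(build_process_map(predictions)))
```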
Comparisons with other architectures, including the CNN-based VGG16, ResNet152, and MoViNet-A1 as well as transformer variants such as TimeSformer, showed that ViT-based models outperformed the CNNs. The pretrained ViViT-B (Video Vision Transformer, base size) model achieved over 90% accuracy with balanced performance across alloys. Visualization via t-SNE revealed clear class separation, and attention maps indicated that dynamic features at the keyhole and melt-pool tail were most influential.
Among ViViT-B variants, the factorized encoder model, which runs a spatial transformer encoder on each time step before a temporal encoder combines the results, excelled, particularly on the challenging Ti-6Al-4V dataset. Regularization further improved performance, underscoring its role in preserving sensitivity to subtle melt-pool variations.
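In ViViT's factorized-encoder variant, a spatial transformer first processes the tokens of each time step on their own, and a temporal transformer then models how those per-step summaries evolve. A heavily simplified NumPy sketch of that data flow, assuming one single-head attention layer per stage, random untrained weights, and mean pooling in place of ViViT's classification token; layer norm, MLP blocks, and regularization are omitted, and all function names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention with a residual connection.

    x has shape (..., N, D); tokens mix only along the N axis, so leading axes act as a batch.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return x + softmax(scores) @ v


def factorized_encoder(tokens, spatial_w, temporal_w):
    """Spatial attention within each time step, then temporal attention across steps.

    tokens: (nt, ns, D) array of nt time steps x ns spatial tokens.
    spatial_w, temporal_w: (wq, wk, wv) tuples of (D, D) matrices.
    Returns one (D,) vector summarizing the clip.
    """
    spatial = self_attention(tokens, *spatial_w)      # each time step handled independently
    per_step = spatial.mean(axis=1)                   # (nt, D); ViViT uses a CLS token instead
    temporal = self_attention(per_step, *temporal_w)  # time steps now exchange information
    return temporal.mean(axis=0)


def random_weights(rng, d):
    """Hypothetical untrained (wq, wk, wv) projections."""
    return tuple(rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(3))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    nt, ns, d = 8, 16, 32                             # hypothetical token grid and width
    tokens = rng.normal(size=(nt, ns, d))
    clip = factorized_encoder(tokens, random_weights(rng, d), random_weights(rng, d))
    logits = clip @ rng.normal(size=(d, 4))           # hypothetical 4-class head
    print(logits.shape)                               # (4,)
```

Because spatial attention never crosses time steps, attention cost scales roughly as nt·ns² + nt² instead of (nt·ns)² for joint space-time attention, which is one motivation for the factorized design.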
Beyond defect classification, the team also built process maps of morphological variability. Using Ti-6Al-4V single tracks printed at varying powers and speeds, they measured the variability of melt-pool width and area both in situ and with ex situ microscopy. High energy densities produced larger pools with greater variability, while the lowest area variability, indicating the most stable melting, coincided with the manufacturer’s recommended P–V settings. Width variability was modest compared with area variability, and balling sometimes produced smooth boundaries despite underlying instability.
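The article does not name its variability metric. A minimal sketch, assuming the coefficient of variation (sample standard deviation divided by the mean) of per-frame width and area at each P–V setting; because it is dimensionless, small and large pools can share one map. The function names and measurements are hypothetical.

```python
import numpy as np


def coefficient_of_variation(values) -> float:
    """Sample standard deviation divided by the mean (dimensionless)."""
    x = np.asarray(values, dtype=float)
    mean = x.mean()
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return float(x.std(ddof=1) / mean)


def pool_variability(widths_um, areas_um2):
    """Width and area variability for one P-V setting, from per-frame measurements."""
    return {"width_cv": coefficient_of_variation(widths_um),
            "area_cv": coefficient_of_variation(areas_um2)}


if __name__ == "__main__":
    # Hypothetical per-frame measurements for a single track (not data from the study).
    widths = [112, 118, 109, 121, 115]
    areas = [9_800, 12_400, 8_900, 13_100, 10_600]
    print(pool_variability(widths, areas))
```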
By integrating high-speed optical diagnostics with advanced spatiotemporal deep learning, this approach offers a powerful tool for accelerating alloy process development. It enables real-time mapping of printability and defect regimes, potentially reducing the trial-and-error cycles that have long slowed the adoption of new materials in critical applications.
