A factory floor is one of the least forgiving environments for a robot: parts are heavy, bins get moved, lighting changes, and a single dropped part creates a new problem to solve.

Boston Dynamics has been framing its latest Atlas work around exactly that messiness, moving the humanoid beyond choreographed demos toward manipulation routines that more closely resemble real production work. The company's core argument is that useful humanoids will be defined not by a few predetermined movements, but by the ability to handle a variety of objects and fixtures while treating feet, torso, arms, and hands as one coupled control problem.
To that end, Boston Dynamics and Toyota Research Institute (TRI) have been training Atlas with language-conditioned policies they call large behavior models (LBMs). Rather than treating planning, perception, locomotion, and manipulation as cleanly separated modules with explicit handoffs, the effort centers on policies that map sensor inputs and a language prompt directly to whole-body actions. Scott Kuindersma, vice president of robotics research at Boston Dynamics, put the rationale plainly: “Training a single neural network to perform many long-horizon manipulation tasks will lead to better generalization.”
The story behind those policies is less about a breakthrough algorithm than about building an industrialized loop for embodied data. Boston Dynamics describes four repeatable steps: collect behavior data through teleoperation (both on hardware and in simulation), curate and annotate it for training, train a single policy on a large set of tasks, and evaluate performance against a dedicated test suite. The test suite is not published as a scorecard for outside consumption; it serves as internal instrumentation – what failed, what almost failed, what looked stable enough to build on – so the team knows which demonstrations to gather next and which inference strategies to revise.
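The four-step loop described above can be sketched in code. This is a minimal, self-contained illustration of the shape of such a data engine; every function, name, and data structure here is a hypothetical stand-in, not a Boston Dynamics API.

```python
# Hypothetical sketch of the collect -> curate -> train -> evaluate loop.
# All functions are toy stand-ins for the real stages described in the text.

def collect_demos(tasks):
    # Stand-in for teleoperated collection on hardware and in simulation.
    return [{"task": t, "trajectory": [0.0, 1.0]} for t in tasks]

def curate(demos):
    # Stand-in for filtering and annotating demonstrations before training.
    return [d for d in demos if d["trajectory"]]

def train(policy, dataset):
    # Stand-in for training one policy on many tasks at once.
    policy["seen_tasks"].update(d["task"] for d in dataset)
    return policy

def evaluate(policy, tasks):
    # Stand-in for the internal test suite: which tasks look covered?
    return {t: (t in policy["seen_tasks"]) for t in tasks}

def data_engine_round(policy, tasks):
    """One turn of the loop; failures steer the next collection round."""
    dataset = curate(collect_demos(tasks))
    policy = train(policy, dataset)
    report = evaluate(policy, tasks)
    next_tasks = [t for t, ok in report.items() if not ok]
    return policy, report, next_tasks

policy = {"seen_tasks": set()}
policy, report, next_tasks = data_engine_round(policy, ["stack legs", "fill bin"])
```

The essential design point is the last line of the loop: evaluation output feeds directly back into what gets collected next, which is what makes the process a flywheel rather than a pipeline.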
One of the most consequential engineering decisions is treating task breadth as a first-order design constraint. Boston Dynamics built a teleoperation stack that combines Atlas's model predictive controller with a custom VR interface, with the goal of collecting demonstrations that capture both finger-level dexterity and whole-body reaching with steps, crouches, and posture transitions. In practice, that means capturing not just the hand motions, but also the foot placements and whole-body adjustments that make those hand motions possible without falling, scraping, or self-collision.
That physical reality shows up in the sensing and control loops the policy must live within. Atlas uses head-mounted HDR stereo cameras that serve both operator situational awareness and, at 30 Hz, the policy's vision input. Images, proprioception, and a language prompt feed into the policy, which, rather than emitting a single command at each step, produces a chunk of future actions – a strategy that tends to yield smoother motion and reduce the twitchiness long-horizon tasks can expose. Boston Dynamics credits the underlying learning architecture to TRI's LBMs: a 450-million-parameter diffusion transformer trained with a flow-matching objective.
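The flow-matching objective mentioned above can be illustrated in a few lines: the model is trained to predict the velocity that carries a noise sample along a straight path toward a demonstrated action chunk. This is a generic sketch of the standard flow-matching loss, not TRI's implementation; the chunk sizes and the zero-output "model" are toy assumptions.

```python
import numpy as np

# Generic illustration of a flow-matching training loss over action chunks.
# Shapes and the linear interpolation path follow the standard recipe; this
# is NOT TRI's code, and the "network" below is a trivial placeholder.

rng = np.random.default_rng(0)
CHUNK = 8        # future actions predicted per inference step (assumed)
ACT_DIM = 4      # degrees of freedom per action (assumed)

def flow_matching_loss(predict_velocity, action_chunk):
    """Regress the model's velocity field toward (target - noise)."""
    noise = rng.standard_normal(action_chunk.shape)
    t = rng.uniform()                             # interpolation time in [0, 1]
    x_t = (1.0 - t) * noise + t * action_chunk    # point on the straight path
    target_velocity = action_chunk - noise        # constant along that path
    v = predict_velocity(x_t, t)
    return np.mean((v - target_velocity) ** 2)

# A trivial stand-in "network" that always predicts zero velocity.
zero_model = lambda x_t, t: np.zeros_like(x_t)
demo_chunk = rng.standard_normal((CHUNK, ACT_DIM))
loss = flow_matching_loss(zero_model, demo_chunk)
```

At inference, the trained velocity field is integrated from noise to produce the next chunk of actions, which is what gives the chunked, smoother output the article describes.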
The tasks are a better measure of “factory-like” than the model size. In a notable sequence called the Spot Workshop, Atlas clears a cart of quadruped parts: it picks legs and sets them on a shelf, pulls out a bin and fills it with face plates, then turns and empties a second bin into a tilt truck. The point is not that the robot can pick things up; it is that the routine chains regrasping, articulation, placement, and body repositioning without resetting position between subtasks. The language prompt acts as the high-level objective switch, while the policy is expected to supply continuity in the physical details.
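The prompt-as-objective-switch idea can be made concrete with a toy loop: one policy handles every subtask, only the language prompt changes, and the physical state carries over instead of being reset. All names below are hypothetical illustrations, not real interfaces.

```python
# Toy illustration of prompt switching over a single policy. The "policy"
# here is a stand-in; the point is that state persists across subtasks.

def run_sequence(policy, prompts, state):
    log = []
    for prompt in prompts:
        # The same policy executes every subtask; only the prompt changes.
        state = policy(prompt, state)
        log.append((prompt, list(state)))
    return log

def toy_policy(prompt, state):
    # Stand-in for a learned whole-body policy: record the work completed.
    return state + [prompt]

workshop = ["place legs on the shelf",
            "fill the bin with face plates",
            "empty the second bin into the tilt truck"]
log = run_sequence(toy_policy, workshop, state=[])
```

The contrast with a scripted pipeline is that nothing between prompts resets or re-homes the robot; whatever pose and grasp state the previous subtask left behind is the starting condition for the next one.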
Another point Boston Dynamics emphasizes is recovery, because the floor keeps producing surprises a laboratory bench does not: a part drops, a lid snaps shut, something slides out of the expected workspace. The company's framing is that early versions of these policies failed to handle such upsets, and that the path to improvement was not new code paths but additional demonstrations of recovery behaviors followed by retraining. Over time, the policy registers state changes through onboard sensors and reacts according to patterns seen in training, with no expert robotics code required. As Kuindersma's team puts it, this changes the skills pipeline: writing new manipulation behaviors no longer requires an advanced degree and years of experience.
Making the data engine practical meant treating the teleoperation interface itself as a product. The first VR system mapped the operator's upper body one-to-one onto Atlas while the operator stood still – useful for bimanual tasks, but not mobile ones. Adding foot trackers enabled stepping and stance control driven by operator intent, letting Atlas widen its support polygon, squat, and reposition itself to reach into totes without hitting the container walls. That means the data begins to capture the ugly movements of real manufacturing: backing up to make space, re-gripping a load before a reach, and settling a load in place before turning.
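The support-polygon idea above has a simple geometric core: a stance is statically stable when the center of mass projects inside the convex hull of the foot contact points, so stepping wider makes more reaches feasible. Below is a generic point-in-convex-polygon check with assumed example coordinates – standard geometry, not Atlas code.

```python
# Generic static-stability check: is the center-of-mass ground projection
# inside the convex support polygon formed by the feet? Coordinates are
# illustrative assumptions, in meters.

def inside_convex_polygon(point, vertices):
    """True if point lies inside a convex polygon given in CCW order."""
    px, py = point
    n = len(vertices)
    for i in range(n):
        ax, ay = vertices[i]
        bx, by = vertices[(i + 1) % n]
        # Negative cross product means the point is outside this edge.
        if (bx - ax) * (py - ay) - (by - ay) * (px - ax) < 0:
            return False
    return True

# Narrow stance: a forward-shifted CoM projection falls outside the feet.
narrow = [(-0.1, -0.15), (0.1, -0.15), (0.1, 0.15), (-0.1, 0.15)]
# Wider stance after a step forward: the same CoM is now supported.
wide = [(-0.1, -0.15), (0.4, -0.15), (0.4, 0.15), (-0.1, 0.15)]
com = (0.25, 0.0)
```

This is why foot trackers matter for data collection: demonstrations of a deep reach into a tote are only physically valid if they also record the step that moved the support polygon under the reach.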
In parallel, simulation multiplies iteration speed. The company says it uses simulation to test teleoperation changes, run repeatable evaluations, and co-train policies at scale with the same tooling used on hardware. That emphasis is consistent with the broader industry view that a sim-first approach is cheaper, lower-risk, and yields higher dataset throughput, especially where humanoids must learn coupled locomotion and manipulation controllers across varied layouts. Newer simulation toolchains have been extending support for teleoperation devices and whole-body learning workflows, including more capable whole-body control and more flexible demonstration-capture pipelines.
The long-run bet is not that Atlas can perform one sequence, but that the behavior library can grow without becoming brittle. Boston Dynamics makes generalist training the mechanism: one policy trained on a large, diverse set of tasks rather than many narrow specialists. That direction mirrors the broader foundation-model push in robotics, as organizations increasingly combine teleoperation, simulation, and cross-embodiment data to reduce overfitting and improve transfer to novel objects and environments. In that sense, the factory floor is a useful proving ground because it compresses many failure modes – contact variability, clutter, workflows around people – into routine patterns.
Safety and deployment discipline are the other quiet drivers. Humanoids introduce risks that differ from caged industrial arms or slow collaborative robots, simply because they move through human-designed spaces with human-like reach and momentum. Industry-wide, the work of formalizing guardrails is underway, including global safety standards for humanoid robots being developed through ISO processes that focus on risk assessment and human–robot collaboration protocols. For companies aiming at production environments, those standards are not peripheral; they shape how tasks are specified, how speed is limited, and how recovery behaviors are validated.
Boston Dynamics’ most telling detail may be that it treats behavior development as a throughput problem. When policies can be sped up 1.5× to 2× at inference time without retraining, when demonstrations can be collected with whole-body fidelity through VR, and when simulation can be reused as both test bench and data source, the bottleneck shifts toward curation, coverage, and measurement. Russ Tedrake, TRI’s senior vice president of LBMs, summarized the scaling ambition: “Large behavior models address this opportunity in a fundamentally new way – skills are added quickly via demonstrations from humans.”
For factories evaluating humanoids, that framing matters because it suggests a different adoption curve. The promise is not that a humanoid arrives pre-trained for every station, but that adding a station resembles teaching rather than programming: capturing high-quality demonstrations, validating them in a task suite, and iterating until recovery looks routine. The remaining work, as Boston Dynamics describes it, is to expand the “data flywheel” while improving tactile force control, dynamic manipulation, and reinforcement-learning upgrades to vision-language-action models. If those pieces hold together, “factory work” becomes less about a single spectacular demo and more about whether the same policy can keep learning across weeks of mundane, messy shifts.
