At the Center for Functional Nanomaterials (CFN), part of the U.S. Department of Energy’s Brookhaven National Laboratory, Kevin Yager has developed a chatbot designed to work within a highly specialized scientific domain: nanomaterials research. Yager, who leads CFN’s electronic nanomaterials group, sought to address a common challenge in advanced research—how to quickly access, process, and apply deep technical knowledge without requiring collaborators to spend days reading through dense literature.

While general-purpose AI chatbots have demonstrated impressive capabilities in generating text and code, they typically lack the detailed, trustworthy knowledge needed for niche scientific fields. Yager’s approach was to combine a large language model with a document-retrieval system grounded in curated nanoscience publications. This allowed the chatbot to respond with contextually accurate information, avoiding the pitfalls of generic models that often produce plausible but incorrect statements.
“CFN has been looking into new ways to leverage AI/ML to accelerate nanomaterial discovery for a long time,” Yager noted. AI and machine learning are already embedded in CFN’s workflows, helping researchers identify and catalog samples, automate experiments, control precision equipment, and even discover new materials. Esther Tsai, another scientist in the electronic nanomaterials group, is developing an AI companion to speed materials research at the National Synchrotron Light Source II (NSLS-II), another DOE facility at Brookhaven.
The novelty of Yager’s chatbot lies in its capacity to handle scientific text with precision. To achieve this, the system was supplied with domain-specific documents—peer-reviewed publications in nanomaterials science. This curated library gave the AI access to trusted facts, definitions, and emerging concepts. Instead of retraining the language model from scratch, Yager implemented an embedding-based retrieval process.
Embedding converts a passage of text into a numerical vector that represents its meaning, so that passages with similar meaning end up close together. When a user submits a query, the system generates a vector for that query and searches a pre-computed database of embedded text chunks from the curated publications. It then retrieves the most semantically related excerpts and combines them with the user’s question into a prompt for the language model. This grounds the model’s output in relevant source material, which makes fabricated statements less likely, though it does not rule them out.
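The retrieval steps described above can be illustrated with a minimal Python sketch. It is not the CFN chatbot's code: the `embed` function is a toy word-count vector standing in for a real embedding model, the three text chunks are invented examples rather than excerpts from the curated publications, and every function name here is hypothetical. A real system would also send the final prompt to a language model, a step this sketch omits.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model (assumption): a unit-length
    word-count vector, stored sparsely as {word: weight}."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {word: c / norm for word, c in counts.items()} if norm else {}

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(weight * b.get(word, 0.0) for word, weight in a.items())

def build_index(chunks):
    """Pre-compute one vector per text chunk."""
    return [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(query, index, k=2):
    """Return the k chunks whose vectors are most similar to the query's."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query, excerpts):
    """Combine the retrieved excerpts and the user's question into one prompt."""
    context = "\n".join(f"- {e}" for e in excerpts)
    return (
        "Answer using only the excerpts below.\n"
        f"Excerpts:\n{context}\n"
        f"Question: {query}\n"
    )

# Invented example chunks, not taken from any real document collection.
CHUNKS = [
    "Block copolymers self-assemble into ordered nanoscale patterns.",
    "X-ray scattering measures the structure of thin films.",
    "Quantum dots emit light at wavelengths set by their size.",
]

index = build_index(CHUNKS)
query = "How do block copolymers self-assemble?"
top = retrieve(query, index, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

Because the index is computed once and reused for every query, new documents can be added by embedding them and appending to the index, without retraining the language model.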
“A challenge that’s common with language models is that sometimes they ‘hallucinate’ plausible sounding but untrue things,” Yager explained. “We don’t want it to fabricate facts or citations. This needed to be addressed. The solution for this was something we call ‘embedding,’ a way of categorizing and linking information quickly behind the scenes.”
By functioning like a reference librarian, the chatbot can interpret complex questions, locate the most pertinent information, and synthesize it into coherent responses. While Yager acknowledges that responses are not flawless, the system is already capable of tackling difficult queries and sparking new ideas during project planning.
The implications for research productivity are significant. CFN envisions AI/ML systems freeing scientists from repetitive, time-consuming tasks, allowing them to focus on high-value problem-solving. Such tools could classify and organize documents, summarize lengthy papers, highlight key findings, and help researchers get up to speed quickly in new topic areas.
“There are a number of tasks that a domain-specific chatbot like this could clear from a scientist’s workload,” Yager remarked. “Classifying and organizing documents, summarizing publications, pointing out relevant info, and getting up to speed in a new topical area are just a few potential applications.”
The ethical and practical dimensions of this shift are still being explored. Scientists are actively discussing how to ensure AI tools are deployed safely, transparently, and in ways that enhance rather than undermine research integrity. Yager sees this as part of a broader transformation in how science is conducted, driven by the integration of advanced computational tools into everyday laboratory practice.
For those interested in experimenting with the technology, CFN has made the chatbot’s source code and associated tools publicly available via GitHub, enabling other researchers to adapt and extend its capabilities for their own specialized domains.
