A New Paradigm — How AI and Robotics are Changing Materials Discovery
A guest post by Melyne Zhou
Climate Capital has invested in 300+ climate startups since 2015. CC Insights shares what we are learning about the early stage climate tech ecosystem.
This week’s post is a little different — as a team, we’re constantly consuming analyses and perspectives from seasoned professionals across the energy transition. This week, we’re bringing you a post authored by someone who will be mid-career at a very different point in the energy transition: Meet Melyne Zhou.
Melyne is a 15-year-old dedicated to leveraging materials science for climate tech solutions. She is currently developing a proposal for bio-inspired ion gel membranes in redox flow batteries to enable long-duration energy storage applications. Melyne has previously written about nanomaterials in solar energy applications. In her free time, she enjoys reading biographies and diving down rabbit holes. She can be found writing on Medium and in her newsletter, Mind and Matter.
The next generation of climate innovators is looking bright, and we’re thrilled to have one of them share their perspective on CC Insights. Enjoy!
From the concrete in our skyscrapers to the silicon that powers our devices, materials are everywhere. They built up human civilization as it is today. They can make or break a growing list of climate technologies. Yet traditionally, they also take notoriously long to discover and develop, limiting the speed and scale required to solve the world’s biggest problems. Fortunately, with a materials revolution spearheaded by developments in AI and robotics on the horizon, this won’t be the case for long.
The Case for New Materials
According to Natural Resources Canada, up to 50% of the cost of many clean technologies can be attributed to the materials they are made from. Better batteries, large-scale carbon capture, and materials for a circular economy are just some of the many technologies that rely fundamentally on the materials they are built with. The typical 20+ year development timeline and billions in investment need to be dramatically reduced.
The global market for advanced materials is predicted to reach $76.26 billion in 2023 and to keep rising in the years to come. A growing list of organizations and startups is excited to join the space.
So if accelerated materials discovery is something we needed yesterday, how have materials traditionally been discovered?
How are Materials Discovered?
The answer: slowly, and laboriously. Before the 1600s, most materials were discovered through pure experiment and coincidence. The discovery of metals laid the literal foundations of human civilization, bringing us from the Stone Age to the Iron Age. This is referred to as the First Paradigm of Materials Discovery — where experiments were conducted purely with observation and luck.
Then, scientists began to study the laws of science, and the Second Paradigm began. Experimental scientists used proven scientific theory to guide their exploration and discoveries, rather than relying on pure luck. Rather than fumbling around blindly, we began using our first scientific flashlights.
💡 Thomas Edison, often credited with developing the lightbulb, wasn’t actually the first to invent the concept. Rather, he tried thousands and thousands of filaments before settling on the bamboo filament that would make a long-lasting light bulb.
He once said of his work, “I have not failed 10,000 times — I’ve successfully found 10,000 ways that will not work.”
In the 1950s, we entered the Third Paradigm: computational materials science. With the development of simulation tools based on scientific theory, we could calculate the approximate properties of materials. Unfortunately, these calculations required, and still require, immense computational power for more complex materials.
Now, we’re in the midst of a fourth (and fifth) revolution. More and more data at our disposal, together with the rise of AI, is leading us into data-driven, accelerated discovery of materials.
Major paradigm shifts in Science, source IBM
Machine Learning In Materials Science
There are a number of ways machine learning (ML) can be applied to materials discovery — and a number of models that can be used. But they all share one thing in common: saving significant time, financial investment, or both.
💡 But first, what even is machine learning? Machine learning (ML) is when a model “learns” by analyzing existing data and uses what it learns to complete tasks typically done by humans — such as identifying whether an image is of a cat or a dog, or the best way to win a chess game.
The same goes in materials science. Machine learning can be applied in a number of different ways, from predicting properties to discovering new materials!
High-level workflow of the general machine learning process, source Vijay Kanade (SpiceWorks)
Supervised learning
When we have large amounts of labeled data, the model recognizes patterns or correlations in the data and can predict a property y for datapoint x. For example, the model will train itself using pictures of cats labeled as cats, and dogs labeled as dogs. It can then look at new pictures of cats and dogs it has never seen before, and classify those, too.
Supervised learning is used mostly for predictive analysis. By predicting properties, we can narrow a large list of potential materials down to a much smaller pool of candidates.
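To make the idea of learning from labeled data concrete, here is a minimal sketch of a nearest-neighbor classifier. Everything in it is an assumption for illustration: the two-number descriptors, the "conductor" and "insulator" labels, and the candidate names are made up, and a real project would use measured or computed properties.

```python
from collections import Counter
from math import dist

# Hypothetical labeled training data: (feature vector, label).
# The two features are made-up descriptors, not real measurements.
LABELED = [
    ((0.9, 0.1), "conductor"),
    ((0.8, 0.2), "conductor"),
    ((0.7, 0.3), "conductor"),
    ((0.2, 0.9), "insulator"),
    ((0.1, 0.8), "insulator"),
    ((0.3, 0.7), "insulator"),
]

def predict_label(x, labeled=LABELED, k=3):
    """Predict the label of x by majority vote among its k nearest labeled points."""
    nearest = sorted(labeled, key=lambda pair: dist(pair[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Narrow a list of unseen candidates down to those predicted to be conductors.
candidates = {"A": (0.85, 0.15), "B": (0.15, 0.85), "C": (0.75, 0.35)}
shortlist = [name for name, x in candidates.items() if predict_label(x) == "conductor"]
```

The model never saw candidates A, B, or C during training, yet it can still sort them, which is the same trick as classifying new cat and dog photos.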
Unsupervised learning
On the other hand, unsupervised learning is when the model analyzes unlabeled data and tries to identify patterns or structures without explicit instruction.
Once trained on an unlabeled dataset, these models can group existing data or even output new examples of x. One approach is clustering, which groups the data into clusters based on a particular metric. Another is for the model to identify the underlying structure of the data, then generate new samples similar to the training data.
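Clustering is easier to picture with a small example. The sketch below is a bare-bones version of k-means, one common clustering method, using straight-line distance as the metric. The points are invented, and the initialization (taking the first k points as starting centers) is a simplification chosen to keep the example deterministic.

```python
from math import dist

def kmeans(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, assignments)."""
    centroids = list(points[:k])  # naive initialization: first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignments

# Hypothetical unlabeled data: two loose groups of 2-D points.
points = [(1.0, 1.1), (5.0, 5.2), (1.2, 0.9), (5.1, 4.9), (0.9, 1.0), (4.8, 5.0)]
centroids, assignments = kmeans(points, k=2)
```

Notice that no labels were given: the algorithm finds the two groups purely from how close the points are to each other.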
💡 There is also semi-supervised learning, which lies at the intersection of supervised and unsupervised learning. It uses both labeled and unlabeled data, which is often the only choice, as data can be limited in volume or quality depending on the industry.
The Discovery Cycle
When new materials must be discovered, they go through a development process. Prospective materials must be identified, synthesized, scaled up for manufacturing, and tested under various conditions.
Rather than relying on trial and error aided by educated guesses, AI models leverage existing databases to pinpoint precisely why one material conducts so much better than another. And they can do this quickly, at scale, at all of these stages.
Data Collection
The first step is to curate our dataset. Let’s say we want to find a non-toxic material for flexible batteries: we’ll need to gather data on the capacity, cycling stability, and charge/discharge efficiency of known materials. This could include experimental data or computational calculations (using methods like density functional theory, or DFT). There are also many databases containing information about materials, from electronic to structural properties, such as the Materials Project.
One problem is that data about materials is often sparse, high-dimensional, biased, and noisy — which is why domain knowledge is still an essential part of materials discovery.
Feature Engineering
Raw data isn’t always in a format that can be read by the model, which is where feature engineering comes in.
💡 Features are measurable properties or characteristics of the data — they’re also known as variables or attributes. The model uses them as input to make decisions.
They can be numerical features, which are values that can be measured or counted (like the number of squares and circles). They can also be categorical features, which take discrete values (such as the color of each shape).
Features must also be adjusted to suit the dataset and purpose. This includes handling missing data, replacing certain values, and scaling numerical features so that no feature disproportionately influences the results simply because of its magnitude. Feature engineering allows the model to recognize patterns and relationships in the data.
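Here is a small sketch of three common feature engineering steps: filling in missing values, scaling numbers to a shared range, and turning categories into numbers. The column names and values are hypothetical, and mean imputation and min-max scaling are just one reasonable choice among several.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values to the range [0, 1] so no feature dominates by magnitude."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(categories):
    """Turn a categorical column into one 0/1 column per distinct category."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]

# Hypothetical raw columns for four materials (values are made up).
capacity_mah_g = [150.0, None, 170.0, 190.0]  # one measurement is missing
crystal_system = ["cubic", "hexagonal", "cubic", "tetragonal"]

capacity_scaled = min_max_scale(impute_mean(capacity_mah_g))
system_encoded = one_hot(crystal_system)

# Each row is now a purely numerical feature vector a model can read.
features = [[c] + s for c, s in zip(capacity_scaled, system_encoded)]
```

After these steps, every material is described by a row of numbers on comparable scales, which is the form most models expect.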
Model Training and Evaluation
There are a number of machine learning models that have been explored for materials discovery. In each case, training runs input data through the algorithm and compares the model’s output with the known sample output, adjusting the model to reduce the difference.
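As a rough illustration of training and evaluation, the sketch below fits the simplest possible model, a straight line, to part of a dataset and then checks its error on data it was not trained on. The numbers are invented, and a straight-line fit is a stand-in for whatever model a real project would use.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_absolute_error(model, xs, ys):
    """Average gap between the model's predictions and the known values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical (descriptor, measured property) pairs.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1), (6.0, 12.0)]

# Train on the first four points, hold the last two back for evaluation.
train, test = data[:4], data[4:]
model = fit_line(*zip(*train))
test_error = mean_absolute_error(model, *zip(*test))
```

Holding back some data matters: a model that only does well on examples it has already seen tells us little about how it will handle new materials.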
What are the applications?
*KoBold Metals uses AI-based technology to search for critical minerals, source *KoBold Metals
Virtual Screening — The model sifts through a large number of prospective materials and narrows it down to a smaller list of candidates. These materials are then tested through high-throughput techniques and analyzed for development.
Predicting Properties — The model, usually supervised, analyzes the properties of materials in the dataset, such as their conductivity and hardness. It applies this information to predict the properties of other, prospective materials.
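Virtual screening and property prediction often work together. The sketch below drops candidates that fail a hard requirement, ranks the rest by a predicted property, and keeps the top few. The candidate pool, the `predicted_conductivity` rule, and the toxicity flags are all hypothetical placeholders for a real dataset and a trained model.

```python
def screen(candidates, predict, must_pass, top_n):
    """Keep candidates that pass a hard filter, ranked by predicted property."""
    viable = [c for c in candidates if must_pass(c)]
    ranked = sorted(viable, key=predict, reverse=True)
    return ranked[:top_n]

# Hypothetical candidate pool (names, descriptors, and flags are made up).
pool = [
    {"name": "M1", "descriptor": 0.2, "toxic": False},
    {"name": "M2", "descriptor": 0.9, "toxic": True},
    {"name": "M3", "descriptor": 0.7, "toxic": False},
    {"name": "M4", "descriptor": 0.5, "toxic": False},
    {"name": "M5", "descriptor": 0.8, "toxic": False},
]

def predicted_conductivity(material):
    # Stand-in for a trained model: a made-up linear rule.
    return 3.0 * material["descriptor"] + 1.0

shortlist = screen(pool, predicted_conductivity, lambda m: not m["toxic"], top_n=2)
names = [m["name"] for m in shortlist]
```

M2 has the highest predicted score but is filtered out for toxicity, so only the shortlisted candidates would move on to high-throughput testing.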
These aren’t the only ways machine learning can be used. *KoBold Metals combines geoscience excellence and AI to accelerate natural resources exploration, to find materials like lithium and cobalt.
UK-based Materials Nexus aims to accurately and rapidly model new material properties to accelerate the development of next-generation materials, with a current focus on rare earth magnets.
Materials Zone provides a platform to manage R&D data in materials development, and helps researchers communicate with factories and manufacturing facilities.
Large Language Models
Thousands of papers are released every day — around 5.14 million papers were published in 2022 alone!
With such a huge quantity of papers, researchers must pick and choose what to read — and they could miss out on important advances or other significant details. LLMs can summarize research in a particular domain and filter out what isn’t relevant, creating shorter and more concise lists of prospective materials.
They can identify patterns that may be challenging for an individual to discern manually. They can also identify information in previously published research on the synthesis or most commonly reported properties of materials of interest.
Generative models
“De novo” means “from scratch,” which is essentially how generative models work. Like a predictive model, a generative model is trained on a dataset and learns to identify patterns and relationships. What makes it different is its aim: to generate something new based on what it has learned.
Generally, a molecular compound is generated, then screened to weed out unstable compounds. A compound is unstable when, due to the laws of thermodynamics, it reacts in unwanted ways (such as by decomposing upon synthesis) and so cannot be synthesized to serve its purpose.
Traditional materials discovery is focused on predicting properties from structure. Rather than screening through known and existing materials, one goal is to generate structures given the desired properties — which is called inverse design.
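The generate-then-screen idea and inverse design can be sketched in a few lines. This is a toy under heavy assumptions: plain enumeration stands in for a generative model, the building blocks "A", "B", "X", and "Y" and their charges are invented, charge neutrality stands in for a real thermodynamic stability check, and `predicted_property` is a made-up formula.

```python
from itertools import product

# Hypothetical building blocks with made-up charges (not real elements).
CHARGE = {"A": 1, "B": 2, "X": -1, "Y": -2}

def generate(max_count=2):
    """Stand-in for a generative model: enumerate cation/anion formulas."""
    cations = [s for s, q in CHARGE.items() if q > 0]
    anions = [s for s, q in CHARGE.items() if q < 0]
    for cat, an in product(cations, anions):
        for n_cat, n_an in product(range(1, max_count + 1), repeat=2):
            yield (cat, n_cat, an, n_an)

def is_plausible(formula):
    """Toy stability screen: keep only charge-neutral formulas."""
    cat, n_cat, an, n_an = formula
    return CHARGE[cat] * n_cat + CHARGE[an] * n_an == 0

def predicted_property(formula):
    """Made-up property model, used only to show target matching."""
    _, n_cat, _, n_an = formula
    return n_cat + 2 * n_an

def inverse_design(target, tolerance=0.5):
    """Start from the desired property and return formulas that match it."""
    return [
        f for f in generate()
        if is_plausible(f) and abs(predicted_property(f) - target) <= tolerance
    ]

hits = inverse_design(target=4)
```

The direction of the question is what matters here: instead of asking "what are the properties of this structure?", `inverse_design` asks "which structures would give me this property?".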
Overview of the architecture of DeepMind’s GNoME model, source GNoME
Graph neural networks (GNNs), which use a graph-based representation of molecules, have recently garnered excitement. Google DeepMind recently released GNoME, which it describes as the equivalent of 800 years’ worth of knowledge: 2.2 million new crystals, including at least 380,000 stable materials. The implications for cleantech are clear: 528 potential lithium ion conductors were also found.
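To show what a graph-based representation looks like, the sketch below stores a small molecule as atoms (nodes) and bonds (edges), then runs one round of "message passing," the basic operation behind GNNs. It is not GNoME's architecture: the update rule here is a fixed average chosen for illustration, whereas real GNNs learn their update functions from data.

```python
# Heavy-atom skeleton of ethanol (C-C-O), hydrogens omitted for brevity.
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]

# One starting feature per atom: its atomic number.
features = [6.0, 6.0, 8.0]

def build_adjacency(n_atoms, bonds):
    """List each atom's bonded neighbors."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj

def message_pass(features, adj):
    """One round: each atom's new feature = own feature + mean of its neighbors'."""
    return [
        f + sum(features[j] for j in adj[i]) / len(adj[i]) if adj[i] else f
        for i, f in enumerate(features)
    ]

adjacency = build_adjacency(len(atoms), bonds)
updated = message_pass(features, adjacency)

# A simple "readout": pool the atom features into one number for the whole graph.
graph_value = sum(updated)
```

After one round, the two carbon atoms no longer look identical to the model, because each now carries information about its own neighborhood.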
Meta’s AI research team has also been working on the Open Catalyst Project, which aims to discover low-cost catalysts for energy storage. It combines computational calculations with ML models to screen millions of potential catalysts. Microsoft, too, released MatterGen, which leverages generative AI to discover stable materials that satisfy a target property.
UK-based startup Orbital Materials aims to develop a foundation machine learning model to design green materials, starting with carbon capture materials.
Autonomous Experiments — Self-Driving Labs
Every AI-driven initiative in the materials discovery space has one thing in common: it still requires experimental validation.
The Acceleration Consortium, an organization based at the University of Toronto, received a $200 million grant from the Canada First Research Excellence Fund (CFREF) to deploy self-driving labs to accelerate materials discovery and development. It is the largest federal research grant ever awarded to a Canadian university.
Machine learning models alone cannot replace experimental science. As Alán Aspuru-Guzik, head of the Acceleration Consortium, says, “AI is kind of like the brain, but we are nothing without our hands.”
The Acceleration Consortium includes seven labs, each focused on a specific domain from small organic molecules to organs-on-a-chip. It also includes 100+ members and partners, with backgrounds ranging from academia to industry. The goal is to create “self-driving labs,” which combine robotics, AI systems, and human intelligence to accelerate the experimental discovery of new materials.
The need for SDLs
The Acceleration Consortium (AC) aims to reduce the 20 years and billions in investment that it usually takes to bring new materials from lab to market, a pace we can’t accept as the need for better materials grows in the face of the climate crisis. Results in the lab can also be difficult to scale up to commercial production.
Photograph: Johnny Guatto/University of Toronto
Self-driving labs (SDLs) can also remove errors human researchers might make, providing more reliable results along with a team member who can work 24/7. It’s also important to note that SDLs aren’t replacing human scientists. The goal is to reduce the amount of time researchers spend on tedious tasks so they can focus on what matters most: rather than pipetting solutions drop by drop, they can create new hypotheses or work on improving models.
How do SDLs work?
Self-driving labs are still in the early stages. ML models plan and control the experiments, while the robotic components repeat their assigned tasks until they are told to stop. Most can currently perform only basic tasks, and while they are faster and more accurate than manual work, many on the market are also very expensive.
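The plan, run, and repeat loop can be sketched as a simple program. Everything here is simulated and hypothetical: `run_experiment` is a made-up yield curve standing in for a robot, and `propose_next` is a basic "try next to the best result so far" rule standing in for the ML planner. Real self-driving labs often use more sophisticated planners, such as Bayesian optimization.

```python
def run_experiment(temperature_c):
    """Stand-in for the robot: a made-up yield curve peaking at 70 degrees C."""
    return 100.0 - (temperature_c - 70.0) ** 2 / 10.0

def propose_next(history, step):
    """Stand-in for the ML planner: try an untested neighbor of the best point."""
    best_t, _ = max(history.items(), key=lambda item: item[1])
    for candidate in (best_t + step, best_t - step):
        if candidate not in history:
            return candidate
    return None  # both neighbors already tested: nothing left to try

def self_driving_loop(start_c=40.0, step=10.0, budget=10):
    """Alternate planning and experimenting until told to stop."""
    history = {start_c: run_experiment(start_c)}
    while len(history) < budget:
        next_t = propose_next(history, step)
        if next_t is None:
            break
        history[next_t] = run_experiment(next_t)
    best = max(history.items(), key=lambda item: item[1])
    return best, history

(best_t, best_yield), history = self_driving_loop()
```

The loop stops either when the experiment budget runs out or when the planner has nothing new to suggest, and no human has to pipette anything in between.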
Switzerland-based Chemspeed provides lab automation and digitalization solutions — called high-throughput tools. They offer a number of robotic tools and workspaces, with functions from mixing solutions to twisting bottle caps on and off. Kebotix offers a closed-loop discovery platform that combines computational chemistry, machine learning and a self-driving robotic lab.
Conclusion
It’s clear that materials science is needed for a cleaner future, and the industry doesn’t have time to waste. The demand for new materials is only growing, and so much is left undiscovered in the field of materials science.
_______
Notes:
Companies marked with an asterisk (*) are Climate Capital portfolio companies.
The Author’s opinions are their own and not necessarily representative of Climate Capital.
Disclaimer: Under no circumstances should any information or content in this email be considered an offer to sell or solicitation of interest to purchase any securities, including any securities advised by Climate Capital or any of its affiliates or representatives. Further, no content or information herein is or is intended, nor should it be construed as, an offer to provide any investment advisory service, financial advice, legal, tax, accounting, investment, or other advice from Climate Capital or any of its affiliates (collectively "Climate Capital”). Under no circumstances should anything herein be construed as fund marketing materials by prospective investors considering investing in any Climate Capital investment fund. Content contained herein does not constitute an offer to sell — or a solicitation of an offer to buy — any securities and may not be used or relied upon in evaluating the merits of any investment. Information regarding companies highlighted herein has been provided by third parties, and Climate Capital makes no representations or warranties as to its accuracy, as to the viability of any company listed herein, or the results of any investment in a listed company.