Cannabis data lacking, but machine learning could help fill the gap

By Lisa Marshall • Published: Sept. 28, 2020

THC and CBD.

Anyone who has used, sold, studied or even read much about marijuana likely recognizes these acronyms as active ingredients in the plant.

But beyond intoxicating tetrahydrocannabinol (THC) and therapeutic cannabidiol (CBD), there exists a diverse array of chemicals believed to quietly interact (a phenomenon known as the ‘entourage effect’), influencing how each unique cannabis strain makes people feel.

To date, the cannabis industry has collected remarkably little data about those lesser-known compounds, new research shows. But that same study suggests that a surprising scientific field could play an integral role in filling the knowledge gap.

“This paper provides a very early example of how applying advanced data science techniques could give us new insight into how this plant works,” said senior author Brian Keegan, an assistant professor in the Department of Information Science.

A problem of missing data

Ask a dispensary budtender for advice and it’s not uncommon for them to make generalizations, recommending, for instance, Cannabis sativa varieties for an energetic high or Cannabis indica for a relaxing effect.

Variety names like Girl Scout Cookies or Gorilla Glue give the impression of standardization: buy it in one place and you’ll get the same product as if you buy it elsewhere, many assume.

Biologist Daniela Vergara studies the genetics of cannabis.

But that’s often not the case, says study first author Daniela Vergara, a research associate in the Department of Ecology and Evolutionary Biology.

Different flavonoids and terpenes can make seemingly similar varieties taste and smell different, and secondary cannabinoids may influence whether a variety is relaxing or stimulating, sedating or creativity-inspiring.

The only way to truly know what’s in a variety is to measure the chemicals.

“But because regulations only require reporting on a few compounds like THC and CBD, there’s very little data being collected on these other compounds or how they interact,” said Vergara. “We’re not getting the whole picture.”

With medical or recreational marijuana now legal in 39 states, and sales in Colorado alone topping $1.7 billion in 2019, filling those knowledge gaps is more important than ever, potentially leading to product standardization or new therapies based on the entourage effect, the authors said.

In hopes of getting the full picture on the plant, Vergara teamed up with Keegan to analyze a dataset of more than 17,600 cultivars of cannabis flower, spanning eight years and supplied by one of the country’s largest cannabis testing companies.

When assessing how much data was available on seven different cannabinoids, the researchers found, not surprisingly, that only 1.4% of cultivars were missing data about THC and 38% were missing data about CBD. Only 153 samples contained data on all seven cannabinoids, and some were almost never measured.

For instance, only 597 samples, less than 4%, contained information about CBDV (cannabidivarin), a non-psychoactive compound believed to quell seizures. And 62% of samples were missing data about CBN (cannabinol), a compound often recommended for sleep.
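For readers curious how such gaps are tallied, the sketch below counts the share of samples with no measurement for a given compound. It is only an illustration: the four sample records and the `missing_fraction` helper are invented for this example and are not drawn from the study's dataset.

```python
from typing import Dict, List, Optional

# Hypothetical lab results: one record per sample, None where a compound was not measured.
Sample = Dict[str, Optional[float]]


def missing_fraction(samples: List[Sample], compound: str) -> float:
    """Return the share of samples that have no measurement for `compound`."""
    if not samples:
        raise ValueError("need at least one sample")
    missing = sum(1 for s in samples if s.get(compound) is None)
    return missing / len(samples)


# Made-up values (percent by weight), purely for illustration.
samples: List[Sample] = [
    {"THC": 18.2, "CBD": 0.1, "CBN": None},
    {"THC": 21.5, "CBD": None, "CBN": None},
    {"THC": 0.3, "CBD": 12.4, "CBN": 0.2},
    {"THC": None, "CBD": 9.8, "CBN": None},
]

for compound in ("THC", "CBD", "CBN"):
    print(f"{compound}: {missing_fraction(samples, compound):.0%} missing")
```

Run over a real testing database, the same kind of tally would produce figures like those reported above.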

Enter machine learning.

“We thought that data science methods could help with what is fundamentally a missing data problem,” said Keegan. “Could we use the data we have about the chemical profiles of some strains to impute, or guess, the values of those where we have no data?”
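One common way to impute a missing value is to borrow it from the most chemically similar samples that do have a measurement. The sketch below shows that idea as a simple nearest-neighbor fill. The article does not say which imputation method the team used, so this is an assumed stand-in, and the rows, the `knn_impute` helper and its `k` parameter are all hypothetical.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # one sample; None marks a compound that was not measured


def distance(a: Row, b: Row) -> float:
    """Root-mean-square difference over the columns both rows have measured."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows: List[Row], k: int = 2) -> List[Row]:
    """Fill each missing value with the column mean of the k most similar rows."""
    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for col, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other rows that measured this column and share some column with `row`.
            donors = [
                (distance(row, other), other[col])
                for j, other in enumerate(rows)
                if j != i and other[col] is not None
            ]
            donors = [d for d in donors if d[0] != math.inf]
            if not donors:
                raise ValueError(f"no donor rows for row {i}, column {col}")
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            new_row[col] = sum(v for _, v in nearest) / len(nearest)
        filled.append(new_row)
    return filled


# Made-up profiles; columns are THC, CBD, CBN (percent by weight).
rows: List[Row] = [
    [20.0, 0.2, 0.5],
    [19.0, 0.3, None],
    [21.0, 0.1, 0.7],
    [1.0, 12.0, 0.1],
    [0.5, 11.0, None],
    [0.8, 13.0, 0.3],
]

for original, completed in zip(rows, knn_impute(rows)):
    print(original, "->", completed)
```

In this toy data the THC-heavy sample borrows its CBN value from the other THC-heavy samples, and the CBD-heavy sample from the other CBD-heavy ones. A real analysis would likely rescale each compound first so that high-concentration compounds such as THC do not dominate the distance.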

The trouble with names

Using algorithms and statistical methods, the team set out to uncover hidden patterns found in the data. Quickly, they learned that one of their key assumptions was wrong.

In the plant, THCA and CBDA (acidic forms of the cannabinoids that convert to THC and CBD with heat) both compete for the same precursor molecule, cannabigerolic acid (CBGA). So the researchers assumed strains high in THC would be low in CBD, or vice versa.

“It didn’t turn out that way,” said Keegan, noting that some strains were high in both. “This suggests we don’t know as much about these chemical pathways as we thought we did.”
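The assumption amounts to expecting a strong negative correlation between THC and CBD levels. As a rough illustration of how that expectation can be checked, the sketch below computes a Pearson correlation on invented numbers, first for samples that follow the trade-off and then with two hypothetical "high in both" samples added. The values and the `pearson` helper are assumptions for this example, not the study's data or code.

```python
import math
from typing import List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0.0 or spread_y == 0.0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (spread_x * spread_y)


# Made-up samples that follow the expected trade-off (percent by weight).
thc = [20.0, 19.0, 21.0, 1.0, 0.5, 0.8]
cbd = [0.2, 0.3, 0.1, 12.0, 11.0, 13.0]
print(f"trade-off only: r = {pearson(thc, cbd):.2f}")

# Add two hypothetical samples that are high in both compounds.
thc_mixed = thc + [15.0, 14.0]
cbd_mixed = cbd + [10.0, 11.0]
print(f"with high-in-both samples: r = {pearson(thc_mixed, cbd_mixed):.2f}")
```

The more samples score high on both compounds, the weaker the negative correlation becomes, which is the kind of pattern that would contradict the simple trade-off picture.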

Using a method called dimensionality reduction, they were able to cluster strains into four distinct categories based on chemical properties, each of which corresponded with different use cases (medicinal, recreational, combined, industrial).
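Dimensionality reduction compresses many measured compounds into a few summary scores so that similar samples land near one another. The article does not name the specific technique, so the sketch below assumes principal component analysis, one widely used option, and reduces three invented compound columns to a single score. The data and helper functions are hypothetical, and the example does not attempt to reproduce the study's four clusters.

```python
import math
from typing import List

Matrix = List[List[float]]


def center(data: Matrix) -> Matrix:
    """Subtract each column's mean so the data is centered at the origin."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]


def first_component(cov: Matrix, iterations: int = 200) -> List[float]:
    """Power iteration: approximates the unit direction of greatest variance."""
    d = len(cov)
    # A fixed, non-symmetric starting vector; a random start is the more usual choice.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("data has no variance along the starting direction")
        v = [x / norm for x in w]
    return v


def project(centered: Matrix, direction: List[float]) -> List[float]:
    """Score each sample by its position along `direction`."""
    return [sum(x * u for x, u in zip(row, direction)) for row in centered]


# Made-up profiles; columns are THC, CBD, CBN (percent by weight).
profiles: Matrix = [
    [20.0, 0.2, 0.5],
    [19.0, 0.3, 0.6],
    [21.0, 0.1, 0.7],
    [1.0, 12.0, 0.1],
    [0.5, 11.0, 0.2],
    [0.8, 13.0, 0.3],
]

centered = center(profiles)
component = first_component(covariance(centered))
scores = project(centered, component)
for row, score in zip(profiles, scores):
    print(row, round(score, 2))
```

In this toy data the THC-dominant samples end up with positive scores and the CBD-dominant samples with negative ones, so a single number already separates the two groups. With more compounds and more components, a clustering step can then group samples by their scores.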

Curiously, some varieties with the same name showed up in different clusters.

“This study reaffirms the misnaming of Cannabis varieties by the industry,” the authors noted. “Strain name is not indicative of potency or overall chemical makeup.”

Filling the blanks

Going forward, Keegan will continue using machine learning to fill gaps in the data. But doing it right will require widespread collaboration across the cannabis industry.

Data scientist Brian Keegan is applying machine learning to fill in gaps in understanding about cannabis.

“If more people would share more of their data, we could make better inferences about how these different cannabinoids work or interact with each other,” he said.

He envisions a day when custom products could be developed for medical use based on the complex entourage effect of interacting compounds. Dispensary customers could review an ingredient panel, much like the nutrition facts panel on food, before buying. And names would mean something.

“Machine learning has played a huge role in shaping other industries, from Facebook and Twitter to Target,” said Vergara. “It can help fill in the blanks for the cannabis industry as well.”
