On May 27, 2026, a team of researchers published "Epicure: Food Ingredient Embeddings", a paper describing a language model trained on 4.1 million recipes across seven languages: English, Chinese, Russian, Vietnamese, Spanish, Turkish, and Indonesian. The model, also called Epicure, treats ingredients as tokens, recipes as sentences, and the entire corpus as a document. It learns ingredient representations the way word2vec learns word representations: by predicting what appears near what.
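To make the "ingredients as tokens, recipes as sentences" idea concrete, here is a minimal sketch of how word2vec-style training pairs could be generated from recipes. The toy recipes and the `training_pairs` helper are made up for illustration, and treating the whole recipe as the context window is an assumption, not something the paper is confirmed to do.

```python
from itertools import permutations

# Toy recipes, invented for illustration (not taken from the Epicure corpus).
recipes = [
    ["tomato", "basil", "garlic"],
    ["tomato", "garlic", "onion"],
]

def training_pairs(recipes):
    """Yield (target, context) pairs for a skip-gram style objective.

    Assumption: a recipe is an unordered bag of ingredients, so every
    other ingredient in the same recipe counts as context.
    """
    for recipe in recipes:
        yield from permutations(recipe, 2)

pairs = list(training_pairs(recipes))
for target, context in pairs[:4]:
    print(f"{target} -> {context}")
```

Pairs that repeat across recipes (tomato and garlic here) are the ones a model like this ends up placing close together.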
The contribution that distinguishes Epicure from prior work is the integration of chemical compound data. The authors built a graph that combines two types of information: recipe co-occurrence edges (which ingredients appear together in which recipes) and compound edges from FlavorDB (which ingredients share which flavor molecules). They then trained their embeddings on random walks over this combined graph. The result is an ingredient representation that encodes both what people cook together and what is chemically related, whether or not cooks are aware of the relationship.
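The combined graph and the random walks over it might look roughly like the sketch below. Everything here is an assumption for illustration: the toy recipes, the placeholder compound names, the `build_graph` and `random_walk` helpers, and in particular the choice to turn shared compounds into direct ingredient-to-ingredient edges with uniform, unweighted steps. The paper may structure or weight its graph differently.

```python
import random
from collections import defaultdict
from itertools import combinations

def build_graph(recipes, compounds):
    """Merge recipe co-occurrence edges and shared-compound edges into one graph."""
    graph = defaultdict(set)

    def connect(group):
        for a, b in combinations(set(group), 2):
            graph[a].add(b)
            graph[b].add(a)

    # Co-occurrence edges: ingredients that appear in the same recipe.
    for recipe in recipes:
        connect(recipe)

    # Compound edges: ingredients that share at least one flavor compound.
    by_compound = defaultdict(set)
    for ingredient, molecules in compounds.items():
        for molecule in molecules:
            by_compound[molecule].add(ingredient)
    for group in by_compound.values():
        connect(group)

    return dict(graph)

def random_walk(graph, start, length, rng):
    """Uniform random walk; each walk later serves as one training 'sentence'."""
    walk = [start]
    while len(walk) < length:
        neighbors = sorted(graph.get(walk[-1], ()))
        if not neighbors:
            break
        walk.append(rng.choice(neighbors))
    return walk

# Toy inputs with placeholder compound names (not real FlavorDB entries).
recipes = [["tomato", "basil"], ["vanilla", "cream"]]
compounds = {
    "tomato": {"compound_A"},
    "vanilla": {"compound_A"},
    "basil": {"compound_B"},
}

graph = build_graph(recipes, compounds)
walk = random_walk(graph, "basil", 5, random.Random(0))
print(walk)
```

In this toy graph, tomato and vanilla never share a recipe, but the compound edge lets a walk cross from one to the other, which is how the chemistry ends up in the embedding.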
Someone inspired by this research then used the data to look for ingredient pairs that are chemically similar, which would suggest they'd turn up in recipes together, yet never actually do. At least, they never appear together in the 294,000 recipes available in the Hugging Face data.
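A search like that could be sketched as follows: score each ingredient pair by chemical overlap, then keep only the pairs that never share a recipe. This is a guess at the general shape, not the actual method. The `unexplored_pairs` helper, the toy recipes, and the placeholder compound IDs are all hypothetical, and raw shared-compound counts stand in for whatever similarity measure was really used (embedding distance is just as plausible).

```python
from itertools import combinations

def unexplored_pairs(recipes, compounds, min_shared=1):
    """Return (a, b, shared_count) for pairs that share compounds but never co-occur."""
    # Every pair seen together in at least one recipe, stored in sorted order.
    cooccur = set()
    for recipe in recipes:
        cooccur.update(combinations(sorted(set(recipe)), 2))

    results = []
    for a, b in combinations(sorted(compounds), 2):
        shared = compounds[a] & compounds[b]
        if len(shared) >= min_shared and (a, b) not in cooccur:
            results.append((a, b, len(shared)))

    # Strongest chemical overlap first.
    results.sort(key=lambda row: row[2], reverse=True)
    return results

# Toy data with placeholder compound IDs (not real FlavorDB entries).
recipes = [
    ["soy sauce", "garlic"],
    ["vanilla", "cream"],
    ["garlic", "cream"],
]
compounds = {
    "soy sauce": {"c1", "c2"},
    "vanilla": {"c1", "c2"},
    "garlic": {"c1"},
    "cream": {"c3"},
}

results = unexplored_pairs(recipes, compounds)
for a, b, shared in results:
    print(f"{a} + {b}: {shared} shared compound(s)")
```

Note that "never co-occur" only holds for whichever recipe set you check against, which is why the caveat about the 294,000-recipe sample matters.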
Oh and I found this visualization of the Epicure data: https://epicure-data.kaikaku.ai/
The 25 pairings:
- Soy sauce + vanilla
- Tomato + vanilla
- Vanilla + salmon
- Apricot + miso
- Cocoa + mushroom
- Cocoa + cauliflower
- Coffee + garlic
- Anchovy + chocolate
- White chocolate + caviar
- Grapefruit + cumin
- Grapefruit + coriander
- Cardamom + mango
- Dill + mango
- Black pepper + strawberry
- Coconut + beef
- Raspberry + lamb
- Tarragon + peach
- Star anise + peach
- Banana + parsley
- Banana + nutmeg
- Cucumber + elderflower
- Fennel + strawberry
- Passion fruit + garlic
- Egg + passion fruit
- Fish sauce + butter