Search for a beer using the search bar below. Start typing to see matches in real time, then click on the desired entry, and 10 beers will be recommended to you based on the results of a K-Means clustering algorithm.
This web app is the result of the final project for the data analytics course. Initially, a dataset from Kaggle
(Beer Information - Tasting Profiles)
was used for the model; however, after initial EDA (including a preliminary K-Means clustering algorithm),
we determined that additional information could help with model
accuracy. The Kaggle dataset initially contained approximately 5,500 beers scraped from
BeerAdvocate. The scraped entries contained counts of key words from 25
reviews, grouped into 11 taste profiles (e.g., the "fruity" taste profile contained key words such as
"berries", "fruit", "juice", and "tropical") to provide scores for each beer. The flavor profiles are: fruity,
hoppy, spices, malty, bitter, sweet, sour, salty, astringent, body, and alcoholic.
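As a rough illustration of how key word counts could become profile scores, here is a minimal Python sketch. It is not the uploader's actual script: the function name `profile_scores`, the sample reviews, and the "hoppy" word list are hypothetical, and only the "fruity" key words are taken from the description above. It also assumes exact word matching, so "fruity" would not count as "fruit".

```python
import re
from collections import Counter

# Hypothetical keyword lists; only the "fruity" words come from the text above.
PROFILE_KEYWORDS = {
    "fruity": ["berries", "fruit", "juice", "tropical"],
    "hoppy": ["hops", "hoppy"],  # illustrative placeholder, not the real list
}

def profile_scores(reviews, profile_keywords=PROFILE_KEYWORDS):
    """Count key word occurrences across a beer's reviews, per taste profile."""
    words = Counter()
    for review in reviews:
        # Lowercase and split into alphabetic tokens (exact-match counting).
        words.update(re.findall(r"[a-z]+", review.lower()))
    return {
        profile: sum(words[kw] for kw in keywords)
        for profile, keywords in profile_keywords.items()
    }

# Made-up reviews for a single beer.
reviews = [
    "Tropical fruit up front, with plenty of juice.",
    "Hoppy nose, dark berries, and more fruit on the finish.",
]
scores = profile_scores(reviews)
print(scores)  # {'fruity': 5, 'hoppy': 1}
```

Each beer then becomes one row of 11 such counts, which is the kind of numeric table a clustering algorithm can work with.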
After we contacted the Kaggle dataset uploader, they provided us with the key words used to calculate each profile (and uploaded them
to the Kaggle page linked above). We then created a web scraping script to scrape data from the
top ranking beers from each substyle
(e.g., "Stouts" contain the substyles "American" and "Irish Dry") that contained at least
75 reviews. We determined that more reviews would help differentiate the flavor profiles better for each beer and reduce
clustering overlap.
After obtaining the newly scraped data, we re-ran the K-Means clustering algorithm and obtained better silhouette scores
for each cluster, along with recommended beers that felt more similar to the input beer. The dataset was tested using both
the min/max scaler and the standard scaler. Both returned similar silhouette scores; however, the min/max scaler returned a
slightly higher score, so it was chosen for the final model. The data were then clustered into 3 main clusters (classes), and
each main class was further clustered into subclusters (subclasses), with the main clusters containing 7, 8, and 2
subclasses, respectively. Figure 1 (interactive) displays the number of beers in each class and subclass. Figure 2 (also interactive)
presents the distribution of beer ratings.
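The scaler comparison and two-level clustering described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: it assumes scikit-learn (the text does not name a library), uses synthetic data in place of the beer table, and picks 2 subclusters for one class arbitrarily. The helper `score_scaler` is a hypothetical name.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in for the beer table: 300 beers x 11 taste-profile scores.
rng = np.random.default_rng(0)
centers = rng.uniform(0, 100, size=(3, 11))
X = np.vstack([c + rng.normal(0, 5, size=(100, 11)) for c in centers])

def score_scaler(scaler, X, n_clusters=3):
    """Scale X, fit K-Means, and return (silhouette score, labels, scaled X)."""
    X_scaled = scaler.fit_transform(X)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = model.fit_predict(X_scaled)
    return silhouette_score(X_scaled, labels), labels, X_scaled

# Compare the two scalers by the silhouette score of the resulting clusters.
results = {
    "minmax": score_scaler(MinMaxScaler(), X),
    "standard": score_scaler(StandardScaler(), X),
}
for name, (score, _, _) in results.items():
    print(f"{name}: silhouette = {score:.3f}")

# Second level: cluster the beers of one main class into subclasses.
# The choice of 2 subclusters here is arbitrary, for illustration only.
_, labels, X_scaled = results["minmax"]
members = X_scaled[labels == 0]
sub_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(members)
print("subclass sizes in class 0:", np.bincount(sub_labels))
```

On real data the number of subclusters per class would be chosen separately for each main class, for example by comparing silhouette scores across candidate values.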