Optimizing the London Restaurant Industry with NLP and Big Data Insights from Google Maps Reviews


In my dissertation project, I use the Google Maps API, Natural Language Processing, and Big Data techniques to analyze restaurant reviews in London. By examining both quantitative and qualitative data, I uncover insights for economic observers, restaurant owners, and customers alike. The analysis offers a procedural framework for deriving meaningful conclusions from Google Maps reviews, benefiting various stakeholders in the restaurant industry.


Disclaimer: This project is intended for academic practice, not for economic or trading purposes. The data used in this project was collected in June 2023, so the information provided is approximate and not 100% accurate.


To view the code for this project, please follow this link


Note: Please click on the images below to view them more clearly.


I. OBJECTIVES: 

- To analyze restaurant distribution patterns in London, which contribute to a deeper understanding of the city's culinary geography

- To identify key factors driving positive and negative reviews, which can uncover recurring themes and aspects that significantly shape customer perceptions and influence their ratings.

- To create a categorization algorithm to classify restaurant dimensions: food, service, ambiance, and price. This enables owners to identify strengths and areas for improvement, aiding strategic decisions and enhancing customer experiences.


II. PROCEDURE:





2.1 Data Collection:



- Obtain and integrate a Google Maps API key to authenticate requests. 

- Read 3 lists of queries from a text file for flexibility (one mimicking user behavior with diverse keywords, another using specific region names, and a third using postal codes for comprehensive coverage). Click the link to see the queries.

- Make iterative API requests to retrieve restaurant details, including: Restaurant Name, Address, Average Rating, Number of Ratings, User, Review, Review Rating, and Time 

- Implement error handling so that the script can continue processing queries despite any exceptions.

- Store the retrieved data in a Pandas DataFrame and export it to an Excel file for easy access.

Click here to view coding for data collection
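
The collection loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the `fetch` callable stands in for the real Google Maps API request, `fake_fetch` is a hypothetical stub so the example runs offline, and the response keys ("name", "reviews", "ratings_total", and so on) are assumed for illustration only.

```python
from typing import Callable, Dict, List, Tuple


def collect_reviews(
    queries: List[str],
    fetch: Callable[[str], List[dict]],
) -> Tuple[List[Dict], List[Tuple[str, str]]]:
    """Run every query, flatten the results to one row per review,
    and record failed queries instead of stopping the whole run."""
    rows, failed = [], []
    for query in queries:
        try:
            places = fetch(query)
        except Exception as exc:  # keep going if a single query fails
            failed.append((query, str(exc)))
            continue
        for place in places:
            for review in place.get("reviews", []):
                # Column names follow the fields listed above; the keys
                # read from `place` and `review` are assumptions.
                rows.append({
                    "Restaurant Name": place.get("name"),
                    "Address": place.get("address"),
                    "Average Rating": place.get("rating"),
                    "Number of Ratings": place.get("ratings_total"),
                    "User": review.get("user"),
                    "Review": review.get("text"),
                    "Review Rating": review.get("rating"),
                    "Time": review.get("time"),
                })
    return rows, failed


def fake_fetch(query: str) -> List[dict]:
    """Hypothetical stand-in for the API call, used only to exercise the loop."""
    if query == "bad query":
        raise RuntimeError("simulated API error")
    return [{
        "name": "Example Cafe",
        "address": "1 Example St, London SE1 9SG",
        "rating": 4.5,
        "ratings_total": 120,
        "reviews": [
            {"user": "A", "text": "Great food.", "rating": 5, "time": "2023-06-01"},
        ],
    }]


rows, failed = collect_reviews(["restaurants in SE1", "bad query"], fake_fetch)
print(len(rows), "rows collected;", len(failed), "failed queries")
```

The resulting list of dictionaries can be passed to `pandas.DataFrame(rows)` and written out with `DataFrame.to_excel(...)`, which matches the storage step described above.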


2.2 Data Pre-processing

- Combine the 3 datasets retrieved from the Google Maps API (view coding)

- Remove duplicate reviews (view coding)

- Extract postal codes from variable “Address” and convert to region names (view coding, the third cell)

- Generate unique IDs for restaurants and users (view coding, the fourth cell)

(view dataset after pre-processing)

- Split dataset into quantitative and qualitative subsets for analysis 
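
As a rough sketch of the pre-processing steps above (duplicate removal, postal code extraction, and ID generation), the following uses only the Python standard library. The postcode-area-to-region mapping, the regular expression, and the ID format are assumptions made for illustration; the mapping actually used in the project is not shown in this post.

```python
import re

# Hypothetical mapping from London postcode area to a region name.
AREA_TO_REGION = {
    "EC": "Central", "WC": "Central", "E": "East", "N": "North",
    "NW": "North West", "SE": "South East", "SW": "South West", "W": "West",
}

# Two-letter areas are listed first so "SE1" is not read as area "E".
POSTCODE_RE = re.compile(r"\b(EC|WC|NW|SE|SW|E|N|W)\d{1,2}[A-Z]?\s*\d[A-Z]{2}\b")


def region_from_address(address):
    """Return the region for the first London postcode found, else None."""
    match = POSTCODE_RE.search(address.upper())
    return AREA_TO_REGION[match.group(1)] if match else None


def preprocess(rows):
    """Drop duplicate reviews, add a Region column, and assign unique IDs."""
    seen, restaurant_ids, user_ids, cleaned = set(), {}, {}, []
    for row in rows:
        key = (row["Restaurant Name"], row["Address"], row["User"], row["Review"])
        if key in seen:  # the same review collected by more than one query
            continue
        seen.add(key)
        place = (row["Restaurant Name"], row["Address"])
        cleaned.append({
            **row,
            "Region": region_from_address(row["Address"]),
            "Restaurant ID": restaurant_ids.setdefault(place, f"R{len(restaurant_ids) + 1}"),
            "User ID": user_ids.setdefault(row["User"], f"U{len(user_ids) + 1}"),
        })
    return cleaned


raw_rows = [
    {"Restaurant Name": "Example Cafe", "Address": "1 Example St, London SE1 9SG",
     "User": "A", "Review": "Great food."},
    {"Restaurant Name": "Example Cafe", "Address": "1 Example St, London SE1 9SG",
     "User": "A", "Review": "Great food."},
    {"Restaurant Name": "Sample Grill", "Address": "2 Sample Rd, London EC1A 1BB",
     "User": "A", "Review": "Too slow."},
]
cleaned = preprocess(raw_rows)
for row in cleaned:
    print(row["Restaurant ID"], row["User ID"], row["Region"])
```

Keying duplicates on restaurant, address, user, and review text is one possible choice; a stricter or looser key may suit other datasets.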


2.3. Quantitative Analysis: 

- Analyze the distribution of restaurants across different regions in London

- Calculate the average ratings for each region

- Investigate the correlation between review length and rating

- Identify the most popular restaurants based on the number of ratings

- Analyze the distribution of review counts and average ratings

(view coding for quantitative analysis part)
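
The quantitative steps above could look roughly like this on a toy dataset. The sample records are invented for illustration, and measuring review length in words is an assumption (character counts would work the same way). The Pearson correlation is written out by hand so the sketch has no dependencies.

```python
from collections import Counter, defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; assumes neither series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented sample records, one per restaurant and one per review.
restaurants = [
    {"Region": "Central", "Average Rating": 4.4},
    {"Region": "Central", "Average Rating": 4.0},
    {"Region": "North", "Average Rating": 4.5},
]
reviews = [
    {"Review": "Lovely", "Review Rating": 5},
    {"Review": "Good food, slow service", "Review Rating": 4},
    {"Review": "Waited an hour and the food arrived cold and bland", "Review Rating": 1},
]

# Number of restaurants per region.
region_counts = Counter(r["Region"] for r in restaurants)

# Mean of the restaurants' average ratings per region.
ratings_by_region = defaultdict(list)
for r in restaurants:
    ratings_by_region[r["Region"]].append(r["Average Rating"])
region_means = {region: sum(v) / len(v) for region, v in ratings_by_region.items()}

# Correlation between review length (in words) and review rating.
lengths = [len(r["Review"].split()) for r in reviews]
ratings = [r["Review Rating"] for r in reviews]
r_value = pearson(lengths, ratings)

print(region_counts, region_means, round(r_value, 3))
```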


2.4. Qualitative Analysis: 

- Split long reviews into single sentences by using natural language processing. 

- Perform sentiment analysis on each sentence using the VADER tool.

- Generate WordClouds to visualize the most frequently used words in positive and negative sentences. 

(view coding for qualitative analysis)
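
The three steps above can be illustrated with a small dependency-free sketch. It is not the project's code: the regex splitter is a naive stand-in for an NLP sentence tokenizer, and `toy_compound` with its tiny lexicon is a hypothetical placeholder for VADER so the example runs without extra packages. The 0.05 and -0.05 cut-offs are the thresholds commonly used with VADER's compound score.

```python
import re
from collections import Counter


def split_sentences(review):
    """Naive split after '.', '!' or '?'; a real tokenizer handles more cases."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]


def label(compound):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


# Hypothetical mini-lexicon, only so the sketch runs on its own. With VADER
# installed, the score would instead come from
# SentimentIntensityAnalyzer().polarity_scores(sentence)["compound"].
TOY_LEXICON = {"great": 0.6, "friendly": 0.5, "slow": -0.3, "rude": -0.6}


def toy_compound(sentence):
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(TOY_LEXICON.get(w, 0.0) for w in words)
    return max(-1.0, min(1.0, score))


STOPWORDS = {"the", "was", "and", "a", "but", "we", "at"}


def word_counts(sentences):
    """Word frequencies, which are what a word cloud visualizes."""
    words = (w for s in sentences for w in re.findall(r"[a-z']+", s.lower()))
    return Counter(w for w in words if w not in STOPWORDS)


review = "The food was great. Service was slow and rude! We paid at 9pm."
sentences = split_sentences(review)
labels = [label(toy_compound(s)) for s in sentences]

positive = [s for s, l in zip(sentences, labels) if l == "positive"]
negative = [s for s, l in zip(sentences, labels) if l == "negative"]
print(word_counts(positive).most_common(3))
print(word_counts(negative).most_common(3))
```

Frequencies like these can be handed to a word cloud library, for example `WordCloud().generate_from_frequencies(...)` from the `wordcloud` package.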


2.5. Categorization Algorithm: 

- A custom algorithm was developed to categorize reviews into dimensions like food quality, service, ambiance, and price. 

- This categorization process involved text pre-processing, keyword extraction, category definition, and category assignment. 

- The categorized data was analyzed, and category-specific insights were generated for individual restaurants. 

Click here to view coding
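
A keyword-based categorizer of the kind described above might be sketched as follows. The keyword lists are hypothetical examples, not the ones used in the project, and the matching is deliberately simple (whole-word, case-insensitive), so a sentence can land in several categories or in none.

```python
import re

# Hypothetical keyword lists for the four dimensions.
CATEGORY_KEYWORDS = {
    "food": {"food", "dish", "taste", "menu", "delicious"},
    "service": {"service", "staff", "waiter", "waitress"},
    "ambiance": {"ambiance", "atmosphere", "music", "decor"},
    "price": {"price", "expensive", "cheap", "value"},
}


def categorize(sentences):
    """Assign each sentence to every category whose keywords it mentions."""
    result = {category: [] for category in CATEGORY_KEYWORDS}
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        for category, keywords in CATEGORY_KEYWORDS.items():
            if tokens & keywords:  # at least one keyword appears
                result[category].append(sentence)
    return result


sample = [
    "The food was delicious.",
    "Staff were friendly but the price is high.",
    "Nice view.",
]
categories = categorize(sample)
for name, matched in categories.items():
    print(name, matched)
```

Each category's list can become one column of a DataFrame, which mirrors the per-dimension output described in the findings.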


III. FINDINGS





3.1. Quantitative Analysis: 

3.1.1. Spatial Distribution of Restaurants: 
The analysis revealed considerable variation in the number of restaurants across different regions in London. South East had the most restaurants (2,193), followed by South West and East. The West, North, and Central regions each had around 800 establishments. North West had the fewest, with 691 restaurants.

3.1.2. Regional Average Ratings: 

The graph shows average restaurant ratings by region. The North region has the highest average rating at 4.34, while the North East region has the lowest at 4.18. Ratings are relatively consistent across regions, with a difference of only 0.16 between the highest and lowest.





3.1.3 Correlation between Review Length and Rating:

A weak negative correlation (-0.187) was found between review length and review rating. Longer reviews tended to be associated with lower ratings, while shorter reviews were linked to higher ratings. This suggests that customers who write lengthier reviews may do so to express dissatisfaction or to give more detailed feedback on negative experiences. Conversely, customers who leave shorter reviews may be more inclined to rate their experiences positively or have less to critique.


3.1.4 Most Popular Restaurants: 

The most popular restaurants, ranked by number of reviews, tend to combine a substantial volume of ratings with high average scores. Most of these establishments are located in the Central region.


3.1.5 Restaurant Ratings and Reviews Distribution Insights


The data shows a predominantly positive rating distribution, with an average star rating of 4.24 out of 5. The majority of ratings fall within the 3 to 5-star range. In the analysis of customer engagement and feedback, a clear pattern emerges when considering a restaurant's average number of ratings alongside its overall rating score. High-rated restaurants tend to have a significant number of reviews, indicating positive experiences. Conversely, lower-rated ones struggle to attract customers and receive fewer reviews, often signaling issues with food or service. 


Are there restaurants with a high number of reviews despite a low average rating? Notably, 22 branches of popular fast food chains have over 300 reviews each despite average ratings below 3. This suggests that strong branding and widespread popularity keep customers coming back despite lower scores. Further investigation is warranted to explore how these branches can leverage their loyal customer base and strong branding to improve ratings and the overall dining experience.
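
A filter for such high-volume, low-rated restaurants could be sketched as below. The thresholds follow the figures quoted above (more than 300 reviews, average rating below 3); the sample records and restaurant names are invented, and treating "Number of Ratings" as the review count is an assumption.

```python
# Invented sample records; names are placeholders.
restaurants = [
    {"Restaurant Name": "Chain Branch A", "Number of Ratings": 850, "Average Rating": 2.7},
    {"Restaurant Name": "Local Bistro", "Number of Ratings": 420, "Average Rating": 4.6},
    {"Restaurant Name": "Quiet Corner", "Number of Ratings": 40, "Average Rating": 2.5},
]

MIN_REVIEWS = 300   # "over 300 reviews"
MAX_RATING = 3.0    # "average ratings below 3"

outliers = [
    r for r in restaurants
    if r["Number of Ratings"] > MIN_REVIEWS and r["Average Rating"] < MAX_RATING
]
print([r["Restaurant Name"] for r in outliers])
```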





3.2. Qualitative Analysis: 

3.2.1 Sentiment Analysis
Following sentence segmentation, sentiment analysis was conducted for each sentence. This analysis categorized sentences into positive, neutral, or negative sentiments based on the language and expressions used by customers. The objective was to gain a deeper understanding of the specific aspects of the dining experience that customers felt strongly about, whether positively or negatively.



3.2.2 Word Cloud Visualization
In the word cloud generated for sentences with positive sentiments, the term "Food" emerged as the most prominent. This observation underscores the significance of the culinary aspect of the dining experience. Customers who expressed positive sentiments tended to focus their feedback on the quality, taste, and overall satisfaction with the food offerings.




Conversely, in the word cloud representing sentences with negative sentiments, the focus shifted to the term "Service." This finding highlights that negative sentiments were often associated with service-related aspects of the dining experience. Customers who expressed dissatisfaction tended to articulate concerns related to the quality of service, responsiveness, or staff behaviours.


Implications and Discussion: 

These findings provide valuable insights for restaurant owners and managers. To enhance customer satisfaction, it is essential to prioritize both food quality and service excellence. 



3.3 Categorization Algorithm

In a real-life implementation, the algorithm scans restaurant review text, identifies sentences containing keywords related to food, service, ambiance, and price, then categorizes and stores them in separate columns within a DataFrame, which can be exported to an Excel file.

This process yields several benefits: it enables insight generation by allowing owners to extract targeted feedback, enhances efficiency by automating data sorting, and provides decision support for strategic adjustments. 

However, the algorithm has weaknesses, such as its dependence on keyword accuracy, sensitivity to context, and potential for sentence fragmentation. 

In future work, I am continuing to refine this categorization algorithm to improve its accuracy. Planned improvements include optimizing keywords, integrating contextual analysis through machine learning models, incorporating additional data sources for more comprehensive feedback, and applying advanced NLP techniques such as named entity recognition and topic modeling.












 
