It’s been a year since Google announced some new exciting features for G Suite Enterprise, including a small gem for companies that want to get the most out of their data: Gmail logs integration in BigQuery. Here’s what Google wrote on February 2017:
Today, we are making it easier to analyze Gmail logs with a preconfigured BigQuery integration so administrators can run sophisticated, high-performing custom queries, analyze their data and build custom dashboards.
That sounds great! At Email Meter, we’re always looking into better ways to bring you new powerful statistics, so this is very interesting! We’ve been doing tests for a while. Let’s see:
[warning: tech stuff ahead]
How it works:
- Gmail logs are stored in realtime into a preconfigured BigQuery integration with the following schema. (NOTE: logging starts after you enable the feature in the G Suite Enterprise admin panel, read more here)
- BigQuery organizes the data in daily tables.
- You can run queries directly on the BigQuery console or write your own app. There are some libraries out there for different languages.
- Queries should be made using SQL Standard, although Google uses their own SQL (“Legacy”) in their examples.
Aaaaand, that’s all… In fact, this is all the info you can find on the internet about this matter. For real. No one seems to have published anything else about this, or shared any code, apart from these two posts in this blog (thanks Mannetje!) – Link + Link
Querying – getting real stats!
We ran a query over the whole domain, for received emails. Each action that occurs when Gmail is receiving an email logs a single entry in BigQuery, so we must filter a little bit.
/* received emails */ (message_info.destination.service == 'gmail-ui' AND message_info.source.service == 'smtp-inbound') /* we only need the final entry + not spam */ AND message_info.action_type == 3 AND message_info.is_spam == FALSE
Wildcards are permitted here so we’re checking all the daily tables for the last 7 days.
Voilá! We got 4713 emails, each one with extensive data. We turned some of this data into a nice email report for one single day, and used some processing to throw reply time right into the mix. Also, we generated the report for the whole domain and also for one user. Useful, isn’t it?
In our tests, minimum response time was above 1s, reaching peaks of 5-6s or higher. This is due to Google’s BigQuery inner processes (it does scale with you)
That might not seem impressive at first, but think of a pretty big company (+2000 employees) and a year’s worth of email. That’s a lot of mail. Well, it still should take a few seconds for most queries! Now we’re talking. Google BigQuery is scalable, flexible and fast even for GBs of data.
Throughout our research we discovered some stuff that wasn’t well documented in the schema, and we also found some data limitations (for example, thread IDs are not logged).
There’s also a lot of data available in the logs about deliverability, server spam rules, and email behaviour.
Conclusions on integrating Gmail logs into BigQuery
Google released a year ago a feature for G Suite Enterprise that allows administrators to automatically log Gmail logs to BigQuery.
It seems that some bigger companies are already utilizing this feature, and BigQuery is already an established product, so the future for this product is pretty exciting.
Good news: if your company is using G Suite Enterprise already, or is thinking about upgrading, you can leverage Big Data techniques to analyze and get stats for your email usage data using a cloud-powered database. It’s powerful, fast and cheap.
Not so good news: there’s very little publicly available info about how to use this feature and the Google docs are incomplete. Intensive research and development is needed.
We’d love to hear your feedback and experiences. Please add your comments below or get in touch (email@example.com).