Analytics anyone can use.
Data prep anyone can use.
Analytics for organizations.
Cloud analytics for organizations.
At Tableau, we believe in enabling people of all skill levels to gain insight from their data. Our platform is built on an intuitive interface that empowers anyone to ask and answer questions of their data, regardless of volume or complexity. As advanced and smart analytics are increasingly critical components of business intelligence, business users require tools that simplify their workflows while empowering them to ask deeper questions. When Google Cloud informed us that they were releasing a new way to leverage machine learning technologies, I immediately wanted to know how Tableau could connect.
Machine learning has been a hot topic in analytics for some time. By definition, machine learning is a method of data analysis that uses statistical techniques to give computers the ability to “learn,” without being explicitly programmed. Those who work with machine learning know that it’s a powerful method, but that it can also be shrouded in confusion due to its interpreted complexity. At its core, however, there are many aspects to machine learning that are applicable to anyone that works with data.
Google Cloud just announced a new machine learning feature of BigQuery called BigQuery ML. BigQuery ML allows users to use embedded machine learning technology to train models based on data stored in BigQuery. This essentially means that with a little understanding of SQL and basic machine learning practices you can create and use your own machine learning models. However, just like working with any other data, directly querying a database isn’t always the ideal method to explore the output of your model.
This is where Tableau comes in. With it’s BigQuery connector, Tableau allows you to easily manipulate the results of your predictive models in a way that facilitates an intuitive understanding of the data. By minimizing the amount of SQL you have to write, and leveraging BigQuery’s astoundingly simple syntax, Tableau is simplifying machine learning for the hundreds of thousands of workers who could benefit from it, but haven’t had the time to learn how. Additionally, Tableau allows users an easy way to share their model and its results with others so that they can benefit from all your work.
Anyone who has built machine learning models knows that whatever you can do to simplify the practice will be a huge step toward bringing machine learning into the masses. This fits naturally with Tableau’s mission to help people see and understand data. Google Cloud and Tableau are working together to enable you to access cutting edge technologies using easy-to-understand techniques.
Below, we’ve written a guide on how to use Google BigQuery ML to create a map of predicted housing prices for all the zip codes in the Seattle area. There are online services, like Zillow, that will make this type of prediction for you, but this process is repeatable for any problem you have. This is especially useful if a company doesn’t already exist that solves the problem for you.
Imagine you're a real estate agent and a potential buyer approaches you with various requirements for a home. We’ll assume you've got a dataset in Google BigQuery and you know that a machine learning model could help your reporting. You might want to take existing housing price data and build a model to predict home values in a particular area given a set of features (or predictors). Using Google BigQuery ML, you can now build predictive models and visualize them directly in Tableau for an interactive experience.
In this example, there’s already housing data loaded into BigQuery under a project called king_county_housing. The dataset contains a list of houses that were sold, the price at which they were sold, and some useful features of the house like the number of bedrooms, bathrooms, etc. I extracted this dataset from Kaggle and imported it in to BigQuery.
Determine the fields you want to train your model on, and the variable you're trying to predict.
In this case, I want to predict housing price based on five major features of the house that clients are likely to set as requirements—bedrooms, bathrooms, square feet of living space, whether or not the house is near the water, and the location of the house.
Note that the more features I'd like the model to consider before determining a price, the more features I'll need to tell the model to consider every time I want a prediction. Since I don't want clients to have to fill out a long questionnaire, I want to keep my model relatively simple. This means that while my predictions may not be as accurate, I'll have an easier interface to interact with.
Create a new model using the special syntax:
Note here that I'm casting zipcode as a text field rather than leaving it as an integer. This is because the model will automatically treat any integer as a numerical value rather than a categorical value. If you leave zipcode as a numerical value, the model will treat zipcode as if it were something like square feet, where the larger the zipcode the more expensive the house. Since we know zip codes are not directly correlated with housing price, we need to cast the value to a string, where each zip code will be treated like a label and will get it's own coefficient associated with it.
The special syntax is composed of 4 parts:
Note that I added in (OR REPLACE) because it took me a couple tries to get this model right, and it's easier to simply update the model from an existing one using the same query.
Select Google BigQuery from the left hand navigation under to a Server and enter your credentials.
Select the relevant following information.
Open the New Custom SQL box.
Query the newly created table using the key phrase
Here you just enter a query that creates a table that has all the values that are in the prediction table. If you don't know the value, enter NULL.
Because we're going to be asking the user for input data, create the following parameters and set reasonable default values for them:
The query that I'm going to use selects all of the zip code values from the table that we used for training and simulates test data rows by capturing user inputs via parameters. This will be useful later when we want to create a map of all the price predictions for every zip code.
If you want to check yourself, try running your query in the BigQuery UI with dummy data in place of the parameters.
Create a map:
Display the price by zip code:
Make the colors easy to interpret:
Make the text size change based on value:
While custom SQL is necessary in this exercise for invoking a machine learning algorithm in Google BigQuery ML, there are performance considerations to be aware of when using custom SQL in lieu of Tableau’s native connections in other scenarios. When possible, leverage Tableau's native data source connections for optimized performance. Read more here.
Acknowledgments: Special thanks to Abhishek Kashyap (Google), Vaidy Krishnan (Tableau) and Riley Maris (Tableau) for their guidance.