Our on-prem to cloud database migration: A collaborative effort
In our last cloud migration post, we outlined the needs and evaluation criteria that drove us to look to cloud database options. Now, we’re going to discuss the first stage of the migration: moving our data into Snowflake so we could take advantage of its many benefits.
Self-service analytics is a delicate balance between enabling users with the data and insights they need to do their work while maintaining effective data governance enterprise-wide. This delicate balance between individual empowerment and centralized control extends to our physical migration of data and Tableau content from one platform to another, as well. Our migration timeline and process framework guided each team so they knew exactly when to join in and transition their data sources from SQL Server to Snowflake. Adhering to this timeline was essential because it was costly to the business, both in infrastructure resources and people hours, to keep SQL Server running in parallel with Snowflake.
We intentionally started with the Netsuite pipeline belonging to our Finance Analytics team—a well-governed, well-defined data domain with clear owners to migrate. Starting there, we knew we would benefit from a strong partnership and robust testing scenarios, and that we could iron out the kinks for the rest of Tableau before we performed our full migration.
A new way of thinking about data management
As we reimagined data management across all of Tableau, we identified five pillars for the migration process framework that dovetailed well with our Snowflake selection criteria, and would thereby increase trust and confidence in the data that everyone uses. These pillars are: staffing, governance, authentication, communication, and documentation.
We’ll first discuss staffing, governance, and authentication in this post, highlighting some key lessons learned, unexpected issues with responses, and recommendations to consider when migrating and tackling data sets—large or small, simple or complex.
We don’t want to sugar-coat the complex undertaking of any migration at enterprise scale. We started by forming a small, core migration team and quickly realized more assistance was needed to update approximately 9,500 workbooks and 1,900 data sources, and address any downstream content effects caused by differences at the database level.
The core team possessed some essential skills we suggest that organizations who make the same journey have: project management; development expertise with Python or a similar scripting language for modifying semistructured data like XML; and Custom SQL savvy. Recruiting talent with the right mix of data and programming skills that we needed was time consuming; we ended up reviewing upwards of 300 resumes and placing dozens of calls. Our central migration team counted at seven people—1.5 full-time program managers for six months, 0.25 server admins, approximately three full-time engineers, and two contractors—supporting upwards of 15-20 domain experts across sales, finance, and marketing.
The extended team—data scientists, engineers, analysts, and subject matter experts who work in business teams and helped move or transform data in Snowflake—were the first line of defense when questions or concerns surfaced from business users. These “stewards” of our data were able to answer questions ranging from data access and permissions, to process and timeline questions.
“We were the bridge between IT and finance business users since many data sources were managed by our team,” explained Dan Liang, formerly Manager of Finance Analytics at Tableau (now Manager, Finance Data Office, Salesforce). IT provided the centralized platform and standardization across the enterprise, but Finance Analytics tailored communication for their end users. “It was all hands on deck for a month as we handled content conversion, testing, and validation of data sources for our team’s migration. Tableau Prep was an integral part of our validation strategy to automate reconcile key measures between Snowflake and SQL Server.”
- Identify and define roles and responsibilities: Without clear roles, there will be confusion about who is responsible for what specific aspects of the process. In our case, we had data stewards and consumers test the data with specific experts designated to sign off on the data.
- Automate (where possible): We could have allocated more time to better automate this process, especially around workbook and data source XML conversion, as well as data testing.
- Know that you’re comparing apples and oranges: We provided summary statistics like row and column counts and data types to our testers to help them compare the two data sets. But because of many factors (differing ETL and refresh times, plus potential latency), it was very difficult to pin down significant differences versus noise.
Our cloud migration was a golden opportunity to strengthen competencies around governance. Everything from Tableau Server content to naming conventions in Snowflake received fresh scrutiny in an effort to improve user experience and ensure scale. Those teams that invested in governance by establishing single sources of truth (through well-curated and certified, published data sources) had a more straightforward content migration experience. Those that didn’t invest as much in governance struggled with unclear ownership and expectations around data, and their users encountered surprise effects downstream during the migration like broken data pipelines and dashboards.
Because we had many different languages used by data engineers over time, we also conducted thoughtful upfront discussion about standardizing code patterns, including outlining which characters were and weren't allowed. Acting on these discussions, “We implemented Continuous Integration and Continuous Deployment (CI/CD) on our source control tool (GIT), so we could more efficiently peer-review code and transfer work between members of the team as needed,” said Isaac Obezo, a software engineer. “This was much easier than having domain experts do everything in a pipe.”
Further strengthening governance, built-in Snowflake features enable transparency into database metadata, including the ability to see and save all queries processed. Since that history is typically only stored for a week, we built a pipeline to store all of the historical data so we could provide more targeted support to end users, create new data curations, and promote our single sources of truth. In finance, we used this data to proactively reach out to users who experienced query timeouts and other errors. It also helped us maintain user access controls around Sarbanes-Oxley (SOX) compliance.
- Use data quality warnings: These can communicate the status of a data source to users quickly and easily so they know when migration will happen and what will change.
- Recognize data management is a marathon—not a sprint: Progress and value deliverables are iterative. We concentrated on delivering smaller, but valuable changes as we migrated to the best model or data. We also benefited from using data to monitor performance of our cloud solution. Below is a sample visualization we built to monitor usage and performance of Snowflake.
- Minimize tech debt: Tableau Catalog gave us visibility into our data, including lineage and impact analysis to identify content owners. We were able to easily communicate to people what data and content could be deprecated and what was critical to move to Snowflake because of its usage or downstream impact. Consider leveraging an enterprise data catalog to help your end-users build knowledge and trust in data assets.
- Establish a clear cutoff: Appropriately budgeting time to complete a cloud migration is key; we took an informal survey and estimated an average of five hours per data source and one to two hours per workbook migration. Eventually, a final cutoff must be established where employees no longer have support from the legacy database or from the central migration team. If someone didn’t migrate their data when given ample time and assistance, they likely no longer needed it.
Changing to a cloud based database required changing the database authentication method employed by users, apps, and connected systems. We went from an all On-Premise world of Active Directory (AD) identity management and automatic Windows authentication through AD for users, apps, and systems to the reality of the cloud where identity management across different apps or systems is not seamless or integrated out of the box. The best option is a federated Identity Provider (IdP) with Single-Sign On (SSO) capabilities across different cloud vendors and apps. If you are planning on having multiple cloud based apps, or want users to have a SSO experience, selecting the IdP that works best for you should be done before or in conjunction with your Snowflake adoption.
Initially, we connected directly to Snowflake with SAML via our IdP. This works fine, but has pain points, especially coming from the automated world of Active Directory, namely: SAML IdP password changes will require manual embedded password changes in all Tableau content using embedded credentials. In the time between a user changing their password and updating connections in Tableau, any extract refreshes using them would fail and workbooks using their embedded password would not render. The only way to have a seamless password change experience with nothing breaking in Tableau was to switch to OAuth use. Be sure to check if your IdP can be used with OAuth for Snowflake!
One important lesson learned here was the power of a Tableau email alert. We worked with IT to automate an email that assists users in the password rotation. One month out from a required password rotation, users receive an email from their Tableau Server Admins reminding them that their password needed to be updated, as well as a link to just that content on Tableau Server.
- Be prepared to communicate and document changes: When changing authentication types you can expect to receive and have to answer many questions about how it works and how it differs, keeping in mind users’ different degrees of technical understanding.
- Strategically manage your storage driver: When you’re conducting an enterprise deployment to a platform like Snowflake, it’s important to push out the driver for everyone’s machines to maintain version control and updates.
Supporting end-users and content migration
Beyond staffing, governance, and authentication, communication and documentation were equally important to guarantee everyone was aligned throughout all phases of our migration to Snowflake. In our next blog of the series, we will explore those critical pillars to enable a better end-user experience and transition so no critical workbooks were left behind.
We also hope that sharing some of the individual experiences of our business teams helps other organizations and our customers better understand what it takes for an enterprise migration. Centralized coordination is mandatory, but business teams and their end-users must be equal partners, contributing from beginning to end.
"We knew we needed a landing place for our data, but didn’t realize how valuable it would be as a platform for collaboration because it was simple and brought everyone, including components across the business, feeding into the same thing,” concluded Sara Sparks, Senior Data Scientist at Tableau. Tableau is now in-tune as our people and data sources are more unified.