Tech and T-Accounts

Tableau Server - download for local backup

WINNER of Tableau 2019 DataDev Virtual Hack-a-thon!

Maybe I'm paranoid, or just cautious.  In any case, I often want to download all the workbooks across all the sites I have access to, along with username, group, and permission lists.  This would be a daunting task were it not for the Tableau REST API and Microsoft PowerShell.  As described in earlier posts, I prefer to use PowerShell because it comes installed with Microsoft operating systems, and until recently, running Tableau Desktop or Server was predominantly done in a Microsoft Windows environment.  Also, as I work predominantly with PCs supplied by my employers, those PCs will have PowerShell installed and may have restrictions on installing programming languages like Perl, Python, etc.  I find that using tools that already exist is the quicker path to productivity.

I've tested this script against both installed Tableau servers and Tableau Online -- both work, but there can be some hidden 'gotchas' lurking in your PC configuration.  The first has to do with TLS protocols -- all three protocols (TLS 1.0, TLS 1.1, TLS 1.2) need to be enabled for the PowerShell methods to work.  And should you wish to run the script from a saved file, the PC's ExecutionPolicy must be set to allow that.  Both of these changes can be made via the PowerShell ISE if it has been opened with the 'Run as Administrator' option (obviously, the person doing this must have signed on to the PC with credentials that are part of the administrators security group).

Check the current execution policy and save it to a variable:

    $original_policy = Get-ExecutionPolicy

Set the policy so local scripts and signed internet scripts can run:

    Set-ExecutionPolicy RemoteSigned

Should you want to revert to the saved policy after running scripts, simply run the following in the same session as the above:

    Set-ExecutionPolicy $original_policy

The next step is to set up a file location on your local drive to save the downloaded workbooks, CSV, and XML files, along with a text file that records the list of downloaded items and the time taken.
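The per-project folder layout can be sketched as follows. This is an illustrative Python sketch (the post's actual script is PowerShell) of how a save folder per project might be created under a base directory; the project and folder names here are invented.

```python
# Build a save path for a project's files under a base backup folder.
# A '/' inside a project name acts as a directory separator, so such a
# project's files end up split across nested sub-directories.
import os
import tempfile

def project_dir(base: str, project_name: str) -> str:
    path = os.path.join(base, *project_name.split("/"))
    os.makedirs(path, exist_ok=True)  # create intermediate folders as needed
    return path

base = tempfile.mkdtemp()  # stand-in for the local backup location
p = project_dir(base, "Finance/Quarterly")
print(os.path.relpath(p, base).replace(os.sep, "/"))  # Finance/Quarterly
```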

Here's an overview of what the script does:

  1. Determines the REST API version appropriate to the server.  The REST API version used in the commands must align with the REST API resident on the server being called.  Since the API version can change with each upgrade, this script pulls the proper API version number from the server.
  2. Logs into the server with your credentials -- in the script these credentials are simply typed in as clear text, which, when the script is saved on a network drive, may violate corporate IT security policies.  There are ways to avoid saving/using credentials as readable text, but that is covered in another blog post (here).
  3. The 'sites' section is forbidden on Tableau Online, so that section is commented out, but provided for those that run their own Tableau server.  To enable that section of code, remove the less-than symbol on line 71 '<# get list of sites that login identity has access to' and the greater-than sign found on line 106 '#>'.
  4. Obtains list of workbooks that a person has permission to access (Tableau permissions are in place here).  A tabcmd approach is also noted, but not used as the REST api is able to grab the list directly to a variable used by PowerShell.
    1. Obtain the workbook and its project designation.
    2. Based on the project designation, create a sub-directory to save the files. (Project names with a / in them will have their names split into separate sub-directories, since the / character determines sub-directory creation and the save path for the project's files.) Sub-projects will not appear as sub-directories of their parent project.
    3. Permissions for the project's workbooks and datasources are saved to XML files.
    4. The workbook twb/twbx is retrieved and saved (the script determines which is being downloaded and saves appropriately).  Permissions for the workbook are saved as an XML file, as is a list of revisions.
    5. A similar process is followed to retrieve datasource files and save them as tds/tdsx, along with an XML of the permissions and revisions.
    6. Retrieve a list of subscriptions for the site, save as XML.
    7. Retrieve a list of projects for the site, save as XML.
    8. Retrieve a list of views, save as XML (used to determine which sheets/dashboards were published as viewable).
    9. Get a list of Users, Groups and Groups' users, save as XML.
    10. Save a text file of downloaded items and time to execute the script.
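Step 1 above can be illustrated with a short sketch. This Python example (the post's script itself is PowerShell) parses the REST API version out of a serverinfo-style response body; the sample XML below is a stand-in for a live server response, and the namespace and element names are assumptions based on Tableau's documented serverinfo format.

```python
# Extract the restApiVersion value from a serverinfo XML response.
import xml.etree.ElementTree as ET

def rest_api_version(serverinfo_xml: str) -> str:
    root = ET.fromstring(serverinfo_xml)
    # Tableau responses carry a default namespace, so match on the local
    # tag name rather than the fully qualified one.
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] == "restApiVersion":
            return elem.text
    raise ValueError("restApiVersion element not found")

sample = """<tsResponse xmlns="http://tableau.com/api">
  <serverInfo>
    <productVersion build="20204.20.1116.1810">2020.4.0</productVersion>
    <restApiVersion>3.10</restApiVersion>
  </serverInfo>
</tsResponse>"""

print(rest_api_version(sample))  # 3.10
```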

Data Scraping for a Beach Vacation

With the proliferation of different ways to rent a beach house, finding the ideal property for a week at the beach has become an exhausting exercise: researching multiple local beach rental agencies, then checking on various owner-managed listings like VRBO and Airbnb. The filter conditions for each site vary, such that searching for properties in a particular area (a map-based search) sometimes isn’t available. The same is true for using a site to check on desirable options, be it a pool, beach proximity, bedding, a grill, etc.

The objective is to scrape the data from the various websites using free tools, consolidate those results into a spreadsheet or CSV file, then use Tableau to filter geographically as well as by amenities, price, location, etc.


  • Emerald Isle, NC
  • Week of June 13 – June 20 (Saturday check-in/check-out)
  • 4 bedrooms, with sleeping for 3 children under age 6 and 3 adult couples

Websites to consider:

  • Airbnb, VRBO (these seem to already be consolidated by HomeToGo)
  • HomeToGo (which seems to consolidate Airbnb, VRBO, HomeAway, TripAdvisor, e-domizil)

WEB Scrapers

The free web scrapers seem straightforward to configure (and have similar interfaces), so learning one scraper carries over to the others. A different scraper was chosen whenever the first one tried didn’t work for the next site. Wishing that one scraper would work on every site seems unattainable, but then again, I also wasn’t going to spend considerable time figuring out the peculiarities of why a chosen scraper didn’t work (although I did spend some time going through tutorials to get comfortable that the approaches I was taking should have worked). I wanted to avoid writing code, so the scrapers I considered had to be suitable for non-programmers.

Summary of scraping efforts (once a site had useful results, other approaches were not attempted, so there are lots of blanks in the table):

  • (have to use Chrome, Firefox or Safari)
  • (chrome extension)
  • Google Sheets (using IMPORTHTML or IMPORTXML functions)
  • Colab (Python in the cloud)

Consolidating data

The data from the various scrapers, while similar, had column variations – some included extra columns of URL links, some placed all the amenities in one cell, the data and the element name were sometimes included in the same cell, etc. Getting the data into a consistent representation was done either in Excel or using Tableau Prep.

Adding missing Geocode data

The data from the websites had street addresses, and while the websites had map representations of the search results, those maps didn’t expose the latitude and longitude used for each property.

Data Fields (as received)

The data model I've settled on requires splitting the data such that one table has a single row for each property, and fields which can have multiple items (like amenities) go into a separate table with two columns – the property and the item. This setup will enable showcasing the new Tableau data model, forgoing the need for many LOD calculations.
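The split described above can be sketched in a few lines. This Python example is illustrative only: the field names and the semicolon delimiter are invented stand-ins for whatever the scrapers actually produced.

```python
# Split scraped rows into (1) one row per property and (2) a two-column
# (property, amenity) table for the multi-valued amenities field.
scraped = [
    {"Property": "Sea Breeze", "Bedrooms": "4", "Amenities": "pool; grill; hot tub"},
    {"Property": "Dune Walk", "Bedrooms": "3", "Amenities": "grill; pier/dock"},
]

properties = [{k: v for k, v in row.items() if k != "Amenities"} for row in scraped]
amenities = [
    {"Property": row["Property"], "Amenity": item.strip()}
    for row in scraped
    for item in row["Amenities"].split(";")
]

print(len(properties), len(amenities))  # 2 5
```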


SunSurf – Parsehub parsed the data well; used Texas A&M to geocode. But after reviewing the results, I had to re-parse the list with another tool for the missing data, and that tool turned out to be the better option for this site.

BlueWater – Since the same tool worked well for SunSurf, I started there, but it defied getting more than 7 rows of data (it would get stuck on the opening site; then, when AJAX loaded the list of properties, my selection process or the tool itself got lost). BlueWater exposes a list of property names with URLs to those properties, so if I could just get that list exposed via a URL, I could use another tool to extract the data from it. So I created my own version of the exposed list, using the F12 developer tools to copy the HTML that held the list of property names. I then loaded that HTML to a website so Parsehub could grab the list of property names and their associated URLs into an Excel sheet.

When clicking on a listed name, the URL in the browser changed to one that included the property name, so I used that pattern to build the list of URLs for all the properties. This URL list was then fed to Octoparse, which permits loading up to 10,000 URLs to scrape. However, Octoparse free is throttled, so the 555 lines took 5 hours and 17 minutes to parse. The result, however, provided me with street addresses – again, not quite clean, as there was a carriage return (no space) before the city name, so parsing out the address elements required using the SUBSTITUTE function in Excel to get the street address, city, and state on one line, ready for the further splitting needed for geocoding. Once parsed, there were many rows with duplicate addresses, as would be expected for condo complexes. The duplicates were removed to get a unique list of addresses to submit to Texas A&M Geocoding, which in a few minutes geocoded the 385 records.
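The carriage-return cleanup described above was done with Excel's SUBSTITUTE function; as a hedged illustration, the same fix in Python might look like this (the sample address is invented):

```python
# Collapse the carriage return that appears, with no space, before the city
# name so the whole address sits on one line, ready for further splitting.
def clean_address(raw: str) -> str:
    lines = raw.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return ", ".join(part.strip() for part in lines if part.strip())

print(clean_address("123 Ocean Dr\rEmerald Isle, NC"))  # 123 Ocean Dr, Emerald Isle, NC
```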

Emerald Isle Realty – Since Octoparse worked well for SunSurf, I figured I’d start with it. Caution – I didn’t spend a lot of time trying to figure out whether a tool would work; if it didn’t work with limited instruction, I just moved on to the next tool. On this site, the street address was not a unique element on the page but was sometimes buried in the description, so I used Excel to split out the street address. Some descriptions lacked an address but did show a mapped location on the website. While the site had a ‘mapped’ location for a property, the actual latitude/longitude pair used to place the marker on the map was not available for scraping. So, for these properties, I manually opened the property URL and compared it to a second browser window opened to Google Maps. By visually matching Google Maps to the property website map, I then used Google Maps to obtain the street address and lat/long coordinates. This process got quite tiresome, but fortunately there weren’t that many properties with a missing address.

BlueWater footnote – after scraping the Emerald Isle Realty data, I noticed I didn’t get the image URL in my first attempt. The tool worked OK for the Emerald Isle properties and was the easiest to set up (it figures out what you likely want and builds the scraping model for you, which you can then adjust to add or remove columns). When I used it the second time, I figured out that to ‘train’ the process, choosing similar URLs is absolutely necessary for it to scrape properly. Also, its auto process picked up a bunch of URLs very quickly and then presented a checkbox list of which groups of URLs were to be included in the scraping loop. There was a choice to pick up only the rental-property-related URLs; when I selected only that subset, the training went exceptionally well, as did the results. Downside – only 1,000 webpages can be scraped per login per month, so after running it once incorrectly against 600+ pages (I had two groups selected, so it built a model based on one webpage that didn’t work for almost all of the pages), that was all I could do this month with that login identity.

Final Cleanup

Whew! I got the data downloaded, and even though I was hoping the data was similarly structured, it wasn’t between sites. So, the next step was to decide on a data table structure (data model) and load the scraped data into that model. This meant not including some data that was unique to one site, since the other sites didn’t have the same info (like customer or agency ratings and customer reviews). Some sites listed a few amenities in a scrapable area, which was nice, except that they listed more amenities in the long description of the property. How amenities were identified meant that the data had to be conformed – that is, ‘ping-pong table’ and ‘ping pong table’ would need to be adjusted to read one way or the other. Amenities by property are listed in their own tab. The idea is that some amenities may be considered a requirement (like pets allowed), while others are nice-to-haves. Those amenities commonly considered a requirement have a separate flag counter field in the RentalProperties tab: pets, pool, grill, game room, elevator, pier/dock, community pool, hot tub, boats allowed.
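Conforming the amenity labels can be sketched as a small lookup. This Python example is illustrative; the canonical mapping below is an invented sample, not the full list used for the workbook.

```python
# Collapse variant spellings of an amenity to one canonical form.
def conform(amenity: str) -> str:
    key = amenity.lower().replace("-", " ").strip()
    canonical = {
        "ping pong table": "ping-pong table",
        "bbq grill": "grill",
        "swimming pool": "pool",
    }
    return canonical.get(key, key)

print(conform("Ping-Pong Table"))  # ping-pong table
print(conform("ping pong table"))  # ping-pong table
```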

Prices – Some sites had prices for various rental options – weekly, two-week, or monthly rates, all different for different rental date ranges. Others had just weekly rates for different ranges. I was only interested in a particular week, so that week’s rate is included as a column in the main RentalProperties tab. Even then, for some properties the scraping didn’t pick up a price, and a manual verification of those properties revealed three conditions. First, the property was no longer listed by the rental agency (it was listed when the property scrape occurred, but no longer existed a few weeks later when the price verification was done). Second, the property is listed, but prices aren’t – instead, ‘Advance Hold’ appears where the price was expected; I figure these are properties that were listed in the past but haven’t signed the contract for this year yet. Third, the property is listed, but a note says no weekly rentals are available – this likely occurs when the property has been rented by the month, or because the contract hasn’t been signed for the period. All of these non-priced properties are included, but no pricing is available.

After sending the data for geocoding, there were many properties with the same street address (condominiums, for instance). I wanted Tableau to map those properties as separate map points so that when hovering, a person could get individual property information in a tooltip rather than a * indicating multiple responses.  So, in the data, I modified the lat/long of duplicates by 5/1,000,000 (0.000005 degrees) in order to nudge each mark enough to make it separately identifiable.
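The nudge can be sketched as follows. This Python example is illustrative (the coordinates are invented): each repeat of a lat/long pair is offset by successive multiples of 0.000005 so Tableau plots every property as its own mark.

```python
from collections import defaultdict

def nudge(points):
    """points: list of (name, lat, lon); duplicate coordinates get a tiny offset."""
    seen = defaultdict(int)
    out = []
    for name, lat, lon in points:
        n = seen[(lat, lon)]            # how many times this pair has appeared
        seen[(lat, lon)] += 1
        out.append((name, round(lat + n * 0.000005, 6), round(lon + n * 0.000005, 6)))
    return out

pts = [("Unit A", 34.678100, -76.950400), ("Unit B", 34.678100, -76.950400)]
print(nudge(pts)[1])  # ('Unit B', 34.678105, -76.950395)
```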

Recap - Recommendations - Results shared

I started with what I thought would be easiest - using the extension to Google Chrome.  While this was useful, it also revealed that some sites resist this type of scraper -- it needs to build a list of URLs on the fly from the scraping activity, and if that can't be accomplished, another tool or approach is needed.  Parsehub and Octoparse are similar, but not the same.  The interface on Parsehub took some getting used to, and quite frankly, I found myself backing out of what I clicked to try again.  Once you figure out its very distinct clicking pattern for identifying page elements to scrape, the tool works well.  It too, however, wanted to build the list of URLs to scrape itself, so when you have a long list of URLs already known, Octoparse was the preferred tool: it allows a long list of URLs to be copy/pasted into the tool, then uses one of those URLs for you to set up the elements to scrape.

I also tried Google Sheets, but it seemed not to fit my particular need (again, I didn't spend much time there -- only enough to figure out it didn't get what I wanted with what I knew about the website).  You'll likely notice that Airbnb, VRBO, and similar sites aren't part of the results.  It isn't that we forgot about them; rather, our limited knowledge of how these tools work, plus those sites' extensive use of AJAX, foiled our initial attempts, and I ran out of patience/time.  I even sent the request to another person who applied Python to the HometoGo website, only to come away equally frustrated.  I'll likely look into getting this data into the dataset eventually, but not in this first exposure of the results.

I shared the results with the Piedmont Triad Tableau User group on February 6th 2020, showing them this PowerPoint presentation (click here to download).  The workbook has been loaded to Tableau Public for consumption, and looks like this (click on image to go to active workbook):

Next Steps:

My son wants to know if renting early gets either a better price or a better selection.  I'll check again in a few months to compare the results (which ones are available and what the prices are at the time).  Also, I'd like to get the missing VRBO, Airbnb, etc. properties into the data.


SQL ETL Flip-Flop to deliver no data outage experience

I know that we were all taught to delete then replace as the standard practice for ETL. I personally don’t prefer that approach for these reasons:

  1. When an ETL goes foul, the prior table’s data has already been truncated or the table dropped. Reverting is possible if a backup exists, but it is a painful process most often avoided.
  2. The drop/truncate then select/insert-into approach typically takes many minutes to perform. During that time, queries will return incomplete information (either none at all, or just the rows that have been ETL’d so far when the query ran).
  3. During the ETL, while the end table is in flux, end users are without good data.

The approach I prefer involves swapping a table from one schema to another (I refer to it as a flip-flop) – this swap process takes under a second, so the benefits are:

  1. The end user can always rely on the table data – it is always a complete table from the ETL.
  2. IT can easily revert to the prior table if the ETL results are found to be wanting for whatever reason. The end user effect is that they can continue using the older data while the ETL result is reviewed/repaired.

To set up the schema swap process, I establish two additional schemas – new and old (but call them what you want). The new schema is where the table from the ETL is built and indexed. The old schema is where the current dbo table (or whatever schema it resides in) is moved, so a simple reversion can occur if needed.

Steps (entire flip-flop process takes under a second, no matter what the size of the table):

  1. Either create the table or SELECT INTO the ETL target table in the 'new' schema. Create indexes against it as needed. These steps are basically the same as a typical ETL process, only done against a target table that isn't the currently active data.
  2. Prepare for what I refer to as the 'flip-flop' process.
    a. Get rid of the table if it exists in the 'old' schema
         IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES
                    WHERE TABLE_SCHEMA = 'old' AND TABLE_NAME = 'ETL Target Table')
             DROP TABLE [old].[ETL Target Table];
    b. Place the current table into the 'old' schema
         ALTER SCHEMA old TRANSFER [dbo].[ETL Target Table];
    c. Move the 'new' ETL Target Table into the desired schema
         IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES
                    WHERE TABLE_SCHEMA = 'new' AND TABLE_NAME = 'ETL Target Table')
             ALTER SCHEMA dbo TRANSFER [new].[ETL Target Table];
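The whole flip-flop can be demonstrated end to end. Since the post's SQL is T-SQL (ALTER SCHEMA ... TRANSFER), this runnable Python/SQLite sketch substitutes table renames for the schema transfers; it is an analogy of the pattern, not the author's production code, and the table names are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE target (id INTEGER)")      # stands in for the dbo table
con.execute("INSERT INTO target VALUES (1)")         # prior ETL's data

# Step 1: build the replacement table off to the side ('new' schema stand-in).
con.execute("CREATE TABLE new_target (id INTEGER)")
con.executemany("INSERT INTO new_target VALUES (?)", [(10,), (20,)])

# Step 2: the flip-flop itself, done as one quick transaction.
with con:
    con.execute("DROP TABLE IF EXISTS old_target")          # 2a: clear 'old'
    con.execute("ALTER TABLE target RENAME TO old_target")  # 2b: current -> old
    con.execute("ALTER TABLE new_target RENAME TO target")  # 2c: new -> current

print(con.execute("SELECT COUNT(*) FROM target").fetchone()[0])      # 2
print(con.execute("SELECT COUNT(*) FROM old_target").fetchone()[0])  # 1
```

Queries against `target` always see a complete table, and reverting is just another rename.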
Copyright © Brad Earle