Posts in Category: Tableau

Tableau Server: Extracts, Data Connections, Data Sources

A post on the Tableau LinkedIn forum got me thinking about how to explain Tableau Server (versions 6-8) Data Sources.  A Data Source published to Tableau Server is a convenient way to share data connection instructions (metadata), including dimensions, measures, sets, and any calculated fields.  Tableau covers some of the issues related to Data Sources in its knowledge base article found here.

But the differences are subtle and can be perplexing, and which combination of tactics to use depends on the speed of the connection to the underlying data repository and on whether you want Tableau Server access rights to govern the data source rather than a file-based solution (saving a .tds file to a network drive).  Below is an explanation of the terms, along with considerations for each element.

Extracts - A copy of the data is placed in Tableau's proprietary database engine.  Use extracts when the direct connection to the data is slow, the data is a snapshot, or the underlying data source lacks functionality (like the MS Jet engine).

Data Connection -- When authoring a workbook, data connections are created to obtain data from a specific data repository.  Multiple data connections can be made to the same repository (a global filter applies to a specific data connection; a use case for repeated connections is when some worksheets in a workbook should use a global filter and others shouldn't).  A data connection is defined by a SQL statement comprising either a single table, multiple tables with join conditions, or custom SQL.

Data Source -- This is a Tableau term for the xml instructions that can be saved either to a file or to the server.  The actual data is not contained in these files or loaded to the server. Data Sources can include instructions about how and where an extract can be found. A data source only includes the metadata that tells Tableau how to create the data connection and what formulas and native data elements exist in that data connection. It does not include formatting.
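
For illustration, here is a minimal Python sketch (not official Tableau tooling; the UNC path is made up and element names can vary by Tableau version) that opens a .tds file with the standard library XML parser and lists the connection and field metadata it holds -- a handy way to confirm that only instructions, not data rows, live in the file:

import xml.etree.ElementTree as ET

# Hypothetical path to a saved data source file (XML only, no data rows).
tree = ET.parse(r"\\fileserver\share\Sales.tds")
root = tree.getroot()  # the <datasource> element

# The connection element(s) describe where and how to reach the data.
for conn in root.iter("connection"):
    print("connection:", conn.attrib)

# The column elements carry the metadata: dimensions, measures, calculated fields.
for col in root.iter("column"):
    print("field:", col.get("caption") or col.get("name"), col.get("role"), col.get("datatype"))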

You can make it seem that the data is on Tableau Server.  When an extract is saved to a network location that both the server and the user can access (use the UNC path, not a mapped drive), then when a user selects 'Import Data Source' from the server, the data, dimensions, measures, and sets appear in the workbook because the user has access to the extract referenced in the data source, not because the data resides on Tableau Server.

Version 6 permits saving extract files to a network drive, while prior versions required extracts to reside on a local drive.  Extracts on a network drive can impact performance.

Publishing Data Sources to a server is a way to share a common set of formulas, etc. with other server users (the publisher chooses who gets permissions).  If the connection requires database credentials, then the user will have to enter them to access the actual data.  However, some caveats: 
1. Read-only vs. update permissions to an extract located on a network drive have to be managed through network drive access permissions. 
2. If the network extract file is open in any process, the extract cannot be updated.  Several workbooks can, however, read a network drive extract file at the same time.
3. A workbook with update rights to the file can add dimensions, measures, and sets to the data source and regenerate the extract.  The new elements aren't shared, however, unless the data source is re-published to the server.  When other workbooks open, they see the message "A newer version of the published data source is available.  Would you like to use the latest version?"  Answering no keeps the old XML definition that was embedded in the workbook; some elements could break if items were renamed or removed from the extract file.  Answering yes brings in the new elements.

If the data connection was saved with instructions to use it as an extract, then a popup appears with options to:
 1. locate the extract (which could be on a network drive)
 2. remove the extract (it will then require appropriate permission to use the data connection)
 3. Deactivate the extract (keeps the extract info in the xml file, but doesn't use that information -- instead requires appropriate permission to use the data connection)  If the extract file didn't exist before, an error will occur when attempting to regenerate the extract.
 4. Regenerate the extract (careful!  this regenerates the extract in the same location where it was originally generated - so to do this you will need permission to use the data connection and permission to save the extract wherever it was - it shows you a shortened version of the save location, but you don't have a choice to change this location)

So, let's complicate it some more:
When you publish a workbook that uses an extract to the server, the extract is pulled from the network/local drive and placed on the server as part of a packaged workbook.  This is a copy of the extract, not the local/network data source itself.  So, if the local data source is modified and saved to the network, the Tableau Server version of the workbook doesn't pull from that location; instead it pulls from its copy of the extract.  The metadata about the data source is frozen as of the time the server version of the workbook was published.  To get the server version to recognize changes, the workbook needs to be re-published so that the updated data source information is embedded into the packaged workbook.

I recently had an issue where I needed both a live connection to an Excel workbook and a MS SQL Server connection used as a data extract in the same workbook.  By saving the workbook as a .twb file, the live connection to the local file remained active.  If I saved the workbook as a .twbx, then a copy of the Excel file was embedded in the workbook and the live connection was lost.  So, here comes the tricky part.  I wanted to publish this workbook to Tableau Server with a live connection to the Excel file.  The problem is that when publishing, Tableau automatically 'packages' the workbook for consumption on the server whenever an extract is present; packaging is another term for what happens when a .twbx is created.  So, when I published the workbook to the server, the Excel connection no longer pointed to the local resource (I used the UNC naming convention to point it to a network share that the Tableau Server could access).  To overcome this, I published the Excel workbook as a data source to Tableau Server, but with the box 'Include External Files' unchecked, thereby making the data source a live read of the Excel file.  Last, I created a new data connection to this Tableau Server data source, then replaced the original Excel data source with the new one.  Now, when I publish the workbook to the server, the live connection doesn't break, because it is defined by the published data source, not by the workbook.  Even as a packaged workbook, the live link keeps working.


TEDx Greensboro

TEDx Greensboro is a local event in the style of TED Talks (TED = Technology, Entertainment, Design).  I was fortunate to attend this inaugural event in Greensboro (attendance was chosen through a lottery).  The theme was 'Dreamsboro' -- with the idea that presentations would focus on actions that make Greensboro, NC a place where folks want to live, work, play, retire, and do other good things.

In the first few minutes of the TEDx event, the moderator, Justin Catanoso, asked that cell phones, etc. be turned off.  So, at first, there wasn't much activity.  Then, slowly, folks couldn't contain their enthusiasm any longer and the tweeting began in earnest.

The viz dissects the words used by the tweeters, as well as data about who tweeted, which speakers were mentioned, and which tags were used.  A user can see a tweeter's profile, view the tweet itself, or open links mentioned within a tweet. 

Now, let's talk about how this viz was built --

First, it was inspired by Andy Cotgreave's efforts at culling Tweets, used at the 2012 Tableau Customer Conference as well as other conferences (directions found here).  To obtain the raw data, I installed Python and ran the Python script to create a CSV file of the tweets.  That was the easy part.

I wanted to extend Andy's approach so that I could use Tableau 8's new word cloud viz.  To do that, I needed to split the tweets into separate words.  This (like the rest of the tasks) likely could have been done in Python, but I used what I knew well -- Excel.  I created a macro that splits each tweet into words, placing each word into a separate column on the same line as the tweet.  What did I have now?  The basic structure needed for the Tableau Reshaper Tool Add-In (found here), which then gave me a row for each word, along with the full tweet info.
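
For anyone who prefers to stay in Python, here is a rough sketch of that split-and-reshape step (the file and column names are assumptions; the original work was done with an Excel macro plus the Reshaper add-in):

import csv

# Read the tweet export and write one row per (tweet, word) pair.
# Assumes tweets.csv has a 'text' column; adjust the name to match your export.
with open("tweets.csv", newline="", encoding="utf-8") as src, \
     open("tweet_words.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.writer(dst)
    writer.writerow(["tweet", "word"])
    for row in reader:
        tweet = row["text"]
        for word in tweet.split():
            writer.writerow([tweet, word.strip(".,!?:;\"'()").lower()])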

Next, I found that many words were noise: adjectives, pronouns, misspellings, etc.  A little manual effort identified these noise words and set them aside in a separate worksheet tab.  I then connected Tableau Public to the Excel file and created a join condition that excluded rows containing the noise words.  From there it was just a matter of creating the word clouds, the tweet timeline, and some parameter-driven choices for how to present the data.
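
The exclusion was handled with the Tableau join condition just described, but the same idea can be sketched in Python against the reshaped file (noise_words.csv is an assumed one-word-per-line list):

import csv

# Load the manually identified noise words (assumed: one word per line).
with open("noise_words.csv", encoding="utf-8") as f:
    noise = {line.strip().lower() for line in f if line.strip()}

# Copy the reshaped rows, dropping any row whose word is in the noise list.
with open("tweet_words.csv", newline="", encoding="utf-8") as src, \
     open("tweet_words_clean.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row["word"] not in noise:
            writer.writerow(row)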

Did I mention that all the tools used for data acquisition, reshaping, and visualization are free?  Python, the Tableau Data Reshaper, and Tableau Public will handle up to ONE MILLION rows of data!  You can use the above guidance to recreate a viz of your own against your own media event.  I entered this into a 2013 Tableau User Conference contest; please tweet about the viz using the hashtag #TableauTEDxGSO.

Hope you enjoy!


SAS ODBC and Tableau

SAS ODBC Driver setup instructions for creating a DSN that can be used with any ODBC reading software (like Excel, Access, Tableau):

I worked with Robert Morton of Tableau Software for the past year to get this SAS connectivity working and debugged several issues. Here are some notes about what I found out:

- 32-bit ODBC drivers from SAS need to be installed on the local PC. A SAS license is not required to use this driver.
- Set up a DSN in Windows to establish a connection through the SAS ODBC driver.  You must use the 32-bit version of the ODBC administrator: on a 64-bit operating system, don't open the data connections applet from the Control Panel, as that creates a 64-bit connection.  To open the 32-bit administrator (at least on Windows 7), enter C:\Windows\SysWOW64\odbcad32.exe into the Start menu search box.  To rule out Tableau as the source of any connection issues, test your DSN setup with a product that SAS ODBC does recognize, like Microsoft Excel or MS Access; this lets you have a conversation with SAS technicians without muddying the issue with Tableau.
- Once the DSN connectivity works (you will need to set up different ones for SAS/SHARE versus locally accessed SAS7bdat files), then connect with Tableau.
- While you can connect live, date fields are not properly interpreted unless you use an extract. Explore this and test!
- When setting up to connect to SAS/SHARE, the SAS instructions are somewhat obtuse. The SAS/SHARE service name as recognized by the server must be included in your services file and must match the DSN Server name. Here is the setup (with some names modified to maintain confidentiality):

Setup Windows so it will know where to route the port:

c:\windows\system32\drivers\etc\services (be sure to insert this line in numeric ascending order of the port #'s)
share 1624/tcp #SAS odbc

DSN setup:
General tab:
   Data Source Name: My SAS connection (this is freeform)
   Description: (can be left blank)
   Server: share.share
   Records to Buffer: 100000 (I picked more than Robert mentions because I'm dealing with large datasets)

SQL Options to check:
   Support VARCHAR
   Disable_0 override parsing
   Fuzz numbers at 12 places
   Infer type in min/max functions

Servers Tab:
   Name:share.share (must be the same as on General tab)
   Password: (left blank)
   Configure button:
   Server Address: servername (this is the actual server name that is used within SAS to connect -- no // or other UNC conventions, just the name)
   UserName: myproduser (we use a Unix user name that is for production reporting access, but whatever user you use, it must have access to the server named above)
   UserPassword: xxxxxxx (we enter the myproduser password so that it doesn't have to be entered each time that you want to access sas/share)

Libraries tab:
   Name: Mylibrary (this is freeform)
   Host File: /export/appl/theexposeddirectory/ (beginning and ending slash are important -- the user must have access to this directory, which is where the sas7bdat files should reside).
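
Before layering Tableau on top, the DSN can be sanity-checked from any ODBC client, per the testing advice above.  Here is a small pyodbc sketch (the DSN name matches the General tab above, the dataset name is hypothetical, and a 32-bit Python is needed to match the 32-bit SAS driver):

import pyodbc

# Connect through the DSN defined above (32-bit Python required for the 32-bit driver).
conn = pyodbc.connect("DSN=My SAS connection")
cursor = conn.cursor()

# List the datasets exposed through the library defined on the Libraries tab.
for table in cursor.tables():
    print(table.table_name)

# Pull a few rows from a hypothetical dataset to confirm reads work.
cursor.execute("SELECT * FROM Mylibrary.mydataset")
print(cursor.fetchmany(5))
conn.close()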

Tableau-specific data connection file (only needed for Tableau versions before 6.1.4)
You may also need to create a .tdc file placed in the "My Tableau Repository\datasources" subdirectory, but generally this isn't required for version 6.1.4 and later. Create genericodbc-sas.tdc with a text editor, with the following content:

<connection-customization class="genericodbc" enabled="true" version="7.7">
<vendor name="SAS"/>
<driver name="SAS"/>
<customizations>
<customization name="CAP_CREATE_TEMP_TABLES" value="no"/>
<customization name="CAP_ODBC_USE_NATIVE_PROTOCOL" value="yes"/>
<customization name="CAP_QUERY_BOOLEXPR_TO_INTEXPR" value="yes"/>
<customization name="CAP_QUERY_GROUP_BY_ALIAS" value="no"/>
<customization name="CAP_QUERY_GROUP_BY_DEGREE" value="yes"/>
<customization name="CAP_QUERY_JOIN_ACROSS_SCHEMAS" value="no"/>
<customization name="CAP_QUERY_JOIN_REQUIRES_SCOPE" value="no"/>
<customization name="CAP_QUERY_SUBQUERIES" value="yes"/>
<customization name="CAP_QUERY_SUBQUERIES_WITH_TOP" value="no"/>
<customization name="CAP_QUERY_TOP_N" value="no"/>
<customization name="CAP_SELECT_INTO" value="no"/>
<customization name="CAP_SELECT_TOP_INTO" value="yes"/>
<customization name="CAP_SKIP_CONNECT_VALIDATION" value="yes"/>
<customization name="CAP_SUPPRESS_DISCOVERY_QUERIES" value="yes"/>
<customization name="CAP_ODBC_FORCE_SINGLE_ROW_BINDING" value="yes"/>
<customization name="CAP_ODBC_BIND_SUPPRESS_INT64" value="yes"/>
</customizations>
</connection-customization>  


Other resources that might be useful:

Howard Plemmons, SAS Institute paper 156-28; V8 SAS help on ODBC; v9.3 SAS help on ODBC; SAS Help on ODBC
