Wednesday, February 11, 2015

Metadata Management - Consistent Commenting

The situation: a multitude of workbooks.

It's easy to make lots and lots of Tableau workbooks. In fact, they tend to sprout up all over the place. This is a good thing. Tableau is most valuable when it's being thoroughly exercised, when people are putting it through its paces, exploring data, trying this and that, experimenting with approaches and techniques.

On the other hand, this very wonderfulness has its consequences and costs. This post is about one of them, and a solution to the problem that arises. (problem here in the technical sense, without any value judgement)

The problem: inconsistent data semantics.

There are multiple connections to the same data scattered through the workbooks, and there's a business need to make sure that the data's semantics are consistently presented to the workbooks' users via informative comments on the Tableau connections' fields.

The solution: make the data semantics—comments consistent.

This post presents a way to make sure that all of the references to the same field in a set of workbooks have the same comments.

The short version is that it uses a standard Tableau workbook containing the appropriate comments and a Ruby script to apply those comments to the fields in other workbooks. The details follow.

A connection without comments.

A connection with comments.

The mouse pointer isn't visible in the screen caps above, but it's hovering over Continent on both cases. The one on the right shows the comment "What, no Antartica?", the one on the left has no comment.

Our goal is to have the comment, and all the fields' comments from the commented connection applied to them in the other workbook. Here's the procedure:

  1. Rename or copy the commented workbook to "TableauReference.twb".
  2. Put it in the directory where the uncommented workbook(s) are.
  3. Have Ruby installed (directions below).
  4. Have the Ruby script "ApplyReferenceComments.rb" below available—it doesn't really matter where, but it's more convenient to have it in the same directory.
  5. Run "ApplyReferenceComments.rb".
  6. Watch the magic happen.
The Ruby Code

ApplyReferenceComments.rb is inline below and can be copied from there, or you can download it from here.


# ApplyReferenceComments.rb - this Ruby script Copyright 2014,5 Christopher Gerrard require 'nokogiri' require 'Set' $csvFileName = 'AppliedFieldComments.csv' $csvFile = File.open($csvFileName,'w') $csvFile.puts 'Workbook,Data Source,Field Caption,Field Name,Field Title,Comment' unless $csvFile.nil? $CSVHeader = <<CSVHEADER.gsub("\n",'') Workbook, Data Source, Field Caption, Field Name, Field Title, capComment.nil?, nameComment.nil?, refComment.nil? CSVHEADER $activityCSV = File.open('activity.csv','w') $activityCSV.puts $CSVHeader unless $activityCSV.nil? $errorFile = File.open('ReferenceComments.err','w') $referenceTWB = 'TableauReference.twb' $refFieldComments = {} $referenceFields = File.open('ReferenceFields.csv','w') $referenceFields.puts('Data Source,Field #,Name Type,Name,Comment') def docUsage(arg1="",arg2="") puts arg1 puts " USAGE:" puts " " puts " run: ruby ApplyReferenceComments.rb -go {twbFileSpec}" puts " twbFile.twb - explicit file name" puts " 'f*.twb' - name with wild card(s)" puts " " puts " help: ruby ApplyReferenceComments.rb -help " puts " " puts " tech: ruby ApplyReferenceComments.rb -tech " puts arg2 end def docHeader(arg1="",arg2="") puts arg1 puts " #{$0}" puts " " puts " - Applying Field Comments to Tableau Workbooks, using '#{$referenceTWB}' " puts " as the comments\' reference source." puts arg2 end def docNotes(arg1="",arg2="") puts arg1 puts " help information:" puts " " puts " - Ruby must be in your PATH, or executed as {path}\\ruby.exe" puts " " puts " - This script must be in this dir, or executed as {path}\\#{$0}" puts " " puts " - Identifying TWBs for commenting." puts " The TWBs to be processed may be declared on the command line after \'-go\'" puts " " puts " Default - all the TWBs will be processed. e.g. -go " puts " (those not already commented)" puts " A single TWB - can be identified by name, e.g. -go BigData.twb" puts " the name can be quoted or not." puts " Multiple TWBs - can be identified via wild cards, e.g. -go 'Big*.twb'" puts " this must be quoted, if not only " puts " the first matching TWB counts." puts " " puts " - Commented TWBs" puts " TWBs that have reference comments added to any of their fields will be " puts " saved with '.comment' added to their names, e.g.: " puts " " puts " BigData.twb => BigData.commented.twb " puts " YabbaDabbaDo.twb => YabbaDabbaDo.commented.twb " puts " " puts " This ensures that no TWBs will be harmed during this process." puts arg end def docTech(arg1="",arg2="") puts arg1 puts " technical information:" puts " " puts " - Identifying Reference Comments " puts " " puts " - '#{$referenceTWB}'" puts " must be present to serve as the Fields\' comments source." puts " The reference fields' comments will be compiled from the fields found" puts " in '#{$referenceTWB}' into a list of fields and their comments." puts " Any problems locating or loading comments from '#{$referenceTWB}'" puts " will raise an error message, and no TWB processing will occur." puts " " puts " '#{$referenceTWB}' Data Connections" puts " - In many cases there will be a single canonical table/view used as" puts " the source of reference comments." puts " - However, it's possible that '#{$referenceTWB}' may contain" puts " more than one data connection, in which case the connections will be" puts " processed in the order in which they occur in '#{$referenceTWB}'," puts " and this order is unpredictable." puts " - In the case of the same field occurring in more than one reference" puts " Data Connection, the last one wins, i.e. the last field identified" puts " during the compilation process provides the reference comment." puts " " puts " - Applying Reference Comments" puts " - Fields in the TWBs that match one of the reference fields will be provided" puts " with the reference field's comment." puts " - Existing field comments will be discarded." puts " " puts " - Field matching occurs using either of the field's names:" puts " - Technical - the field name in the database, or " puts " - Presentation - the 'Tableau' field name, i.e. the field has been renamed. " puts " " puts " Fields that reference the same database field may not have the same Tableau" puts " name in different workbooks. This process considers that matching either the " puts " Technicalor Presentation name identifies a candidate field as matching and " puts " it will then be commented with the reference comment." puts " " puts " The candidate and reference fields' data types need not match, this is deliberate. " puts " " puts " - Commented fields will be recorded in #{$csvFileName}, which can " puts " be examined (with Tableau, naturally) to see which fields have been commented." puts arg2 end def docCaution(arg1="",arg2="") puts arg1 puts " USE WITH CAUTION" puts " ---------------------------------------------------------------------------------" puts " This process will add comments to all Db fields in the local TWBs that match the " puts " reference fields." puts " " puts " This is a destructive process - any existing comments will be discarded and cannot" puts " be recovered." puts " " puts " Run this process on copies of your workbooks, and check before and after running" puts " this process to make sure that everything is as it should be." puts arg2 end # IMPORTANT # Last comment wins - if more than one reference field exists with the same name # the last one encountered will be used as the reference field def loadReferenceComments refTwbExists = File.file?($referenceTWB) # puts "Reference twb Exists? :: #{refTwbExists}" return false unless refTwbExists doc = Nokogiri::XML(open($referenceTWB)) dataSources = doc.xpath("//workbook/datasources/datasource").to_a dataSources.each do |ds| dsName = ds.xpath('./@name').text dsCap = ds.xpath('./@caption').text dsName = if dsCap == '' then dsName else dsCap end fieldNodes = ds.xpath('./column').to_a fieldNodes.each do |col| colCap = col.attr('caption') colName = col.attr('name') colTitle = if colCap == '' or colCap.nil? then colName else colCap end comment = col.at_xpath('./desc') recordReference(dsName,colCap,colName,comment) unless comment.nil? end end puts " - Reference fields loaded.\n\n" return refTwbExists end $refFieldNum = 0 def recordReference(dsName,colCap,colName,comment) return if comment.nil? # -- $refFieldComments[colName.upcase] = comment unless colName.nil? or colName == '' $refFieldComments[colCap] = comment unless colCap.nil? or colCap == '' # -- text = comment.text.gsub(/[ ]*\n[ ]*/, ' ').gsub(',', ' ').gsub(/\t/, ' ') $refFieldNum += 1 $referenceFields.puts("#{dsName},#{$refFieldNum},Caption,#{colCap},#{text}") $referenceFields.puts("#{dsName},#{$refFieldNum},Name,#{colName},#{text}") $referenceFields.puts("#{dsName},#{$refFieldNum},Name - UC,#{colName.upcase},#{text}") end def addComment(colNode, commentNode) # Tableau only recognizes the first node as a comment so this # method removes any/all existing nodes before adding the new one descNodes = colNode.xpath('desc') descNodes.each { |dNode| dNode.remove } colNode << commentNode return true end def writeTWB(doc, twb) commentedTWB = twb.sub(/.twb$/,'.commented.twb') newTwb = File.open("#{commentedTWB}",'w') newTwb.puts(doc) newTwb.close puts " #{commentedTWB}" end # These are the standard data Tableau column attributes for the Teradata # types identifed by inspection in the TWB metadata-record elements. # This script is written to handle Teradata sources, may need adjustment # for other data sources. $mdRecProps = { 'date' => {'datatype' => 'real' , 'role' => 'measure' , 'type' => 'quantitative' }, 'integer' => {'datatype' => 'integer', 'role' => 'measure' , 'type' => 'quantitative' }, 'real' => {'datatype' => 'real' , 'role' => 'measure' , 'type' => 'quantitative' }, 'string' => {'datatype' => 'string' , 'role' => 'dimension', 'type' => 'nominal' } } def getMetadataRecordFields(twb, doc, dsName, dataSource, twbFields) mdFields = dataSource.xpath('./connection/metadata-records/metadata-record[@class="column"]') mdFields.each do |f| localName = f.at_xpath('local-name').text seen = twbFields.include?(localName) if !seen then refComment = $refFieldComments[localName.upcase] # - get a reference comment for the column name, if any if !refComment.nil? then type = f.at_xpath('local-type').text attrs = $mdRecProps[type] if !attrs.nil? then colNode = Nokogiri::XML::Node.new('column', doc) colNode.set_attribute('name',localName) attrs.each do |attr, value| colNode.set_attribute(attr, value) end colNode << refComment dataSource << colNode else $errorFile.print "ERROR getMetadataRecordFields - no attributes found for metadata-record:" $errorFile.puts "'#{localName}' in dataSource:'#{dsName}' in TWB:'#{twb}'" end end end # puts end end # ====================== /resolving metadata, non-column fields ====================================== def processTWB twb twbFields = Set.new # use to keep track of fields found, by their technical/DB names return if twb =~ /.*[.]commented.twb$/ # don't process already processed TWBs puts "\n - #{twb}" doc = Nokogiri::XML(open(twb)) dataSources = doc.xpath("//workbook/datasources/datasource").to_a # puts " - loaded" isCommented = false dataSources.each do |ds| dsName = ds.xpath('./@name').text dsCap = ds.xpath('./@caption').text dsName = if dsCap == '' then dsName else dsCap end # puts " - DS: #{dsName}" fieldNodes = ds.xpath('./column').to_a fieldNodes.each do |col| colCap = col.attr('caption') colName = col.attr('name') colTitle = if colCap == '' or colCap.nil? then colName else colCap end # puts " - col: #{colName}" twbFields.add(colName) # register seen Columns for ref during handling capComment = $refFieldComments[colCap] nameComment = $refFieldComments[colName.upcase] refComment = !capComment.nil? ? capComment : nameComment $activityCSV.print "#{twb},#{dsName},#{colCap},#{colName},#{colTitle}" $activityCSV.puts ",#{capComment.nil?},#{nameComment.nil?},#{refComment.nil?}" unless refComment.nil? then # puts " - comm:\t #{colName} " isCommented = addComment(col,refComment) if isCommented then twbFields.add colName title = colTitle.sub(/^\[/,'').sub(/\]$/,'') comment = refComment.text.gsub(/[ ]*\n[ ]*/, ' ').gsub(',', ' ').gsub(/\t/, ' ') $csvFile.puts "\"#{twb}\",\"#{dsName}\",#{colCap},#{colName},\"#{title}\",\"#{comment}\"" end end end getMetadataRecordFields(twb, doc, dsName, ds, twbFields) end # puts "\n\n====" # twbFields.each { |f| puts f } writeTWB(doc, twb) if isCommented end def process(arg1='*.twb') docHeader "\n\n\n" if loadReferenceComments then puts "\n\n Processing TWBs named '#{arg1}'...\n" Dir.glob(arg1) { |twb| processTWB twb unless twb == $referenceTWB } $csvFile.close unless $csvFile.nil? puts "\n\n Done.\n\n\n" puts "\n That's all, Folks.\n\n" else puts " ERROR" puts " ---------------------------------------------------------------\n" puts " Either:" puts " - The Comments Reference TWB: #{$referenceTWB} is not present." puts " or " puts " - A problem occurred loading the reference comments from it." end end def usage docHeader "\n\n\n" docUsage "\n\n\n", "\n\n\n" end def help docHeader "\n\n" docUsage "\n" docNotes "\n\n" docCaution "\n\n", "\n\n\n" end def tech docHeader "\n\n" docUsage "\n" docNotes "\n\n" docTech "\n" docCaution "\n", "\n\n\n" end system "cls" # what to do? case ARGV[0] when /[-]+[g][o]?$/i ARGV[1].nil? ? process : process(ARGV[1]) when /[-]+(h|help)$/i help when /[-]+(t|tech)$/i tech else usage end #$csvFile.close unless $csvFile.nil?

Running ApplyReferenceComments.rb

Notice that running "ApplyReferenceComments.rb" requires using "-go" on the command line. Not providing "-go" will cause "ApplyReferenceComments.rb" to provide usage information, including how to see help and technical information. These are shown below.

Commenting in action.

Note that whenever a Workbook is found ApplyReferenceComments.rb lists it, and then if there are any comments added to it the original Workbook is saved with ".commented" appended to its name. This ensures that the original TWB is left unchanged.

Recording comments applied.

A record of the comments that are applied is kept as data in the CSV file "AppliedFieldComments.csv". Here's an image of a Tableau workbook showing the results of the scenario above.

There's only one Workbook and one Data Source, so they're shown in the sheet's title do the sheet can be narrow enough to display the whole worksheet.

The usual caveats apply.

This script has worked well for my situation and needs.
It may or may not work as well for you in yours.
There are no guarantees that it will do anything useful.

Tableau 9

Rumor has it that Tableau has told one Tableau tools vendor that Workbook XML will not change in any ways to break existing tools and utilities. This would be good, there's been no public commitment to this that I know of and Tableau 9 could break everything.

Evolution.

These utilities evolve fairly rapidly as needs and conditions change. I envourage you to take this code and use it as the basis of your work, and hope that if you do you'll leave a comment and a link to your new stuff. The more tools the community creates the better for all of us.

Comments welcome.

I'm very interested in hearing about any experiences with this, good and bad. Please leave a comment if you give it a try.

1 comment:

  1. thx for info... keep writing and giving us an information... glhf for ur day!!!

    ReplyDelete