Tableau: Delineate Belgian provinces explained

Tableau Software is a self-service BI tool for data visualisation, meaning that even business users should be able to visualise their data easily (without needing help from IT). You can check out the Tableau website for more information about this great tool.

As you may have seen in various guides and tutorials, Tableau can link dimensions such as postal codes and countries to a geographical area. Unfortunately, most of these tutorials use data related to the United States of America. As of today, support for Belgian geography is still rather limited. One thing we can easily do, though, is show data per province using the province borders.

In this blog we’ll be showing the Belgian population per province in the year 2012 on a map. The data I used (and which you can see in the image below) can be downloaded from the Belgian Open Data website.

Data used in Excel

I've slightly altered the data so that each province has its English name (otherwise Tableau will not recognise it – French or German names also work).

I've opened Tableau and loaded the data. Make sure that province is set as a geographical data type (State/Province). If this is not the case, you can change it by right-clicking on province and then selecting "Geographic Role" -> "State/Province".

Geographic Role - State Province

When using filled maps (a type of map visualisation that Tableau offers which fills each area according to your chosen data), you can only use one measure. Therefore I've added a new calculated field "Total" (see "Analysis" -> "Create Calculated Field") based on the number of males and females.

calculated field - Total population

Now we will select both the province and the total and click on “Filled maps”.

Select province and total for map2

Tableau will automatically colour the provinces according to the number of people who live there.

Now the only thing left to do is format your layout and then you're done! I have coloured the provinces based on the % of total (red -> high %, green -> low %). I've put the % of total in the label rather than the absolute total, because I think the absolute number would be less clear. The last thing I did was add the number of males, females, the total and the total percentage to the label.

end result

You can view the dashboard that I made for this blog on Tableau Public, a free service from Tableau that allows you to publish your data on the web for other people to see.

Extra tip: you can Ctrl-click multiple provinces to view the sum of the total % for the selected provinces.

Thank you for reading!

Talend MDM: How to use validation rules

When creating an MDM data model, Talend offers standard constraint options.
You can make fields mandatory or optional by setting their minimum and maximum occurrence, and you can set fixed values for a field.
Sometimes these options aren't enough: you may also want custom validations such as email validation and URL validation. For this kind of requirement, Talend lets you define validation rules.

To illustrate this, I've created the entity 'Customer'. As you can see, FirstName and LastName are mandatory fields. However, for my business case this data model doesn't meet all my requirements: I also want to validate the PostalCode (Belgian format: '9999') and the Email before the record is saved.

Datamodel

How can you solve this within Talend?

You can create a validation rule by right-clicking on the Customer entity and selecting "Set the Validation Rule", as shown below.

set validation rules

A window will pop up where you need to fill in a name for your validation rule. After you’ve chosen a name the following window will be shown.

set validation rule builder

You can add rules by clicking on:

select xpath

For each rule you have to select an XPath, create an expression and set a message. You can add each of these by clicking on "…" in the corresponding field.

When creating an expression you can use the expression builder by clicking on “…”. The builder provides lots of predefined functions.

expression builder

After you have created and set your validation rules, you have to deploy your data model to the MDM server. Once it has been deployed, you can test the validation rule by creating a new record. When you enter an invalid postal code or an invalid email address, you'll get the following message:

invalid postalcode

When entering the correct information you’ll get the following message:

save successfully

As you can see, we've created a single validation rule set containing several rules. The record can only be saved if all the rules in the set are met.

Talend: Schema compatibility check

Most of the time, when talking about Talend jobs, people think of standard ETL (Extract, Transform, Load). In some cases, however, the incoming data needs to be checked before it is loaded into the target rather than just transformed. We refer to this process as E-DQ-L (Extract, Data Quality, Load).

One of the things you might want to check before loading is schema compatibility. For example: you expect a String of length 5 but, for whatever reason, receive one that is longer, which will generate an error. Or perhaps you expect a percentage as a BigDecimal (like 0.19), but receive it as a String ("19%"). This will make the job fail with an error saying "Type mismatch: cannot convert from dataType to otherDataType".

Before I continue, I would like to emphasize that all the solutions below are possible with the Data Integration version of Talend, except for the last one, which requires a Talend Data Quality license.

Let's create an example case: we want to extract data on a regular basis from a third-party source whose schema settings we cannot fully trust. We know how many columns to expect and we have a rough idea of what they contain, but we cannot rule out that the source delivers incompatible data. We want to load the valid records and store the 'corrupt' data separately for logging purposes. I've gathered several solutions for this problem:

  1. Use a reject flow on an input component

One thing you can do is reject records as soon as you import them. Disable "Die on error" on the Basic settings tab of your input component, then right-click it and select "Reject". The rows will be rejected based on the schema of the file. In the example below we defined the phone number as an integer, and as you can see one record is being rejected: its phone number contains characters and therefore cannot be read as an integer. If you had not disabled the "Die on error" option, this component would make the job fail.

reject on input

  2. If the target is a database: use reject links

You can also choose to insert the data directly into your database, but reject any rows that would cause an error. You can then create a separate flow to determine what to do with these rejected records.

In your database output component (for example tOracleOutput) change the following:

  • Basic settings: Uncheck “Die on error”
  • Advanced settings: Uncheck “Use batch size”

Now right-click on your component, select "Row" -> "Reject" and connect it to an output component. The output will contain the rejected rows together with the error that would have been generated if you had tried to insert them, as you can see in the picture below.

rejected rows databank

  3. Use a tFilterRow component

You can make the data go through a filter component before inserting it into your target and (manually) decide what is allowed to pass through. This can be useful when your destination is not a database, in which case the previous option is not available.

Schema compatibility check with tFilterRow

A tFilterRow component can also output the rejected rows, including the reason why they were rejected. You can enable this by right-clicking on the filter and selecting "Row" -> "Reject". An example of rows rejected by the filter:

rejected rows tFilterRow

Note – You can also use self-defined routines in the tFilterRow component by checking "Use advanced mode". This can be useful when you want to check whether a conversion is possible. For example, you could define a routine called "isInteger" that returns true if the conversion is valid and false if it isn't; a sketch of such a routine is shown below.
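A minimal sketch of what such a routine could look like (the class name CheckUtils and the column name phone_number are just examples; in Talend Studio a routine like this would live under Code > Routines):

public class CheckUtils {

   // Returns true if the given string can be parsed as an Integer,
   // false otherwise (including null or empty input).
   public static boolean isInteger(String value) {
     if (value == null || value.trim().isEmpty()) {
       return false;
     }
     try {
       Integer.parseInt(value.trim());
       return true;
     } catch (NumberFormatException e) {
       return false;
     }
   }
}

In the advanced mode of the tFilterRow you could then use a condition such as CheckUtils.isInteger(input_row.phone_number) to let only convertible rows pass.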

  4. Use a tSchemaComplianceCheck component

Another way of making sure that your schema is compatible is by using the tSchemaComplianceCheck-component. Unfortunately, this component is only integrated in the Data Quality version of Talend.

It's a very easy component to use. The only thing you have to do is connect the incoming data to the tSchemaComplianceCheck component and then continue the flow to the destination. You can get the rejected rows the same way as before (by right-clicking on the component and selecting "Row" -> "Reject").

tSchemaComplianceCheck job

The rejected rows and their error message look like this:

rejected rows tSchemaCompatibilty

That's it for now. There are probably a lot of other ways of checking schema compatibility. Feel free to comment if you know any. Thank you for reading!

Change the owner at runtime in Apex

One of our clients needed an application to follow up production in their plants. The main feature of the application was to show all the data of the current year, but the client also wanted to be able to consult the historical data. Unfortunately, the historical data was not stored in the same schema as the current data.

The two schemas:

  • SCHEMA_A
  • SCHEMA_A_HISTORY

As you could already guess, the second schema contained the historical data. Each year, the old data was moved from SCHEMA_A to SCHEMA_A_HISTORY.

As SCHEMA_A was our parsing schema, we had all the privileges on the tables with the current data, but we didn't have access to the historical data. We could have created separate screens for the historical data, but instead we decided to switch the owner at runtime when the user wants to see the historical data.
For this, we also needed select privileges on the SCHEMA_A_HISTORY tables.

In APEX we can do this with just a single line of code:

apex_application.g_flow_owner := 'SCHEMA_A_HISTORY';

This line is best placed under "Shared Components > Security > Database Session > Initialization Code"; this way it is executed every time APEX takes a database session from the session pool:

Screenshot

In older versions of APEX this field was called “Virtual Private Database”.

As you can see in the screenshot, we have used an application item. When the user asks to see the historical data, we fire a process that changes the value of this item from 'SCHEMA_A' to 'SCHEMA_A_HISTORY'.

That’s it, just a quick tip that can save you a lot of time!

Oracle Forms 12c?

Some people wonder whether there will be a 12c version of Oracle Forms or not.

Instead of making guesses, the best way is to ask the people who know best.
So, after getting this question, I asked Michael Ferrante, Oracle Forms Product manager.

Here’s his answer:

Version 12 of Forms and Reports is tentatively planned for releases sometime in CY2015.  At this time, we (Oracle) are not permitted to share exact dates, but I can tell you that developers are hard at work with the new version.  Currently, Forms is in very early stages of its development cycle, but already contains many new features and enhancements to existing features.  This new version, of course will also offer updated certifications against newer Operating Systems, newer Java versions, newer browsers, and newer Oracle database versions.

If you want to stay updated on Oracle Forms, check the following sources on a regular basis:

In case you visit Oracle Open World, check the following session:

Oracle Forms sneak peek (Moscone South, room 306 - 9/29/14, 17:15 – 18:00)

New in Java 8: Consumers and Predicates: a simple introduction

The java.util.function package is new in Java 8 and contains interfaces that are used for lambda expressions and functional references. In this blog, I give a brief introduction to 2 interfaces of this package:

  • Consumer
  • Predicate

For the examples in this blog, we have a list of invoices with name, amount and category:

public class Invoice {
   private String name, amount, category;
   public Invoice(String name, String amount, String category) {
     this.name = name;
     this.amount = amount;
     this.category = category;
   }
   public String getName() {
     return this.name;
   }
   public String getAmount() {
     return this.amount;
   }
   public String getCategory() {
     return this.category;
   }
}

To generate a list of invoices, we’ll use the following method:

public static List<Invoice> generateInvoices() {
   List<Invoice> list = new ArrayList<Invoice>();
   list.add(new Invoice("Oracle","1000","SOFTWARE"));
   list.add(new Invoice("Microsoft","3000","HARDWARE"));
   list.add(new Invoice("Apple","5000","SOFTWARE"));
   return list;
}

Consumer

A Consumer is an interface that ‘consumes’ an object. It takes an argument and does something with it. It does not return a result.

The Consumer interface has 2 methods:

  • void accept(T t): contains the code that is executed on t
  • default Consumer<T> andThen(Consumer<? super T> after): returns a composed Consumer that executes this Consumer first and then the after Consumer, enabling you to chain consumers.

For this demo, we are using the forEach method of the Collection API (new in Java 8):

Collection.forEach(Consumer<? super T> action)

This method executes the consumer ‘action’ on every item of the collection.

First we create 2 methods that each return a Consumer object: the first prints the name of the invoice, the second prints the amount.

Finally, we use these two methods in a forEach call.


public static Consumer<Invoice> printName() {
    return new Consumer<Invoice>() {
         public void accept(Invoice invoice) {
           System.out.println(invoice.getName());
         }
    };
}

public static Consumer<Invoice> printAmount() {
    return new Consumer<Invoice>() {
         public void accept(Invoice invoice) {
           System.out.println(invoice.getAmount());
         }
    };
}

generateInvoices().forEach(printName().andThen(printAmount()));

As you can see in the last line, for each invoice printName() is executed first, followed by printAmount(). This line will print the following:
Oracle
1000
Microsoft
3000
Apple
5000
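The two consumers above are written as anonymous inner classes. Since Consumer is a functional interface, the same chain can also be written more concisely with lambda expressions, for example:

Consumer<Invoice> printName = invoice -> System.out.println(invoice.getName());
Consumer<Invoice> printAmount = invoice -> System.out.println(invoice.getAmount());
generateInvoices().forEach(printName.andThen(printAmount));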

When an error occurs in the forEach method, an exception is thrown and further processing of the list stops, as the sketch below illustrates.
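A minimal sketch of this behaviour (the null check on the amount is just an example condition):

try {
   generateInvoices().forEach(invoice -> {
     if (invoice.getAmount() == null) {
       throw new IllegalStateException("Missing amount for " + invoice.getName());
     }
     System.out.println(invoice.getName());
   });
} catch (IllegalStateException e) {
   // Invoices after the failing one are never processed
   System.err.println(e.getMessage());
}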

Predicate

A Predicate is a functional interface that represents a boolean-valued condition and is typically implemented with a lambda expression. It has a single abstract method:

boolean test(T t)

Predicates are used in stream operations. Stream operations can be executed on Collections in order to perform complex data-processing queries. In this blog we'll keep it simple: we just want to select all the invoices with category 'HARDWARE' and put them in a new List.

Using a predicate in combination with the new Streams API will simplify and shorten our code and make it more readable.

First we define our predicate, and then we’ll use it on our List of invoices. The stream method will filter the List using our predicate, and then collect the items that fulfill the predicate in a new List.

public static Predicate<Invoice> isHardware() {
     return i -> i.getCategory().equals("HARDWARE");
}

List<Invoice> listHardware = generateInvoices().stream().filter(isHardware()).collect(Collectors.<Invoice>toList());

Our new list will now contain one invoice, the one from Microsoft, which has 'HARDWARE' as its category.
As you can see, a Predicate wraps a function that we can pass to other code; in essence it is just a reference to a function.
With Streams you can also sort and map data before collecting, but that's for another blog.

So that’s it for now. I hope this blog has shown that, by using Consumers and Predicates, our code will become shorter, cleaner and more readable.

 

Talend: tips and tricks part 2

In the first part of this series we discussed how to test your expressions, the importance of optimizing the appearance of a tLogRow component and how to handle windows and views within Talend. This time around, we will be talking about the different ways to get components into your job, how to trace your dataflow and how to easily sync columns. Like last time, this post will be useful for both new and experienced users.

4. Getting components into your job

There are many ways to get components into your job. Most people search the palette (either with the search function or by manually exploring the folders) and drag and drop the components into their job. You can achieve the same thing by simply clicking on an empty spot in your job and typing the name of the component. Obviously this is only recommended once you're familiar with the different components and their names.

insert component

When working with metadata, you can use certain shortcuts to save a bit of time. Usually people just drag the metadata and drop it onto their job. This pops up a window allowing you to choose which type of component you want to use. Holding the Ctrl key while dragging the component will directly create an output component; holding Ctrl+Shift will create an input component.

5. Syncing columns

Occasionally, you may have to change the schema of a certain component in the middle of development. This might affect other components in your job. In some cases, Talend asks if you want to propagate the changes you’ve made (to the other components).

propagate changes

You may accidentally close this window, click "No" or not get this message at all, resulting in the following error: "The schema from the input link "youroutputlink" is different from the schema defined in the component".

different schema error

When this happens, you can go to the basic settings of the component that has the error and click on “Sync columns”. The error should now be gone.

schema sync columns

6. Tracing your dataflow (Debug Run)

Lastly, I would like to say a few words about the debug run. In some cases we want to watch our dataflow closely in order to get a better understanding of what exactly is happening. You can achieve this by running your job in debug mode: open the Run window, click on the "Debug Run" tab on the left side of the window and start it by clicking "Traces Debug".

debug run

The moment you open the “Debug run” tab, you’ll immediately see extra icons in your job. These magnifying glass icons indicate that details will be shown when you debug-run your job. The result should look something like this:

debug run example

You can Pause and Resume the run at any time. You can also add breakpoints if you like. Do this by right-clicking on a dataflow and then selecting “Show Breakpoint Setup”.

show breakpoint setup

This brings you to the "Breakpoint" tab of the data flow you clicked on. You can also get there by clicking on the specific flow and manually selecting "Breakpoint". Let's add a breakpoint to pause our run whenever we come across a record with "Bloom" as last name. First, make sure the "Activate conditional breakpoint" option is checked. After that, click on the plus icon underneath the conditions. Then select the InputColumn we want to put our condition on, in our case "Last_name", and add a value ("Bloom" in this example). The default Operation is "Equals", which is the one we want; you could pick a different Operation if you needed to, but that's not necessary here.

component breakpoint properties

You can add multiple breakpoints if you like. Whenever you debug run your job now, it will stop at a record where the Last_name is “Bloom” (if any exist).

That’s it for now. Thank you for reading!