Talend: tips and tricks part 1

This blog contains some convenient tips and tricks that will make working with the open source tool Talend for data integration a lot more efficient. This blogpost will be especially useful for people who are just discovering this amazing tool, yet I am sure that people who have been using it for a while will also find it very helpful. These series of tips will be spread over multiple blog entries so make sure to check back often for future tips!

1. Testing expressions in the tMap component

Using the tMap component, you have the possibility to test your expressions. This way you can easily see whether or not the result is what you expected it to be. You can also use this to determine whether or not your expression will error. Let’s create an example.

We’ve got details of employees as input for our tMap. We would like the first name to be shown in uppercase. First of all, go into the expression builder by clicking the ellipsis next to your expression.

Ellipsis expression builder

To convert the first name to uppercase, we have to use the StringHandling function “UPCASE”. This will result in the following expression: StringHandling.UPCASE(employee.First_name)

After you’re done filling in test values, click on the “Test!” button and wait for the result. If everything goes as expected, you should see your first name in uppercase on the right side of the window.

Test expression builder tMap

2. Optimizing the appearance of the tLogRow component output

tLogRow is one of the most frequently used components. It is recommended that you learn how to optimize its use. Firstly, make sure that you always have the right appearance selected for your output. You can find this property in the basic settings of your tLogRow-component.

tLogRow modes

There are three types of Modes that you can choose between:

  • Basic

Basic will generate a new line for each record, separated by the “Field Separator” you’ve chosen (see image above). When using basic mode, I highly recommend to check the “Print header” option when working with multiple column records or multiple outputs, purely for visibility reasons.

basic mode output

  • Table (print values in cells of a table)

The table mode shows the records and their headers in a table-format, including the name of the component that generated this output (in our case: “tLogRow_1”). This emphasizes the importance of properly naming everything, especially when you have multiple components that generate output. In this case, it would have been better to rename our component to “EMPLOYEES”. Personally, I prefer this mode.

table mode output

  • Vertical (each row is a key value/list)

Vertical mode will show a table for each one of your records.

vertical mode output

The output mode you decide to use depends on what you’re trying to visualize. For example, when your goal is to show a single string, I would recommend using the basic mode. But when you have multiple table outputs (for example: departments, customers and employees in a single output), I’m certain the table mode would be the best option.

Sometimes your data is spread over multiple lines, resulting in an unclear output, like shown in the image below.

output with wrap

To force the output to put all the data on one single line, you can uncheck the “Wrap” option. This option is located underneath your output and will enable a horizontal scrollbar.

output without wrap

Do you also want to be able to get data regarding tweets using Talend, as shown in the image above? Read my previous blogpost and find out how!

3. Resetting windows and maximizing/minimizing them

Sometimes you accidently close a window and have a hard time finding a way to get it back. You can very easily reset your environment by clicking on “Window” – “Reset Perspective”.

reset perspective

You can see all of the views by clicking on “Windows” – “Show View” – “Talend”. Some of the views are not shown by default, such as “Modules”. Modules can be used to import .jar-files without having to restart your studio, which will most likely save you some time.

Lastly, because Talend is Eclipse-based, you have the possibility to maximize and minimize windows. I personally use this function when examining the output of a tLogRow-component including a lot of data. You can achieve this by either double-clicking on the window or by right-clicking on it and selecting “Minimize”/”Maximize”.

That’s it for now. I hope you enjoyed reading this blog and make sure to return soon for future blogs!

Use of contexts within Talend

When developing jobs in Talend, it’s sometimes necessary to run them on different environments. For other business cases, you need to pass values between multiple sub-jobs in a project. To solve this kind of issues, Talend introduced the notion of “contexts”.

In this blogpost we elaborate on the usage of contexts for easily switching between a development and a production environment by storing the connection data in context variables. This allows you to determine on which environment the job should run, at runtime, without having to recompile or modify your project.

To start using contexts in Talend you have two possible scenario’s:
1) you can create a new context group and its corresponding context variables manually, or
2) you can export an existing connection as a context.
In this example we’ll go over exporting an existing Oracle connection as a context.

Double click an existing database connection to edit it and click Next. Click Export as context

Image

NOTE There are some connections that don’t allow you to export them as a context. In that case you’ll have to create the context group and its variables manually, add the group/variables to your job, and use the variables in the properties of the components of your job.

After you’ve clicked the Export as context button you’ll see the Create/Edit context group screen. Enter a name, purpose and description and click Next.

Image

Now you’ll see all the context variables that belong to this context group. Notice that Talend has already created all the context variables that are needed for the HR connection. If you want to change their names you can simply click them and they become editable.

Click the Values as table tab.

Image

In the Values as table tab you can edit the values of the context variables by simply clicking the value and changing it. To add a new context, click the context symbol in the upper right corner.

Image

The window that pops up is used to manage contexts. To create a new context, click New, enter the name of the context, in our example Production, and click Ok. To rename the Default context, select it, click Edit, enter Development and click Ok. When you’re done editing, click Ok.

Image

After the window closes, you’ll see that an extra column appeared. Enter the connection data of the production environment in the Production column and click Finish.

Image

In the connection window it’s possible to check the connection again, but this time you’ll be prompted which connection you want to check.

Image

Verify that both the connections work and click Finish.

Now that we’ve exported the connection as a context, it’s possible to use it in a job. Create a new job, use the connection that has been exported as a context and connect it to a tLogRow component. Your job should look something like this

Image

When using a connection that has been exported as a context in a job, you have to include the context variables in order for your job to be able to run. Go to the context tab and click the context button in the bottom left.

NOTE When using one of the newer versions, Talend proposes to add missing context variables whenever you try to run a job, because of this you don’t need to add them manually as described in this example.

Image

Select the context group that contains the context variables, in our case the HR context group.

Image

Select the contexts you want to include and click OK

Image

NOTE A context group can also be added to a job by simply selecting the context from the repository, dragging it towards the context tab of the job, and dropping it there.

Once you’ve added the context group to the job, it’s possible to run the job for both the development and production environment by selecting the context in the dropdown menu of the Run tab.

Image

Data Integration Services – Consolidate your data (sources)

When talking about data integration services and depending on the audience you’re talking to, this could be:

  • Data needs to flow between different applications, instead of persisting the data on different data containers => Data Consolidation, Data messaging
  • Extract, transform and load => data warehousing, data consolidation
  • Services that are offered within an enterprise, or enterprise-wide, to share data => web services, messaging services, Service Bus, …

In other words, data integration services can hold a number of key performance indicators which are defined by the business metrics and requirements.

Besides the integration-aspect, the data-aspect is even more crucial such as defining the Enteprise Information Model so every consumer perceives the data in the same manner.

In regards to this aspect and the retail-sector, there’s an interesting blog-post regarding the Oracle Retail Data Model (ORDM).