Saturday, 5 November 2016

Aggregator Transformation in Informatica

The aggregator transformation performs aggregate calculations like sum, average, etc.
For example, if you want to calculate the sum of salaries of all employees department wise, we can use the Aggregrator Transformation.
The aggregate operations are performed over a group of rows, so a temporary placeholder is required to store all these records and perform the calculations.
For this, aggregator cache memory is used. This is a temporary main memory which is allocated to the aggregator transformation to perform such operations.
In this example, we will calculate the sum of salaries department wise. For this, we require a new column to store this sum. So, first of all, we will prepare a new column.
Step 1 – Create a new database target table, for example, say "sum_sal_deptwise", using the below script. You will see the new database target table is created under Target folder in next step.
CREATE TABLE sum_sal_deptwise (
  DEPTNO NUMBER(2), 
  SUM_SAL NUMBER(5)
);
Step 2 – Create a New mapping "m_ sum_sal_deptwise".
In order to create new mapping, we need source table (EMP) and target table (sum_sal_deptwise) both in mapping designer for that we need to
  1. Import the target table "sum_sal_deptwise" in the mapping.
  2. Import the source table "emp".

Step 3 – In the mapping,
  1. From the Source Qualifier, delete the columns empno, ename, job, mgr, hiredate & comm so leaving only the columns deptno and sal.
  2. Create a new aggregator transformation using the toolbox menu as shown in screen shot. When you click on the aggregator icon, a new aggregator transformation will be created.

Step 4 - Drag and drop SAL & DEPTNO columns from source qualifier (SQ_EMP) to the aggregator transformation


Step 5 – Double click on the aggregator transformation to open its properties, and then
  1. Add a new port in the transformation
  2. Rename the port name to SUM_SAL
  3. Change the data type of this new port to double
  4. Make this port as output port by selecting the checkbox of the output port.
  5. Click on the expression option

Step 6 – In the expression window
  1. Add expression- sum(SAL), you have to write this expression.
  2. Select Ok Button, this will bring back the edit transformation window.

Step 7 – In edit transformation window, select option "GroupBy" by marking the check box against the deptno column and Click Ok ( by selecting group by against the deptno, we are instructing Informatica to group salaries by deptno)


Step 8 – Link the deptno and sum_sal columns from aggregator transformation to the target table


Now save the mapping and execute it after creating a new session for this mapping. The target table would contain the sum of salaries department wise. In this way, we can use aggregator transformation to calculate aggregate results.

Filter Transformation in Informatica

Using the filter transformation, we can filter the records based on the filter condition. Filter transformation is an active transformation as it changes the no of records.
For example, for loading the employee records having deptno equal to 10 only, we can put filter transformation in the mapping with the filter condition deptno=10. So only those records which have deptno =10 will be passed by filter transformation, rest other records will be dropped.


How to use filter transformation-
Step 1 – Create a mapping having source "EMP" and target "EMP_TARGET"
Step 2 – Then in the mapping
  1. Select Transformation menu
  2. Select create option

Step 3 - Then in the create transformation window
  1. Select Filter Transformation from the list
  2. Enter Transformation name "fltr_deptno_10"
  3. Select create option

Step 4 – The filter transformation will be created, Select "Done" button in the create transformation window

Step 5 – In the mapping
  1. Drag and drop all the Source qualifier columns to the filter transformation
  2. Link the columns from filter transformation to the target table

Step 6 – Double click on the filter transformation to open its properties, and then
  1. Select the properties menu
  2. Click on the Filter condition editor

Step 7 – Then in the filter condition expression editor
  1. Enter filter condition – deptno=10
  2. Select OK button

Step 8 – Now again in the edit transformation window in Properties tab you will see the filter condition, select OK button

Now save the mapping and execute it after creating session and workflow. In the target table, the records having deptno=10 only will be loaded.
In this way, you can filter the source records using filter transformation.

Transformations in Informatica

Transformations are the objects in Informatica which creates, modifies or passes data to the defined target structures (tables, files or any other target).
The purpose of the transformation in Informatica is to modify the source data as per the requirement of target system. It also ensures the quality of the data being loaded into the target.
Informatica provides various transformations to perform specific functionalities.
For example, performing tax calculation based upon source data, data cleansing operation, etc. In transformations, we connect the ports to pass data to it, and transformation returns the output through output ports.
Classification of Transformation
Transformation is classified into two categories, one based on connectivity, and other based on the change in no of rows. First we will look the transformation based on connectivity.
Types of transformation based on connectivity
  • Connected Transformations
  • Unconnected Transformations
In Informatica, during mappings the transformations which are connected to other transformations are called connected transformations.
For example, Source qualifier transformation of Source table EMP is connected to filter transformation to filter employees of a dept.
Those transformations that are not connected to any other transformations are called unconnected transformations.
Their functionality is used by calling them inside other transformations like Expression transformation. These transformations are not part of the pipeline.
The connected transformations are preferred when for every input row, transformation is called or is expected to return a value. For example, for the zip codes in every row, the transformation returning city name.
The unconnected transformations are useful when their functionality is only required periodically or based upon certain conditions. For example, calculation the tax details if tax value is not available.


Types of transformations based on the change in no of rows
  • Active Transformations
  • Passive Transformations
Active Transformations are those who modifies the data rows and the number of input rows passed to them. For example, if a transformation receives ten number of rows as input, and it returns fifteen number of rows as an output then it is an active transformation. The data in the row is also modified in the active transformation.
Passive transformations are those who does not change the number of input rows. In passive transformations the number of input and output rows remain the same, only data is modified at row level.
In the passive transformation, no new rows are created, or existing rows are dropped.

Informatica: Important Concepts

Informatica Domain
The overall architecture of Informatica is Service Oriented Architecture (SOA).
  • Informatica Domain is the fundamental administrative unit in Informatica tool
  • It is a collection of nodes and services. Further, this nodes and services can be categorized into folders and sub-folders based on the administration requirement.
Node is a logical representation of a machine inside the domain. Node is required to run services and processes for Informatica.
You can have multiple nodes in a domain. In a domain, you will also find a gateway node.
The gateway node is responsible for receiving requests from different client tools and routing those requests to different nodes and services.
There are two types of services in Domain
  • Service Manager: Service manager manages domain operations like authentication, authorization, and logging. It also runs application services on the nodes as well as manages users and groups.
  • Application Services: Application service represents the server specific services like integration service, repository service, and reporting service. These services run on different nodes based upon the configuration.

PowerCenter Repository

PowerCenter repository is a relational database like Oracle, Sybase, SQL server and it is managed by repository service. It consists of database tables that store metadata.
There are three Informatica Client tools available in Informatica Powercenter. They are Informatica
  • Designer
  • Workflow Monitor
  • Workflow Manager
These clients can access to the repository using repository service only.
To manage a repository there exists an Informatica service called Repository Service. A single repository service handles exclusively only one repository. Also, a repository service can execute on multiple nodes to increase the performance.
The repository services use locks on the objects, so multiple users cannot modify the same object same time.
You can enable version control in the repository. With the version control feature, you can maintain different versions of the same object.
Objects created in the repository can have following three state
  • Valid: Valid objects are those objects whose syntax is correct according to Informatica. These objects can be used in the execution of workflows.
  • Invalid: Invalid objects are those who does not adhere to the standard or rules specified. When any object is saved in Informatica, it is checked whether its syntax and properties are valid or not, and the object is marked with the status accordingly.
  • Impacted: Impacted objects are those whose child objects are invalid. For example in a mapping if you are using a reusable transformation, and this transformation object becomes invalid then the mapping will be marked as impacted.

Domain Configuration

As mentioned earlier, domain is the basic administrative control in Informatica. It is the parent entity which consists of other services like integration service, repository service, and various nodes.
The domain configuration can be done using the Informatica admin console. The console can be launched using web browsers.

Powercenter client & Server Connectivity

PowerCenter client tools are development tools which are installed on the client machines. Powercenter designer, workflow manager, a repository manager, and workflow monitor are the main client tools.
The mappings and objects that we create in these client tools are saved in the Informatica repository which resides on the Informatica server. So the client tools must have network connectivity to the server.
On the other hand, PowerCenter client connects to the sources and targets to import the metadata and source/target structure definitions. So it also must have connectivity to the source/target systems.
  • To connect to the integration service and repository service, PowerCenter client uses TCP/IP protocols and
  • To connect to the sources/targets PowerCenter client uses ODBC drivers.

Repository Service

The repository service maintains the connections from Powercenter clients to the PowerCenter repository. It is a separate multi-threaded process, and it fetches, inserts and updates the metadata inside the repository. It is also responsible for maintaining consistency inside the repository metadata.
Integration Service
Integration service is the executing engine for the Informatica, in other words, this is the entity which executes the tasks that we create in Informatica. This is how it works
  • A user executes a workflow
  • Informatica instructs the integration service to execute the workflow
  • The integration service reads workflow details from the repository
  • Integration service starts execution of the tasks inside the workflow
  • Once execution is complete, the status of the task is updated i.e. failed, succeeded or aborted.
  • After completion of execution, session log and workflow log is generated.
  • This service is responsible for loading data into the target systems
  • The integration service also combines data from different sources
For example, it can combine data from an oracle table and a flat file source.
So, in summary, Informatica integration service is a process residing on the Informatica server waiting for tasks to be assigned for the execution. When we execute a workflow, the integration service receives a notification to execute the workflow. Then the integration service reads the workflow to know the details like which tasks it has to execute like mappings & at what timings. Then the service reads the task details from the repository and proceeds with the execution.
Sources & Targets

Informatica being an ETL and Data integration tool, you would be always handling and transforming some form of data. The input to our mappings in Informatica is called source system. We import source definitions from the source and then connect to it to fetch the source data in our mappings. There can be different types of sources and can be located at multiple locations. Based upon your requirement the target system can be a relational or flat file system. Flat file targets are generated on the Informatica server machine, which can be transferred later on using ftp.
Relational– these types of sources are database system tables. These database systems are generally owned by other applications which create and maintain this data. It can be a Customer Relationship Management Database, Human Resource Database, etc. for using such sources in Informatica we either get a replica of these datasets, or we get select privileges on these systems.
Flat Files - Flat files are most common data sources after relational databases in Informatica. A flat file can be a comma separated file, a tab delimited file or fixed width file. Informatica supports any of the code pages like ascii or Unicode. To use the flat file in Informatica, its definitions must be imported similar to as we do for relational tables.

Informatica Architecture

Before starting with Informatica, we need to understand what are the important components of Informatica and how it works.
Informatica tool consists of following services & components
Repository Service – Responsible for maintaining Informatica metadata & providing access of same to other services.
  1. Integration Service – Responsible for the movement of data from sources to targets
  2. Reporting Service - Enables the generation of reports
  3. Nodes – Computing platform where the above services are executed
  4. Informatica Designer - Used for creation of mappings between source and target
  5. Workflow Manager – Used to create workflows and other task & their execution
  6. Workflow Monitor – Used to monitor the execution of workflows
  7. Repository Manager – Used to manage objects in repository

Why do we need Informatica?

Informatica comes to the picture wherever we have a data system available and at the backend we want to perform certain operations on the data. It can be like cleaning up of data, modifying the data, etc. based on certain set of rules or simply loading of bulk data from one system to another.
Informatica offers a rich set of features like operations at row level on data, integration of data from multiple structured, semi-structured or unstructured systems, scheduling of data operation. It also has the feature of metadata, so the information about the process and data operations are also preserved.

Introduction to Informatica

Informatica is a Software development company, which offers data integration products. It offers products for ETL, data masking, data Quality, data replica, data virtualization, master data management, etc.
Informatica Powercenter ETL/Data Integration tool is a most widely used tool and in the common term when we say Informatica, it refers to the Informatica PowerCenter tool for ETL.
Informatica Powercenter is used for Data integration. It offers the capability to connect & fetch data from different heterogeneous source and processing of data.
For example, you can connect to an SQL Server Database and Oracle Database both and can integrate the data into a third system.
The latest version of Informatica PowerCenter available is 9.6.0. The different editions for the PowerCenter are:
  • Standard edition
  • Advanced edition
  • Premium edition
The popular clients using Informatica Powercenter as a data integration tool are U.S Air Force, Allianz, Fannie Mae, ING, Samsung, etc. The popular tools available in the market in competition to Informatica are IBM Datastage, Oracle OWB, Microsoft SSIS and Ab Initio.
Typical use cases for Informatica can be
  • An organization migrating from existing legacy system like mainframe to a new database system. So the migration of its existing data into a system can be performed.
  • Enterprises setting up their Data Warehouse would require an ETL tool to move data from the Production system to Warehouse.
  • Integration of data from various heterogeneous systems like multiple databases and file-based systems can be done using Informatica.
  • Informatica can be used as a data cleansing tool.
Informatica is better than its competitors as it offers a wide range of product editions. So the user can opt for a specific edition based upon the requirement. Informatica is constantly featured as Data Integration product leader in the Gartner Magic Quadrant listing.
Informatica is available for all the popular platforms. It offers cloud-based services so that with minimal setup an industry can use this tool. Informatica offers real-time data integration, web services integration, Business to business data integration (B2B), Big data edition, Master Data Management and connectors for social media and Salesforce. Forbes has quoted Informatica as the next Microsoft, this itself reflects the market share Informatica is having over its competitors.