Saturday 5 November 2016

Informatica: Important Concepts

Informatica Domain
The overall architecture of Informatica is Service Oriented Architecture (SOA).
  • The Informatica domain is the fundamental administrative unit of the Informatica tool.
  • It is a collection of nodes and services. These nodes and services can be further organized into folders and sub-folders based on administration requirements.
A node is a logical representation of a machine inside the domain. A node is required to run Informatica services and processes.
You can have multiple nodes in a domain. In a domain, you will also find a gateway node.
The gateway node is responsible for receiving requests from different client tools and routing those requests to different nodes and services.
There are two types of services in a domain:
  • Service Manager: The service manager manages domain operations such as authentication, authorization, and logging. It also runs application services on the nodes and manages users and groups.
  • Application Services: Application services are the server-specific services such as the integration service, repository service, and reporting service. These services run on different nodes depending on the configuration.
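The relationship between a domain, its nodes, and the gateway node can be pictured with a small Python sketch. This is only an illustration of the concepts above, not Informatica's actual implementation or API: the class names `Node` and `Domain`, the `route` method, and the node and domain names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A logical representation of a machine in the domain (hypothetical model)."""
    name: str
    is_gateway: bool = False
    services: set = field(default_factory=set)


@dataclass
class Domain:
    """A collection of nodes; a gateway node receives client requests."""
    name: str
    nodes: list = field(default_factory=list)

    def gateway(self) -> Node:
        # Return the first node flagged as a gateway.
        for node in self.nodes:
            if node.is_gateway:
                return node
        raise LookupError("domain has no gateway node")

    def route(self, service: str) -> Node:
        # Requests enter through the gateway (which must exist) and are
        # forwarded to a node that runs the requested service.
        self.gateway()
        for node in self.nodes:
            if service in node.services:
                return node
        raise LookupError(f"no node runs {service!r}")


# Illustrative names only.
domain = Domain("dev_domain", [
    Node("node01", is_gateway=True, services={"repository service"}),
    Node("node02", services={"integration service"}),
])
print(domain.route("integration service").name)  # node02
```

Here a request for the integration service enters through `node01` and is routed to `node02`, the node configured to run that service.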

PowerCenter Repository

The PowerCenter repository resides in a relational database such as Oracle, Sybase, or SQL Server, and it is managed by the repository service. It consists of database tables that store metadata.
There are four Informatica client tools available in Informatica PowerCenter:
  • Designer
  • Workflow Manager
  • Workflow Monitor
  • Repository Manager
These clients can access the repository only through the repository service.
A repository is managed by an Informatica service called the Repository Service. A single repository service handles exactly one repository. A repository service can also run on multiple nodes to increase performance.
The repository service uses locks on objects, so multiple users cannot modify the same object at the same time.
You can enable version control in the repository. With the version control feature, you can maintain different versions of the same object.
Objects created in the repository can have one of the following three states:
  • Valid: Valid objects are those objects whose syntax is correct according to Informatica. These objects can be used in the execution of workflows.
  • Invalid: Invalid objects are those that do not adhere to the specified standards or rules. When an object is saved in Informatica, its syntax and properties are checked, and the object is marked with the corresponding status.
  • Impacted: Impacted objects are those whose child objects are invalid. For example, if a mapping uses a reusable transformation and that transformation becomes invalid, the mapping is marked as impacted.
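The three states can be expressed as a small rule, sketched below in Python. This is an assumption-laden illustration, not how the repository service is implemented: `RepoObject`, its `syntax_ok` flag, and the `status` function are hypothetical names, and the real validation rules are far richer than a single boolean.

```python
from dataclasses import dataclass, field


@dataclass
class RepoObject:
    """Hypothetical stand-in for a repository object such as a mapping."""
    name: str
    syntax_ok: bool = True  # assumed result of Informatica's own checks
    children: list = field(default_factory=list)


def status(obj: RepoObject) -> str:
    """Return 'invalid', 'impacted', or 'valid' for an object."""
    if not obj.syntax_ok:
        return "invalid"
    # An object is impacted when one of its child objects is invalid.
    if any(status(child) == "invalid" for child in obj.children):
        return "impacted"
    return "valid"


# A mapping that uses a reusable transformation which has become invalid.
lookup = RepoObject("reusable_lookup", syntax_ok=False)
mapping = RepoObject("m_load_customers", children=[lookup])
print(status(lookup), status(mapping))  # invalid impacted
```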

Domain Configuration

As mentioned earlier, the domain is the basic administrative unit in Informatica. It is the parent entity that contains services such as the integration service and repository service, as well as the various nodes.
The domain is configured using the Informatica admin console, which is launched in a web browser.

PowerCenter Client & Server Connectivity

PowerCenter client tools are development tools that are installed on client machines. PowerCenter Designer, Workflow Manager, Repository Manager, and Workflow Monitor are the main client tools.
The mappings and objects that we create in these client tools are saved in the Informatica repository, which resides on the Informatica server. So the client tools must have network connectivity to the server.
The PowerCenter client also connects to the sources and targets to import metadata and source/target structure definitions, so it must have connectivity to the source/target systems as well.
  • To connect to the integration service and repository service, the PowerCenter client uses TCP/IP protocols.
  • To connect to the sources/targets, the PowerCenter client uses ODBC drivers.

Repository Service

The repository service maintains the connections from PowerCenter clients to the PowerCenter repository. It is a separate multi-threaded process that fetches, inserts, and updates the metadata inside the repository. It is also responsible for maintaining the consistency of the repository metadata.

Integration Service

The integration service is the execution engine of Informatica; in other words, it is the entity that executes the tasks we create in Informatica. This is how it works:
  • A user executes a workflow.
  • Informatica instructs the integration service to execute the workflow.
  • The integration service reads the workflow details from the repository.
  • The integration service starts executing the tasks inside the workflow.
  • Once execution is complete, the status of each task is updated, i.e. failed, succeeded, or aborted.
  • After execution completes, the session log and workflow log are generated.
The integration service is also responsible for loading data into the target systems, and it can combine data from different sources. For example, it can combine data from an Oracle table and a flat file source.
In summary, the Informatica integration service is a process residing on the Informatica server, waiting for tasks to be assigned for execution. When we execute a workflow, the integration service receives a notification to run it. The service then reads the workflow to find out which tasks (such as mappings) it has to execute and when, reads the task details from the repository, and proceeds with the execution.
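The execution flow described above can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not Informatica's engine or API: the repository is modeled as a plain dictionary, tasks are ordinary functions, and `run_workflow`, `Status`, `TaskAborted`, and the workflow and session names are all hypothetical. The sketch also simply runs every task in order, whereas real workflows can be configured to stop on failure.

```python
from enum import Enum


class Status(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    ABORTED = "aborted"


class TaskAborted(Exception):
    """Hypothetical signal that a task was stopped before finishing."""


def run_workflow(repository, workflow_name):
    """Run each task of a workflow; return (task statuses, log lines)."""
    tasks = repository[workflow_name]  # read the workflow details
    statuses = {}
    log = [f"workflow {workflow_name} started"]
    for task_name, task in tasks:
        try:
            task()
            statuses[task_name] = Status.SUCCEEDED
        except TaskAborted:
            statuses[task_name] = Status.ABORTED
        except Exception as exc:
            statuses[task_name] = Status.FAILED
            log.append(f"{task_name}: {exc}")
        # Record the final status of the task in the log.
        log.append(f"{task_name} {statuses[task_name].value}")
    log.append(f"workflow {workflow_name} finished")
    return statuses, log


def load_customers():
    pass  # stands in for a session that completes normally


def load_orders():
    raise ValueError("target table missing")


# The dictionary plays the role of the repository metadata.
repository = {
    "wf_daily_load": [("s_customers", load_customers),
                      ("s_orders", load_orders)],
}
statuses, log = run_workflow(repository, "wf_daily_load")
for line in log:
    print(line)
```

Each task ends in one of the three statuses mentioned above, and the collected log lines stand in for the session and workflow logs.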

Sources & Targets

Because Informatica is an ETL and data integration tool, you are always handling and transforming some form of data. The input to our mappings in Informatica is called the source system. We import source definitions from the source and then connect to it to fetch the source data in our mappings. Sources can be of different types and can be located in multiple places.
Depending on your requirements, the target system can be a relational system or a flat file system. Flat file targets are generated on the Informatica server machine and can be transferred later using FTP.
Relational - These sources are database system tables. The database systems are generally owned by other applications, which create and maintain the data. Examples include a Customer Relationship Management database or a Human Resources database. To use such sources in Informatica, we either get a replica of these datasets or we are granted select privileges on these systems.
Flat Files - Flat files are the most common data sources in Informatica after relational databases. A flat file can be a comma-separated file, a tab-delimited file, or a fixed-width file. Informatica supports code pages such as ASCII and Unicode. To use a flat file in Informatica, its definition must be imported, just as we do for relational tables.
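To make the three flat file layouts concrete, here is a short Python sketch that reads the same two records from a comma-separated, a tab-delimited, and a fixed-width file. It uses only the Python standard library and has nothing to do with how Informatica itself parses files; the helper names `read_delimited` and `read_fixed_width`, the sample data, and the column layout are made up for illustration.

```python
import csv
import io


def read_delimited(text, delimiter=","):
    """Parse delimited text whose first line holds the column names."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def read_fixed_width(text, fields):
    """Parse fixed-width text; fields is a list of (name, start, end)."""
    rows = []
    for line in text.splitlines():
        rows.append({name: line[start:end].strip()
                     for name, start, end in fields})
    return rows


comma_text = "id,name\n1,Alice\n2,Bob\n"
tab_text = "id\tname\n1\tAlice\n2\tBob\n"
fixed_text = "1   Alice\n2   Bob  \n"
# A fixed-width file has no header, so the layout is supplied separately,
# much like a definition that has to be imported before the file is used.
layout = [("id", 0, 4), ("name", 4, 9)]

comma_rows = read_delimited(comma_text)
tab_rows = read_delimited(tab_text, delimiter="\t")
fixed_rows = read_fixed_width(fixed_text, layout)
print(comma_rows == tab_rows == fixed_rows)  # True
```

All three layouts carry the same records; only the way column boundaries are described differs, which is why the file definition has to be known before the data can be read.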
