Informatica is one of the most powerful ETL tool with a simple visual interface. You do most of the work by simply dragging and dropping with your mouse in the Designer. This graphical approach makes it also very easy to understand what is going on (it is "self-documenting" in a sense).
It can communicate with all major databases, can move/transform data between them. It can move huge volumes of data in a very effective way. It can throttle the transactions (do big updates in small chunks to avoid long locking and filling the transactional log).
It can effectively do joins between tables in different databases on different servers. The tasks are performed by Informatica Server (Unix or MS Windows). You get a client application called "Server Manager" to work with the server.
You design your processes in a client application called "Designer". This is where you you tell what the source databases and tables will be, what will be the targets, and how you move/transform the data.
Informatica uses its own database called "Metadata Repository Database", or simply a Repository. Repository stores the data (rules) needed for data extraction, transformation, loading, and management. You get a client application "Repository Manager" to work with the repository.
General Layout of mapping :
SOURCE --> SOURCE QUALIFIER -->TRANSFORMATIONS ---> TARGET
Comprised of the flagship Informatica PowerCenter® and Informatica PowerMart® software products, as well as expanded connectivity solutions that extend even across firewalls, this data integration platform helps companies integrate and analyze data from virtually any corporate system, including ERP, CRM, supply chain, procurement, eBusiness, mainframe and legacy systems. The Informatica platform provides real-time performance, scalability and extensibility to handle the analytic requirements of even the largest enterprises.
They have different products.
- Power Center
- Power Mart
- Power Connect
- Power Channel
- Power Plug
Power Center & Power Mart :
The most 2 important products are Power Center & Power Mart. Power Mart is a subset of Power Center. Power Center has better load balancing.
Power Center : is the world-standard enterprise data integration platform for the creation and real-time management of data warehouse infrastructures and analytic applications.
- Distributed metadata (Repository)
- Global & Local Repositories (can register more than one Informatica server & can share the information in the global repository). Multiple repositories, many servers.
- Can connect to varied sources like Peoplesoft, SAP etc.
- Cost is more
- Has bridges which can transport meta data from other tools (like Erwin)
Power Mart : is a departmental data integration platform and can be expanded as a company's needs grow to an enterprise level with Informatica PowerCenter 5
- Only Local Repository. (Can’t register other repositories). Single repository, one server.
- Cannot connect to varied sources like Peoplesoft, SAP etc.
- Cost is less
Mappings : defines the flow of data from source to target database. A mapping contains more than one transformation. After getting the Source & Target, we define the mappings. The definitions of the mappings are stored in the repository.
Informatica Repository Server : A server used for managing the information with in the meta data repository. All the clients will be communicated with the Repository server. The information in the repository is managed by Repository Server.
When ever we are working with client tool, the first task is logon to repository. We need to have a valid user to logon. The communication between the client tools & Repository Server is TCP/IP
Informatica Server : is responsible for loading of data.Parts of the Informatica Server :
1.. Load Manager
2. Data Transformation Manager
Informatica Repository : There are 2 types of repositories. Global & Local repositories(with reference to Power Center). With Power Mart only one local repository. The information of all local repositories are located in Global repository. Shared information can be put in global repository.
Folders : To organize the files (like containers). Use to organize the information with in the repository. There can be one or more folders with in a repository. Each folder is independent of the other. Whenever we are going to work, work with in the folder. A folder contains a pre-defined set of sub folders.
- Sources (Contains the source database object definitions)
- Targets (Target db object definitions)
- Dimensions (Dimension definitions)
- Cubes (Cube definitions)
- Transformations (contains reusable transformation definitions)
- Mapplets (A mapplet is a reusable part of a mapping which contains one or more reusable transformations. Contains the mapplet definitions)
- Mappings (Mapping definitions)
- Business Components (Business is classified as far as the business usage)
- Sessions (contains session definitions. A session is a task based on a mapping)
Informatica PowerCenter : has different tools
Can create and manage repository and repository objects(Users& Folders). Can take back up and perform recoveries. Can perform security related issues. Permissions etc.
Two different admin users created for repository are Administrator and the database user of repository schema
Two default user groups created in the repository are Administrator and Public
- define / create the source database objects
- define / create target database objects
- define transformations
- define mapplets
- define mappings
- validate the mappings & transformations
- debug the sessions
- compare different objects
- define the dimensions & cubes
Source Database : Basically from where we are going to retrieve the data. Source is an existing one. The source is read only. Ex : Flat files, XML files, Cobol files, Oracle, SQL Server, Sybase, Informix, DB/2, ERP(SAP, People Soft), CRM, ODBC.From the client we connect to the source through ODBC and get the meta data definitions of the source and then store in the repository. If a connection is not possible, we has to be manually defined.
Target Database : Actual Warehouse database. Objects may/may not exists. Can have read, write access to the target database.
Connection : defines the communication path. (ODBC, TCP/IP, Native)
Transformations : are the programs which are used to transform the data from one form to other. 14/15 transformations.Here are some transformations:
Aggregator - to do things like "group by".
Source Qualifier - ...
Expression - to use various expressions.
External procedure - ...
Filter - to filter data.
Joiner - to make joins between separate databases, file, ODBC sources.
Lookup - to create local copy of the data.
Normalizer - to transform denormalized data into normalized data.
Rank - to select only top (or bottom) ranked data.
Sequence Generator - to generate unique IDs for target tables.
Source Qualifier - to filter sources (SQL, select distinct, join, etc.)
Stored Procedure - to run stored procedures in the database - and capture their returned values.
Update Strategy - to flag records in target for insert,delete, update (defined inside a mapping).
Meta Data : Data about data / information about the data.
MetaData Repository : basically where the meta data definitions are stored. It can be located in any RDBMS (Oracle, SQL Server, DB/2, Sybase, Informix, ODBC).We call this Repository as Power Center / Power Mart / Informatica Repository).