Friday 12 December 2014

Informatica PowerCenter Session Partitioning - Types of Informatica Partitions


Informatica provides the option of enhancing session performance through the PowerCenter® Partitioning Option. After tuning all the performance bottlenecks, we can further improve performance by adding partitions.




We can either go for dynamic partitioning (the number of partitions is passed as a parameter) or non-dynamic partitioning (the number of partitions is fixed while coding). Apart from being used to optimize the session, Informatica partitioning becomes useful when we need to load huge volumes of data, or when the source already has partitions defined and using those partitions helps improve session performance.
The partition attributes include setting the partition point, the number of partitions, and the partition types.

Partition Point:
There can be one or more pipelines inside a mapping. Adding a partition point divides a pipeline into multiple pipeline stages. Informatica creates one partition by default for every pipeline stage. As we increase the partition points, the number of threads increases. Informatica has mainly three types of threads: reader, writer and transformation threads.

The number of partitions can be set at any partition point. We can define up to 64 partitions at any partition point in a pipeline. When you increase the number of partitions, you increase the number of processing threads, which can improve session performance. However, if you create a large number of partitions or partition points in a session that processes large amounts of data, you can overload the system.



You cannot create partition points for the following transformations:
• Source definition
• Sequence Generator
• XML Parser
• XML target
• Unconnected transformations

The partition type controls how the Integration Service distributes data among partitions at partition points. The Integration Service creates a default partition type at each partition point. 

Types of partitions are:
1. Database partitioning
2. Hash auto-keys
3. Hash user keys
4. Key range
5. Pass-through
6. Round-robin

Database Partitioning
For source database partitioning, Informatica checks the database system for partition information, if any, and fetches data from the corresponding database partitions into the session partitions. When you use target database partitioning, the Integration Service loads data into the corresponding database partition nodes.
Use database partitioning for Oracle and IBM DB2 sources and IBM DB2 targets. 

Pass-through
Using pass-through partitioning does not affect the distribution of data across partitions; it is the default partition type for all sessions. The Integration Service processes data without redistributing rows among partitions. Hence all rows in a single partition stay in that partition after crossing a pass-through partition point.



Key range
Used when we want to partition the data based on upper and lower limits. The Integration Service distributes the rows of data based on a port or set of ports that we define as the partition key. For each port, we define a range of values. Based on the ranges that we define, the rows are sent to different partitions, as in the example below.
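As a simple illustration (hypothetical key ranges on an EMPID port, not taken from any screenshot), three partitions might be defined as:

Partition 1 : EMPID 1    - 1000
Partition 2 : EMPID 1001 - 2000
Partition 3 : EMPID 2001 - 3000

Rows whose EMPID falls in a given range are routed to that partition.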



Round-robin
Round-robin partitioning is used when we want to distribute rows of data evenly across all partitions.

Hash auto-keys: The Integration Service uses a hash function to group rows of data among partitions. The Integration Service groups the data based on a partition key.

Hash user keys: The Integration Service uses a hash function to group rows of data among partitions. We define the ports that make up the partition key.

Saturday 22 November 2014

Inline Prompts in OBIEE 11g


A prompt that is created at the OBIEE analysis level is called an inline prompt. It is referred to as an inline prompt because the prompt is embedded in the analysis and is not stored in the Oracle BI Presentation Catalog. That means the prompt is created for that particular analysis only and therefore cannot be added to other analyses.

Monday 27 October 2014

Apache AVRO - Data Serialization Framework

Avro is an Apache open source project that provides data serialization and data exchange services for Hadoop. Using Avro (which functions similarly to systems such as Apache Thrift and Google's Protocol Buffers), data can be exchanged between programs written in any language. Avro is gaining new users compared to other popular serialization frameworks because many Hadoop-based tools support Avro for serialization and deserialization.


Before we get into the features, let's understand serialization and deserialization.
Serialization means turning structured objects into a byte stream for transmission over the network or for writing to persistent storage.
Deserialization is the opposite of serialization, where we read the byte stream or stored data and turn it back into structured objects.
The serialized data, which is in a binary format, is accompanied by its schema, allowing any application to deserialize the data.

Some Features Of Avro
  • Avro serialized data doesn't require proxy objects or code generation (unless desired for statically typed languages). Avro uses schema definitions at runtime during data exchange. It always stores data structure definitions with the data, making the data easier to process rather than requiring code generation.

Friday 5 September 2014

Application Roles in OBIEE


Default OBIEE Application Roles
Application Role | LDAP Group       | Description
BIConsumer       | BIConsumers      | Base-level role that grants the user access to OBIEE analyses, dashboards and agents.
BIAuthor         | BIAuthors        | All BIConsumer rights, grants and permissions + allows users to create new analyses, dashboards and other BI objects.
BIAdministrator  | BIAdministrators | All BIAuthor and BIConsumer rights, grants and permissions + allows the user to administer all parts of the system.

By default, OBIEE 11g has three application roles:
  • BIConsumer : The base-level role that grants the user access to existing analyses, dashboards and agents, and allows them to run or schedule existing BI Publisher reports, but not create any new ones. That means the user can view the existing reports and dashboards.
  • BIAuthor : A role that allows users to create new analyses, dashboards and other BI objects. This role is recursively granted the privileges of the BIConsumer role.
  • BIAdministrator : This role has the privileges of the BIAuthor and BIConsumer roles. In addition, it allows the user to administer all parts of the system, including modifying catalog permissions and privileges. This is the most privileged user in an OBIEE environment.
In some cases, a need may arise to add another role that fits between the BIConsumer and BIAuthor roles; one called BIAnalyst, for example, that allows users to create and edit analyses, but not create new dashboards.

Thursday 4 September 2014

Informatica Scenario: How to Load Data in Cyclic Order using Informatica

To load source records from the DEP table to the target in a cyclic order.
In this scenario we will load the records present in the DEP table into flat files in a cyclic order, such that the first record goes to TargetFile01, the 2nd to TargetFile02, the 3rd to TargetFile03 and the 4th record again to TargetFile01.



For implementing this we can make use of a Sequence Generator with properties as given below.
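The property screenshot is not reproduced here; a typical configuration for three target files (an assumption based on the scenario above) would be along these lines:

Start Value   = 1
End Value     = 3
Increment By  = 1
Cycle         = Enabled   (NEXTVAL then repeats 1, 2, 3, 1, 2, 3, ...)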


You can route the records based on the NEXTVAL port of the Sequence Generator using a Router as given below.
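The Router screenshot is likewise not shown; assuming the Sequence Generator is set up as sketched above, the three router groups (group names are illustrative) would use conditions such as:

TARGET_FILE01 : NEXTVAL = 1
TARGET_FILE02 : NEXTVAL = 2
TARGET_FILE03 : NEXTVAL = 3

Each group is then connected to its corresponding flat file target.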



The overall mapping will look like this.



Below is the expected format in which target files will be created.



Thursday 21 August 2014

How To Add Auto Increment In Oracle - IDENTITY Column in Oracle 12c

Let's create an employee table for this purpose.
SQL> CREATE TABLE EMP01
(
EMPID NUMBER,
NAME VARCHAR2(50),
DEPT VARCHAR2(20)
);

The next step is to create an Oracle sequence to generate the ID values.
SQL> CREATE SEQUENCE EMPID_SEQUENCE
START WITH 1
INCREMENT BY 1;

The next step is to create a trigger to assign the values from the sequence to the EMPID column.

CREATE OR REPLACE TRIGGER EMPID_TRIGGER
BEFORE INSERT ON EMP01
FOR EACH ROW
BEGIN
SELECT EMPID_SEQUENCE.nextval INTO :NEW.EMPID FROM dual;
END;
/
Now we will try inserting a few values:

SQL> INSERT INTO EMP01 (NAME, DEPT) VALUES ('RON', 'ABC');
1 row created.
SQL> INSERT INTO EMP01 (NAME, DEPT) VALUES ('VICTORIA', 'XYZ');
SQL> SELECT * FROM EMP01;
     EMPID NAME       DEPT
---------- ---------- --------------------
         1 RON        ABC
         2 VICTORIA   XYZ


You can see that EMPID gets incremented by 1 for each row.
Oracle 12c introduces a new feature, the IDENTITY column, with which we can implement the same auto-increment behaviour without a separate sequence and trigger, as sketched below.
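A minimal sketch using an IDENTITY column (EMP02 is just an illustrative table name) would be:

SQL> CREATE TABLE EMP02
(
EMPID NUMBER GENERATED ALWAYS AS IDENTITY,
NAME VARCHAR2(50),
DEPT VARCHAR2(20)
);

SQL> INSERT INTO EMP02 (NAME, DEPT) VALUES ('RON', 'ABC');

Here EMPID is populated automatically on every insert, with no sequence or trigger to maintain.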

Tuesday 15 July 2014

How to Unlock the Locked Table in ORACLE


Oracle places locks while performing DDL or DML operations on tables. When a table lock is present on a table, we cannot run DDL against that table.

Some of the locks automatically set by Oracle are RS and RX locks.
Executing SELECT … FOR UPDATE results in an RS (row share) table lock. When you execute an INSERT, UPDATE or DELETE, Oracle places an RX (row exclusive) table lock.

We have to kill the session which holds the lock in order to execute further operations. Follow the steps below to kill the session and forcibly unlock the table.

Let's assume that the 'EMP' table is locked.
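The detailed steps were shown as screenshots in the original post; a typical sequence (assuming the locked table is EMP and you have access to the v$ views) looks like this:

-- Find the session holding the lock on EMP
SQL> SELECT s.sid, s.serial#
     FROM v$session s
     JOIN v$locked_object lo ON s.sid = lo.session_id
     JOIN dba_objects o ON lo.object_id = o.object_id
     WHERE o.object_name = 'EMP';

-- Kill that session, substituting the SID and SERIAL# returned above
SQL> ALTER SYSTEM KILL SESSION 'sid,serial#';

Once the session is killed, the lock on EMP is released and DDL can be run against the table again.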

Thursday 3 July 2014

How to Delete All Files In a Directory Except Few Files in LINUX /UNIX

First let’s go through the RM command in UNIX:
The rm command is used in Unix / Linux to remove a file or a directory. The syntax is as follows:
Syntax:
rm [OPTION] [FILE] 
OPTIONS include:
> rm -f {file-name} – To remove files forcefully, without prompting the user.
> rm -i {file-name} – To remove files after prompting the user for confirmation.
> rm -r {file-name} – To remove files and directories recursively.


Now we will see how to delete all files in a directory except a few files.
1) USING RM COMMAND:
To remove all files from directory /home/opt/class/ except .zip and .log files:
> rm /home/opt/class/!(*.zip|*.log)

To remove all files except a specific file (file123):
> rm !(file123)
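Note that the !( ) patterns above rely on bash extended globbing (this assumes bash is the shell in use); if it is not already enabled, turn it on first:
> shopt -s extglob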

Thursday 12 June 2014

Splunk Overview-For Operational Intelligence & Log Management

What is Splunk:
It's a popular advanced IT search tool used by many companies that derives information from machine data. To put it simply, Splunk has the ability to search, monitor and analyze all machine-generated data, such as log data generated by applications, servers, and network devices across an organization. The product indexes structured as well as unstructured data and helps in diagnosing problems, making it easy for administrators, business analysts and managers to find the information they need.

Splunk's web interface makes it easy to examine data instead of going through each log file individually. Splunk helps with monitoring and log aggregation, allowing non-experts and beginners to extract important data from log entries.

Tuesday 3 June 2014

Main Features of Teradata Database

Teradata is a Massively Parallel Processing (MPP) system with a shared-nothing architecture, thereby eliminating resource contention. Teradata is mainly designed to accommodate large data warehouse implementations and has many data warehousing features embedded in the core of the database. Linear scalability (the software scales linearly with the hardware), unconditional parallelism, multi-faceted parallelism, intelligent data distribution and a parallel-aware optimizer make Teradata capable of handling large data volumes and complex queries.

Sunday 11 May 2014

Oracle Database 12c New Feature - FETCH & OFFSET


In this post we will discuss a new feature introduced in Oracle 12c to limit the rows returned in an easier and more effective manner than the methods used in previous releases, such as ROWNUM, Top-N subqueries, or analytic functions like RANK and DENSE_RANK. With this feature, Oracle 12c provides improved support for Top-N analysis.
This new row limiting clause in Oracle allows us to write Top-N-style queries without much effort. Let's take a look at the syntax now.

SELECT * FROM Table_Name
ORDER BY column_name DESC
[ OFFSET offset { ROW | ROWS } ]
[ FETCH { FIRST | NEXT } [ { rowcount | percent PERCENT } ] { ROW | ROWS } { ONLY | WITH TIES } ]

Features of The New Row limiting Clause In Oracle

The FETCH clause limits the number of rows returned in the result set. The FETCH keyword is followed by FIRST or NEXT, which can be used interchangeably and exist for semantic clarity only. The FETCH clause can specify either a number of rows or a percentage of rows to return, which we will see using some examples.
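For instance, against a hypothetical EMPLOYEES table with a SALARY column (illustrative names, not from the original post), the clause can be used along these lines:

-- Top 5 salaries
SELECT * FROM EMPLOYEES
ORDER BY SALARY DESC
FETCH FIRST 5 ROWS ONLY;

-- Skip the top 5 and return the next 10
SELECT * FROM EMPLOYEES
ORDER BY SALARY DESC
OFFSET 5 ROWS FETCH NEXT 10 ROWS ONLY;

-- Return the top 10 percent of rows
SELECT * FROM EMPLOYEES
ORDER BY SALARY DESC
FETCH FIRST 10 PERCENT ROWS ONLY;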

Thursday 24 April 2014

What is Massively Parallel Processing (MPP)


With the advent of Big Data, mainstream technologies like Massively Parallel Processing (MPP) systems are experiencing vital growth. Let's discuss what MPP, or Massively Parallel Processing, is.