Oracle Business Intelligence : OBIEE Interview Questions Part

Does OBIEE support ragged hierarchies?

Yes.

You are trying to open a repository using Admin tool and when you click to say “Open Online”; a dialogue box pops up saying “Your RPD is available in read-only” mode. How can you edit this repository by opening online?

One of the reasons you get this message because: If you are getting problem, because last time when RPD is open you restarted the BI Server. Never ever restart the server when the RPD is open in ONLINE or OFFLINE. However as a best practice close the RPD whenever you want to restart the BI services, every time.

How do you set up usage tracking in OBIEE? Is the Usage Tracking mechanism in OBIEE configurable to capture new identified metrics by your department?

Read about Usage Tracking

What is the default configuration for caching in NQSConfig.ini file? How method does the OBIEE use for clearing its cache?

YES. In Admin tool, ManageàCacheàPurge

In PS, Manage session à Close all cursors.

What is MUDE/ MUD in OBIEE? On what basis would you create projects?

Read about MUDE

Two people (Developer A and Developer B) are assigned to the same project inside MUD and opened the same project simultaneously. Developer A made some changes to the project and merged his changes to the original repository. Developer B also made some changes and committed his changes to the original repository? Does the MU environment preserve both Developer A and Developer B changes?

YES but check it.

In MUDE, Can two resources checkout the same project simultaneously?

In MUDE, what happens if the physical tables are shared between projects?

What are the different types of utilities provided in OBIEE? Explain any two utilities used in your previous projects and what are they used for?

Check Utilities

What are the different documentation mechanisms available in Admin tool? How do you create documentation for your Answers users against all objects available in your subject areas?

Check Utilities

How can we create nested presentation folders (nested presentation tables) in your presentation catalog? Let’s say we have ‘Facts’ all lumped together in one folder and sub divide these facts as Facts – Logical and Facts – Strategic folders? How would you create this nested structure in presentation catalog?

To give the appearance of nested folders in Oracle BI Answers, prefix the name of the presentation folder to be nested with a hyphen and a space and place it after the folder in which it nests.

Alternatively, you can enter a hyphen and a greater than sign in the Description field. For example, to nest the SalesFacts folder in the Facts folder, place the SalesFacts folder directly after Facts in the Presentation layer and rename it “- SalesFacts.” When Answers displays the folder, it omits the hyphen and space from the folder name. It is possible to nest multiple folders under a single folder; however, only one level of nesting is possible.

What are logical keys? Why would you need to create them? Does the physical key get automatically converted to logical key when the table is moved from physical layer to business model?

After creating tables in the Business Model and Mapping layer, you specify a primary key for each table. Logical dimension tables must have a logical primary key. Logical keys can be composed of one or more logical columns.

NOTE: Logical keys are optional for logical fact tables. However, it is recommended that you do not specify logical keys for logical fact tables.

When you drag physical tables (with key and foreign key relationships defined) to a business model, logical keys and joins are created that mirror the keys and joins in the physical layer. This occurs only if the tables that you drag include the table with the foreign keys. Additionally, if you create new tables or subsequently drag additional tables from the Physical layer to the Business Model and Mapping layer, the logical links between the new or newly dragged tables and the previously dragged tables must be created manually.

Let’s say you have a report with 4 dimensional attributes and 2 fact measures in the report. Whats the default sort behavior of OBIEE when you try to run the report? On what column/columns does it sort? How do you know this?

Default sort order will be from left to right columns. It sorts on Dimension columns not on facts. It is seen in the results. (Elaborate it later)

Maximum of Connection pools

The maximum number of connections allowed for this connection pool. The default is 10. This value should be determined by the database make and model and the configuration of the hardware box on which the database runs as well as the number of concurrent users who require access.

NOTE: For deployments with Intelligence Dashboard pages, consider estimating this value at 10% to 20% of the number of simultaneous users multiplied by the number of requests on a dashboard. This number may be adjusted based on usage. The total number of all connections in the repository should be less than 800. To estimate the maximum connections needed for a connection pool dedicated to an initialization block, you might use the number of users concurrently logged on during initialization block execution.

Define pipeline. Did you use it in your projects?
Yes, pipelines are the stages in a particular transaction. Assessment, finance etc.

How do you create filter on repository?
Where condition on content tab.

How do you work in a multi user environment? What are the steps?

Create a shared directory on the network for Multi-user Development (MUD).
Open the RPD to use in MUD. From Tools->Options, setup the MUD directory to point to the above directory.
Define projects within the RPD to allow multiple users to develop within their subject area or Facts.
Save and move the RPD to the shared directory setup in point 1.
When users work in the MUD mode, they open the admin tool and start with
MUD ->Checkout to check out the project they need to work on (not use the File open as you would usually do).
After completely the development, user checking the changes back to the network and merge the changes.

Where are passwords for userid? LDAP, external table authentication stored respectively?
Passwords for userid are in OBIEE server repository LDAP authentication in LDAP server external database in a table in external database

Can you bypass OBIEE server security? if so how?
Yes you can by-pass by setting authententication type in NQSCONFIG file in the security section as:authentication_type=bypass_nqs.instanceconfig.xml and nqsconfig.ini are the 2 places

Where can you add new groups and set permissions?
You can add groups by going to manage>security>add new groups> You can give permissions to a group for query limitation and filter conditions.

What are the things you can do in the BMM layer?
Aggregation navigation, level base metrics, time series wizard, create new logical column, complex join.

What is ragged hierarchy and how do you manage it?
Ragged Hierarchy is one of the different kinds of hierarchy.
A hierarchy in which each level has a consistent meaning, but the branches have inconsistent depths because at least one member attributes in a branch level is unpopulated. A ragged hierarchy can represent a geographic hierarchy in which the meaning of each level such as city or country is used consistently, but the depth of the hierarchy varies.
For example, a geographic hierarchy that has Continent, Country, Province/State, and City levels defined. One branch has North America as the Continent, United States as the Country, California as the Province or State, and San Francisco as the City. However, the hierarchy becomes ragged when one member does not have an entry at all of the levels. For example, another branch has Europe as the Continent, Greece as the Country, and Athens as the City, but has no entry for the Province or State level because this level is not applicable to Greece for the business model in this example. In this example, the Greece and United States branches descend to different depths, creating a ragged hierarchy.

What is the difference between Single Logical Table Source and Multiple Logical Table Sources?
If a logical table in BMM layer has only one Table as the source table then it is Single LTS.
If the logical table in BMM layer has more than one table as the sources to it then it is called Multiple LTS.
Ex: Usually Fact table has Multiple LTS’, for which sources will be coming from different Physical tables.

Can you let me know how many aggregate tables you have in your project? On what basis have you created them?
As per resume justification document

How do you bring/relate the aggregate tables into the OBIEE Logical layer?
One way of bringing the Aggregate Tables into the BMM layer is by bringing them as Logical Table sources for the corresponding Fact table.
This is done by dragging and dropping the aggregate table into the corresponding fact table. After doing that establish the column mappings and the set the aggregation levels.

How do you know which report is hitting which table, either the fact table or the aggregate table?
After running the report, go to Administration tab and go to click on Manage Sessions. There you can find the queries that are run and in the View Log option in the Session Management you can find which report is hitting which table.

Suppose I have report which is running for about 3 minutes typically. What is the first step you take to improve the performance of the query?
Find the sql query of the report in Admin->manage Session-> run the sql query on toad ->read the explain plan output ->modify the SQL based on the explain plan output

Suppose you have a report which has the option of running on aggregate table. How does the tool know to hit the Aggregate table and for that what the steps you follow to configure them?

Explain the process of Aggregate navigation?

Have you heard of Implicit Facts? If, so what are they?
An implicit fact column is a column that will be added to a query when it contains columns from two or more dimension tables and no measures. You will not see the column in the results. It is used to specify a default join path between dimension tables when there are several possible alternatives.
For example, there might be many star schemas in the database that have the Campaign dimension and the Customer dimension, such as the following stars:
Campaign History star. Stores customers targeted in campaign.
Campaign Response star. Stores customer responses to a campaign.
Order star. Stores customers who placed orders as a result of a campaign.
In this example, because Campaign and Customer information might appear in many segmentation catalogs, users selecting to count customers from the targeted campaigns catalog would be expecting to count customers that have been targeted in specific campaigns.
To make sure that the join relationship between Customers and Campaigns is through the campaign history fact table, a campaign history implicit fact needs to be specified in Campaign History segmentation catalog. The following guidelines should be followed in creating
segmentation catalogs:
Each segmentation catalog should be created so that all columns come from only one physical star.
Because the Marketing module user interface has special features that allow users to specify their aggregations, level-based measures typically should not be exposed to segmentation users in a segmentation catalog.

What is aggregate navigation? How do you configure the Aggregate tables in OBIEE?
Aggregate tables store pre-computed results, which are measures that have been aggregated (typically summed) over a set of dimensional attributes. Using aggregate tables is a very popular technique for speeding up query response times in decision support systems.
If you are writing SQL queries or using a tool that only understands what physical tables exist (and not their meaning), taking advantage of aggregate tables and putting them to good use becomes more difficult as the number of aggregate tables increases. The aggregate navigation capability of the OBIEE Server, however, allows queries to use the information stored in aggregate tables automatically, without query authors or query tools having to specify aggregate tables in their queries. The OBIEE Server allows you to concentrate on asking the right business question; the server decides which tables provide the fastest answers.

(Assume you are in BMM layer) We have 4 dimension tables, in that, 2 tables need to have hierarchy, then in such a case is it mandatory to create hierarchies for all the dimension tables?
No, it’s not mandatory to define hierarchies to other Dimension tables.

Can you have multiple data sources in OBIEE?
Yes.

How do you deal with case statement and expressions in OBIEE?
Use expression builder to create case when…then... end statement

Do you know about Initialization Blocks? Can you give me an example where you used them?
Init blocks are used for instantiating a session when a user logs in.

To create dynamic variable you have to create IB to write sql statement.

What is query repository tool?
It is utility of Seibel/OBIEE Admin tool
allows you to examine the repository metadata tool
for example: search for objects based on name, type.
Examine relationship between metadata objects like which column in the presentation layer maps to which table in physical layer

What is JDK and why do we need it?
Java Development Kit (JDK), a software package that contains the minimal set of tools needed to write, compile, debug, and run Java applets.

Oracle doesn’t recommend Opaque Views because of performance considerations, so why/when do we use them?
An opaque view is a physical layer table that consists of select statement. An opaque view should be used only if there is no other solution.

Can you migrate the presentation layer to a different server?
No we have to migrate the whole web & RPD files

How do you identify what are the dimension tables and how do you decide them during the Business/Data modeling?
Dimension tables contain descriptions that data analysts use as they query the database. For example, the Store table contains store names and addresses; the Product table contains product packaging information; and the Period table contains month, quarter, and year values. Every table contains a primary key that consists of one or more columns; each row in a table is uniquely identified by its primary-key value or values

Why do we have multiple LTS in BMM layer? What is the purpose?
To improve the performance and query response time.

What is the full form of RPD?
There is no full form for RPD as such, it is just a repository file (Rapidfile Database)

How do I disable cache for only 2 particular tables?

In the physical layer, right click on the table there we will have the option which says cacheable

How do you split a table in the RPD given the condition? (The condition given was Broker and customer in the same table) Split Broker and customer.

We need to make an alias table in the physical layer.

What type of protocol did you use in SAS (Statistical Analysis System)?
TCP/IP

Did you create any new logical column in BMM layer, how?

Yes. We can create new logical column in BMM layer.

Example: Right click on fact table -new logical column-give name for new logical column like Total cost.
Now in fact table source, we have one option column mapping, in that we can do all calculation for that new column.

Can you use physical join in BMM layer?
Yes, we can use physical join in BMM layer. When there is SCD type 2 we need complex join in BMM layer.

Can you use outer join in BMM layer?

Yes we can. When we are doing complex join in BMM layer, there is one option type, outer join is there.

What are other ways of improving summary query reports other than Aggregate Navigation and Cache Management?

1. Indexes

2. Join algorithm

3. Mat/view query rewrite

4. Web proper report design it’s optimal by making sure that it is not getting any addition column or rows

What is level-based metrics?
Level-based metrics means, having a measure pinned at a certain level of the dimension. For Example, if you have a measure called Dollars, you can create a Level Based Measure called Yearly Dollars which (you guessed it) is Dollars for a Year. This measure will always return the value for the year even if you drill down to a lower level like quarter, month… etc. To create a level based measure, create a new logical column based on the original measure (like Dollars in the example above). Drag and drop the new logical column to the appropriate level in the Dimension hierarchy (in the above example you will drag and drop it to Year in Time Dim
o A LBM is a metric that is defined for a specific level or intersection of levels.
o Monthly Total Sales or Quarterly Sales are the examples.
o You can compare monthly sales with quarterly sales. You can compare customer orders this quarter to orders this year

What is logging level? Where can you set logging levels?
You can enable logging level for individual users; you cannot configure a logging level for a group.
Set the logging level based on the amount of logging you want to do. In normal operations, logging is generally disabled (the logging level is set to 0). If you decide to enable logging, choose a logging
level of 1 or 2. These two levels are designed for use by OBIEE Server administrators.

Set Logging Level
1. In the Administration Tool, select Manage > Security.
2. The Security Manager dialog box appears.
3. Double-click the user’s user ID.
4. The User dialog box appears.
5. Set the logging level by clicking the Up or down arrows next to the Logging Level field

What is variable in Siebel?
You can use variables in a repository to streamline administrative tasks and modify metadata content dynamically to adjust to a changing data environment. The Administration Tool includes a Variable Manager for defining variables

What is system variable and non-system variable?
System variables
System variables are session variables that the OBIEE Server and OBIEE Web use for specific purposes. System variables have reserved names, which cannot be used for other kinds of variables (such as static or dynamic repository variables, or for non-system session variables).
When using these variables in the Web, preface their names with NQ_SESSION. For example, to filter a column on the value of the variable LOGLEVEL set the filter to the Variable NQ_SESSION.LOGLEVEL.
Non-system variables.
A common use for non-system session variables is setting user filters. For example, you could define a non-system variable called SalesRegion that would be initialized to the name of the user’s sales region. You could then set a security filter for all members of a group that would allow them to see only data pertinent to their region.
o When using these variables in the Web, preface their names with NQ_SESSION. For example, to filter a column on the value of the variable SalesRegion set the filter to the Variable NQ_SESSION.SalesRegion.

What are different types of variables? Explain each.
There are two classes of variables:
1. Repository variables
2. Session variables.
Repository variables.
A repository variable has a single value at any point in time. There are two types of repository variables:
Static: This value persists, and does not change until a OBIEE Server administrator decides to change it.
Dynamic: The values are refreshed by data returned from queries. When defining a dynamic repository variable, you will create an initialization block or use a preexisting one that contains a SQL query. You will also set up a schedule that the OBIEE Server will follow to execute the query and periodically refresh the value of the variable.
Session Variables
Session variables are created and assigned a value when each user logs on. There are two types of session variables:
1.System
2.Non-System.

What are the cache management? Name all of them and their uses. For Event polling table do u need the table in your physical layer?
Monitoring and managing the cache is cache management. There are three ways to do that.
Disable caching for the system.(INI NQ config file), Cashe persistence time for specified physical tables and Setting event polling table.
Disable caching for the system.(INI NQ config file :
You can disable caching for the whole system by setting the ENABLE parameter to NO in the NQSConfig.INI file and restarting the OBIEE Server. Disabling caching stops all new cache entries and stops any new queries from using the existing cache. Disabling caching allows you to enable it at a later time without losing any entries already stored in the cache.
Cache persistence time for specified physical tables:
You can specify a cacheable attribute for each physical table; that is, if queries involving the specified table can be added to the cache to answer future queries. To enable caching for a particular physical table, select the table in the Physical layer of the Administration Tool and select the option Make table cacheable in the General tab of the Physical Table properties dialog box. You can also use the Cache Persistence Time settings to specify how long the entries for this table should persist in the query cache. This is useful for OLTP data sources and other data sources that are updated frequently, potentially down to every few seconds.
Setting event polling table:

OBIEE Server event polling tables store information about updates in the underlying databases. An application (such as an application that loads data into a data mart) could be configured to add rows to an event polling table each time a database table is updated. The Analytics server polls this table at set intervals and invalidates any cache entries corresponding to the updated tables.
For event polling table, it is a standalone table and doesn’t require to be joined with other tables in the physical layer

What is Authentication? How many types of authentication.
Authentication is the process by which a system verifies, through the use of a user ID and password, that a user has the necessary permissions and authorizations to log in and access data. The OBIEE Server authenticates each connection request it receives.

1. Operating system authentication

2. External table authentication

3. Database authentication

4. LDAP authentication

What is object level security?
There are two types of object level security: Repository level and Web level
Repository level: In presentation layer we can set Repository level security by giving permission or deny permission to users/groups to see particular table or column.
Web level: this provides security for objects stored in the OBIEE web catalog, such as dashboards, dashboards pages, folder, and reports you can only view the objects for which you are authorized. For example, a mid-level manager may not be granted access to a dashboard containing summary information for an entire department.

What is data level security?
This controls the type an amount of data that you can see in a report. When multiple users run the same report the results that are returned to each depend on their access rights and roles in the organization. For example a sales vice president sees results for all regions, while a sales representative for a particular region sees only data for that region.

What is the difference between Data Level Security and Object Level Security?
Data level security controls the type and amount of data that you can see in a reports. Security provides security for objects stored in the OBIEE web catalog, like dashboards, dashboards pages, folder, and reports.

How do you implement security using External Tables and LDAP?
Instead of storing user IDs and passwords in OBIEE Server repository, you can maintain lists of users and their passwords in an external database table and use this table for authentication purposes. The external database table contains user IDs and passwords, and could contain other information, including group membership and display names used for OBIEE Web users. The table could also contain the names of specific database catalogs or schemas to use for each user when querying data
Instead of storing user IDs and passwords in a OBIEE Server repository, you can have the OBIEE Server pass the user ID and password entered by the user to an LDAP(Lightweight Directory Access Protocol ) server for authentication. The server uses clear text passwords in LDAP authentication. Make sure your LDAP servers are set up to allow this.

If you have 2 fact and you want to do report on one with quarter level and the other with month level how do you do that with just one time dimension?
Using level base metrics.

Did you work on a standalone Siebel system or was it integrated to other platforms?
Deploying the OBIEE platform without other Siebel applications is called OBIEE Stand -Alone .If your deployment includes other OBIEE Application it called integrated analytics -You can say Stand-Alone OBIEE

How to sort columns in RPD and web?
Sorting on web column, sort in the RPD its sort order column

If you want to create new logical column where will you create (in repository or dashboard) why?
I will create new logical column in repository. Because if it is in repository, you can use for any report. If you create new logical column in dashboard then it is going to effect on those reports, which are on that dashboard. you cannot use that new logical column for other dashboard(or request)

What is complex join, and where it is used?
We can join dimension table and fact table in BMM layer using complex join. When there is SCD type 2 we have to use complex join in BMM layer.

If you have dimension table like customer, item, time and fact table like sale and if you want to find out how often a customer comes to store and buys a particular item, what will you do?

Write a query as SELECT customer_name, item_name, sale_date, sum(qty) FROM customer_dim a, item_dim b, time_dim c, sale_fact d WHERE d.cust_key = a.cust_key AND d.item_key = b.item_key AND d.time_key = c.time_key GROUP BY customer_name, item_name, sale_date

You worked on standalone or integrated system?
Standalone.

If you want to limit the users by the certain region to access only certain data, what would you do?
Using data level security.
OBIEE Administrator: go to Manage -> Security in left hand pane u will find the user, groups, LDAP server, Hierarchy

What you can do is select the user and right click and go to properties, you will find two tabs named as users and logon, go to user tab and click at permission button in front of user name you have selected as soon as u click at permission you will get a new window with user group permission having three tabs named as general ,query limits and filter and you can specify your condition at filter tab, in which you can select presentation table ,presentation columns ,logical table and logical columns where you can apply the condition according to your requirement for the selected user or groups.

If there are 100 users accessing data, and you want to know the logging details of all the users, where can you find that?
To set a user’s logging level
1. In the Administration Tool, select Manage > Security.
The Security Manager Dialog box appears.
2. Double-click the user’s user ID. The User dialog box appears.
3. Set the logging level by clicking the Up or down arrows next to the Logging Level field

How do implement event polling table?
OBIEE Server event polling tables store information about updates in the underlying databases. An application (such as an application that loads data into a data mart) could be configured to add rows to an event polling table each time a database table is updated. The Analytics server polls this table at set intervals and invalidates any cache entries corresponding to the updated tables.

Data Warehouse Questions:

What is Data Warehousing?

A data warehouse is the main repository of an organization's historical data, it is corporate memory. It contains the raw material for management's decision support system. The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis, such as data mining, on the information without slowing down the operational systems. Data warehousing collection of data designed to support management decision making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. It is a repository of integrated information, available for queries and analysis.

What is Business Intelligence (BI)?

Business Intelligence (BI) refers to technologies, applications and practices for the collection, integration, analysis, and presentation of business information and sometimes to the information itself. The purpose of business intelligence is to support better business decision making. Thus, BI is also described as a decision support system (DSS).

BI systems provide historical, current, and predictive views of business operations, most often using data that has been gathered into a data warehouse or a data mart and occasionally working from operational data.

What is Dimension table?

Dimensional table contains textual attributes of measurements stored in the facts tables. Dimensional table is a collection of hierarchies, categories and logic which can be used for user to traverse in hierarchy nodes.

What is Dimensional Modelling?

Dimensional data model concept involves two types of tables and it is different from the 3rd normal form. This concept uses Facts table which contains the measurements of the business and Dimension table which contains the context (dimension of calculation) of the measurements.

What is Fact table?

Fact table contains measurements of business process. Fact table contains the foreign keys for the dimension tables. Example, if you are business process is "paper production", "average production of paper by one machine" or "weekly production of paper" will be considered as measurement of business process.

What are fundamental stages of Data Warehousing?

There are four different fundamental stages of Data Warehousing.

Offline Operational Databases:

Data warehouses in this initial stage are developed by simply copying the database of an operational system to an off‐line server where the processing load of reporting does not impact on the operational system's performance.

Offline Data Warehouse:

Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems and the data is stored in an integrated reporting‐oriented data structure

Real Time Data Warehouse:

Data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction (e.g. an order or a delivery or a booking etc.)

Integrated Data Warehouse:

Data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.

What are the Different methods of loading Dimension tables?

There are two different ways to load data in dimension tables.

Conventional (Slow):

All the constraints and keys are validated against the data before, it is loaded, this way data integrity is maintained.

Direct (Fast):

All the constraints and keys are disabled before the data is loaded. Once data is loaded, it is validated against all the constraints and keys. If data is found invalid or dirty it is not included in index and all future processes are skipped on this data.

Describes the foreign key columns in fact table and dimension table?

Foreign keys of dimension tables are primary keys of entity tables.

Foreign keys of facts tables are primary keys of Dimension tables.

What is Data Mining?

Data Mining is the process of analyzing data from different perspectives and summarizing it into useful information.

What is the difference between view and materialized view?

A view takes the output of a query and makes it appear like a virtual table and it can be used in place of tables.

A materialized view provides indirect access to table data by storing the results of a query in a separate schema object.

What is OLTP?

OLTP is abbreviation of On‐Line Transaction Processing. This system is an application that modifies data the instance it receives and has a large number of concurrent users.

What is OLAP?

OLAP is abbreviation of Online Analytical Processing. This system is an application that collects, manages, processes and presents multidimensional data for analysis and management purposes.

What is the difference between OLTP and OLAP?

Data Source

OLTP: Operational data is from original data source of the data

OLAP: Consolidation data is from various sources.

Process Goal

OLTP: Snapshot of business processes which does fundamental business tasks

OLAP: Multi‐dimensional views of business activities of planning and decision making

Queries and Process Scripts

OLTP: Simple quick running queries ran by users.

OLAP: Complex long running queries by system to update the aggregated data.

Database Design

OLTP: Normalized small database. Speed will be not an issue due to smaller database and normalization will not degrade performance. This adopts entity relationship(ER) model and an application‐oriented database design.

OLAP: De‐normalized large database. Speed is issue due to larger database and de‐normalizing will improve performance as there will be lesser tables to scan while performing tasks. This adopts star, snowflake or fact constellation mode of subject‐oriented database design.

Back up and System Administration

OLTP: Regular Database backup and system administration can do the job.

OLAP: Reloading the OLTP data is good considered as good backup option.

What are normalization forms?

There are different types of normalization forms like,

1NF: Eliminate Repeating Groups

Make a separate table for each set of related attributes, and give each table a primary key. Each field contains at most one value from its attribute domain.

2NF: Eliminate Redundant Data

If an attribute depends on only part of a multi‐valued key, remove it to a separate table.

3NF: Eliminate Columns Not Dependent On Key

If attributes do not contribute to a description of the key, remove them to a separate table. All attributes must be directly dependent on the primary key

BCNF: Boyce‐Codd Normal Form

If there are non‐trivial dependencies between candidate key attributes, separate them out into distinct tables.

4NF: Isolate Independent Multiple Relationships

No table may contain two or more 1:n or n:m relationships that are not directly related.

5NF: Isolate Semantically Related Multiple Relationships

There may be practical constrains on information that justify separating logically related many‐to‐many relationships.

ONF: Optimal Normal Form

A model limited to only simple (elemental) facts, as expressed in Object Role Model notation.

DKNF: Domain‐Key Normal Form A model free from all modification anomalies.

Remember, these normalization guidelines are cumulative. For a database to be in

3NF, it must first fulfil all the criteria of a 2NF and 1NF database.

What is ODS?

ODS is abbreviation of Operational Data Store. A database structure that is a repository for near real‐time operational data rather than long term trend data. The

ODS may further become the enterprise shared operational database, allowing operational systems that are being re‐engineered to use the ODS as there operation databases.

What is ER Diagram?

Entity Relationship Diagrams are a major data modelling tool and will help organize

The data in your project into entities and define the relationships between the

Entities. This process has proved to enable the analyst to produce a good database

Structure so that the data can be stored and retrieved in a most efficient manner.

An entity‐relationship (ER) diagram is a specialized graphic that illustrates the

Interrelationships between entities in a database. A type of diagram used in data

Modelling for relational data bases. These diagrams show the structure of each table

And the links between tables.

What is ETL?

ETL is abbreviation of extract, transform, and load. ETL is software that enables

Businesses to consolidate their disparate data while moving it from place to place,

And it doesn't really matter that that data is in different forms or formats. The data

Can come from any source. ETL is powerful enough to handle such data disparities.

First, the extract function reads data from a specified source database and extracts a desired subset of data. Next, the transform function works with the acquired data using rules or lookup tables, or creating combinations with other data ‐ to convert it to the desired state. Finally, the load function is used to write the resulting data to a target database.

What is VLDB?

VLDB is abbreviation of Very Large Database. A one terabyte database would

Normally be considered to be a VLDB. Typically, these are decision support systems

Or transaction processing applications serving large numbers of users.

Is OLTP database is design optimal for Data Warehouse?

No. OLTP database tables are normalized and it will add additional time to queries to return results. Additionally OLTP database is smaller and it does not contain longer period (many years) data, which needs to be analysed. An OLTP system is basically ER model and not Dimensional Model. If a complex query is executed on an OLTP system, it may cause a heavy overhead on the OLTP server that will affect the normal business processes.

If de‐normalized is improves data warehouse processes, why fact table is in normal form?

Foreign keys of facts tables are primary keys of Dimension tables. It is clear that fact table contains columns which are primary key to other table that itself make normal form table.

What are lookup tables?

A lookup table is the table placed on the target table based upon the primary key of

The target, it just updates the table by allowing only modified (new or updated)

Records based on the lookup condition.

What are Aggregate tables?

Aggregate table contains the summary of existing warehouse data which is grouped

To certain levels of dimensions. It is always easy to retrieve data from aggregated

Tables than visiting original table which has million records. Aggregate tables reduce the load in the database server and increase the performance of the query and can retrieve the result quickly.

What is real time data‐warehousing?

Data warehousing captures business activity data. Real‐time data warehousing

Captures business activity data as it occurs. As soon as the business activity is

Complete and there is data about it, the completed activity data flows into the data

Warehouse and becomes available instantly.

What are conformed dimensions?

Conformed dimensions mean the exact same thing with every possible fact table to

Which they are joined. They are common to the cubes.

What is conformed fact?

Conformed dimensions are the dimensions which can be used across multiple Data

Marts in combination with multiple facts tables accordingly.

How do you load the time dimension?

Time dimensions are usually loaded by a program that loops through all possible

Dates that may appear in the data. 100 years may be represented in a time

Dimension, with one row per day.

What is a level of Granularity of a fact table?

Level of granularity means level of detail that you put into the fact table in a data

Warehouse. Level of granularity would mean what detail you are willing to put for each transactional fact.

What are non‐additive facts?

Non‐additive facts are facts that cannot be summed up for any of the dimensions

Present in the fact table. However they are not considered as useless. If there are

Changes in dimensions the same facts can be useful.

What is fact less facts table?

A fact table which does not contain numeric fact columns it is called fact less facts table.

What are slowly changing dimensions (SCD)?

SCD is abbreviation of slowly changing dimensions. SCD applies to cases where the attribute for a record varies over time. There are three different types of SCD.

1) SCD1: The new record replaces the original record. Only one record exist in database ‐ current data.

2) SCD2: A new record is added into the customer dimension table. Two records exist in database ‐ current data and previous history data.

3) SCD3: The original data is modified to include new data. One record exist in database ‐ new information are attached with old information in same row.

What is hybrid slowly changing dimension?

Hybrid SCDs are combination of both SCD 1 and SCD 2. It may happen that in a table, some columns are important and we need to track changes for them i.e. capture the historical data for them whereas in some columns even if the data changes, we don't care.

What is BUS Schema?

BUS Schema is composed of a master suite of confirmed dimension and standardized definition if facts.

What is a Star Schema?

Star schema is a type of organizing the tables such that we can retrieve the result from the database quickly in the warehouse environment.

What Snow Flake Schema?

Snowflake Schema, each dimension has a primary dimension table, to which one or more additional dimensions can join. The primary dimension table is the only table that can join to the fact table.

Differences between star and snowflake schema?

Star schema: A single fact table with N number of Dimension, all dimensions will be linked directly with a fact table. This schema is de-normalized and results in simple join and less complex query as well as faster results.

Snow schema: Any dimensions with extended dimensions are known as Snowflake schema, dimensions maybe interlinked or may have one to many relationship with other tables. This schema is normalized and results in complex join and very complex query as well as slower results.

What is Difference between ER Modelling and Dimensional Modelling?

ER modelling is used for normalizing the OLTP database design. Dimensional

Modelling is used for de‐normalizing the ROLAP/MOLAP design.

What is degenerate dimension table?

If a table contains the values, which is neither dimension nor measures is called degenerated dimensions.

Why is Data Modelling Important?

Data modelling is probably the most labour intensive and time consuming part of the

Development process. The goal of the data model is to make sure that the all data

Objects required by the database are completely and accurately represented.

Because the data model uses easily understood notations and natural language, it

Can be reviewed and verified as correct by the end‐users.

In computer science, data modelling is the process of creating a data model by applying a data model theory to create a data model instance. A data model theory is a formal data model description. When data modelling, we are structuring and organizing data. These data structures are then typically implemented in a database management system. In addition to defining and organizing the data, data modelling will impose (implicitly or explicitly) constraints or limitations on the data placed within the structure.

Managing large quantities of structured and unstructured data is a primary function of information systems. Data models describe structured data for storage in data management systems such as relational databases. They typically do not describe unstructured data, such as word processing documents, email messages, pictures, digital audio, and video. (Reference: Wikipedia)

What is surrogate key?

Surrogate key is a substitution for the natural primary key. It is just a unique

Identifier or number for each row that can be used for the primary key to the table.

The only requirement for a surrogate primary key is that it is unique for each row in the table. It is useful because the natural primary key can change and this makes updates more difficult. Surrogated keys are always integer or numeric.

What is junk dimension?

A number of very small dimensions might be lumped together to form a single dimension, a junk dimension ‐ the attributes are not closely related. Grouping of

Random flags and text attributes in a dimension and moving them to a separate sub dimension is known as junk dimension.

What is Data Mart?

A data mart (DM) is a specialized version of a data warehouse (DW). Like data warehouses, data marts contain a snapshot of operational data that helps business people to strategize based on analyses of past trends and experiences. The key difference is that the creation of a data mart is predicated on a specific, predefined need for a certain grouping and configuration of select data. A data mart configuration emphasizes easy access to relevant information (Reference: Wiki).

Data Marts are designed to help manager make strategic decisions about their business.

What is the difference between OLAP and data warehouse?

Data warehouse is the place where the data is stored for analysing whereas OLAP is the process of analysing the data, managing aggregations, partitioning information into cubes for in depth visualization.

What is a Cube and Linked Cube with reference to data warehouse?

Cubes are logical representation of multidimensional data. The edge of the cube contains dimension members and the body of the cube contains data values. The linking in cube ensures that the data in the cubes remain consistent.

What is snapshot with reference to data warehouse?

You can disconnect the report from the catalog to which it is attached by saving the report with a snapshot of the data.

What is active data warehousing?

An active data warehouse provides information that enables decision‐makers within an organization to manage customer relationships nimbly, efficiently and proactively.

What is the difference between data warehousing and business intelligence?

Data warehousing deals with all aspects of managing the development, implementation and operation of a data warehouse or data mart including metadata management, data acquisition, data cleansing, data transformation, storage management, data distribution, data archiving, operational reporting, analytical reporting, security management, backup/recovery planning, etc. Business intelligence, on the other hand, is a set of software tools that enable an organization to analyse measurable aspects of their business such as sales performance, profitability, operational efficiency, effectiveness of marketing campaigns, market penetration among certain customer groups, cost trends, anomalies and exceptions, etc. Typically, the term “business intelligence” is used to encompass OLAP, data visualization, data mining and query/reporting tools. (Reference: Les Barbusinski)

Explain paradigm of Bill Inmon and Ralph Kimball.

Bill Inmon's paradigm: Data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form.

Ralph Kimball's paradigm: Data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.

What are different types of fact tables?

Types of fact:

Additive:
Additive facts are facts that can be summed up through all of the dimensions in the fact table. A sales fact is a good example for additive fact.
Semi-Additive:
Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others.
E.g. Daily balances fact can be summed up through the customers dimension but not through the time dimension.
Non-Additive:
Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
E.g. Facts which have percentages, ratios calculated.

Fact less Fact Table:

In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called “Factless Fact tables”.

E.g. A fact table which has only product key and date key is a factless fact. There are no measures in this table. But still you can get the number products sold over a period of time.

Based on the above classifications, fact tables are categorized into two:

1. Cumulative:

This type of fact table describes what has happened over a period of time. For example, this fact table may describe the total sales by product by store by day. The facts for this type of fact tables are mostly additive facts. The first example presented here is a cumulative fact table.

2. Snapshot:

This type of fact table describes the state of things in a particular instance of time, and usually includes more semi-additive and non-additive facts. The second example presented here is a snapshot fact table.

What are different types of dimension tables?

1. Conformed Dimension:
Conformed dimensions mean the exact same thing with every possible fact table to which they are joined.
E.g. the date dimension table connected to the sales facts is identical to the date dimension connected to the inventory facts.

2. Junk Dimension:
A junk dimension is a collection of random transactional codes flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes.
E.g. assume that we have a gender dimension and marital status dimension. In the fact table we need to maintain two keys referring to these dimensions. Instead of that create a junk dimension which has all the combinations of gender and marital status (cross join gender and marital status table and create a junk table). Now we can maintain only one key in the fact table.

3. Degenerated Dimension:
A degenerate dimension is a dimension which is derived from the fact table and doesn't have its own dimension table.
E.g. A transactional code in a fact table.

4. Role-playing dimension:
Dimensions which are often used for multiple purposes within the same database are called role-playing dimensions. For example, a date dimension can be used for “date of sale", as well as "date of delivery", or "date of hire".

5. Slowly changing dimensions:

Slowly changing dimensions are the dimensions in which the data changes slowly, rather than changing regularly on a time basis.

Types:

a. SCD type 1

It is used when there is no need to store historical data in the dimension table. This method overwrites the old data in the dimension table with the new data. It is used to correct data errors in the dimension.

b. SCD type 2

SCD type 2 stores the entire history the data in the dimension table. With type 2 we can store unlimited history in the dimension table. In type 2, you can store the data in three different ways.

They are:

1. Versioning

2. Flagging

3. Effective Date

c. SCD type 3

Only the current status and previous status of the row is maintained in the table. To track these changes two separate columns are created in the table.

What is granularity in database?

Granularity is the relative size, scale, level of detail, or depth of penetration that characterizes an object or activity. It may help to think of it as: which type of "granule" are we looking at? This term is used in astronomy, photography, physics, linguistics, and fairly often in information technology. It can refer to the level of a hierarchy of objects or actions, to the fineness of detail in a photograph, or to the amount of information that is supplied in describing a person's age. Its meaning is not always immediately clear to those unfamiliar with the context in which it's being used.

Define repository in terms of OBIEE?

Repository stores the Meta data information. The extension of the repository file is “.rpd”. With OBIEE Server, all the rules needed for security, data modelling, aggregate navigation, caching, and connectivity is stored in metadata repositories. Each metadata repository can store multiple business models. OBIEE Server can access multiple repositories

Repository is divided into three layer,
1. Physical – Represents the data Sources
2. Business – model the Data sources into Facts and Dimension and apply business logic
3. Presentation – Specifies the user’s view of the data rendered in OBIEE answers client

What is the end to end life cycle of OBIEE?

Requirement GatheringàIdentify Source SystemàETL and DW designàRPD designàReport and DB designàDefine SecurityàDetermine AggàTesting and QA.

If you have 3 facts and 4 dimension and you need to join would you recommend joining fact with fact? If no than what is the option? Why you won’t join fact to fact?
Instead of joining fact with fact we can create one logical table (fact) and add the 3 fact tables as logical table source in the BMM layer.

What is connection pool and how many connection pools did you have in your last project?
Connection pool is needed for every physical database.
It contains information about the connection to the database, not the database itself.
Can use either shared user accounts or can use pass-through accounts.
We can have multiple connection pools for each group to avoid waiting.

What is the purpose of Alias Tables?
An Alias table is a physical table with the type of Alias. It is a reference to a physical table, and inherits all its column definitions and some properties from the physical table.

Why alias:
1. to reuse an existing table more than once in your physical layer (without having to import it several times).
2. To set up multiple alias tables, each with different keys, names, or joins
3. To help you design sophisticated star or snowflake structures in the business model layer. Alias tables are critical in the process of converting ER Schemas to Dimensional Schemas.

How do you define the relationship between facts and dimensions in BMM layer?
Using complex join or Logical joins

Did you create any new logical column in BMM layer, how?
Yes. We can create new logical column in BMM layer.
Example: Right click on fact table> select new logical column>give name for new logical column like Total cost.

Can you use physical join in BMM layer?
Yes we can use physical join in BMM layer.

Can you use outer join in BMM layer?
Yes we can. When we are doing complex join in BMM layer in the join properties we have an option to change the type of join to left outer, right outer, full outer or inner.

What is level based metrics?
Level based metrics means, having a measure pinned at a certain level of the dimension.

A LBM is a metric that is defined for a specific level or intersection of levels.
Monthly Total Sales or Quarterly Sales are the examples.

What is logging level? Where can you set logging levels?

What are different types of variables? Explain each.
There are two classes of variables:

A repository variable has a single value at any point in time. There are two types of repository variables: static and dynamic.
Session variables are created and assigned a value when each user logs on. There are two types of session variables: system and non system.

Initialization blocks are used to initialize dynamic repository variables, system session variables, and non system session variables.

What is Authentication? How many types of authentication do we have in OBIEE?
Authentication is the process by which a system verifies a user with the help of a user ID and password. It checks if user has the necessary permissions and authorizations to log in and access data. There are 4 main types of Authentication in OBIEE:
Operating system authentication
External table authentication
Database authentication
LDAP authentication

What is data level security?
This controls the type or amount of data that you can see in a report. When multiple users run the same report the results that are returned to each depend on their access rights and roles in the organization. For example a sales vice president sees results for all regions, while a sales representative for a particular region sees data for that region.

What is the difference between Data Level Security and Object Level Security?
Data level security controls the type and amount of data that you can see in a report. Object level security provides security for objects stored in the OBIEE web catalog like dashboards, dashboards pages, folder and reports.

Oracle Business Intelligence

Pages

Monday, 22 August 2016

OBIEE Interview Questions Part - 3

No comments:

Post a Comment