[ISN] DR at a mutual fund


By Vinita Gupta
31 July 2006  

The one important lesson that many organisations learned after last year's 
deluge in Mumbai is the significance of maintaining a disaster recovery 
(DR) site. The companies that coped did so because they had a DR site to 
resume operations from. The other reason for the 
need and growing acceptance of DR sites is the increasing number of 
business applications and the increasing dependence of organisations on 

Though SBI Mutual Funds (SBIMF) was not affected by the flooding last year 
(or this year, for that matter), they have nevertheless invested in a DR 
site and are quite confident that if disaster hits they are prepared for 
the worst.

The need

The reasons that compelled SBIMF to go in for a DR site were internal risk 
management, regulatory guidelines from the Securities and Exchange Board 
of India (SEBI), and the desire to gain investors' trust.

Says Subhojit Roy, SBIMF's Head of IT, "Business Continuity (BC) is very 
important, especially for the financial services sector. DR is a sub-set 
of BC. To keep the business running in case of a disaster such as a 
natural calamity, or the break-down of the primary data centre, DR is a 
must." SBIMF has many applications, such as publication of the net asset 
value (NAV) for investors, that must run every day irrespective of 
disasters.

The mutual fund's (MF's) workings are regulated by SEBI guidelines, which 
stipulate that the MF and its registrar and transfer (R&T)  agents and 
custodians should have an offsite back-up facility and business 
contingency plan that is tested and evaluated on a regular basis. The 
business contingency plan should be comprehensive and should cover IT, 
infrastructure and personnel requirements.

Since the MF works with intermediaries such as banks, custodians, R&T 
agents and brokers, the level of restoration of normal operations by the 
MF and the time taken for different levels of normalcy will depend on the 
individual DR implementation of its partners.

The need for DR among MF providers also arises as investors want to know 
the level of preparedness of the provider before investing. Notes Roy, "BC 
has become quite critical in the financial services sector.  Apart from 
regulatory requirements for DR and BC, institutional investors also like 
to know before investing whether risk management practices are in place or 
not."

Process and implementation

The process of DR planning began in May 2005, and the final implementation 
started in February this year; the DR site went live in June.

SBIMF's DR site is at Chennai. The Tamil Nadu capital was chosen because 
it does not fall in a high seismic activity zone. Apart from deciding the 
location of its DR site, SBIMF had to decide on the cost and modalities, 
i.e. whether it would be deployed and managed by an in-house IT team or 
outsourced. Says Roy, "SBI had already set up 
a complete DR site in Chennai for its core banking and ATM network, and 
they provided space and infrastructure in their DR data centre to us."

Though the site has well-equipped infrastructure, skilled personnel and 
BS7799 certification, SBIMF had to establish its own systems at the site.

The stages of DR

* Level 1. The first level of DR implementation consisted of planning and 
  implementing policy-based strategic back-up management, back-up 
  strategies, data consolidation and tape vaulting at the offsite 
  facility. Informs Roy, "We are presently taking daily back-up of data of 
  all critical servers. The back-up tapes are stored in a fire-proof 
  cabinet in our office as well as in the bank's locker for offsite 
* Level 2. The second step included charting out the critical components 
  and designing a redundancy plan. Most of the servers and active network 
  components are critical to the operations. A single point of failure in 
  such components can raise the risk of disasters and bring the entire 
  business to a halt. In this level the redundancy path is designed to 
  avoid total disruption. "As a result of risk mitigation, you get 
  different redundancy designs for critical network components. All single 
  points of failure are treated for redundancy planning," adds Roy.
* Level 3. Finally, in the third stage of DR, the primary site is given an 
  alternative site of operation from which business-critical processes 
  can be resumed within the stipulated recovery time objective (RTO, the 
  maximum acceptable downtime) and recovery point objective (RPO, the 
  maximum acceptable window of data loss). While setting up a DR site, an 
  appropriate data recovery solution is defined to satisfy the needs of 
  RTO and RPO.
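
As a rough illustration of how the two objectives in Level 3 might be 
checked after a DR drill, the Python sketch below derives the achieved RPO 
and RTO from three timestamps and compares them with targets. The class 
and function names, the timestamps, and the four-hour and six-hour targets 
are all hypothetical; the article does not state SBIMF's actual RTO or RPO.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryResult:
    """Timestamps recorded during a DR drill (all values hypothetical)."""
    last_replica: datetime      # last copy of data that reached the DR site
    outage_start: datetime      # moment the primary site became unavailable
    service_restored: datetime  # moment the DR site took over operations

    @property
    def rpo(self) -> timedelta:
        # Achieved recovery point: data written after the last replica is lost.
        return self.outage_start - self.last_replica

    @property
    def rto(self) -> timedelta:
        # Achieved recovery time: how long the business was without service.
        return self.service_restored - self.outage_start


def meets_objectives(result: RecoveryResult,
                     rpo_target: timedelta,
                     rto_target: timedelta) -> bool:
    """True when both achieved values are within their targets."""
    return result.rpo <= rpo_target and result.rto <= rto_target


# Hypothetical drill: replica at 08:00, outage at 10:30, service back at 13:30.
drill = RecoveryResult(
    last_replica=datetime(2006, 7, 1, 8, 0),
    outage_start=datetime(2006, 7, 1, 10, 30),
    service_restored=datetime(2006, 7, 1, 13, 30),
)
print(drill.rpo, drill.rto)  # 2:30:00 3:00:00
print(meets_objectives(drill, timedelta(hours=4), timedelta(hours=6)))  # True
```

Keeping the two measurements separate makes it clear which objective a 
failed drill missed: a large RPO points at the replication schedule, while 
a large RTO points at the failover procedure.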

Applications on the DR site

SBIMF is running business applications such as Mfund, and front office and 
cash management systems at the DR site. All business-critical applications 
like Oracle database, and the mail, file and print server, are being 

Roy says, "Based on business impact analysis and the objectives of BC such 
as RTO and RPO, we have selected these applications and data replication 
technology. The applications are front office and back office systems 
(running on Oracle 9i), the cash management system (also running on Oracle 
9i), portfolio management system (running on MS-SQL), centralised mailing 
system (Lotus Domino 6.5.3) and files of mapped drives of all the users in 
the network of the primary site."  Non-critical applications such as 
workflow applications are not part of DR.

Technology used

SBIMF has about 50 branch offices which look at sales and investor 
servicing. All these branches are connected to the corporate office (at 
Cuffe Parade in Mumbai) through the WAN. Data from the branches is 
collated at the centralised server located at the corporate office.  The 
servers are Intel-Windows-based.

Data can be replicated in two ways: host-based replication and 
consolidated replication. Host-based replication copies data from one 
system at the primary site to a similar system at the DR site. It can be 
done at the application level (for example, with Oracle Data Guard) or 
through third-party software. The other way is to consolidate the data 
from the various 
servers into a single storage box (like a SAN or NAS box), and then 
replicate the data of different applications from the external storage box 
to another similar box at the DR site.

SBIMF has chosen the consolidated replication method. It first 
consolidates all critical server data onto shared storage. Using Network 
Appliance’s fabric-attached storage (FAS), it then replicates all the data 
to a similar FAS 
device at the DR site. At present, the servers are also accessing the FAS 

Informs Roy, "We have done data consolidation at the primary site, that 
is, the corporate office. All the critical data of the Oracle, mail and 
file servers have been migrated into a unified storage box."  Critical 
data of SBIMF gets replicated every four hours; this means that whatever 
data there is in the FAS box in the primary site gets replicated to the 
FAS box at the DR site. The less critical data is replicated at the end of 
the day to reduce bandwidth utilisation during working hours.
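
To see what this schedule implies for potential data loss, the Python 
sketch below computes the longest gap between replication runs in a 
repeating daily schedule. It is only an illustration: the run times (every 
four hours from midnight for critical data, a single 20:00 run for less 
critical data) and the function name are assumptions, and the time spent 
transferring the data is ignored.

```python
from datetime import timedelta

DAY = timedelta(hours=24)


def worst_case_exposure(run_times: list[timedelta]) -> timedelta:
    """Longest gap between consecutive runs in a repeating daily schedule.

    run_times are offsets from midnight. Data written just after one run
    stays unreplicated until the next, so the longest gap bounds the loss.
    """
    if not run_times:
        raise ValueError("schedule needs at least one run")
    runs = sorted(run_times)
    gaps = [later - earlier for earlier, later in zip(runs, runs[1:])]
    gaps.append(runs[0] + DAY - runs[-1])  # gap that wraps past midnight
    return max(gaps)


# Assumed schedules; the article gives the intervals, not the clock times.
critical = [timedelta(hours=h) for h in range(0, 24, 4)]
less_critical = [timedelta(hours=20)]

print(worst_case_exposure(critical))       # 4:00:00
print(worst_case_exposure(less_critical))  # 1 day, 0:00:00
```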

At the primary site, SBIMF is using a Tandberg autoloader and Veritas 
back-up software for archival. Earlier, back-ups were taken on SDLT tapes 
without an autoloader. For connectivity, the primary and DR sites are 
linked by 2 Mbps leased lines. Since it shares the DR site with SBI, SBIMF 
could leverage the bank's existing link. Reveals Roy, "SBI has set up a 
leased line between its Chennai DR site and the central hub in Mumbai; we 
too are connected to the central hub of the SBI through a leased line of 2 
Mbps. Because of this we saved on the cost of setting up our own leased 
line connecting the DR site to Mumbai."
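
A back-of-the-envelope calculation shows how much data a 2 Mbps line can 
carry in one four-hour replication window. The Python sketch below assumes 
decimal units and, by default, that the full line rate is available; the 
function name and the efficiency parameter are hypothetical, and a line 
shared with other traffic will deliver less.

```python
def max_transfer_bytes(link_mbps: float, window_hours: float,
                       efficiency: float = 1.0) -> float:
    """Upper bound on bytes moved over a link within a time window.

    efficiency is an assumed fraction of the line rate that is usable
    after protocol overhead and competing traffic (1.0 = ideal case).
    """
    bits_per_second = link_mbps * 1_000_000
    seconds = window_hours * 3600
    return bits_per_second * seconds * efficiency / 8


ceiling = max_transfer_bytes(link_mbps=2, window_hours=4)
print(f"{ceiling / 1e9:.1f} GB")  # 3.6 GB
```

If the changed data in a four-hour window exceeds this ceiling, a 
replication run cannot finish before the next one is due, which is one 
reason to defer less critical data to the end of the day.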

Role of BCP committee

The company's Business Continuity Planning (BCP) committee is the 
highest-level committee for DR. This committee takes the final decisions 
on actual disaster situations, and based on its decision the BCP team will 
act. Typically, a BCP committee comprises the top management team, members 
of different functional areas, and the IT team. "The BCP team is 
responsible for reviewing the DR / BC plan, testing the DR site 
periodically through live DR drills with the help of users, and has a 
specific role to play in case of disaster," says Roy.

The challenges faced by the SBIMF team in setting up the DR site were 
selection of the site, making a complete DR / BC manual, involving all 
departments / functions of the company, planning appropriate technology 
for the DR requirements of the company, continuous review and updating of 
DR / BC processes, and regular testing of DR. DR testing is conducted 
every quarter to verify the accuracy of the DR site.

First step to BC

Roy believes that having a DR site is the first step towards BC. If any of 
the server components in the primary site is down, they can work from the 
DR site till the primary site's equipment is revived.  "Business impact 
analysis helps as it gives us a complete picture for setting up an 
alternate operational site for BC, and also the manpower requirements for 
BC. It is useful in building adequate redundancy in the present 
infrastructure, and a complete DR / BC manual by giving everybody clear 
guidelines for disaster situations."
