A Unified Peer-To-Peer Database Framework



A Unified Peer-to-Peer Database Framework

and its Application for Scalable Service Discovery

Wolfgang Hoschek

CERN IT Division

European Organization for Nuclear Research

1211 Geneva 23, Switzerland

Wolfgang.Hoschek@cern.ch

Abstract

In a large distributed system spanning many administrative domains such as a DataGrid,

it is often desirable to maintain and query dynamic and timely information about

active participants such as services, resources and user communities. However, in such a

database system, the set of information tuples in the universe is partitioned over one or

more distributed nodes, for reasons including autonomy, scalability, availability, performance

and security. It is not obvious how to enable powerful discovery query support and

collective collaborative functionality that operate on the distributed system as a whole,

rather than on a given part of it. Further, it is not obvious how to allow for search

results that are fresh, allowing dynamic content. It appears that a Peer-to-Peer (P2P)

database network may be well suited to support dynamic distributed database search, for

example for service discovery. In this paper, we take the first steps towards unifying the

fields of database management systems and P2P computing, which so far have received

considerable, but separate, attention. We extend database concepts and practice to cover

P2P search. Similarly, we extend P2P concepts and practice to support powerful general-purpose

query languages such as XQuery and SQL. As a result, we devise the Unified

Peer-to-Peer Database Framework (UPDF), which is unified in the sense that it allows

specific applications to be expressed for a wide range of data types, node topologies, query

languages, query response modes, neighbor selection policies, pipelining characteristics,

timeout and other scope options.

1 Introduction

The next-generation Large Hadron Collider (LHC) project at CERN, the European Organization

for Nuclear Research, involves thousands of researchers and hundreds of institutions

spread around the globe. A massive set of computing resources is necessary to support its

data-intensive physics analysis applications, including thousands of network services, tens of

thousands of CPUs, WAN Gigabit networking as well as Petabytes of disk and tape storage

[1]. To make collaboration viable, it was decided to share in a global joint effort - the European

DataGrid (EDG) [2, 3, 4, 5] - the data and locally available resources of all participating

laboratories and university departments.

Grid technology attempts to support flexible, secure, coordinated information sharing

among dynamic collections of individuals, institutions and resources. This includes data

sharing but also includes access to computers, software and devices required by computation

and data-rich collaborative problem solving [6]. These and other advances in distributed

computing are necessary to make it increasingly possible to join loosely coupled people and

resources from multiple organizations.

An enabling step towards increased Grid software execution flexibility is the (still immature

and hence often hyped) web services vision [2, 7, 8] of distributed computing where

programs are no longer configured with static information. Rather, the promise is that programs

are made more flexible and powerful by querying Internet databases (registries) at

runtime in order to discover information and network-attached third-party building blocks.

Services can advertise themselves and related metadata via such databases, enabling the assembly

of distributed higher-level components. For example, a data-intensive High Energy

Physics analysis application sweeping over Terabytes of data looks for remote services that

exhibit a suitable combination of characteristics, including network load, available disk quota,

access rights, and perhaps Quality of Service and monetary cost.
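
To make the discovery example above concrete, the following is a minimal sketch of a registry query over service descriptions. It is an illustration only and is not taken from the framework itself: the ServiceTuple fields, the discover function, the group names and all sample values are hypothetical, and a single in-memory list stands in for what would in practice be a distributed registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceTuple:
    """One advertised service description (all field names are hypothetical)."""
    name: str
    network_load: float        # fraction of capacity in use, 0.0 to 1.0
    free_disk_gb: int          # disk quota still available to the caller
    allowed_groups: frozenset  # user communities that hold access rights
    cost_per_gb: float         # monetary cost in arbitrary units


def discover(registry, group, min_disk_gb, max_load, max_cost):
    """Return the services that satisfy every constraint, least loaded first."""
    matches = [
        s for s in registry
        if group in s.allowed_groups
        and s.free_disk_gb >= min_disk_gb
        and s.network_load <= max_load
        and s.cost_per_gb <= max_cost
    ]
    return sorted(matches, key=lambda s: s.network_load)


# Sample registry content; the values are invented for illustration.
registry = [
    ServiceTuple("storage-a", 0.80, 5000, frozenset({"group-x"}), 0.02),
    ServiceTuple("storage-b", 0.30, 2000, frozenset({"group-x", "group-y"}), 0.05),
    ServiceTuple("storage-c", 0.10, 100, frozenset({"group-y"}), 0.01),
    ServiceTuple("storage-d", 0.20, 3000, frozenset({"group-y"}), 0.03),
]

result = discover(registry, group="group-y", min_disk_gb=1000,
                  max_load=0.5, max_cost=0.05)
print([s.name for s in result])
```

Here the query combines access rights, disk quota, network load and cost in a single predicate, which is the kind of multi-attribute selection that a general-purpose query language such as XQuery or SQL would express declaratively.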

More generally, in a distributed system, it is often desirable to maintain and query dynamic

and timely information about active participants such as services, resources and user

communities. As in a data integration system [9, 10, 11], the goal is to exploit several independent

information sources as if they were a single source. However, in a large distributed

database system spanning many administrative domains, the set of information tuples in the

universe is partitioned over one or more distributed nodes, for reasons including autonomy,

scalability, availability, performance and security. It is not obvious how to enable powerful

discovery query support and collective collaborative functionality that operate on the distributed

system as a whole, rather than on a given part of it. Further, it is not obvious

how

...
