0

I'm just getting into the concept of a Distributed System and its advantages and disadvantages. In the book I'm reading it discusses the complexity of a Distributed System and that they are inherently complex, it lists the following as potential reasons for complexity;

  • Heterogeneity
  • Asynchronous communication
  • Partial failures

What I am struggling to understand is what these concepts actually encompass (i.e what is a partial failure and what are the causes of a partial failure?), and how they are dealt with in modern systems? Does middleware successfully solve all three of these complexity issues within a system?

3 Answers 3

2

This question can be answered in many words, but I'll try to boil it down to essentials:

Heterogeneity is one of the main problems integration tries to solve. It is an inherent characteristic of most distributed systems and it refers to the fact that most often than not, when you have to integrate multiple systems, they will:

  • Be on different platforms, in different networks;
  • Differ in their capabilities in terms of integration;
  • Have disparities in data, even data referring to the same business domain;
  • Use and support different (sometimes even forgotten or unsupported) technologies and standards;
  • Have different owners (are controlled by different departments, companies).

All of the above add more and more complexity.

Asynchronous communication solves some problems of stateless communication but introduces whole other set of complexities, that can easily lead to problems when not implementation is not proper. This is mainly due to the fact that you only have guarantee that the message will be successfully received on the other end, but have no guarantee whatsoever when the operation will be processed, if ever. So it is much harder to carry out orchestration of interdependent asynchronous tasks, as opposed to synchronous tasks.

Partial failures - When you have processes that involve multiple interdependent write operations you need to ensure ACID transactions. Having to do so in scenarios when multiple systems are involved is even harder because you cannot achieve common transactional context as easily in heterogeneous distributed environment as you would if you were within the boundaries of a single system. Often you will need to implement opposite operations in services (or worse, implement two-phase commit), just to be able to compensate all prior writes in the process in case something goes wrong with one of the tasks.

Hope this clears things a bit!

Sign up to request clarification or add additional context in comments.

Comments

1

The reason distributed systems are so complex is simple: time!

Perfect synchronization of state becomes impossible in distributed systems for the simple fact that some amount of time must pass between the point that a message leaves one server and that point it arrives at its intended destination. In addition to this, networks are a far more unreliable communication medium, meaning that message may never make it at all.

The lack of perfect time synchronization means that it's impossible to make absolute assumptions about the order of events. For instance, in a highly available distributed database, if two requests writing to the same resource arrive on two different servers nearly simultaneously, there's no way to determine the absolute order of those events. So, distributed systems must use approximations of logical time and conflict resolution to resolve these types of event order issues.

Comments

0

Partial Failure - In case of a transactionS involving many clients (@2or more),the scheduling technique being used many involve conflicting operations of a write and maybe a write, in the process of issuing a lock complexities arise like for in case of a deadlock. When the lock manager tries either to detect, avoid or prevent the system may partially fail leading to rollback of the whole process.

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.