Demultiplexing SMB Requests
Demultiplexing of SMB requests requires knowledge of SMB state information, all of which must be held by the front-end virtual server.
This is a complicated problem to solve. Windows XP and later have changed the semantics so that the state information (vuid, tid, fid) must match for an operation to succeed, which makes demultiplexing simpler than it was with earlier clients.
SMB requests would be dispatched by vuid to their associated server; no code exists today to effect this solution. The problem is conceptually similar to that of correctly handling requests from multiple users of a Windows 2000 Terminal Server connecting to Samba.
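As an illustration only, the following C sketch shows how a hypothetical front end might keep a table that maps each vuid to the back-end server owning the corresponding session state. The names vuid_map_add and vuid_map_lookup are invented for this sketch and do not exist in Samba.

/* Illustrative sketch only: a hypothetical front-end table mapping an
 * SMB vuid to the back-end server that holds that session's state.
 * None of these names exist in Samba today. */
#include <stddef.h>
#include <stdint.h>

#define MAX_SESSIONS 1024

struct vuid_map_entry {
    uint16_t vuid;        /* virtual user id from the SMB header */
    int      backend_fd;  /* connected socket to the owning back-end smbd */
    int      in_use;
};

static struct vuid_map_entry vuid_map[MAX_SESSIONS];

/* Record which back-end server answered the session setup for this vuid. */
int vuid_map_add(uint16_t vuid, int backend_fd)
{
    for (size_t i = 0; i < MAX_SESSIONS; i++) {
        if (!vuid_map[i].in_use) {
            vuid_map[i].vuid = vuid;
            vuid_map[i].backend_fd = backend_fd;
            vuid_map[i].in_use = 1;
            return 0;
        }
    }
    return -1;              /* table full */
}

/* Route a subsequent request to the back-end server that owns its vuid. */
int vuid_map_lookup(uint16_t vuid)
{
    for (size_t i = 0; i < MAX_SESSIONS; i++) {
        if (vuid_map[i].in_use && vuid_map[i].vuid == vuid)
            return vuid_map[i].backend_fd;
    }
    return -1;              /* unknown vuid; reject or pick a default server */
}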
One possibility is to start by exposing the server pool to clients directly, which could eliminate the demultiplexing step.
The Distributed File System Challenge
There are many distributed file systems available for UNIX and Linux. Many could be adopted as the back end of our cluster, so long as awareness of SMB semantics (share modes, locking, and oplock issues in particular) is maintained. Common free distributed file systems include NFS, AFS, OpenGFS, and Lustre.
The server pool (cluster) can use any distributed file system backend if all SMB
semantics are performed within this pool.
Restrictive Constraints on Distributed File Systems
Where a clustered server provides purely SMB services, oplock handling may be done within the server pool without any need for it to be passed to the backend file system pool.
On the other hand, where the server pool also provides NFS or other file services,
it will be essential that the implementation be oplock-aware so it can
interoperate with SMB services. This is a significant challenge today. A failure
to provide this interoperability will result in a significant loss of performance that will be
sorely noted by users of Microsoft Windows clients.
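On Linux, one plausible building block for such oplock awareness is the kernel lease facility on which Samba's kernel oplock support is based. The following sketch (Linux-specific, with error handling trimmed) takes a read lease on a file so that access by a non-SMB process raises a signal, at which point the server could break the oplock it granted; it is an illustration of the idea, not Samba code.

/* Sketch, Linux-specific, error handling trimmed: kernel leases (the
 * mechanism underlying Samba's kernel oplock support) let an SMB server
 * learn when a non-SMB process opens a leased file, so the oplock can be
 * broken before that access proceeds. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

#define LEASE_SIG (SIGRTMIN + 1)

static void lease_break_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    /* si->si_fd identifies the leased file. A real server would send an
     * SMB oplock break to the client and flush cached data here; this
     * sketch simply releases the lease so the other access can proceed. */
    fcntl(si->si_fd, F_SETLEASE, F_UNLCK);
}

/* Open the file read-only and take a read lease, roughly analogous to a
 * level II oplock. Returns the open fd, or -1 on failure. */
int take_kernel_oplock(const char *path)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = lease_break_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(LEASE_SIG, &sa, NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    fcntl(fd, F_SETSIG, LEASE_SIG);           /* deliver lease breaks on this signal */
    if (fcntl(fd, F_SETLEASE, F_RDLCK) < 0) { /* fails without ownership or CAP_LEASE */
        close(fd);
        return -1;
    }
    return fd;
}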
Finally, all state information must be shared across the server pool.
Server Pool Communications
Most backend file systems support POSIX file semantics. This makes it difficult
to push SMB semantics back into the file system. POSIX locks have different properties
and semantics from SMB locks.
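One concrete example of the mismatch: a POSIX byte-range lock obtained with fcntl() belongs to the process rather than to the open file handle, and it is dropped as soon as the process closes any descriptor that refers to the file, whereas an SMB lock is tied to the fid that created it. The short program below, for illustration only, demonstrates the POSIX behavior.

/* Sketch illustrating one POSIX property that clashes with SMB semantics:
 * fcntl() byte-range locks belong to the process, not to the file handle,
 * and are silently dropped when the process closes *any* descriptor that
 * refers to the file. An SMB lock, by contrast, lives and dies with the
 * fid that created it. */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd1 = open("testfile", O_RDWR | O_CREAT, 0644);
    int fd2 = open("testfile", O_RDWR);

    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 100 };
    fcntl(fd1, F_SETLK, &fl);   /* write-lock bytes 0-99 through fd1 */

    close(fd2);                 /* closes a different descriptor ...            */
                                /* ... yet the lock taken through fd1 is now gone */

    close(fd1);
    unlink("testfile");
    return 0;
}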
All smbd processes in the server pool must of necessity communicate very quickly. For this, the current tdb file structure that Samba uses is not suitable for use across a network. Clustered smbds must use something else.
Server Pool Communications Demands
High-speed interserver communication in the server pool is a design prerequisite for a fully functional system. Possibilities for this include:
- Proprietary shared memory bus (examples: Myrinet or SCI [scalable coherent interface]). These are high-cost items.
- Gigabit Ethernet (now quite affordable).
- Raw Ethernet framing, to bypass TCP and UDP overheads (a sketch follows below).
We have yet to identify metrics for the performance demands that must be met for this to work effectively.
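To illustrate the raw-framing option above: on Linux, an AF_PACKET socket bound to a private EtherType exchanges pool messages in bare Ethernet frames, avoiding TCP and UDP processing entirely. The sketch below is hypothetical; the EtherType and function name are placeholders, and the approach sacrifices routing and reliability, which the pool protocol itself would have to handle.

/* Hypothetical sketch, Linux-specific: an AF_PACKET socket bound to a
 * private EtherType carries server pool messages in bare Ethernet frames,
 * with no TCP or UDP headers. Requires CAP_NET_RAW. The EtherType below
 * is an arbitrary value from the IEEE local experimental range and is
 * used here only for illustration. */
#include <arpa/inet.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define POOL_ETHERTYPE 0x88B5   /* local experimental EtherType, placeholder */

int open_pool_channel(const char *ifname)
{
    int s = socket(AF_PACKET, SOCK_RAW, htons(POOL_ETHERTYPE));
    if (s < 0) {
        perror("socket");
        return -1;
    }

    struct sockaddr_ll sll;
    memset(&sll, 0, sizeof(sll));
    sll.sll_family   = AF_PACKET;
    sll.sll_protocol = htons(POOL_ETHERTYPE);
    sll.sll_ifindex  = if_nametoindex(ifname);

    if (bind(s, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
        perror("bind");
        close(s);
        return -1;
    }
    return s;   /* recvfrom()/sendto() now exchange raw frames on this interface */
}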
Required Modifications to Samba
Samba needs to be significantly modified to work with a high-speed server interconnect
system to permit transparent failover clustering.
Particular functions inside Samba that will be affected include:
- The locking database, oplock notifications, and the share mode database.
- Failure semantics need to be defined. Samba behaves the same way as Windows: when oplock messages fail, a file open request is allowed, but this is potentially dangerous in a clustered environment. How should interserver pool failure semantics function, and how should such functionality be implemented?
- Should this be implemented using a point-to-point lock manager, or can it be done using multicast techniques? (A sketch of the multicast approach follows below.)
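To make the multicast question concrete, the following sketch joins an administratively scoped IPv4 multicast group on which every pool member would hear lock and oplock-break notifications. The group address, port, and function name are placeholders invented for this illustration, not anything Samba defines, and a real lock manager would also need ordering and retransmission on top of UDP.

/* Sketch only: one way the multicast alternative could look. Each pool
 * member joins an administratively scoped IPv4 multicast group and hears
 * every lock and oplock-break notification sent to it. The group address,
 * port, and function name are placeholders for this illustration. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#define POOL_GROUP "239.192.0.42"   /* placeholder group address */
#define POOL_PORT  7845             /* placeholder port */

int join_lock_channel(void)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(POOL_PORT);
    if (bind(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(s);
        return -1;
    }

    struct ip_mreq mreq;
    mreq.imr_multiaddr.s_addr = inet_addr(POOL_GROUP);
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    if (setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                   &mreq, sizeof(mreq)) < 0) {
        close(s);
        return -1;
    }
    return s;   /* recvfrom() now delivers pool-wide notifications */
}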