Sunday, March 1, 2015

socket more detail ( tags: protocol field,



  • To create a socket, you use the socket() call. Its prototype is:
int socket(int domain, int type, int protocol);
 

  1. Domain: It specifies the communication domain. It takes one of the predefined values described under the protocol family and address family above in this lecture (for example, AF_INET).
  2. Type: It specifies the semantics of communication, or the type of service that is desired. It takes the following values:
    • SOCK_STREAM : Stream Socket
    • SOCK_DGRAM : Datagram Socket
    • SOCK_RAW : Raw Socket
    • SOCK_SEQPACKET : Sequenced Packet Socket
    • SOCK_RDM : Reliably Delivered Message Socket
  3. Protocol: This parameter identifies the protocol the socket is supposed to use. Some values are as follows:
    • IPPROTO_TCP : For TCP (SOCK_STREAM)
    • IPPROTO_UDP : For UDP (SOCK_DGRAM)
    Since the Internet domain normally has only one protocol for each of these socket types, the protocol can be left unspecified. For simplicity, pass 0 (zero) as the protocol argument and the system picks the default.
  4. Stream socket/datagram socket vs raw socket:
    • A raw socket is used to receive raw packets. This means packets received at the Ethernet layer are passed directly to the raw socket. More precisely, a raw socket bypasses the normal TCP/IP processing and delivers the packets to the specific user application (see Figure 1).
    • Other sockets, like stream sockets and datagram sockets, receive data from the transport layer that contains no headers, only the payload. This means that there is no information about the source IP address and MAC address. Applications communicating this way, whether on the same machine or on different machines, are only exchanging data.
    • The purpose of a raw socket is entirely different. A raw socket allows an application to directly access lower-level protocols, which means a raw socket receives un-extracted packets (see Figure 2). There is no need to provide the port and IP address to a raw socket, unlike in the case of stream and datagram sockets.
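
As a minimal sketch of the type and protocol arguments described above, the helper below creates an Internet-domain socket and reads its type back with getsockopt(SO_TYPE). The name open_inet_socket is made up for this illustration and is not part of any library. Passing 0 and passing the explicit IPPROTO_* constant should give the same kind of socket.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: create an Internet-domain socket and return the
 * descriptor, or -1 on failure. On success *type_out holds the socket
 * type that the system reports back through SO_TYPE. */
int open_inet_socket(int type, int protocol, int *type_out)
{
    assert(type_out != NULL);

    int s = socket(AF_INET, type, protocol);
    if (s < 0)
        return -1;

    socklen_t len = sizeof(*type_out);
    if (getsockopt(s, SOL_SOCKET, SO_TYPE, type_out, &len) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

For example, open_inet_socket(SOCK_STREAM, 0, &t) and open_inet_socket(SOCK_STREAM, IPPROTO_TCP, &t) are both expected to leave SOCK_STREAM in t.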








  • Stream Sockets − Delivery in a networked environment is guaranteed. If you send through the stream socket three items "A, B, C", they will arrive in the same order − "A, B, C". These sockets use TCP (Transmission Control Protocol) for data transmission. If delivery is impossible, the sender receives an error indicator. Data records do not have any boundaries.
  • Datagram Sockets − Delivery in a networked environment is not guaranteed. They're connectionless because you don't need to have an open connection as in Stream Sockets − you build a packet with the destination information and send it out. They use UDP (User Datagram Protocol).
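
The point about record boundaries can be seen with a small experiment. This is only a sketch under an assumption: it uses a local AF_UNIX socket pair instead of real TCP and UDP sockets, because that needs no network setup, and the helper name first_recv_size is invented here. Two messages are sent; a datagram socket returns them one at a time, while a stream socket typically hands back whatever bytes are queued.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send "AB" and then "C" as two separate messages over
 * a local socket pair of the given type, and return how many bytes the
 * first recv() call delivers (or -1 on error). */
ssize_t first_recv_size(int type)
{
    int sv[2];
    char buf[16];
    ssize_t n = -1;

    assert(type == SOCK_STREAM || type == SOCK_DGRAM);
    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        return -1;

    if (send(sv[0], "AB", 2, 0) == 2 && send(sv[0], "C", 1, 0) == 1)
        n = recv(sv[1], buf, sizeof(buf), 0);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With SOCK_DGRAM the first recv() returns only the first message. With SOCK_STREAM the two messages usually arrive merged, which is why stream protocols need their own framing.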

source: 
http://opensourceforu.com/2015/03/a-guide-to-using-raw-sockets/
http://www.tutorialspoint.com/unix_sockets/what_is_socket.htm

Example: 
 s = socket(AF_INET, SOCK_STREAM, 0);
  • This call would result in a stream socket being created with the TCP protocol providing the underlying communication support.

  • If the protocol argument to the socket() call is 0, socket() will select a default protocol to use with the returned socket of the type requested. The default protocol is usually correct; alternate choices aren't usually available.
  • However, when you're using "raw" sockets to communicate directly with lower-level protocols or hardware interfaces, the protocol argument may be important for setting up demultiplexing.
  • For example, raw sockets in the Internet family may be used to implement a new protocol above IP. The socket will receive packets only for the specified protocol. To obtain a particular protocol, you determine the protocol number defined within the communication domain, using the getprotobyname() function, for example:
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
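...
struct protoent *pp;
int s;

pp = getprotobyname("newtcp");
if (pp != NULL)   /* NULL means the name is not in the protocols database */
    s = socket(AF_INET, SOCK_STREAM, pp->p_proto);
  • This would result in a socket s that uses a stream-based connection, but with a protocol type of newtcp instead of the default tcp.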

Data transfer (in TCP)

With a connection established, data may begin to flow. To send and receive data, you can choose from several calls.
If the peer entity at each end of a connection is anchored, you can send or receive a message without specifying the peer. In this case, you can use the normal read() and write() functions:
write(s, buf, sizeof (buf));
read(s, buf, sizeof (buf));
In addition to read() and write(), you can use the socket-specific recv() and send() calls:
send(s, buf, sizeof (buf), flags);
recv(s, buf, sizeof (buf), flags);
Although recv() and send() are virtually identical to read() and write(), the extra flags argument is important (the flag values are defined in <sys/socket.h>). One or more of the following flags may be specified:
    • MSG_OOB : Send/receive out-of-band data.
    • MSG_PEEK : Look at data without reading.
    • MSG_DONTROUTE : Send data without routing packets.
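
To make the MSG_PEEK flag from the list above concrete, here is a minimal sketch. It assumes a local AF_UNIX stream socket pair as a stand-in for a connected TCP socket, and the helper name peek_then_read is made up for this example. The data is first looked at with MSG_PEEK and then read normally; both calls should see the same bytes.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send msg over a local stream socket pair, look at it
 * with MSG_PEEK, then read it for real. Both output buffers (each of size
 * cap) end up NUL-terminated. Returns 0 on success, -1 on error. */
int peek_then_read(const char *msg, char *peeked, char *consumed, size_t cap)
{
    int sv[2];
    int rc = -1;
    size_t len;

    assert(msg != NULL && peeked != NULL && consumed != NULL && cap > 0);
    len = strlen(msg);
    if (len == 0 || len >= cap || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (send(sv[0], msg, len, 0) == (ssize_t)len) {
        ssize_t p = recv(sv[1], peeked, cap - 1, MSG_PEEK); /* data stays queued */
        ssize_t r = recv(sv[1], consumed, cap - 1, 0);      /* data is removed */
        if (p >= 0 && r >= 0) {
            peeked[p] = '\0';
            consumed[r] = '\0';
            rc = 0;
        }
    }
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

A typical use of peeking is to inspect a message header to decide how large a buffer the real read needs.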
Data transfer (in UDP)

To send data, you use the sendto() function:
 
sendto(s, buf, buflen, flags, (struct sockaddr *)&to, tolen);

The s, buf, buflen, and flags parameters are used as before. The to and tolen values indicate the address of the intended recipient of the message.

To receive messages on an unconnected datagram socket, you use the recvfrom() function:

recvfrom(s, buf, buflen, flags,
         (struct sockaddr *)&from, &fromlen);

Here fromlen is a value-result parameter: it initially contains the size of the from buffer, and is modified on return to indicate the actual size of the address that the datagram was received from.

A note on the flags argument: out-of-band data is a notion specific to stream sockets; we won't consider it here. The MSG_DONTROUTE option, which sends data without routing applied to the outgoing packets, is currently used only by the routing-table management process and is unlikely to be of interest to the casual user.

On the other hand, the ability to preview data can be quite useful. When MSG_PEEK is specified with a recv() call, any data present is returned, but treated as still unread. That is, the next read() or recv() call applied to the socket will return the data previously viewed.
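
Putting sendto() and recvfrom() together, the following sketch sends one datagram between two UDP sockets on the same machine. It rests on a few assumptions that are not in the text above: the loopback address 127.0.0.1 is used so that no second host is needed, the receiver binds to port 0 so the system picks a free port, and a two-second receive timeout is set so the call cannot hang. The helper name udp_loopback_once is invented for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical helper: send msg from one UDP socket to another over the
 * loopback interface. On success returns the number of bytes received and
 * stores the sender's address in *from. Returns -1 on error. */
ssize_t udp_loopback_once(const char *msg, char *buf, size_t buflen,
                          struct sockaddr_in *from)
{
    struct sockaddr_in to;
    struct timeval tv = { 2, 0 };      /* do not wait forever for the datagram */
    socklen_t tolen = sizeof(to);
    socklen_t fromlen = sizeof(*from);
    ssize_t n = -1;
    int rx, tx;

    assert(msg != NULL && buf != NULL && from != NULL);
    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto out;

    memset(&to, 0, sizeof(to));
    to.sin_family = AF_INET;
    to.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    to.sin_port = htons(0);            /* let the system pick a free port */
    if (bind(rx, (struct sockaddr *)&to, sizeof(to)) < 0 ||
        getsockname(rx, (struct sockaddr *)&to, &tolen) < 0 ||
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0)
        goto out;

    if (sendto(tx, msg, strlen(msg), 0, (struct sockaddr *)&to, tolen) < 0)
        goto out;

    /* fromlen goes in as the buffer size and comes back as the address size */
    n = recvfrom(rx, buf, buflen, 0, (struct sockaddr *)from, &fromlen);
out:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```

After a successful call, from holds the address and port the system assigned to the sending socket, which is what a server would use to reply.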

Purpose of setsockopt():

Multiple binds to same local port

With certain applications, the algorithm used by the Socket Manager to select port numbers may be unsuitable. For example, the Internet file transfer protocol, FTP, specifies that data connections must always originate from the same local port (i.e. local from the server's point of view).

Note: A server (e.g. ftpd) avoids duplicate associations because the initiating programs (e.g. ftp) use different remote ports (i.e. remote from the server's point of view), even though the server is accessed from the same local port (i.e. local from the server's point of view). In this situation, the Socket Manager would typically disallow the server's binding the same local address and port number if a previous data connection's socket still existed on that port. (This would be a bad thing for servers such as ftpd, which always want to listen to the same well-known local port).

To override the default port selection algorithm, an option call must be performed prior to address binding:
...

int on = 1;
...
setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
bind(s, (struct sockaddr *) &sin, sizeof (sin));

With the above call, local addresses already in use may be bound. This doesn't violate the uniqueness requirement, because the system still checks at connect time to be sure any other sockets with the same local address and port don't have the same remote address and port. If the association already exists, the error EADDRINUSE is returned.
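
A slightly fuller version of the fragment above, as a sketch only: the helper name make_reusable_listener is made up, the backlog of 5 is an arbitrary choice, and every call is checked so that a failure is reported instead of ignored.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP listening socket on the given port with
 * SO_REUSEADDR set before bind(). Port 0 lets the system choose a port.
 * Returns the descriptor, or -1 on error. */
int make_reusable_listener(int port)
{
    struct sockaddr_in sin;
    int on = 1;

    assert(port >= 0 && port <= 65535);
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons((unsigned short)port);

    /* The option has to be set before bind() to influence the bind. */
    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 ||
        bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0 ||
        listen(s, 5) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

A server that is restarted while old connections on its port are still lingering would call this instead of a plain bind().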

Broadcasting and determining network configuration

By using a datagram socket, you can send broadcast packets on many networks supported by the system. The network itself must support broadcasting - the system provides no broadcast simulation in software.
Broadcast messages can place a high load on a network since they force every host on the network to service them. Consequently, the ability to send broadcast packets has been limited to sockets explicitly marked as allowing broadcasting. Broadcasting is typically used for one of two reasons:
  • to find a resource on a local network without prior knowledge of its address
  • for functions such as routing that require information to be sent to all accessible neighbors
To send a broadcast message, a datagram socket should be created:
 
s = socket(AF_INET, SOCK_DGRAM, 0);

The socket is marked as allowing broadcasting:
int on = 1;

setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof (on));

and at least a port number should be bound to the socket:
 
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl(INADDR_ANY);
sin.sin_port = htons(MYPORT);
bind(s, (struct sockaddr *) &sin, sizeof (sin));
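
The three steps above can be collected into one function, with a second function for the actual send. This is a sketch under assumptions: both helper names are invented, and the destination is the limited broadcast address INADDR_BROADCAST (255.255.255.255), whereas a real program may prefer the broadcast address of a specific interface.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket that is allowed to broadcast and
 * is bound to the given local port. Returns the descriptor or -1. */
int make_broadcast_socket(int port)
{
    struct sockaddr_in sin;
    int on = 1;

    assert(port >= 0 && port <= 65535);
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons((unsigned short)port);

    if (setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on)) < 0 ||
        bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        close(s);
        return -1;
    }
    return s;
}

/* Hypothetical helper: send one datagram to the limited broadcast address
 * on the given destination port. Returns the sendto() result. */
ssize_t send_broadcast(int s, int port, const void *buf, size_t len)
{
    struct sockaddr_in dst;

    assert(buf != NULL && port > 0 && port <= 65535);
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_addr.s_addr = htonl(INADDR_BROADCAST);
    dst.sin_port = htons((unsigned short)port);
    return sendto(s, buf, len, 0, (struct sockaddr *)&dst, sizeof(dst));
}
```

Without the SO_BROADCAST option, sendto() to a broadcast address is expected to fail, which is the protection against accidental broadcasts described above.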


How to handle multiple sockets at the same time (blocking and non-blocking, synchronous I/O and asynchronous I/O)

Multiple Sockets

Suppose we have a process which has to handle multiple sockets. We cannot simply read from one of them when we expect a request, because that call will block while waiting for the request on that particular socket. In the meantime a request may come in on any other socket. To handle this input/output multiplexing we could use different techniques:
  1. Busy waiting: In this methodology we make all the operations on sockets non-blocking and handle them simultaneously by polling. For example, we could use the read() system call this way and try to read from each of the sockets in turn. The disadvantage is that we waste a lot of CPU cycles. To make the system calls non-blocking we use: fcntl(s, F_SETFL, FNDELAY);
  2. Asynchronous I/O: Here we ask the Operating System to notify us whenever I/O becomes possible on some socket. The Operating System sends a signal whenever there is some I/O. When we receive a signal, we have to check all sockets and then wait till the next signal comes. But there are two problems: first, signals are expensive to catch, and second, we would not be able to know if input arrives on a socket while we are doing I/O on another one. For Asynchronous I/O, we have a different set of commands (here we give the ones for UNIX with a VHD variant): signal(SIGIO, io_handler); fcntl(s, F_SETOWN, getpid()); fcntl(s, F_SETFL, FASYNC);
  3. Separate process for each I/O: We could as well fork 10 different child processes for 10 different sockets. These child processes are lightweight and have some communication between them. Since each process waits on only one socket, it can use blocking system calls. The drawback is that this wastes a lot of memory, data structures and other resources.
  4. select() system call: We can use the select() system call to instruct the Operating System to wait for any one of multiple events to occur and to wake up the process only if one of these events occurs. This way we know which socket the I/O request has come from. The call and its helper macros are:
    int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds, struct timeval *timeout);
    void FD_CLR(int fd, fd_set *fdset);
    int FD_ISSET(int fd, fd_set *fdset);
    void FD_SET(int fd, fd_set *fdset);
    void FD_ZERO(fd_set *fdset);
    The select() function indicates which of the specified file descriptors is ready for reading, ready for writing, or has an error condition pending. If the specified condition is false for all of the specified file descriptors, select() blocks up to the specified timeout interval, until the specified condition is true for at least one of the specified file descriptors. The nfds argument specifies the range of file descriptors to be tested. The select() function tests file descriptors in the range of 0 to nfds-1. readfds, writefds and errorfds arguments point to an object of type fd_set. readfds specifies the file descriptors to be checked for being ready to read. writefds specifies the file descriptors to be checked for being ready to write, errorfds specifies the file descriptors to be checked for error conditions pending.
    On successful completion, the objects pointed to by the readfds, writefds, and errorfds arguments are modified to indicate which file descriptors are ready for reading, ready for writing, or have an error condition pending, respectively. For each file descriptor less than nfds, the corresponding bit will be set on successful completion if it was set on input and the associated condition is true for that file descriptor. The timeout is an upper bound on the amount of time elapsed before select returns. It may be zero, causing select to return immediately. If the timeout is a null pointer, select() blocks until an event causes one of the masks to be returned with a valid (non-zero) value. If the time limit expires before any event occurs that would cause one of the masks to be set to a non-zero value, select() completes successfully and returns 0.
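
As a minimal sketch of the select() technique described in item 4, the helper below waits for any one of several descriptors to become readable. The name wait_readable and its return convention are invented for this example; only the read set is used, and the write and error sets are passed as NULL.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait until one of the n descriptors in fds is ready
 * for reading. Returns the index of the first ready descriptor, -1 on
 * timeout, or -2 on error. timeout_ms < 0 means wait indefinitely. */
int wait_readable(const int *fds, int n, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv, *tvp = NULL;
    int maxfd = -1;

    assert(fds != NULL && n > 0);
    FD_ZERO(&readfds);
    for (int i = 0; i < n; i++) {
        assert(fds[i] >= 0 && fds[i] < FD_SETSIZE);
        FD_SET(fds[i], &readfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }
    if (timeout_ms >= 0) {
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        tvp = &tv;
    }

    /* nfds is the highest descriptor plus one, not the number of descriptors */
    int ready = select(maxfd + 1, &readfds, NULL, NULL, tvp);
    if (ready < 0)
        return -2;
    for (int i = 0; i < n && ready > 0; i++)
        if (FD_ISSET(fds[i], &readfds))
            return i;
    return -1;
}
```

Because select() overwrites the sets it is given, the helper rebuilds the read set on every call; a server loop would call it repeatedly and then read() only from the descriptor it reports.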

Synchronous and asynchronous (blocking and non-blocking):

Synchronous or Asynchronous?

* Practical example of blocking and non-blocking sockets using a web browser:
http://www.scottklement.com/rpg/socktut/nonblocking.html
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.hala001/orgblockasyn.htm

Sockets can operate in one of two modes: blocking and nonblocking. A blocking socket
does not return from a system call (no further line of code executes) until a reply comes, a timeout
expires, or some kind of error occurs. A nonblocking socket, on the other hand, returns from the
system call right away, and execution continues without waiting for the reply.


Let's say that you're writing a web browser. You try to connect to a web server, but
the server isn't responding. When a user presses (or clicks) a stop button, you want
the connect() API to stop trying to connect.
With what you've learned so far, that can't be done. When you issue a call to
connect(), your program doesn't regain control until either the connection is made,
or an error occurs.
The solution to this problem is called "non-blocking sockets". By default, TCP
sockets are in "blocking" mode. For example, when you call recv() to read from a
stream, control isn't returned to your program until at least one byte of data is
read from the remote site. This process of waiting for data to appear is referred
to as "blocking". The same is true for the write() API, the connect() API, etc.
When you run them, the connection "blocks" until the operation is complete.
It's possible to set a descriptor so that it is placed in "non-blocking" mode.
When placed in non-blocking mode, you never wait for an operation to complete.
This is an invaluable tool if you need to switch between many different connected
sockets, and want to ensure that none of them cause the program to "lock up."
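
Here is a minimal sketch of putting a descriptor into non-blocking mode. It uses the POSIX O_NONBLOCK flag with fcntl(); this is an assumption on my part, since older BSD-style code uses the FNDELAY name for the same purpose. The helper names set_nonblocking and try_recv are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: put a descriptor into non-blocking mode while keeping
 * its other status flags. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: try to read without waiting. Returns the byte count,
 * 0 if the peer closed the connection, -1 on a real error, or -2 if no data
 * is available yet (the call would have blocked). */
ssize_t try_recv(int fd, void *buf, size_t len)
{
    assert(buf != NULL);
    ssize_t n = recv(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

A program switching between many sockets can call try_recv() on each one and simply move on when it gets -2, instead of getting stuck on a quiet connection.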

Network communication (or file system access in general) system
calls may operate in two modes: synchronous or asynchronous. In the
synchronous mode, socket routines return only when the operation
is complete. For example, accept returns only when a connection
arrives. In the asynchronous mode, socket routines return immediately:
system calls become non-blocking calls (e.g., read does not block, waiting
until data arrives).
You can change the mode with the fcntl system call. For example,

fcntl(s, F_SETFL, FNDELAY);
sets the socket s to operate in asynchronous (non-blocking) mode.
Note: for detailed usage of fcntl(), see this programming example:
www.lowtek.com/sockets/select.html

Table 1. Socket programming interface actions

Call type | Socket state | Blocking | Nonblocking
Types of read() calls | Input is available | Immediate return | Immediate return
Types of read() calls | No input is available | Wait for input | Immediate return with EWOULDBLOCK error number (select() exception: READ)
Types of write() calls | Output buffers available | Immediate return | Immediate return
Types of write() calls | No output buffers available | Wait for output buffers | Immediate return with EWOULDBLOCK error number (select() exception: WRITE)
accept() call | New connection | Immediate return | Immediate return
accept() call | No connections queued | Wait for new connection | Immediate return with EWOULDBLOCK error number (select() exception: READ)
connect() call | - | Wait | Immediate return with EINPROGRESS error number (select() exception: WRITE)
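
The connect() row of the table is the one that solves the web browser problem: a non-blocking connect() returns at once with EINPROGRESS, and select() later reports the socket as writable when the attempt has finished. The sketch below shows that sequence under some assumptions: the helper name connect_with_timeout is invented, O_NONBLOCK is used as the non-blocking flag, and the final outcome is read with getsockopt(SO_ERROR).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical helper: connect to addr, giving up after timeout_ms
 * milliseconds. Returns a connected (still non-blocking) descriptor,
 * or -1 on error or timeout. */
int connect_with_timeout(const struct sockaddr_in *addr, int timeout_ms)
{
    fd_set wfds;
    struct timeval tv;
    socklen_t len;
    int flags, err = 0;

    assert(addr != NULL && timeout_ms >= 0);
    int s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;

    flags = fcntl(s, F_GETFL, 0);
    if (flags < 0 || fcntl(s, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    if (connect(s, (const struct sockaddr *)addr, sizeof(*addr)) == 0)
        return s;                       /* connected immediately */
    if (errno != EINPROGRESS)
        goto fail;

    /* The attempt is in progress: wait until the socket becomes writable. */
    FD_ZERO(&wfds);
    FD_SET(s, &wfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    if (select(s + 1, NULL, &wfds, NULL, &tv) <= 0)
        goto fail;                      /* timeout or select() error */

    /* Writable does not mean success: ask the socket how connect() ended. */
    len = sizeof(err);
    if (getsockopt(s, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
        goto fail;
    return s;

fail:
    close(s);
    return -1;
}
```

A browser-like program could call this with a short timeout in a loop and check between attempts whether the user pressed the stop button.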


* select() lets you avoid blocking calls entirely:
When you use select() call logic, you do not issue any socket call on a given socket
until the select() call tells you that something has happened on that socket;
for example, data has arrived and is ready to be read by a read() call.
By using the select() call, you do not issue a blocking call until you know that
the call cannot block.

reference:
http://users.pja.edu.pl/~jms/qnx/help/tcpip_4.25_en/prog_guide/sock_advanced_tut.html
http://cse.iitk.ac.in/users/dheeraj/cs425/lec20.html
 http://www.cs.rutgers.edu/~pxk/rutgers/notes/sockets/
https://www.quora.com/In-networking-programming-what-is-nonblocking-socket


http://www.ibm.com/developerworks/aix/library/au-tcpsystemcalls/
http://sock-raw.org/papers/sock_raw (very detailed on sockets and worth reading)
# /etc/protocols:
# $Id: protocols,v 1.11 2011/05/03 14:45:40 ovasik Exp $
#
# Internet (IP) protocols
#
#    from: @(#)protocols    5.1 (Berkeley) 4/17/89
#
# Updated for NetBSD based on RFC 1340, Assigned Numbers (July 1992).
# Last IANA update included dated 2011-05-03
#
# See also http://www.iana.org/assignments/protocol-numbers

ip    0    IP        # internet protocol, pseudo protocol number
hopopt    0    HOPOPT        # hop-by-hop options for ipv6
icmp    1    ICMP        # internet control message protocol
igmp    2    IGMP        # internet group management protocol
ggp    3    GGP        # gateway-gateway protocol
ipv4    4    IPv4        # IPv4 encapsulation
st    5    ST        # ST datagram mode
tcp    6    TCP        # transmission control protocol
cbt    7    CBT        # CBT, Tony Ballardie <A.Ballardie@cs.ucl.ac.uk>
egp    8    EGP        # exterior gateway protocol
igp    9    IGP        # any private interior gateway (Cisco: for IGRP)
bbn-rcc    10    BBN-RCC-MON        # BBN RCC Monitoring
nvp    11    NVP-II        # Network Voice Protocol
pup    12    PUP        # PARC universal packet protocol
argus    13    ARGUS        # ARGUS
emcon    14    EMCON        # EMCON
xnet    15    XNET        # Cross Net Debugger
chaos    16    CHAOS        # Chaos
udp    17    UDP        # user datagram protocol
mux    18    MUX        # Multiplexing protocol
dcn    19    DCN-MEAS        # DCN Measurement Subsystems
hmp    20    HMP        # host monitoring protocol
prm    21    PRM        # packet radio measurement protocol
xns-idp    22    XNS-IDP        # Xerox NS IDP
trunk-1    23    TRUNK-1        # Trunk-1
trunk-2    24    TRUNK-2        # Trunk-2
leaf-1    25    LEAF-1        # Leaf-1
leaf-2    26    LEAF-2        # Leaf-2
rdp    27    RDP        # "reliable datagram" protocol
irtp    28    IRTP        # Internet Reliable Transaction Protocol
iso-tp4    29    ISO-TP4        # ISO Transport Protocol Class 4
netblt    30    NETBLT        # Bulk Data Transfer Protocol
mfe-nsp    31    MFE-NSP        # MFE Network Services Protocol
merit-inp    32    MERIT-INP        # MERIT Internodal Protocol
dccp    33    DCCP        # Datagram Congestion Control Protocol
3pc    34    3PC        # Third Party Connect Protocol
idpr    35    IDPR        # Inter-Domain Policy Routing Protocol
xtp    36    XTP        # Xpress Tranfer Protocol
ddp    37    DDP        # Datagram Delivery Protocol
idpr-cmtp    38    IDPR-CMTP        # IDPR Control Message Transport Proto
tp++    39    TP++        # TP++ Transport Protocol
il    40    IL        # IL Transport Protocol
ipv6    41    IPv6        # IPv6 encapsulation
sdrp    42    SDRP        # Source Demand Routing Protocol
ipv6-route    43    IPv6-Route        # Routing Header for IPv6
ipv6-frag    44    IPv6-Frag        # Fragment Header for IPv6
idrp    45    IDRP        # Inter-Domain Routing Protocol
rsvp    46    RSVP        # Resource ReSerVation Protocol
gre    47    GRE        # Generic Routing Encapsulation
dsr    48    DSR        # Dynamic Source Routing Protocol
bna    49    BNA        # BNA
esp    50    ESP        # Encap Security Payload
ipv6-crypt    50    IPv6-Crypt        # Encryption Header for IPv6 (not in official list)
ah    51    AH        # Authentication Header
ipv6-auth    51    IPv6-Auth        # Authentication Header for IPv6 (not in official list)
i-nlsp    52    I-NLSP        # Integrated Net Layer Security TUBA
swipe    53    SWIPE        # IP with Encryption
narp    54    NARP        # NBMA Address Resolution Protocol
mobile    55    MOBILE        # IP Mobility
tlsp    56    TLSP        # Transport Layer Security Protocol
skip    57    SKIP        # SKIP
ipv6-icmp    58    IPv6-ICMP        # ICMP for IPv6
ipv6-nonxt    59    IPv6-NoNxt        # No Next Header for IPv6
ipv6-opts    60    IPv6-Opts        # Destination Options for IPv6
#    61            # any host internal protocol
cftp    62    CFTP        # CFTP
#    63            # any local network
sat-expak    64    SAT-EXPAK        # SATNET and Backroom EXPAK
kryptolan    65    KRYPTOLAN        # Kryptolan
rvd    66    RVD        # MIT Remote Virtual Disk Protocol
ippc    67    IPPC        # Internet Pluribus Packet Core
#    68            # any distributed file system
sat-mon    69    SAT-MON        # SATNET Monitoring
visa    70    VISA        # VISA Protocol
ipcv    71    IPCV        # Internet Packet Core Utility
cpnx    72    CPNX        # Computer Protocol Network Executive
cphb    73    CPHB        # Computer Protocol Heart Beat
wsn    74    WSN        # Wang Span Network
pvp    75    PVP        # Packet Video Protocol
br-sat-mon    76    BR-SAT-MON        # Backroom SATNET Monitoring
sun-nd    77    SUN-ND        # SUN ND PROTOCOL-Temporary
wb-mon    78    WB-MON        # WIDEBAND Monitoring
wb-expak    79    WB-EXPAK        # WIDEBAND EXPAK
iso-ip    80    ISO-IP        # ISO Internet Protocol
vmtp    81    VMTP        # Versatile Message Transport
secure-vmtp    82    SECURE-VMTP        # SECURE-VMTP
vines    83    VINES        # VINES
ttp    84    TTP        # TTP
nsfnet-igp    85    NSFNET-IGP        # NSFNET-IGP
dgp    86    DGP        # Dissimilar Gateway Protocol
tcf    87    TCF        # TCF
eigrp    88    EIGRP        # Enhanced Interior Routing Protocol (Cisco)
ospf    89    OSPFIGP        # Open Shortest Path First IGP
sprite-rpc    90    Sprite-RPC        # Sprite RPC Protocol
larp    91    LARP        # Locus Address Resolution Protocol
mtp    92    MTP        # Multicast Transport Protocol
ax.25    93    AX.25        # AX.25 Frames
ipip    94    IPIP        # Yet Another IP encapsulation
micp    95    MICP        # Mobile Internetworking Control Pro.
scc-sp    96    SCC-SP        # Semaphore Communications Sec. Pro.
etherip    97    ETHERIP        # Ethernet-within-IP Encapsulation
encap    98    ENCAP        # Yet Another IP encapsulation
#    99            # any private encryption scheme
gmtp    100    GMTP        # GMTP
ifmp    101    IFMP        # Ipsilon Flow Management Protocol
pnni    102    PNNI        # PNNI over IP
pim    103    PIM        # Protocol Independent Multicast
aris    104    ARIS        # ARIS
scps    105    SCPS        # SCPS
qnx    106    QNX        # QNX
a/n    107    A/N        # Active Networks
ipcomp    108    IPComp        # IP Payload Compression Protocol
snp    109    SNP        # Sitara Networks Protocol
compaq-peer    110    Compaq-Peer        # Compaq Peer Protocol
ipx-in-ip    111    IPX-in-IP        # IPX in IP
vrrp    112    VRRP        # Virtual Router Redundancy Protocol
pgm    113    PGM        # PGM Reliable Transport Protocol
#    114            # any 0-hop protocol
l2tp    115    L2TP        # Layer Two Tunneling Protocol
ddx    116    DDX        # D-II Data Exchange
iatp    117    IATP        # Interactive Agent Transfer Protocol
stp    118    STP        # Schedule Transfer
srp    119    SRP        # SpectraLink Radio Protocol
uti    120    UTI        # UTI
smp    121    SMP        # Simple Message Protocol
sm    122    SM        # SM
ptp    123    PTP        # Performance Transparency Protocol
isis    124    ISIS        # ISIS over IPv4
fire    125    FIRE
crtp    126    CRTP        # Combat Radio Transport Protocol
crdup    127    CRUDP        # Combat Radio User Datagram
sscopmce    128    SSCOPMCE
iplt    129    IPLT
sps    130    SPS        # Secure Packet Shield
pipe    131    PIPE        # Private IP Encapsulation within IP
sctp    132    SCTP        # Stream Control Transmission Protocol
fc    133    FC        # Fibre Channel
rsvp-e2e-ignore    134    RSVP-E2E-IGNORE
mobility-header    135    Mobility-Header        # Mobility Header
udplite    136    UDPLite
mpls-in-ip    137    MPLS-in-IP
manet    138    manet        # MANET Protocols
hip    139    HIP        # Host Identity Protocol
shim6    140    Shim6        # Shim6 Protocol
wesp    141    WESP        # Wrapped Encapsulating Security Payload
rohc    142    ROHC        # Robust Header Compression
#   143-252 Unassigned                                       [IANA]
#   253     Use for experimentation and testing           [RFC3692]
#   254     Use for experimentation and testing           [RFC3692]
#   255                 Reserved                             [IANA]
