- To create a socket, you use the socket() call, whose
prototype is:
int socket(int domain, int type, int protocol);
- Domain: It specifies the communication domain. It takes one
of the predefined values described under the protocol family and
address family above in this lecture.
- Type: It specifies the semantics of communication, or the type of service that is desired. It takes one of the following values:
- SOCK_STREAM : Stream Socket
- SOCK_DGRAM : Datagram Socket
- SOCK_RAW : Raw Socket
- SOCK_SEQPACKET : Sequenced Packet Socket
- SOCK_RDM : Reliably Delivered Message Socket
- Protocol: This parameter identifies the protocol the socket is supposed to use. Some values are as follows:
- IPPROTO_TCP : For TCP (SOCK_STREAM)
- IPPROTO_UDP : For UDP (SOCK_DGRAM)
In the Internet domain there is normally only one protocol for each of
these socket types, so it does not matter if we do not name a protocol
explicitly. For simplicity, we can put "0" (zero) in the protocol field.
- Stream socket/datagram socket vs. raw socket:
* A raw socket is used to receive raw packets. This means packets received at the Ethernet layer are passed directly to the raw socket. Stated precisely, a raw socket bypasses the normal TCP/IP processing and delivers the packets to the specific user application (see Figure 1).
* Other sockets, such as stream sockets and datagram sockets, receive data from the transport layer that contains no headers, only the payload. This means the data carries no information about the source IP address or MAC address. Whether the communicating applications run on the same machine or on different machines, they are only exchanging data.
* The purpose of a raw socket is entirely different. A raw socket allows an application to access lower-level protocols directly, which means a raw socket receives packets with their headers still attached (see Figure 2). There is no need to provide a port and IP address to a raw socket, unlike in the case of stream and datagram sockets.
Stream Sockets − Delivery in a networked environment is guaranteed. If you send through the stream socket three items "A, B, C", they will arrive in the same order − "A, B, C". These sockets use TCP (Transmission Control Protocol) for data transmission. If delivery is impossible, the sender receives an error indicator. Data records do not have any boundaries.
Datagram Sockets − Delivery in a networked environment is not guaranteed. They're connectionless because you don't need to have an open connection as in Stream Sockets − you build a packet with the destination information and send it out. They use UDP (User Datagram Protocol).
source: http://opensourceforu.com/2015/03/a-guide-to-using-raw-sockets/
http://www.tutorialspoint.com/unix_sockets/what_is_socket.htm
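The difference in record boundaries between the two socket types can be seen with a small experiment. The following is only a sketch: it uses a local socketpair() in the AF_UNIX domain instead of real TCP/UDP connections (an assumption made to keep it self-contained), and the helper name first_recv_size is hypothetical.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send "A", "B", "C" as three separate messages over a local socket pair
 * of the given type (SOCK_STREAM or SOCK_DGRAM), then return how many
 * bytes the first recv() on the other end delivers. Returns -1 on error.
 */
ssize_t first_recv_size(int type)
{
    int sv[2];
    char buf[16];
    ssize_t n = -1;

    if (socketpair(AF_UNIX, type, 0, sv) < 0)
        return -1;

    if (send(sv[0], "A", 1, 0) == 1 &&
        send(sv[0], "B", 1, 0) == 1 &&
        send(sv[0], "C", 1, 0) == 1)
        n = recv(sv[1], buf, sizeof(buf), 0);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

With SOCK_DGRAM the first recv() returns a single one-byte datagram. With SOCK_STREAM the three writes are typically merged into one byte stream, so a single recv() can return all of "ABC".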
Example:
s = socket(AF_INET, SOCK_STREAM, 0);
- This call would result in a stream socket being created with
the TCP protocol providing the underlying communication
support.
- If the protocol argument to the socket() call is 0,
socket() will select a default protocol to use with
the returned socket of the type requested. The default
protocol is usually correct; alternate choices aren't
usually available.
- However, when you're using "raw" sockets to
communicate directly with lower-level protocols or hardware
interfaces, the protocol argument may be important for
setting up demultiplexing.
- For example, raw sockets in the Internet family may be used to
implement a new protocol above IP. The socket will receive packets
only for the specified protocol. To obtain a particular protocol, you
determine the protocol number defined within the communication domain,
using the getprotobyname() function, for example:
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
...
pp = getprotobyname("newtcp");
s = socket(AF_INET, SOCK_STREAM, pp->p_proto);
- This would result in a socket s that uses a
stream-based connection, but with protocol type of
newtcp instead of the default tcp.
With a connection established, data may begin to flow. To
send and receive data, you can choose from several calls.
If the peer entity at each end of a connection is anchored,
you can send or receive a message without specifying the
peer. In this case, you can use the normal read() and write() functions:
write(s, buf, sizeof (buf));
read(s, buf, sizeof (buf));
In addition to read() and write(), you can use the new recv() and send() calls:
send(s, buf, sizeof (buf), flags);
recv(s, buf, sizeof (buf), flags);
Although recv() and send() are virtually identical to read() and write(),
the extra flags argument is important (the flag values are defined in
<sys/socket.h>). One or more of the following flags may be specified:
- MSG_OOB: Send/receive out-of-band data.
- MSG_PEEK: Look at data without reading.
- MSG_DONTROUTE: Send data without routing packets.
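To illustrate the MSG_PEEK flag, here is a small sketch that peeks at a byte and then reads it normally. It uses a local socketpair() rather than a real network connection (an assumption made so the example is self-contained), and the helper name peek_then_read is hypothetical.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write one byte into a local stream socket pair, peek at it, then read it.
 * Returns 1 if both the peek and the following normal read see the same
 * byte (the peek did not consume it), 0 if they differ, -1 on error.
 */
int peek_then_read(char byte)
{
    int sv[2];
    char peeked = 0, consumed = 0;
    int result = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    if (send(sv[0], &byte, 1, 0) == 1 &&
        recv(sv[1], &peeked, 1, MSG_PEEK) == 1 &&  /* look, do not consume */
        recv(sv[1], &consumed, 1, 0) == 1)         /* normal read */
        result = (peeked == byte && consumed == byte);

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

If the peek had consumed the data, the second recv() would block because nothing would be left to read.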
Out-of-band data is a notion specific to stream sockets; we won't immediately consider it here. The option to have data sent without routing applied to the outgoing packets is currently used only by the routing-table management process and is unlikely to be of interest to the casual user.
On the other hand, the ability to preview data can be quite useful. When MSG_PEEK is specified with a recv() call, any data present is returned, but treated as still unread. That is, the next read() or recv() call applied to the socket will return the data previously viewed.
DATA TRANSFER (in UDP)
To send data, you use the sendto() function:
sendto(s, buf, buflen, flags, (struct sockaddr *)&to, tolen);
The s, buf, buflen, and flags parameters are used as before. The
to and tolen values indicate the address of the intended recipient
of the message.
To receive messages on an unconnected datagram socket, you
use the recvfrom() function:
recvfrom(s, buf, buflen, flags, (struct sockaddr *)&from, &fromlen);
Once again, fromlen is a value-result parameter,
initially containing the size of the from buffer, and modified
on return to indicate the actual size of the address that the
datagram was received from.
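The two calls can be tried together on a single machine. The sketch below sends one datagram between two UDP sockets over the loopback interface; the helper name udp_loopback and the use of an ephemeral port (port 0, resolved with getsockname()) are assumptions for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send msg from one UDP socket to another on the loopback interface.
 * Copies the received datagram into out (at most outlen bytes) and
 * returns its length, or -1 on error.
 */
ssize_t udp_loopback(const char *msg, char *out, size_t outlen)
{
    struct sockaddr_in to, from;
    socklen_t tolen = sizeof(to), fromlen = sizeof(from);
    ssize_t n = -1;
    int rx = socket(AF_INET, SOCK_DGRAM, 0);
    int tx = socket(AF_INET, SOCK_DGRAM, 0);

    if (rx < 0 || tx < 0)
        goto done;

    memset(&to, 0, sizeof(to));
    to.sin_family = AF_INET;
    to.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    to.sin_port = htons(0);            /* let the system pick a free port */

    /* Bind the receiver, then ask which port it was actually given. */
    if (bind(rx, (struct sockaddr *)&to, sizeof(to)) < 0 ||
        getsockname(rx, (struct sockaddr *)&to, &tolen) < 0)
        goto done;

    if (sendto(tx, msg, strlen(msg), 0, (struct sockaddr *)&to, tolen) < 0)
        goto done;

    /* fromlen: size of 'from' going in, actual address size coming out. */
    n = recvfrom(rx, out, outlen, 0, (struct sockaddr *)&from, &fromlen);

done:
    if (rx >= 0) close(rx);
    if (tx >= 0) close(tx);
    return n;
}
```

Note that the sending socket is never bound or connected explicitly; the destination is supplied with each sendto() call.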
Purpose of setsockopt():
With certain applications, the algorithm used by the Socket
Manager to select port numbers may be unsuitable. For
example, the Internet file transfer protocol, FTP, specifies
that data connections must always originate from the same
local port (i.e. local from the server's point of view).
A server (e.g. ftpd) avoids duplicate
associations because the initiating programs (e.g.
ftp) use different remote ports (i.e. remote from
the server's point of view), even though the server is
accessed from the same local port (i.e. local from the
server's point of view).
In this situation, the Socket Manager would typically
disallow the server's binding the same local address and
port number if a previous data connection's socket still
existed on that port. (This would be a bad thing for servers
such as ftpd, which always want to listen to the
same well-known local port.)
To override the default port selection algorithm, an option
call must be performed prior to address binding:
...
int on = 1;
...
setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
bind(s, (struct sockaddr *) &sin, sizeof (sin));
With the above call, local addresses already in use may be
bound. This doesn't violate the uniqueness requirement,
because the system still checks at connect time to be
sure any other sockets with the same local address and port
don't have the same remote address and port. If the
association already exists, the error EADDRINUSE is
returned.
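Putting the option call and the bind together with error checks gives something like the following sketch. The helper name bind_reusable is hypothetical, and the choice of a TCP socket bound to INADDR_ANY is an assumption for illustration.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a TCP socket, enable SO_REUSEADDR, and bind it to the given port
 * on any local address. Returns the descriptor, or -1 on error.
 */
int bind_reusable(unsigned short port)
{
    struct sockaddr_in sin;
    int on = 1;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;

    /* The option must be set before bind() to affect the address check. */
    if (setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(s);
        return -1;
    }

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);

    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

A server would normally follow a successful call with listen() and accept() on the returned descriptor.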
By using a datagram socket, you can send broadcast packets
on many networks supported by the system. The network itself
must support broadcasting - the system provides no
broadcast simulation in software.
Broadcast messages can place a high load on a network since
they force every host on the network to service them.
Consequently, the ability to send broadcast packets has been
limited to sockets explicitly marked as allowing
broadcasting. Broadcasting is typically used for one of two
reasons:
- to find a resource on a local network without prior
knowledge of its address
- for functions such as routing that require information to
be sent to all accessible neighbors
To send a broadcast message, a datagram socket should be
created:
s = socket(AF_INET, SOCK_DGRAM, 0);
The socket is marked as allowing broadcasting:
int on = 1;
setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof (on));
and at least a port number should be bound to the socket:
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = htonl(INADDR_ANY);
sin.sin_port = htons(MYPORT);
bind(s, (struct sockaddr *) &sin, sizeof (sin));
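The three steps above can be combined into one helper with error checks. This is only a sketch: the function name open_broadcast_socket is hypothetical, and the port is passed as a parameter instead of the MYPORT constant used above.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a datagram socket, mark it as allowed to broadcast, and bind it
 * to the given port on any local address.
 * Returns the descriptor, or -1 on error.
 */
int open_broadcast_socket(unsigned short port)
{
    struct sockaddr_in sin;
    int on = 1;
    int s = socket(AF_INET, SOCK_DGRAM, 0);

    if (s < 0)
        return -1;

    /* Without SO_BROADCAST, sending to a broadcast address is refused. */
    if (setsockopt(s, SOL_SOCKET, SO_BROADCAST, &on, sizeof(on)) < 0) {
        close(s);
        return -1;
    }

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);

    if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

The returned socket can then be used with sendto(), giving a broadcast address such as INADDR_BROADCAST as the destination.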
How to handle multiple sockets at the same time (blocking and non-blocking, synchronous and asynchronous I/O)
Multiple Sockets
Suppose we have a process which has to handle multiple sockets. We
cannot simply read from one of them if a request comes, because that
will block while waiting on the request on that particular socket. In
the meantime a request may come on any other socket. To handle this
input/output multiplexing we could use different techniques :
- Busy waiting: In this method we make all the operations
on sockets non-blocking and handle them simultaneously by polling.
For example, we could call read() on each socket in turn, in a loop.
The disadvantage is that we waste a lot of CPU cycles. To make the
system calls non-blocking we use:
fcntl(s, F_SETFL, FNDELAY);
- Asynchronous I/O: Here we ask the Operating System
to notify us whenever I/O is possible on some socket. The
Operating System sends a signal (SIGIO) whenever there is some I/O.
When we receive a signal, we have to check all the sockets and then
wait until the next signal comes. But there are two problems: first,
signals are expensive to catch, and second, we would not be able to
know if input arrives on one socket while we are doing I/O on another.
For asynchronous I/O, we have a different set of calls (here we give
the ones for BSD-derived UNIX variants):
signal(SIGIO, io_handler);
fcntl(s, F_SETOWN, getpid());
fcntl(s, F_SETFL, FASYNC);
- Separate process for each I/O: We could fork
10 different child processes for 10 different sockets. Each child
process waits on its own socket and can therefore use blocking system
calls, with some communication between the processes. Even if each
child is fairly lightweight, this wastes memory, data structures, and
other resources.
- select() system call: We can use the select() system call
to instruct the Operating System to wait for any one of multiple events
to occur and to wake up the process only when one of these events occurs.
This way we know which socket the I/O request has come from.
int select(int nfds, fd_set *readfds, fd_set *writefds, fd_set *errorfds, struct timeval *timeout);
void FD_CLR(int fd, fd_set *fdset);
int FD_ISSET(int fd, fd_set *fdset);
void FD_SET(int fd, fd_set *fdset);
void FD_ZERO(fd_set *fdset);
The select() function indicates which of the specified file
descriptors is ready for reading, ready for writing, or has an error
condition pending. If the specified condition is false for all of the
specified file descriptors, select() blocks up to the specified
timeout interval, until the specified condition is true for at least one
of the specified file descriptors. The nfds argument specifies the
range of file descriptors to be tested. The select() function tests file descriptors in the range of 0 to nfds-1. readfds, writefds and errorfds arguments point to an object of type fd_set. readfds specifies the file descriptors to be checked for being ready to read. writefds specifies the file descriptors to be checked for being ready to write, errorfds specifies the file descriptors to be checked for error conditions pending.
On successful completion, the objects pointed to by the readfds, writefds, and errorfds
arguments are modified to indicate which file descriptors are ready for
reading, ready for writing, or have an error condition pending,
respectively. For each file descriptor less than nfds, the corresponding
bit will be set on successful completion if it was set on input and the
associated condition is true for that file descriptor. The timeout is
an upper bound on the amount of time elapsed before select returns. It
may be zero, causing select to return immediately. If the timeout is a
null pointer, select() blocks until an event causes one of the
masks to be returned with a valid (non-zero) value. If the time limit
expires before any event occurs that would cause one of the masks to be
set to a non-zero value, select() completes successfully and returns 0.
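The description above can be sketched as a helper that waits on two descriptors at once. The function name wait_readable and its return convention are hypothetical; only the read set is used, and the write and error sets are passed as null pointers.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for either descriptor to become readable.
 * Returns the ready descriptor (a if both are ready), -1 if the time
 * limit expired first, or -2 on error.
 */
int wait_readable(int a, int b, int timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int nfds = (a > b ? a : b) + 1;   /* select() tests 0 .. nfds-1 */
    int rc;

    FD_ZERO(&readfds);
    FD_SET(a, &readfds);
    FD_SET(b, &readfds);

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(nfds, &readfds, NULL, NULL, &tv);
    if (rc < 0)
        return -2;
    if (rc == 0)
        return -1;                    /* timeout, no descriptor ready */

    /* On return the set contains only the descriptors that are ready. */
    return FD_ISSET(a, &readfds) ? a : b;
}
```

Because select() modifies the sets it is given, the set is rebuilt with FD_ZERO and FD_SET on every call.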
Synchronous and asynchronous (blocking and non-blocking):
Synchronous or Asynchronous?
* Practical example of blocking and non-blocking using a web browser:
http://www.scottklement.com/rpg/socktut/nonblocking.html
https://www.ibm.com/support/knowledgecenter/en/SSLTBW_2.1.0/com.ibm.zos.v2r1.hala001/orgblockasyn.htm
Sockets can operate in one of two modes: blocking or nonblocking. A blocking socket call
does not return (no further line of code executes) until a reply arrives, a timeout expires, or some kind of error occurs. A
nonblocking socket call, on the other hand, returns immediately and execution continues without waiting for the reply.
Let's say that you're writing a web browser. You try to connect to a web server, but
the server isn't responding. When a user presses (or clicks) a stop button, you want
the connect() API to stop trying to connect.
With what you've learned so far, that can't be done. When you issue a call to
connect(), your program doesn't regain control until either the connection is made,
or an error occurs.
The solution to this problem is called "non-blocking sockets". By default, TCP
sockets are in "blocking" mode. For example, when you call recv() to read from a
stream, control isn't returned to your program until at least one byte of data is
read from the remote site. This process of waiting for data to appear is referred
to as "blocking". The same is true for the write() API, the connect() API, etc.
When you run them, the connection "blocks" until the operation is complete.
It's possible to set a descriptor so that it is placed in "non-blocking" mode.
When placed in non-blocking mode, you never wait for an operation to complete.
This is an invaluable tool if you need to switch between many different connected
sockets, and want to ensure that none of them cause the program to "lock up."
Network communication (or file system access in general) system
calls may operate in two modes: synchronous or asynchronous. In the
synchronous mode, socket routines return only when the operation
is complete. For example, accept returns only when a connection
arrives. In the asynchronous mode, socket routines return immediately:
system calls become non-blocking calls (e.g., read does not block
waiting until data arrives).
You can change the mode with the fcntl system call. For example,
fcntl(s, F_SETFL, FNDELAY);
sets the socket s to operate in asynchronous (non-blocking) mode.
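A slightly more careful version of the fcntl() call preserves the flags already set on the descriptor. The sketch below uses O_NONBLOCK, the POSIX name for the flag (FNDELAY is an older name for it); the helper names set_nonblocking and try_read_byte are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Add O_NONBLOCK to the descriptor's status flags. Returns 0 or -1. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Try to read one byte without waiting. Returns 1 if a byte was read,
 * 0 if the call would have blocked, -1 on end-of-file or any other error.
 */
int try_read_byte(int fd, char *out)
{
    ssize_t n = read(fd, out, 1);

    if (n == 1)
        return 1;
    if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
        return 0;
    return -1;
}
```

On a non-blocking descriptor with no data waiting, read() fails at once with EWOULDBLOCK (or EAGAIN, which has the same value on many systems) instead of putting the process to sleep.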
Note: detailed usage of fcntl can be seen in this programming example:
www.lowtek.com/sockets/select.html
Table 1. Socket programming interface actions
Call type | Socket state | Blocking | Nonblocking |
Types of read() calls | Input is available | Immediate return | Immediate return |
Types of read() calls | No input is available | Wait for input | Immediate return with EWOULDBLOCK error number (select() exception: READ) |
Types of write() calls | Output buffers available | Immediate return | Immediate return |
Types of write() calls | No output buffers available | Wait for output buffers | Immediate return with EWOULDBLOCK error number (select() exception: WRITE) |
accept() call | New connection | Immediate return | Immediate return |
accept() call | No connections queued | Wait for new connection | Immediate return with EWOULDBLOCK error number (select() exception: READ) |
connect() call | | Wait | Immediate return with EINPROGRESS error number (select() exception: WRITE) |
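One row of the table, accept() with no connections queued on a non-blocking socket, can be checked with a short sketch. The helper name accept_would_block is hypothetical, and binding to an ephemeral loopback port is an assumption made so that no real client is needed.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a non-blocking TCP listening socket on an ephemeral loopback
 * port and call accept() once while no client is connecting.
 * Returns 1 if accept() failed with EWOULDBLOCK/EAGAIN, 0 if it behaved
 * differently, -1 if the socket could not be set up.
 */
int accept_would_block(void)
{
    struct sockaddr_in sin;
    int flags, c, result = -1;
    int s = socket(AF_INET, SOCK_STREAM, 0);

    if (s < 0)
        return -1;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    sin.sin_port = htons(0);              /* ephemeral port */

    flags = fcntl(s, F_GETFL, 0);
    if (flags >= 0 &&
        fcntl(s, F_SETFL, flags | O_NONBLOCK) == 0 &&
        bind(s, (struct sockaddr *)&sin, sizeof(sin)) == 0 &&
        listen(s, 5) == 0) {
        c = accept(s, NULL, NULL);        /* nobody is connecting */
        if (c >= 0) {
            close(c);
            result = 0;
        } else {
            result = (errno == EWOULDBLOCK || errno == EAGAIN);
        }
    }
    close(s);
    return result;
}
```

In blocking mode the same accept() call would simply wait until a client connected.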
* select() lets you avoid blocking calls:
When you use select() call logic, you do not issue any socket call on a given socket
until the select() call tells you that something has happened on that socket;
for example, data has arrived and is ready to be read by a read() call.
By using the select() call, you do not issue a blocking call until you know that
the call cannot block.
reference:
http://users.pja.edu.pl/~jms/qnx/help/tcpip_4.25_en/prog_guide/sock_advanced_tut.html
http://cse.iitk.ac.in/users/dheeraj/cs425/lec20.html
http://www.cs.rutgers.edu/~pxk/rutgers/notes/sockets/
https://www.quora.com/In-networking-programming-what-is-nonblocking-socket
http://www.ibm.com/developerworks/aix/library/au-tcpsystemcalls/
http://sock-raw.org/papers/sock_raw (very detailed treatment of raw sockets; worth reading)
# /etc/protocols:
# $Id: protocols,v 1.11 2011/05/03 14:45:40 ovasik Exp $
#
# Internet (IP) protocols
#
# from: @(#)protocols 5.1 (Berkeley) 4/17/89
#
# Updated for NetBSD based on RFC 1340, Assigned Numbers (July 1992).
# Last IANA update included dated 2011-05-03
#
# See also http://www.iana.org/assignments/protocol-numbers
ip 0 IP # internet protocol, pseudo protocol number
hopopt 0 HOPOPT # hop-by-hop options for ipv6
icmp 1 ICMP # internet control message protocol
igmp 2 IGMP # internet group management protocol
ggp 3 GGP # gateway-gateway protocol
ipv4 4 IPv4 # IPv4 encapsulation
st 5 ST # ST datagram mode
tcp 6 TCP # transmission control protocol
cbt 7 CBT # CBT, Tony Ballardie <A.Ballardie@cs.ucl.ac.uk>
egp 8 EGP # exterior gateway protocol
igp 9 IGP # any private interior gateway (Cisco: for IGRP)
bbn-rcc 10 BBN-RCC-MON # BBN RCC Monitoring
nvp 11 NVP-II # Network Voice Protocol
pup 12 PUP # PARC universal packet protocol
argus 13 ARGUS # ARGUS
emcon 14 EMCON # EMCON
xnet 15 XNET # Cross Net Debugger
chaos 16 CHAOS # Chaos
udp 17 UDP # user datagram protocol
mux 18 MUX # Multiplexing protocol
dcn 19 DCN-MEAS # DCN Measurement Subsystems
hmp 20 HMP # host monitoring protocol
prm 21 PRM # packet radio measurement protocol
xns-idp 22 XNS-IDP # Xerox NS IDP
trunk-1 23 TRUNK-1 # Trunk-1
trunk-2 24 TRUNK-2 # Trunk-2
leaf-1 25 LEAF-1 # Leaf-1
leaf-2 26 LEAF-2 # Leaf-2
rdp 27 RDP # "reliable datagram" protocol
irtp 28 IRTP # Internet Reliable Transaction Protocol
iso-tp4 29 ISO-TP4 # ISO Transport Protocol Class 4
netblt 30 NETBLT # Bulk Data Transfer Protocol
mfe-nsp 31 MFE-NSP # MFE Network Services Protocol
merit-inp 32 MERIT-INP # MERIT Internodal Protocol
dccp 33 DCCP # Datagram Congestion Control Protocol
3pc 34 3PC # Third Party Connect Protocol
idpr 35 IDPR # Inter-Domain Policy Routing Protocol
xtp 36 XTP # Xpress Tranfer Protocol
ddp 37 DDP # Datagram Delivery Protocol
idpr-cmtp 38 IDPR-CMTP # IDPR Control Message Transport Proto
tp++ 39 TP++ # TP++ Transport Protocol
il 40 IL # IL Transport Protocol
ipv6 41 IPv6 # IPv6 encapsulation
sdrp 42 SDRP # Source Demand Routing Protocol
ipv6-route 43 IPv6-Route # Routing Header for IPv6
ipv6-frag 44 IPv6-Frag # Fragment Header for IPv6
idrp 45 IDRP # Inter-Domain Routing Protocol
rsvp 46 RSVP # Resource ReSerVation Protocol
gre 47 GRE # Generic Routing Encapsulation
dsr 48 DSR # Dynamic Source Routing Protocol
bna 49 BNA # BNA
esp 50 ESP # Encap Security Payload
ipv6-crypt 50 IPv6-Crypt # Encryption Header for IPv6 (not in official list)
ah 51 AH # Authentication Header
ipv6-auth 51 IPv6-Auth # Authentication Header for IPv6 (not in official list)
i-nlsp 52 I-NLSP # Integrated Net Layer Security TUBA
swipe 53 SWIPE # IP with Encryption
narp 54 NARP # NBMA Address Resolution Protocol
mobile 55 MOBILE # IP Mobility
tlsp 56 TLSP # Transport Layer Security Protocol
skip 57 SKIP # SKIP
ipv6-icmp 58 IPv6-ICMP # ICMP for IPv6
ipv6-nonxt 59 IPv6-NoNxt # No Next Header for IPv6
ipv6-opts 60 IPv6-Opts # Destination Options for IPv6
# 61 # any host internal protocol
cftp 62 CFTP # CFTP
# 63 # any local network
sat-expak 64 SAT-EXPAK # SATNET and Backroom EXPAK
kryptolan 65 KRYPTOLAN # Kryptolan
rvd 66 RVD # MIT Remote Virtual Disk Protocol
ippc 67 IPPC # Internet Pluribus Packet Core
# 68 # any distributed file system
sat-mon 69 SAT-MON # SATNET Monitoring
visa 70 VISA # VISA Protocol
ipcv 71 IPCV # Internet Packet Core Utility
cpnx 72 CPNX # Computer Protocol Network Executive
cphb 73 CPHB # Computer Protocol Heart Beat
wsn 74 WSN # Wang Span Network
pvp 75 PVP # Packet Video Protocol
br-sat-mon 76 BR-SAT-MON # Backroom SATNET Monitoring
sun-nd 77 SUN-ND # SUN ND PROTOCOL-Temporary
wb-mon 78 WB-MON # WIDEBAND Monitoring
wb-expak 79 WB-EXPAK # WIDEBAND EXPAK
iso-ip 80 ISO-IP # ISO Internet Protocol
vmtp 81 VMTP # Versatile Message Transport
secure-vmtp 82 SECURE-VMTP # SECURE-VMTP
vines 83 VINES # VINES
ttp 84 TTP # TTP
nsfnet-igp 85 NSFNET-IGP # NSFNET-IGP
dgp 86 DGP # Dissimilar Gateway Protocol
tcf 87 TCF # TCF
eigrp 88 EIGRP # Enhanced Interior Routing Protocol (Cisco)
ospf 89 OSPFIGP # Open Shortest Path First IGP
sprite-rpc 90 Sprite-RPC # Sprite RPC Protocol
larp 91 LARP # Locus Address Resolution Protocol
mtp 92 MTP # Multicast Transport Protocol
ax.25 93 AX.25 # AX.25 Frames
ipip 94 IPIP # Yet Another IP encapsulation
micp 95 MICP # Mobile Internetworking Control Pro.
scc-sp 96 SCC-SP # Semaphore Communications Sec. Pro.
etherip 97 ETHERIP # Ethernet-within-IP Encapsulation
encap 98 ENCAP # Yet Another IP encapsulation
# 99 # any private encryption scheme
gmtp 100 GMTP # GMTP
ifmp 101 IFMP # Ipsilon Flow Management Protocol
pnni 102 PNNI # PNNI over IP
pim 103 PIM # Protocol Independent Multicast
aris 104 ARIS # ARIS
scps 105 SCPS # SCPS
qnx 106 QNX # QNX
a/n 107 A/N # Active Networks
ipcomp 108 IPComp # IP Payload Compression Protocol
snp 109 SNP # Sitara Networks Protocol
compaq-peer 110 Compaq-Peer # Compaq Peer Protocol
ipx-in-ip 111 IPX-in-IP # IPX in IP
vrrp 112 VRRP # Virtual Router Redundancy Protocol
pgm 113 PGM # PGM Reliable Transport Protocol
# 114 # any 0-hop protocol
l2tp 115 L2TP # Layer Two Tunneling Protocol
ddx 116 DDX # D-II Data Exchange
iatp 117 IATP # Interactive Agent Transfer Protocol
stp 118 STP # Schedule Transfer
srp 119 SRP # SpectraLink Radio Protocol
uti 120 UTI # UTI
smp 121 SMP # Simple Message Protocol
sm 122 SM # SM
ptp 123 PTP # Performance Transparency Protocol
isis 124 ISIS # ISIS over IPv4
fire 125 FIRE
crtp 126 CRTP # Combat Radio Transport Protocol
crudp 127 CRUDP # Combat Radio User Datagram
sscopmce 128 SSCOPMCE
iplt 129 IPLT
sps 130 SPS # Secure Packet Shield
pipe 131 PIPE # Private IP Encapsulation within IP
sctp 132 SCTP # Stream Control Transmission Protocol
fc 133 FC # Fibre Channel
rsvp-e2e-ignore 134 RSVP-E2E-IGNORE
mobility-header 135 Mobility-Header # Mobility Header
udplite 136 UDPLite
mpls-in-ip 137 MPLS-in-IP
manet 138 manet # MANET Protocols
hip 139 HIP # Host Identity Protocol
shim6 140 Shim6 # Shim6 Protocol
wesp 141 WESP # Wrapped Encapsulating Security Payload
rohc 142 ROHC # Robust Header Compression
# 143-252 Unassigned [IANA]
# 253 Use for experimentation and testing [RFC3692]
# 254 Use for experimentation and testing [RFC3692]
# 255 Reserved [IANA]
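The entries in this file are what getprotobyname() and getprotobynumber() consult on many systems. As a sketch of the reverse lookup (number to name), assuming a hypothetical helper called protocol_name:

```c
#include <netdb.h>
#include <stdio.h>

/*
 * Look up a protocol number in the protocols database and copy its
 * official name into name. Returns 0 on success, or -1 if the number is
 * unknown or the database is not available.
 */
int protocol_name(int number, char *name, size_t namelen)
{
    struct protoent *pp = getprotobynumber(number);

    if (pp == NULL || namelen == 0)
        return -1;
    snprintf(name, namelen, "%s", pp->p_name);
    return 0;
}
```

With the file above installed, looking up 6 should give "tcp" and looking up 17 should give "udp". The pointer returned by getprotobynumber() refers to static storage, which is why the name is copied out before returning.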