IP(4) Linux Programmer's Manual IP(4)
NAME
ip - Linux IPv4 protocol implementation
SYNOPSIS
#include <sys/socket.h>
#include <netinet/in.h>
tcp_socket = socket(PF_INET, SOCK_STREAM, 0);
raw_socket = socket(PF_INET, SOCK_RAW, protocol);
udp_socket = socket(PF_INET, SOCK_DGRAM, protocol);
DESCRIPTION
Linux implements the IPv4 protocol described in RFC791 and
RFC1122. ip contains a level 2 multicasting implementation
conforming to RFC1112. It also contains an IP router
including a packet filter.
The protocol is implemented in the kernel on the basis of
a BSD compatible socket interface. For more information
on sockets, see socket(4).
An IP socket is created by calling the socket(2) function
with a PF_INET socket family argument. Valid socket types
are SOCK_STREAM to open a tcp(4) socket, SOCK_DGRAM to
open a udp(4) socket, or SOCK_RAW to open a raw socket.
protocol is the IP protocol in the IP header to be
received or sent. The only valid values for protocol are
0 and IPPROTO_TCP for TCP sockets, and 0 and IPPROTO_UDP
for UDP sockets. For SOCK_RAW you may specify any valid
IANA IP protocol number from the RFC1700 assigned numbers.
Raw sockets may only be opened by a process with effective
user id 0 or when the process has the CAP_NET_RAW capabil-
ity.
When a process wants to receive new incoming packets or
connections, it should be bound to a local interface
address using bind(2). When INADDR_ANY is specified it
will bind to any local interface. A bound TCP socket is
unavailable for some time after closing, unless the
SO_REUSEADDR flag is set.
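The steps above (create the socket, set SO_REUSEADDR, bind to
INADDR_ANY) can be sketched as follows. This is a minimal
illustration, not a reference implementation: the helper name
open_tcp_listener and the listen backlog of 16 are assumptions
made for the example.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: open a TCP socket bound to every local
 * interface. Port 0 lets the kernel pick a free port.
 * Returns the descriptor, or -1 with errno set. */
int open_tcp_listener(uint16_t port)
{
    int fd = socket(PF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Allow rebinding the port soon after an earlier socket closed. */
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
        close(fd);
        return -1;
    }

    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);  /* any local interface */
    sa.sin_port = htons(port);               /* network byte order */

    /* The backlog of 16 is an arbitrary value chosen for the sketch. */
    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A caller would typically follow this with accept(2) on the
returned descriptor.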
ADDRESS FORMAT
An IP socket address is defined as a combination of an IP
interface address and a port number.
struct sockaddr_in {
    sa_family_t    sin_family; /* address family: AF_INET */
    u_int16_t      sin_port;   /* port in network byte order */
    struct in_addr sin_addr;   /* internet address */
};

/* Internet address. */
struct in_addr {
    u_int32_t      s_addr;     /* IPv4 address in network byte order */
};
sin_family is always set to AF_INET. This is required; in
Linux 2.2 most networking functions return EINVAL when
this setting is missing. sin_port contains the port in
network byte order. The port numbers below 1024 are called
reserved ports. Only processes with effective user id 0
or the CAP_NET_BIND_SERVICE capability may bind(2) to
these ports. Note that the raw IPv4 protocol as such has
no concept of a port; ports are implemented only by higher
protocols like tcp(4) and udp(4).
sin_addr is the host address. The s_addr member of struct
in_addr contains the host interface address in network
byte order. in_addr should only be accessed using the
inet_aton(3), inet_addr(3), inet_makeaddr(3) library
functions or directly with the name resolver (see
gethostbyname(3)). IPv4 addresses are divided into unicast,
broadcast and multicast addresses. Unicast addresses
specify a single interface of a host, broadcast addresses
specify all hosts on a network and multicast addresses
address all hosts in a multicast group. Datagrams to
broadcast addresses are only passed to the user when the
socket broadcast flag is set; the flag must also be set
to send datagrams to broadcast addresses. Connection
oriented sockets are only allowed to use unicast addresses.
Note that the address and the port are always stored in
network byte order. In particular, this means that you need
to call htons(3) on the number that is assigned to a port.
All address/port manipulation functions in the standard
library automatically convert to network order.
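As a small illustration of the byte-order rules above, the
following sketch fills a struct sockaddr_in from a dotted-quad
string and a host-order port. The helper name make_sockaddr_in
is an assumption made for the example; inet_aton(3) and htons(3)
are the library calls named in the text.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc declares inet_aton() under this macro */
#endif
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fill *out from dotted-quad text and a port
 * given in host byte order.
 * Returns 0 on success, -1 if the address text is not valid. */
int make_sockaddr_in(const char *dotted, uint16_t port,
                     struct sockaddr_in *out)
{
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;      /* required, see above */
    out->sin_port = htons(port);    /* convert to network byte order */

    /* inet_aton() stores the address in network byte order and
     * returns 0 when the text cannot be parsed. */
    if (inet_aton(dotted, &out->sin_addr) == 0)
        return -1;
    return 0;
}
```

The resulting structure can be passed to bind(2), connect(2) or
sendto(2) after a cast to struct sockaddr *.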
SOCKET OPTIONS
IP supports some protocol specific socket options that can
be set with setsockopt(2) and read by getsockopt(2). The
socket option level for IP is SOL_IP.
IP_OPTIONS
Sets or gets the IP options to be sent with every
packet from this socket. The arguments are a
pointer to a memory buffer containing the options
and the option length. setsockopt(2) sets the IP
options associated with a socket. The maximum option
size for IPv4 is 40 bytes. See RFC791 for the
allowed options. When the initial connection
request packet for a SOCK_STREAM socket contains IP
options, the outgoing IP options are automatically
set to the received options with routing headers
reversed, so outgoing packets echo the received
options. After the connection is established,
incoming packets are not allowed to change options
anymore. The processing of all incoming source
routing options can be disabled using the
accept_source_route sysctl, which is off by
default. For datagram sockets, IP options can only
be set by the local user. getsockopt(2) returns the
current send IP options.
IP_PKTINFO
Pass an IP_PKTINFO ancillary message that contains a
pktinfo structure that supplies some information
about the incoming packet. This only works for
datagram oriented sockets.
struct in_pktinfo
{
    unsigned int   ipi_ifindex;  /* Interface index */
    struct in_addr ipi_spec_dst; /* Routing destination address */
    struct in_addr ipi_addr;     /* Header destination address */
};
ipi_ifindex is the index of the interface the
packet was received on. The ipi_spec_dst address
is the RFC specified destination address and may
differ from ipi_addr when the packet contains
source routing options.
If IP_PKTINFO is passed to sendmsg(2), the outgoing
packet will be sent over the interface specified in
ipi_ifindex with the destination address set to
ipi_spec_dst.
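The receive side of IP_PKTINFO can be sketched roughly as
follows. The helper names enable_pktinfo and recv_with_pktinfo
are assumptions made for the example, and IPPROTO_IP is used as
the option level because it has the same value as SOL_IP on
Linux. Error handling is kept to a minimum.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc exposes struct in_pktinfo under this macro */
#endif
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>

/* Ask the kernel to attach an IP_PKTINFO message to each
 * datagram received on fd. Returns 0 or -1. */
int enable_pktinfo(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/* Receive one datagram into buf and copy its in_pktinfo into *info.
 * Returns the payload length, or -1 on error or when no
 * IP_PKTINFO message was attached. */
ssize_t recv_with_pktinfo(int fd, void *buf, size_t len,
                          struct in_pktinfo *info)
{
    /* The union keeps the control buffer aligned for cmsghdr. */
    union {
        struct cmsghdr align;
        char space[CMSG_SPACE(sizeof(struct in_pktinfo))];
    } control;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.space;
    msg.msg_controllen = sizeof(control.space);

    ssize_t n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;

    /* Walk the ancillary messages looking for IP_PKTINFO. */
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
         c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            memcpy(info, CMSG_DATA(c), sizeof(*info));
            return n;
        }
    }
    return -1;
}
```

After a successful call, info->ipi_ifindex identifies the
receiving interface and info->ipi_addr holds the destination
address from the packet header.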
IP_RECVTOS
If enabled, the IP_TOS ancillary message is passed
with incoming packets. It contains a byte holding the
Type of Service/Precedence field of the packet
header. Expects a boolean integer flag.
IP_RECVTTL
Set or read a flag to pass an IP_RECVTTL ancillary
message that contains the time to live field of the
received packet as a byte. Not supported for
SOCK_STREAM sockets.
IP_RECVOPTS
Pass all incoming IP options to the user in an
IP_OPTIONS control message. The routing header and
other options are already filled in for the local
host. Not supported for SOCK_STREAM sockets.
IP_RETOPTS
Identical to IP_RECVOPTS but returns raw unpro-
cessed options with timestamp and route record
options not filled in for this hop.
IP_TOS Set or receive the Type-Of-Service (TOS) field that
is sent with every IP packet originating from this
socket. It is used to prioritize packets on the
network. TOS is a byte. Some standard TOS flags are
defined: IPTOS_LOWDELAY to minimize delays for
interactive traffic, IPTOS_THROUGHPUT to optimize
throughput, IPTOS_RELIABILITY to optimize for
reliability, and IPTOS_MINCOST for "filler data"
where slow transmission doesn't matter. At most one
of these TOS values can be specified. Other bits are
invalid and shall be cleared. By default, Linux sends
IPTOS_LOWDELAY datagrams first, but the exact
behaviour depends on the configured queueing
discipline. Some high priority levels may require
effective user id 0 or the CAP_NET_ADMIN capability.
The priority can also be set in a protocol
independent way with the (SOL_SOCKET, SO_PRIORITY)
socket option (see socket(4)).
IP_TTL Set or receive the time to live field for every
outgoing IP packet.
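Setting and reading IP_TOS and IP_TTL might look like the
following sketch. The helper names set_tos_and_ttl and
get_ip_option are assumptions made for the example; the IPTOS_*
constants come from <netinet/ip.h>, and IPPROTO_IP is used as the
option level because it has the same value as SOL_IP on Linux.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/ip.h>   /* IPTOS_LOWDELAY and friends */

/* Hypothetical helper: set the TOS byte and the time to live for
 * packets sent on fd. Returns 0 on success, -1 with errno set. */
int set_tos_and_ttl(int fd, int tos, int ttl)
{
    if (setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)) < 0)
        return -1;
    return 0;
}

/* Hypothetical helper: read one integer-valued IP-level option.
 * Returns the value, or -1 on error. */
int get_ip_option(int fd, int optname)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, IPPROTO_IP, optname, &value, &len) < 0)
        return -1;
    return value;
}
```

For instance, set_tos_and_ttl(fd, IPTOS_LOWDELAY, 32) marks
traffic as interactive and limits it to 32 hops.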
IP_HDRINCL
If enabled, the user supplies its own IP header in
front of the user data. Only valid for SOCK_RAW
sockets. See raw(4) for more information. When this
flag is enabled, the values set by IP_OPTIONS,
IP_TTL and IP_TOS are ignored.
IP_RECVERR
Enable extended reliable error message passing.
When enabled on a datagram socket, all generated
errors will be queued in a per-socket error queue.
When the user gets an error (by an error return of
a socket operation), the errors can be received by
calling recvmsg(2) with the MSG_ERRQUEUE flag set.
The sock_extended_err structure describing the
error will be passed in an ancillary message with
the type IP_RECVERR and the level SOL_IP. This is
useful for reliable error handling on unconnected
sockets. The received data portion of the error
queue contains the error packet.
IP uses the sock_extended_err structure as follows:
ee_origin is set to SO_EE_ORIGIN_ICMP for errors
received as an ICMP packet, or SO_EE_ORIGIN_LOCAL
for locally generated errors. ee_type and ee_code
are set from the type and code fields of the ICMP
header. ee_info contains the discovered MTU for
EMSGSIZE errors. ee_data is currently not used.
When the error originated from the network, all IP
options (IP_OPTIONS, IP_TTL, etc.) enabled on the
socket and contained in the error packet are passed
as control messages. The payload of the packet
causing the error is returned as normal data.
On SOCK_STREAM TCP sockets, IP_RECVERR has
slightly different semantics. Instead of queueing
the errors reliably, it passes all incoming errors
immediately to the user. This might be useful for
very short-lived TCP connections that need quick
error handling. Use this option with care: it makes
TCP unreliable by not allowing it to recover
properly from routing shifts and other normal
conditions. Note that TCP has no error queue;
MSG_ERRQUEUE is not valid on SOCK_STREAM sockets.
All errors are passed by return value only.
For raw sockets, IP_RECVERR enables passing of all
received ICMP errors to the application. This is
turned off by default for compatibility.
It sets or receives an integer boolean flag.
IP_RECVERR defaults to off.
IP_PMTU_DISCOVER
Sets or receives the Path MTU Discovery setting for
a socket. When enabled, Linux will perform Path MTU
Discovery as defined in RFC1191 on this socket. The
system-wide default is controlled by the
ip_no_pmtu_disc sysctl for SOCK_STREAM sockets, and
disabled on all others. The user can retrieve the
path MTU using the IP_MTU or the IP_RECVERR
options.
+-------------------------+--------------------------------+
|Path MTU discovery flags | Meaning |
+-------------------------+--------------------------------+
|IP_PMTUDISC_WANT | Use per-route settings. |
+-------------------------+--------------------------------+
|IP_PMTUDISC_DONT | Never do Path MTU Discovery. |
+-------------------------+--------------------------------+
|IP_PMTUDISC_DO | Always do Path MTU Discovery. |
+-------------------------+--------------------------------+
When PMTU discovery is enabled, the kernel
automatically keeps track of the path MTU. For TCP
sockets, the outgoing packets are automatically
sized based on the path MTU; for datagram oriented
sockets, the user has to size the datagrams
appropriately. When it is enabled, the kernel
rejects packets bigger than the path MTU with
EMSGSIZE. See raw(4) and udp(4) for more
information.
IP_MTU Retrieve the current known path MTU of the current
socket. Only valid when the socket has been
connected. Returns an integer. Only valid as a
getsockopt(2).
IP_ROUTER_ALERT
Pass all forwarded packets with the IP Router Alert
option set to this socket. Only valid for raw sock-
ets. This is useful, for instance, for user space
RSVP daemons. Expects an integer argument.
IP_MULTICAST_TTL
Sets or reads the time-to-live value of outgoing
multicast packets for this socket. It is very
important for multicast packets to set the smallest
TTL possible. The default is 1, which means that
multicast packets don't leave the local network
unless the user program explicitly requests it.
Argument is an integer.
IP_MULTICAST_LOOP
Sets or reads a boolean integer argument that
determines whether sent multicast packets should be
looped back to the local sockets.
IP_ADD_MEMBERSHIP
Join a multicast group. Argument is an ip_mreqn
structure.
struct ip_mreqn
{
    struct in_addr imr_multiaddr; /* IP multicast group address */
    struct in_addr imr_address;   /* IP address of local interface */
    int            imr_ifindex;   /* interface index */
};
imr_multiaddr contains the address of the multicast
group the application wants to join or leave. It
must be a valid multicast address. imr_address is
the address of the local interface with which the
system should join the multicast group; if it is
equal to INADDR_ANY an appropriate interface is
chosen by the system. imr_ifindex is the interface
index of the interface that should join/leave the
imr_multiaddr group, or 0 to indicate any inter-
face.
For compatibility, the old ip_mreq structure is
still supported. It differs from ip_mreqn only by
not including the imr_ifindex field. Only valid as
a setsockopt(2).
IP_DROP_MEMBERSHIP
Leave a multicast group. Argument is an ip_mreqn or
ip_mreq structure similar to IP_ADD_MEMBERSHIP.
IP_MULTICAST_IF
Set the local device for a multicast socket. Argu-
ment is an ip_mreqn or ip_mreq structure similar to
IP_ADD_MEMBERSHIP.
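The multicast options above might be used as in the following
sketch. The helper names join_multicast_group,
leave_multicast_group and set_multicast_scope are assumptions
made for the example, and IPPROTO_IP is used as the option level
because it has the same value as SOL_IP on Linux.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* glibc exposes struct ip_mreqn under this macro */
#endif
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Shared body for joining and leaving: optname is
 * IP_ADD_MEMBERSHIP or IP_DROP_MEMBERSHIP. */
static int change_membership(int fd, int optname,
                             const char *group, int ifindex)
{
    struct ip_mreqn req;

    memset(&req, 0, sizeof(req));
    if (inet_aton(group, &req.imr_multiaddr) == 0)
        return -1;                            /* not an address */
    req.imr_address.s_addr = htonl(INADDR_ANY);
    req.imr_ifindex = ifindex;                /* 0: any interface */
    return setsockopt(fd, IPPROTO_IP, optname, &req, sizeof(req));
}

/* Join the group given as dotted-quad text. Returns 0 or -1. */
int join_multicast_group(int fd, const char *group, int ifindex)
{
    return change_membership(fd, IP_ADD_MEMBERSHIP, group, ifindex);
}

/* Leave a group joined earlier. Returns 0 or -1. */
int leave_multicast_group(int fd, const char *group, int ifindex)
{
    return change_membership(fd, IP_DROP_MEMBERSHIP, group, ifindex);
}

/* Set how far multicast datagrams travel (ttl) and whether they
 * are looped back to local sockets (loop). Returns 0 or -1. */
int set_multicast_scope(int fd, int ttl, int loop)
{
    if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL,
                   &ttl, sizeof(ttl)) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP,
                      &loop, sizeof(loop));
}
```

Passing an address that is not a multicast address makes the
join fail, in line with the requirement that imr_multiaddr be a
valid multicast address.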
When an invalid socket option is passed, ENOPROTOOPT
is returned.
SYSCTLS
The IP protocol supports the sysctl interface to configure
some global options. The sysctls can be accessed by read-
ing or writing the /proc/sys/net/ipv4/* files or using the
sysctl(2) interface.
ip_default_ttl
Set the default time-to-live value of outgoing
packets. This can be changed per socket with the
IP_TTL option.
ip_forward
Enable IP forwarding with a boolean flag. IP
forwarding can also be set on a per-interface basis.
ip_dynaddr
Enable dynamic socket address rewriting on
interface address change. This is useful for dialup
interfaces with changing IP addresses.
ip_autoconfig
Not documented.
ip_local_port_range
Contains two integers that define the default local
port range allocated to sockets. Allocation starts
with the first number and ends with the second num-
ber.
ip_no_pmtu_disc
If enabled, don't do Path MTU Discovery for TCP
sockets by default. Path MTU discovery may fail if
misconfigured firewalls (that drop all ICMP packets)
or misconfigured interfaces (e.g., a point-to-point
link where both ends don't agree on the MTU) are on
the path. It is better to fix the broken routers on
the path than to turn off Path MTU Discovery
globally, because not doing it incurs a high cost
to the network.
ipfrag_high_thresh and ipfrag_low_thresh
If the amount of queued IP fragments reaches
ipfrag_high_thresh, the queue is pruned down to
ipfrag_low_thresh. Contains an integer with the
number of bytes.
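Reading one of these sysctls through the /proc interface can be
sketched as follows. The helper name read_ipv4_sysctl is an
assumption made for the example, and it only returns the first
integer in the file, so for ip_local_port_range it yields the
lower bound only.

```c
#include <stdio.h>

/* Hypothetical helper: read the first integer stored in
 * /proc/sys/net/ipv4/<name>.
 * Returns 0 and stores the value in *value, or -1 on failure
 * (file missing, unreadable, or not starting with a number). */
int read_ipv4_sysctl(const char *name, long *value)
{
    char path[256];
    int n = snprintf(path, sizeof(path), "/proc/sys/net/ipv4/%s", name);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;              /* name too long for the buffer */

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int matched = fscanf(f, "%ld", value);
    fclose(f);
    return matched == 1 ? 0 : -1;
}
```

For example, read_ipv4_sysctl("ip_default_ttl", &ttl) retrieves
the default time-to-live. Writing a sysctl works the same way in
the other direction but normally requires privileges.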
IOCTLS
These ioctls can be accessed using ioctl(2). The correct
syntax is:
error = ioctl(ip_socket, ioctl_type, value_ptr);
SIOCGSTAMP
Return a struct timeval with the receive timestamp
of the last packet passed to the user. This is use-
ful for accurate round trip time measurements. See
setitimer(2) for a description of struct timeval.
FIOSETOWN and SIOCSPGRP
Set the process or process group (a negative value
denotes the process group whose id is its absolute
value) to send SIGIO or SIGURG signals to when an
asynchronous I/O operation has finished or urgent
data is available. Argument is a pointer to a pid_t.
Only processes with effective user id 0 may set this
value to an arbitrary process/group id; all others
only to processes/groups with a matching effective
group id or user id.
FIOASYNC
Set a flag to enable or disable asynchronous mode
of the socket. Asynchronous mode means that SIGIO
is raised when a new I/O event occurs.
See socket(4) for a description of the valid IO
events.
FIOGETOWN and SIOCGPGRP
Get the current process or process group that
receives SIGIO or SIGURG signals, or 0 when none is
set. Argument is a pointer to a pid_t.
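The owner and asynchronous-mode ioctls might be wrapped as in the
following sketch. All helper names are assumptions made for the
example. Note that the default action for SIGIO is to terminate
the process, so a real program would install a handler with
sigaction(2) before combining an owner with asynchronous mode.

```c
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/sockios.h>   /* SIOCSPGRP, SIOCGPGRP */
#include <unistd.h>

/* Deliver SIGIO/SIGURG for fd to process `owner`; a negative
 * value selects the process group with that absolute id.
 * Returns 0 on success, -1 with errno set. */
int set_signal_owner(int fd, pid_t owner)
{
    return ioctl(fd, SIOCSPGRP, &owner);
}

/* Convenience wrapper: make the calling process the owner. */
int claim_signals_for_self(int fd)
{
    return set_signal_owner(fd, getpid());
}

/* Store the current owner in *owner (0 when none is set).
 * Returns 0 on success, -1 with errno set. */
int get_signal_owner(int fd, pid_t *owner)
{
    return ioctl(fd, SIOCGPGRP, owner);
}

/* Switch asynchronous (SIGIO-driven) mode on (1) or off (0).
 * Returns 0 on success, -1 with errno set. */
int set_async_mode(int fd, int enable)
{
    return ioctl(fd, FIOASYNC, &enable);
}
```

Each helper passes a pointer as the third ioctl(2) argument,
matching the value_ptr form shown in the syntax line above.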
The ioctls to configure firewalling are documented in
ipfw(4) from the ipchains package.
Ioctls to configure generic device parameters are
described in netdevice(4).
NOTES
Be very careful with the SO_BROADCAST option - it is not
privileged in Linux. It is easy to overload the network
with careless broadcasts. For new application protocols it
is better to use a multicast group instead of broadcast-
ing. Broadcasting is discouraged.
Some other BSD sockets implementations provide
IP_RECVDSTADDR and IP_RECVIF socket options to get the
destination address and the interface of received
datagrams. Linux has the more general IP_PKTINFO for
the same task.
ERRORS
ENOTCONN
The operation is only defined on a connected
socket, but the socket wasn't connected.
EINVAL Invalid argument passed.
EMSGSIZE
Datagram is bigger than an MTU on the path and it
cannot be fragmented.
EACCES The user tried to execute an operation without the
necessary permissions. These include sending to a
broadcast address without having the broadcast
flag set, trying to modify the firewall settings
without effective user id 0 or CAP_NET_ADMIN, or
trying to bind to a reserved port without effec-
tive user id 0 or CAP_NET_BIND_SERVICE.
EADDRINUSE
Tried to bind to an address already in use.
ENOMEM and ENOBUFS
Not enough memory available.
ENOPROTOOPT and EOPNOTSUPP
Invalid socket option passed.
EPERM User doesn't have permission to set high priority,
change configuration, or send signals to the
requested process or group.
EADDRNOTAVAIL
A non-existent interface was requested or the
requested source address was not local.
EAGAIN Operation on a non-blocking socket would block.
ESOCKTNOSUPPORT
The socket is not configured or an unknown socket
type was requested.
EISCONN connect(2) was called on an already connected
socket.
EALREADY
A connection operation on a non-blocking socket
is already in progress.
ECONNABORTED
A connection was closed during an accept(2).
EPIPE The connection was unexpectedly closed or shut
down by the other end.
ENOENT SIOCGSTAMP was called on a socket where no packet
arrived.
EHOSTUNREACH
No routing table entry matches the destination
address.
ENODEV Network device not available or not capable of
sending IP.
ENOPKG A kernel subsystem was not configured.
Other errors may be generated by the underlying protocols;
see tcp(4), raw(4), udp(4) or the generic socket layer.
VERSIONS
IP_PKTINFO, IP_MTU, IP_PMTU_DISCOVER, IP_RECVERR, and
IP_ROUTER_ALERT are new options in Linux 2.2.
struct ip_mreqn is new in Linux 2.2. Linux 2.0 only sup-
ported ip_mreq.
The sysctls were introduced with Linux 2.2.
COMPATIBILITY
For compatibility with Linux 2.0, the obsolete
socket(PF_INET, SOCK_PACKET, protocol) syntax is still
supported to open a packet(4) socket. This is deprecated
and should be replaced by socket(PF_PACKET, SOCK_RAW,
protocol) instead. The main difference is the new
sockaddr_ll address structure for generic link layer
information instead of the old sockaddr_pkt.
BUGS
There are too many inconsistent error values.
The ioctls to configure IP-specific interface options and
ARP tables are not described.
AUTHORS
This man page was written by Andi Kleen.
SEE ALSO
sendmsg(2), recvmsg(2), socket(4), netlink(4), tcp(4),
udp(4), raw(4), ipfw(4)
RFC791, RFC1122, RFC1812
Linux Man Page 24 Dec 1998 1