by Alessandro Rubini
Figure One: The data flow through insane (INterface SAmple for Network Errors), which simulates random packet loss, or intermittent network failure.
In the Linux (or Unix) world, most network interfaces, such as eth0 and ppp0, are associated with a physical device that is in charge of transmitting and receiving data packets. However, some logical network interfaces don't feature any physical packet transmission. The most well-known examples of these "virtual" interfaces are the shaper and eql interfaces. This month, we'll look at how this kind of interface attaches to the kernel and to the packet-transmission mechanism.
From the kernel's point of view, a network interface is a software object that can process outgoing packets, with the actual transmission mechanism hidden inside the interface driver. Even though most interfaces are associated with physical devices (or, for the loopback interface, with a software-only data loop), it is possible to design network-interface drivers that rely on other interfaces to perform actual packet transmission. The idea of a "virtual" interface can be useful for implementing special-purpose processing on data packets without hacking the network subsystem of the kernel. Although some of what a virtual interface can accomplish is more easily implemented as a netfilter module, not everything can be done with netfilters, and the virtual interface is an additional tool for customizing network behavior.
To support this discussion with a real-world example, I wrote an insane (INterface SAmple for Network Errors) driver, available from ftp://ftp.linux.it/pub/People/Rubini/insane.tar.gz. The interface simulates semi-random packet loss or intermittent network failures. (This kind of functionality can be more easily accomplished with netfilters, and is shown here only to exemplify the related API.) The code fragments shown here are part of the insane driver and have been tested with Linux 2.3.42. While the following description is rather terse, the sample code is well commented and tries to fill in some of the gaps left by this quick tour of the topic.
How an Interface Plugs into the Kernel
Like many other kinds of device drivers, a network-interface module connects to the rest of Linux by registering its own data structure within the kernel. The insane driver, for example, registers itself by calling register_netdev(&insane_dev);.
The device structure being registered, insane_dev, is a struct net_device object (Linux 2.3.13 and earlier called it struct device), and it must feature at least two valid fields: the interface name and a pointer to its initialization function:
static struct net_device insane_dev = {
name: "insane",
init: insane_init,
};
The init callback is meant for internal use by the driver: It usually fills other fields of the data structure with pointers to device methods, the functions that perform the real work during the interface's lifetime. When an interface driver is linked into the kernel (instead of being loaded as a module), the first task of the init function is to check whether the interface hardware is there.
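The division of labor between registration and init can be modeled in user space. The sketch below is only an illustration under simplifying assumptions: mock_net_device, mock_register_netdev() and mock_init() are hypothetical stand-ins, not the kernel's struct net_device or register_netdev(), and the static initializer uses C99 designated-initializer syntax (.name =) where the 2.3-era kernel code uses the older GNU name: form.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-in for struct net_device. */
struct mock_net_device {
    char name[16];
    int (*init)(struct mock_net_device *dev);
    int (*open)(struct mock_net_device *dev);   /* filled in by init */
    int (*stop)(struct mock_net_device *dev);   /* filled in by init */
    int registered;
};

static int mock_open(struct mock_net_device *dev) { (void)dev; return 0; }
static int mock_stop(struct mock_net_device *dev) { (void)dev; return 0; }

/* The driver's init callback: its job is to fill in the method pointers. */
static int mock_init(struct mock_net_device *dev)
{
    dev->open = mock_open;
    dev->stop = mock_stop;
    return 0;
}

/* Hypothetical registration: call init, and mark the device
 * registered only if init succeeded. */
static int mock_register_netdev(struct mock_net_device *dev)
{
    if (dev->init) {
        int err = dev->init(dev);
        if (err)
            return err;
    }
    dev->registered = 1;
    return 0;
}

/* Only the name and the init pointer are set statically,
 * as in the insane_dev declaration. */
static struct mock_net_device mock_dev = {
    .name = "insane",
    .init = mock_init,
};
```

After mock_register_netdev(&mock_dev) returns 0, mock_dev.open and mock_dev.stop point at the driver's methods, mirroring how the real init callback leaves the structure ready for use.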
The interface can be removed by calling unregister_netdev(), usually from cleanup_module() (or never, if the driver is not modularized). In addition to all the standardized fields, the net_device structure includes a "private" pointer (a void *) reserved for the driver's own use. Where virtual interfaces are concerned, the private field is the best place to store configuration information; Listing One shows how the insane sample interface follows the good practice of allocating its own priv structure at initialization time.
Listing One: insane Allocates Its Own priv Structure at Initialization
/* priv is used to host the statistics, and packet dropping policy */
dev->priv = kmalloc(sizeof(struct insane_private), GFP_USER);
if (!dev->priv) return -ENOMEM;
memset(dev->priv, 0, sizeof(struct insane_private));
The allocation is released at interface shutdown (i.e., when the module is removed from the kernel).
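The private-area pattern from Listing One can be tried out in user space. In this sketch calloc() stands in for kmalloc() followed by memset(), free() stands in for the release at shutdown, and the sketch_* types are hypothetical simplifications of struct net_device and struct insane_private.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical counterpart of struct net_device_stats: just two counters. */
struct sketch_stats {
    unsigned long tx_packets;
    unsigned long tx_bytes;
};

/* Hypothetical private area, playing the role of struct insane_private. */
struct sketch_private {
    struct sketch_stats priv_stats;
    int mode;
};

struct sketch_device {
    void *priv;              /* opaque to everyone but the driver */
};

/* calloc() allocates and zeroes, like kmalloc() + memset(..., 0, ...). */
static int sketch_alloc_priv(struct sketch_device *dev)
{
    dev->priv = calloc(1, sizeof(struct sketch_private));
    if (!dev->priv)
        return -ENOMEM;
    return 0;
}

/* Counterpart of the release done when the module is removed. */
static void sketch_free_priv(struct sketch_device *dev)
{
    free(dev->priv);
    dev->priv = NULL;
}

/* Driver-internal accessor: cast the void * back to the real type. */
static struct sketch_private *sketch_priv(struct sketch_device *dev)
{
    return (struct sketch_private *)dev->priv;
}
```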
Device Methods
A network-interface object, like most kernel objects, exports a list of methods so the rest of the kernel can use it. These methods are function pointers located in fields of the object data structure, here struct net_device.
An interface can be perfectly functional while exporting just a subset of all the methods; the recommended minimum subset includes open, stop (i.e., close), do_ioctl, and get_stats. These methods are directly related to system calls invoked by a user program (such as ifconfig). With the exception of ioctl, which needs some detailed discussion, their implementation is pretty trivial: each turns out to be just a few lines of code (see Listing Two).
Listing Two: Exporting Methods
int
insane_open(struct net_device *dev)
{
dev->start = 1;
MOD_INC_USE_COUNT;
return 0;
}
int insane_close(struct net_device *dev)
{
dev->start = 0;
MOD_DEC_USE_COUNT;
return 0;
}
struct net_device_stats *insane_get_stats(struct net_device *dev)
{
return &((struct insane_private *)dev->priv)->priv_stats;
}
The open method is called when you run ifconfig insane up, and close is called by ifconfig insane down; get_stats returns a pointer to the local statistics structure and is used by ifconfig as well as by the /proc filesystem. The driver is responsible for filling in the statistics, whose fields are defined in <linux/netdevice.h>, although it may choose not to.
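The use-count calls in Listing Two matter because a module must not be unloaded while its interface is up. A hypothetical user-space model of that bookkeeping might look like this; module_use_count stands in for MOD_INC_USE_COUNT and MOD_DEC_USE_COUNT, and the mock_* types are simplifications, not kernel structures.

```c
/* Hypothetical simplifications of net_device_stats, the private area,
 * and struct net_device. */
struct mock_stats { unsigned long tx_packets; };

struct mock_priv { struct mock_stats priv_stats; };

struct mock_if {
    int start;                /* mirrors dev->start in the 2.3 kernels */
    struct mock_priv *priv;
};

static int module_use_count;  /* stands in for the module use count */

static int mock_if_open(struct mock_if *dev)    /* "ifconfig insane up" */
{
    dev->start = 1;
    module_use_count++;
    return 0;
}

static int mock_if_close(struct mock_if *dev)   /* "ifconfig insane down" */
{
    dev->start = 0;
    module_use_count--;
    return 0;
}

/* Like insane_get_stats(): return a pointer into the private area,
 * so callers always see the live counters. */
static struct mock_stats *mock_if_get_stats(struct mock_if *dev)
{
    return &dev->priv->priv_stats;
}

/* The module may go away only when no interface holds it. */
static int module_can_unload(void)
{
    return module_use_count == 0;
}
```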
Other methods are related to the low-level details of packet transmission, but they fall outside of the scope of this discussion (although they are implemented in the source package). The only interesting low-level method is hard_start_xmit, which I discuss later.
ioctl
The do_ioctl method is the most important entry point for virtual interfaces. When a user program configures the behavior of the interface, it invokes the ioctl() system call. This is how shapecfg defines network shaping and how eql_enslave attaches real interfaces to the load-balancing interface eql. Similarly, the insanely application configures the insane behavior on the insane virtual interface. Unlike "normal" device drivers, such as char and block drivers, network interfaces have a tightly defined ioctl implementation. The invoking file descriptor must be a socket, the only available commands are SIOCDEVPRIVATE through SIOCDEVPRIVATE+15, and the infamous "third argument" of the system call is always a struct ifreq * pointer instead of the generic void * pointer. This restriction exists because socket ioctl commands span several logical layers and several protocols.
The predefined values are reserved for a device's private use and are unique throughout the protocol stack. Note that no other ioctl command will be delivered to the network-interface method, so you really cannot choose your own values. Passing a predefined data structure to ioctl doesn't necessarily limit the flexibility of interface configuration, however, since the ifreq structure includes the data field, a caddr_t value that can point to arbitrary configuration information.
Based on the information above, the insane interface can be controlled using these commands (defined in insane.h):
#define SIOCINSANESETINFO SIOCDEVPRIVATE
#define SIOCINSANEGETINFO (SIOCDEVPRIVATE+1)
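The private command range can be checked from user space. This sketch assumes a Linux C library whose <sys/ioctl.h> provides SIOCDEVPRIVATE (glibc does); the fallback value 0x89F0 matches the one in the kernel's <linux/sockios.h>. is_private_ioctl() is a hypothetical helper, and the parentheses around SIOCDEVPRIVATE + 1 keep the macro safe inside larger expressions.

```c
#include <sys/ioctl.h>

/* Fallback only; normally the C library headers define it. */
#ifndef SIOCDEVPRIVATE
#define SIOCDEVPRIVATE 0x89F0
#endif

#define SIOCINSANESETINFO SIOCDEVPRIVATE
#define SIOCINSANEGETINFO (SIOCDEVPRIVATE + 1)

/* Only the 16 values SIOCDEVPRIVATE..SIOCDEVPRIVATE+15 are delivered
 * to a network interface's do_ioctl method. */
static int is_private_ioctl(unsigned long cmd)
{
    return cmd >= (unsigned long)SIOCDEVPRIVATE &&
           cmd <= (unsigned long)SIOCDEVPRIVATE + 15;
}
```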
The actual use of the command within the user-space program insanely turns out to be pretty simple:
int sock = socket(AF_INET, SOCK_RAW,
IPPROTO_RAW);
/* a struct for passing data */
struct insane_userinfo info;
struct ifreq req;
strcpy(req.ifr_name, "insane");
req.ifr_data = (caddr_t)&info;
/* fill info structure... */
if ( ioctl(sock, SIOCINSANESETINFO,
&req) < 0 ) {
/* deal with error */
}
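A slightly more defensive way to build the request is to bound the copy into ifr_name, which holds at most IFNAMSIZ bytes. This is a sketch: the article does not show the fields of struct insane_userinfo, so the ones below are placeholders, and insane_prepare_request() is a hypothetical helper.

```c
#define _DEFAULT_SOURCE
#include <net/if.h>
#include <string.h>

/* Hypothetical layout: placeholder fields, not the real insane.h. */
struct insane_userinfo {
    char name[IFNAMSIZ];   /* slave interface, e.g. "eth0" */
    int mode;
    int arg1, arg2;
};

/* Fill a struct ifreq for a private ioctl. Bounding the copy with
 * IFNAMSIZ - 1 and terminating explicitly avoids the overrun that a
 * plain strcpy() would cause with an over-long name. */
static void insane_prepare_request(struct ifreq *req, const char *ifname,
                                   struct insane_userinfo *info)
{
    memset(req, 0, sizeof(*req));
    strncpy(req->ifr_name, ifname, IFNAMSIZ - 1);
    req->ifr_name[IFNAMSIZ - 1] = '\0';
    req->ifr_data = (char *)info;   /* the kernel side copies from here */
}
```

After insane_prepare_request(&req, "insane", &info), the ioctl(sock, SIOCINSANESETINFO, &req) call proceeds exactly as in the fragment above.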
The kernel-space counterpart of the configuration process is slightly more complex, but only because it must deal with permission checks and copying data.
struct insane_userinfo info;
struct insane_userinfo *uptr;
/* check if authorized to set info */
if (cmd == SIOCINSANESETINFO &&
!capable(CAP_NET_ADMIN))
return -EPERM;
/* get data from user space */
uptr = (struct insane_userinfo *)
ifr->ifr_data;
/* copy_from_user() returns the number of bytes
   not copied, not an error code */
if (copy_from_user(&info, uptr,
sizeof(info)))
return -EFAULT;
/* ... use the info ... */
return 0;
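A detail worth stressing in this path is that copy_from_user() returns the number of bytes it could not copy, not an error code, so the driver must turn a nonzero result into -EFAULT itself. The following user-space model shows the resulting control flow. mock_copy_from_user(), sketch_set_info() and struct sketch_info are hypothetical; a NULL pointer stands in for an unreadable user address, and is_admin stands in for capable(CAP_NET_ADMIN).

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Imitates copy_from_user()'s contract: returns the number of bytes
 * NOT copied (0 on success). NULL models an unreadable user pointer. */
static size_t mock_copy_from_user(void *to, const void *from, size_t n)
{
    if (from == NULL)
        return n;            /* nothing copied */
    memcpy(to, from, n);
    return 0;
}

struct sketch_info { int mode; int percent; };   /* hypothetical */

static int sketch_set_info(const void *user_ptr, int is_admin,
                           struct sketch_info *out)
{
    struct sketch_info info;

    if (!is_admin)
        return -EPERM;
    /* Translate "bytes left over" into a proper error code. */
    if (mock_copy_from_user(&info, user_ptr, sizeof(info)))
        return -EFAULT;
    *out = info;             /* "... use the info ..." */
    return 0;
}
```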
Packet Transmission
The most important entry point for a network-interface driver is hard_start_xmit, where hard is short for hardware. This device method gets called whenever a network packet gets routed through the interface.
Unlike the methods described above (and like the ones not discussed here), this one is not directly related to any system call or application; rather, the network subsystem of the Linux kernel uses it according to its own policies.
Where virtual interfaces are concerned, no actual hardware transmission takes place in the interface itself. The interface will instead resort to another network interface to perform transmission. Packet passing is implemented in two steps. First (usually at configuration time, within ioctl), the interface must connect to another interface, the one that can transmit packets. Then its own hard_start_xmit method must hand each packet over to that interface.
/* look for the hardware interface */
slave = __dev_get_by_name(info.name);
if (!slave) return -ENODEV;
priv->priv_device = slave;
/* .... later, in hard_start_xmit: */
/* update your statistic counters */
priv->priv_stats.tx_packets++;
priv->priv_stats.tx_bytes += skb->len;
/* give the packet to the hw interface */
skb->dev = priv->priv_device;
/* tell Linux to enqueue it */
dev_queue_xmit (skb);
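The two steps (bind at configuration time, then retarget and requeue at transmit time) can be modeled outside the kernel. In this sketch every type is a hypothetical simplification of struct net_device and struct sk_buff, and fwd_queue_xmit() merely counts packets where dev_queue_xmit() would enqueue them on the slave.

```c
#include <stddef.h>

/* Hypothetical simplifications of struct net_device and struct sk_buff. */
struct fwd_dev {
    const char *name;
    unsigned long queued;          /* packets handed to this device */
};

struct fwd_skb {
    unsigned int len;
    struct fwd_dev *dev;           /* device the packet will leave from */
};

struct fwd_priv {
    struct fwd_dev *priv_device;   /* slave chosen at configuration time */
    unsigned long tx_packets, tx_bytes;
};

/* Step 1 (ioctl time): remember the slave. */
static void fwd_bind(struct fwd_priv *priv, struct fwd_dev *slave)
{
    priv->priv_device = slave;
}

/* Stand-in for dev_queue_xmit(): "enqueue" on skb->dev. */
static int fwd_queue_xmit(struct fwd_skb *skb)
{
    skb->dev->queued++;
    return 0;
}

/* Step 2 (hard_start_xmit): count, retarget, requeue. */
static int fwd_start_xmit(struct fwd_priv *priv, struct fwd_skb *skb)
{
    priv->tx_packets++;
    priv->tx_bytes += skb->len;
    skb->dev = priv->priv_device;
    return fwd_queue_xmit(skb);
}
```

Note that the packet is counted in the virtual interface's statistics and then queued again on the slave, which is why it shows up on both interfaces.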
In a perfect world, the virtual interface should also register a notifier callback, so Linux will tell the driver when the physical hardware interface goes away. If the slave interface is a module, its removal will make insane unhappy. The insane implementation doesn't register any callback; making it saner is left as an exercise for the reader.
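One way to approach that exercise, modeled here in user space only: in the kernel, the driver would register a callback with register_netdevice_notifier() and react to the NETDEV_UNREGISTER event. The sketch below uses hypothetical names (sketch_netdev_event(), SKETCH_DEV_UNREGISTER, struct sk_priv) and simply forgets the slave when it disappears, so the transmit path drops packets instead of following a stale pointer.

```c
#include <stddef.h>

/* Hypothetical event codes; the kernel's real one is NETDEV_UNREGISTER. */
enum sketch_event { SKETCH_DEV_UP, SKETCH_DEV_UNREGISTER };

struct sk_dev { const char *name; };

struct sk_priv {
    struct sk_dev *priv_device;    /* slave, or NULL once it is gone */
    unsigned long tx_dropped;
};

/* Notifier-style callback: forget the slave if it is going away;
 * events about other devices are ignored. */
static void sketch_netdev_event(struct sk_priv *priv,
                                enum sketch_event event, struct sk_dev *dev)
{
    if (event == SKETCH_DEV_UNREGISTER && dev == priv->priv_device)
        priv->priv_device = NULL;
}

/* Transmit path: returns 1 if the packet would be relayed,
 * 0 if it is dropped because no slave is attached. */
static int sketch_xmit(struct sk_priv *priv)
{
    if (priv->priv_device == NULL) {
        priv->tx_dropped++;
        return 0;
    }
    return 1;
}
```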
Packet Reception
When network packets hit an interface board, they generate an interrupt so that the operating system can handle packet arrival. (The only exception is the loopback interface, whose reception mechanism is part of packet transmission.)
A virtual interface, however, has no way to receive interrupts, and thus it cannot receive any network packets. This may seem unfortunate, because it would be nice to attach the same software operations to both directions of data flow.
But this is just not possible, and whoever needs to intercept incoming packets must hook into the packets' path in some other way.
Using Insane
All of this talking is rather pointless unless we can see the virtual interface at work.
The insane interface relies on an Ethernet interface for physical transmission, and it can be configured to operate in one of three insane modes. It can relay every packet (pass mode), relay only a given percentage of packets (percent mode, with an integer parameter), or periodically turn relaying on and off (time mode, with two parameters, on-time and off-time, specified as jiffy counts: architecture-dependent time quanta that correspond to 10 ms each on the PC platform). See Listing Three for an example.
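The three modes boil down to a per-packet relay decision. The sketch below is a hypothetical reconstruction, not insane's actual code: the names are invented, the random value and the current jiffy count are passed in so the logic can be tested, and time mode is assumed to start each period with its on-phase.

```c
/* Hypothetical names; the article does not show insane's internals. */
enum insane_mode { INSANE_PASS, INSANE_PERCENT, INSANE_TIME };

struct insane_policy {
    enum insane_mode mode;
    unsigned int arg1;  /* percent mode: percentage relayed; time mode: on-time */
    unsigned int arg2;  /* time mode: off-time, in jiffies */
};

/* Return nonzero if the packet should be relayed. `now` is a jiffy
 * count and `rnd` a pseudo-random value supplied by the caller. */
static int insane_should_relay(const struct insane_policy *p,
                               unsigned long now, unsigned int rnd)
{
    switch (p->mode) {
    case INSANE_PASS:
        return 1;
    case INSANE_PERCENT:
        return (rnd % 100) < p->arg1;
    case INSANE_TIME: {
        unsigned long period = (unsigned long)p->arg1 + p->arg2;
        if (period == 0)
            return 1;                    /* degenerate config: relay */
        return (now % period) < p->arg1; /* on-phase comes first */
    }
    }
    return 1;
}
```

With 10 ms jiffies, a policy of { INSANE_TIME, 50, 100 } relays for 0.5 s and then drops for 1 s, matching the time example in Listing Three.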
Listing Three: The Three Modes of insane-ity
# insanely eth0 pass ; # relay everything to eth0
# insanely eth0 percent 80 ; # drop 20% (pseudo random)
# insanely eth0 time 50 100 ; # relay for .5 seconds, drop for 1s
In order to connect insane to the network, you need to assign a local IP address to the interface (that IP address will be used as the source address, and remote hosts will use it to send their replies) and route some packets through it.
Current versions of Linux automatically associate a network route with each device, and this routing cannot be removed. Therefore, you can't reroute the entire LAN through insane at once. Listing Four reroutes a single host, called morgana, in the routing table of the host borea.
Listing Four: Rerouting Morgana
borea# insmod insane ; # load module
borea# ifconfig insane borea ; # give same IP as eth0
borea# route add morgana dev insane ; # re-route this host
borea# ./insanely eth0 percent 60 ; # set dropping rate
Unfortunately, because of a glitch in Linux v2.3.41, you'll also need to disable reverse-path filtering (rp_filter) on the Ethernet interface used by insane. The following command worked for me:
echo 0 > /proc/sys/net/ipv4/conf/eth0/rp_filter
With this setup, you can connect to morgana with any protocol you like and experience a 40 percent packet loss. This loss is only on transmitted packets, unless morgana runs another instance of insane with a similar configuration.
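The 40 percent figure follows directly from the 60 percent relay rate. The sketch below checks it with a deterministic stand-in for the random source (a real driver would use a pseudo-random generator) and also computes, assuming drops are independent, how often a request and its reply both survive when morgana runs a similar insane instance: 0.6 * 0.6 = 0.36. Both function names are hypothetical.

```c
/* Count how many of n packets a percent-mode relay passes. The value
 * (i * 37) % 100 visits every residue 0..99 exactly once per 100
 * packets because gcd(37, 100) = 1, so it acts as a perfectly uniform,
 * deterministic stand-in for a random number in 0..99. */
static unsigned long relayed_count(unsigned long n, unsigned int percent)
{
    unsigned long i, relayed = 0;

    for (i = 0; i < n; i++)
        if ((i * 37) % 100 < percent)
            relayed++;
    return relayed;
}

/* Probability that a request and its reply both get through when the
 * outgoing and returning directions independently relay the fractions
 * p_out and p_back of their packets. */
static double exchange_success(double p_out, double p_back)
{
    return p_out * p_back;
}
```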
An interesting effect of this transmission path through two interfaces is that you can run tcpdump on both eth0 and insane and see different results. While tcpdump -i eth0 shows the packets actually transmitted, tcpdump -i insane displays every packet sent out by the protocol layers before any dropping is applied. Figure One explains this behavior by showing the path taken by a packet being transmitted through insane.
Hopefully, with this brief discussion and my insane example, you now have a basic grasp of virtual network interfaces.
--------------------------------------------------------------------------------
Alessandro Rubini is an independent consultant based in Italy. He writes uninteresting device drivers and uninteresting applications like GNU barcode, which gained him his preferred e-mail address: rubini@gnu.org. Alan Cox is on sabbatical this month.