The problem we have is not how to get stronger crypto in place, it's how
to get more crypto in place.
— Ian Grigg, 28 August 2016.
... and to raise the level of security of the rest of the system so that
attackers are actually forced to target the crypto rather than just strolling
around it.
— Peter Gutmann, in corollary.
The device may be operating under severe power constraints. There are IoT devices that need to run for several years on a single battery pack. If you're lucky, it's a bundle of 18650s. If you're less lucky, it's a CR2032.
Renegotiating protocol state on every wake event is incompatible with low power consumption. The crypto should be able to resume from a pause an arbitrary amount of time later.
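As a rough sketch of what "resumable" means in practice (the structure layout and the HAL calls here are hypothetical, not any particular product's design), everything needed to pick the crypto up again lives in one small block that gets parked in whatever retained storage the part has before powering down:

    #include <stddef.h>
    #include <stdint.h>

    /* Everything needed to resume the crypto where it left off, small
       enough to park in retained RAM, FRAM, or a spare flash page */
    typedef struct {
        uint8_t  sessionKey[ 16 ];      /* AES-128 session key */
        uint8_t  chainValue[ 16 ];      /* Current IV/chaining value */
        uint32_t sendSeqNo;             /* Sequence numbers, so that replay */
        uint32_t recvSeqNo;             /*   protection survives the pause */
        } CRYPTO_SESSION_STATE;

    static CRYPTO_SESSION_STATE sessionState;

    /* Hypothetical HAL calls for whatever retained storage exists */
    void saveToRetainedStorage( const void *data, size_t length );
    void loadFromRetainedStorage( void *data, size_t length );

    /* Just before powering down: park the state */
    void prepareForSleep( void )
        {
        saveToRetainedStorage( &sessionState, sizeof( sessionState ) );
        }

    /* On wake, possibly days later: restore the state and carry on with no
       renegotiation and no extra round trips */
    void resumeAfterWake( void )
        {
        loadFromRetainedStorage( &sessionState, sizeof( sessionState ) );
        }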
Even if the device is constantly powered, many components will be powered only when needed, and will lose state when powered off (they work by warm-starting very quickly rather than saving state across restarts).
Some IoT chips can't cost more than a few cents each. If you're lucky, they're allowed to cost tens of cents. Any fancy crypto hardware will break the budget.
When crypto hardware support is available, it's universally AES, occasionally SHA-1 and/or DES, and very rarely RSA and/or DH and/or ECDSA (there are also oddballs like ones that do SHA-1 but not AES, but they're pretty special cases, and AES in software is very efficient in any case). Any crypto had therefore better be based mostly, or exclusively, around AES. As a convenient side-effect of this, you won't have to worry about which flavour of PKC will be in fashion in ten years' time, or what keysize they're wearing in Paris that year.
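As a sketch of where that leads (the function names are hypothetical, not any particular library's API): bulk encryption comes from an AES chaining mode, integrity from a CMAC-style MAC over the same engine, and key derivation from running that MAC as a PRF, roughly along the lines of NIST SP 800-108:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* The entire primitive set, all built on the one AES engine
       (hypothetical function names) */
    void aesCbcEncrypt( const uint8_t key[ 16 ], uint8_t iv[ 16 ],
                        uint8_t *data, size_t length );  /* Confidentiality */
    void aesCmac( const uint8_t key[ 16 ], const uint8_t *data,
                  size_t length, uint8_t mac[ 16 ] );    /* Integrity, PRF */

    /* Derive a purpose-specific key from a master key by MACing a counter
       and label, a CMAC-as-PRF KDF roughly along the lines of SP 800-108 */
    void deriveKey( const uint8_t masterKey[ 16 ], const char *label,
                    uint8_t derivedKey[ 16 ] )
        {
        uint8_t buffer[ 32 ];
        size_t labelLen = strlen( label );

        if( labelLen > sizeof( buffer ) - 1 )
            labelLen = sizeof( buffer ) - 1;
        buffer[ 0 ] = 0x01;                 /* Iteration counter */
        memcpy( buffer + 1, label, labelLen );
        aesCmac( masterKey, buffer, 1 + labelLen, derivedKey );
        }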
Even if the device includes crypto hardware, the HAL or vendor-supplied firmware may not make it available. In addition the crypto engine will be in a separate IP core that's effectively an external peripheral, and accessing it is so painful that it's quicker to do it in software (you never get AES instructions, you get something that you talk to via PIO). As a result you have to run your crypto in software while the crypto hardware sits idle.
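For a concrete picture of what "hardware AES support" usually looks like from the software side, here's a sketch with entirely made-up register addresses and layout: a memory-mapped peripheral that you feed a word at a time over PIO and then poll, not an AES instruction:

    #include <stdint.h>

    /* Hypothetical memory-mapped AES engine; the addresses and register
       layout are invented for illustration, but the shape is typical */
    #define CRYPTO_KEY      ( ( volatile uint32_t * ) 0x40020000 )
    #define CRYPTO_DATAIN   ( ( volatile uint32_t * ) 0x40020010 )
    #define CRYPTO_DATAOUT  ( ( volatile uint32_t * ) 0x40020020 )
    #define CRYPTO_CTRL     ( ( volatile uint32_t * ) 0x40020030 )
    #define CRYPTO_STATUS   ( ( volatile uint32_t * ) 0x40020034 )
    #define CTRL_START      0x01
    #define STATUS_BUSY     0x01

    /* Encrypt one block by feeding the engine a word at a time over PIO and
       busy-waiting on its status flag.  By the time you've done this, plus
       whatever locking the HAL requires, software AES looks attractive */
    void aesEncryptBlockHW( const uint32_t key[ 4 ], const uint32_t in[ 4 ],
                            uint32_t out[ 4 ] )
        {
        int i;

        for( i = 0; i < 4; i++ )
            CRYPTO_KEY[ i ] = key[ i ];
        for( i = 0; i < 4; i++ )
            CRYPTO_DATAIN[ i ] = in[ i ];
        *CRYPTO_CTRL = CTRL_START;
        while( *CRYPTO_STATUS & STATUS_BUSY );
        for( i = 0; i < 4; i++ )
            out[ i ] = CRYPTO_DATAOUT[ i ];
        }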
Devices have a design lifetime of ten to twenty years, possibly more. There is hardware deployed today that was designed when the people now maintaining it were in kindergarten.
Firmware is never updated, and frequently can never be updated. This is typically because it's not writeable, or there's no room (the code already occupies 120% of available storage, brought down to 100% by replacing the certificate-handling code with a memcpy() for encoding and a seek + read of { n, e } for decoding, see below), leaving 0% available for firmware updates. Alternatively, there's no connectivity to anything to provide updates, either of firmware or anything else (for example in one globally-deployed system the CRL arrives once every 6-12 months via sneakernet, although I'm not sure why they use CRLs since if there's a problem they disable the device centrally, and it's not even clear what certificate would be revoked via this sneakernet CRL). Or the device, once approved and operational, can't ever be changed. As with children, make one mistake here and you have to live with it for the next 15-20 years.
Even if the hardware and/or firmware could be updated, the rest of the infrastructure often can't. Some firmware needs to be built with a guaranteed correspondence between the source code and the binary. This means not only using approved compilers from the late 1990s that cleanly translate the code without using any tricks or fancy optimisations, but also scouring eBay for the appropriate late-1990s hardware because it's not guaranteed that the compiler running on current CPUs will produce the same result.
This is not excessive paranoia; see my talk "Software Security in the Presence of Faults" for why you should never build critical code that includes fault handling with a compiler like gcc, which will rewrite the code to undo the fault handling and in many cases remove it entirely from the generated binary.
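As a small, hypothetical illustration of the kind of code at issue: the second comparison below exists purely as glitch/fault protection, but because it's redundant under the C abstract machine an optimising compiler is entitled to fold the two tests into one, silently dropping the protection from the generated binary:

    #define AUTH_OK         0x5AA5      /* Non-trivial compare value */
    #define ACCESS_DENIED   0
    #define ACCESS_ALLOWED  1

    int checkAccess( int authStatus )
        {
        if( authStatus != AUTH_OK )
            return( ACCESS_DENIED );
        if( authStatus != AUTH_OK )     /* Fault-handling check: may be gone
                                           after optimisation */
            return( ACCESS_DENIED );
        return( ACCESS_ALLOWED );
        }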
Don't bother asking "have you thought about using $shiny_new_thing from $vendor" (or its closely-related overgeneralisation "Moore's Law means that real soon now infinite CPU/memory/crypto will be available to anyone for free"). They're already aware of $shiny_new_thing, $shiny_other_thing, and $shiny_thing_you_havent_even_heard_of_yet, but aren't about to redo their entire hardware design, software toolchain, BSP, system firmware, certification, licensing, and product roadmap for any of them, no matter how shiny they are.
The device may have no or only inadequate entropy sources. Alternatively, if there is an entropy source, it may lose state when it's powered off (see the earlier comment on power management), requiring it to perform a time-consuming entropy collection step before it can be used. Since this can trigger the watchdog (see the comment further down), it'll end up not being used. Any crypto protocol should therefore allow the entropy used in it to be injected by both parties like TLS' client and server random values, because one party may not have any entropy to inject. In addition, it's best to prefer algorithms that aren't dependent on high-quality randomness (ECDSA is a prime example of something that fails catastrophically when there are problems with randomness).
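A sketch of what that looks like (aesCmac() is a hypothetical AES-based MAC used as a PRF): both parties' nonces feed the key derivation, so the result stays fresh as long as at least one side had real entropy to contribute:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical AES-based MAC/PRF */
    void aesCmac( const uint8_t key[ 16 ], const uint8_t *data,
                  size_t length, uint8_t mac[ 16 ] );

    /* Even if one side's "nonce" is a counter or a constant because it has
       no entropy source, the derived key is still fresh provided that the
       other side's nonce is */
    void deriveSessionKey( const uint8_t sharedKey[ 16 ],
                           const uint8_t clientNonce[ 16 ],
                           const uint8_t serverNonce[ 16 ],
                           uint8_t sessionKey[ 16 ] )
        {
        uint8_t nonces[ 32 ];

        memcpy( nonces, clientNonce, 16 );
        memcpy( nonces + 16, serverNonce, 16 );
        aesCmac( sharedKey, nonces, sizeof( nonces ), sessionKey );
        }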
Embedded devices are often accessed through gateways that redirect communications over interfaces like RS-422, RS-485, Fieldbus, Modbus, Arcnet (yes, that Arcnet), and many, many others. If you're lucky, it's via a VPN to an IP-to-whatever gateway. If you're unlucky, it's via something like a Moxa serial-to-IP converter, which puts your RTU directly onto the Internet for easy access by anyone. The one upside of this is that Internet-based side-channel attacks are more or less eliminated due to the protocol conversion involved.
Many SoCs have different portions developed by different vendors, and the only way to communicate between them is via predefined APIs. If you need entropy for your crypto and the entropy source is on a separate piece of IP that doesn't provide an entropy-extraction interface, you either need to spend twelve months negotiating access to the source, and pay handsomely for the privilege, or do without.
(The following is a special case that only applies to very constrained devices: As a variant of the above, there may be no accessible writeable non-volatile memory on your section of the device. Storing a seed for crypto keys may work when you bake it into the firmware, but you can't update it once the firmware is installed because there's no access to writeable non-volatile memory, unless you negotiate it with one of the vendors whose IP has access to it).
Fuses are expensive and per-device provisioning is prohibitively expensive for low-cost IoT chips.
The device often won't have any on-board time source because it's not feasible to include an RTC in the design. An RTC adds considerable cost (possibly as much as the rest of the device), may be larger/heavier than the rest of the device, typically requires one or more extra assembly steps to fit because it can't be installed via pick-and-place and reflow soldering, makes the device more vulnerable to issues like the high and low temperatures that embedded devices are typically exposed to, and wears out (the battery dies) long before the rest of the device does. Don't rely on any security mechanisms that require accurate time data in order to function correctly.
(If you're working with a device that really does need a time source, don't implement yet another protocol like (S)NTP to further add to what's already running on the device. There are time sources everywhere; for example, the universal-substrate HTTP has the time in the "Date:" line of the HTTP header. If you're sending encrypted messages, use the CMS signingTime attribute or its PGP equivalent. If you're doing anything with certificates and actually have the resources available to parse them (see below), you know the time is at least as recent as the most recent validFrom in any certificate you see).
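A minimal sketch of scavenging the time from an HTTP "Date:" line rather than adding yet another protocol; it handles only the RFC 1123 form ("Date: Tue, 15 Nov 1994 08:12:31 GMT"), and a real implementation would want a little more robustness:

    #include <stdio.h>
    #include <string.h>

    /* Pull the date and time out of an HTTP "Date:" line */
    int parseHttpDate( const char *line, int *year, int *month, int *day,
                       int *hour, int *min, int *sec )
        {
        static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
        char monthName[ 4 ];
        const char *monthPtr;

        if( sscanf( line, "Date: %*3s, %d %3s %d %d:%d:%d",
                    day, monthName, year, hour, min, sec ) != 6 )
            return( -1 );
        monthPtr = strstr( months, monthName );
        if( monthPtr == NULL || ( monthPtr - months ) % 3 != 0 )
            return( -1 );
        *month = ( int ) ( monthPtr - months ) / 3 + 1;
        return( 0 );
        }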
As mentioned previously, certificates are handled by memcpy()ing a pre-encoded certificate blob to the output and seeking to the appropriate location in an incoming certificate and extracting { n, e } (note that that's { n, e }, not { p, q, g, y } or { p, a, b, G, n, ... }). If you've ever wondered why you can feed a device an expired, digital-signature-only certificate and use it for encryption, this is why (but see also the point on error handling below). This is precisely the behaviour you get when you take a hardware spec targeted at Cortex M0s, ColdFires, AVRs, and MSP430s, and write a security spec that requires the use of a PKI.
(SCADA and similar embedded systems emphasise availability above everything else; see my talk "Availability and Security: Choose any One" for more on this. Certificates that expire at some random point and take a PLC offline are an infrastructure attack, not a security feature, so expiry dates are ignored (there's no reason why a certificate would be perfectly secure at 23:59:59 and then suddenly completely insecure at 00:00:01). Certificates that get revoked would have the same effect (not to mention that it's unclear why you'd revoke a certificate when access control is handled by disabling the device, not by putting someone's certificate on a CRL and hoping that something somewhere notices), so revocations are ignored. Devices very rarely have FQDNs or even fixed IP addresses, so there's no meaningful static identifier to put in a certificate. As a result, systems ignore the ID information, validity dates, and revocation information for a certificate, which is essentially everything in the certificate except the key, so the behaviour in the previous paragraph is perfectly justified and saves enormous amounts of certificate-parsing code and attack surface. Essentially the certificate processing code boils down to searching the certificate blob for the hex string 30 0D 06 09 2A 86 48 86 F7 0D 01 01 01 05 00, skipping the 03 8x xx 30 8x xx that follows, and reading the two integers with 02 tags that follow that).
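A minimal sketch, with just enough bounds checking to not walk off the end of the buffer, of that entire "certificate parser": find the rsaEncryption AlgorithmIdentifier, step over the BIT STRING and inner SEQUENCE headers, and return pointers to the two INTEGERs that follow:

    #include <stddef.h>
    #include <string.h>

    /* rsaEncryption AlgorithmIdentifier: SEQUENCE { OID 1.2.840.113549.1.1.1,
       NULL } */
    static const unsigned char rsaAlgoID[] = {
        0x30, 0x0D, 0x06, 0x09, 0x2A, 0x86, 0x48, 0x86,
        0xF7, 0x0D, 0x01, 0x01, 0x01, 0x05, 0x00 };

    /* Read a DER length, handling the short form and one/two-byte long forms */
    static long readLength( const unsigned char **p, const unsigned char *end )
        {
        long length;

        if( *p >= end )
            return( -1 );
        length = *( *p )++;
        if( length == 0x81 )
            {
            if( *p >= end )
                return( -1 );
            length = *( *p )++;
            }
        else if( length == 0x82 )
            {
            if( end - *p < 2 )
                return( -1 );
            length = ( ( long ) ( *p )[ 0 ] << 8 ) | ( *p )[ 1 ];
            *p += 2;
            }
        else if( length > 0x7F )
            return( -1 );
        return( length );
        }

    /* Find { n, e } in a DER-encoded certificate; n and e point into the
       certificate buffer itself */
    int findRSAKey( const unsigned char *cert, size_t certLen,
                    const unsigned char **n, long *nLen,
                    const unsigned char **e, long *eLen )
        {
        const unsigned char *p = cert, *end = cert + certLen;

        /* Locate the rsaEncryption AlgorithmIdentifier */
        while( ( size_t ) ( end - p ) >= sizeof( rsaAlgoID ) &&
               memcmp( p, rsaAlgoID, sizeof( rsaAlgoID ) ) != 0 )
            p++;
        if( ( size_t ) ( end - p ) < sizeof( rsaAlgoID ) )
            return( -1 );
        p += sizeof( rsaAlgoID );

        /* Skip the BIT STRING header + unused-bits byte, then the SEQUENCE
           header */
        if( p >= end || *p++ != 0x03 || readLength( &p, end ) < 0 ||
            p >= end || *p++ != 0x00 )
            return( -1 );
        if( p >= end || *p++ != 0x30 || readLength( &p, end ) < 0 )
            return( -1 );

        /* Read the two INTEGERs, n and e */
        if( p >= end || *p++ != 0x02 ||
            ( *nLen = readLength( &p, end ) ) < 0 || *nLen > end - p )
            return( -1 );
        *n = p;
        p += *nLen;
        if( p >= end || *p++ != 0x02 ||
            ( *eLen = readLength( &p, end ) ) < 0 || *eLen > end - p )
            return( -1 );
        *e = p;

        return( 0 );
        }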
The whole device may be implemented as a single event-driven loop, with no processes, threads, or different address spaces. In addition there are hard real-time constraints on processing. You can't go off and do ECC or RSA or DH and halt all other processing while you do so because the system watchdog will hard-reset the CPU if you spend too long on something. While it is possible, with some effort, to write a manually-timesliced modmult implementation, the result is horribly inefficient and a goldmine of timing channels. It's also painful to implement, and a specific implementation is completely tied to a particular CPU architecture and clock speed.
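A minimal sketch (all names hypothetical) of the single event-loop model: every handler has to return quickly enough for the watchdog to be kicked before it expires, which is exactly what rules out stopping the world for a multi-second modexp:

    typedef struct { int type; /* ... */ } EVENT;   /* Hypothetical event type */
    #define EVENT_SENSOR_READING    1
    #define EVENT_COMMS_RX          2

    const EVENT *getNextEvent( void );              /* Hypothetical HAL calls */
    void kickWatchdog( void );
    void handleSensorReading( const EVENT *event );
    void handleMessage( const EVENT *event );

    void mainLoop( void )
        {
        for( ;; )
            {
            const EVENT *event = getNextEvent();    /* Blocks until next event */

            kickWatchdog();         /* Miss this and the CPU hard-resets */
            switch( event->type )
                {
                case EVENT_SENSOR_READING:
                    handleSensorReading( event );
                    break;
                case EVENT_COMMS_RX:
                    handleMessage( event );     /* Any crypto runs in here, on
                                                   the watchdog's schedule */
                    break;
                }
            }
        }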
MSP430s. Apologies to all the embedded devs who have just gone into anaphylactic shock at the mention of that name. There are billions of these ghastly things in active use; if you're not familiar with them, a very recent one (August 2016) is documented here.
Hardware-wise, a Raspberry Pi is a desktop PC, not an embedded device. So is an Alix APU, a BeagleBoard, a SheevaPlug, a CI20, a CubieBoard, a C.H.I.P, and any number of similar things that people like to cite as examples of IoT devices. Although there's no easy definition of this, as a rule of thumb if you can self-host the development toolchain on the device you're working with then it's in the desktop-PC class.
In general terms, errors are divided into two classes, recoverable and nonrecoverable (this really is generalising a lot in order to avoid writing a small essay). Recoverable errors are typically handled by trying to find a way to continue, possibly in slightly degraded form. Non-recoverable errors are typically handled by forcing a hard-fault, which restarts the system in a known-good state. For example one system that uses the event-loop model has, sprinkled throughout the code, "if ( errType == FATAL ) while ( 55 );" (the 55 has no special significance, it's just a way to force an endless loop, which causes the watchdog to reset the system). An expired certificate or incorrect key usage is a soft error for which the appropriate handling action is to continue, so there's no point in even checking for it (see the previous point on the lack of checking for this sort of thing).
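In code, the whole two-class model amounts to something like the following sketch (names hypothetical):

    #define FATAL   1                           /* Hypothetical error class */

    void recordErrorIfPossible( int errType );  /* Hypothetical logging call */

    void handleError( int errType )
        {
        if( errType == FATAL )
            while( 55 );                /* Watchdog will hard-reset us into a
                                           known-good state */

        /* Recoverable: note it if there's anywhere to note it, then carry
           on, possibly in degraded form */
        recordErrorIfPossible( errType );
        }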
(The equivalent in standard PCs - which includes tablets, phones, and other devices - is to dump responsibility on the user, popping up a dialog that they have to click past in order to continue, but at least now it's the user's fault and not the developers'. Embedded systems developers don't have the luxury of doing this but have to explicitly manage these types of error conditions themselves. So when a protocol spec says SHOULD NOT or MUST NOT then for standard PCs it means "throw up a warning/error dialog and blame the user if they continue" and for embedded devices it means "continue if possible". Have you ever seen a security spec of any kind that tells you what step to take next when a problem occurs?).
Development will be done by embedded systems engineers who are good at making things work in tough environments but aren't crypto experts. In addition, portions of the device won't always work as they should. Any crypto used had better be able to take a huge amount of abuse without failing. AES-CBC, even in the worst-case scenario of a constant, all-zero IV, at worst degrades to AES-ECB. AES-GCM (and related modes like AES-CTR), on the other hand, fails catastrophically for both confidentiality and integrity protection. And you don't even want to think about all the ways ECDSA can fail (see, for example, the entropy and timing issues above).
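To make that last point concrete, a minimal illustration (aesEcbEncryptBlock() is a hypothetical single-block AES call) of why counter-style modes fail catastrophically on nonce reuse while CBC merely degrades: with a repeated nonce the keystream repeats, so XORing the two ciphertexts cancels the key entirely and hands an attacker the XOR of the plaintexts:

    #include <stdint.h>

    /* Hypothetical single-block AES encryption */
    void aesEcbEncryptBlock( const uint8_t key[ 16 ], const uint8_t in[ 16 ],
                             uint8_t out[ 16 ] );

    void demoNonceReuse( const uint8_t key[ 16 ], const uint8_t nonceCtr[ 16 ],
                         const uint8_t p1[ 16 ], const uint8_t p2[ 16 ],
                         uint8_t leaked[ 16 ] )
        {
        uint8_t keystream[ 16 ], c1[ 16 ], c2[ 16 ];
        int i;

        aesEcbEncryptBlock( key, nonceCtr, keystream );
        for( i = 0; i < 16; i++ )
            {
            c1[ i ] = p1[ i ] ^ keystream[ i ];     /* First message */
            c2[ i ] = p2[ i ] ^ keystream[ i ];     /* Second, same nonce */
            leaked[ i ] = c1[ i ] ^ c2[ i ];        /* == p1[ i ] ^ p2[ i ],
                                                       no key required */
            }
        }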