Owen Shepherd

Physicist. Software Developer. Many things. (CV)

I develop the Public Domain C Library and Impeller (a client for pump.io) among other things, as well as contributing to projects such as Ogre3d.org.

Replied to a post on werd.io:

@benwerd Nowhere, I expect. They discontinued production and discounted them a few months before N5 release, so I imagine all you'll be able to find are second hand ones.

Replied to a post on werd.io:

DPD in the UK do that. One hour delivery window, live tracking of the van your package is on, etc.

Hashcash, Pingbacks, Webmention, Proof of Work and the #indieweb

Spam, spam, ubiquitous spam. Who likes it? Nobody. Everyone is aware of the problem of E-Mail spam; anybody who runs a blog is aware of the problem of pingback spam.

Pingback is an old technology, based upon XML-RPC (which should date it somewhat!). It was designed in simpler times; today, saying that 100% of pingbacks are spam is accurate to a large number of significant figures.

About a year ago, the indieweb community looked at Pingback and asked the question "Why is this based on XML-RPC?" They had a point; Pingback is a single function with two URL parameters, which hardly needs a complex RPC system. So they developed WebMention, which takes Pingback and distills it to a simple HTTP POST of standard URL-encoded form data.

While WebMention solves the problem of complexity, it doesn't solve the problem of spam (at least not in the long term - at present, its small deployment means it isn't a spam target). However, today I propose a method for combating spam in WebMention: hashcash.

Hashcash?

Hashcash is a mechanism originally proposed for combating E-Mail spam. The idea is relatively simple: the sender must provide a value which, when combined with certain values from the E-Mail, produces a hash whose last $$n$$ bits are zero. It's called a "proof of work" system because that's what it provides: proof that the sender has done work. Specifically, on average it will take $$2^{n}$$ attempts to find a hash in which the last $$n$$ bits are zero.

This is useful: it lets us tune the required work over time. It's asymmetric: it puts all the effort on the sender, rate limiting them by their computational power (verifying that the hash ends with the correct number of zero bits takes only a single hash computation). Every time we increase $$n$$ by one, we double the computational power a spammer needs to send out the same quantity of spam.

Hashcash isn't a perfect protocol, of course; it doesn't (and nothing will) completely eradicate spam. It does, however, provide a mechanism to reduce it, placing less load on more intensive spam filtering systems (such as traditional Bayesian filters).

Hashcash for WebMention

The proposed mechanism of implementing Hashcash for WebMention is as follows. For the sender side:

1. The sender builds the WebMention request body, in URL encoded format.
2. The sender computes the SHA-256 hash of the request body
3. If the hash contains the required number of trailing zero bits, then the mention is ready. Otherwise, a "nonce" value is added to the request and the hash is retried.
4. The sender then submits the request to the destination's WebMention endpoint
5. If the hash meets the recipient's criteria (or the recipient implements WebMention 0.1/0.2, and therefore does not perform hashcash processing), then the recipient should return a success result code. Otherwise, the recipient should return an HTTP 403 error with properties error=bad_hashcash&required_length=N, where N is the number of trailing zero bits required
6. If the recipient requires more zero bits, then the sender may choose to retry with more zero bits (but should verify that the number is reasonable - a required length of 32 is unlikely to complete in reasonable time, and could constitute a DoS attempt)
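The sender loop above can be sketched in a few lines of Python. This is only a sketch: the function names are mine, and the exact body format - beyond the `nonce` parameter named in step 3 - is an assumption.

```python
import hashlib
from urllib.parse import urlencode

def trailing_zero_bits(data: bytes) -> int:
    # Count the trailing zero bits of the SHA-256 digest of `data`.
    h = int.from_bytes(hashlib.sha256(data).digest(), "big")
    if h == 0:
        return 256
    # h & -h isolates the lowest set bit; its position is the zero count.
    return (h & -h).bit_length() - 1

def mint_mention(source: str, target: str, bits: int) -> bytes:
    # Steps 1-3: build the URL-encoded body, then search nonce values
    # until the digest has the required number of trailing zero bits.
    nonce = 0
    while True:
        body = urlencode({"source": source, "target": target,
                          "nonce": nonce}).encode()
        if trailing_zero_bits(body) >= bits:
            return body
        nonce += 1
```

For a toy difficulty of 8 bits this takes around 256 hashes on average; at 20 bits, around a million.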

For the receiver side, the process is much simpler:

1. Take the SHA-256 hash of the request body
2. If the hash does not meet the required number of trailing zero bits, return an HTTP 403 error with properties error=bad_hashcash&required_length=N. The endpoint must, at a minimum, support returning errors in application/x-www-form-urlencoded format.
3. If the hash does meet the required number of trailing zero bits, continue with the rest of the WebMention validation.
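The receiver side is cheap enough to sketch directly; note that verification costs a single hash regardless of the required length. The function name and return convention here are illustrative, not a prescribed API.

```python
import hashlib
from urllib.parse import urlencode

def check_hashcash(body: bytes, required_bits: int):
    # One SHA-256 computation, however large required_bits is.
    digest = int.from_bytes(hashlib.sha256(body).digest(), "big")
    if digest % (1 << required_bits) != 0:
        # Step 2: reject, with the error properties URL-encoded.
        return 403, urlencode({"error": "bad_hashcash",
                               "required_length": required_bits})
    # Step 3: proceed with the rest of the WebMention validation.
    return 202, ""
```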

Considerations

The required hash length should be chosen to be reasonable; a good starting point might be 22 bits. The maximum permitted length (for a sender) should also be reasonable; 28 bits seems like a sensible limit for the time being. 20 bits costs roughly a megahash on average, and recent CPUs are capable of around 20 MHash/s of SHA-256. The present cost of a hash computation can be approximated by looking at the statistics for Bitcoin, which also uses SHA-256 and is therefore comparable.

Of course, implementers may wish to be more lenient as a hashcash-type system is phased in.

@benwerd I noticed that on your idno install you've set the homepage to be your profile page. How did you do that?

Replied to a post on werd.io:

@benwerd It's certainly a very useful desire from the perspective of propagating religion. Which way the association originated is an interesting question.

Replied to a post on twitter.com:

@stevestreeting The lack of hubs and KVMs (and where they exist, the expense) are painful too.

We have met the enemy – and he is us

The conventional assumption in computer security was that our primary adversaries were criminals, miscreants, and the security services of our "political foes". Attacks were expected to be active and to involve the exploitation of vulnerabilities in our systems, because such foes were unlikely to have direct access to them. On the basis of these assumptions and allegations, there were proposals from the likes of the US government to ban the use of communications equipment from the Chinese companies Huawei and ZTE in the US.

In the light of the PRISM and TEMPORA revelations, the hypocrisy was deafening. In the light of the latest revelations, it's hard to find any humor in it at all.

We have met the enemy, and he is us; our governments have invaded our communications infrastructure, have access to the data behind our services, and have installed backdoors in our software.

If we are to preserve privacy on the web, the need for change has never been greater.

I think it can be said that, at an infrastructure level, our immediate priorities should be:

• Deploying TLS v1.2 with Perfect Forward Secrecy
Older versions of TLS have less effective PFS cipher-suites, and often require undesirable tradeoffs. When dealing with TLS 1.0, we are often forced to choose between the weaknesses of RC4 (no longer considered secure as used in TLS/SSL) and potential vulnerability to the BEAST attack. It's safe to say that TLS 1.0, as deployed, is fundamentally broken; while the protocol itself is not completely so, the cipher-suites deployed in the wild all have issues. Deploying a patch to TLS 1.0 would be as difficult as updating to TLS 1.2, an action we should be taking anyway.
• Deploying DNSSEC, DANE and Certificate Pinning
Given the known extensive partnerships of the secret services, I think it is fair to say that the CA model has outlived its usefulness. Browsers ship hundreds of trusted CA certificates; it is sheer naivety to assume that none of them have been compromised. I don't see this as the complete end for CAs; they can continue to provide utility in the form of extended validation services, but their numbers will be reduced and, importantly, it will no longer be necessary to trust every CA in order to establish trust in a domain.
DNSSEC itself poses issues: it has a single root of trust (the IANA), and we must trust our domain registrar as part of the chain of trust to ourselves. However, it vastly reduces the number of moving parts and trusted authorities, and makes validating that trust significantly easier. Attacks against DNSSEC need to be narrowly targeted to be effective; comparing DNSSEC-signed zones across multiple machines provides a simple method of watching for suspicious behavior.
The other result of DNSSEC is that it makes deploying encrypted services easier. Anything we can do to increase the proportion of encrypted traffic on the internet can only be a good thing.
• Development of TLS v2?
The existing TLS specification is a gradual evolution of SSL. It is therefore old, battle-tested and, as a protocol, a well-known quantity. As a side effect, it is also complex; it contains many misfeatures, and our evolving understanding of cryptography points out many parts of TLS which are at the very least suboptimal, and often highly problematic. Mitigating its many design flaws has resulted in huge increases in the complexity of the codebases implementing TLS; large portions now need to run in constant time to avoid timing-based side channel attacks. NSS and OpenSSL are monstrous, often convoluted code bases. Validating them is troublesome; performing constant-time cryptography on modern processors is increasingly difficult, and identifying the ways in which code can become non-constant-time is nigh impossible.
Given all we know today, I think it's time to take a step back from TLS, take a hard look, and fix every known issue, misfeature and design problem. TLS 2.0 need not have more in common with TLS 1.2 than the initial hello packets; it should be built on modern, known-good primitives: PKCS #1 v1.5 padding wholly unsupported, replaced by OAEP; authenticated encryption used where possible, encrypt-then-MAC where not.
The ciphersuite list should be slimmed down; current "good practice" ciphers like RSA and AES, and authenticated encryption modes like AES-GCM, can stay, but state-of-the-art systems like Salsa20 and Curve25519 should also be included and recommended. While it is likely that the NSA has cryptanalytic knowledge we do not, we know of no near-practical attacks on the current good-practice ciphers, and the nature of cryptanalysis suggests their capabilities are unlikely to bring them close to breaking either. Even cryptanalytic attacks considered "groundbreaking" rarely result in practically exploitable flaws.

Instead, we should be changing because the NIST-sanctioned cipher-suites are not designed with us in mind. Constant-time implementations of AES in software are notoriously difficult, and this goes double for modes like AES-GCM. SHA-3 contains many primitives which are efficient to implement in hardware but difficult in software; the NIST elliptic curves use constants which result in inefficient software implementations (never mind the lack of rationale for the choice of said constants).

Cryptographic algorithms designed with software in mind reduce the cost of implementing them, increasing security, while also reducing the number of corner cases and attacks which can result in information leakage on common hardware. They additionally reduce the performance delta against a well-equipped adversary (such as a government security agency).

The prime objective of a TLS revision should be that a TLS2 implementation be relatively concise and easy to validate, without the inscrutable complexity of TLS1. The TLS2 paths in OpenSSL and NSS should not be as convoluted and twisty as those for TLS1, and the preferred algorithms should be those which do not require large tables or similar constructions liable to suffer from side channel attacks.

A good rule of thumb is that the maths is hard to subvert; the code is easy. To that end, we should push towards simpler protocols which are easier to analyze. We should also push towards better defaults from the libraries we use; it is ridiculous that OpenSSL, for example, doesn’t come secure by default.

Looking to the future – Buffers and Textures in Ogre 2.0

Because they have an integrated resource system, the 1.x versions of Ogre have a somewhat opaque buffer system. Anyone who has worked closely with it, doing dynamic uploads for example, will have discovered that objects are created first, with the actual data uploaded into them later.

For its era, this was a perfectly adequate design. When it was conceived, this method largely reflected the way that graphics APIs worked – or at the very least permitted you to work. In OpenGL, for example, you create a buffer object and then upload data into it using the glBufferData command.

Everything changed with Direct3D 10.

The old ways – A look at Direct3D 9

Direct3D9 divided buffer types using two properties – the “usage” which described how the application intended to use the buffer, and the “pool” which described how the memory was to be allocated. For our purposes, only one usage flag matters:

“D3DUSAGE_DYNAMIC – Set to indicate that the vertex buffer requires dynamic memory use. This is useful for drivers because it enables them to decide where to place the buffer. In general, static vertex buffers are placed in video memory and dynamic vertex buffers are placed in AGP memory. Note that there is no separate static use. If you do not specify D3DUSAGE_DYNAMIC, the vertex buffer is made static. D3DUSAGE_DYNAMIC is strictly enforced through the D3DLOCK_DISCARD and D3DLOCK_NOOVERWRITE locking flags. As a result, D3DLOCK_DISCARD and D3DLOCK_NOOVERWRITE are valid only on vertex buffers created with D3DUSAGE_DYNAMIC” – MSDN

D3DUSAGE_DYNAMIC provides the driver with a hint that the application will be modifying the buffer frequently, and omitting it informs it that the application will be updating it infrequently, if at all.

This is a useful abstraction, but we can do better: if the buffer is entirely off limits to the GPU, then the driver can make further optimizations with the aim of increasing performance.

The changes of Direct3D11

Direct3D11 takes the existing "dynamic/static" distinction, renames "static" to "default", and adds two new modes, bringing us to a total of four:

• D3D11_USAGE_IMMUTABLE – Immutable buffers must be initialised with data when they are created, and from then on are read-only for both the CPU and GPU
• D3D11_USAGE_DEFAULT – Essentially Direct3D 9's default mode. This is optimized for infrequent updates
• D3D11_USAGE_DYNAMIC – Essentially Direct3D 9's D3DUSAGE_DYNAMIC. Optimized for frequent updates from the CPU
• D3D11_USAGE_STAGING – A specialized buffer usage that cannot be directly read or written by the GPU. D3D11_USAGE_STAGING is used to create DMA-capable buffers for quickly copying data back from the GPU.

The most important of these is D3D11_USAGE_IMMUTABLE. While Ogre does sometimes copy back data from the GPU, staging buffers are a much smaller performance win than immutable buffers, and have a higher logistical complexity for the engine.

The changes in Ogre 2.0

The most obvious change is the addition of the enumerants HBU_IMMUTABLE and TU_IMMUTABLE for hardware buffers and textures respectively. These signal to Ogre, and therefore to the driver (where possible), that the contents of the buffer will never change.

Deeper changes have been required to the Texture and HardwareBuffer objects.

Textures

The Texture API has changed significantly in Ogre 2.0 as the resource system has been changed. In older versions, the Texture object tracked the “source”, “desired” and “internal” versions of various properties – from width and height to pixel format. As Ogre 2.0 is separating the “resident” representation of the resources from their sources, all but the internal versions of the properties have been removed.

Instead, the loader sets the properties to those it desires for the texture object before calling the new _determineInternalFormat function. This function then uses its knowledge of the render system's capabilities to determine the actual format to be used.

Once the internal format has been determined, the loader may use the _uploadBuffers function to actually fill the texture with data. For all the existing texture types this is optional; the loader may later get the buffer using the getBuffer method and then use the returned buffer object's blit method to do the upload, as was the standard method in previous versions of Ogre. When using an immutable texture, however, the _uploadBuffers method must be used, in order to honour the contract associated with immutable textures.

Other Buffers

Other buffer types are managed separately through the HardwareBufferManager. For each of these types, an additional pointer is being added to their constructor function in order to enable immediate upload.

Ideas in Unexpected Places

In the Python community, whenever discussing web development, there is always the elephant in the room: Zope. The granddaddy of the web frameworks. Often thought of as something to be avoided at all costs, Zope is little discussed.

Which is interesting, considering that one of the more discussed modern Python web frameworks (perhaps not the most discussed; that award probably goes to Django, with second place most likely Flask), Pyramid, can trace its lineage directly back to Zope (and, indeed, the number of zope.* packages it uses is nonzero).

It’s pretty much accepted in many circles at this point that Zope is “a bad word”: a failure; a complex mess with no lessons left to give; in essence, with no redeeming value remaining to make it worthy of consideration.

This is a great shame; Zope has a lot to teach.

Component Architectures

Component oriented development is, in some ways, a “holy grail” of software development. It’s something that Microsoft has quite wholeheartedly embraced, inventing system after system, each attempting to evolve the previous into their “one architecture to rule them all.” Microsoft first stumbled upon the idea unintentionally with Visual Basic’s VBX extensions, which formed the inspiration for COM and OLE. They’ve since attempted to supplant these to some extent with .NET, though these days they seem to be turning away from component based development, at least on the desktop, with the WinRT API.

Component oriented development is also found at the heart of Sun’s ill-fated JavaBeans, and the somewhat less ill-fated Enterprise JavaBeans. It also turns up in somewhat surprising places; Android, perhaps uniquely among mobile platforms, has taken some ideas from component based development systems and run with them (which will be the topic of a future post).

The main lesson of component oriented development, at least in my experience, is that its utility is directly proportional to how core, how fundamental, how integrated it is into your platform. COM is a major integration point on Windows; it is how you integrate into numerous system services and it, likewise, is integrated into them. Perhaps COM’s biggest feature is that it completely understands the Windows networking model.

People who have experience with Zope may see where I’m going here: Zope* is component based. At every level Zope builds upon some fundamentals laid out in two packages: zope.interface and zope.component.

zope.interface isn’t really anything new; zope.component has the real innovation:

Adapters are quite common in real life, yet they’re not something that has ever really been formalised in programming. They’re a simple concept: given an object A, I need a Y. The documentation illustrates this with a simple power socket example, but the capabilities extend far beyond that.

Zope uses adapters everywhere; from view lookup, to various request processing hooks, to aspects of security and “annotation” (the ability to store ‘extra’ bits of metadata attached to objects). They’re pervasive, and one of the main building blocks of the toolkit. They provide a very simple, very powerful extension system.

Adapters come in two** forms: single adapters and multi-adapters. Single adapters are obvious: they take an object and return another which “adapts” it; multi-adapters are really just an extension of that to take multiple objects. View lookup, for example, is done by a multi-adapter from a request and a content/model object.
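As a rough illustration of the concept (a toy sketch, not the actual zope.component API, which is considerably richer), an adapter registry might look like this:

```python
# A toy adapter registry; all names here are illustrative.
class AdapterRegistry:
    def __init__(self):
        self._factories = {}

    def register(self, from_types, to_name, factory):
        # A multi-adapter is just the single-adapter case with more
        # than one input type.
        self._factories[(tuple(from_types), to_name)] = factory

    def adapt(self, objects, to_name):
        # "Given these objects, I need a `to_name`."
        key = (tuple(type(o) for o in objects), to_name)
        return self._factories[key](*objects)

# Usage, mirroring view lookup: a (request, model) pair adapted to a view.
class Request: pass
class Page: pass

class PageView:
    def __init__(self, request, page):
        self.request, self.page = request, page

registry = AdapterRegistry()
registry.register([Request, Page], "view", PageView)
view = registry.adapt([Request(), Page()], "view")
```

The real system dispatches on declared interfaces rather than concrete types, which is what makes it an extension mechanism rather than a lookup table.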

Many would say Zope overuses adapters, and I wouldn’t disagree. In running away from the monolithic design that was Zope 2, they perhaps went too far in the opposite direction, and tracing some code can involve far too much jumping around from extension point to extension point. Pyramid, in many ways, could be seen as “Zope done right”; it will be interesting to see what long-term success it attains.

But the point remains: Adapters are very useful, and a tool every developer should have in their toolbox.

* The Zope toolkit and everything based upon it

** Ignoring subscription adapters for the moment; subscription adapters are mostly (but not entirely) related to Zope’s event delivery system.

Forgotten Corners

Some things are created with the best of intentions.

C’s wide character support, for example. It was introduced in an amendment to the ISO C90 standard, intended to add support for dealing with multilingual text. It added a new character type – “wide characters”, provided by the type wchar_t – intended to hold one unit per character, something the legacy char type simply cannot provide.

Some systems ran with this. For example, when the Windows NT team were designing the Win32 API, they standardized on Unicode everywhere. All the system functions which dealt with strings took and returned Unicode strings. They acknowledged that nobody would port their legacy applications if they didn’t provide any backwards compatibility, so they also provided non-Unicode versions of all these functions, but internally they just called through to the Unicode versions.

Most others, however, took a different path. Unix, for example, moved to the newly introduced UTF-8 encoding, avoiding the need for a new set of APIs, though temporarily making it more difficult to also work with the old legacy encodings. (Sites can still be found today running Unix machines with non-UTF-8 encodings.)

Mac OS Classic introduced Unicode support at the UI layer in order to support multilingual text. Mac OS X extended this with Foundation’s NSString (inherited from NeXTStep) and CFStringRef being “encoding agnostic”; exposing the internal storage only if the application asks for the encoding internally used (and otherwise translating as needed on access). The underlying system, however, inherited the Unix fondness for UTF-8.

Two issues affect people trying to use these features:

• The size of wchar_t differs between common platforms. Win32’s API lock-in means that wchar_t is now forever stuck at 16 bits.
• Conformance issues. Most Unixes migrated to UCS-4 when Unicode 2.0 added the supplementary planes, in order to conform to the C standard’s requirement that any character fit in a single wchar_t. Constrained by that same history, Windows applications will never be able to safely use the standard wide character functions.

What this has meant is that most people needing Unicode support have rolled their own solutions. The older of these went for UTF-16, because at the time Unicode was a 16-bit character set, and stuck with it for compatibility. The newer ones have tended towards UTF-8, on the basis that once you’re dealing with a variable-width encoding anyway, you may as well use the one which is more compact for most texts.
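Both problems are easy to demonstrate. A small Python sketch, using a supplementary-plane character to show why a 16-bit wchar_t cannot conform, and a short ASCII-heavy string to show UTF-8's relative compactness:

```python
# U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual
# Plane, so it cannot fit in a single 16-bit wchar_t.
ch = "\U0001D11E"
assert ord(ch) > 0xFFFF
assert len(ch.encode("utf-16-le")) == 4   # a surrogate pair: two 16-bit units
assert len(ch.encode("utf-32-le")) == 4   # one UCS-4 unit, as Unix wchar_t holds

# For typical, largely-ASCII text, UTF-8 is the more compact of the
# variable-width encodings:
text = "pingback spam"
assert len(text.encode("utf-8")) == 13    # one byte per character here
assert len(text.encode("utf-16-le")) == 26
```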

In spite of the best of intentions, the wide character routines remain largely unused. Outside of Windows, the narrow I/O routines satisfy the vast majority of users. On Windows, the wide I/O routines in both Microsoft’s and MinGW’s C runtime libraries are broken (when working with the terminal, at least) to such an extent that they’re useless anyhow.

Today, if you need to actually do string processing, and it must be done in C (or C++), you’re far better off turning to an external library to do the work.

Context: I’m in the process of implementing said routines.