correlation-zone

The AI post

What is AI (for me)

Any technique where mechanised statistical inference is performed based on training data. Expert systems are AI too, but they're not what I'm talking about here and now.

Further, generative AI (systems that produce output of the same form as their training set) is distinct from transformational AI (classifiers, translators, etc.).

I’m talking about generative AI.

Is AI good

“AI” is vague. Copilot, as of the 10th of June 2025, is virtually useless unless the completion can be assembled by copy-pasting from the surrounding text. It could not generate remotely valid Verilog or Bluespec.

I’ve not tried agentic AI coding models.

Qwen 3 235B is an OK search model that's good (within Kagi) at finding “follow-on” searches, giving me the results I would otherwise have reached after several rounds of refining. 4o mini is not good for this, as the results it produces don't reflect that kind of refinement.

Photoshop’s generative fill is slightly better, but much, much slower than the older content-aware fill, and I don’t use it for that reason.

Mistral Medium was able to parrot the Stanford Encyclopedia of Philosophy fairly well, but was no better than reading the source (and prone to waffle).

I find it hard to trust AI output, as I value the social relation of someone writing something and building an argument and narrative for themselves for me to understand. The AI model lacks this, as I do not care about a hard drive’s interiority, and so I lack commitment to the output.

Is AI bad

Inference for personal use is basically free. Inference for the booster case, where billions of AI agents bounce off each other in an endless fountain of buggy OAuth implementations, false CVE reports on those implementations, misjudged reviews, and fabricated newspaper articles, can and most likely will consume every marginal gigawatt of spare energy generating capacity.

This power can and should be used for building durable resources that humans need – nuclear, solar and wind power, rail transport, housing, food and water supply chains – all of which are sensitive to the cost of power and to the allocation of human creativity.

This is not to say inference should be banned, but that it should not be permitted to compete with the resources humans need to survive.

Power for training appears to have peaked. It is not possible to produce meaningfully more GPUs; there is not enough training data to support more training runtime; inference-time scaling was a mirage that fails to meaningfully enhance a model’s ability to construct coherent reasoning.

It’s rude as hell that AI providers want to mediate interpersonal relationships. From Facebook’s AI friends, to OpenAI’s redrafting of my blog with the most minimal of attributions under a “read more”, to the apparent collapse of the prevailing test-based educational system, and even the deluge of slop graphics used in commercial slides (thankfully well on its way to being shunned), generative AI is intended to be deployed as a filter to further atomise people under the eyes of the administrative state.

Atomise or vaporise: witness Israel’s use of automated data processing as part of its justification-production pipeline for the deaths of 55 thousand Gazans (and counting). It does appear that being able to relate every council sanitation worker’s connection to the apparatus of the Gaza state has helped Israel maintain and freely choose the rate and targeting of airstrikes, a factor that previously led to reductions in bombing intensity.

Are LLMs worth it

I think the world would have been better without generative AI, but I don’t see a case to ban the use or training of models. I do see a case to regulate for open social systems, and for a requirement that administrative actions be either personally justified or applied through processes defined by human-produced text. Equally, I demand a means by which you can opt out of relating information to an LLM company. As many now operate evasive scrapers, tools like Anubis become a tax on humans, paid for LLM search engines’ need to “leech off the face of humanity”.

This rant was produced without the aid of LLMs (but with spellcheck).

Dear Pippa Heylings,

I am a constituent, a woman, and of a trans background. Last week's Supreme Court ruling has greatly upset me.

I fear that my access to public spaces will be curtailed through “bathroom bills” introduced undemocratically by the equalities commission. The irony of an equalities commission (re)introducing the urinary leash for women in the UK is not lost on me, but nevertheless I am deeply fearful.

I fear the call for third spaces will both be deeply insufficient and publicly out me as a trans woman. I am a binary woman and demand to be treated as such.

I fear that my employer will be forced to misgender me and “out” me as a trans woman in public settings. I am a senior engineer at Nu Quantum, a Cambridge-based startup deeply involved in the UK quantum computing ecosystem. I have a public presence through this role. For example, I will be representing Nu Quantum on an international conference panel this Thursday, at QCTiP Berlin. I fear that through summary statistics recording me as “male”, and other means, I will be forced to reveal my trans identity.

I fear that I will either be “out”-ed or simply forced to withdraw from sporting events. RideLondon, a UCI-sanctioned event, currently requires me to register as a man – despite the very limited evidence of a performance gap. As I am not a man, I cannot attend. I fear this practice will become the default for all organising bodies.

And I fear that in the event of a serious illness, I will be forced to pick between access to healthcare and respect for my lived identity. I deeply fear the end of my life playing out at home, refraining from seeking care because of the discriminatory and “out”-ing requirements proposed in the national papers recently.

Please, in whatever way you find possible, can you defend all of our rights to respect in our lived genders?

Thank you for your time, Coral Westoby

Final rings

Until recently, Shimano kept a strict distinction between its road and MTB groupsets. For commuting or loaded riding with drop bars, the group offered on the (consumer grade!) RC500 is Sora, with a compact road double (50/34).

Shimano offers a Tiagra-series 48t outer ring in the 4-bolt 110mm asymmetric BCD. Additionally, Spécialités TA or AliExpress sellers offer a 33t inner ring. 48/33t is a nice step down when paired with a 34t cassette.

This combination does not offer enough clearance between the rings: in the small ring, the chain catches on the guides the outer ring uses to accelerate shifting. I added 0.5mm washers, 10mm ID – 16mm OD, on the barrel nuts of the outer ring.

Spacers

This spaces the rings out enough for a 10-speed chain to clear; I believe a 9-speed chain would still interfere. Shifting is fast and easy, and the extra low range in the big ring is really useful.
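For the curious, the ratio comparison behind that claim (assuming an 11t smallest sprocket, which is a guess at the stock cassette):

# Lowest and highest gear ratios for each chainring pairing,
# against a 34t largest sprocket and an assumed 11t smallest.
for big, small in [(50, 34), (48, 33)]:
    print(f"{big}/{small}t: low {small / 34:.2f}, high {big / 11:.2f}")
# 50/34t: low 1.00, high 4.55
# 48/33t: low 0.97, high 4.36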

Chain gap

In honour of a year since doing the work, here's a post about the solar range extender I designed for the EMFCamp 2024 Tildagon badge.

This was a collaborative effort between me and my wife. She drew the non-binary star of the show, the Sun. They appear as the main graphical element of the design, honouring that on which the project's success depends.

The PCB was designed in KiCad. I aimed for a “range extender” rather than a solar charger, as the input to the battery charging circuit is not exposed on the Hexpansion connectors.

PCB testing

PCB design

I used a buck converter (the AP63203WU) to generate 3.3V from the solar cell output. The solar cell is the Voltaic Systems P122, with a rated open-circuit voltage of 6V in full sun. As the buck converter is rated to operate from 3.8V, I expected the system to work under normal outdoor conditions. It didn't: it would only operate in full sun. Fortunately, EMFCamp 2024 was blessed with incredible weather.

The buck converter drives the 3.3V rail directly. To check whether this worked, I used a “gas gauge” IC (the Analog Devices LTC2941). This integrates the voltage developed across a low-value sense resistor placed in the power rail; the integral of this voltage is proportional to the total charge that has passed through the rail, with separate accounting for each direction.

I used a 25mΩ sense resistor, which was far too small in practice. In full sun, the LTC2941 would accumulate a count every 5 seconds or so. In light shade, I would see a count every 60 seconds, and inside no counts were seen at all. A larger sense resistor would have hurt efficiency but given a better sense of where and when the panel was useful, which would have been worth it.
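For scale, the back-of-envelope I should have done up front (the charge-per-count formula is my recollection of the LTC2941 datasheet, assuming its default prescaler – verify against the datasheet before trusting the numbers):

# q_LSB = 0.085 mAh * (50 mOhm / R_sense) * (M / 128), M the prescaler.
def implied_current_mA(r_sense_mohm, seconds_per_count, M=128):
    q_lsb_mAh = 0.085 * (50 / r_sense_mohm) * (M / 128)
    return q_lsb_mAh * 3600 / seconds_per_count

print(implied_current_mA(25, 5))   # full sun: ~120 mA
print(implied_current_mA(25, 60))  # light shade: ~10 mA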

It was a lot of fun to solder the big sense resistor by hand – big pad pours in portable electronics are rare, for me, and the way solder flows along the edge is delightful.

The Hexpansion also included a depiction of the Sun, with orange LEDs highlighting their hair. The LEDs were driven by an I2C expander. Whilst they could have been driven by the Hexpansion pins directly, I knew the software to read the charge counter would need to be developed without a Tildagon baseboard. By controlling everything via I2C and adding a Qwiic port, the hardware could be verified from a laptop. I used an MCP2221A dongle from Adafruit, and this worked wonderfully!
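The laptop-side check looked something like this (a sketch assuming a Linux i2c-dev interface via smbus2; the MCP2221A can also be driven through Adafruit's Blinka. The address and register values are from my reading of the LTC2941 datasheet, so double-check them):

from smbus2 import SMBus

LTC2941_ADDR = 0x64            # 7-bit I2C address
ACR_MSB, ACR_LSB = 0x02, 0x03  # accumulated charge registers

with SMBus(1) as bus:
    msb = bus.read_byte_data(LTC2941_ADDR, ACR_MSB)
    lsb = bus.read_byte_data(LTC2941_ADDR, ACR_LSB)
    print("accumulated charge count:", (msb << 8) | lsb)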

At EMF Camp

In use

Writing the UI software had to be done at the event, and was delayed by my choice of I2C EEPROM. I chose a very small chip, just large enough to store a device ID. The Tildagon firmware expects to read a LittleFS filesystem on the EEPROM, but the chip I picked could only fit 2 LittleFS pages! I had to customise the Tildagon firmware to skip the initialisation steps for one specific expansion port to work around this.

Otherwise, the integration of the Hexpansion and the Tildagon was painless and took around an hour. Showing the accumulated charge and the rate of counts was a nice check that the panel was a net plus, feeding energy into the system. And I didn't need to charge my badge all event – although I don't know how much to credit the panel, as the badge is very efficient on its own!

For my master's project, I aimed to develop a FPGA based quantum simulator with the objective of running >30 qubits at reasonable depth on a single Amazon F1 instance. Whilst the full-scale tests never happened, a tested general simulator was produced and was validated at 16 qubits on Zynq hardware. This series of blog posts will describe the simulation method and the use of CλaSH as a tool for rapid development of hardware networks.

Quantum simulation would be better termed quantum emulation: hopefully real QCs will come along, and we will want to call the process of simulating physical systems on a QC “simulation”. But with few exceptions “simulation” has stuck, and I hope this will not be too confusing for readers from 2025.

2025 update: we still do not have the large-scale error-corrected quantum computers envisaged here, although I hope that in another 5 years, and another 5 after that, we may.

Contents

  1. Quantum circuits (skip this if you have a QC background)
  2. The recursive simulation algorithm
  3. Hardware layout
  4. Benchmarks

The simulation method

A quantum state on N qubits contains information about the relative magnitude and phase of all 2^N possible bitstrings. In the general, and common, case all 2^N values need to be stored as complex numbers. Even a 50-qubit simulation requires a prohibitively large amount of memory (on the order of petabytes, and growing exponentially). Google has proposed a 72-qubit chip, and if it is able to run a square circuit it is likely to demonstrate true quantum advantage.
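To put a rough number on that memory claim (assuming 16-byte complex doubles):

# Dense state-vector storage: 2**n amplitudes at 16 bytes each.
def state_bytes(n_qubits):
    return 2 ** n_qubits * 16

print(state_bytes(50))  # 18014398509481984 bytes, ~18 PB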

To push the boundaries of simulation, groups from Google, Alibaba and Oxford use tensor-network methods to simulate up to 52-qubit systems on datacenter-scale computers.

We use a method of simulation inspired by the path-integral formalism of QM. The amplitude of a given final state is the sum of the amplitudes of all paths that could possibly result in that state. This contrasts with the time-evolution view of QM, where an initial state is evolved in some environment and then the final sample probabilities are found from the final state.

For large physical systems, performing the integral over all possible priors can be challenging. For quantum circuits, however, it is straightforward: at each layer in the circuit there is a gate of low arity, so the only states that can contribute are those that, when acted upon by that gate, give the target state.

This leads to a succinct backwardsEvaluate function for finding the amplitude of a given basis vector after applying a circuit:

-- backwardsEvaluate circuit initial_state target_state
backwardsEvaluate [] i t
    | i == t    = 1.0
    | otherwise = 0.0
backwardsEvaluate (gate:xs) i t =
    let prior_states     = possiblePriors gate t
        prior_amplitudes = map (backwardsEvaluate xs i) prior_states
    in
        -- the final amplitude is the sum of the prior amplitudes,
        -- each weighted by the action of the gate on its prior state
        sum (zipWith (*) prior_amplitudes (map gate prior_states))
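To make possiblePriors and the gate's action concrete, here is a hypothetical sketch (in Python, matching the forwards evaluator below) for a single-qubit Hadamard on qubit q:

# The only basis states that can reach |target> under H on qubit q are
# those agreeing with the target on every bit except, possibly, bit q.
def possible_priors_h(q, target):
    return [target, target ^ (1 << q)]

# Amplitude <target|H_q|prior>: 1/sqrt(2), negated when both bits are 1.
def h_amp(q, prior, target):
    sign = -1.0 if (prior >> q) & 1 and (target >> q) & 1 else 1.0
    return sign / 2 ** 0.5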

This function works in tandem with a forwards evaluator to get the action of a circuit on an initial state.

import random

def simulate(circuit, state=0):  # zero is the |00...00> state.
    initial, amplitudes, successors = state, [1.0], [state]
    for i, gate in enumerate(circuit):
        successors = act(gate, state)
        # amplitude of each candidate state, via the backwards pass
        # over the gates applied so far (most recent gate first)
        amplitudes = [backwardsEvaluate(circuit[i::-1], initial, s)
                      for s in successors]
        weights = [abs(a) ** 2 for a in amplitudes]
        [state] = random.choices(successors, weights=weights, k=1)
    return state, amplitudes[successors.index(state)]

With a bit of work you can show that this function returns samples of the final state vector in proportion to the squared magnitude of the corresponding amplitude. It also gives the true amplitude of each sample, so to sample the full vector you can call it repeatedly until the sum of the sampled probabilities nears unity. Avoiding previously sampled elements of the state is an exercise for the reader!
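A hypothetical sketch of that repeated-sampling loop (punting on the de-duplication exercise; repeats are wasteful but harmless):

# Sample until ~99% of the probability mass is accounted for.
sampled = {}
while sum(abs(a) ** 2 for a in sampled.values()) < 0.99:
    state, amp = simulate(circuit)
    sampled[state] = amp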

The forwards function is presented in Python to emphasise that it is not performance critical. In fact, in our implementation this function runs on the Zynq ARM core, with only backwardsEvaluate implemented in hardware.

Performance and tweaking

This method, if the backwardsEvaluate function runs in a depth-first manner, uses space linear in the depth of the circuit. The tradeoff is that runtime becomes exponential in both width and depth. The runtime can be reduced to that of the naïve matrix-multiplication method by memoizing backwardsEvaluate – and if the circuit contains separable states, they will never appear in the cache, resulting in memory use potentially lower than naïve methods.
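A minimal sketch of that memoisation, assuming each gate exposes a priors list and a transition amplitude (method names invented for illustration):

from functools import lru_cache

def make_amplitude(circuit, initial_state):
    # circuit is a tuple of gates; each gate offers priors(target) and
    # amp(prior, target). The cache key is (depth, target), since the
    # initial state is fixed for a given top-level call.
    @lru_cache(maxsize=None)
    def amplitude(depth, target):
        # amplitude of |target> after the first `depth` gates
        if depth == 0:
            return 1.0 if target == initial_state else 0.0
        gate = circuit[depth - 1]
        return sum(amplitude(depth - 1, p) * gate.amp(p, target)
                   for p in gate.priors(target))
    return amplitude

amplitude(len(circuit), t) then gives the amplitude of basis state t after the whole circuit, with every distinct (depth, target) pair evaluated at most once.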

The advantage this method poses for FPGA instantiation is that, by distributing the recursive calls to backwardsEvaluate in the form of a tree, caches can be inserted in ways that provide high physical locality of reference. This allows for very wide memory parallelism and a computational structure that can be mapped to any fabric layout or BRAM availability.

Compared to a direct matrix method, this avoids a bottleneck on DRAM access, a key limiting factor for high-performance FPGA designs.

The hardware

The FPGA modules are written in CλaSH, with an ad-hoc wiring generator in Python and a Verilator test suite. Functionality was also verified on a Xilinx Zynq chip at nearly 100MHz.

CλaSH generates synthesizable Verilog from Haskell functions of type State -> Input -> (State, Output), where State is the full clock-to-clock internal state of your module. The type of the top-level function that the CPU calls is:

findamp_mealy_N :: KnownNat n => ModuleState n -> Input -> (ModuleState n, Output)

We use KnownNat to add compile-time parameters to the module. In this case, each module maintains a stack of evaluations to complete, and n is the size of this stack.

Benchmarks

The hardware design was validated on a Zynq development board running Linux. The target device was the xc7z020clg400 at a -1 speed grade. Due to area limitations, and in order to generate a fully entangled intermediate state, a circuit of width 4 and depth 12 was used. The full circuit executed in 3µs, with timing correctly predicted by the RTL model.

Scaling estimation

In order to scale the design to multiple FPGA blades, we need to consider the total bandwidth that may be consumed communicating between parts of the design.

Amazon offers the F1 instance type, with up to 8 FPGA blades, each a Xilinx UltraScale+ VU9P FPGA with 2,586k logic cells. The blades are interconnected via a 400 Gbps bidirectional ring. In the worst case, a single blade may consist of many low-depth FindAmp modules with minimum-sized stack buffers.

If the modules are responsible for evaluations of depth 2, they can accept a new request every 11 cycles. At this design point, where the entire fabric is consuming bandwidth, the requirement exceeds the available interconnect bandwidth by 70%. More reasonable layouts should therefore not be bandwidth limited on the Amazon FPGA service.