As part of LCA 2020, I gave a quick talk at the Open Hardware Miniconf about a year's worth of work in ESP32 and upgrading my Shirt from 24x32 neopixels (P10) to 64x96 in RGBPanels (P4), giving me almost 10x more pixels.
Running lots of demo code at 96x64 in 24bpp, storing 2 framebuffers for page switching, plus bitplanes for PWM, it ended up significantly stressing the amount of fragmented memory available on ESP32, so this talk deals with what I learned and how to get around the limitations.
Then, I also brought a demo of my next version using Raspberry Pi and displaying a framebuffer of 128x192, using FastLED_RPIRGBPanel_GFX I wrote for the occasion :)
Demo of 64x96 with ESP32 and 128x192 on Raspberry Pi
RGBPanels are a pain to drive: they require constant refreshing, and this becomes more of a problem when you aim for higher resolutions (128x128 and above), as they require more horsepower and memory than either a teensy 3.6 or ESP32 can reasonably provide (the two top of the line chips supported by SmartMatrix).
Another issue is that SmartMatrix, while better than all the other libraries on arduino, doesn't support all kinds of weird panels out there, specifically the AB and AC panels that you often end up getting when you get higher resolutions like 128x64.
Using Neopixels would work better of course, but they can't reasonably be had in less than P5 (0.5cm/LED) while RGBPanels go down to P2. Also, neopixels are about 10X more expensive per pixel, if not more.
So, the solution is to use a rPi to drive RGBPanels of size 128x128 and larger (a single Pi with 3 parallel channels can reasonably run up to 256x256. After that it gets harder and you need multiple microcontrollers).
This is where the excellent https://github.com/hzeller/rpi-rgb-led-matrix driver comes in. It's the most feature complete RGBPanel driver for microcontrollers
Ok, but you have all this arduino code, maybe written for a FastLED::NeoMatrix or Adafruit::NeoMatrix array using the Adafruit::GFX API. Or maybe, you used the FastLED API with an XY mapping function, or maybe even you're using the LEDMatrix API for FastLED. None of those work on rPi, and you don't want to change/rewrite your code.
Well, there is good news: you can use https://github.com/ChrisMicro/ArduinoOnPc to run arduino code on PCs, and therefore also rPi. It is however designed to display in an X11 window, which is not what you'd want. So, instead, I forked it for you and wrote a rPi glue driver for my FrameBuffer::GFX base class: https://github.com/marcmerlin/ArduinoOnPc-FastLED-GFX-LEDMatrix
You therefore end up getting access to those 3 arduino graphics APIs, and you can render on rPi using the much faster and more feature complete https://github.com/hzeller/rpi-rgb-led-matrix driver.
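For readers who haven't met it, the "FastLED API with an XY mapping function" style mentioned above boils down to one small function that turns an (x, y) coordinate into an index in the linear LED array. Here is a minimal sketch for a zigzag (serpentine) wired matrix. The 24x32 size and the assumption that odd rows run right to left are only for illustration, and your wiring may well differ:

```cpp
#include <cstdint>

// Hypothetical matrix size, chosen only for this example.
constexpr uint16_t kMatrixWidth = 24;
constexpr uint16_t kMatrixHeight = 32;

// Map an (x, y) coordinate to an index in a linear LED array.
// Assumption: even rows are wired left to right, odd rows right to left.
uint16_t XY(uint16_t x, uint16_t y) {
    if (y & 1) {
        // Odd row: the strip runs backwards, so mirror x.
        return static_cast<uint16_t>(y * kMatrixWidth + (kMatrixWidth - 1 - x));
    }
    // Even row: the strip runs in the same direction as x.
    return static_cast<uint16_t>(y * kMatrixWidth + x);
}
```

Code written this way only ever touches `leds[XY(x, y)]`, which is why it can be moved to another backend by swapping the mapping and the array behind it.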
With these driver options, which lower the number of PWM bits, I get about a 400Hz refresh rate on rPi3:
~/rpi-rgb-led-matrix/examples-api-use/demo --led-gpio-mapping=regular --led-rows=64 --led-cols=128 --led-row-addr-type=0 --led-chain=1 --led-show-refresh --led-slowdown-gpio=1 --led-parallel=3 --led-pwm-dither-bits=1 --led-pwm-bits=7 --led-panel-type=FM6126A -D0
Those variables are assigned when you create "rgb_matrix::Canvas *canvas", which is fed into matrix->setCanvas(). See this example code:
After writing my 3rd backend glue driver (SSD1331 SPI TFTs) that supports Adafruit::GFX, FastLED CRGB's primitives (nblend, dim, etc...) and matrix mapping via XY() function, and LEDMatrix which is another GFX like API on top of FastLED, I realized that I had to factor that out into a new base class I called Framebuffer::GFX: https://github.com/marcmerlin/Framebuffer_GFX
That new base class takes all the GFX glue and color support I mixed (GFX RGB565, FastLED CRGB structs (RGB888 24bit), and uint32_t backed 24bit RGB888 colors), and creates a virtual framebuffer compatible with FastLED and SmartMatrix (which thankfully can use the same 3 byte per pixel array type).
Framebuffer::GFX in itself is only a framebuffer storage and method holder, but it contains so much common code that my 3 drivers that use it are only a few dozen lines of code after inheriting from it.
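To make the three color formats above a bit more concrete, here is a minimal sketch of the conversions between them. The `RGB888` struct is a stand-in I made up for a 3 byte per pixel color, and the function names are hypothetical; this is not the code from Framebuffer::GFX itself:

```cpp
#include <cstdint>

// Stand-in for a 3 byte per pixel color (hypothetical type for this example).
struct RGB888 {
    uint8_t r, g, b;
};

// 24bit RGB888 -> 16bit RGB565: keep the top 5 bits of red,
// the top 6 bits of green and the top 5 bits of blue.
uint16_t rgb888_to_rgb565(RGB888 c) {
    return static_cast<uint16_t>(((c.r & 0xF8) << 8) |
                                 ((c.g & 0xFC) << 3) |
                                 (c.b >> 3));
}

// 24bit RGB888 packed into a uint32_t as 0x00RRGGBB.
uint32_t rgb888_to_uint32(RGB888 c) {
    return (static_cast<uint32_t>(c.r) << 16) |
           (static_cast<uint32_t>(c.g) << 8) |
           static_cast<uint32_t>(c.b);
}
```

Going from 24bit to 16bit throws away the low bits of each channel, so the conversion is lossy in that direction.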
Here is the list of drivers I've written against Framebuffer::GFX:
Here is an example of code ultimately running on top of Framebuffer::GFX with FastLED::NeoMatrix on ESP8266 (24x32 and 32x32) and SmartMatrix::GFX on ESP32 (64x96):
Below is the same code again now running on top of FastLED_SPITFT::GFX on an SSD1331 96x64 TFT screen:
FastLED_SPITFT_GFX, the last driver I wrote, takes any Adafruit SPI TFT object (like SSD1331 and ILI9341), and a FastLED CRGB array. You then tell it the size of each (it's up to you not to make mistakes or you can create buffer overruns), and the overloaded show() method will send the framebuffer to the TFT (it is done line by line with an SPI copy method):
FastLED_SPITFT_GFX(fb, 96, 64, 96, 64, ssd1331, 0) for unrotated
FastLED_SPITFT_GFX(fb, 64, 96, 96, 64, ssd1331, 1) for a 90 degree rotation
Here is the end result, an ESP8266 running LEDMatrix code rendered in Framebuffer_GFX, downsampled from 24bit color to 16bit color, rotated and copied line by line to a SSD1331
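As an illustration of the rotation step, here is a minimal sketch that rotates a row-major 16bit buffer by 90 degrees. The function name is hypothetical and the clockwise direction is an assumption; the real driver does this line by line while copying over SPI, and may rotate the other way:

```cpp
#include <cstdint>
#include <vector>

// Rotate a w x h row-major buffer by 90 degrees clockwise.
// The result is h pixels wide and w pixels tall.
std::vector<uint16_t> rotate90cw(const std::vector<uint16_t>& src,
                                 unsigned w, unsigned h) {
    std::vector<uint16_t> dst(src.size());
    for (unsigned y = 0; y < h; ++y) {
        for (unsigned x = 0; x < w; ++x) {
            unsigned dx = h - 1 - y;  // column in the rotated image
            unsigned dy = x;          // row in the rotated image
            dst[dy * h + dx] = src[y * w + x];
        }
    }
    return dst;
}
```

With this mapping, a 64x96 framebuffer comes out as 96x64, which is the native size of the SSD1331.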
Here is a video of Jason Coon's Aurora in 64x96 rotated to the SSD1331 96x64 resolution:
It's ironic that normally Neopixel matrices look like they have huge pixels compared to RGBPanels, but here my 64x96 RGBPanel looks huge compared to the same resolution on SSD1331:
rotating 3D cube with temporal fade
Table from Mark Estes Video Demo:
ST7735 or ILI9341
Thankfully Adafruit wrote other TFT drivers like ST7735 and ILI9341 against the same Adafruit_SPITFT object from Adafruit-GFX, so I was able to target that tft object in FastLED_SPITFT::GFX and get the same code to work with other TFTs without any modifications.
As a result, all you need to do is to pass the different tft object, display size, and everything else works.
Adafruit_ILI9341 *tft = new Adafruit_ILI9341(TFT_CS, TFT_DC, TFT_RST);
Adafruit_ST7735 *tft = new Adafruit_ST7735(TFT_CS, TFT_DC, TFT_RST);
FastLED_SPITFT_GFX *matrix = new FastLED_SPITFT_GFX(matrixleds, mw, mh, mw, mh, tft, 0);
For comparison, SSD1331 vs ST7735 128x128, ST7735 128x160 from 2 different vendors and slightly different chips, and ILI9341 with a full 320x240 which stretches the limit of this library since it requires 225KB for that many pixels and that only fits on a teensy 3.5/3.6:
the SSD1331 screen is off as it's not compatible and requires different code to turn on
ST7735R vs ST7735S chip revisions show a few differences
brightness is also different
code for the 128x128 ST7735 doesn't mis-display the same on the two 128x160 displays
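The 225KB figure quoted above for 320x240 comes straight from the 3 bytes per pixel framebuffer. A small sketch of that arithmetic (the function name is mine, for illustration only):

```cpp
#include <cstddef>

// The framebuffer stores 3 bytes (RGB888) per pixel.
constexpr std::size_t kBytesPerPixel = 3;

// RAM needed for one full framebuffer of the given size, in bytes.
constexpr std::size_t framebuffer_bytes(std::size_t width, std::size_t height) {
    return width * height * kBytesPerPixel;
}
```

For example, `framebuffer_bytes(320, 240)` is 230400 bytes (225KB), while `framebuffer_bytes(128, 160)` is 61440 bytes (60KB) and `framebuffer_bytes(96, 64)` is only 18432 bytes (18KB).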
Some demos showing 128x128 and 128x160 on multiple size screens (for physical size comparison):
Hopefully this is useful to you and by using the FastLED_SPITFT::GFX API, you can re-use your code on TFTs, FastLED::NeoMatrix and SmartMatrix::GFX.
FastLED::NeoMatrix and SmartMatrix::GFX pretty much work exactly the same except for how you init them, but if you use https://github.com/marcmerlin/FastLED_NeoMatrix_SmartMatrix_LEDMatrix_GFX_Demos/blob/master/neomatrix_config.h , you can just include that and your same code will work on both FastLED backed matrices and SmartMatrix backed matrices, even though they are totally different technologies.
RGBPanels do use less power even when corrected for the amount of brightness generated (my estimate is at least 3 times less), they can be a lot more dense, and they're cheaper, but they're a pain in the ass to drive since they require constant refreshes at high speed. That being said, they are workable as long as you don't exceed 128x64, which is more or less the practical limit on teensy 3.6 and ESP32 because of the memory SmartMatrix needs (a different implementation could push things to at least 128x128 by sacrificing quality for memory use).
The demos I used for the pictures below are
After months and months of work, here is version 4:
Sadly, going up in resolution with addressable pixels is not that easy. While in theory you should be able to fit at least 2 addressable pixels per centimeter (aka P5), my current premade panels are P10, which is the only thing I could buy pre-made.
What allowed me to switch were those flexible P4 RGB Panels from Azerone: https://amazon.com/gp/product/B07F87CM6Y
With their P4 resolution, I'm able to fit 96x64 on my body using 3 panels of 64x32 chained together. The 3rd panel is then chained to the 2nd set of 3 panels in my back:
On the old shirt, I put the rear panel inside the shirt, using the shirt as a diffuser, but the RGBPanels were too thick for this to be practical, so I had to put them on top of the shirt. As a result, I ended up using a black shirt which matches the color of the panels. I had to attach velcro to the new shirt, and confirmed that supergluing it was so much faster than sewing, and worked just as well:
I unsoldered the power connectors that were too thick, and used small metal wire to connect the panels together (see top middle of the picture). Turned out those metal wires were a mistake as they can cause shorts on the LEDs on the other side of the board:
Another thing I learned was that the holes I was using to put a metal wire to carry the panels over my shoulders, can't actually take the load, and the wire can cause damage to the copper trace that is just next to the hole. As a result, I replaced the metal wires with fishing wire and didn't use the bigger holes for load bearing:
Speaking of removing thickness from the board, I removed the top of the ribbon connectors to make them a bit thinner. Sadly, RGBPanels still require 15 wires to send the video signal:
I then took one panel and covered it with diffusing foam (the rear panel, so that it's not too sharp and blinding to people behind me), while the front panel only has the plastic cover to protect the panels and offer a bit of extra diffusion:
you can see the difference between the diffusion levels
I then protected the rear of the panels given how much electronics were exposed:
Small details had to be solved, like making sure I had enough amps going through the wires (use thicker wires). Without that, my brightest pattern that uses 8 amps, didn't quite make it:
For fun, I made a pattern that scrolls my C++ scrolling code on the screens:
I went from a breadboard prototype to Jason Coon's ESP32 level shifter board, much more tidy
This video shows how things are wired from the ESP32 to the panels:
Here is what the whole power system looks like:
2 4S Lipos, 5Ah and 80Wh each, giving a total of 160Wh of energy
Amp meter in line with the lipos and cell tester with low voltage warning buzzer
Amp gauge with timer to know how much energy flowed from the batteries (you can't run lipos down or they'll die)
Tobsun DC-DC converter to take voltage down to 5V
2nd voltage regulator to bring the voltage further down to 3.3V for the El Wire glasses
5V goes to RGBPanels via separate thick wire to carry the amps
ESP32 with level shifters from 3.3V back up to 5V for the RGBPanels (6 channels for the colors to level shifters, 4 address lines to do 16 scan line refreshes). CPU runs SmartMatrix::GFX and NeoMatrix-FastLED-IR
16th data line is used for the Neopixel strips on my arms and legs, running the same code as the previous shirt
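For a rough feel of what these numbers mean, here is a small sketch of the power budget. It assumes the 8 amps of the brightest pattern are drawn at 5V, that the battery side sits at roughly 16V, and that the DC-DC converter efficiency is a parameter you have to guess or measure (the function names and the 0.9 used below are assumptions, not measurements):

```cpp
// Power drawn by the load, in watts.
double load_watts(double volts, double amps) {
    return volts * amps;
}

// Hours of runtime for a given battery capacity and load.
// converter_efficiency is an assumption (1.0 = lossless converter).
double runtime_hours(double battery_wh, double watts, double converter_efficiency) {
    return battery_wh * converter_efficiency / watts;
}

// Current on the battery side of the DC-DC converter, in amps.
double battery_side_amps(double watts, double battery_volts, double converter_efficiency) {
    return watts / (battery_volts * converter_efficiency);
}
```

With these assumptions, 8A at 5V is 40W, so 160Wh would last about 4 hours with a lossless converter and about 3.6 hours at 90% efficiency, at full brightness the whole time. The same 40W is only about 2.5A on the 16V side, which is why the thick wires matter most on the 5V side.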
Here is an example of 3 levels of diffusers, including a raw set of panels with no diffusers:
At Luminosity Beach Festival, an underpaid and undertrained security guard at the entrance freaked out at the wires, so I made boxes to hide the wires and hopefully remove the "OMG, it's a bomb" reflex that some people might have:
2 batteries, fuse, meters and output
adapter box that takes 16V down to 5V and measures current used while distributing power
both boxes together are bigger than my previous setup, but look a bit better
You can see a demo of the outfit being worn:
If you don't have time for all this, and are ok with 64x64, you can try this backpack from gearbest with everything built in and a very thin board. Just not fun for me because I can't run my own code on it:
RGBPanels are a totally different technology based on row scan technology, pretty much like the 8x8 matrices I wrote a scanning driver for but with a built in shift register to load up all the column for each color, multiplied by 2 as for historical reasons you can update 2 halves of the panel separately.
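To put numbers on the row scan description, here is a small sketch for standard panels where the two halves are loaded in parallel (the function names are mine, and the odd AB/AC panels do not follow this scheme):

```cpp
// Rows that have to be selected one after another: the top and bottom
// halves of the panel are loaded at the same time, so half the panel rows.
unsigned scan_rows(unsigned panel_rows) {
    return panel_rows / 2;
}

// Address lines needed to select one of the scan rows (A, B, C, ...).
unsigned address_lines(unsigned panel_rows) {
    unsigned n = 0;
    while ((1u << n) < scan_rows(panel_rows)) {
        ++n;
    }
    return n;
}

// Clock pulses needed to shift out one full pass over the panel.
// Each clock loads 2 pixels (one per half), 6 color bits in total.
unsigned clocks_per_pass(unsigned panel_width, unsigned panel_rows) {
    return panel_width * scan_rows(panel_rows);
}
```

A 32 row panel therefore has 16 scan rows and 4 address lines, a 64 row panel has 32 scan rows and 5 address lines, and a single 64x32 panel needs 1024 clocks per pass, repeated for every bit of color depth.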
With 32x64 panels, or even 64x64 panels, that's a lot of pixels to push serially via shift registers and address lines to select the line you've currently pushed all those columns for. The LEDs need to be refreshed very quickly to avoid visible flickering.
This limits the list of reasonable CPUs for higher resolutions to teensy 3.6 and ESP32, which also removes the multiple slower and/or inefficient drivers out there. Options I looked at that weren't suitable:
Best support for chaining panels (up to 128x128 on teensy, and maybe 64x128 on ESP32 before it runs out of DMA RAM)
High color depth 24bpp or higher (which honestly is more than I need, 24bpp is more than most panels can probably reasonably show and 16bpp would likely be enough for my use). I still wouldn't mind if SmartMatrix offered 16bpp in exchange for a higher refresh rate or lower resource and memory utilization (also allowing for a higher resolution on a given CPU)
Support for the 2 fastest common arduino like microcontrollers: teensy 3.6 and ESP32 (teensy 3.1/3.2 is not fast enough to refresh 64x64 well enough, and teensy 3.5 is slower than 3.6, so no reason to buy one)
Very powerful API with multiple layer support (great if you can use it, although I'll admit that I only need drawpixel thanks to Adafruit::GFX)
So, SmartMatrix is great, but I have all this code that relies on one or more of those APIs:
The easiest way to use SmartMatrix is to use the SmartMatrix Shield v4 from Louis Beaudoin.
If you are going to drive 64x64 and above, skip the teensy 3.0/3.1/3.2 and go directly to teensy 3.6. It costs more, but you'll want the extra CPU speed (teensy 3.1 can barely run 64x64 with an ok-ish refresh if you overclock it, if you must use the older chip).
Here is what the SmartMatrix shield looks like with a small patch I made to take USB power and send it to the panel (my laptop can output 2A over USB). Note that this is not safe with teensy v3.1/3.2 as it's not meant to pass that much current from its USB connection, but teensy 3.6 can do it fine as its fuse is located after the V+ connection on the chip:
Originally I used the APA connector to send power to the panel
2x 32x64 chained P4 panels with a sad cable extension I had to make, vs pre-made 64x64 P3 panel
SmartMatrix basic demo
The main problem with RGBPanels is that if the refresh rate isn't fast enough, they look bad on pictures. This is the main reason I switched to ESP32 which is dual core and can push a higher refresh rate via DMA than teensy can:
Chained panels giving mirrored output on a total display of 128x96:
As mentioned above, ESP32 is dual core, so it can update the panel on one core using DMA, while the other core can run your code. It is more efficient, however, it runs out of DMA memory around 64x128 resolution (I run 64x96 myself and had to optimize code to make things fit).
Here are shots of what it looks like with Jason's shield:
it's reasonably compact, 15 IO's for SmartMatrix (14 are really required), IR connected to port 34, and IO 16 connected to a NeoPixel strip
This shows my flexible P4 96x64 panels I bought on amazon from Azerone, 3 tied together, one shown upside down for scale, a blank shield from Jason Coon, how I cut a 16 pin IDC ribbon cable and made it an in line row of pins I can connect into Jason's shield after having added a riser, and a patched board with IR connector on the back, and a yellow wire to redirect the pin Jason's board connected to RX which I use for debugging, to unused pin 27 instead:
While Jason's board is not perfect for this use, it's much better than my self made protoboard full of wires to connect the 74hc245 level shifters:
Here's a quick video summary that shows the wiring and layout:
The FFAT module uses 8KB plus 4KB per concurrent file that can be opened. By default, it allows 10 files to be opened, which means it uses 48KB. If you want to reduce its memory use, you can tell it to only support one file, and you will save 36KB, leaving you with only 12KB used.
if (!FFat.begin(0, "", 1)) die("Fat FS mount failed. Not enough RAM?");
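The memory figures above follow a simple formula. Here is a sketch of it, using only the 8KB and 4KB numbers quoted in this post (the function name is hypothetical and is not part of the FFat API):

```cpp
// Figures quoted above: 8KB fixed cost plus 4KB per file that can be open.
constexpr unsigned kFfatBaseKB = 8;
constexpr unsigned kFfatPerFileKB = 4;

// Estimated RAM used by FFat for a given max number of open files, in KB.
constexpr unsigned ffat_ram_kb(unsigned max_open_files) {
    return kFfatBaseKB + kFfatPerFileKB * max_open_files;
}
```

`ffat_ram_kb(10)` gives the default 48KB and `ffat_ram_kb(1)` gives 12KB, which is the 36KB saving from passing 1 as the third argument of `FFat.begin()`.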
I've then improved AnimatedGIFs to add support for FatFS/FFat which nicely fixes the short hangs I was getting with SPIFFS, which was ruining the animations:
I've been going to linux.conf.au for 18 years now (since 2001), and presented a fair number of linux related talks there, but the big change for me was the open hardware miniconf that started in 2010. Thanks to its projects every year, I got to learn a lot about microcontrollers and some about electronics.
This talk was my first non linux talk, which detailed everything I learned from those miniconfs and the projects I worked on that stemmed from them. I presented it at LCA 2019 Christchurch.
you can find the talk pdf here: http://marc.merlins.org/linux/talks/Using_Open_Hardware/Using_Open_Hardware.pdf (you'll want this one to get all the clickable links in the slides)
you can view the talk slides in html here or below:
Talk video below:
I arrived the sunday before the conference and helped out the open hardware organizers with a bit of last minute setup. I also got to do some last minute testing and tuning of my panels:
hacked up ESP32 with level converters on breadboard to run 3x 64x32 SmartMatrix panels with SmartMatrix::GFX
64x64 P3.8 SmartMatrix::GFX panel vs 3x 64x32 SmartMatrix::GFX P4 flexible panels vs 4x 16x16 FastLED::NeoMatrix P10 panels
After finishing the code tuning and demos just in time, I gave a 20mn miniconf talk on the history of linux.conf.au hardware miniconf. I went through how much I learned from those confs and what I was able to achieve as a result. I sure got to learn a lot about microcontroller and driver programming:
I wasn't able to bring my burning man 4096 neopixel matrix, it doesn't even fit in my car, but the irony is that my small 64x64 rgbpanel has the same resolution and fits easily in my backpack
The 64x64 compact display is showing the hand X-ray here
A few days later, I gave the longer version of my talk at the main conference. By then it had grown to over 160 slides in a 45mn slot, or 16 seconds per slide. Ooops...
The full talk went into details on what I learned in the hardware hacking field, a lot of it was simply electricity, U=RI, wires, pre-made components (small inline volt/amp meters, DC-DC converters, and so forth).
This year, the Open Hardware Miniconf team designed a donkeycar for us at LCA 2019 Christchurch. It's a car that navigates by itself using its onboard camera connected to a Raspberry Pi using training video data gathered and analysed offline by tensorflow. That sure was an ambitious project!
I arrived the day before to help finish up the kits for the next morning:
the cars were eager to perform :)
Andy and Jon who ended up working all night to make sure the kits would work the next morning
The next morning, we showed up to build the kit:
rPi with custom last minute hat for the donkey car
Jon gave a talk about the car design
Nice way to support 5V neopixels on 3.3V microcontrollers
I inherited 64 strips of mostly 64 neopixels per strip (some were as low as 61, and some as high as 66).
not all the same lengths
The 64 strips are run as 16 lines of 4 strips of 64 pixels (256 pixels per line). They were tested 4 by 4, as the lines of 256 pixels they form, with a Neopixel tester that sends test patterns:
the neopixel tester I'm using, along with a 5V 10A DC converter outputting 8A into the controller (testing full white)
The first 16 lines took a long time (almost 2 full days) due to the time to measure everything including the cardboard, cutting it carefully, marking where all the LEDs will go, and then testing as I go along to make sure I'm not repeating something that will have to be undone:
A bit of test from my MatrixGFXDemo.ino code
Then comes the issue of attaching all of this. I decided very early to remove the IP67 protection as the silicone is resistant to virtually all glues, making it very hard to work with. I also had to splice broken LEDs in the strips I inherited, so it's much easier to do without the protective casing. I simply attached the bare pixels with those very sticky glue dots:
After about 3 days of work, got 50% done:
Power delivery: getting the wire sizes right is important, but turns out you don't need huge wires for 64 pixels. What I did was connect power from each of them at the end and therefore spread the maximum 10A they can draw to 4 times 2.5A, which can go fine over smaller wires as pictured below. The green wire goes between the Data Out to Data In pins of the strips (3 short green wires only for 4 strips)
The 64x64 array was meant to be 2 arrays of 32x64 for ease of transport, each with their own 60A 5V power supply (when actually 40A would have been enough for each set of 2048 LEDs even if their maximum is around 80A in theory). I used thick 10 gauge wire to make sure the power bus could support 100A or so if needed, even if in real life, it'll never really draw much more than 20-30A:
power testing, this can replace a small sun ;)
Once 50% was verified to be working fine, the other 50% was still a lot of work, but I didn't have to stop for testing of the power, layout, and design. It took another day and a half to do the other 50% at a much higher pace, getting all the little wires cut at the right length in advance, gluing all the strips in one fell swoop (still hours of work), and then all the soldering, with validation at the end:
For anyone contemplating that work, most of the work was:
cutting the cardboard backing to the right size, and marking where the strips were going to go
cutting/fixing all the strips (they were second hand, some were broken or the wrong length)
320 glue points for the strips, one by one :)
cutting lots of little wires to the right length, stripping them
only then does soldering come in
then test each set of 4 strips (64x4) with a special neopixel tester
build the 10 gauge power bus
turns out the hardest thing was soldering all the little wires to the 10 gauge power bus. They don't like staying together due to size and thermal issues.
and all the twisted pair and wiring on both sides (seemed trivial but it was more work than I thought) to connect to the microcontroller
ESP32, why not something else like Teensy?
So, my shirt that drives 24x32 uses ESP8266. ESP8266 can do up to 4 lines of parallel output, which is not sufficient for a proper frame rate on 4096 pixels. ESP32 does allow up to 24 lines of parallel output (untested) and can easily do 16 lines of output (110 frames per second for 4096 pixels).
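The reason parallel output matters so much can be seen with a bit of arithmetic. The sketch below computes a theoretical upper bound on the frame rate, assuming the usual 800kHz neopixel protocol (24 bits per pixel, 1.25us per bit) and a 50us reset gap between frames. The reset value is an assumption that varies between chip revisions, and the function name is mine:

```cpp
// Usual 800kHz neopixel timing: 1.25us per bit, 24 bits per pixel.
constexpr double kBitTimeUs = 1.25;
constexpr unsigned kBitsPerPixel = 24;
// Assumption: reset/latch gap between frames, varies with chip revision.
constexpr double kResetUs = 50.0;

// Theoretical maximum frames per second when total_pixels are split
// evenly across parallel_lines data lines sent at the same time.
double max_fps(unsigned total_pixels, unsigned parallel_lines) {
    // Pixels on the longest line (rounded up).
    unsigned per_line = (total_pixels + parallel_lines - 1) / parallel_lines;
    double frame_us = per_line * kBitsPerPixel * kBitTimeUs + kResetUs;
    return 1000000.0 / frame_us;
}
```

For 4096 pixels this gives roughly 8fps on a single line, 65fps on 8 lines and 129fps on 16 lines. The measured figures quoted in this post (7, 55 and 110fps) are a bit lower, presumably because real drivers have overhead that this bound ignores.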
Teensy would have worked too, but I've had too many problems with teensy, namely:
the hacked up build environment patched on top of some version of the arduino sdk. Recently I've had 30 second pauses before compiles even happen if I'm using the serial port (reported the bug, never heard back, never got fixed). This made sense years ago, but not anymore today that arduino supports other hardware boards "the right way"
newer sdk patch that's supposed to fix things, working even less for me (1.42 beta worked even less than 1.41 when I tried it).
serial output just stopping randomly after outputting X lines, making debugging impossible. This was a big deal for me, I reported it, but never got a fix.
no hypervisor like task that catches crashes and gives you a traceback with line numbers like you get on ESP chips (which really helps for debugging problems). This is also a big plus of ESP chips
no onboard flash usable for an SPIFFS filesystem (like the 4MB of flash on ESP32). Now, I'll have to admit that SPIFFS starts falling in performance due to how it's written when you reach 1MB or more of data, but it could be fixed with a better driver and beats no onboard filesystem at all on teensy (you need the 50-ish dollar teensy 3.5 to get an sdcard reader)
closed bootloader that prevents better debugging, and maintained by a single person who is very helpful, does a lot of work, but cannot compete with more open chips maintained by multiple people. It pains me to say this, because Paul Stoffregen does an incredible amount of work for one person, but he remains one person with a closed bootloader design and a hacked up SDK (sorry to say it). Compare with ESP32 which has near real time support on https://gitter.im/espressif/arduino-esp, plus https://github.com/espressif/arduino-esp32/ .
and not that it really matters to me, but despite all these issues, teensy costs at least 2 to 3 times the price of ESP32.
Teensy 3.6 is a 32 bit 180/240 MHz ARM Cortex-M4 single core vs ESP32's 32-bit LX6 microprocessor, operating at 160 or 240 MHz dual core
Teensy 3.6 is 1MB of flash vs 4MB of flash for ESP32 (which can be segmented for OTAs via Wifi and SPIFFS filesystem)
Teensy 3.6 has 256KB of RAM vs usually 520KB of SRAM on ESP32 (although not that much more than teensy's memory amount is usable for code).
ESP32 is dual core (although that adds complexity), adds Wifi and BT vs built in sdcard on teensy 3.6
Teensy has more pins but requires an expensive breakout board to use them all
I think teensy 3.1 (now 3.5/3.6) was the best chip around for many years, but honestly ESP32 seems like a better solution for most needs, especially debuggability. This is not to say that Teensy 3.6 is a bad offer, it does a few things better than ESP32, but at a much higher cost, and its SDK and problems explained above, make it a less desirable solution for me.
Why not use those premade 32x32, 32x64, or even 64x64 RGBpanels?
This is a very good question. First, they don't exist for neopixels, they exist for a different lower tech solution that requires a lot of work to drive.
It's a lot easier to get a lot of pixels for not much work on those RGB panels: https://www.adafruit.com/product/2276 , but they are thicker, don't bend at all, and driving them is a lot more work than neopixels. Sadly, because they require row scanning (like an old CRT TV), they also don't look good if you move them, move your head, or take a picture unless you drive them at very high speed, which is harder to do with microcontrollers.
Turns out that those panels can be made bendable too now: https://www.adafruit.com/product/3803 but still, they don't seem to come in bigger pitch sizes, and still have the persistence of vision problem I just described.
In the case of a big display, neopixels that usually require too much space (high pitch), actually come out ahead if you want your display to be 1m^2. RGB panels come out ahead for smaller displays with high resolution, you can go as low as 2-3mm pitch, which beats all existing neopixels: https://www.adafruit.com/product/2279 .
If you want to go to higher sizes with those RGB panels, it gets more complicated as you need to drive them even faster. An ESP32 can reasonably drive up to 96x64 pixels while a teensy 3.6 can almost drive 128x128 with a bad refresh rate. You need a raspberry pi or an FPGA for bigger matrices.
My point here is that neopixels cost more, but they're a lot easier to drive, despite the timing issues you start running into when you're driving a lot (the more you drive per line, the lower the refresh rate, putting a reasonable limit for a single MCU around 10,000 LEDs if you're ok with a 35Hz refresh rate).
I personally wish for Neopixel matrices that ship in 32x32 or higher, potentially with the option to inject a new data line every 256 or 512 pixels (so you can drive them as one big slower array, or cut the data line in the middle and inject parallel data lines for faster refresh rates).
ESP32 8 or 16 Parallel output and driver
First, with 4096 pixels and no parallel output, I would only get 7fps, which is quite slow.
With Sam's driver: https://github.com/samguyer/FastLED , you can use the RMT driver in ESP32 which allows for 8 parallel outputs. This can be used for more than 8 pins, I used it with 16 and the driver can switch RMT back and forth between the first set of 8 pins and the 2nd set of 8 pins.
Yves' driver obviously gives you better FPS, but taxes the CPU a lot more by doing all the bit banging. Also, in my testing, it did not work reliably until I added level shifters, while Sam's driver actually produced better waveforms that worked at 3.3V without level shifters.
The 2 drivers don't get set up the same way though, see those differences:
Which driver is best for you? I'd say it depends but if you are ok with up to 8 parallel pins, use Sam's driver with RMT, and if you want more pins (up to about 24), use Yves' driver.
Wiring and Level Shifters
I first did it wrong by wiring directly to ESP32. This was doubly a mistake because there are sadly no standard pin numbers between ESP32 boards, meaning that I had to re-wire my plugs if I changed chips:
My other problem was that while 3.3V output worked ok enough in general, the software built waveforms from Yves Bazin's 16 line parallel output code didn't work well enough at 3.3V. I had to add level shifters, which also nicely added a level of indirection between my cat5 twisted pair cables and the pin numbers on the chip:
Later, I changed one more thing which was to remove the bidirectional level shifters that were unnecessary and caused issues at boot on ESP32 by messing with some IO pins. Turns out I had to make sure GPIO2 and GPIO12 were low at boot or flashing and reboots would fail (hence the resistors in the picture). However, I ended up replacing them with simpler 74HC245 unidirectional level shifters which don't mess with I/O pins and removed the need for the resistors
Thankfully I was able to leverage the weeks/months of work I put into https://github.com/marcmerlin/FastLED_NeoMatrix and the demos I wrote for it, or shamelessly borrowed from more talented programmers :)
I then spent a lot of time on my https://github.com/marcmerlin/NeoMatrix-FastLED-IR code that ran my Neopixel shirt and adapted it so that its demos would work on a 64x64 matrix while skipping the handling of neopixel strips that are on my pants and arms. I then did a recording of the entire set of demos, including 64x64 animated gifs I found and liked, and ended up with 41mn:
The build was a lot of work, no fun at all: over 4 days of solid work... If you do this, strongly consider getting pre-built matrices that are ideally at least 32x32. Sadly most of the ones for sale today are 16x16 which still means getting 16 of them for about $500, laying them out and soldering them. It's not trivial work either if you re-inject power in them in more than one place, but clearly less work than laying 64 strips by hand like I did.
Get power right. I had some experience there, so I did my math beforehand and verified as I went along. It's not so hard to change a power supply, but it sucks if you have to replace all your power wires you spent so long to cut and solder.
Software is key of course. Running 16 strips in parallel requires some work from a small embedded CPU. Doing Infrared at the same time is not trivial. You can look at my code on how I got it to work, including this bug I found: https://github.com/espressif/arduino-esp32/issues/1781
The RMT driver on ESP32 is great at doing DMA to 8 lines, either for doing infrared without interrupts (sadly I found no IRRemote compatible RMT driver for arduino), or for outputting 8 lines of neopixels at once without bit banging from the CPU (this is the FastLED Neopixel driver that Sam Guyer wrote). 8 lines only gives 55fps for 4096 LEDs, while 16 lines gives a nicer 110fps and leaves the RMT driver free for IR Receiving (putting aside that there is no driver at the moment).
I couldn't have done this without plenty of great work from others, be it the FastLED authors and contributors, Yves who offered his support since he did a bigger build than mine, and his 16 line parallel driver, Jason Coon and others for the Aurora SmartMatrix demos I was able to use, and Mark Estes for even more LEDMatrix demos he wrote and that I was able to use too. Thanks all.
Oh yeah, I built this for Burning Man and despite being hard to transport due to its size, it made it there ok and survived the playa dust for a week:
due to lack of skill and lack of time, I used my protoboard and taped it on the rear of the display. Not professional, but it works
running matrix demo
animated GIFs are fun
somehow my protoboard and ESP32 survived the playa dust for the week