https://github.com/marcmerlin/SmartMatrix_GFX is a zero-copy, zero-extra-buffer frontend to SmartMatrix, which is the best Arduino API driver for RGB Panels.
It supports these 4 APIs seamlessly and concurrently in the same code:
Last year, I wrote FastLED::NeoMatrix to let me run Neopixel Matrices made out of pre-made panels arranged as a bigger panel. This was the end result: http://marc.merlins.org/perso/arduino/post_2018-04-23_FastLED_NeoMatrix-library_-how-to-do-Matrices-with-FastLED-and-Adafruit_GFX.html
This allowed me to do my Party shirt v3 based on a NeoPixel Matrix
However, the main problem I had was the limited pixel density of those neopixels and the price per pixel given that each pixel has a very small computer chip attached. My shirt was only 768 pixels per side (32x8x3) which cost $80 per side. While my shirt looked cool (i.e. better than nothing), 32x24 resolution isn't that much to display cool stuff. I made the best of it, but I knew that I wanted more pixels.
While it's technically possible to get 0.5cm pitch (i.e. P5) with NeoPixels, there is no such panel I could buy today and I wasn't really interested in fabbing my own, so I switched to RGBPanels.
What allowed me to switch were those flexible P4 RGB Panels from Azerone: https://amazon.com/gp/product/B07F87CM6Y
RGBPanels are a totally different technology based on row scanning, pretty much like the 8x8 matrices I wrote a scanning driver for, but with built-in shift registers to load up all the columns for each color. Everything is multiplied by 2 because, for historical reasons, you can update the 2 halves of the panel separately.
With 32x64 panels, or even 64x64 panels, that's a lot of pixels to push serially via shift registers and address lines to select the line you've currently pushed all those columns for. The LEDs need to be refreshed very quickly to avoid visible flickering.
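To get a feel for the numbers, here is a rough back-of-the-envelope sketch, not code from any of the drivers discussed here. It assumes that both halves of a panel are loaded in parallel (so a panel with N rows has N/2 row addresses), that the driver makes one pass per color bit, and that you pick a target refresh rate. The `ScanConfig` struct and both function names are made up for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a chain of row-scanned RGB panels.
struct ScanConfig {
    uint32_t chainWidth;      // pixels per row across all chained panels
    uint32_t panelRows;       // rows on one panel (both halves together)
    uint32_t bitsPerChannel;  // assumption: one full pass per color bit
    uint32_t refreshHz;       // assumption: target full-frame refresh rate
};

// Shift-clock pulses needed to push one complete frame.
// Each pulse is assumed to carry one pixel for each half of the panel.
uint64_t clocksPerFrame(const ScanConfig& c) {
    assert(c.panelRows % 2 == 0);  // the two halves must be the same size
    const uint64_t rowAddresses = c.panelRows / 2;
    return rowAddresses * c.chainWidth * c.bitsPerChannel;
}

// Lower bound on the shift-clock rate for the requested refresh rate.
uint64_t clocksPerSecond(const ScanConfig& c) {
    return clocksPerFrame(c) * c.refreshHz;
}
```

With those assumptions, a single 64x64 panel at 8 bits per channel and 120Hz needs about 2 million clock pulses per second (`clocksPerSecond(ScanConfig{64, 64, 8, 120})` gives 1966080), and that is only a lower bound since it ignores latching, row switching and the time each bit plane has to stay lit.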
This limits the list of reasonable CPUs for higher resolutions to teensy 3.6 and ESP32, which also rules out the multiple slower and/or inefficient drivers out there. Options I looked at that weren't suitable:
https://github.com/adafruit/RGB-matrix-Panel/ (the adafruit driver is actually efficient, and recently got ESP32 support, but does not support panel chaining past very basic chaining)
https://github.com/mrfaptastic/ESP32-RGB64x32MatrixPanel-I2S-DMA/ (not well tested for larger chained panels, but efficient with DMA and offers Adafruit::GFX API. It also supports page level refresh instead of line level, so flickering is more manageable on it)
https://github.com/NeoCat/ESP32-P3RGB64x32MatrixPanel is an alternate driver with DMA support. It seems unsupported and is probably not worth your attention when you have SmartMatrix (teensylc branch) or mrfaptastic/ESP32-RGB64x32MatrixPanel-I2S-DMA
This leaves us with the most complete driver of them all, SmartMatrix. The main pluses are:
Best support for chaining panels (up to 128x128 on teensy, and maybe 64x128 on ESP32 before it runs out of DMA RAM)
High color depth 24bpp or higher (which honestly is more than I need, 24bpp is more than most panels can probably reasonably show and 16bpp would likely be enough for my use). I still wouldn't mind if SmartMatrix offered 16bpp in exchange for a higher refresh rate or lower resource and memory utilization (also allowing for a higher resolution on a given CPU)
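For reference, Adafruit::GFX colors are 16-bit RGB565 values, so a 24bpp color has to be squeezed down at some point anyway. The sketch below shows that conversion and the size of a plain pixel buffer at each depth. The function names are made up for this example, and the buffer math assumes a simple packed array of pixels, which is not necessarily how SmartMatrix lays out its own refresh buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pack 8-bit R, G, B into RGB565 by keeping the top 5, 6 and 5 bits.
constexpr uint16_t rgb888to565(uint8_t r, uint8_t g, uint8_t b) {
    return static_cast<uint16_t>(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}

// Bytes used by a plain packed pixel buffer (assumption: no padding,
// whole bytes per pixel).
size_t pixelBufferBytes(size_t width, size_t height, size_t bitsPerPixel) {
    assert(bitsPerPixel % 8 == 0);
    return width * height * (bitsPerPixel / 8);
}
```

Under that assumption a 64x64 buffer takes 12288 bytes at 24bpp and 8192 bytes at 16bpp, which gives a sense of the kind of saving a 16bpp mode could offer.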
Support for the 2 fastest common Arduino-like microcontrollers: teensy 3.6 and ESP32 (teensy 3.1/3.2 is not fast enough to refresh 64x64 well enough, and teensy 3.5 is slower than 3.6, so no reason to buy one)
Very powerful API with multiple layer support (great if you can use it, although I'll admit that I only need drawpixel thanks to Adafruit::GFX)
So, SmartMatrix is great, but I have all this code that relies on one or more of those APIs:
I have a reasonable collection of demos I've gathered (a few I wrote myself), here: https://github.com/marcmerlin/FastLED_NeoMatrix_SmartMatrix_LEDMatrix_GFX_Demos and they use a combination of those 3 APIs.
The goal was for me to be able to re-use that code and make it work on both FastLED and SmartMatrix backends, which is why I wrote SmartMatrix::GFX. https://github.com/marcmerlin/SmartMatrix_GFX offers a GFX compat layer that is virtually identical to my FastLED::NeoMatrix library and allows you to run the same code on top of either FastLED or SmartMatrix supported panels.
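The general idea behind this kind of compat layer can be sketched in a few lines of standalone C++. This is only an illustration, not the actual code of SmartMatrix::GFX or FastLED::NeoMatrix: all the class and function names below are hypothetical, and the two backends are plain memory buffers standing in for real drivers. What it shows is that drawing code written against a single `drawPixel` method runs unchanged on backends with different physical layouts (here a row-major buffer and a zigzag-wired one).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical common interface: backends only supply drawPixel.
class PixelSink {
public:
    virtual ~PixelSink() = default;
    virtual void drawPixel(int16_t x, int16_t y, uint16_t color) = 0;
    // Higher-level primitives are built on drawPixel only.
    void drawHLine(int16_t x, int16_t y, int16_t w, uint16_t color) {
        for (int16_t i = 0; i < w; ++i) {
            drawPixel(static_cast<int16_t>(x + i), y, color);
        }
    }
};

// Validate dimensions and return the number of pixels.
inline size_t pixelCount(int16_t w, int16_t h) {
    assert(w > 0 && h > 0);
    return static_cast<size_t>(w) * static_cast<size_t>(h);
}

// Stand-in backend: pixels stored row by row, left to right.
class RowMajorBackend : public PixelSink {
public:
    RowMajorBackend(int16_t w, int16_t h) : width(w), height(h), pixels(pixelCount(w, h), 0) {}
    void drawPixel(int16_t x, int16_t y, uint16_t color) override {
        if (x < 0 || y < 0 || x >= width || y >= height) return;  // clip
        pixels[static_cast<size_t>(y) * width + x] = color;
    }
    const int16_t width, height;
    std::vector<uint16_t> pixels;
};

// Stand-in backend: odd rows are wired right to left (zigzag).
class ZigzagBackend : public PixelSink {
public:
    ZigzagBackend(int16_t w, int16_t h) : width(w), height(h), pixels(pixelCount(w, h), 0) {}
    void drawPixel(int16_t x, int16_t y, uint16_t color) override {
        if (x < 0 || y < 0 || x >= width || y >= height) return;  // clip
        const size_t col = (y % 2 == 1) ? static_cast<size_t>(width - 1 - x)
                                        : static_cast<size_t>(x);
        pixels[static_cast<size_t>(y) * width + col] = color;
    }
    const int16_t width, height;
    std::vector<uint16_t> pixels;
};

// Demo code written once, usable with either backend.
void drawDemo(PixelSink& out) {
    out.drawHLine(0, 1, 2, 0xF800);  // short red line on row 1
}
```

Calling `drawDemo()` on a 4x2 `RowMajorBackend` fills buffer slots 4 and 5, while the same call on a 4x2 `ZigzagBackend` fills slots 7 and 6, without the demo code knowing the difference.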
Hardware, Teensy 3.6 and SmartMatrix Shield v4
The easiest way to use SmartMatrix is to use the SmartMatrix Shield v4 from Louis Beaudoin.
If you are going to drive 64x64 and above, skip the teensy 3.0/3.1/3.2 and go directly to teensy 3.6. It costs more, but you'll want the extra CPU speed (teensy 3.1 can barely run 64x64 with an ok-ish refresh if you overclock it, if you must use the older chip).
Here is what the SmartMatrix shield looks like with a small patch I made to take USB power and send it to the panel (my laptop can output 2A over USB). Note that this is not safe with teensy v3.1/3.2 as it's not meant to pass that much current from its USB connection, but teensy 3.6 can do it fine as its fuse is located after the V+ connection on the chip:
Originally I used the APA connector to send power to the panel
2x 32x64 chained P4 panels with a sad cable extension I had to make, vs pre-made 64x64 P3 panel
SmartMatrix basic demo
The main problem with RGBPanels is that if the refresh rate isn't fast enough, they look bad in pictures. This is the main reason I switched to ESP32, which is dual core and can push a higher refresh rate via DMA than teensy can:
Chained panels giving mirrored output on a total display of 128x96:
As mentioned above, ESP32 is dual core, so it can update the panel on one core using DMA while the other core runs your code. It is more efficient; however, it runs out of DMA memory around 64x128 resolution (I run 64x96 myself and had to optimize code to make things fit).
Here are shots of what it looks like with Jason's shield:
It's reasonably compact: 15 IOs for SmartMatrix (only 14 are really required), IR connected to port 34, and IO 16 connected to a NeoPixel strip.
This shows my flexible P4 96x64 panels I bought on Amazon from Azerone (3 tied together, with one shown upside down for scale), a blank shield from Jason Coon, and how I cut a 16 pin IDC ribbon cable and turned it into an in-line row of pins that I can plug into Jason's shield after adding a riser. It also shows a patched board with an IR connector on the back, and a yellow wire that redirects the pin Jason's board connected to RX, which I use for debugging, to unused pin 27 instead:
While Jason's board is not perfect for this use, it's much better than my self made protoboard full of wires to connect the 74hc245 level shifters:
Here's a quick video summary that shows the wiring and layout:
Do not even think about using local arrays in functions: that's worse, as they go on the stack and will smash it (I think you're limited to around 8KB).
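As a minimal sketch of the alternatives, here is what keeping a big buffer off the stack can look like in plain C++. The 64x96 size matches the resolution mentioned above, but storing one pixel per `uint32_t` and all the names (`gFrame`, `allocFrame`, `fillFrame`) are assumptions made for this example, not code from any of the libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <new>

constexpr size_t kWidth = 64;
constexpr size_t kHeight = 96;
constexpr size_t kPixels = kWidth * kHeight;

// Risky: a local array like this inside a function would need
// kPixels * 4 = 24576 bytes of stack, well past a limit of around 8KB:
//     void render() { uint32_t frame[kPixels]; /* ... */ }

// Option 1: static storage, reserved once for the life of the program.
static uint32_t gFrame[kPixels];

// Option 2: heap allocation; returns an empty pointer instead of
// crashing if there is not enough memory. The buffer starts zeroed.
std::unique_ptr<uint32_t[]> allocFrame() {
    return std::unique_ptr<uint32_t[]>(new (std::nothrow) uint32_t[kPixels]());
}

// Works on either kind of buffer; only a pointer lives on the stack.
void fillFrame(uint32_t* frame, uint32_t color) {
    assert(frame != nullptr);
    for (size_t i = 0; i < kPixels; ++i) {
        frame[i] = color;
    }
}
```

Either way, the function that uses the buffer only keeps a pointer on its stack, and with the heap version you can check the result of `allocFrame()` before drawing.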
ESP32 has SPIFFS to use its flash to store data like Animated GIFs. You will find it unacceptably slow if you store 1MB or more and seek across a bunch of files. Instead, use FatFS as explained here:
The flexible panels sadly kept breaking. Azerone was super nice in offering to fix them. See the thin patched wires they added, expert work I'm not capable of:
I first tried strengthening them, but it still wasn't solid enough, and was a waste of time:
I ended up switching to hard panels. I should have used them from the start. They've been a lot more solid: