A major change is the support of variable length frames between host and Proxmark3. This is a step especially important for usage over FPC/USART/BT. Old format ========== Previously, frames were, in both directions like this: uint64_t cmd; uint64_t arg[3]; union { uint8_t asBytes[USB_CMD_DATA_SIZE]; uint32_t asDwords[USB_CMD_DATA_SIZE / 4]; } d; with USB_CMD_DATA_SIZE = 512 and there was no API abstraction, everybody was forging/parsing these frames. So the frame size was fixed, 544 bytes, even for simple ACKs. When snooping the USB transfers, we can observe the host is sending 544b Bulk USB frames while the Proxmark3 is limited by its internal buffers and is sending 128b, 128b, 128b, 128b, 32b, so in total 5 packets. New format ========== Even if we make the payload part variable in the old format, we've still a minimum of 32 bytes per frame with fields arbitrarily large. So we designed a new format from scratch: For commands being sent to the Proxmark: uint32_t magic; uint16_t length : 15; bool ng : 1; uint16_t cmd; uint8_t data[length]; uint16_t crc; * magic: arbitrary magic ("PM3a") to help re-sync if needed * length: length of the variable payload, 0 if none, max 512 (USB_CMD_DATA_SIZE) for now. * ng: flag to tell if the data is following the new format (ng) or the old one, see transition notes below * cmd: as previously, on 16b as it's enough * data: variable length payload * crc: either an actual CRC (crc14a) or a Magic placeholder ("a3") For responses from the Proxmark: uint32_t magic; uint16_t length : 15; bool ng : 1; int16_t status; uint16_t cmd; uint8_t data[length]; uint16_t crc; * magic: arbitrary magic ("PM3a") to help re-sync if needed * length: length of the variable payload, 0 if none, max 512 (USB_CMD_DATA_SIZE) for now. * ng: flag to tell if the data is following the new format (ng) or the old one, see transition notes below * status: a field to send back the status of the command execution * cmd: as previously, on 16b as it's enough * data: variable length payload * crc: either an actual CRC (crc14a) or a Magic placeholder ("a3") We used to send an anonymous ACK, now we're replying with the corresponding command name and a status. CRC is optional and on reception, the magic "a3" is accepted as placeholder. If it's different then it's checked as a CRC. By default CRC is user over USART and is disabled over USB, on both directions. Internal structures used to handle these packets are: * PacketCommandNGPreamble * PacketCommandNGPostamble * PacketCommandNGRaw * PacketResponseNGPreamble * PacketResponseNGPostamble * PacketResponseNGRaw But they are abstracted from the developer view with a new API. Transition ========== Because it's a long transition to clean all the code from the old format and because we don't want to break stuffs when flashing the bootloader, the old frames are still supported together with the new frames. The old structure is now called PacketCommandOLD and PacketResponseOLD and it's also abstracted from the developer view with the new API. New format API ============== So the new API is a merge of the old and the new frame formats, to ensure a smooth transition. The boolean "ng" indicates if the structure is storing data from the old or the new format. Old format can come from either old 544b frames or mixed frames (variable length but still with oldargs). After the full transition, we might remove the fields "oldarg" and "ng". typedef struct { uint16_t cmd; uint16_t length; uint32_t magic; // NG uint16_t crc; // NG uint64_t oldarg[3]; // OLD union { uint8_t asBytes[USB_CMD_DATA_SIZE]; uint32_t asDwords[USB_CMD_DATA_SIZE / 4]; } data; bool ng; // does it store NG data or OLD data? } PACKED PacketCommandNG; typedef struct { uint16_t cmd; uint16_t length; uint32_t magic; // NG int16_t status; // NG uint16_t crc; // NG uint64_t oldarg[3]; // OLD union { uint8_t asBytes[USB_CMD_DATA_SIZE]; uint32_t asDwords[USB_CMD_DATA_SIZE / 4]; } data; bool ng; // does it store NG data or OLD data? } PACKED PacketResponseNG; On the client, for sending frames: ---------------------------------- (client->comms.c) ***************************************** void SendCommandNG(uint16_t cmd, uint8_t *data, size_t len); void SendCommandOLD(uint64_t cmd, uint64_t arg0, uint64_t arg1, uint64_t arg2, void *data, size_t len); void SendCommandMIX(uint64_t cmd, uint64_t arg0, uint64_t arg1, uint64_t arg2, void *data, size_t len); ***************************************** So cmds should make the transition from SendCommandOLD to SendCommandNG to benefit from smaller frames (and armsrc handlers adjusted accordingly of course). SendCommandMIX is a transition fct: it uses the same API as SendCommandOLD but benefits somehow from variable length frames. It occupies at least 24b of data for the oldargs and real data is therefore limited to USB_CMD_DATA_SIZE - 24 (defined as USB_CMD_DATA_SIZE_MIX). Besides the size limitation, the receiver handler doesn't know if this was an OLD frame or a MIX frame, it gets its oldargs and data as usual. Warning : it makes sense to move from SendCommandOLD to SendCommandMIX only for *commands with small payloads*. * otherwise both have about the same size * SendCommandMIX has a smaller payload (USB_CMD_DATA_SIZE_MIX < USB_CMD_DATA_SIZE) so it's risky to blindly move from OLD to MIX if there is a large payload. Internally these functions prepare the new or old frames and call uart_communication which calls uart_send. On the Proxmark3, for receiving frames: --------------------------------------- (armsrc/appmain.c) ***************************************** PacketCommandNG ***************************************** AppMain calls receive_ng(common/cmd.c) which calls usb_read_ng/usart_read_ng to get a PacketCommandNG, then passes it to PacketReceived. (no matter if it's an old frame or a new frame, check PacketCommandNG "ng" field to know if there are oldargs) PacketReceive is the commands broker. Old handlers will still find their stuff in "oldarg" field. On the Proxmark3, for sending frames: ------------------------------------- (common/cmd.c) ***************************************** int16_t reply_ng(uint16_t cmd, int16_t status, uint8_t *data, size_t len) int16_t reply_old(uint64_t cmd, uint64_t arg0, uint64_t arg1, uint64_t arg2, void *data, size_t len) int16_t reply_mix(uint64_t cmd, uint64_t arg0, uint64_t arg1, uint64_t arg2, void *data, size_t len) ***************************************** So replies should make the transition from reply_old to reply_ng to benefit from smaller frames (and client reception adjusted accordingly of course). reply_mix is a transition fct: it uses the same API as reply_old but benefits somehow from variable length frames. It occupies at least 24b of data for the oldargs and real data is therefore limited to USB_CMD_DATA_SIZE - 24. Besides the size limitation, the client command doesn't know if this was an OLD frame or a MIX frame, it gets its oldargs and data as usual. Example with CMD_PING that supports both styles (from client CmdPing or CmdPingNG) and replies with the new frame format when it receives new command format: if (packet->ng) { reply_ng(CMD_PING, PM3_SUCCESS, packet->data.asBytes, packet->length); } else { // reply_old(CMD_ACK, reply_via_fpc, 0, 0, 0, 0); reply_mix(CMD_ACK, reply_via_fpc, 0, 0, 0, 0); } On the client, for receiving frames: ------------------------------------ (client->comms.c) ***************************************** WaitForResponseTimeout -> PacketResponseNG ***************************************** uart_communication calls uart_receive and create a PacketResponseNG, then passes it to PacketResponseReceived. PacketResponseReceived treats it immediately (prints) or stores it with storeReply. Commands do WaitForResponseTimeoutW (or dl_it) which uses getReply to fetch responses. API transition ============== In short, to move from one format to the other, we need for each command: * (client TX) SendCommandOLD -> SendCommandNG (with all stuff in ad-hoc structs in "data" field) * (pm3 RX) PacketCommandNG parsing, from "oldarg" to only the "data" field * (pm3 TX) reply_old -> reply_ng (with all stuff in ad-hoc structs in "data" field) * (client RX) PacketResponseNG parsing, from "oldarg" to only the "data" field Meanwhile, a fast transition to MIX frames can be done with: * (client TX) SendCommandOLD -> SendCommandMIX (but check the limited data size) * (pm3 TX) reply_old -> reply_mix (but check the limited data size) Bootrom ======= Bootrom code will still use the old frame format to remain compatible with other repos supporting the old format and because it would hardly gain anything from the new format: * almost all frames convey 512b of payload, so difference in overhead is neglictible * bringing flash over usart sounds risky and would be terribly slow anyway (115200 bauds vs. 7M bauds). On the Proxmark3, for receiving frames: --------------------------------------- (bootrom/bootrom.c) usb_read (common/usb_cdc.c) -> UsbPacketReceived (bootrom.c) -> CMD_DEVICE_INFO / CMD_START_FLASH / CMD_FINISH_WRITE / CMD_HARDWARE_RESET / CMD_SETUP_WRITE also usb_enable, usb_disable (common/usb_cdc.c) On the Proxmark3, for sending frames: ------------------------------------- (bootrom/bootrom.c) reply_old (bootrom.c) -> usb_write (common/usb_cdc.c) also usb_enable, usb_disable (common/usb_cdc.c) On the client, for sending frames: ------------------------------------- Therefore, the flasher client (client/flasher.c + client->flash.c) must still use these old frames. It uses a few commands in common with current client code: OpenProxmark CloseProxmark SendCommandOLD -> CMD_DEVICE_INFO / CMD_START_FLASH / CMD_FINISH_WRITE / CMD_HARDWARE_RESET On the client, for receiving frames: ------------------------------------ As usual, old frames are still supported WaitForResponseTimeout -> PacketResponseNG New usart RX FIFO ================= USART code has been rewritten to cope with unknown size packets. * using USART full duplex with double DMA buffer on RX & TX * using internal FIFO for RX usart_init: * USART is activated all way long from usart_init(), no need to touch it in RX/TX routines: pUS1->US_PTCR = AT91C_PDC_RXTEN | AT91C_PDC_TXTEN usart_writebuffer_sync: * still using DMA but accepts arbitrary packet sizes * removed unneeded memcpy * wait for DMA buffer to be treated before returning, therefore "sync" * we could make an async version but caller must be sure the DMA buffer remains available! * as it's sync, no need for next DMA buffer usart_read_ng: * user tells expected packet length * relies on usart_rxdata_available to know if there is data in our FIFO buffer * fetches data from our FIFO * dynamic number of tries (depending on FPC speed) to wait for asked data usart_rxdata_available: * polls usart_fill_rxfifo * returns number of bytes available in our FIFO usart_fill_rxfifo: * if next DMA buffer got moved to current buffer (US_RNCR == 0), it means one DMA buffer is full * transfer current DMA buffer data to our FIFO * swap to the other DMA buffer * provide the emptied DMA buffer as next DMA buffer * if current DMA buffer is partially filled * transfer available data to our FIFO * remember how many bytes we already copied to our FIFO Timings ======= Reference (before new format): linux usb: #db# USB Transfer Speed PM3 -> Client = 545109 Bytes/s On a Windows VM: proxspace usb: #db# USB Transfer Speed PM3 -> Client = 233998 Bytes/s (common/usart.h) USART_BAUD_RATE defined there 9600: #db# USB Transfer Speed PM3 -> Client = 934 Bytes/s 115200: #db# USB Transfer Speed PM3 -> Client = 11137 Bytes/s 460800: #db# USB Transfer Speed PM3 -> Client = 43119 Bytes/s linux usb: #db# USB Transfer Speed PM3 -> Client = 666624 Bytes/s (equiv. to ~7Mbaud) (uart/uart_posix.c) // Receiving from USART need more than 30ms as we used on USB // else we get errors about "Received packet frame ... too short" // Now we're using 100ms // FTDI 9600 hw status -> we need 20ms // FTDI 115200 hw status -> we need 50ms // FTDI 460800 hw status -> we need 30ms struct timeval timeout = { .tv_sec = 0, // 0 second .tv_usec = 100000 // 100 000 micro seconds }; Add automatically some communication delay in the WaitForResponseTimeout & dl_it timeouts Only when using FPC, timeout = 2* empirically measured delay (FTDI cable) Empirically measured delay (FTDI cable) with "hw pingng 512" : usb -> 6.. 32ms 460800 -> 40.. 70ms 9600 -> 1100..1150ms (client/comms.c) static size_t communication_delay(void) { if (conn.send_via_fpc_usart) // needed also for Windows USB USART?? return 2 * (12000000 / uart_speed); return 100; } Because some commands send a lot of frames before finishing (hw status, lf read,...), WaitForResponseTimeout & dl_it timeouts are reset at each packet reception, so timeout is actually counted after latest received packet, it doesn't depend anymore on the number of received packets. Needed to tune pm3 RX usart maxtry (common/usart.c) uint32_t usart_read_ng(uint8_t *data, size_t len) { // Empirical max try observed: 3000000 / USART_BAUD_RATE // Let's take 10x uint32_t maxtry = 10 * ( 3000000 / USART_BAUD_RATE ); DbpStringEx using reply_old: time client/proxmark3 -p /dev/ttyACM0 -c "hw status" 2.52s time client/proxmark3 -p /dev/ttyUSB0 -b 460800 -c "hw status" 3.03s time client/proxmark3 -p /dev/ttyUSB0 -b 115200 -c "hw status" 4.88s DbpStringEx using reply_old: time client/proxmark3 -p /dev/ttyUSB0 -b 9600 -c "hw status" 26.5s DbpStringEx using reply_mix: time client/proxmark3 -p /dev/ttyUSB0 -b 9600 -c "hw status" 7.08s DbpStringEx using reply_ng: time client/proxmark3 -p /dev/ttyACM0 -c "hw status" 2.10s time client/proxmark3 -p /dev/ttyUSB0 -b 460800 -c "hw status" 2.22s time client/proxmark3 -p /dev/ttyUSB0 -b 115200 -c "hw status" 2.43s time client/proxmark3 -p /dev/ttyUSB0 -b 9600 -c "hw status" 5.75s time client/proxmark3 -p /dev/ttyUSB0 -b 9600 -c "lf read" 50.38s time client/proxmark3 -p /dev/ttyUSB0 -b 115200 -c "lf read" 6.28s time client/proxmark3 -p /dev/ttyACM0 -c "mem save f foo_usb" 1.48s time client/proxmark3 -p /dev/ttyUSB0 -b 115200 -c "mem save f foo_fpc" 25.34s Sending multiple commands can still be slow because it waits regularly for incoming RX frames and the timings are quite conservative because of BT (see struct timeval timeout in uart_posix.c, now at 200ms). When one knows there is no response to wait before the next command, he can use the same trick as in the flasher: // fast push mode conn.block_after_ACK = true; some loop { if (sending_last_command) // Disable fast mode conn.block_after_ACK = false; SendCommandOLD / SendCommandMix if (!WaitForResponseTimeout(CMD_ACK, &resp, some_timeout)) { .... conn.block_after_ACK = false; return PM3_ETIMEOUT; } } return PM3_SUCCESS; Or if it's too complex to determine when we're sending the last command: // fast push mode conn.block_after_ACK = true; some loop { SendCommandOLD / SendCommandMIX if (!WaitForResponseTimeout(CMD_ACK, &resp, some_timeout)) { .... conn.block_after_ACK = false; return PM3_ETIMEOUT; } } // Disable fast mode and send a dummy command to make it effective conn.block_after_ACK = false; SendCommandMIX(CMD_PING, 0, 0, 0, NULL, 0); WaitForResponseTimeout(CMD_ACK, NULL, 1000); return PM3_SUCCESS; Reference frames ================ For helping debug... On linux USB * sent packets can be 544 * received packets are max 128, so 544 = 128+128+128+128+32 On linux UART (FTDI) * sent packets are max 256, so 544 = 256+256+32 * received packets are max 512, so 544 = 512+32 Initial connection: TestProxmark: SendCommandOLD(CMD_PING, 0, 0, 0, NULL, 0); ->544=0901000000000000000000000000000000000000000000000000000000000000 -> OLD CMD_PING: reply_mix(CMD_ACK, reply_via_fpc, 0, 0, 0, 0); <-36=504d336218000000ff0000000000000000000000000000000000000000000000 <- MIX main_loop pm3_version: SendCommandOLD(CMD_VERSION, 0, 0, 0, NULL, 0); ->544=0701000000000000000000000000000000000000000000000000000000000000 -> OLD SendVersion: reply_old(CMD_ACK,...); <-128=ff000000000000004f0a0b270000000009710300000000000000000000000000 <- OLD <-128=6572696d656e74616c5f7661726c656e2f33646431663163372d64697274792d <-128=484620696d616765206275696c7420666f7220327333307671313030206f6e20 <-128=0000000000000000000000000000000000000000000000000000000000000000 <-32=0000000000000000000000000000000000000000000000000000000000000000 "hw ping" CmdPing SendCommandMIX(CMD_PING, 0, 0, 0, NULL, 0); ->34=504d336118000901000000000000000000000000000000000000000000000000 -> MIX CMD_PING reply_mix(CMD_ACK, reply_via_fpc, 0, 0, 0, 0); <-36=504d336218000000ff0000000000000000000000000000000000000000000000 <- MIX "hw pingng" CmdPingNG SendCommandNG(CMD_PING, data, len); ->10=504d3361008009016133 -> NG CMD_PING reply_ng(CMD_PING, PM3_SUCCESS, packet->data.asBytes, packet->length); <-12=504d33620080000009016233 <- NG "hw pingng 512" CmdPingNG SendCommandNG(CMD_PING, data, len); ->522=504d336100820901000102030405060708090a0b0c0d0e0f1011121314151617 -> NG CMD_PING reply_ng(CMD_PING, PM3_SUCCESS, packet->data.asBytes, packet->length); <-128=504d3362008200000901000102030405060708090a0b0c0d0e0f101112131415 <- NG <-128=767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495 <-128=f6f7f8f9fafbfcfdfeff000102030405060708090a0b0c0d0e0f101112131415 <-128=767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495 <-12=f6f7f8f9fafbfcfdfeff6233