cross-posted from: https://lemmy.dbzer0.com/post/38015770

A washing machine is trapped in a fault state even though all the components function (AFAICT). The controller board has two ports:

  • ISP (to attach an ISP programmer to flash new software)
  • USART (4-pin serial port: 0v, TX, RX, 5v)

I’m guessing the ISP port is useless without whatever proprietary software is needed. But what can the USART do for me? Can that be used to obtain the error code and clear it, or reset the board to the factory state? Has anyone done that, without documentation?

  • j4k3@lemmy.world
    23 days ago

    UART is often used for debugging and programming, but there is no telling from this info what the board is running. What is the processor? That detail might attract more help.

    • diyrebel@lemmy.dbzer0.comOP
      22 days ago

      The MCU is an ATmega32L, which seems to be well documented. I was able to fetch a 300+ page document and a 12 page overview of the specs.

      • j4k3@lemmy.world
        22 days ago
        That is a microcontroller. I don't have a lot of experience hacking around with commercial products like this. It is not capable of running something like embedded Linux, and it is unlikely to be running a standard real-time operating system. It could in theory be running Arduino code, but probably not given the ISP connector. It was likely developed with the Microchip toolchain, which supports ISP. In this instance, the serial port is likely used for feedback only and not as an interactive interface. There may be some functionality programmed in, or the port may have been disabled after it was used in the jig that did the initial programming and tests.
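If the port does emit anything, the first hurdle is the unknown baud rate. A minimal sketch, assuming raw bytes have already been captured at a few candidate rates with a USB-to-TTL adapter (the capture step is not shown, and the sample data and helper names are hypothetical): score each capture by how much of it is printable ASCII. This only helps if the firmware sends text; a binary protocol would score low at every rate.

```python
import string
from typing import Dict

# Printable ASCII, including whitespace such as CR and LF.
PRINTABLE = set(string.printable.encode("ascii"))


def printable_ratio(data: bytes) -> float:
    """Return the fraction of bytes in `data` that are printable ASCII."""
    if not data:
        return 0.0
    return sum(b in PRINTABLE for b in data) / len(data)


def best_baud(captures: Dict[int, bytes]) -> int:
    """Pick the candidate baud rate whose capture looks most like text."""
    return max(captures, key=lambda baud: printable_ratio(captures[baud]))


# Hypothetical captures: the same output read at a wrong and a right rate.
captures = {
    2400: b"\x00\xf8\x80\xfe\x00\xe0",
    9600: b"ERR 12\r\n",
}
print(best_baud(captures))
```

Listening this way is passive: only the board's TX and 0v lines need to be connected, so nothing is sent to the controller.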

        Most of the info you will find about using UART to hack around is related to embedded Linux where UART is a full terminal interface. Devices capable of running embedded Linux are the next level above something like an ESP32 microcontroller.

        The datasheet for the ATmega32L is not really relevant to you unless you want to rebuild the entire software stack from scratch. Reverse engineering and rebuilding is a major project even for a pro dev.

        The manufacturer, as you perceive a brand, is likely just a logistics and label maker with little to no relevance to your actual device. The components or the entire machine were likely outsourced and contract manufactured. In the world of contract manufactured goods, the software is usually just a checklist of features. As soon as the checklist is complete and verified, the dev gets paid and is never seen again. If the company wants something more or something changed, it will cost a fortune for the old dev to return, or for a new one to read into the code and make changes or just start over. This kind of thing might happen when a new version is required during production runs because some part is not available or something like that.

        Each run is like a one-time thing. Some capital is put together and parts are sourced, then the contractor makes like 5k of the thing over the course of 2 weeks. These go in shipping containers around the world with whatever brand stickers applied. When each run is done, the software and any unique tooling are consumables to the whole affair and not something anyone cares about or keeps like a real asset of value. It is probably archived in several private databases on various employees' computers, but these are not connected in any way to aid or support your needs. It is in the best interest of the company to never give out this info.

        Also getting the whole microcontroller toolchain working is a major pain in the ass. That large datasheet is because a microcontroller is a whole computer all integrated into a single chip. It is just a very simple type of computer. You need to understand all of the peripheral parts of computing to really make sense of it in practice. It is not just a microprocessor.

        Embedded Linux means that there is a kernel and a user space abstraction layer that abstracts the unique hardware away and makes it irrelevant. Without this abstraction layer, you need to know and understand the assembly language of the actual hardware, or use a language like C and a compiler for the actual hardware. This is the level you are facing.

        • diyrebel@lemmy.dbzer0.comOP
          21 days ago

          I don’t intend to modify the program. I am just looking to reset the state of the software to get it out of the fault state.

          Normally that can be done by using the buttons on the PCB to enter a secret combination code to:

          • enter diagnostic mode
          • run various functions/cycles which normally run as part of a program
          • see the error code
          • reset the board

          When the software detects a fault (such as a broken pump), it saves the error code. Then if you fix the pump, the software doesn’t know the pump has been fixed. So the board has to be reset to clear the error code.

          The button sequence codes are secret and known only to the manufacturer. They are very protectionist. In Europe, law requires them to make the codes available to other 3rd party technicians – but only in the 1st ten years and they can also charge a fee. Consumers get no access under any circumstances.

          My thought was theoretically a pro independent repair service would not want to pay every manufacturer for the secret info for every model they repair – so perhaps they would attach to the USART serial port and have a way to see errors and reset the board. But if it’s as you say, then the USART is disabled and useless to repairers. Which means I’m stuffed because I cannot buy a replacement card for my machine.

          If the serial port is not disabled, you conjecture that it is likely a read-only, non-interactive mechanism. That still may be useful. I was able to find the secret button combination that is likely giving me an error code, which I can guess the meaning of based on leaked docs for other models, but I’m not satisfied with that. It would be useful if I could get more verbose or supplemental info about the error state.

          There is some chatter that GE washing machines (not what I have) include an RJ-45 port, and that GE released some kind of open source thing called the Green Bean which adapts USB to serial. On the one hand, it suggests that not all manufacturers intend to prevent communication with the PCB. OTOH, this actually seems to be not for service use but for sending notifications to the user.

          • j4k3@lemmy.world
            21 days ago
            In the last 10 years it has become common for devs to disable, remove, or encrypt debug interfaces to prevent reverse engineering and repairs. If the device is older, the debug info from UART is likely still present. How useful that information is may be dubious.

            Most of what you are saying about functionality is not in line with how a microcontroller works. The microcontroller is not sophisticated like a real computer. It is like a very simple state machine running on the device. Basically it is like a single long script that is not running on any kind of abstraction layer. When the device is turned on, there is the script. The script is not executed by some operating system or any other complication. The entire software stack is the script. Only, the script is actual hardware registers and flags and operation codes.

            There is a tiny amount of RAM for working data, 2 KB on the ATmega32L. There is the program memory (flash) that holds the script and that the script runs from, 32 KB. Then there is a tiny amount of electrically erasable and reprogrammable memory (EEPROM), 1 KB. The EEPROM is the section of memory a microcontroller normally uses to write persistent data, meaning something that will still be present when power is cycled.

            I doubt many people use eeprom to save any kind of error. This is more like where a serial number might be saved, or more likely the calibration values for some internal control algorithm. It is not anywhere near large enough to save something significant like the script.

            It is far more likely that the script is just a state machine and is reaching an error state because of some missing or bad signal that it needs to continue running the script. This is likely your problem and the issue is like 99% likely to be hardware and not the microcontroller.
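A minimal sketch of what "the script is just a state machine" can look like. The state names and their order are hypothetical, not taken from any real washer firmware; the point is only that one missing or bad signal diverts the whole program into an error state.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    FILL = auto()
    WASH = auto()
    DRAIN = auto()
    FAULT = auto()


# Hypothetical happy-path order of a wash program.
NEXT = {
    State.IDLE: State.FILL,
    State.FILL: State.WASH,
    State.WASH: State.DRAIN,
    State.DRAIN: State.IDLE,
}


def step(state: State, signal_ok: bool) -> State:
    """Advance one state; a missing or bad signal diverts to FAULT."""
    if state is State.FAULT:
        return State.FAULT  # stays here until something resets the machine
    return NEXT[state] if signal_ok else State.FAULT


state = State.IDLE
for ok in (True, True, False, True):
    state = step(state, ok)
print(state.name)
```

In this toy run the third signal is bad, so the machine ends in `FAULT` and ignores the good signal that follows.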

            Resetting such a device is just an interrupt signal that restarts the script. There is nothing dynamic about this.

            From the perspective of the original dev, the programming and debug interfaces are powerful. ISP itself reads and writes the flash, EEPROM, and fuse bits. Stepping through each machine instruction, halting execution, and monitoring or changing any value present in any register of the internal microprocessor is on-chip debugging, which on the ATmega32 goes through JTAG rather than the ISP header. That is super powerful but requires a pricey setup from the manufacturer in most cases, and a thorough understanding of the internals of a microprocessor’s registers, op codes, memory addressing, interrupts, clocks, and the ALU (arithmetic logic unit).

            In this type of device, text strings are far too expensive to have much, if any, value. Any such debug info is likely to be very terse. Text strings take an enormous amount of memory space to encode, so in general, these are not used very much. A microcontroller is a far simpler device than this. It is a computer in the sense of a device that can do Input/Outputs and run a control algorithm like a temperature controller with a PID loop.

            • diyrebel@lemmy.dbzer0.comOP
              16 days ago

              I appreciate your insights but struggle to reconcile the following with what others say (youtubers and folks in an electronics chat room):

              “I doubt many people use eeprom to save any kind of error. … It is far more likely that the script is just a state machine and is reaching an error state because of some missing or bad signal that it needs to continue running the script.”

              I asked EE folks how a controller board would sense a fault. Does the controller take resistance measurements on the components? The answer was “highly unlikely - that would be far more sophisticated and costly than what would be realistic in a domestic washing machine”. They said fault detection is based on logic. E.g. if the tacho sensor does not show increasing feedback despite increasing power to the motor, then the controller can infer that there is a fault. Or if the water has been filling for a long time and the pressure sensor is not detecting a pressure increase, the machine would know from that activity that the inlet valve has a problem.
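The two logic-based checks described above can be sketched as follows. This is an illustration under assumptions, not how any particular controller does it: the function names, units, and thresholds are invented, and the readings are plain lists of samples taken over one observation window.

```python
from typing import Sequence


def fill_fault(pressure: Sequence[float], min_rise: float) -> bool:
    """True if the inlet valve was open for the whole window but the
    pressure reading rose by less than `min_rise`."""
    return len(pressure) >= 2 and (pressure[-1] - pressure[0]) < min_rise


def motor_fault(power: Sequence[float], tacho: Sequence[float]) -> bool:
    """True if drive power kept increasing while tacho feedback never did."""
    if len(power) < 2 or len(tacho) < 2:
        return False  # not enough samples to judge
    power_rising = all(b > a for a, b in zip(power, power[1:]))
    tacho_rising = any(b > a for a, b in zip(tacho, tacho[1:]))
    return power_rising and not tacho_rising


# Power ramps up but the drum never turns: a fault by inference.
print(motor_fault([10, 20, 30], [0, 0, 0]))
# Pressure barely moves while filling: inlet valve suspected.
print(fill_fault([1.00, 1.00, 1.01], min_rise=0.5))
```

Neither check measures a component directly; each one compares what the controller commanded with what a sensor reported back.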

              You seem to suggest that the script reruns from a clean state every time and that a “bad signal” would be re-detected each run, which then implies that the machine would repeatedly attempt to fill with water, tumble, drain, etc. But that does not seem to be what I am seeing. The machine will be powered off & unplugged for days, and when powered on it instantly flashes that there is a fault (which is likely only known after attempting to run the various components). This is consistent with what a Youtuber said: the machine (not my particular model but speaking generally) stores the fault code. From there, the machine is trapped in that state until the error code is cleared by pressing a secret sequence of buttons.

              Some leaked tech docs for a different model (same make) mentioned that if a fault occurs 8 times, it then becomes stored in memory. This seems consistent with what I observed. I repeatedly attempted to run the machine. Not sure how many times. Motors would run, failure hits, and then it quits. After doing that so many times (which I regret), the behavior changed. Now the machine will not even attempt to run because it is apparently trapped in an error state.

              So everything seems to point to the error code being stored in EEPROM (which I believe is embedded in the ATmega32L chip). And not just the error code but apparently a count of failed attempts to run a program.
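A sketch of the behaviour this would imply, with a `bytearray` standing in for the EEPROM. The threshold of 8 comes from the leaked doc for a different model; the memory layout, the `NO_FAULT` marker, and all function names are assumptions made up for illustration.

```python
LATCH_THRESHOLD = 8  # per the leaked doc for a different model
ADDR_COUNT = 0       # hypothetical layout: cell 0 holds the failure count
ADDR_CODE = 1        # hypothetical layout: cell 1 holds the latched code
NO_FAULT = 0xFF      # erased EEPROM cells read back as 0xFF


def new_eeprom() -> bytearray:
    """A small stand-in EEPROM; only the first two cells matter here."""
    mem = bytearray([0xFF] * 16)
    mem[ADDR_COUNT] = 0
    return mem


def record_fault(eeprom: bytearray, code: int) -> None:
    """Count a failed run; latch the code once the threshold is reached."""
    eeprom[ADDR_COUNT] = min(eeprom[ADDR_COUNT] + 1, 255)
    if eeprom[ADDR_COUNT] >= LATCH_THRESHOLD:
        eeprom[ADDR_CODE] = code


def may_start(eeprom: bytearray) -> bool:
    """Checked at power-on: refuse to run while a code is latched."""
    return eeprom[ADDR_CODE] == NO_FAULT


def service_reset(eeprom: bytearray) -> None:
    """What the secret button sequence would do in this model."""
    eeprom[ADDR_COUNT] = 0
    eeprom[ADDR_CODE] = NO_FAULT


eeprom = new_eeprom()
for _ in range(8):
    record_fault(eeprom, code=0x12)
print(may_start(eeprom))
```

Because the `bytearray` plays the role of non-volatile memory, unplugging the machine changes nothing in this model: the latched code is still there at the next power-on until `service_reset` runs.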

              • j4k3@lemmy.world
                15 days ago

                You are correct and I misspoke in my wording and stated logic. I had intended to constrain my logic to eliminating the potential of the entire script being stored or somehow altered and reloaded. Storing an error code is entirely feasible.

                “Now the machine will not even attempt to run because it is apparently trapped in an error state.”

                You might trace out the pins that go to any buttons. I am not super familiar with the 32L, but IIRC old Atmel chips usually have only a couple of external hardware interrupt pins (the ATmega32 datasheet lists three: INT0, INT1, and INT2).

                So, when a simple CPU core is running, there are various ways to force it to stop what it is doing and divert attention elsewhere for things that are more important. At the general level there are flags that can be set to indicate that higher priority tasks need to be completed inside the CPU. This is stuff like a block of serial communication that has been received and needs to be processed so that the buffer doesn’t get too full. Or some timer expired and triggered some code to run next.

                Hardware interrupts are like these flags but are usually set up as the highest priority interrupts in the physical hardware. By contrast, a person can make any Input/Output capable pin behave like a crude interrupt by turning it into an input and simply checking the state of that pin repeatedly in the code that is running (polling).

                However, the hardware interrupt is very powerful and forces the CPU to only pay attention to whatever code is associated exclusively with that interrupt. Typically in the code, one would only use this hardware interrupt to set a flag somewhere quickly and return to execution of whatever was happening. There are a lot of gotchas that need to be taken care of if one wants to do something more complicated because the hardware interrupt isn’t like multithreading code in a desktop CPU where all the registers and states are saved. This is like, stop in the middle of a word on the exact letter you are pronouncing mid sentence while talking to someone about something important the moment that interrupt happens.

                It is quite likely that the key combo to reset your device is related to one of the hardware interrupt pins. It would be reasonable in the code to check if another pin is low when the interrupt happens.
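A simulation of that pattern, assuming active-low buttons (pressed reads 0). The `Panel` class, the pin name `BTN_B`, and the `read_pin` callback are all hypothetical stand-ins for real firmware: the interrupt handler only sets a flag when the second button is also held, and the main loop acts on the flag later.

```python
from typing import Callable


class Panel:
    """Toy model of a button panel with one interrupt-wired button (A)."""

    def __init__(self, read_pin: Callable[[str], int]) -> None:
        self.read_pin = read_pin        # stand-in for reading a GPIO input
        self.reset_requested = False    # flag set by the ISR

    def isr_button_a(self) -> None:
        """Runs when button A fires its interrupt: keep it short."""
        if self.read_pin("BTN_B") == 0:  # is the other button held too?
            self.reset_requested = True

    def main_loop_once(self) -> bool:
        """Consume the flag; return True if a reset should be performed."""
        if self.reset_requested:
            self.reset_requested = False
            return True
        return False


pins = {"BTN_B": 1}           # button B released
panel = Panel(pins.get)
panel.isr_button_a()
print(panel.main_loop_once())  # A alone does nothing

pins["BTN_B"] = 0             # button B held
panel.isr_button_a()
print(panel.main_loop_once())  # A while B is held requests the reset
```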

                You know the device can be reset by someone. It will be just a combination of keys. If there are a lot of keys, knowing which of them are wired to an interrupt pin should limit the number of possibilities to something manageable. Write this stuff down and test methodically.
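One way to test methodically is to generate the checklist up front. A small sketch, assuming five buttons with made-up labels and assuming tracing showed that one of them (here `START`) sits on an interrupt pin, so every candidate combination must include it. The "while powering on" wording is also just a guess at the procedure.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

BUTTONS = ["START", "TEMP", "SPIN", "DELAY", "OPTION"]  # hypothetical labels


def combos_with(anchor: str, buttons: Sequence[str],
                max_size: int = 3) -> List[Tuple[str, ...]]:
    """Every combination of 2..max_size buttons that includes `anchor`."""
    others = [b for b in buttons if b != anchor]
    result = []
    for extra_count in range(1, max_size):
        for extra in combinations(others, extra_count):
            result.append((anchor,) + extra)
    return result


checklist = combos_with("START", BUTTONS)
for n, combo in enumerate(checklist, 1):
    print(f"[ ] {n:2d}. hold {' + '.join(combo)} while powering on")
```

With five buttons there are 20 possible two- and three-button combinations; requiring the interrupt-wired button cuts that to 10.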

                Also be sure to check Louis Rossmann’s new documentation project website and do a search on the EEVBlog forum if you have not already done so.