Continuing from the previous post, there were two other issues I faced after installing Ubuntu: the battery drained quickly, and the laptop kept getting warm.

I didn't get any warnings regarding battery life or heat issues from the family member I got the laptop from, nor was I using the laptop enough for it to be this warm or use this much power.

Solving this wasn't as straightforward as fixing the WiFi, because my searches didn't lead to any single solution. So I had to debug my way through it.

Finding a starting point

I started by finding ways to see the temperature of my laptop, for which I found a package called lm-sensors. Before checking the temps, I ran sensors-detect and selected all of the default options.

$ sudo apt install lm-sensors
$ sudo sensors-detect

When I ran sensors for the first time, there was too much output and it barely made sense. I spent some time deciphering the output, and then came the second problem - I didn't know the ideal temperatures to know which ones were high.

So I tried a different approach. I decided to capture the sensors output twice - once after boot, and once 30 minutes after that - and compare the two. In those 30 minutes, I kept the system idle or used it minimally.

This approach worked, as I saw a difference in coretemp-isa-0000, which shows the temperatures of the CPU cores:

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +45.0°C
...
After boot
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +55.0°C
...
30mins after boot

Package id 0 refers to the temperature of the CPU as a whole, and there is a 10 degree increase in about 30mins, with little to no activity in that duration.
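
The before/after comparison can also be scripted. This is only a rough sketch: the function names and snapshot file names are made up for illustration, and it assumes the "Package id 0" line looks like the one shown above.

```shell
# Print the "Package id 0" temperature (e.g. 45.0) from sensors output on stdin.
package_temp() {
    # Field 4 looks like "+45.0°C"; strip everything except digits and the dot.
    awk '/^Package id 0:/ { gsub(/[^0-9.]/, "", $4); print $4; exit }'
}

# Print the temperature difference (second snapshot minus first) in degrees Celsius.
temp_delta() {
    before=$(package_temp < "$1")
    after=$(package_temp < "$2")
    awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f\n", a - b }'
}

# Possible usage (file names are hypothetical):
#   sensors > after-boot.txt       # right after boot
#   sensors > after-30min.txt      # 30 minutes later
#   temp_delta after-boot.txt after-30min.txt
```

For the two readings shown above, temp_delta prints 10.0.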

I was off to search again, and I landed on two possible causes:

  1. Some process is hogging CPU
  2. Bad power management of Linux on Macs

The first cause got eliminated pretty quickly, as htop didn't show any process with a high CPU usage, and the CPU usage was also fairly low overall. Bad power management was a very common issue reported in online forums, and I knew my machine worked fine on macOS, so this seemed like a valid cause.
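
The CPU-hog check can also be done non-interactively. The sketch below is not what I ran; it is a hypothetical helper that filters ps output by a threshold I picked arbitrarily (20%). Note that ps reports CPU time averaged over a process's lifetime, so it is coarser than what htop shows live.

```shell
# Print processes whose CPU share is at or above a threshold (default 20).
# Expects "PID %CPU COMMAND" lines on stdin, for example from:
#   ps -eo pid,pcpu,comm --no-headers
cpu_hogs() {
    awk -v limit="${1:-20}" '$2 + 0 >= limit + 0 { sub(/^ +/, ""); print }'
}

# Possible usage:
#   ps -eo pid,pcpu,comm --no-headers | cpu_hogs 20
```

No output means nothing crossed the threshold, which matches what I saw in htop.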

One of the tools I came across to enable better power management was powertop, which displays the energy usage of a system and offers default settings for better power management. I enabled the defaults using the --auto-tune flag after installing.

$ sudo apt install powertop
$ sudo powertop --auto-tune

When powertop is run without any flags, it runs in a similar fashion to top, displaying the energy usage and other statistics that update in real time.

The battery reports a discharge rate of:  14.5  W
The energy consumed was :  325  J
The estimated remaining time is 2 hours, 6 minutes

Summary: 123.1 wakeups/second,  0.0 GPU ops/seconds, 0.0 VFS ops/sec and 3.1% CPU use

            Usage       Events/s    Category       Description
        675.2 µs/s      46.6        Timer          tick_sched_timer
          0.8 ms/s      21.0        Interrupt      [79] amdgpu
...

Some things stood out here:

The appearance of amdgpu seemed worth looking into further, and I saw that there was an option to disable it altogether. I wasn't planning on doing any heavy-duty work on this machine, so it seemed like a reasonable solution if it would help reduce temperatures.

I started following this tutorial, which first checks whether you have two GPUs on your system.

$ lspci | grep VGA
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] (rev ff)

I don't see a second GPU, but this machine has two GPUs - an integrated GPU and a discrete/dedicated GPU.

OH WAIT, I found the root cause - the iGPU didn't get detected for whatever reason, and the dGPU is being used as the main graphics driver in its place. The dGPU uses a lot of power, which explains the high energy usage in powertop, the quick battery drain and the laptop getting warm!
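
To make this check repeatable, the lspci output can be counted instead of eyeballed. A minimal sketch, with made-up function names, assuming both GPUs show up as "VGA compatible controller" lines the way the AMD one does above:

```shell
# Count VGA controllers in lspci output read from stdin.
count_vga() {
    grep -c 'VGA compatible controller'
}

# Print a short verdict based on how many VGA controllers are visible.
check_gpus() {
    n=$(count_vga) || true    # grep exits non-zero when the count is 0
    if [ "$n" -lt 2 ]; then
        echo "only $n VGA controller(s) visible; the iGPU may be powered down"
    else
        echo "$n VGA controllers visible"
    fi
}

# Possible usage:
#   lspci | check_gpus
```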

Other people have also faced the same issue and have documented solutions for it, which I followed along.

Enabling the iGPU

The iGPU is not detected thanks to the way Apple's firmware works. If it recognizes that it's booting an OS other than macOS, it will power down some of the hardware, the iGPU being one of them. Thanks Apple!

The TL;DR solution to this is to make the firmware believe that it is booting macOS by running custom code before boot.

Step 1: Build the custom EFI code

The custom code is available in the apple_set_os.efi repository. All I had to do was build the file.

$ git clone https://github.com/0xbb/apple_set_os.efi
$ cd apple_set_os.efi
$ make
cc -I/usr/include/efi -I/usr/include/efi/x86_64 -DGNU_EFI_USE_MS_ABI -fPIC -fshort-wchar -ffreestanding -fno-stack-protector -maccumulate-outgoing-args -Wall -Dx86_64 -Werror -m64 -mno-red-zone   -c -o apple_set_os.o apple_set_os.c
ld -T /usr/lib/elf_x86_64_efi.lds -Bsymbolic -shared -nostdlib -znocombreloc /usr/lib/crt0-efi-x86_64.o -o apple_set_os.so apple_set_os.o /usr/lib/gcc/x86_64-linux-gnu/11/libgcc.a \
/usr/lib/libgnuefi.a
objcopy -j .text -j .sdata -j .data -j .dynamic -j .dynsym -j .rel \
        -j .rela -j .reloc -S --target=efi-app-x86_64 apple_set_os.so apple_set_os.efi
rm apple_set_os.o apple_set_os.so

Step 2: Move the code to the boot partition

Next, the code needs to be in a location that is accessible during boot, aka the EFI system partition, which is mounted at /boot/efi. I could have put the code in /boot/efi/EFI directly, but the instructions I was following put it in a sub-directory called custom instead.

$ sudo mkdir /boot/efi/EFI/custom
$ sudo cp apple_set_os.efi /boot/efi/EFI/custom

Step 3: Ask GRUB to run the code before boot

Placing the code in the boot partition alone isn't enough. I also needed to add instructions somewhere to run the code before boot. That somewhere is the bootloader configuration, which in this case is GRUB. I added the following lines to a file created for users to add custom configurations, /etc/grub.d/40_custom:

$ cat <<EOF | sudo tee -a /etc/grub.d/40_custom
search --no-floppy --set=root --file /EFI/custom/apple_set_os.efi
chainloader /EFI/custom/apple_set_os.efi
boot
EOF
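
One thing to watch: appending like this adds the lines again every time the command is re-run. A small sketch of an append-only-if-missing variant follows. The function name is hypothetical, and it has to run as root to touch the real file.

```shell
# Append the apple_set_os.efi entry to a GRUB custom file only if it is absent.
# Possible usage (as root): add_set_os_entry /etc/grub.d/40_custom
add_set_os_entry() {
    file=$1
    if grep -qF 'chainloader /EFI/custom/apple_set_os.efi' "$file" 2>/dev/null; then
        echo "entry already present in $file"
        return 0
    fi
    cat >> "$file" <<'EOF'
search --no-floppy --set=root --file /EFI/custom/apple_set_os.efi
chainloader /EFI/custom/apple_set_os.efi
boot
EOF
    echo "entry added to $file"
}
```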

The GRUB menu display was disabled on my machine. To be able to debug any issues on boot, I made the following changes to /etc/default/grub:

# Comment the following line
# GRUB_TIMEOUT_STYLE=hidden

# Change the timeout value
GRUB_TIMEOUT=10

# Uncomment the following lines
GRUB_TERMINAL=console
GRUB_GFXMODE=640x480

Then I ran sudo update-grub to regenerate the GRUB configuration with these changes.
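
I made the /etc/default/grub edits by hand, but they could be applied with sed as well. This is a sketch only: the function name is made up, and it assumes the file still has Ubuntu's stock commented-out GRUB_TERMINAL and GRUB_GFXMODE lines. It keeps a .bak copy next to the file.

```shell
# Apply the GRUB menu changes described above to the given file.
# Possible usage (as root): show_grub_menu /etc/default/grub
show_grub_menu() {
    sed -i.bak \
        -e 's/^GRUB_TIMEOUT_STYLE=hidden/# &/' \
        -e 's/^GRUB_TIMEOUT=.*/GRUB_TIMEOUT=10/' \
        -e 's/^# *\(GRUB_TERMINAL=console\)/\1/' \
        -e 's/^# *\(GRUB_GFXMODE=640x480\)/\1/' \
        "$1"
}
```

The update-grub step is still needed afterwards.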

Step 4: Switch to using the iGPU on boot

This is done using a shell script called gpu-switch that writes the required values to an EFI variable to use the iGPU. The changes were applied on the next boot, so I rebooted the machine.

$ git clone https://github.com/0xbb/gpu-switch
$ cd gpu-switch
$ sudo ./gpu-switch -i
$ sudo reboot now

After rebooting, the iGPU now appears in the lspci output!

$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 630 (rev 04)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] (rev ff)

Disabling the dGPU

The dGPU continued to run and warm up the laptop despite the iGPU being detected, so I disabled it with the following commands:

$ echo OFF | sudo tee /sys/kernel/debug/vgaswitcheroo/switch
$ sudo modprobe -r amdgpu
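
Reading the same switch file back is one way to confirm whether the dGPU is powered off (this has to happen while amdgpu is still loaded, since the file goes away with the driver). The sketch below uses a hypothetical helper and assumes the file has colon-separated lines such as "1:DIS: :Off:0000:01:00.0", which is my understanding of the vgaswitcheroo format rather than something I captured on this machine.

```shell
# Print the power state of the discrete GPU (e.g. Pwr or Off) from
# vgaswitcheroo switch contents read on stdin.
dgpu_power_state() {
    # Fields: id, client (IGD or DIS), active marker, power state, PCI address...
    awk -F: '$2 == "DIS" { print $4; exit }'
}

# Possible usage:
#   sudo cat /sys/kernel/debug/vgaswitcheroo/switch | dgpu_power_state
```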

And slowly, my laptop started to cool down. I checked the output of sensors after a while, and the temperatures were MUCH lower than with the dGPU enabled:

coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +42.0°C
...

The powertop output also reflected this:

The battery reports a discharge rate of 7.60 W
The energy consumed was 151 J
The estimated remaining time is 8 hours, 35 minutes

Summary: 62.1 wakeups/second,  0.0 GPU ops/seconds, 0.0 VFS ops/sec and 0.7% CPU use

            Usage       Events/s    Category       Description
        100.0%                      Device         Audio codec hwC1D0: ATI
        491.8 µs/s      27.3        Timer          tick_sched_timer
...

The battery discharge rate and energy consumption values were lower, the estimated battery life was longer, and amdgpu no longer appeared at the top of the list!
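
To put a number on the improvement, the two discharge rates can be pulled out of the powertop text and compared. A small sketch with made-up function names, written to cope with both line formats shown in this post:

```shell
# Extract the wattage from a powertop "discharge rate" line on stdin.
discharge_watts() {
    sed -n 's/.*discharge rate of:\{0,1\} *\([0-9.]*\) *W.*/\1/p'
}

# Print the percentage drop from the first value to the second.
percent_drop() {
    awk -v b="$1" -v a="$2" 'BEGIN { printf "%.1f\n", (b - a) / b * 100 }'
}

# Possible usage:
#   percent_drop 14.5 7.60
```

For the readings above (14.5 W before, 7.60 W after) this prints 47.6, so the discharge rate roughly halved.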

Lastly, I created a systemd service to disable the dGPU on boot. Thanks to this, my machine remains cool throughout:

# disable-dgpu.service
[Unit]
Description=Disable discrete GPU
Before=display-manager.service

[Service]
Type=oneshot
ExecStart=/usr/sbin/modprobe amdgpu
ExecStart=/bin/sh -c 'echo OFF > /sys/kernel/debug/vgaswitcheroo/switch'
ExecStart=/usr/sbin/modprobe -r amdgpu
RemainAfterExit=yes
TimeoutSec=0

[Install]
WantedBy=multi-user.target

I remember being scared when I noticed these issues for the first time. I'd been used to things "just working" on macOS and Windows, and this was the opposite of that. Going from a feeling of fear to slowly gaining the courage to fix stuff has felt great. I think I'm less scared now.