Continuing from the previous post, there were two other issues I faced after installing Ubuntu:
- The battery drained super quickly despite light usage
- The laptop would get warm and remain warm even when the system was idle
I didn't get any warnings about battery life or heat issues from the family member I got the laptop from, nor was I using the laptop heavily enough for it to get this warm or draw this much power.
Solving this wasn't as straightforward as fixing the WiFi, because the searches didn't lead to any single solution. So I had to debug my way through this somehow.
Finding a starting point
I started by finding ways to see the temperature of my laptop, for which I found a package called lm-sensors. Before checking the temps, I ran sensors-detect and selected all of the default options.
$ sudo apt install lm-sensors
$ sudo sensors-detect
When I ran sensors for the first time, there was too much output and it barely made sense. I spent some time deciphering the output, and then came the second problem - I didn't know what the ideal temperatures were, so I couldn't tell which readings were high.
So I tried a different approach. I decided to capture the sensors output twice - once after boot and once 30 minutes later - and compare the two. During those 30 minutes, I kept the system idle or used it minimally.
This approach worked, as I saw a difference in coretemp-isa-0000, which shows the temperatures of the CPU cores:
Package id 0 refers to the temperature of the CPU as a whole, and there was a 10-degree increase in about 30 minutes, with little to no activity in that time.
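A minimal sketch of how the two-snapshot comparison could be scripted. The function names package_temp and temp_delta and the snapshot file names are made up for illustration, and the sketch assumes the sensors output contains a line starting with "Package id 0:" whose fourth field is the temperature, as on this machine.

```shell
# Print the "Package id 0" temperature (number only) from sensors output on stdin.
package_temp() {
    awk '/^Package id 0:/ { gsub(/[^0-9.]/, "", $4); print $4; exit }'
}

# Print how many degrees the package temperature rose between two snapshot files.
temp_delta() {
    before=$(package_temp < "$1")
    after=$(package_temp < "$2")
    awk -v b="$before" -v a="$after" 'BEGIN { printf "%.1f\n", a - b }'
}

# Hypothetical usage:
#   sensors > after-boot.txt
#   sleep 1800
#   sensors > after-30min.txt
#   temp_delta after-boot.txt after-30min.txt
```

Comparing a single number this way avoids having to know what a "normal" temperature is; only the change over an idle period matters.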
I was off to search again, and I landed on two possible causes:
- Some process is hogging CPU
- Bad power management of Linux on Macs
The first cause got eliminated pretty quickly, as htop didn't show any process with high CPU usage, and the overall CPU usage was also fairly low. Bad power management was a very common issue reported in online forums, and I knew my machine worked fine on macOS, so this seemed like a valid cause.
One of the tools I came across for better power management was powertop, which displays the energy usage of a system and offers default settings for better power management. I enabled the defaults using the --auto-tune flag after installing.
$ sudo apt install powertop
$ sudo powertop --auto-tune
When powertop is run without any flags, it runs in a similar fashion to top, displaying the energy usage and other statistics that update in real time.
The battery reports a discharge rate of: 14.5 W
The energy consumed was : 325 J
The estimated remaining time is 2 hours, 6 minutes
Summary: 123.1 wakeups/second, 0.0 GPU ops/seconds, 0.0 VFS ops/sec and 3.1% CPU use
Usage Events/s Category Description
675.2 µs/s 46.6 Timer tick_sched_timer
0.8 ms/s 21.0 Interrupt [79] amdgpu
...
Some things stood out here:
- The battery discharge rate seemed high
- As a result, the energy consumption was also high
- amdgpu was second highest in the energy usage list
The appearance of amdgpu seemed worth looking into further, and I saw that there was an option to disable it altogether. I wasn't planning on doing any heavy-duty work on this machine, so it seemed like a reasonable solution if it would help reduce temperatures.
I started following this tutorial, which first checks whether you have two GPUs on your system.
$ lspci | grep VGA
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] (rev ff)
The output showed only one GPU, but this machine has two - an integrated GPU (iGPU) and a discrete/dedicated GPU (dGPU).
OH WAIT, I found the root cause - the iGPU didn't get detected for whatever reason, and the dGPU was being used as the main GPU in its place. The dGPU uses a lot of power, which explains the high energy usage in powertop, the quick battery drain and the laptop getting warm!
Other people have also faced the same issue and have documented solutions for it, which I followed.
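The lspci check above could be wrapped in a small script, sketched below. The function names count_gpus and check_gpus are hypothetical, and the sketch assumes both GPUs are reported as "VGA compatible controller", which is how they show up on this machine; other hardware may list a GPU under a different class name.

```shell
# Count the VGA controllers in lspci output read on stdin.
count_gpus() {
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    grep -c 'VGA compatible controller' || true
}

# Report whether both GPUs are visible in lspci output read on stdin.
check_gpus() {
    n=$(count_gpus)
    if [ "$n" -ge 2 ]; then
        echo "both GPUs visible"
    else
        echo "only $n GPU visible; the iGPU may not have been detected"
    fi
}

# Hypothetical usage:
#   lspci | check_gpus
```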
Enabling the iGPU
The iGPU is not detected thanks to the way Apple's firmware works. If it recognizes that it's booting an OS other than macOS, it will power down some of the hardware, the iGPU being one of them. Thanks Apple!
The TL;DR solution to this is to make the firmware believe that it is booting macOS by running custom code before boot.
Step 1: Build the custom EFI code
The custom code is available in the apple_set_os.efi repository. All I had to do was build the file.
$ git clone https://github.com/0xbb/apple_set_os.efi
$ cd apple_set_os.efi
$ make
cc -I/usr/include/efi -I/usr/include/efi/x86_64 -DGNU_EFI_USE_MS_ABI -fPIC -fshort-wchar -ffreestanding -fno-stack-protector -maccumulate-outgoing-args -Wall -Dx86_64 -Werror -m64 -mno-red-zone -c -o apple_set_os.o apple_set_os.c
ld -T /usr/lib/elf_x86_64_efi.lds -Bsymbolic -shared -nostdlib -znocombreloc /usr/lib/crt0-efi-x86_64.o -o apple_set_os.so apple_set_os.o /usr/lib/gcc/x86_64-linux-gnu/11/libgcc.a \
/usr/lib/libgnuefi.a
objcopy -j .text -j .sdata -j .data -j .dynamic -j .dynsym -j .rel \
-j .rela -j .reloc -S --target=efi-app-x86_64 apple_set_os.so apple_set_os.efi
rm apple_set_os.o apple_set_os.so
Step 2: Move the code to the boot partition
Next, the code needs to be in a location that is accessible during boot, aka the boot partition (the EFI system partition, mounted at /boot/efi). I could have put the code directly in /boot/efi/EFI, but the instructions I was following put it in a sub-directory called custom instead.
$ sudo mkdir /boot/efi/EFI/custom
$ sudo cp apple_set_os.efi /boot/efi/EFI/custom
Step 3: Ask GRUB to run the code before boot
Placing the code in the boot partition alone isn't enough; I also needed to add instructions somewhere to run the code before boot. That somewhere is the bootloader configuration, which in this case is GRUB. I added the following lines to /etc/grub.d/40_custom, a file meant for users to add custom configurations:
$ cat <<EOF | sudo tee -a /etc/grub.d/40_custom
search --no-floppy --set=root --file /EFI/custom/apple_set_os.efi
chainloader /EFI/custom/apple_set_os.efi
boot
EOF
The GRUB menu display was disabled on my machine. To be able to debug any issues on boot, I made the following changes to /etc/default/grub:
# Comment out the following line
# GRUB_TIMEOUT_STYLE=hidden
# Change the timeout value
GRUB_TIMEOUT=10
# Uncomment the following lines
GRUB_TERMINAL=console
GRUB_GFXMODE=640x480
Then I ran sudo update-grub to regenerate the GRUB configuration with these changes.
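Since a mistake here only shows up at boot, a quick sanity check before rebooting might be worth it. The sketch below is an assumption-laden illustration: check_apple_set_os is a made-up name, and the default paths are simply the ones used in the steps above. Reading /boot/efi may require a root shell.

```shell
# Check that the EFI binary is in place and that the GRUB custom file chainloads it.
# $1: path to the EFI binary, $2: path to the GRUB custom file (both optional).
check_apple_set_os() {
    efi_file="${1:-/boot/efi/EFI/custom/apple_set_os.efi}"
    grub_custom="${2:-/etc/grub.d/40_custom}"

    # The binary must exist and be non-empty.
    if [ ! -s "$efi_file" ]; then
        echo "missing or empty: $efi_file"
        return 1
    fi

    # The custom file must contain a chainloader line pointing at the binary.
    if ! grep -q '^chainloader .*apple_set_os\.efi' "$grub_custom"; then
        echo "no chainloader entry in $grub_custom"
        return 1
    fi

    echo "ok"
}

# Hypothetical usage (from a root shell):
#   check_apple_set_os
```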
Step 4: Switch to using the iGPU on boot
This is done using a shell script called gpu-switch, which writes the required values to an EFI variable so that the iGPU is used. The change only takes effect on the next boot, so I rebooted the machine.
$ git clone https://github.com/0xbb/gpu-switch
$ cd gpu-switch
$ sudo ./gpu-switch -i
$ sudo reboot now
After rebooting, the iGPU now appears in the lspci output!
$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 630 (rev 04)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/555/555X/560/560X] (rev ff)
Disabling the dGPU
The dGPU continued to run and warm up the laptop despite the iGPU being detected, so I disabled it with the following commands:
$ echo OFF | sudo tee /sys/kernel/debug/vgaswitcheroo/switch
$ sudo modprobe -r amdgpu
And slowly, my laptop started to cool down. I checked the output of sensors after a while, and the temperatures were MUCH lower than with the dGPU enabled:
coretemp-isa-0000
Adapter: ISA adapter
Package id 0: +42.0°C
...
The powertop output also reflected this:
The battery reports a discharge rate of 7.60 W
The energy consumed was 151 J
The estimated remaining time is 8 hours, 35 minutes
Summary: 62.1 wakeups/second, 0.0 GPU ops/seconds, 0.0 VFS ops/sec and 0.7% CPU use
Usage Events/s Category Description
100.0% Device Audio codec hwC1D0: ATI
491.8 µs/s 27.3 Timer tick_sched_timer
...
The battery discharge rate and energy consumption values were lower, battery life became longer and amdgpu no longer appeared at the top of the list!
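The remaining-time estimate is essentially the energy left in the battery divided by the current draw, so it can be cross-checked without powertop. The sketch below is only an illustration: remaining_hours is a made-up name, and it assumes the battery is exposed as BAT0 with energy_now (in µWh) and power_now (in µW) attributes, which is not the case on every machine (some batteries expose charge_now and current_now instead).

```shell
# Estimate remaining battery time in hours.
# $1: remaining energy in µWh, $2: current power draw in µW.
remaining_hours() {
    awk -v e="$1" -v p="$2" 'BEGIN {
        if (p > 0) printf "%.1f\n", e / p
        else print "n/a"   # not discharging, or no reading available
    }'
}

# Hypothetical usage, assuming the battery shows up as BAT0:
#   bat=/sys/class/power_supply/BAT0
#   remaining_hours "$(cat "$bat/energy_now")" "$(cat "$bat/power_now")"
```

With the same charge left, halving the discharge rate roughly doubles the estimate, which is the effect that disabling the dGPU had here.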
Lastly, I created a systemd service to disable the dGPU on boot. Thanks to this, my machine remains cool throughout:
# disable-dgpu.service
[Unit]
Description=Disable discrete GPU
Before=display-manager.service
[Service]
Type=oneshot
ExecStart=/usr/sbin/modprobe amdgpu
ExecStart=/bin/sh -c 'echo OFF > /sys/kernel/debug/vgaswitcheroo/switch'
ExecStart=/usr/sbin/modprobe -r amdgpu
RemainAfterExit=yes
TimeoutSec=0
[Install]
WantedBy=multi-user.target
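For the unit to run on boot it has to be installed and enabled. The commands in the comments below are the usual way of doing that and are an assumption about the setup, as is the file location /etc/systemd/system. The module_loaded helper is a made-up name for a small check, run after a reboot, that the amdgpu module is really gone.

```shell
# Assumed installation steps (run once):
#   sudo cp disable-dgpu.service /etc/systemd/system/
#   sudo systemctl daemon-reload
#   sudo systemctl enable disable-dgpu.service

# Succeed if the named kernel module appears in lsmod-style output on stdin.
# The module name is the first column of each line.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Hypothetical usage after rebooting:
#   if lsmod | module_loaded amdgpu; then
#       echo "amdgpu is still loaded"
#   else
#       echo "amdgpu is unloaded"
#   fi
```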
I remember being scared when I noticed these issues for the first time. I'd been used to things "just working" on macOS and Windows, and this was the opposite of that. Going from a feeling of fear to slowly gaining the courage to fix stuff has felt great. I think I'm less scared now.