Windows Embedded Compact is NOT an RTOS!

August 30, 2018, 9:29 am

Introduction

There’s a bad bug in the Windows Embedded Compact scheduler code on WEC7 and WEC2013 (and possibly earlier versions as well) that causes the OS to become non-deterministic. This means WEC7 and WEC2013 can NOT be called a Real-Time Operating System as long as this bug is present.

We have reported this bug to Microsoft, together with the fix, but unfortunately Microsoft has responded with:

“This would impact the kernel and the risks to taking this on far outweigh the good that would come of it so in the interest of not destabilizing the product we have opted to forgo fixing this issue at this time.”

The fix we provided has been thoroughly tested by us and has shown the kernel works absolutely fine with it in place. The fix is inside PRIVATE code so it is really Microsoft who would need to update this code and provide updated binaries, if only to prevent a maintenance nightmare.

The bug

The CE kernel keeps a table containing all handles. Access to this kernel table is controlled by a locking mechanism. If a thread requests a new handle, the kernel needs to alter the handle table (to add the handle). In this case, the thread requesting the handle becomes a “writer” of the handle table.

If a thread is merely requesting an existing handle from the handle table without any need to alter the handle table, the thread requesting the existing handle becomes a “reader”.

“Writers” to the handle table acquire an exclusive lock whereas “readers” just increase the lock count.

If a “reader” tries to increase the lock count but the handle table is already owned and exclusively locked by a “writer”, the “reader” blocks until the “writer” releases the exclusive lock. In this case the kernel should apply priority inheritance: it temporarily boosts the lower priority “writer” thread to the priority of the waiting “reader” thread, so that the “writer” gets scheduled and releases the exclusive lock as soon as possible, allowing the higher priority “reader” thread to continue without delay.

Due to a bug in the CE kernel handle table locking code, this priority boost sometimes does not happen, with the result that high priority threads have to wait until many lower priority threads have run their thread quantum.

This absolutely and completely breaks the real-time scheduling mechanism of Windows Embedded Compact.

The code acquiring the handle table lock requires two steps:

Acquire the lock
Set the owner of the lock

The problem is that the Microsoft kernel code does not make this an atomic operation, meaning that a low priority “writer” thread can be pre-empted by a high priority “reader” thread in between steps 1 and 2. The result of that pre-emption is that the handle table is locked, but the scheduler has no record of which thread owns the lock, and thus cannot apply priority inheritance. The higher priority “reader” thread now has to wait for the lower priority “writer” thread to be scheduled at its original low priority. If any other threads are active in the system at a priority equal to or higher than that of the “writer” thread, the high priority “reader” thread will have to wait for all of these lower priority threads to run before it is allowed to continue.

It is important to note that the handle table contains ALL handles and the lock is a table-wide lock, meaning that ANY handle access can cause this situation. One thread can be using a file system handle, while another thread uses a completely unrelated device driver handle. There is no need for both threads to try to access the same handle.

The bug commonly shows itself in real-world scenarios, but so far hasn’t been diagnosed properly. The reason is that the bug is caused by a locking mechanism that is completely hidden from the user, which makes the root cause extremely difficult to find. The application code looks completely correct, and connecting the issue to another, apparently unrelated, active thread in the system is extremely difficult.

The proof

We have created example code that quickly and reliably shows that this bug occurs on all systems running the WEC7 or WEC2013 OS. We have not tried CE6 yet, but it is very likely this bug exists in that version of the OS as well, and likely also in earlier versions of the OS.

The example code consists of a very simple stream interface device driver and a test application that creates:

One low priority “writer” thread that gets, uses and releases a handle to the 1st instance of the device driver in a tight loop without yielding (so no sleep). This thread runs at priority 255 (idle).
One low priority “busy” thread per core, simply executing a tight loop for 100 ms without yielding (so no sleep and no use of any handle). These threads run at priority 255 (so the “writer” thread gets scheduled in round-robin fashion together with these “busy” threads).
One high priority “reader” thread that uses a handle to the 2nd instance of the device driver in a loop, sleeping for 1 ms before using the handle again. This thread runs at priority 0 (hard real-time).

The “reader” and “writer” threads call a function inside the device driver instance that stalls (runs a tight loop without yielding) for 100 us.

The busy threads are there to make sure all cores are busy all the time and the low priority “writer” thread gets scheduled in round-robin fashion together with these busy threads.

The high priority “reader” thread will pre-empt all of these low priority threads every 1 ms.

The high priority thread tracks how long the DeviceIoControl call to the device driver takes, and breaks out if it detects the bug situation. Below is the output of the test run on a Nitrogen6X iMX6 Quad with only a single core active. We only activated a single core to show that many other threads can be scheduled in succession before our high priority thread can be scheduled again. This is explained in more detail later. It is important to note that the bug also occurs when enabling all 4 cores (and we have a busy thread on each core).

    > rttapp
    Handle table lock mechanism bug test started at 12:06:10 on 2018/08/15
    Low priority writer thread 1 started on processor 1
    Low priority busy thread 1 started on processor 1
    High priority reader thread 0 stall of 100 us took maximum 120 us.
    Low priority busy thread 2 started on processor 1
    High priority reader thread 0 stall of 100 us took maximum 149 us.
    Low priority busy thread 3 started on processor 1
    High priority reader thread 0 stall of 100 us took maximum 241 us.
    Low priority busy thread 4 started on processor 1
    High priority reader thread 0 stall of 100 us took maximum 242 us.
    High priority reader thread 0 stall of 100 us took maximum 252 us.
    High priority reader thread 0 stall of 100 us took maximum 253 us.
    High priority reader thread 0 stall of 100 us took maximum 254 us.
    High priority reader thread 0 stall of 100 us took maximum 306 us.
    High priority reader thread 0 stall of 100 us took maximum 399156 us.
    Could not copy OSCapture.clg to \Release, error 3
    ERROR: High priority reader thread 0 stall of 100 us took 399156 us!
    Handle table lock mechanism bug test ended at 12:07:18 on 2018/08/15
    >

It’s only a matter of time before the high priority “reader” thread pre-empts the low priority “writer” thread in between it acquiring the lock and setting itself as the owner of the lock. As you can see in the above example it took a little over a minute to show the bug, and the result is that our priority 0 real-time thread had to wait almost 400 ms before being scheduled again!

If the test application code detects the bug, it calls “OSCapture” with parameter “-c” so that OSCapture writes the CELog RAM buffer to a file in the root called “OSCapture.clg”. It will then try to copy this file to the \Release folder, which is only available in a KITL enabled kernel. We ran the above test on a Shipbuild kernel (so without KITL enabled), hence the copy to \Release failed. We used FTP to copy the OSCapture.clg file to our Shipbuild’s _FLATRELEASEDIR and opened the file from there with Kernel Tracker:

The above Kernel Tracker screenshot shows a “good” situation. As you can see, the low priority “writer” thread is running when it gets pre-empted by our high priority “reader” thread. In the good situation, the high priority “reader” thread tries to get the handle table lock, but the table is locked by the low priority “writer” thread and the owner thread is properly set. The kernel now boosts the low priority “writer” thread to the priority of the high priority “reader” thread (priority inheritance). The “writer” thread now runs at priority 0 and thus gets scheduled immediately, so it can finish its work asap and release the handle table lock. As soon as it has released the handle table lock, the “writer” thread drops back to its original priority, and our high priority “reader” gets the handle table lock and is scheduled immediately. It now continues execution; real-time deterministic behaviour is guaranteed.

Theoretical total time it would take for the high priority “reader” thread to finish is:
(Execution time of high priority thread) + (Remaining execution time of low priority thread) + some overhead.

In a “bad” situation, the high priority “reader” thread preempts the low priority “writer” thread in between it getting the handle lock and setting itself as the owner of the lock:

The above Kernel Tracker screenshot shows a “bad” situation. As you can see, the low priority “writer” thread is running when it gets pre-empted by our high priority “reader” thread. In the bad situation, the pre-emption of the low priority “writer” thread happens in between it locking the table (step 1) and setting itself as the lock owner (step 2). Now the high priority “reader” thread wants to lock the handle table, but the table is locked by the low priority “writer” thread without the owner thread properly set. The kernel has no idea which thread owns the lock and thus cannot boost the priority of the “writer” thread. The scheduler simply schedules the next thread that is ready to run. That thread runs for an entire thread quantum, and so does every other thread that is ready to run after it. In our case there are 4 busy threads that all run an entire thread quantum before our low priority “writer” thread gets scheduled again:

As soon as the low priority “writer” thread gets scheduled again, it resumes at step 2 and sets itself as owner of the handle table lock, so now the scheduler knows which thread to boost. The “writer” thread is immediately boosted to the priority of the waiting “reader” thread, so it can finish its work asap and release the lock. When that happens, the “writer” drops back to its original priority and our high priority “reader” thread can finally run, but only after having had to wait for all these other low priority threads to run through their thread quantum.

As you can see, deterministic behaviour is completely broken, and thus we cannot say Windows Embedded Compact is a real-time operating system with this bug in place.

Luckily, the kernel code containing the bug is available when you install the shared source. It is possible to clone, modify and build this kernel code so that locking the table and setting the owner becomes an atomic operation. With this fix in place, real-time deterministic behaviour is restored and Windows Embedded Compact can be labelled as an RTOS again.

Note: the GuruCE iMX6 BSP will include this fix in our upcoming release. Current customers can request this fix right away.

We are still hopeful Microsoft will change its opinion and release an update for WEC7 and WEC2013 to fix this issue. If that happens, we will update this article to reflect that.

The example code showing the kernel handle table locking bug and a pdf version of this article are available here.

Windows Embedded Compact / Windows CE Documentation

November 13, 2018, 6:44 pm

In their infinite wisdom, some team at Microsoft decided to move all the Windows Embedded Compact / Windows CE documentation to docs.microsoft.com. That's fine, but unfortunately they also decided this content is not allowed to be indexed by search engines anymore. The result is that searching for an API in Google or Bing no longer returns a link to the correct API documentation page. We could live with that if the site's own search worked, but unfortunately that functionality is broken now too. This makes finding Windows CE API documentation extremely difficult...

To make your life a bit easier, here are some links that should help you find what you are looking for:

All CE documentation can be found here: https://docs.microsoft.com/en-us/previous-versions/windows/embedded/ee504814(v=winembedded.70)

Try to search for "CeSetThreadQuantum" using the search box on the top right to see that this functionality is broken now.

Without the search function working we have to manually click through the tree structure on the left to find what we need. Here are some handy links:

Kernel reference: https://docs.microsoft.com/en-us/previous-versions/windows/embedded/ee482973(v%3dwinembedded.80)

Kernel APIs: https://docs.microsoft.com/en-us/previous-versions/windows/embedded/ee482951(v%3dwinembedded.80)

Unfortunately, when you click on any API link on the above page, you will be redirected to the old Windows CE 6.0 documentation (and not to the expected WEC2013 documentation for the function you clicked).

To remedy this, manually change "winembedded.60" to "winembedded.80".

For example, clicking on the AllocPhysMem link will direct you here. Changing "60" to "80" will direct you to the expected page.

Most other API documentation can be found here: https://docs.microsoft.com/en-us/previous-versions/windows/embedded/ee488372(v%3dwinembedded.80)

Use the tree on the left to find your function.

To find documentation for Windows CE 6.0, change "80" in the URLs above to "60". To find docs for Windows Embedded Compact 7, change "80" to "70".

For some reason parts of the CE 5.0, CE.NET 4.2 and CE 3.0 documentation pages are still on MSDN and haven't been moved to docs.microsoft.com. Probably somebody forgot to move these... The result is that searching for just the API in Google or Bing returns links to the CE 3.0 documentation. Surely not the intended result by the team that made the decision to move all documentation to a new place on the Microsoft servers...

We've of course reported these issues to Microsoft. We'll update this blog post if the situation changes...

TLS 1.2 support and fix for RTOS bug released for WEC2013

November 28, 2018, 7:52 pm

Microsoft has published the Wave 4 October 2018 update for WEC2013. This update adds support for TLS 1.2 in WEC2013 (WEC7 already had support added for TLS 1.2 in an earlier update) and fixes the kernel handle table locking mechanism bug (as described in this blog post):

Kernel Handle Table Locking Mechanism Bugfix

181031_KB4466833 - Implemented Kernel Fast-Lock sync optimization for Windows Embedded Compact 2013.

Note that you have to set a FIXUPVAR to enable the fix. Add the following line to your config.bib file (MEMORY section) to include the fix for WEC2013:

kernel.dll:dwSyncFastLock 0 0x0001 FIXUPVAR

Unfortunately, the fix for this bug hasn't been released for WEC7. We're hoping Microsoft will publish an update containing the fix for WEC7 in the near future (the GuruCE i.MX6 BSP of course contains the fix for WEC7 as well).

Transport Layer Security

181031_KB4467918 - Transport Layer Security (TLS) 1.1 and TLS 1.2 support added to Windows Embedded Compact 2013.

TLS 1.1

The following subkey controls the use of TLS 1.1:

[HKEY_LOCAL_MACHINE\Comm\SecurityProviders\SCHANNEL\Protocols\TLS 1.1]

To disable the TLS 1.1 protocol, you must create the Enabled DWORD entry in the appropriate subkey, and then change the DWORD value to 0. To re-enable the protocol, change the DWORD value to 1. By default, this entry does not exist in the registry.

Note: To enable and negotiate TLS 1.1, you must create the DisabledByDefault DWORD entry in the appropriate subkey (Client, Server), and then change the DWORD value to 0.

TLS 1.2

The following subkey controls the use of TLS 1.2:

[HKEY_LOCAL_MACHINE\Comm\SecurityProviders\SCHANNEL\Protocols\TLS 1.2]

To disable the TLS 1.2 protocol, you must create the Enabled DWORD entry in the appropriate subkey, and then change the DWORD value to 0. To re-enable the protocol, change the DWORD value to 1. By default, this entry does not exist in the registry.

Note: To enable and negotiate TLS 1.2, you must create the DisabledByDefault DWORD entry in the appropriate subkey (Client, Server), and then change the DWORD value to 0.

Warning: The DisabledByDefault value in the registry keys under the Protocols key does not take precedence over the grbitEnabledProtocols value that is defined in the SCHANNEL_CRED structure that contains the data for an Schannel credential.

Note: Per the Request for Comments (RFC), the design implementation does not allow SSL2 and TLS 1.2 to be enabled at the same time.

Download locations

If you have access to DPC (device partner center) with the right "Embedded" permissions then you can download the Wave 4 update for WEC2013 here: https://devicepartner.microsoft.com/en-us/assets/detail/x21-98917-img.

If you don't have access to DPC, get the Wave 4 update for WEC2013 here: https://www.microsoft.com/en-us/download/details.aspx?id=42027

CE Platform Abstraction Layer (CEPAL) for Windows 10 IoT

May 9, 2019, 10:21 pm

Since Microsoft Windows Embedded Compact is now officially out of its support life-cycle, many of our customers are looking to see what they can replace CE with. The choice is often between Linux/Android and Windows 10 IoT.

The main problem when moving to a new OS is the, often massive, code-base that needs to be ported to the new OS. In the case of Linux/Android this is more of a complete rewrite than a port and this was no different for Windows 10 IoT Core, until Microsoft announced the CE Platform Abstraction Layer (CEPAL) for Windows 10 IoT:

Microsoft has provided platforms and operating systems for embedded devices for decades. As new offerings such as Windows 10 IoT have become available, our customers and partners are increasingly interested in the advanced security, platform, and cloud connectivity features that these OS provide. Customers moving from most earlier editions of Windows, like Windows XP and Windows 7, can do so with little effort because of binary compatible applications. Other operating systems, like Windows CE, require device builders to modify source code. Porting applications like this can be challenging.

To help these customers move to Windows 10 IoT and harness the full power of the intelligent edge including artificial intelligence and machine learning, Microsoft is developing technology that will allow most customers to run their existing, unmodified Windows CE applications on Windows 10 IoT while they continue to invest in updating their applications. You can learn more about how this technology works in the recent IoT Show episode Modernizing Windows CE Devices.

Source: https://microsoft-ce-pal.com

Microsoft is gauging how much interest there is for this feature, and based on that interest will assign resources and even add more functionality to Windows 10 IoT Core to make it even easier for CE applications to run on Windows 10 IoT Core.

So, if you are interested in moving your CE application to Windows 10 IoT Core, please let Microsoft know. The more people that fill out the form, the easier it will get to move to Windows 10 IoT Core.

Now that we can run CE application binaries on Windows 10 IoT Core, without the need to port or even recompile code, choosing between Linux/Android or Windows 10 IoT Core as your next OS becomes a no-brainer.

Of course the experts of GuruCE are ready to help you move to Windows 10 IoT Core; contact us for more information.

Watch the Channel9 video:

GuruCE iMX6 BSP release 2391

May 23, 2019, 10:12 pm

After many weeks of deep testing, GuruCE iMX6 BSP release 2391 is ready! This release adds some great new features, like High Assurance Boot and 100% flicker-free transition from bootloader splash to the CE desktop or your application, new drivers for displays, touch controllers, and RTCs, and many code improvements and fixes.

The release notes contain the full list of changes.

A picture is worth a million words, so imagine what a video can do...

For this release we have created a video showing how easy it is to work with our BSP and just how much functionality is included out-of-the-box:

Video table of contents

00:18 - SetWINCERoot
00:52 - BSP installation
01:12 - BSP structure
02:10 - OS Design Build Configurations
03:30 - BSP Catalog
06:30 - Change board and make catalog selections
07:58 - Board specific header file
09:28 - Reducing size of kernel image
11:27 - Processor specific catalog items
11:48 - Configuring hive-based registry and TexFAT
12:05 - Enabling & configuring High Assurance Boot
12:20 - Generating HAB certificates and keys
14:15 - Building the OS Design to generate HAB signed bootloader and kernel images
14:32 - Building a bootloader for the NXP Manufacturing Tool
15:00 - Configuring the NXP Manufacturing Tool
16:25 - Flashing the bootloader to SPI NOR Flash using Visual Studio
16:57 - Stopping Windows Mobile Device Connectivity services to allow download and debug over USB Serial
17:30 - Using CEWriter to flash bootloader, kernel and splash images to an SD card
18:58 - Configuring the bootloader
19:13 ---- Setting up the primary display
19:23 ---- Setting up the secondary display
19:29 ---- Setting up mirroring of the primary display on the secondary display
19:35 ---- Configuring Ethernet
19:48 - Checking HAB and burning fuses
21:31 - Booting to the CE desktop on a Nitrogen6X board
22:08 - Pocket CMD over UART (CLI, Command Line Interface)
22:15 - Included utilities
23:18 - CE Remote Desktop
23:52 - Switching between WEC7 and WEC2013
24:28 - Tiny OS Design (small headless configuration)
24:45 - Production Quality TEMPLATE driver, SDK and application code
25:50 - Configure the bootloader for download and KITL debug over USB Serial
26:56 - Setting breakpoints in driver source
27:12 - Preparing Visual Studio for driver debug and development
27:48 - Inspecting variable at runtime
28:18 - Unloading, modifying, rebuilding and loading drivers at runtime, without having to rebuild and download the entire kernel image

Our promise

We will keep improving our iMX6 BSP, adding new features and we will be supporting our customers for many years to come, at the very least until the end of Microsoft's extended support end date of 10 October 2023.

Even though the GuruCE i.MX6 BSP is already the best performing, 100% OAL stable and most feature-rich i.MX6 BSP on the market today, there are always things to improve or fix and new features to implement.

As always; if you want us to add some particular functionality, need any customization, driver or application development: contact us and we'll make it happen.

Don't forget to check our Testimonials page to see what some of our customers have to say about the GuruCE i.MX6 BSP.

Don't believe the hype? Try it yourself!

We've got free downloadable evaluation kernels for the Element14 RIoTboard, the Boundary Devices SABRE-Lite, Nitrogen6X and Nitrogen6_VM, the Device Solutions Opal6 (all variants), the Digi ConnectCore6, the NXP SDP (DualLite & Quad), the SDB-QP (QuadPlus), the NXP MCIMX6ULL EVK (ULL), the Toradex Colibri and the Variscite VAR-SOM_MX6 (Dual/Quad) Starter Kit, and more.

GuruCE website: https://guruce.com
iMX6 landing page: https://guruce.com/imx6
Latest iMX6 BSP release: https://guruce.com/imx6/latest

WEC7 December 2019 update woes...

February 6, 2020, 12:24 pm

Somebody at the Microsoft update team uploaded the WEC7 December 2019 update in the wrong format. If you try to update using WEDU it will fail, and if you download the file from the Device Partner Center site you will see it is in the .zip file format (instead of the normal .img ISO format). Note that you need to register and request "Indirect Embedded / IoT OEM" permissions to see all Embedded updates on the DPC site.

If you unpack the X22-25842.zip file you get these files:

X22-25842.zip
├───Layer0
│       CONTROL.DAT
│       DDPID
│       IMAGE.DAT
│       MANIFEST.xml
│
└───Layer1
        CONTROL.DAT
        DDPID
        IMAGE.DAT
        MANIFEST.xml

That doesn't look like a normal update at all, but in fact it is, just a little bit in disguise... It's in DDP (Disc Description Protocol) format, a format used to describe a replication master. It looks like Microsoft simply forgot to convert this into a normal .img ISO file.

The .img ISO file can be recreated by simply concatenating the two IMAGE.DAT files from the Layer0 and Layer1 folder. On a command prompt, type the following command:

copy /b Layer0\IMAGE.DAT+Layer1\IMAGE.DAT X22-25842.img

You can now mount the .img file as you normally do (we use the excellent Elaborate Bytes Virtual CloneDrive freeware to easily mount .img files).

Now you can install the WEC7 December 2019 update. It should complete without any errors. If the concatenation didn't go well (or if you just mount the IMAGE.DAT file from the Layer0 folder) you will get errors like this:

Note that the WEC2013 December 2019 update is in the right (.img) format and can be installed as normal.

The issues have been reported to Microsoft, so hopefully it gets fixed soon.

17 consecutive years...

August 26, 2020, 4:59 pm

MVP award

WEC7 September 2020 update woes...

February 26, 2021, 2:15 pm

Here we go again... Update 77 (WEC7 September 2020) is again in the DDP (Disc Description Protocol) format.

Follow the instructions from this blog post to get a working .img file again...

Apparently this is a common problem for many Microsoft updates for a myriad of products, not just the updates for CE... MS/DPC has been notified again.

Windows Embedded Compact / Windows CE Documentation problems - Part II

October 12, 2021, 4:55 pm

As you have undoubtedly noticed, the Windows Embedded Compact documentation search function is again completely broken.

For instance, if you search for "OpenStore" or even "OpenStore Compact 2013", you'd expect the OpenStore API would be the first search result. Unfortunately, that is not the case. This makes searching for CE API documentation extremely difficult.

We have asked, pleaded and begged Microsoft (for more than 6 months) to fix this, but unfortunately without result. We have now opened a support case to see if that would finally move Microsoft to fix the documentation for Windows Embedded Compact.

We still don't have a fix, but there is a workaround: place your "search-term + compact 2013" inside quotes in the search box, like this: "openstore compact 2013". If you need Compact 7 documentation, of course replace "compact 2013" with "compact 7".

That way, the search results will show the expected documentation link high up in the ranking.

Bruce Eitman's Blog

November 8, 2021, 6:21 pm

In February 2021, Bruce Eitman's blog went down when GeeksWithBlogs.net (gwbs) shut down. It appears nobody was notified that gwbs was going down, but luckily Bruce kept backups of all his blog posts, and all the articles have been collected in a PDF file, BlogAsBook.pdf, on Bruce's OneDrive account.

Too good a resource to be lost!

Error: failed PB timebomb check

December 2, 2021, 9:27 pm

Today, 3 December 2021, Windows Embedded Compact 2013 decided enough is enough!

At least romimage.exe did...

According to Microsoft there should not be any time bomb checks in WEC2013. The last version to have time-bombs was WEC7, but today we find out this is not true. Note that this time-bomb happens for all WEC2013 installations, including the fully licensed ones.

New Zealand was the first to notice (hooray for me!), followed by Australia and by now I'm sure you on the other side of the world have found this blog post looking for a solution...

Well, here it is:

A bit of old-school hacking and cracking and here is the patched romimage.exe. A simple change of 'jg' to 'jmp' and we circumvent the time-bomb check.

Replace C:\WINCE800\public\common\oak\bin\i386\romimage.exe with the one in the 7z package below.

Enjoy!

PS. 7-zip password is 'guruce'.

PS2. I have opened a support case with Microsoft to get this issue resolved via an official update but, since this is 100% blocking any work you do with WEC2013, I think/hope MS will not sue me for cracking their time-bomb while we wait for the official fix...

Attachment	Size
romimage.7z	87.74 KB

iMX6 ENET RCR PADEN silicon bug

May 19, 2022, 8:40 pm

This is just a quick blog post to save anybody from spending days finding out why their Gbps connection on an iMX6 (or any iMX with a similar or identical ENET/FEC module) stops working when the network is stressed.

IEEE 802.3 (Ethernet) requires all frames to be >= 64 bytes (or even >= 512 bytes on a Gbps network). If a small frame is received, the MAC can remove the padding bytes that make the frame up to 64 bytes. However, if a small frame has 0 data bytes, this can result in an RxBD with the L and BDU bits NOT set (even though the attached buffer is greater than or equal to the maximum frame size).

If the bug happens, there is NO indication of an error: nothing in EIR to indicate a problem and nothing in the Enhanced Buffer Descriptors either. Strangely enough, the RxBD's data length is always set to whatever you have set in MRBR[R_BUF_SIZE], and L and BDU are both 0. The ENET module cannot recover from this situation. It really seems the uDMA has crashed and is not working properly anymore. Only a full reset of the ENET module allows it to work properly again.

Note that we ONLY see this issue when stress testing using Gigabit link speeds on active networks (where broadcast messages are flowing).

What we think happens is that small packets (usually broadcast packets) with a data size of 0 cause the ENET uDMA to crash if the PADEN bit (12) is set in RCR. This bit enables the MAC to remove padding bytes.

We know small packets are the culprit because stressing the network on an isolated network (without small broadcast packets) works perfectly fine. Setting the BC_REJ bit (4) on a non-isolated active network also works without problems (setting this bit causes the ENET MAC to reject all broadcast & multicast packets). Of course, rejecting all broadcast and multicast packets is not a solution...

Workaround: DO NOT set bit 12 (PADEN) in RCR.

Windows Embedded Compact is NOT an RTOS... Again!

August 1, 2022, 11:48 pm

We just received a report from one of our customers who saw their high priority thread on WEC2013 not being scheduled for a long time, even though that thread was set to the highest priority in the entire system.

They saw behaviour very similar to what we described in this blog post. We investigated by running our test code for this bug on a tiny headless kernel with as few drivers included as possible and, to our great surprise, it indeed showed the bug was back!

The customer was at Wave 5 update level. At GuruCE, we are always updated to the very latest (Wave 7 + March + June 2022) and we also saw the bug regression in WEC2013 (in WEC7 we fixed the issue ourselves at the time and that still worked). Microsoft fixed this bug in WEC2013 somewhere around November 2018 (see this blogpost), but apparently the bug regressed shortly thereafter through an update (Wave 5 or even earlier).

We have reported this issue to Microsoft again, and in the meantime we have fixed this issue locally in our BSPs, now for both WEC7 and WEC2013.

Customers of GuruCE can request this fix immediately through our normal support channels.

Anybody else that wants the fix for both WEC7 and WEC2013 can of course also contact us for support on this and other WEC related issues.

The 'PBDbgPkg Package' did not load correctly

November 7, 2023, 1:51 pm

It's getting harder and harder to find answers when you encounter issues after installing Windows Embedded and its tools. The above error stumped me for a while, until I rediscovered that the Platform Builder Debug Package requires the Microsoft Visual C++ 2008 Redistributable package to work properly.

The link (including the later release SP1) can be found here.

Install that and the error should disappear.

Congatec Product Change Notification

April 15, 2024, 1:44 pm

Congatec just released another PCN, telling customers they are updating their conga-QMX6 modules to the new hardware revision E.2. The changes are a different RTC chip (possibly, no details are given), different eMMC and a different Ethernet PHY.

The PCN states:

"Windows Embedded Compact (WEC7/WEC2013) and Android BSPs are not available for hardware revision E.2 and are therefore not supported."

But GuruCE is happy to announce that we can, of course, still support the new revision Congatec QMX6 for WEC7 and WEC2013 in our iMX6 BSP, including these hardware changes.

↧