Huffman Coding Using Doubly Linked List

I am developing an implementation of Huffman Compression in x64 MASM. My main function in C syntax would be:

void* huffCompress(void* lpDataStream, unsigned long long qwLength);

I am using Microsoft's Fastcall convention where the paramters are passing to the called function using RCX, RDX, R8 & R9 (anything else is on the stack).

I have written code which generates an array of size 256*3. Each 3 byte consists of:

as STRUCT

    word wFreq

    byte bSymbol

as ENDS

The code then iterates over the lpDataStream twice.

Initialising bSymbol with the index in the array

Incrementing wFreq for each byte

In the code below RBX is a pointer to the base of the array, RSI is a pointer to the lpDataStream and RCX is qwLength. RAX needs to be 0 before entering the below code.

_populateArray:

    lodsb

    lea rdx, qword ptr[rbx+rax*04h]

    sub rdx, rax

    inc word ptr[rdx]

loop _populateArray

The function then iterates over the array and for each element which has a wFreq which is not zero it subtracts 13h (19 bytes) from the stack. These 19 bytes are then populated as per the following structure.

nl STRUCT

    qword pFLink

    qword pBLink

    dword dwFreq

    byte  bSymbol

nl ENDS

The first two qwords are forward and backward links to the next and previous elements in the linked list.

xor rax, rax

mov r10, rsp

mov rcx, rax                    ;rcx = pFLink

mov rdx, rax                    ;rdx = pBLink

_generateLinkedList:

    cmp word ptr[rbx+rax], 0000h

je _notInitialised

    sub rsp, 13h

    ;write and setup pBLink for next entry

    mov qword ptr[rsp+pBLink], rdx

    mov rdx, rsp

    mov r8w, word ptr[rbx+rax]

    mov r9b, byte ptr[rbx+rax+02h]

    mov word ptr[rsp+wFreq], r8w        ;write dFreq to linked list

    mov byte ptr[rsp+bSymbol], r9b      ;write bSymbol to linked list

    cmp rcx, 00h                        ;if pFLink is not initialised it means this is the first entry

je _firstEntry

    mov qword ptr[rsp+13h], rsp         ;when rcx != 00h it means that there is an entry before this one "before" on stack remember

_firstEntry:

    mov rcx, rsp

_notInitialised:

    add rax, 03h

    cmp rax, 300h

jna _generateLinkedList

The above code works as desired, I have added it for completeness. RBX is still pointing to the array.

The function then runs bubble sort on the resulting linked list, sorting in ascending order. This means the element with no pBLink (no backward link) has the lowest wFreq. It is not a circular doubly linked list.

The next step of implementing Huffman Compression is to create a node with a wFreq equal to the sum of the two lowest frequencies.

My plan for this was to do the following:

Add the two lowest wFreq values together

Make pFLink in the lowest element NULL

Make both pFLink and pBLink is the second lowest (pHead->pFLink) NULL

Make more stack space (19 bytes) and add a node to the end (this involves finding the element with a NULL pFLink and changing it to the new node. Additionally the new node has the wFreq of the two lowest added together.

This is where the problem is. I need to make the pFLink and pBLink of the new node point to the two lowest elements (The nodes which had their pointers NULLED in step 2 & 3. However if I do this then the new node cannot be connect to the linked list.

EDIT: I think the above option would work, I would have to rework the sorting algorithm so that when a new node is added and needs to be sorted it is inserted inbetween elements. The code would realise it has found a "new node" because the pBLink wouldn't be correct. It would then know the next element is directly "above" (19 bytes above the "new node"). I think this could be done during the new node creation as the new node doesn't need to be placed in the new space on the stack it can be swapped with the element which is at it's desired place. It's a dirty workaround, if there is a better way I would love to hear it.

I had the idea that when the sorting function finds a node where the pBLink doesn't point to the previous element it should just move up the next element by subtracting 13h from it's pointer. This didn't work because the new nodes (those created by adding the two lowest frequencies together) could be anywhere after sorting.

Is there any way to overcome this without adding two more qwords to the nl structure?

edited Jan 1 at 20:36

asked Jan 1 at 15:05

Will

1256

Why you need to make the pFLink and pBLink of the new node point to the two lowest elements? A node in a linked list doesn't even need a pBLink unless you want to efficiently move backwards. Are you sure a linked list is what you need? a node cannot be in between two elements and at the end of the list at the same time. Maybe you need two lists?

– Margaret Bloom
Jan 1 at 21:14

@MargaretBloom I am reusing pFLink and pBlink to point to leaves of a node in the Huffman Tree. The code converts a linked list to a Huffman Tree. I included a pBLink because I need two pointers for the tree anyway so I may aswell use the second as a pBLink. I may traverse the linked list backwards when checking where a new node should be placed due to the fact that it's more likely the new node will need to be somewhere at the top (due to having a higher frequency). A nodes memory address and position in the linked list are independent.

– Will
Jan 1 at 21:23

Are you optimizing for code-size at the expense of speed? If not, avoid the slow loop instruction unless you're only tuning for AMD CPUs. Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?. Also, lodsb is 3 uops, vs. 2 for movzx / inc rsi which avoids a false dependency on the old RAX. You can also do *3 as an index like lea edx, [rax+rax*2] / add [rbx + rdx], 1. (But an indexed add [mem], 1 will unlaminate into more uops than add [rdx],1 on Sandybridge-family.)

– Peter Cordes
Jan 1 at 22:20

@PeterCordes Neither at the moment. I will probably optimise for code size later. I never though of using "lea edx, [rax+rax*2]" that's really nice, thanks!

– Will
Jan 1 at 22:23

If you haven't seen that multiply-by-3 idiom before, there's probably a lot you can learn from looking at optimized compiler output. Write some of your functions in C (with function args and return values for inputs/outputs), then look at gcc and clang -O3 (pure speed including autovectorization) or -Os (code size and speed) output on godbolt.org. How to remove "noise" from GCC/clang assembly output?. If you really want to optimize for code-size, see Tips for golfing in x86/x64 machine code

– Peter Cordes
Jan 1 at 22:27

|
show 8 more comments

I am developing an implementation of Huffman Compression in x64 MASM. My main function in C syntax would be:

void* huffCompress(void* lpDataStream, unsigned long long qwLength);

I am using Microsoft's Fastcall convention where the paramters are passing to the called function using RCX, RDX, R8 & R9 (anything else is on the stack).

I have written code which generates an array of size 256*3. Each 3 byte consists of:

as STRUCT

    word wFreq

    byte bSymbol

as ENDS

The code then iterates over the lpDataStream twice.

Initialising bSymbol with the index in the array

Incrementing wFreq for each byte

In the code below RBX is a pointer to the base of the array, RSI is a pointer to the lpDataStream and RCX is qwLength. RAX needs to be 0 before entering the below code.

_populateArray:

    lodsb

    lea rdx, qword ptr[rbx+rax*04h]

    sub rdx, rax

    inc word ptr[rdx]

loop _populateArray

nl STRUCT

    qword pFLink

    qword pBLink

    dword dwFreq

    byte  bSymbol

nl ENDS

The first two qwords are forward and backward links to the next and previous elements in the linked list.

xor rax, rax

mov r10, rsp

mov rcx, rax                    ;rcx = pFLink

mov rdx, rax                    ;rdx = pBLink

_generateLinkedList:

    cmp word ptr[rbx+rax], 0000h

je _notInitialised

    sub rsp, 13h

    ;write and setup pBLink for next entry

    mov qword ptr[rsp+pBLink], rdx

    mov rdx, rsp

    mov r8w, word ptr[rbx+rax]

    mov r9b, byte ptr[rbx+rax+02h]

    mov word ptr[rsp+wFreq], r8w        ;write dFreq to linked list

    mov byte ptr[rsp+bSymbol], r9b      ;write bSymbol to linked list

    cmp rcx, 00h                        ;if pFLink is not initialised it means this is the first entry

je _firstEntry

    mov qword ptr[rsp+13h], rsp         ;when rcx != 00h it means that there is an entry before this one "before" on stack remember

_firstEntry:

    mov rcx, rsp

_notInitialised:

    add rax, 03h

    cmp rax, 300h

jna _generateLinkedList

The above code works as desired, I have added it for completeness. RBX is still pointing to the array.

The next step of implementing Huffman Compression is to create a node with a wFreq equal to the sum of the two lowest frequencies.

My plan for this was to do the following:

Add the two lowest wFreq values together

Make pFLink in the lowest element NULL

Make both pFLink and pBLink is the second lowest (pHead->pFLink) NULL

Make more stack space (19 bytes) and add a node to the end (this involves finding the element with a NULL pFLink and changing it to the new node. Additionally the new node has the wFreq of the two lowest added together.

Is there any way to overcome this without adding two more qwords to the nl structure?

edited Jan 1 at 20:36

asked Jan 1 at 15:05

Will

1256

Why you need to make the pFLink and pBLink of the new node point to the two lowest elements? A node in a linked list doesn't even need a pBLink unless you want to efficiently move backwards. Are you sure a linked list is what you need? a node cannot be in between two elements and at the end of the list at the same time. Maybe you need two lists?

– Margaret Bloom
Jan 1 at 21:14

@MargaretBloom I am reusing pFLink and pBlink to point to leaves of a node in the Huffman Tree. The code converts a linked list to a Huffman Tree. I included a pBLink because I need two pointers for the tree anyway so I may aswell use the second as a pBLink. I may traverse the linked list backwards when checking where a new node should be placed due to the fact that it's more likely the new node will need to be somewhere at the top (due to having a higher frequency). A nodes memory address and position in the linked list are independent.

– Will
Jan 1 at 21:23

Are you optimizing for code-size at the expense of speed? If not, avoid the slow loop instruction unless you're only tuning for AMD CPUs. Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?. Also, lodsb is 3 uops, vs. 2 for movzx / inc rsi which avoids a false dependency on the old RAX. You can also do *3 as an index like lea edx, [rax+rax*2] / add [rbx + rdx], 1. (But an indexed add [mem], 1 will unlaminate into more uops than add [rdx],1 on Sandybridge-family.)

– Peter Cordes
Jan 1 at 22:20

@PeterCordes Neither at the moment. I will probably optimise for code size later. I never though of using "lea edx, [rax+rax*2]" that's really nice, thanks!

– Will
Jan 1 at 22:23

If you haven't seen that multiply-by-3 idiom before, there's probably a lot you can learn from looking at optimized compiler output. Write some of your functions in C (with function args and return values for inputs/outputs), then look at gcc and clang -O3 (pure speed including autovectorization) or -Os (code size and speed) output on godbolt.org. How to remove "noise" from GCC/clang assembly output?. If you really want to optimize for code-size, see Tips for golfing in x86/x64 machine code

– Peter Cordes
Jan 1 at 22:27

|
show 8 more comments

I am developing an implementation of Huffman Compression in x64 MASM. My main function in C syntax would be:

void* huffCompress(void* lpDataStream, unsigned long long qwLength);

I am using Microsoft's Fastcall convention where the paramters are passing to the called function using RCX, RDX, R8 & R9 (anything else is on the stack).

I have written code which generates an array of size 256*3. Each 3 byte consists of:

as STRUCT

    word wFreq

    byte bSymbol

as ENDS

The code then iterates over the lpDataStream twice.

Initialising bSymbol with the index in the array

Incrementing wFreq for each byte

In the code below RBX is a pointer to the base of the array, RSI is a pointer to the lpDataStream and RCX is qwLength. RAX needs to be 0 before entering the below code.

_populateArray:

    lodsb

    lea rdx, qword ptr[rbx+rax*04h]

    sub rdx, rax

    inc word ptr[rdx]

loop _populateArray

nl STRUCT

    qword pFLink

    qword pBLink

    dword dwFreq

    byte  bSymbol

nl ENDS

The first two qwords are forward and backward links to the next and previous elements in the linked list.

xor rax, rax

mov r10, rsp

mov rcx, rax                    ;rcx = pFLink

mov rdx, rax                    ;rdx = pBLink

_generateLinkedList:

    cmp word ptr[rbx+rax], 0000h

je _notInitialised

    sub rsp, 13h

    ;write and setup pBLink for next entry

    mov qword ptr[rsp+pBLink], rdx

    mov rdx, rsp

    mov r8w, word ptr[rbx+rax]

    mov r9b, byte ptr[rbx+rax+02h]

    mov word ptr[rsp+wFreq], r8w        ;write dFreq to linked list

    mov byte ptr[rsp+bSymbol], r9b      ;write bSymbol to linked list

    cmp rcx, 00h                        ;if pFLink is not initialised it means this is the first entry

je _firstEntry

    mov qword ptr[rsp+13h], rsp         ;when rcx != 00h it means that there is an entry before this one "before" on stack remember

_firstEntry:

    mov rcx, rsp

_notInitialised:

    add rax, 03h

    cmp rax, 300h

jna _generateLinkedList

The above code works as desired, I have added it for completeness. RBX is still pointing to the array.

The next step of implementing Huffman Compression is to create a node with a wFreq equal to the sum of the two lowest frequencies.

My plan for this was to do the following:

Add the two lowest wFreq values together

Make pFLink in the lowest element NULL

Make both pFLink and pBLink is the second lowest (pHead->pFLink) NULL

Make more stack space (19 bytes) and add a node to the end (this involves finding the element with a NULL pFLink and changing it to the new node. Additionally the new node has the wFreq of the two lowest added together.

Is there any way to overcome this without adding two more qwords to the nl structure?

edited Jan 1 at 20:36

asked Jan 1 at 15:05

Will

1256

I am developing an implementation of Huffman Compression in x64 MASM. My main function in C syntax would be:

void* huffCompress(void* lpDataStream, unsigned long long qwLength);

I am using Microsoft's Fastcall convention where the paramters are passing to the called function using RCX, RDX, R8 & R9 (anything else is on the stack).

I have written code which generates an array of size 256*3. Each 3 byte consists of:

as STRUCT

    word wFreq

    byte bSymbol

as ENDS

The code then iterates over the lpDataStream twice.

Initialising bSymbol with the index in the array

Incrementing wFreq for each byte

In the code below RBX is a pointer to the base of the array, RSI is a pointer to the lpDataStream and RCX is qwLength. RAX needs to be 0 before entering the below code.

_populateArray:

    lodsb

    lea rdx, qword ptr[rbx+rax*04h]

    sub rdx, rax

    inc word ptr[rdx]

loop _populateArray

nl STRUCT

    qword pFLink

    qword pBLink

    dword dwFreq

    byte  bSymbol

nl ENDS

The first two qwords are forward and backward links to the next and previous elements in the linked list.

xor rax, rax

mov r10, rsp

mov rcx, rax                    ;rcx = pFLink

mov rdx, rax                    ;rdx = pBLink

_generateLinkedList:

    cmp word ptr[rbx+rax], 0000h

je _notInitialised

    sub rsp, 13h

    ;write and setup pBLink for next entry

    mov qword ptr[rsp+pBLink], rdx

    mov rdx, rsp

    mov r8w, word ptr[rbx+rax]

    mov r9b, byte ptr[rbx+rax+02h]

    mov word ptr[rsp+wFreq], r8w        ;write dFreq to linked list

    mov byte ptr[rsp+bSymbol], r9b      ;write bSymbol to linked list

    cmp rcx, 00h                        ;if pFLink is not initialised it means this is the first entry

je _firstEntry

    mov qword ptr[rsp+13h], rsp         ;when rcx != 00h it means that there is an entry before this one "before" on stack remember

_firstEntry:

    mov rcx, rsp

_notInitialised:

    add rax, 03h

    cmp rax, 300h

jna _generateLinkedList

The above code works as desired, I have added it for completeness. RBX is still pointing to the array.

The next step of implementing Huffman Compression is to create a node with a wFreq equal to the sum of the two lowest frequencies.

My plan for this was to do the following:

Add the two lowest wFreq values together

Make pFLink in the lowest element NULL

Make both pFLink and pBLink is the second lowest (pHead->pFLink) NULL

Make more stack space (19 bytes) and add a node to the end (this involves finding the element with a NULL pFLink and changing it to the new node. Additionally the new node has the wFreq of the two lowest added together.

Is there any way to overcome this without adding two more qwords to the nl structure?

arrays assembly masm huffman-code

edited Jan 1 at 20:36

asked Jan 1 at 15:05

Will

1256

edited Jan 1 at 20:36

asked Jan 1 at 15:05

Will

1256

edited Jan 1 at 20:36

asked Jan 1 at 15:05

Will

1256

asked Jan 1 at 15:05

Will

1256

asked Jan 1 at 15:05

Will

1256

Why you need to make the pFLink and pBLink of the new node point to the two lowest elements? A node in a linked list doesn't even need a pBLink unless you want to efficiently move backwards. Are you sure a linked list is what you need? a node cannot be in between two elements and at the end of the list at the same time. Maybe you need two lists?

– Margaret Bloom
Jan 1 at 21:14

@MargaretBloom I am reusing pFLink and pBlink to point to leaves of a node in the Huffman Tree. The code converts a linked list to a Huffman Tree. I included a pBLink because I need two pointers for the tree anyway so I may aswell use the second as a pBLink. I may traverse the linked list backwards when checking where a new node should be placed due to the fact that it's more likely the new node will need to be somewhere at the top (due to having a higher frequency). A nodes memory address and position in the linked list are independent.

– Will
Jan 1 at 21:23

Are you optimizing for code-size at the expense of speed? If not, avoid the slow loop instruction unless you're only tuning for AMD CPUs. Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?. Also, lodsb is 3 uops, vs. 2 for movzx / inc rsi which avoids a false dependency on the old RAX. You can also do *3 as an index like lea edx, [rax+rax*2] / add [rbx + rdx], 1. (But an indexed add [mem], 1 will unlaminate into more uops than add [rdx],1 on Sandybridge-family.)

– Peter Cordes
Jan 1 at 22:20

@PeterCordes Neither at the moment. I will probably optimise for code size later. I never though of using "lea edx, [rax+rax*2]" that's really nice, thanks!

– Will
Jan 1 at 22:23

If you haven't seen that multiply-by-3 idiom before, there's probably a lot you can learn from looking at optimized compiler output. Write some of your functions in C (with function args and return values for inputs/outputs), then look at gcc and clang -O3 (pure speed including autovectorization) or -Os (code size and speed) output on godbolt.org. How to remove "noise" from GCC/clang assembly output?. If you really want to optimize for code-size, see Tips for golfing in x86/x64 machine code

– Peter Cordes
Jan 1 at 22:27

|
show 8 more comments

Why you need to make the pFLink and pBLink of the new node point to the two lowest elements? A node in a linked list doesn't even need a pBLink unless you want to efficiently move backwards. Are you sure a linked list is what you need? a node cannot be in between two elements and at the end of the list at the same time. Maybe you need two lists?

– Margaret Bloom
Jan 1 at 21:14

@MargaretBloom I am reusing pFLink and pBlink to point to leaves of a node in the Huffman Tree. The code converts a linked list to a Huffman Tree. I included a pBLink because I need two pointers for the tree anyway so I may aswell use the second as a pBLink. I may traverse the linked list backwards when checking where a new node should be placed due to the fact that it's more likely the new node will need to be somewhere at the top (due to having a higher frequency). A nodes memory address and position in the linked list are independent.

– Will
Jan 1 at 21:23

Are you optimizing for code-size at the expense of speed? If not, avoid the slow loop instruction unless you're only tuning for AMD CPUs. Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?. Also, lodsb is 3 uops, vs. 2 for movzx / inc rsi which avoids a false dependency on the old RAX. You can also do *3 as an index like lea edx, [rax+rax*2] / add [rbx + rdx], 1. (But an indexed add [mem], 1 will unlaminate into more uops than add [rdx],1 on Sandybridge-family.)

– Peter Cordes
Jan 1 at 22:20

@PeterCordes Neither at the moment. I will probably optimise for code size later. I never though of using "lea edx, [rax+rax*2]" that's really nice, thanks!

– Will
Jan 1 at 22:23

If you haven't seen that multiply-by-3 idiom before, there's probably a lot you can learn from looking at optimized compiler output. Write some of your functions in C (with function args and return values for inputs/outputs), then look at gcc and clang -O3 (pure speed including autovectorization) or -Os (code size and speed) output on godbolt.org. How to remove "noise" from GCC/clang assembly output?. If you really want to optimize for code-size, see Tips for golfing in x86/x64 machine code

– Peter Cordes
Jan 1 at 22:27

Why you need to make the pFLink and pBLink of the new node point to the two lowest elements? A node in a linked list doesn't even need a pBLink unless you want to efficiently move backwards. Are you sure a linked list is what you need? a node cannot be in between two elements and at the end of the list at the same time. Maybe you need two lists?

– Margaret Bloom
Jan 1 at 21:14

@MargaretBloom I am reusing pFLink and pBlink to point to leaves of a node in the Huffman Tree. The code converts a linked list to a Huffman Tree. I included a pBLink because I need two pointers for the tree anyway so I may aswell use the second as a pBLink. I may traverse the linked list backwards when checking where a new node should be placed due to the fact that it's more likely the new node will need to be somewhere at the top (due to having a higher frequency). A nodes memory address and position in the linked list are independent.

– Will
Jan 1 at 21:23

Are you optimizing for code-size at the expense of speed? If not, avoid the slow loop instruction unless you're only tuning for AMD CPUs. Why is the loop instruction slow? Couldn't Intel have implemented it efficiently?. Also, lodsb is 3 uops, vs. 2 for movzx / inc rsi which avoids a false dependency on the old RAX. You can also do *3 as an index like lea edx, [rax+rax*2] / add [rbx + rdx], 1. (But an indexed add [mem], 1 will unlaminate into more uops than add [rdx],1 on Sandybridge-family.)

– Peter Cordes
Jan 1 at 22:20

@PeterCordes Neither at the moment. I will probably optimise for code size later. I never though of using "lea edx, [rax+rax*2]" that's really nice, thanks!

– Will
Jan 1 at 22:23

If you haven't seen that multiply-by-3 idiom before, there's probably a lot you can learn from looking at optimized compiler output. Write some of your functions in C (with function args and return values for inputs/outputs), then look at gcc and clang -O3 (pure speed including autovectorization) or -Os (code size and speed) output on godbolt.org. How to remove "noise" from GCC/clang assembly output?. If you really want to optimize for code-size, see Tips for golfing in x86/x64 machine code

– Peter Cordes
Jan 1 at 22:27

|
show 8 more comments

1 Answer
1

active

oldest

votes

If you are interested in speed, then you need to start with carefully considered algorithm choices before even remotely considering writing it in assembler. The correct choice of algorithms will make order of magnitude improvements. Then writing it in assembler may give you small factor improvements.

In particular you need a much better choice for sorting, likely quicksort in this case, and you can do the Huffman coding much more simply, in place, without re-sorting linked lists.

answered Jan 4 at 19:28

Mark Adler

59k865112

Moreover, writing in asm will only help if you're better at it than a compiler (unless the goal is to learn asm, rather than to create a fast and/or small implementation right away.) Copying around 3-byte objects with separate word + byte stores doesn't look like a good start. Like I suggested in comments, storing the symbol+count in a 4-byte object lets you compare the whole thing as an integer (x86 is little endian, so put the symbol in the low byte, count at the top). Sorting on (count << 16) | symbol means you don't need to extract the count into a separate register to compare it.

– Peter Cordes
Jan 4 at 20:41

I am writing this in assembler because I enjoy working with it. It is useful for me to learn as I enjoy CTFs and reversing. Speed is only a minor concern to me - learning assembly and how algorithms and software are implemented is my goal. I understand there are many improvements that could be made. I am using bubble sort in this instance because my understanding of Quick Sort is limited. As far as I know it requires recursion? This would probably end up impacting my programs stack pointer therefore messing up my linked list/array which are stored there.

– Will
Jan 4 at 21:05

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53996515%2fhuffman-coding-using-doubly-linked-list%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

In particular you need a much better choice for sorting, likely quicksort in this case, and you can do the Huffman coding much more simply, in place, without re-sorting linked lists.

answered Jan 4 at 19:28

Mark Adler

59k865112

Moreover, writing in asm will only help if you're better at it than a compiler (unless the goal is to learn asm, rather than to create a fast and/or small implementation right away.) Copying around 3-byte objects with separate word + byte stores doesn't look like a good start. Like I suggested in comments, storing the symbol+count in a 4-byte object lets you compare the whole thing as an integer (x86 is little endian, so put the symbol in the low byte, count at the top). Sorting on (count << 16) | symbol means you don't need to extract the count into a separate register to compare it.

– Peter Cordes
Jan 4 at 20:41

I am writing this in assembler because I enjoy working with it. It is useful for me to learn as I enjoy CTFs and reversing. Speed is only a minor concern to me - learning assembly and how algorithms and software are implemented is my goal. I understand there are many improvements that could be made. I am using bubble sort in this instance because my understanding of Quick Sort is limited. As far as I know it requires recursion? This would probably end up impacting my programs stack pointer therefore messing up my linked list/array which are stored there.

– Will
Jan 4 at 21:05

add a comment |

In particular you need a much better choice for sorting, likely quicksort in this case, and you can do the Huffman coding much more simply, in place, without re-sorting linked lists.

answered Jan 4 at 19:28

Mark Adler

59k865112

Moreover, writing in asm will only help if you're better at it than a compiler (unless the goal is to learn asm, rather than to create a fast and/or small implementation right away.) Copying around 3-byte objects with separate word + byte stores doesn't look like a good start. Like I suggested in comments, storing the symbol+count in a 4-byte object lets you compare the whole thing as an integer (x86 is little endian, so put the symbol in the low byte, count at the top). Sorting on (count << 16) | symbol means you don't need to extract the count into a separate register to compare it.

– Peter Cordes
Jan 4 at 20:41

I am writing this in assembler because I enjoy working with it. It is useful for me to learn as I enjoy CTFs and reversing. Speed is only a minor concern to me - learning assembly and how algorithms and software are implemented is my goal. I understand there are many improvements that could be made. I am using bubble sort in this instance because my understanding of Quick Sort is limited. As far as I know it requires recursion? This would probably end up impacting my programs stack pointer therefore messing up my linked list/array which are stored there.

– Will
Jan 4 at 21:05

add a comment |

In particular you need a much better choice for sorting, likely quicksort in this case, and you can do the Huffman coding much more simply, in place, without re-sorting linked lists.

answered Jan 4 at 19:28

Mark Adler

59k865112

In particular you need a much better choice for sorting, likely quicksort in this case, and you can do the Huffman coding much more simply, in place, without re-sorting linked lists.

answered Jan 4 at 19:28

Mark Adler

59k865112

answered Jan 4 at 19:28

Mark Adler

59k865112

answered Jan 4 at 19:28

Mark Adler

59k865112

answered Jan 4 at 19:28

Mark Adler

59k865112

Moreover, writing in asm will only help if you're better at it than a compiler (unless the goal is to learn asm, rather than to create a fast and/or small implementation right away.) Copying around 3-byte objects with separate word + byte stores doesn't look like a good start. Like I suggested in comments, storing the symbol+count in a 4-byte object lets you compare the whole thing as an integer (x86 is little endian, so put the symbol in the low byte, count at the top). Sorting on (count << 16) | symbol means you don't need to extract the count into a separate register to compare it.

– Peter Cordes
Jan 4 at 20:41

I am writing this in assembler because I enjoy working with it. It is useful for me to learn as I enjoy CTFs and reversing. Speed is only a minor concern to me - learning assembly and how algorithms and software are implemented is my goal. I understand there are many improvements that could be made. I am using bubble sort in this instance because my understanding of Quick Sort is limited. As far as I know it requires recursion? This would probably end up impacting my programs stack pointer therefore messing up my linked list/array which are stored there.

– Will
Jan 4 at 21:05

add a comment |

Moreover, writing in asm will only help if you're better at it than a compiler (unless the goal is to learn asm, rather than to create a fast and/or small implementation right away.) Copying around 3-byte objects with separate word + byte stores doesn't look like a good start. Like I suggested in comments, storing the symbol+count in a 4-byte object lets you compare the whole thing as an integer (x86 is little endian, so put the symbol in the low byte, count at the top). Sorting on (count << 16) | symbol means you don't need to extract the count into a separate register to compare it.

– Peter Cordes
Jan 4 at 20:41

I am writing this in assembler because I enjoy working with it. It is useful for me to learn as I enjoy CTFs and reversing. Speed is only a minor concern to me - learning assembly and how algorithms and software are implemented is my goal. I understand there are many improvements that could be made. I am using bubble sort in this instance because my understanding of Quick Sort is limited. As far as I know it requires recursion? This would probably end up impacting my programs stack pointer therefore messing up my linked list/array which are stored there.

– Will
Jan 4 at 21:05

Moreover, writing in asm will only help if you're better at it than a compiler (unless the goal is to learn asm, rather than to create a fast and/or small implementation right away.) Copying around 3-byte objects with separate word + byte stores doesn't look like a good start. Like I suggested in comments, storing the symbol+count in a 4-byte object lets you compare the whole thing as an integer (x86 is little endian, so put the symbol in the low byte, count at the top). Sorting on (count << 16) | symbol means you don't need to extract the count into a separate register to compare it.

– Peter Cordes
Jan 4 at 20:41

I am writing this in assembler because I enjoy working with it. It is useful for me to learn as I enjoy CTFs and reversing. Speed is only a minor concern to me - learning assembly and how algorithms and software are implemented is my goal. I understand there are many improvements that could be made. I am using bubble sort in this instance because my understanding of Quick Sort is limited. As far as I know it requires recursion? This would probably end up impacting my programs stack pointer therefore messing up my linked list/array which are stored there.

– Will
Jan 4 at 21:05

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

Search This Blog

Ufyukyu