How to get keys from pyspark SparseVector

I ran a TF-IDF transform and now I want to get both the keys and the values from the resulting vectors.

I am using the following UDF to get the values:

    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, DoubleType

    def extract_values_from_vector(vector):
        return vector.values.tolist()

    extract_values_from_vector_udf = udf(extract_values_from_vector, ArrayType(DoubleType()))

    extract = rescaledData.withColumn("extracted_values", extract_values_from_vector_udf("features"))

So if the SparseVector looks like:

    features=SparseVector(123241, {20672: 4.4233, 37393: 0.0, 109847: 3.7096, 118474: 5.4042})

then extracted_values in my extract will look like:

    [4.4233, 0.0, 3.7096, 5.4042]

My question is: how can I get the keys of the SparseVector dictionary, such as keys = [20672, 37393, 109847, 118474]?

I am trying the following code, but it does not work:

    def extract_keys_from_vector(vector):
        return vector.indices.tolist()

    extract_keys_from_vector_udf = udf(extract_keys_from_vector, ArrayType(DoubleType()))

The result it gave me is: [null, null, null, null]

Can someone help?
Many thanks in advance!










  • I don't think so, that's RDD
    – A story-teller
    Jan 2 at 4:11

  • @Astory-teller indices are integer values but your UDF returns an array of doubles. I guess you just want it to be of IntegerType()
    – Sergey Khudyakov
    Jan 2 at 14:39

  • @Sergey Khudyakov I think you are right! Do you want to answer the question and I can mark accept?
    – A story-teller
    Jan 3 at 20:13
















pyspark tf-idf






edited Jan 1 at 16:44 by A story-teller
asked Jan 1 at 16:34 by A story-teller












