VectorType for StructType in Pyspark Schema




I'm reading a parquet file that has the following schema:


df.printSchema()

root
|-- time: integer (nullable = true)
|-- amountRange: integer (nullable = true)
|-- label: integer (nullable = true)
|-- pcaVector: vector (nullable = true)



Now I want to test PySpark Structured Streaming using the same parquet files. The closest schema I was able to create uses ArrayType for the vector column, but reading with it fails:


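from pyspark.sql.types import (
    StructType, StructField, IntegerType, FloatType, ArrayType
)

# Attempted schema: pcaVector declared as an array of floats
schema = StructType(
    [
        StructField('time', IntegerType()),
        StructField('amountRange', IntegerType()),
        StructField('label', IntegerType()),
        StructField('pcaVector', ArrayType(FloatType()))
    ]
)
# Parentheses let the method chain span several lines
df_stream = (
    spark.readStream
    .format("parquet")
    .schema(schema)
    .load("/home/user/test_arch/data/fraud/")
)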

Caused by: java.lang.ClassCastException: Expected instance of group converter but got "org.apache.spark.sql.execution.datasources.parquet.ParquetPrimitiveConverter"
at org.apache.parquet.io.api.Converter.asGroupConverter(Converter.java:37)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter$RepeatedGroupConverter.<init>(ParquetRowConverter.scala:659)
at org.apache.spark.sql.execution.datasources.parquet.ParquetRowConverter.org$apache$spark$sql$execution$datasources$parquet$ParquetRowConverter$$newConverter(ParquetRowConverter.scala:308)



How can I declare a vector column in a PySpark StructType? VectorType seems to exist only in Scala.








