Class AvroDataUtils


  • public final class AvroDataUtils
    extends java.lang.Object
    Utils for handling Avro data for project internal use.
    • Method Summary

      Modifier and Type Method Description
      static byte[] encode​(org.apache.avro.generic.GenericDatumWriter<org.apache.avro.generic.GenericRecord> writer, org.apache.avro.generic.GenericData.Record update, org.apache.avro.io.BinaryEncoder encoder)  
      static java.lang.Object toAvro​(java.lang.Object cassandraValue, org.apache.avro.Schema fieldSchema)
      Converts a Cassandra value object to its Avro representation, which is later converted into the Spark format.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • toAvro

        public static java.lang.Object toAvro​(java.lang.Object cassandraValue,
                                              org.apache.avro.Schema fieldSchema)
        Converts a Cassandra value object to its Avro representation, which is later converted into the Spark format. The available Cassandra data types are listed at [1]. Internally, these data types are mapped to the Java types in [2]. Although Avro supports many data types [3], only a limited subset of them can be converted into Spark; the supported Avro to Spark conversions are listed at [4]. For the Java type that the Cassandra Java driver uses for each CQL type, see [5].

        Therefore, the Cassandra to Avro data types mapping can be summarized as the following:

        | Cassandra Type          | Java Type   | Avro Type                                            |
        |-------------------------|-------------|------------------------------------------------------|
        | ascii                   | String      | string                                               |
        | bigint                  | Long        | long                                                 |
        | blob                    | ByteBuffer  | bytes                                                |
        | boolean                 | Boolean     | boolean                                              |
        | counter (not supported) |             |                                                      |
        | date                    | Integer     | int (logical type: date)                             |
        | decimal                 | BigDecimal  | fixed bytes (logical type: decimal)                  |
        | double                  | Double      | double                                               |
        | duration (not supported)|             |                                                      |
        | empty (not supported)   |             |                                                      |
        | float                   | Float       | float                                                |
        | inet                    | InetAddress | bytes (logical type: inet)                           |
        | int                     | Integer     | int                                                  |
        | smallint                | Short       | int                                                  |
        | text                    | String      | string                                               |
        | time                    | Long        | long                                                 |
        | timestamp               | Date        | long (logical type: timestamp)                       |
        | timeuuid                | UUID        | string (logical type: uuid)                          |
        | tinyint                 | Byte        | int                                                  |
        | uuid                    | UUID        | string (logical type: uuid)                          |
        | varchar                 | String      | string                                               |
        | varint                  | BigInteger  | fixed bytes (logical type: decimal)                  |
        | list                    | List        | array of records                                     |
        | set                     | Set         | array of records (logical type: array_set)           |
        | map                     | Map         | array of key-value records (logical type: array_map) |

        Note that for Java List and Set, Avro treats them as Collection and converts to array.
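
The scalar rows of the mapping table can be illustrated with a small sketch. It is not the AvroDataUtils implementation: the class `ScalarMappingSketch` and the method `toAvroLike` are hypothetical, it uses only the JDK, and it stands in a `ByteBuffer` wherever the table says bytes or fixed bytes. It also assumes that timestamps are carried as epoch milliseconds and that decimal and varint values are carried as the two's-complement bytes of the unscaled value, which is how Avro's decimal logical type is defined; the actual method may differ on both points.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.net.InetAddress;
import java.nio.ByteBuffer;
import java.util.Date;
import java.util.UUID;

/** Hypothetical illustration of the scalar mapping table; not the real AvroDataUtils code. */
final class ScalarMappingSketch {

    /** Maps a Java value, as produced by the Cassandra driver, to a plain Java value of the Avro shape. */
    static Object toAvroLike(Object value) {
        if (value == null) {
            return null;
        }
        // smallint (Short) and tinyint (Byte) widen to Avro int.
        if (value instanceof Short || value instanceof Byte) {
            return ((Number) value).intValue();
        }
        // uuid and timeuuid become strings.
        if (value instanceof UUID) {
            return value.toString();
        }
        // timestamp becomes a long (assumed here: epoch milliseconds).
        if (value instanceof Date) {
            return ((Date) value).getTime();
        }
        // inet becomes the raw address bytes.
        if (value instanceof InetAddress) {
            return ByteBuffer.wrap(((InetAddress) value).getAddress());
        }
        // decimal: unscaled value as big-endian two's-complement bytes (scale lives in the schema).
        if (value instanceof BigDecimal) {
            return ByteBuffer.wrap(((BigDecimal) value).unscaledValue().toByteArray());
        }
        // varint: same byte layout, with an implied scale of zero.
        if (value instanceof BigInteger) {
            return ByteBuffer.wrap(((BigInteger) value).toByteArray());
        }
        // String, Long, Integer, Boolean, Double, Float and ByteBuffer pass through unchanged.
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toAvroLike((short) 7));
        System.out.println(toAvroLike(UUID.randomUUID()));
        System.out.println(toAvroLike(new Date()));
    }
}
```

Keeping the pass-through case last means that any type not listed in the table is returned untouched; a real converter would more likely reject unsupported types such as counter or duration explicitly.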

        [1]: https://cassandra.apache.org/doc/latest/cassandra/cql/types.html
        [2]: https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/cql3/CQL3Type.java
        [3]: https://avro.apache.org/docs/1.11.1/specification/
        [4]: https://spark.apache.org/docs/latest/sql-data-sources-avro.html#supported-types-for-avro---spark-sql-conversion
        [5]: https://docs.datastax.com/en/developer/java-driver/4.0/manual/core/#cql-to-java-type-mapping

        Parameters:
        cassandraValue - Cassandra value
        fieldSchema - Avro schema for the field
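
For the collection rows of the mapping table (list, set, map), the following JDK-only sketch shows the general shape of the conversion. Everything in it is an assumption made for illustration: the class `CollectionMappingSketch`, both method names, and the record field names `key` and `value` are hypothetical, and each record is modeled as a plain `Map` rather than an Avro record. For brevity, list and set elements are copied as they are instead of being wrapped in records.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the collection mapping; not the real AvroDataUtils code. */
final class CollectionMappingSketch {

    /** List and Set are both Collections, so one code path turns either into an array-like List. */
    static List<Object> collectionToArray(Collection<?> values) {
        return new ArrayList<>(values);
    }

    /** Turns a Map into an array of key-value records, one record per entry. */
    static List<Map<String, Object>> mapToKeyValueRecords(Map<?, ?> map) {
        List<Map<String, Object>> records = new ArrayList<>();
        for (Map.Entry<?, ?> entry : map.entrySet()) {
            // Field names "key" and "value" are assumed for this sketch.
            Map<String, Object> record = new LinkedHashMap<>();
            record.put("key", entry.getKey());
            record.put("value", entry.getValue());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = new LinkedHashMap<>();
        scores.put("alice", 1);
        scores.put("bob", 2);
        System.out.println(mapToKeyValueRecords(scores));
    }
}
```

Representing a map as an array of key-value records avoids the restriction that Avro's native map type only allows string keys, which is presumably why the table lists an array with the array_map logical type.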
      • encode

        public static byte[] encode​(org.apache.avro.generic.GenericDatumWriter<org.apache.avro.generic.GenericRecord> writer,
                                    org.apache.avro.generic.GenericData.Record update,
                                    org.apache.avro.io.BinaryEncoder encoder)