An identifier is a string used to identify a database object such as a table, view, schema, column, etc. Spark SQL has regular identifiers and delimited identifiers, which are enclosed within backticks. Both regular identifiers and delimited identifiers are case-insensitive.
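The "case-insensitive" part can be sketched in plain Scala. This is a minimal model, not Spark's actual code: with the default `spark.sql.caseSensitive=false`, the analyzer's identifier comparison amounts to a case-insensitive string equality check (Catalyst calls this function type a `Resolver`), and backticks only delimit an identifier, they do not change how it compares.

```scala
// Minimal sketch (an illustration, not Spark's implementation): identifier
// resolution as a string-comparison strategy.
type Resolver = (String, String) => Boolean

// Default behavior (spark.sql.caseSensitive=false): ignore case.
val caseInsensitiveResolution: Resolver = (a, b) => a.equalsIgnoreCase(b)
// With spark.sql.caseSensitive=true: exact match.
val caseSensitiveResolution: Resolver = (a, b) => a == b

// Backticks merely delimit the identifier; stripping them recovers the name.
def stripDelimiters(id: String): String =
  if (id.startsWith("`") && id.endsWith("`")) id.substring(1, id.length - 1)
  else id

println(caseInsensitiveResolution(stripDelimiters("`ID`"), "id")) // true
println(caseSensitiveResolution(stripDelimiters("`ID`"), "id"))   // false
```

As the sessions below show, however, this case-insensitivity does not apply uniformly to every part of a multi-part name.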
// Changing the select clause to use Tb1 raises an error. At the bottom of the
// logical plan you can see that Spark lowercased the table name, i.e. the
// qualifier Tb1.id != tb1.id, so the column cannot be found.
scala> spark.sql("select Tb1.id from Tb1").explain(true)
org.apache.spark.sql.AnalysisException: Column 'Tb1.id' does not exist. Did you mean one of the following? [spark_catalog.default.tb1.id]; line 1 pos 7;
'Project ['Tb1.id]
+- SubqueryAlias spark_catalog.default.tb1
   +- Relation default.tb1[id#32] parquet
// In Spark 3.2.2 the error message reads as follows:
scala> spark.sql("select Tb1.id from Tb1").show(false)
org.apache.spark.sql.AnalysisException: cannot resolve 'Tb1.id' given input columns: [spark_catalog.default.tb1.id]; line 1 pos 7;
'Project ['Tb1.id]
+- SubqueryAlias spark_catalog.default.tb1
   +- Relation default.tb1[id#24] parquet
// Register a temp view whose name contains an uppercase letter
scala> spark.sql("SELECT * FROM VALUES (1) AS (`id`)").createOrReplaceTempView("Tb1")
// After changing the table name Tb1 to tB1, the table can no longer be found
scala> spark.sql("select TB1.id from tB1").show
org.apache.spark.sql.AnalysisException: Table or view not found: tB1; line 1 pos 19;
'Project ['TB1.id]
+- 'UnresolvedRelation [tB1], [], false
// Changing the table name only in the select clause: the view itself is still
// resolved, but the qualified column is not
scala> spark.sql("select TB1.id from Tb1").show
org.apache.spark.sql.AnalysisException: Column 'TB1.id' does not exist. Did you mean one of the following? [Tb1.id]; line 1 pos 7;
'Project ['TB1.id]
+- SubqueryAlias Tb1
   +- View (`Tb1`, [id#0])
      +- Project [id#0]
         +- SubqueryAlias AS
            +- LocalRelation [id#0]
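The sessions above can be summed up in a toy model. This is an assumption built only from the observed behavior, not Spark's implementation: the metastore lowercases catalog table names, a temp view keeps its name as written, and while the column part of a reference matches case-insensitively, the qualifier must match the relation's alias exactly as it appears in the plan.

```scala
// Toy model of the observed behavior (illustration only, not Spark's code).
// Relation stands in for a resolved table/view node in the logical plan.
case class Relation(alias: String, columns: Seq[String])

// Qualifier: exact match against the alias; column name: case-insensitive.
def resolveColumn(qualifier: String, column: String, rel: Relation): Option[String] =
  if (qualifier == rel.alias && rel.columns.exists(_.equalsIgnoreCase(column)))
    Some(s"${rel.alias}.$column")
  else None

val catalogTable = Relation("tb1", Seq("id")) // name lowercased by the metastore
val tempView     = Relation("Tb1", Seq("id")) // temp view name kept as written

println(resolveColumn("Tb1", "id", catalogTable)) // None: 'Tb1.id' does not exist
println(resolveColumn("tb1", "id", catalogTable)) // Some(tb1.id)
println(resolveColumn("TB1", "id", tempView))     // None: 'TB1.id' does not exist
println(resolveColumn("Tb1", "ID", tempView))     // Some(Tb1.ID): column part is case-insensitive
```

In other words, the safe habit is to write the qualifier exactly as the relation is named in the plan: lowercase for catalog tables, and character-for-character for temp views.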