Release Notes - Hivemall - Version 0.5.0
Sub-task
- [HIVEMALL-1] - Host a brand-new web site for Hivemall
- [HIVEMALL-4] - Implement a contribution guideline
- [HIVEMALL-6] - Move the master repository from GitHub to the Apache Git repository
- [HIVEMALL-9] - IP-clearance (collect I-CLA and SGA)
- [HIVEMALL-10] - Set up the ASF JIRA
- [HIVEMALL-12] - Update License headers and LICENSE/NOTICE files
- [HIVEMALL-17] - Support SLIM for Top-K Recommendation of Implicit Feedback Dataset
- [HIVEMALL-18] - Support approx_distinct_count UDAF using HyperLogLog
- [HIVEMALL-19] - Support DIMSUM for approx all-pairs similarity computation
- [HIVEMALL-41] - Add documentation about ChangeFinder and SST
- [HIVEMALL-50] - Add a description about Polynomial Feature Expansion
- [HIVEMALL-59] - Add documentation about l2_normalize function
- [HIVEMALL-101] - Separate optimizer implementation
- [HIVEMALL-113] - Write DIMSUM + MovieLens tutorial
- [HIVEMALL-116] - Add documentation about SQL in Spark
- [HIVEMALL-155] - [DOC] Release guide for Apache release
- [HIVEMALL-156] - Use mvn versions plugin to update Hivemall versions
Bug
- [HIVEMALL-13] - Fail to build xgboost due to unbound variable HIVEMALL_HOME
- [HIVEMALL-28] - Fail to copy ./lib/libxgboost4j.* due to unbound variable HIVEMALL_HOME
- [HIVEMALL-32] - Print explicit error messages when building xgboost with clang
- [HIVEMALL-34] - Fix a bug that wrongly used MLlib vectors in some functions
- [HIVEMALL-42] - [DOC] Fix the link to license file in README.md
- [HIVEMALL-52] - Support executing source define-all.hive, define-additional.hive, and define-all.deprecated.hive in spark-sql
- [HIVEMALL-53] - Unnecessary exception for ChangeFinder2D
- [HIVEMALL-65] - Update define-all.spark and import-packages.spark
- [HIVEMALL-72] - Fix rescale UDF behavior to return values in the range [0.0, 1.0]
- [HIVEMALL-76] - [SPARK] each_top_k behavior on Spark is wrong
- [HIVEMALL-80] - Fix incorrect lines in evaluation UDAFs
- [HIVEMALL-83] - Fix a wrong getV() argument in ffm_predict
- [HIVEMALL-90] - Refine incomplete AUC (classification) implementation
- [HIVEMALL-93] - Fix typo which creates an incorrect logger
- [HIVEMALL-94] - Fail to execute several build scripts
- [HIVEMALL-100] - Fail to build with bin/build.sh
- [HIVEMALL-109] - Fix multi-byte chars & encoding related issue on LDA/pLSA UDFs
- [HIVEMALL-112] - Handle NULL input for TokenizeUDF
- [HIVEMALL-115] - Fix ranking AUC for all TP/FP recommendation
- [HIVEMALL-119] - Fail to use xgboost on Hive
- [HIVEMALL-140] - Rename precision UDAF because it is a reserved keyword in Hive
- [HIVEMALL-157] - NullPointerException in to_ordered_list UDAF for an empty (partial) result
New Feature
- [HIVEMALL-22] - Review and merge pending Pull Requests before entering Incubator
- [HIVEMALL-44] - Support Top-K joins for DataFrame/Spark
- [HIVEMALL-61] - Support a function to convert a comma-separated string into typed data and vice versa
- [HIVEMALL-64] - Add rownum() UDF
- [HIVEMALL-74] - Implement pLSA algorithm
- [HIVEMALL-78] - AUC UDAF for BinaryClassificationMetrics
- [HIVEMALL-91] - Implement Online LDA
- [HIVEMALL-96] - Support GeoSpatial (lat/long ops) functions
- [HIVEMALL-108] - Support `-iter` option in generic regressor/classifier
- [HIVEMALL-122] - Add tokenize_cn UDF based upon SmartChineseAnalyzer
- [HIVEMALL-138] - Implement to_top_k_ordered_map
- [HIVEMALL-142] - Implement SingularizeUDF for English singularization
- [HIVEMALL-146] - Implement yet another UDF to generate n-grams from a list of words
- [HIVEMALL-147] - Support all Hivemall functions of v0.5-rc.1 in Spark Dataframe
- [HIVEMALL-154] - Refactor Field-aware Factorization Machines to support Instance-wise L2 normalization
- [HIVEMALL-162] - Support `l1_normalize` for feature vector
Improvement
- [HIVEMALL-2] - Change maven release scheme
- [HIVEMALL-3] - Move wiki documentation into GitBook
- [HIVEMALL-11] - Enable scala checkstyle in the spark module
- [HIVEMALL-14] - Add build instructions including the xgboost library
- [HIVEMALL-23] - Introduce useful Java Annotations
- [HIVEMALL-29] - Add github pull request template
- [HIVEMALL-31] - Upgrade Spark to 2.1.0
- [HIVEMALL-35] - Remove unnecessary implicit conversions from Scala literal values to Column
- [HIVEMALL-36] - Refactor each_top_k
- [HIVEMALL-37] - Support a SST-based change-point detector in DataFrame/Spark
- [HIVEMALL-38] - Support change-finder UDFs in DataFrame/Spark
- [HIVEMALL-39] - Put the logical plan nodes of Hive UDFs in one file
- [HIVEMALL-40] - Load xgboost-formatted data via Java ServiceLoader
- [HIVEMALL-45] - Upgrade spark-v2.0.0 to v2.0.2 (latest)
- [HIVEMALL-47] - Support codegen for ShuffledHashJoinTopKExec
- [HIVEMALL-54] - Add an easy-to-use script for spark-shell
- [HIVEMALL-55] - Drop Spark v1.6 support before the next Hivemall GA release
- [HIVEMALL-71] - Handle null values in RescaleUDF.java
- [HIVEMALL-75] - Support Sparse Vector Format as the input of RandomForest
- [HIVEMALL-77] - Support CSRMatrix and DenseMatrix
- [HIVEMALL-84] - Add docker support
- [HIVEMALL-86] - Change Hadoop version dependencies to v2.4.0
- [HIVEMALL-87] - [DOC] Add issues mailing list to the web site
- [HIVEMALL-88] - Support a function to flatten a nested schema
- [HIVEMALL-89] - Support to_csv/from_csv in HivemallOps
- [HIVEMALL-92] - Typos in UDAFToOrderedMap
- [HIVEMALL-103] - Upgrade spark-v2.1.0 to v2.1.1
- [HIVEMALL-120] - Refactor LDA/pLSA's mini-batch & buffered iteration logic
- [HIVEMALL-124] - NDCG - BinaryResponseMeasure "fix"
- [HIVEMALL-127] - Create tree_predict_v1 for RandomForest model backward compatibility
- [HIVEMALL-130] - Support user-defined dictionary for `tokenize_ja`
- [HIVEMALL-132] - Generalize f1score UDAF to support any Beta value
- [HIVEMALL-133] - Support spark-v2.2 in the hivemall-spark module
- [HIVEMALL-136] - Support train_classifier and train_regressor for Spark
- [HIVEMALL-149] - Add script for updating resources/ddl/define-*
- [HIVEMALL-151] - Support Matrix conversion from DoK to CSR/CSC matrix
- [HIVEMALL-164] - Include the LICENSE and NOTICE files of Apache Hivemall project into the jar files
- [HIVEMALL-167] - [SPARK] Avoid using -profile for the Spark module and make spark-2.0/2.1/2.2 and spark-common submodules of hivemall-spark