Release Notes - Hivemall - Version 0.5.0
Sub-task
- [HIVEMALL-1] - Host a brand-new web site for Hivemall
- [HIVEMALL-4] - Implement a contribution guideline
- [HIVEMALL-6] - Move the master repository from GitHub to the Apache Git repository
- [HIVEMALL-9] - IP-clearance (collect I-CLA and SGA)
- [HIVEMALL-10] - Set up the ASF JIRA
- [HIVEMALL-12] - Update License headers and LICENSE/NOTICE files
- [HIVEMALL-17] - Support SLIM for Top-K Recommendation of Implicit Feedback Dataset
- [HIVEMALL-18] - Support approx_distinct_count UDAF using HyperLogLog
- [HIVEMALL-19] - Support DIMSUM for approx all-pairs similarity computation
- [HIVEMALL-41] - Add documentation about ChangeFinder and SST
- [HIVEMALL-50] - Add a description about Polynomial Feature Expansion
- [HIVEMALL-59] - Add documentation about l2_normalize function
- [HIVEMALL-101] - Separate optimizer implementation
- [HIVEMALL-113] - Write DIMSUM + MovieLens tutorial
- [HIVEMALL-116] - Add documentation about SQL in Spark
- [HIVEMALL-155] - [DOC] Release guide for Apache release
- [HIVEMALL-156] - Use mvn versions plugin to update Hivemall versions
Bug
- [HIVEMALL-13] - Fail to build xgboost due to unbound variable HIVEMALL_HOME
- [HIVEMALL-28] - Fail to copy ./lib/libxgboost4j.* due to unbound variable HIVEMALL_HOME
- [HIVEMALL-32] - Print explicit error messages when building xgboost with clang
- [HIVEMALL-34] - Fix a bug that wrongly used MLlib vectors in some functions
- [HIVEMALL-42] - [DOC] Fix the link to license file in README.md
- [HIVEMALL-52] - Support executing source define-all.hive, define-additional.hive, and define-all.deprecated.hive in spark-sql
- [HIVEMALL-53] - Unnecessary exception for ChangeFinder2D
- [HIVEMALL-65] - Update define-all.spark and import-packages.spark
- [HIVEMALL-72] - Fix rescale UDF behavior to return values in the range [0.0, 1.0]
- [HIVEMALL-76] - [SPARK] each_top_k behavior on Spark is wrong
- [HIVEMALL-80] - Fix incorrect lines in evaluation UDAFs
- [HIVEMALL-83] - Fix a wrong getV() argument in ffm_predict
- [HIVEMALL-90] - Refine incomplete AUC (classification) implementation
- [HIVEMALL-93] - Fix typo which creates an incorrect logger
- [HIVEMALL-94] - Fail to execute several build scripts
- [HIVEMALL-100] - Fail to build with bin/build.sh
- [HIVEMALL-109] - Fix multi-byte chars & encoding related issue on LDA/pLSA UDFs
- [HIVEMALL-112] - Handle NULL input for TokenizeUDF
- [HIVEMALL-115] - Fix ranking AUC for all TP/FP recommendation
- [HIVEMALL-119] - Fail to use xgboost on Hive
- [HIVEMALL-140] - Rename precision UDAF because it is a reserved keyword in Hive
- [HIVEMALL-157] - NullPointerException in to_ordered_list UDAF for an empty (partial) result
New Feature
- [HIVEMALL-22] - Review and merge pending Pull Requests before entering Incubator
- [HIVEMALL-44] - Support Top-K joins for DataFrame/Spark
- [HIVEMALL-61] - Support a function to convert a comma-separated string into typed data and vice versa
- [HIVEMALL-64] - Add rownum() UDF
- [HIVEMALL-74] - Implement pLSA algorithm
- [HIVEMALL-78] - AUC UDAF for BinaryClassificationMetrics
- [HIVEMALL-91] - Implement Online LDA
- [HIVEMALL-96] - Support GeoSpatial (lat/long ops) functions
- [HIVEMALL-108] - Support `-iter` option in generic regressor/classifier
- [HIVEMALL-122] - Add tokenize_cn UDF based upon SmartChineseAnalyzer
- [HIVEMALL-138] - Implement to_top_k_ordered_map
- [HIVEMALL-142] - Implement SingularizeUDF for English singularization
- [HIVEMALL-146] - Implement yet another UDF to generate n-grams from a list of words
- [HIVEMALL-147] - Support all Hivemall functions of v0.5-rc.1 in Spark Dataframe
- [HIVEMALL-154] - Refactor Field-aware Factorization Machines to support Instance-wise L2 normalization
- [HIVEMALL-162] - Support `l1_normalize` for feature vector
Improvement
- [HIVEMALL-2] - Change maven release scheme
- [HIVEMALL-3] - Move wiki documentation into GitBook
- [HIVEMALL-11] - Enable scala checkstyle in the spark module
- [HIVEMALL-14] - Add build instructions including the xgboost library
- [HIVEMALL-23] - Introduce useful Java Annotations
- [HIVEMALL-29] - Add github pull request template
- [HIVEMALL-31] - Upgrade Spark to 2.1.0
- [HIVEMALL-35] - Remove unnecessary implicit conversions from Scala literal values to Column
- [HIVEMALL-36] - Refactor each_top_k
- [HIVEMALL-37] - Support a SST-based change-point detector in DataFrame/Spark
- [HIVEMALL-38] - Support change-finder UDFs in DataFrame/Spark
- [HIVEMALL-39] - Put the logical plan nodes of Hive UDFs in one file
- [HIVEMALL-40] - Load xgboost-formatted data via Java ServiceLoader
- [HIVEMALL-45] - Upgrade spark-v2.0.0 to v2.0.2 (latest)
- [HIVEMALL-47] - Support codegen for ShuffledHashJoinTopKExec
- [HIVEMALL-54] - Add an easy-to-use script for spark-shell
- [HIVEMALL-55] - Drop Spark v1.6 support before the next Hivemall GA release
- [HIVEMALL-71] - Handle null values in RescaleUDF.java
- [HIVEMALL-75] - Support Sparse Vector Format as the input of RandomForest
- [HIVEMALL-77] - Support CSRMatrix and DenseMatrix
- [HIVEMALL-84] - Add docker support
- [HIVEMALL-86] - Change Hadoop version dependencies to v2.4.0
- [HIVEMALL-87] - [DOC] Add issues mailing list to the web site
- [HIVEMALL-88] - Support a function to flatten a nested schema
- [HIVEMALL-89] - Support to_csv/from_csv in HivemallOps
- [HIVEMALL-92] - Typos in UDAFToOrderedMap
- [HIVEMALL-103] - Upgrade spark-v2.1.0 to v2.1.1
- [HIVEMALL-120] - Refactor LDA/pLSA's mini-batch & buffered iteration logic
- [HIVEMALL-124] - NDCG - BinaryResponseMeasure "fix"
- [HIVEMALL-127] - Create tree_predict_v1 for RandomForest model backward compatibility
- [HIVEMALL-130] - Support user-defined dictionary for `tokenize_ja`
- [HIVEMALL-132] - Generalize f1score UDAF to support any Beta value
- [HIVEMALL-133] - Support spark-v2.2 in the hivemall-spark module
- [HIVEMALL-136] - Support train_classifier and train_regressor for Spark
- [HIVEMALL-149] - Add script for updating resources/ddl/define-*
- [HIVEMALL-151] - Support Matrix conversion from DoK to CSR/CSC matrix
- [HIVEMALL-164] - Include the LICENSE and NOTICE files of Apache Hivemall project into the jar files
- [HIVEMALL-167] - [SPARK] Avoid using -profile for the Spark module and make spark-2.0/2.1/2.2 and spark-common submodules of hivemall-spark