While reviewing the code, we found that Kylin leaves a lot of garbage files in:
- the local file system of the machine running the CLI
- HDFS
- the local file systems of the Hadoop nodes.
A ticket was opened to track this issue:
https://issues.apache.org/jira/browse/KYLIN-926
For future development, please follow these rules:
- When you need to create temporary files locally, use File.createTempFile or
  the folder BatchConstants.CFG_KYLIN_LOCAL_TEMP_DIR (/tmp/kylin). Do not
  create arbitrary folders under /tmp; that quickly becomes a mess and looks
  unprofessional.
- Whenever you create temporary files locally, delete them as soon as you are
  done with them. FileUtils.forceDelete is preferred because it also deletes
  folders. Avoid deleteOnExit: if Kylin exits abnormally, the files are never
  removed.
- When you need to create files in HDFS, create them under
  kylin.hdfs.working.dir or BatchConstants.CFG_KYLIN_HDFS_TEMP_DIR, and delete
  them once they are no longer needed. Avoid dumping everything into
  hdfs:///tmp and leaving it behind as garbage.
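The local-file rules above can be sketched in Java as follows. This is a minimal sketch, not Kylin code. KYLIN_LOCAL_TEMP_DIR is a hypothetical stand-in for BatchConstants.CFG_KYLIN_LOCAL_TEMP_DIR, and it is resolved against java.io.tmpdir so the example runs anywhere. forceDelete is a small JDK-only approximation of Commons IO's FileUtils.forceDelete, so no dependencies are needed. The pattern is the point: create temp files under one shared folder, and delete them eagerly in a finally block.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

final class LocalTempFiles {
    // Hypothetical stand-in for BatchConstants.CFG_KYLIN_LOCAL_TEMP_DIR (/tmp/kylin);
    // built from java.io.tmpdir here so the sketch runs on any machine.
    static final Path KYLIN_LOCAL_TEMP_DIR =
            Paths.get(System.getProperty("java.io.tmpdir"), "kylin");

    // Creates a uniquely named scratch folder inside the shared Kylin temp dir,
    // instead of inventing a new top-level folder under /tmp.
    static Path createScratchDir(String prefix) throws IOException {
        Files.createDirectories(KYLIN_LOCAL_TEMP_DIR); // no-op if it already exists
        return Files.createTempDirectory(KYLIN_LOCAL_TEMP_DIR, prefix);
    }

    // JDK-only approximation of Commons IO's FileUtils.forceDelete:
    // removes a file or a whole folder tree and throws if anything cannot be deleted.
    static void forceDelete(Path path) throws IOException {
        try (Stream<Path> walk = Files.walk(path)) {
            // Reverse lexicographic order visits children before their parent folders.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (UncheckedIOException e) {
            throw e.getCause(); // unwrap back to the original IOException
        }
    }

    public static void main(String[] args) throws IOException {
        Path scratch = createScratchDir("job-");
        try {
            // File.createTempFile with an explicit directory keeps the file inside the scratch folder.
            File tmp = File.createTempFile("segment-", ".txt", scratch.toFile());
            Files.write(tmp.toPath(), "intermediate data".getBytes(StandardCharsets.UTF_8));
            System.out.println("Wrote " + tmp);
        } finally {
            // Delete eagerly here rather than via deleteOnExit(),
            // which only runs when the JVM terminates normally.
            forceDelete(scratch);
        }
        System.out.println("Scratch dir removed: " + Files.notExists(scratch));
    }
}
```

Only the per-job scratch folder is deleted; the shared Kylin temp folder stays in place for other jobs. For HDFS, the same try/finally shape would apply with Hadoop's FileSystem.delete(path, true) on a path under kylin.hdfs.working.dir. That part is not shown here because it needs a Hadoop client on the classpath.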