News - 9 series#

Release 9.1.2 - 2020-01-29#

Improvements#

[tools] Added a script for copying only the files of specified tables or columns.
- The script is named copy-related-files.rb.
- It is useful when we want to extract specific tables or columns from a huge database.
- Files related to specific tables or columns may be needed to reproduce a fault.
- If it is difficult to provide the whole database, we can extract the files related to the target tables or columns with this tool.
[shutdown] Accept /d/shutdown?mode=immediate immediately even when all threads are in use.
- This feature is available only on the Groonga HTTP server.
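As a minimal sketch of the request described above, the following Python snippet only builds the URL for an immediate shutdown. The host name and port are assumptions for illustration, and the snippet deliberately does not send the request, because doing so would stop a running server.

```python
from urllib.parse import urlencode, urlunsplit


def build_shutdown_url(host: str = "localhost", port: int = 10041) -> str:
    """Build the URL for an immediate shutdown request.

    The host and port defaults are assumptions for illustration.
    """
    query = urlencode({"mode": "immediate"})
    return urlunsplit(("http", f"{host}:{port}", "/d/shutdown", query, ""))


if __name__ == "__main__":
    # Only prints the URL; sending it would stop a running server.
    print(build_shutdown_url())
```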
Unused objects are freed immediately when GRN_ENABLE_REFERENCE_COUNT=yes is set.
- This feature is experimental. It degrades performance.
- If we load data that spans many tables, we can expect this feature to keep memory usage down.
[CentOS] We now provide groonga-release packages by version.
- Note that the installation procedure has changed slightly.
[Debian GNU/Linux] We now use groonga-archive-keyring to add the Groonga apt repository.
- This improvement makes it easy to add the Groonga apt repository to our system.
- groonga-archive-keyring includes all the information needed to use the Groonga apt repository. Thus, once this package is installed, we don’t need to be aware of changes to the repository information or the PGP key.
- groonga-archive-keyring is a deb package. Thus, we can easily update it with apt update.

Release 9.1.1 - 2020-01-07#

Improvements#

[load] Added support for Apache Arrow format data.
- Using Apache Arrow format data may reduce the parsing cost. Therefore, data may load faster than with other formats.
- With this improvement, Groonga can also directly take Apache Arrow format data from other data analysis systems as input.
- However, the Apache Arrow format can be used only through the HTTP interface. We can’t use it in the command line interface.
[load] Documented how to load Apache Arrow format data.
[load] Improved the error message.
- The response of the load command now also includes the error message.
- If loading data fails, Groonga now outputs the error details of the load command.
[httpd] Updated bundled nginx to 1.17.7.
[Groonga HTTP server] Added support for sending command parameters in the body of an HTTP request.
- In this case, we must set Content-Type to application/x-www-form-urlencoded.
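The following Python sketch shows one way a client might prepare such a request with the standard library. The server address, the select command, and the Entries table are assumptions for illustration; the snippet builds the request object without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(base_url: str, command: str, params: dict) -> Request:
    """Put command parameters in the request body instead of the URL."""
    body = urlencode(params).encode("utf-8")
    return Request(
        f"{base_url}/d/{command}",
        data=body,
        # The Content-Type required for parameters sent in the body.
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical server address, table name, and parameters.
request = build_post_request(
    "http://localhost:10041", "select", {"table": "Entries", "limit": 3}
)
```

Passing the object to urllib.request.urlopen would send it to a running Groonga HTTP server.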
[Groonga HTTP server] Documented how to use HTTP POST.

Release 9.1.0 - 2019-11-29#

Improvements#

Improved the performance of the “&&” operation.
- For example, the performance of a condition expression such as the following is improved.
- ( A || B ) && ( C || D ) && ( E || F) …
[TokenMecab] Added a new option use_base_form.
- With this option, we can search using the base form of a token.
- For example, if we search for “支えた” with this option, “支える” also matches.

Fixes#

Fixed a bug that degraded performance when the accessor is an index.
- For example, it occurs with a query that includes the following conditions.
  - accessor @ query
  - accessor == query
Fixed a bug that the estimated size of a search result overflowed when the buffer is big enough. [PGroonga#GitHub#115][Reported by Albert Song]
Improved test(1) portability. [GitHub#1065][Patched by OBATA Akio]
Added missing tools.
- index-column-diff-all.sh and object-inspect-all.sh were not bundled in the previous version.

Thanks#

Albert Song
OBATA Akio

Release 9.0.9 - 2019-10-30#

Note

Performance may decrease from this version. If performance is worse than before, please report it to us with steps to reproduce.

Improvements#

[Log] Improved the query log to output the sending time of the response.
[status] Added the number of current jobs to the status command response.
[groonga-httpd] Added support for $request_time in the log.
- In previous versions, even if we specified $request_time in the log_format directive, it was always 0.00.
- From this version, if we specify $request_time, groonga-httpd outputs the correct time.
[groonga-httpd] Documented how to set $request_time.
Added support for Ubuntu 19.10 (Eoan Ermine).
Added support for CentOS 8 (experimental).
- The package for CentOS 8 can’t use some features (e.g. we can’t use TokenMecab and can’t cast a JSON string to an int32 vector) because some development packages are missing.
[tools] Added a script for easily executing the index_column_diff command.
- The script is named index-column-diff-all.sh.
- It extracts index columns from Groonga’s database and executes the index_column_diff command against them.
[tools] Added a script for executing object_inspect against all objects.
- The script is named object-inspect-all.sh.

Fixes#

Fixed a bug that Groonga crashed when we specified a value as the first argument of between. [GitHub#1045][Reported by yagisumi]

Thanks#

yagisumi

Release 9.0.8 - 2019-09-27#

Improvements#

[log_reopen] Added a supplementary explanation when we use groonga-httpd with 2 or more workers.
Groonga now ignores an index that is being built.
- We can get correct search results even if the index is under construction.
- However, the search is slow in this case because Groonga doesn’t use the index.
[sub_filter] sub_filter now executes a sequential search when Groonga is building indexes for the target column or the target column isn’t indexed.
- In previous versions, sub_filter returned an error in these situations.
- From this version, sub_filter returns search results in these situations.
- However, sub_filter is slow in these situations because it is executed as a sequential search.
[CentOS] Dropped 32-bit package support on CentOS 6.

Fixes#

[logical_range_filter] Fixed a bug that an exception about closing the same object twice occurred when we had enough records and the number of records that didn’t match the filter criteria was larger than its estimated value.

Release 9.0.7 - 2019-08-29#

Improvements#

[httpd] Updated bundled nginx to 1.17.3.
- Contains security fixes for CVE-2019-9511, CVE-2019-9513, and CVE-2019-9516.

Fixes#

Fixed a bug that Groonga crashed when posting lists were huge.
- However, this bug almost never occurs with ordinary data, because posting lists don’t grow that large with such data.
Fixed a bug that an empty result was returned when we specified initial as the stage of a dynamic column and searched using an index. [GitHub#683]
Fixed a bug that the configure phase didn’t detect libedit even though it was installed. [GitHub#1030][Patched by yu]
Fixed a bug that --offset and --limit options didn’t work with --sort_keys and --slices options. [clear-code/redmine_full_text_search#70][Reported by a9zawa]
Fixed a bug that the search result was empty when the result of the select command was huge. [groonga-dev,04770][Reported by Yutaro Shimamura]
Fixed a bug that a suitable index wasn’t used for prefix search and suffix search. [GitHub#1007, PGroonga#GitHub#96][Reported by oknj]

Thanks#

oknj
Yutaro Shimamura
yu
a9zawa

Release 9.0.6 - 2019-08-05#

Improvements#

Added support for Debian 10 (buster).

Fixes#

[select] Fixed a bug that a search failed with an error when search escalation occurred.
[select] Fixed a bug that wrong search results might be returned when we used a nested equal condition.
[geo_distance_location_rectangle] Fixed an example that had the wrong load format. [GitHub#1023][Patched by yagisumi]
[Let’s create micro-blog] Fixed an example that had wrong search results. [GitHub#1024][Patched by yagisumi]

Thanks#

yagisumi

Release 9.0.5 - 2019-07-30#

Warning

Some critical bugs have been found in this release. The select command returns wrong search results. We will release a new version (9.0.6) that fixes these issues. Please do not use Groonga 9.0.5; we recommend upgrading to 9.0.6 when it is released. The details of these issues are explained at https://groonga.org/en/blog/2019/07/30/groonga-9.0.5.html.

Improvements#

[logical_range_filter] Changed to apply an optimization only when the search target shard is large enough.
- This reduces duplicated search results between offsets when we use the same sort key.
- By default, the threshold for “large enough” is 10000 records.
[Normalizers] Added a new option unify_to_katakana for NormalizerNFKC100.
- This option normalizes hiragana to katakana.
- For example, ゔぁゔぃゔゔぇゔぉ is normalized to ヴァヴィヴヴェヴォ.
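To illustrate the mapping that unify_to_katakana describes, here is a small Python sketch. It is not Groonga’s implementation: it relies only on the fact that the Unicode hiragana letters U+3041 to U+3096 sit exactly 0x60 code points below their katakana counterparts, and it leaves every other character unchanged.

```python
HIRAGANA_FIRST = 0x3041  # ぁ
HIRAGANA_LAST = 0x3096   # ゖ
KATAKANA_OFFSET = 0x60   # distance from ぁ (U+3041) to ァ (U+30A1)


def unify_to_katakana(text: str) -> str:
    """Shift each hiragana letter to the matching katakana letter."""
    return "".join(
        chr(ord(ch) + KATAKANA_OFFSET)
        if HIRAGANA_FIRST <= ord(ch) <= HIRAGANA_LAST
        else ch
        for ch in text
    )


if __name__ == "__main__":
    print(unify_to_katakana("ゔぁゔぃゔゔぇゔぉ"))
```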
[select] Added support for drilldowns as a slices parameter.
[select] Added support for columns as a slices parameter.
[select] We can now refer to _score in the initial stage of the slices parameter.
[highlight_html], [snippet_html] When we specify the slices parameter, keywords are now also extracted from the expression that is evaluated before the slices are executed.
When we specify the slices parameter, scores are now also collected from the expression that is evaluated before the slices are executed.
Stopped adding 1 to the score automatically when adding a posting to a posting list.
- This is a backward incompatible change to grn_ii_posting_add. The caller must increase the score to maintain compatibility.
Added support for index search for nested equal conditions such as XXX.YYY.ZZZ == AAA.
Reduced the rehash interval when we use a hash table.
- This improves performance when outputting results.
We can now add a tag prefix in the query log.
- This makes it easy to see which condition was used for filtering.
Added support for Apache Arrow 1.0.0.
- However, that version has not been released yet.
Added support for Amazon Linux 2.

Fixes#

Fixed a bug that vector values of JSON like "[1, 2, 3]" are not indexed.
Fixed wrong parameter name in table_create tests. [GitHub#1000][Patch by yagisumi]
Fixed a bug that the drilldown label was empty when a drilldown command was executed with command_version=3. [GitHub#1001][Reported by yagisumi]
Fixed build error for Windows package on MinGW.
Fixed a bug that COPYING was not installed for the Windows package on MinGW.
Fixed a bug that nothing was highlighted when specifying a non-text query as the highlight target keyword.
Fixed a bug that the MessagePack format output of [object_inspect] was broken. [GitHub#1009][Reported by yagisumi]
Fixed a bug that the MessagePack format output of index_column_diff was broken. [GitHub#1010][Reported by yagisumi]
Fixed a bug that the MessagePack format output of [suggest] was broken. [GitHub#1011][Reported by yagisumi]
Fixed a bug that the size allocated by realloc wasn’t enough when searching a patricia trie table and so on. [Reported by Shimadzu Corporation]
- Groonga may crash because of this bug.
Fixed a bug that groonga.repo was removed when updating to 1.5.0 from a groonga-release version before 1.5.0-1. [groonga-talk:429][Reported by Josep Sanz]

Thanks#

yagisumi
Shimadzu Corporation
Josep Sanz

Release 9.0.4 - 2019-06-29#

Improvements#

Added support for array literal with multiple elements.
Added support for the equivalence operation on vectors.
[logical_range_filter] Added more query log output.
- The logical_range_filter command now outputs a log at the following timings.
  - After filtering by logical_range_filter.
  - After sorting by logical_range_filter.
  - After applying dynamic column.
  - After output results.
- With this feature, we can see how far the command has progressed.
[Tokenizers] Added documentation describing TokenPattern.
[Tokenizers] Added documentation describing TokenTable.
[Tokenizers] Added documentation describing TokenNgram.
[grndb] Added operation log output to groonga.log.
- The grndb command now outputs the execution result and the execution process.
[grndb] Added support for checking empty files.
- With this feature, we can check whether empty files exist.
[grndb] Added a new option --since.
- We can specify the scope of an inspection.
[grndb] Added documentation about the new option --since.
Bundled RapidJSON.
- We can use RapidJSON as part of Groonga’s JSON parser. (This support is still partial.)
- We can parse JSON more exactly by using it.
Added support for casting a JSON string to an int32 vector.
- This feature requires RapidJSON.
[query] Added default_operator.
- We can customize the operator used for “keyword1 keyword2”.
- By default, “keyword1 keyword2” is an AND operation.
- We can change the operator for “keyword1 keyword2” to one other than AND.
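The effect of the default operator can be pictured with a small Python sketch. It only illustrates the semantics of combining space-separated keywords and is not how Groonga evaluates queries; plain substring matching stands in for a real search, and OR is assumed here as the alternative operator.

```python
def matches(text: str, query: str, default_operator: str = "AND") -> bool:
    """Combine space-separated keywords with the given default operator."""
    hits = [keyword in text for keyword in query.split()]
    if default_operator == "AND":
        return all(hits)  # every keyword must appear
    if default_operator == "OR":
        return any(hits)  # one keyword is enough
    raise ValueError(f"unsupported operator: {default_operator}")


if __name__ == "__main__":
    text = "Groonga is a fast full text search engine"
    print(matches(text, "Groonga slow"))
    print(matches(text, "Groonga slow", default_operator="OR"))
```

With AND, a record must contain both keywords, so the first call does not match; with OR, the same query matches because one keyword is present.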

Fixes#

[optimizer] Fixed a bug that an execution error occurred when multiple filter conditions were specified together with a condition like xxx.yyy=="keyword".
Added missing LICENSE files to the Groonga package for Windows (VC++ version).
Added the UCRT runtime to the Groonga package for Windows (VC++ version).
[Window function] Fixed a memory leak.
- This occurs when multiple windows with sort keys are used. [Patched by Takashi Hashida]

Thanks#

Takashi Hashida

Release 9.0.3 - 2019-05-29#

Improvements#

[select] Added more query logs.
- The select command now outputs a log at the following timings.
  - After sorting by drilldown.
  - After filtering by drilldown.
- With this feature, we can see how far the command has progressed.
[logical_select] Added more query logs.
- The logical_select command now outputs a log at the following timings.
  - After making dynamic columns.
  - After grouping by drilldown.
  - After sorting by drilldown.
  - After filtering by drilldown.
  - After sorting by logical_select.
- With this feature, we can see how far the command has progressed.
[logical_select] Slightly improved sort performance when we use the limit option.
[index_column_diff] Improved performance.
- We have greatly shortened the execution time of this command.
[index_column_diff] Improved to ignore invalid references.
[index_column_diff] Added support for the duplicated vector element case.
[Normalizers] Added a new Normalizer NormalizerNFKC121 based on Unicode NFKC (Normalization Form Compatibility Composition) for Unicode 12.1.
[TokenFilters] Added a new TokenFilter TokenFilterNFKC121 based on Unicode NFKC (Normalization Form Compatibility Composition) for Unicode 12.1.
[grndb] Added a new option --log-flags.
- We can specify the output items of a log, as with the groonga executable file.
- See [groonga executable file] for the supported log flags.
[snippet_html] Added a new option for changing the return value when there is no match.
[plugin_unregister] Added support for full paths on Windows.
Added support for multiline log messages.
- Multiline log messages are easier to read with this feature.
Groonga’s log now includes the key when we search by index.
[match_columns parameter] Added a document for indexes with weight.
[logical_range_filter] Added an explanation for the order parameter.
[object_inspect] Added an explanation for new statistics INDEX_COLUMN_VALUE_STATISTICS_NEXT_PHYSICAL_SEGMENT_ID and INDEX_COLUMN_VALUE_STATISTICS_N_PHYSICAL_SEGMENTS.
Dropped Ubuntu 14.04 support.

Fixes#

[index_column_diff] Fixed a bug that too many remains were reported.
Fixed a build error when we use --without-onigmo option. [GitHub#951] [Reported by Tomohiro KATO]
Fixed the vulnerability CVE-2019-11675. [Reported by Wolfgang Hotwagner]
Removed the extended path prefix \\?\ from the Windows version of Groonga. [GitHub#958] [Reported by yagisumi]
- This extended prefix caused a bug that plugins couldn’t be found correctly.

Thanks#

Tomohiro KATO
Wolfgang Hotwagner
yagisumi

Release 9.0.2 - 2019-04-29#

From this release, we provide a package for Windows built with VC++.

We also provide a package for Windows built with MinGW, as before. However, sooner or later we will provide only the VC++ build instead of the MinGW build.

Improvements#

[column_create] Added a new flag INDEX_LARGE for index columns.
- With this flag, we can create an index column that has twice the default space.
- However, note that it also uses twice the memory.
- This flag is useful when the index target data are large.
- Large data must have many records (normally at least 10 million records) and at least one of the following features.
  - Index targets are multiple columns
  - Index table has tokenizer
[object_inspect] Added new statistics next_physical_segment_id and max_n_physical_segments for physical segment information.
- With this information, we can check the current usage and the maximum size of the index column space.
[logical_select] Added support for window function over shard.
[logical_range_filter] Added support for window function over shard.
[logical_count] Added support for window function over shard.
Provided a package for Windows built with VC++.
[io_flush] Added a new option --recursive dependent.
- With this option, the flush targets are the specified target object, its child objects, the corresponding table of an index column, and the corresponding index columns.

Fixes#

Fixed “unknown type name ‘bool’” compilation error in some environments.
Fixed a bug that numbers over Int32 were output incorrectly by commands executed via mruby (e.g. logical_select, logical_range_filter, logical_count, etc.). [GitHub#936] [Patch by HashidaTKS]

Thanks#

HashidaTKS

Release 9.0.1 - 2019-03-29#

Improvements#

Added support for accepting null as a vector value.
- You can use select … --columns[vector].flags COLUMN_VECTOR --columns[vector].value "null"
[dump] Translated the documentation into English.
Added more checks and logging for invalid indexes. This helps to clarify index-related bugs.
Improved an explanation about GRN_TABLE_SELECT_ENOUGH_FILTERED_RATIO behavior in news at Release 8.0.6 - 2018-08-29.
[select] Added new arguments --load_table, --load_columns and --load_values.
- You can store the result of select in the table specified with --load_table.
- The --load_values option specifies the columns of the select result.
- The --load_columns option specifies the columns of the table specified with --load_table.
- In this way, you can store the values of the columns specified with --load_values into the columns specified with --load_columns.
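The pairing between the three options can be sketched as a small Python helper that assembles a select command line. The table names Logs and LoadedLogs and all column names are hypothetical; the helper only builds a string and checks that each value column has a destination column.

```python
def build_select_with_load(
    source_table: str,
    load_table: str,
    load_columns: list[str],
    load_values: list[str],
) -> str:
    """Assemble a select command that stores its result in another table."""
    if len(load_columns) != len(load_values):
        raise ValueError("each value column needs one destination column")
    return " ".join(
        [
            "select",
            f"--table {source_table}",
            f"--load_table {load_table}",
            f'--load_columns "{", ".join(load_columns)}"',
            f'--load_values "{", ".join(load_values)}"',
        ]
    )


# Hypothetical tables and columns, for illustration only.
command = build_select_with_load(
    "Logs", "LoadedLogs", ["original_id", "timestamp_text"], ["_id", "timestamp"]
)
print(command)
```

Here the value of _id from each select result would be stored in original_id of the destination table, and timestamp in timestamp_text.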
[select] Added documentation about load_table, load_columns and load_values.
[load] Added support for displaying the load destination table in the query log.
- The name of the load destination table is displayed as a string in [] as below.
- :000000000000000 load(3): [LoadedLogs][3]
Added new APIs:
- grn_ii_get_flags()
- grn_index_column_diff()
- grn_memory_get_usage()
Added the index_column_diff command to check for broken index columns. If you want to log the progress of command execution, set the log level to debug.

Fixes#

[snippet_html] Changed to return an empty vector for no match.
- In such a case, an empty vector [] is returned instead of null.
Fixed a warning about a possible overflow when counting threads. In practice, it doesn’t affect users because such an enormous number of threads is not used. [GitHub#904]
Fixed a build error on macOS. [GitHub#909] [Reported by shiro615]
Fixed a stop word handling bug.
- This bug occurs when the first token in our query is a stop word.
- If this bug occurs, our search query doesn’t match anything.
[Global configurations] Fixed a typo about parameter name of grn_lock_set_timeout.
Fixed a bug that deleted records might be matched because indexes were updated incorrectly.
- It may occur when a large number of records are added or deleted.
Fixed a memory leak when logical_range_filter returns no records. [GitHub#911] [Patch by HashidaTKS]
Fixed a bug that a query didn’t match because loaded data was not normalized correctly. [PGroonga#GitHub#93, GitHub#912, GitHub#913] [Reported by kamicup and dodaisuke]
- This bug occurs when the loaded data contains whitespace after katakana and the unify_kana option is used for the normalizer.
Fixed a bug that an index was broken while updating indexes.
- It may occur when repeatedly adding or deleting a large number of records over a long period.
Fixed a crash bug that the allocated working area was not large enough when updating indexes.

Thanks#

shiro615
HashidaTKS
kamicup
dodaisuke

Release 9.0.0 - 2019-02-09#

This is a major version up! But it keeps backward compatibility. You can upgrade to 9.0.0 without rebuilding the database.

Improvements#

[Tokenizers] Added a new tokenizer TokenPattern.
- You can extract tokens by regular expression.
  - This tokenizer extracts only tokens that match the regular expression.
- You can also specify multiple regular expression patterns.
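The idea of pattern-based tokenization can be sketched in Python as follows. This is only an illustration of the behavior described above, written with Python’s re module; it is not Groonga’s implementation, and its regular expression syntax may differ from the one Groonga uses.

```python
import re


def tokenize_by_patterns(text: str, patterns: list[str]) -> list[str]:
    """Return only the substrings that match one of the patterns, in order."""
    combined = re.compile("|".join(f"(?:{pattern})" for pattern in patterns))
    return [match.group(0) for match in combined.finditer(text)]


if __name__ == "__main__":
    text = "I bought 3 apples and 12 pears"
    # A single pattern keeps only the numbers.
    print(tokenize_by_patterns(text, [r"\d+"]))
    # Multiple patterns keep anything that matches either of them.
    print(tokenize_by_patterns(text, [r"\d+", r"apples"]))
```

Text that matches none of the patterns produces no tokens at all, which is the difference from a tokenizer that splits the whole input.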
[Tokenizers] Added a new tokenizer TokenTable.
- You can extract tokens using the values of a column of an existing table.
[dump] Added support for dumping binary data.
[select] Added support for similar search against index columns.
- If you use a multi-column index, you can run a similar search against all source columns with this feature.
[Normalizers] Added a new option remove_blank for NormalizerNFKC100.
- This option removes white spaces.
[groonga executable file] Improved the display of thread IDs in the log.
- Because thread IDs and process IDs were easily confused on the Windows version, the log now makes clear which is a thread ID and which is a process ID.